
R. Janssens / Mathematical Social Sciences 38 (1999) 275–293

Fig. 2. Digraph representing the integration process of group H.

are called actual patterns. The other component patterns are called possible patterns. Both the traditional position (0000) and the integrated position (1111) are included by definition. The previous example presents the possible attitude changes in the case where only two response patterns are observed. Usually, more patterns will be found. In this article, a method based on the Boolean analysis of questionnaires (Flament, 1976) is introduced, which enables one to structure the integration process on the basis of a large set of response patterns.

2. Boolean analysis: the method

‘Boolean analysis’ is used as a collective name for those methods that translate research problems into a Boolean framework and solve them by applying the axioms and theorems of Boolean algebra.² Based on the work of Flament (1976), the method is used here to illustrate its applicability for structuring the indicators of the integration process. Boolean algebra makes it possible to introduce a partial order relation within the set of indicators. In fact, Boolean analysis enables one to deduce a unique structure representing the integration process of a particular group, such that the structures derived from different datasets can be compared. For this purpose, two Boolean principles are used: Boolean minimization and Boolean implication.

Before presenting the method, some notions need to be introduced. Every respondent is characterized by his response pattern, an ordered n-tuple representing the attitude of the respondent towards a set of items. In Boolean terms, such a pattern is called a Boolean expression, while an item is called a generator. The raw dataset, including all response patterns, is called the set of primitive Boolean expressions. Boolean analysis also deals with subpatterns of the response patterns, representing the responses on a subset of the set of items. Taking the response pattern 1000 as an example, the subpattern referring to the first three items and the subpattern referring to the first, third and fourth items are both written as 100. To avoid this ambiguity, the items can also be represented by letters, so that a positive response is written as a lowercase letter and a negative one as its complement, marked with a prime. In the example, 1000 is written as ab′c′d′ and the two subpatterns as ab′c′ and ac′d′. The Boolean term for a pattern is a fundamental product. If this pattern contains information about all the items, it is called complete. So in Boolean algebra, the set of response patterns corresponds to a set of complete fundamental products. On every set of generators $G = \{a, b, c, \ldots, n\}$, each taking the value 1 or 0, two binary operations, the Boolean product (·) and the Boolean sum (+), and one unary operation, the complement (′), are defined; they act on any generators a and b in G according to the truth tables in Table 1.

² For a comparison with other methods, see Janssens (1998).

2.1. Boolean minimization

A first basic feature of Boolean analysis, as a data analytic approach, is the principle of Boolean minimization. It refers to algorithms that construct equivalent Boolean expressions for a given expression, in such a way that a unique expression using a minimum of symbols is obtained. The basic idea is to reduce a set of primitive Boolean expressions to a smaller number of Boolean expressions describing the original set.

Boolean minimization can be defined as follows. Consider a set C of Boolean expressions, each expression being a complete fundamental product. If two complete fundamental products $c_1$ and $c_2$ differ in only one generator, which appears complemented in $c_1$ and uncomplemented in $c_2$ or vice versa, this generator can be removed to create a simpler expression q composed of the common generators. This new expression q is included in both complete fundamental products: $q \cdot c_1 = c_1$ and $q \cdot c_2 = c_2$. In that case, we say that the fundamental product q covers both complete fundamental products $c_1$ and $c_2$. By repeating this process of minimization, one obtains a set of subpatterns for which no further minimization is possible. This set of subpatterns is called the set of prime implicants. Every set of Boolean expressions C can be reduced uniquely to a set of prime implicants.

An example illustrates this principle. Suppose C is the following set of complete fundamental products: $C = \{ab'cd, ab'cd', ab'c'd, ab'c'd'\}$.
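The minimization rule can be sketched in Python as a Quine–McCluskey-style procedure: merge every pair of (sub)patterns that differ in exactly one generator, and repeat until nothing more can be merged. This is a minimal sketch under our own assumptions, not Flament's algorithm: patterns are tuples of 1 (positive), 0 (negative) and None (a removed generator), and the helper names `merge`, `prime_implicants` and `to_letters` are hypothetical. Applied to the set C, it returns the single subpattern ab′.

```python
from itertools import combinations

def merge(p, q):
    """Return the subpattern obtained by removing the single generator in
    which p and q differ (complemented in one, uncomplemented in the other);
    return None if they differ in more or fewer generators."""
    diff = [i for i in range(len(p)) if p[i] != q[i]]
    if len(diff) == 1 and p[diff[0]] is not None and q[diff[0]] is not None:
        i = diff[0]
        return p[:i] + (None,) + p[i + 1:]
    return None

def prime_implicants(patterns):
    """Repeat pairwise merging; terms that merge with nothing are prime."""
    current, primes = set(patterns), set()
    while current:
        merged, used = set(), set()
        for p, q in combinations(current, 2):
            m = merge(p, q)
            if m is not None:
                merged.add(m)
                used.update((p, q))
        primes |= current - used  # could not be minimized further
        current = merged
    return primes

def to_letters(p, items="abcd"):
    """Letter notation: 1 -> 'a', 0 -> "a'", None -> generator omitted."""
    return "".join(x if v == 1 else x + "'" for x, v in zip(items, p) if v is not None)

# C = {ab'cd, ab'cd', ab'c'd, ab'c'd'} written as 0/1 tuples over items a, b, c, d
C = [(1, 0, 1, 1), (1, 0, 1, 0), (1, 0, 0, 1), (1, 0, 0, 0)]
print(sorted(to_letters(p) for p in prime_implicants(C)))  # ["ab'"]
```

In the first round every term of C merges with a neighbour (giving ab′c, ab′c′, ab′d and ab′d′), and in the second round these all collapse to ab′, which merges with nothing and is therefore the only prime implicant.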
By applying the above principle, ab′cd′ and ab′cd yield ab′c, and ab′c′d and ab′c′d′ yield ab′c′. In a second minimization step, ab′c and ab′c′ yield ab′. Similarly, ab′cd and ab′c′d yield ab′d, ab′cd′ and ab′c′d′ yield ab′d′, and consequently ab′d and ab′d′ yield ab′. As a result, the set of Boolean expressions C can be represented by the subpattern ab′. In summary, every set of response patterns can be reduced to a set of subpatterns characterizing the original dataset; this set of subpatterns covers the whole dataset, so that, knowing the set of prime implicants, the original set of response patterns can be reconstructed.

Table 1
Truth tables for the Boolean operators · (product), + (sum) and ′ (complement)

a   b   a·b   a+b   a′
1   1    1     1    0
1   0    0     1    0
0   1    0     1    1
0   0    0     0    1

2.2. Boolean implication

Boolean minimization does not provide a structure; it only offers a way to reduce a set of Boolean expressions to a more condensed and workable form. By defining an order relation on the basis of prime implicants, Flament (1976) provides the stepping stone between this condensed form and a structured set of indicators. Flament (1976) starts from the set of all possible response patterns, whether they appear in the dataset or not. The theoretical set of all possible patterns is called R, so that

$R = \{p_i \mid p_i = \text{pattern } i \text{ with a frequency } f_i\}$.

These response patterns can be divided into two sets: $\bar{R}$, the set of patterns with frequency equal to 0, and $\tilde{R}$, the set of actual patterns with non-zero frequencies:

$\bar{R} = \{p_i \mid p_i = \text{pattern } i \text{ with a frequency } f_i = 0\}$,
$\tilde{R} = \{p_i \mid p_i = \text{pattern } i \text{ with a frequency } f_i > 0\}$.

Any given response pattern is an element of exactly one of these two sets, i.e. $R = \bar{R} \cup \tilde{R}$. Because both sets are mutually exclusive ($\bar{R} \cap \tilde{R} = \emptyset$), information about one of the sets is sufficient to reconstruct the other: $\bar{R} = R \setminus \tilde{R}$ and $\tilde{R} = R \setminus \bar{R}$. Instead of starting from the set of actual patterns, Flament (1976) looks for the set of prime implicants of $\bar{R}$.
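The split of the set R of all possible patterns into the zero-frequency patterns (R̄, called `R_bar` below) and the observed patterns (R̃, called `R_tilde`) can be sketched in Python. This assumes each respondent's answers are stored as a tuple of 0/1 values; the function `partition_patterns` and the five responses are hypothetical illustrations, not data from the article.

```python
from collections import Counter
from itertools import product

def partition_patterns(responses, n_items):
    """Split the set R of all 2**n_items possible patterns into
    R_bar (frequency 0) and R_tilde (frequency > 0)."""
    freq = Counter(tuple(r) for r in responses)
    R = set(product((0, 1), repeat=n_items))
    R_tilde = {p for p in R if freq[p] > 0}
    R_bar = R - R_tilde  # R_bar = R \ R_tilde
    return R, R_bar, R_tilde

# Hypothetical answers of five respondents on three items
responses = [(1, 1, 1), (0, 0, 0), (0, 0, 0), (1, 1, 0), (0, 1, 0)]
R, R_bar, R_tilde = partition_patterns(responses, 3)

assert R_bar | R_tilde == R and not (R_bar & R_tilde)  # disjoint cover of R
print(len(R_bar), sorted(R_tilde))
# 4 [(0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 1, 1)]
```

Only the set of observed patterns matters here, not their frequencies, which is why either of the two sets determines the other.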
A prime implicant of the set $\bar{R}$ is called an ultimate canonical projection (PCU). As a consequence, the set of PCUs covers all patterns that are not observed among the respondents. In other words, if ab is a PCU, then nobody answered both a and b positively. In the case of integration, if ab is derived as a PCU, somebody who shows attitude a will never show attitude b. Flament (1976) therefore concludes that a implies b′, written as a → b′. A PCU, however, yields more information than a single implication. If ab′ is a PCU, then a → b; but one also knows that somebody who answers b′ will not answer a, so the implication b′ → a′ holds as well. In general, a PCU of length k generates $2^k - 2$ equivalent implications: every non-empty proper subpattern of a PCU implies the Boolean sum of the complements of its remaining literals. Of course, an implication does not reflect a causal relation between the indicators; it only gives information about their co-occurrence.

The goal of using Boolean analysis is to represent the data as a hierarchical structure. From every PCU, a set of equivalent implications can be derived. By selecting one implication per PCU and chaining them into one structure, $\bar{R}$ (and hence the dataset) can be represented in one scheme. Flament (1976) calls such a representation an implication scheme. Because he also proved that implications are reflexive, antisymmetric and transitive, a partial order is defined on the responses.

An example shows how implication schemes can be constructed. Suppose

$\bar{R} = \{abcd', abc'd, ab'cd, abc'd', ab'cd', a'bcd', ab'c'd, a'b'cd', a'bc'd', ab'c'd'\}$.

As a result of Boolean minimization, the set of PCUs is {ab′, ac′, ad′, cd′, bd′}. Take the relation between a and b as an example (see Table 2). If a particular subpattern is observed, it is indicated by 1; if it is not observed, it is indicated by 0.

Table 2
2×2 contingency table of items a and b based on the presence or absence of the items

       b′   b
a′     1    1
a      0    1

If a person responds negatively towards item a, both options b and b′ are plausible: knowing that somebody answers a′ says nothing about that person's answer on indicator b. If he has responded positively towards a, he has also responded positively towards b. So, if a is given as an answer, b has been given as well. As a result: a → b. If he has responded negatively towards b, he has also responded negatively towards a. So, if b′ is given as an answer, a′ has been given as well. As a result: b′ → a′. One may notice that, if the cell of the contingency table referring to a′b were zero, a′b would also be a PCU, so that the implication b → a would hold as well. In that case, an equivalence relation a ↔ b is found. If no zero cells are detected, the relation between both indicators cannot be specified.

To obtain a hierarchical structure, the method requires one to choose one implication out of the set of possible implications for each PCU. The choice depends on the subject under investigation. On the one hand, Boolean analysis does not provide the researcher with a unique solution; on the other hand, the researcher has a restricted autonomy to select among several solutions without violating the methodological principles. In the current subject, where a set of indicators of integration is selected and evaluated based on their presence or absence, it is interesting to select those implications indicating the presence of an indicator. According to this criterion, the following implications are selected:

a → b
a → c
a → d
c → d
b → d

The resulting implication scheme is presented in Fig. 3.
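The theoretical example can be checked with a short Python sketch. It (i) computes the prime implicants of R̄ by Quine–McCluskey-style pairwise merging, (ii) lists the 2^k − 2 implications of each PCU, and (iii) keeps the implications whose antecedent and consequent are both uncomplemented, which is our reading of the "presence" criterion rather than a rule stated by Flament. All function names are hypothetical; R̄ is entered exactly as listed in the text, with items a, b, c, d.

```python
from itertools import combinations

ITEMS = "abcd"

def merge(p, q):
    """Merge two (sub)patterns that differ in exactly one present generator."""
    diff = [i for i in range(len(p)) if p[i] != q[i]]
    if len(diff) == 1 and p[diff[0]] is not None and q[diff[0]] is not None:
        i = diff[0]
        return p[:i] + (None,) + p[i + 1:]
    return None

def prime_implicants(patterns):
    """Quine-McCluskey-style minimization: merge until nothing changes."""
    current, primes = set(patterns), set()
    while current:
        merged, used = set(), set()
        for p, q in combinations(current, 2):
            m = merge(p, q)
            if m is not None:
                merged.add(m)
                used.update((p, q))
        primes |= current - used
        current = merged
    return primes

def literals(p):
    """(item, value) pairs for the generators present in a subpattern."""
    return [(ITEMS[i], v) for i, v in enumerate(p) if v is not None]

def show(lit):
    item, value = lit
    return item if value == 1 else item + "'"

def to_letters(p):
    return "".join(show(lit) for lit in literals(p))

def implications(pcu):
    """All 2**k - 2 implications of a PCU with k literals: each non-empty
    proper subset implies the Boolean sum of the complements of the rest."""
    lits = literals(pcu)
    result = []
    for r in range(1, len(lits)):
        for antecedent in combinations(lits, r):
            rest = [(x, 1 - v) for x, v in lits if (x, v) not in antecedent]
            result.append(("".join(map(show, antecedent)),
                           " + ".join(map(show, rest))))
    return result

# R-bar (unobserved patterns) from the theoretical example
R_bar = [(1, 1, 1, 0), (1, 1, 0, 1), (1, 0, 1, 1), (1, 1, 0, 0), (1, 0, 1, 0),
         (0, 1, 1, 0), (1, 0, 0, 1), (0, 0, 1, 0), (0, 1, 0, 0), (1, 0, 0, 0)]

pcus = sorted(prime_implicants(R_bar), key=to_letters)
print([to_letters(p) for p in pcus])  # ["ab'", "ac'", "ad'", "bd'", "cd'"]

# Keep only implications between uncomplemented responses ("presence")
selected = [imp for p in pcus for imp in implications(p)
            if "'" not in imp[0] + imp[1]]
for antecedent, consequent in selected:
    print(antecedent, "->", consequent)
```

For each two-literal PCU such as ab′, the sketch lists exactly two implications (a → b and b′ → a′) and keeps the first, so the five printed implications match those selected in the text, up to ordering.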
By applying the above principles, Boolean analysis enables one to represent every dataset as a partially ordered set of indicators. Nevertheless, the fact that the solution is not unique illustrates a possible weakness of the method.

Fig. 3. Implication scheme based on the theoretical example.
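Because the chosen implications define a partial order, one common way to draw them without redundant arrows is as a Hasse diagram (the transitive reduction). The Python sketch below computes it for the five implications selected in the theoretical example (a → b, a → c, a → d, c → d, b → d). The function names are hypothetical, the sketch assumes there are no equivalences (a ↔ b), and we do not claim it reproduces the exact drawing of Fig. 3.

```python
from itertools import product

def transitive_closure(edges, nodes):
    """Warshall's algorithm on a reflexive implication relation."""
    reach = set(edges) | {(x, x) for x in nodes}
    for k, i, j in product(nodes, repeat=3):  # k is the outermost loop
        if (i, k) in reach and (k, j) in reach:
            reach.add((i, j))
    return reach

def hasse_edges(edges):
    """Drop reflexive pairs and every implication x -> y that follows
    from a chain x -> z -> y (assumes no equivalences x <-> y)."""
    nodes = sorted({v for edge in edges for v in edge})
    reach = transitive_closure(edges, nodes)
    return sorted(
        (x, y) for x, y in reach
        if x != y and not any((x, z) in reach and (z, y) in reach
                              for z in nodes if z not in (x, y))
    )

# Implications selected in the theoretical example ("presence" criterion)
selected = [("a", "b"), ("a", "c"), ("a", "d"), ("c", "d"), ("b", "d")]
print(hasse_edges(selected))  # [('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'd')]
```

Here a → d disappears because it already follows by transitivity from a → b and b → d (or from a → c and c → d).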

3. Boolean analysis and the measurement of integration