Boolean analysis and the measurement of integration

R. Janssens / Mathematical Social Sciences 38 (1999) 275–293

3. Boolean analysis and the measurement of integration

The interesting feature of the Boolean approach is the translation of a set of response patterns into a hierarchical structure. But the method has its weaknesses as well. The first problem concerns the obtained structure. Although Boolean analysis results in an implication scheme, there is no guarantee that the obtained scheme suits the formal description of the process under investigation, in this case cultural integration. Moreover, an implication scheme yields a partial order defined on the separate items, while we are looking for a representation describing a structure composed of response patterns. A second problem, mentioned above, is that the resulting structure is not unique, which makes a comparison of different structures quite difficult. Both problems are discussed in this section.

3.1. Calculating PCUs and structuring the process of integration

Boolean analysis results in a set of equivalent implication schemes, while the substantive nature of the integration process may suggest only certain of the schemes. How can one be sure that the resulting structure fits the required conditions? A solution for this problem is suggested by the theory of knowledge spaces, which provides a mathematical translation of the process of knowledge aggregation (for an introduction, see Falmagne et al., 1990). Although this theory is applied within a different context, it is looking for a similar structure that meets the formal description of the integration process. According to Degreef et al. (1986), a knowledge structure is a pair $(X, \mathcal{K})$, where $X$ is a set of questions and $\mathcal{K}$ is a family of subsets of $X$ called knowledge states. A knowledge structure is called a knowledge space if both the set $X$ and the empty set $\emptyset$ are states and if every union of states is a state.
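
The definition of a knowledge space can be checked mechanically on a small family of states. The following is a minimal sketch, not part of the cited theory's own tooling; the function name `is_knowledge_space` and the example families are assumptions made purely for illustration.

```python
from itertools import combinations


def is_knowledge_space(questions, states):
    """Check the definition from the text: the empty set and the full set X
    are states, and every union of states is again a state."""
    domain = frozenset(questions)
    family = {frozenset(s) for s in states}
    if frozenset() not in family or domain not in family:
        return False
    # For a finite family, closure under pairwise unions is enough to
    # guarantee closure under arbitrary unions.
    return all(a | b in family for a, b in combinations(family, 2))


# Hypothetical example data (not taken from the article).
X = {"a", "b", "c"}
chain = [set(), {"a"}, {"a", "b"}, {"a", "b", "c"}]
broken = [set(), {"a"}, {"b"}, {"a", "b", "c"}]  # the union {"a", "b"} is missing

print(is_knowledge_space(X, chain))   # True
print(is_knowledge_space(X, broken))  # False
```

The second family fails only because the union of the states {a} and {b} is not itself a state, which is exactly the condition the definition imposes.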
For the aggregation of knowledge, Doignon and Falmagne (1985) are able to generalize Birkhoff's Theorem (see footnote 3) and prove that if a knowledge structure, and as a consequence a knowledge space, is closed under union and intersection, it is well-graded. This means that, starting from the $\emptyset$-state and increasing from one level to the next, every response pattern situated on level $l$ can be reached by adding just one positive response to a response pattern situated on level $l - 1$. This condition meets the gradual aspect of the process of integration.

Footnote 3: For any set $X$, the formula $x Q y$ iff ($y \in K$ implies $x \in K$ for all $K \in \mathcal{K}$) defines a 1–1 correspondence between the set of all quasi orders $Q$ on $X$ and the set of all families $\mathcal{K}$ of subsets of $X$ that are closed under union and intersection (Birkhoff, 1937).

Theuns (1992) proved that there is a one-to-one correspondence between the sets of PCUs that each consist of one positive and one negative response and the family of knowledge spaces that are closed under union and intersection. Considering the correspondence between union and intersection on the one hand, and Boolean sum and Boolean product on the other hand, one may conclude that, if the original dataset can be represented by a set of PCUs of the form $xy'$ or $x'y$, one obtains a structure that meets the conditions for structuring the process of integration. The fact that all these PCUs have length 2 results in only two equivalent implications. Because the set of PCUs only consists of PCUs with one positive and one negative response, all implications can be written either as $x \to y$ or as $y' \to x'$ (for the PCU $xy'$), or otherwise as $x' \to y'$ or as $y \to x$ (for the PCU $x'y$). This results in an implication scheme with only positive or only negative responses. This implication scheme corresponds to a unique set of response patterns.
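
To make the link between an implication scheme and its set of response patterns concrete, the sketch below enumerates all patterns that respect a set of implications $x \to y$ (each corresponding to a PCU $xy'$). It then checks closure under union and intersection and the one-step condition described above. The item names, the implications and all function names are hypothetical and serve only as an illustration.

```python
from itertools import product


def consistent_patterns(items, implications):
    """Return every response pattern (set of positively answered items) that
    never contains x without y, for each implication (x, y) meaning x -> y."""
    patterns = []
    for bits in product([0, 1], repeat=len(items)):
        chosen = frozenset(i for i, b in zip(items, bits) if b)
        if all(y in chosen for x, y in implications if x in chosen):
            patterns.append(chosen)
    return patterns


def closed_under_union_and_intersection(patterns):
    family = set(patterns)
    return all(a | b in family and a & b in family
               for a in family for b in family)


def one_step_reachable(patterns):
    """Every non-empty pattern is obtained from a pattern one level lower
    by adding exactly one positive response."""
    family = set(patterns)
    return all(any(p - {i} in family for i in p) for p in family if p)


# Hypothetical scheme: b -> a and c -> a (PCUs ba' and ca').
items = ["a", "b", "c"]
implications = [("b", "a"), ("c", "a")]
patterns = consistent_patterns(items, implications)

print(len(patterns))                                  # 5
print(closed_under_union_and_intersection(patterns))  # True
print(one_step_reachable(patterns))                   # True
```

In this example the five admissible patterns are the empty pattern, {a}, {a, b}, {a, c} and {a, b, c}; the empty pattern plays the role of the all-negative state and {a, b, c} that of the all-positive state.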
The response patterns of such a scheme can be represented in a digraph whose source describes the traditional state with all negative responses and whose sink describes the integrated state with all positive responses. All other response patterns are situated between these two states. Starting from the source, the arcs connect the response patterns such that the corresponding degree of integration always increases by 1. The response patterns found on each path from source to sink form a totally ordered scale.

3.2. Obtaining a unique solution

Boolean analysis results in a set of PCUs. Looking for a well-graded structure, not all PCUs are accepted. By reconstructing the set of response patterns corresponding to the selected PCUs, possibly some response patterns of $R$ (the patterns present in the data) are treated as if they belong to $\bar{R}$ (the patterns absent from the data). The division of the set of all possible response patterns into $R$ and $\bar{R}$ is then no longer based on the presence or absence of a pattern, but on the selection of the PCUs. Such a ‘deviation’ is called a dichotomization method $\xi$. The set of response patterns obtained by applying this dichotomization method $\xi$ is called $R_\xi$. The response patterns belonging to $R_\xi$ are defined as accepted response patterns. Those patterns moving from $R$ to $\bar{R}$ are defined as rejected response patterns and belong to $\bar{R}_\xi$. In this case, the dichotomization is based on the definition of the integration process. Alternatively, dichotomization methods can also be based on mathematical criteria, for instance the frequency of appearance of the response pattern (see Flament, 1976; Van Buggenhaut and Degreef, 1987; Theuns, 1992).

Looking for PCUs of length 2 implies evaluating the relation between two indicators. Suppose two indicators $i$ and $j$, with $p$ representing the frequencies denoted by the indices (see Table 3). According to our dichotomization method, only PCUs of length 2 are accepted, so we concentrate on $p_{ij'}$ and $p_{i'j}$.
As a consequence, three possible relations between the indicators can be discerned:

• if either $p_{ij'} = 0$ or $p_{i'j} = 0$ (but not both), then a dominance relation is found;
• if both $p_{ij'} = 0$ and $p_{i'j} = 0$, then an equivalence relation is found;
• if $p_{ij'} \neq 0$ and $p_{i'j} \neq 0$, then the relation between both indicators is not specified.

Table 3
2×2 contingency table of items i and j based on the frequency of appearance of the items

           j'          j           Total
  i'       p_{i'j'}    p_{i'j}     p_{i'}
  i        p_{ij'}     p_{ij}      p_{i}
  Total    p_{j'}      p_{j}       m

If a dominance relation is found, either $i'j$ or $ij'$ is a PCU. If an equivalence relation is found, both $i'j$ and $ij'$ are PCUs. In sociology, where one works with large datasets, $p_{ij'}$ or $p_{i'j}$ is seldom 0. The reason can be twofold: a respondent can make an error when answering the interviewer, or a respondent may show a deviating attitude that is uncommon for the group he or she belongs to. Consequently, the straightforward dichotomization at zero frequency does not result in a satisfactory solution, because one rarely finds ‘pure’ dominance or equivalence relations. Therefore, an additional criterion is needed to decide whether the frequency $p_{ij'}$ or $p_{i'j}$ can be considered as 0 or not. Such a criterion is defined as the dichotomization threshold $\alpha$. The theory of integration does not provide us with such a criterion, so a mathematical criterion remains as the only possible solution.

There are two options to evaluate the relation between both indicators. One may focus on a dominance relation and decide on a fixed threshold $\alpha$ such that if either $p_{ij'} \leq \alpha$ or $p_{i'j} \leq \alpha$, the frequency that does not exceed $\alpha$ is considered as 0. A problem with this kind of rigid criterion occurs when one of the frequencies is situated just above $\alpha$ and the other one is less than $\alpha$. In that case, a dominance relation is ‘forced’, even though both frequencies are quite low. Consequently, a lot of meaningful response patterns might be rejected.
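
The zero-frequency rule and the fixed-threshold variant can be written as one small function. This is a sketch under stated assumptions: the function name `classify_fixed`, its argument names and the example frequencies are hypothetical, and a frequency is treated as 0 when it does not exceed the threshold.

```python
def classify_fixed(p_i_notj, p_noti_j, alpha=0):
    """Classify the relation between indicators i and j from the two
    off-diagonal frequencies of their 2x2 table.

    A frequency that does not exceed alpha is treated as 0; with alpha = 0
    this reduces to the 'pure' zero-frequency rule from the text.
    """
    first_is_zero = p_i_notj <= alpha
    second_is_zero = p_noti_j <= alpha
    if first_is_zero and second_is_zero:
        return "equivalence"
    if first_is_zero or second_is_zero:
        return "dominance"
    return "unspecified"


# Pure rule (alpha = 0), hypothetical frequencies.
print(classify_fixed(0, 12))   # dominance
print(classify_fixed(0, 0))    # equivalence
print(classify_fixed(3, 12))   # unspecified

# Fixed threshold: 4 is below and 6 just above alpha = 5, so a dominance
# relation is 'forced' although both frequencies are low.
print(classify_fixed(4, 6, alpha=5))  # dominance
```

The last call illustrates the weakness of a rigid threshold: two nearly equal, low frequencies end up on opposite sides of the cut-off.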
A second possibility is to decide about equivalence first, making a decision about both frequencies at the same time. In fact, one has to decide about the impact of dropping a given proportion $p_{ij'} + p_{i'j}$ of the population. Depending on this decision, $\alpha$ is calculated. The pros and cons of the first option have already been described in detail (see Flament, 1976; Van Buggenhaut and Degreef, 1987; Theuns, 1992). In this article, the second option is discussed. In general, the search for a dichotomization threshold $\alpha$ can be described as follows:

Step 1. One has to decide about equivalence or not.
• If the relation between $i$ and $j$ is declared an equivalence relation, $\alpha \geq \max\{p_{ij'}, p_{i'j}\}$ and $p_{ij'} + p_{i'j}$ is declared 0.
• If the relation between $i$ and $j$ is considered non-equivalent, neither $p_{ij'}$ nor $p_{i'j}$ is declared 0. As a result, all $\alpha < \min\{p_{ij'}, p_{i'j}\}$ are possible thresholds.

Step 2. One has to decide about dominance or not.
• If at least one equivalence relation is found, $\alpha \geq \max\{p_{ij'}, p_{i'j}\}$ and all frequency counts between 0 and $\max\{p_{ij'}, p_{i'j}\}$ are declared 0. In that case, a relation between two indicators $k$ and $l$ is considered as a dominance relation if $\min\{p_{kl'}, p_{k'l}\} \leq \max\{p_{ij'}, p_{i'j}\}$ and $\max\{p_{kl'}, p_{k'l}\} > \max\{p_{ij'}, p_{i'j}\}$, given that the relation between $k$ and $l$ is not equivalent.
• If no equivalence relation is found, a fixed $\alpha$ must be calculated by some other consideration (see below).

The motive for dichotomization is to obtain a structure that reflects the norm of the group from which the respondents are a sample. One is looking for a core set of attitudes characterizing the group, a kind of measure of central tendency. This ‘average’ does not refer to a single indicator, e.g. the response pattern with the highest frequency, but to a structure that represents our dataset.
The dichotomization method $\xi$ has to divide the dataset into a set $R_\xi$ of response patterns belonging to the norm, and a set $\bar{R}_\xi$ of response patterns that are not accepted or only manifested by a limited number of members. Thus, one has to decide whether or not the attitude marked by a possible PCU (i.e. every pair of indicators $ij'$ or $i'j$, being subpatterns of some response patterns) belongs to the norm. In a very loose group, one will find a greater variety of possible attitudes than is the case for a more coherent group. In statistical terms, the different attitudes on the traditional-integrated continuum can be conceived as an unknown distribution around the norm, with a coherent group characterized by a small standard deviation and a loose group characterized by a large standard deviation. If one wants to compare both groups, the norm must be based on a similar proportion of the population, and not on a similar set of response patterns.

The decision about this proportion can be based on the theorem of Chebyshev. This theorem allows one to determine the minimum proportion of the values that lie within a specified number of standard deviations of the mean. Chebyshev's inequality states that, if $X$ (in this case, the degree of integration) is a random variable:

$$\Pr\left[\,|X - E(X)| \geq k\,\sigma(X)\,\right] \leq \frac{1}{k^2} \qquad (1)$$

with $E(X)$ the expected value and $\sigma(X)$ the standard deviation of the random variable $X$. It is possible to rewrite Chebyshev's inequality in the following words: ‘for any set of observations (sample or population), the minimum proportion of the values that lie within $k$ standard deviations of the mean is at least $1 - 1/k^2$, where $k$ is any constant greater than 1’ (Mason and Lind, 1990, p. 135). As a result of this theorem, one knows that at least 75% of the population is situated within two standard deviations of the mean, and at least 88.9% within three standard deviations of the mean.
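
The two percentages quoted above follow directly from substituting $k = 2$ and $k = 3$ into the bound $1 - 1/k^2$. The short worked computation below only spells out this arithmetic:

```latex
\[
\left. 1 - \frac{1}{k^{2}} \right|_{k=2} = 1 - \frac{1}{4} = 0.75,
\qquad
\left. 1 - \frac{1}{k^{2}} \right|_{k=3} = 1 - \frac{1}{9} = \frac{8}{9} \approx 0.889 .
\]
```

The complements, $1/4 = 25\%$ and $1/9 \approx 11.1\%$, are the largest shares of the population that can lie outside two and three standard deviations of the mean, respectively.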
For the measurement of integration, the proportion of respondents situated within two standard deviations of the mean is defined as the group norm. As a result, this norm is based on the responses of at least 75% of the population. Because at least 88.9% of the population is situated within three standard deviations of the mean, this proportion is used to decide about equivalence or non-equivalence. If a relation between two indicators is declared an equivalence relation, one is not allowed to drop more than 11.1% of the population. As a result of this decision, the dichotomization threshold $\alpha$ is calculated as follows. Based on a comparison of all possible PCUs, one decides whether or not the relation between two indicators is equivalent:

• if $1 - (p_{ij'} + p_{i'j})/m \geq 0.889$, then the relation between the indicators $i$ and $j$ is regarded as an equivalence relation, so that the dichotomization threshold $\alpha \geq \max\{p_{ij'}, p_{i'j}\}$;
• if $1 - (p_{ij'} + p_{i'j})/m < 0.889$, then the relation between the indicators $i$ and $j$ is regarded as a non-equivalence relation, so that the dichotomization threshold $\alpha < \min\{p_{ij'}, p_{i'j}\}$;
• if $1 - (p_{ij'} + p_{i'j})/m \geq 0.750$, then the relation between the indicators $i$ and $j$ is regarded as a dominance relation if $\min\{p_{ij'}, p_{i'j}\} \leq \alpha$ and $\max\{p_{ij'}, p_{i'j}\} > \alpha$;
• if $1 - (p_{ij'} + p_{i'j})/m < 0.750$, then the relation between the indicators $i$ and $j$ cannot be specified if $\min\{p_{ij'}, p_{i'j}\} > \alpha$.

This first step determines the interval in which $\alpha$ is situated. If the dichotomization threshold $\alpha$ is based solely on the relation between two indicators, it may occur that for different pairs of indicators different respondents are dropped, such that the total number of respondents dropped exceeds 11.1% in case of equivalence or 25% in case of dominance.
If such is the case, the pair of indicators on which the decision about $\alpha$ is based will not be declared an equivalence or dominance relation, and the dichotomization threshold will be based on another pair of indicators. In other words:

$\forall i, j \in I$: $iRj$ is declared equivalent iff $1 - (p_{ij'} + p_{i'j})/m \geq 0.889$ and $\alpha$ results in a solution that holds for at least 88.9% of the respondents;

$\forall i, j \in I$: $iRj$ is declared dominant iff $1 - (p_{ij'} + p_{i'j})/m \geq 0.750$ and $\alpha$ results in a solution that holds for at least 75% of the respondents.

Still there is a problem when no equivalence relations are found. In that case, one needs to derive a maximum dichotomization threshold $\alpha$ that decides about dominance. This value can also be derived from the theorem of Chebyshev. If $\alpha$ is the maximum threshold, and one wants to obtain a solution that holds for at least 75% of the population, the theorem can be written as $1 - 2\alpha/m \geq 0.75$. This gives $2\alpha/m \leq 0.25$ and, as a consequence, $\alpha \leq m/8$. Because $m$ is known, $\alpha$ can always be calculated.

The procedure described above results in a set of PCUs that can be presented in an implication scheme. A corresponding digraph indicates the possible stages between the traditional and integrated states, while the orderings reflect the possible integration paths between both states.
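
The pairwise part of this procedure can be sketched as follows. This is one possible reading of the rules, not the author's implementation: all names and example frequencies are hypothetical, the threshold is taken as the smallest value admitted by the equivalent pairs (or the bound $m/8$ when there are none), and the global check on the total share of respondents dropped is omitted because it needs respondent-level data.

```python
EQUIVALENCE_SHARE = 0.889  # 1 - 1/3**2, rounded as in the text
DOMINANCE_SHARE = 0.750    # 1 - 1/2**2


def retained_share(p_i_notj, p_noti_j, m):
    """Share of the m respondents kept when both frequencies are dropped."""
    return 1 - (p_i_notj + p_noti_j) / m


def is_equivalent(p_i_notj, p_noti_j, m):
    return retained_share(p_i_notj, p_noti_j, m) >= EQUIVALENCE_SHARE


def choose_alpha(pairs, m):
    """pairs maps (i, j) to the frequencies (p_ij', p_i'j).

    Assumption: use the smallest alpha that covers every equivalent pair;
    if no pair is equivalent, fall back on the maximum threshold m / 8.
    """
    maxima = [max(freqs) for freqs in pairs.values() if is_equivalent(*freqs, m)]
    return max(maxima) if maxima else m / 8


def classify(p_i_notj, p_noti_j, m, alpha):
    low, high = sorted((p_i_notj, p_noti_j))
    if is_equivalent(p_i_notj, p_noti_j, m) and high <= alpha:
        return "equivalence"
    if retained_share(p_i_notj, p_noti_j, m) >= DOMINANCE_SHARE and low <= alpha < high:
        return "dominance"
    # Cases the text does not settle are reported as unspecified here.
    return "unspecified"


# Hypothetical frequencies for m = 200 respondents.
m = 200
pairs = {("a", "b"): (5, 10), ("a", "c"): (4, 40), ("b", "c"): (30, 45)}
alpha = choose_alpha(pairs, m)
print(alpha)  # 10
for pair, freqs in pairs.items():
    print(pair, classify(*freqs, m, alpha))
```

With these numbers the pair (a, b) keeps 92.5% of the respondents and is declared equivalent, (a, c) keeps 78% and is dominant, and (b, c) keeps only 62.5% and stays unspecified.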

4. Boolean analysis and the integration process: some examples