The r x c Contingency Table

5.2.2 The r x c Contingency Table

The r × c contingency table is an obvious extension of the 2 × 2 contingency table to cases where the nominal (or ordinal) variable involved has more than two categories. However, some aspects described in the previous section, namely Yates' correction and the computation of exact probabilities, are only applicable to 2 × 2 tables.

               Class 1   Class 2   . . .   Class c
Population 1   O_11      O_12      . . .   O_1c      n_1
Population 2   O_21      O_22      . . .   O_2c      n_2
. . .
Population r   O_r1      O_r2      . . .   O_rc      n_r

Figure 5.4. The r × c contingency table with the sample sizes (n_i) and the observed absolute frequencies (counts O_ij).


The r × c contingency table is shown in Figure 5.4. All samples from the r populations are assumed to be independent and randomly drawn. Each observation is assumed to fall into exactly one of the c categories. The total number of cases is:

\[ n = n_1 + n_2 + \dots + n_r = c_1 + c_2 + \dots + c_c , \]

where the c_j are the column counts, i.e., the total number of observations in the jth class:

\[ c_j = \sum_{i=1}^{r} O_{ij} . \]

Let p_ij denote the probability that a randomly selected case of population i is from class j. The hypotheses formalised for the r × c contingency table are a generalisation of the two-sided hypotheses for the 2 × 2 contingency table (see 5.2.1):

H_0: For any class, the probabilities are the same for all populations: p_1j = p_2j = … = p_rj, ∀j.

H_1: There are at least two populations with different probabilities in one class: ∃ i, k, j with i ≠ k such that p_ij ≠ p_kj.

The test statistic is also a generalisation of 5.18:

\[ T = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad \text{with } E_{ij} = \frac{n_i c_j}{n}. \tag{5.23} \]
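As a minimal sketch of formula 5.23, the statistic can be computed in plain Python. The sketch assumes the table is stored as a list of rows and that the expected counts are estimated from the marginal totals as E_ij = n_i c_j / n. The function names and the 2 × 3 counts are illustrative only and do not come from the text.

```python
def expected_counts(observed):
    """Estimate E_ij = n_i * c_j / n from the row and column totals."""
    row_totals = [sum(row) for row in observed]          # n_i
    col_totals = [sum(col) for col in zip(*observed)]    # c_j
    n = sum(row_totals)
    return [[n_i * c_j / n for c_j in col_totals] for n_i in row_totals]


def chi_square_statistic(observed):
    """Sum (O_ij - E_ij)^2 / E_ij over all cells of the table."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )


# Hypothetical 2 x 3 table (two populations, three classes).
table = [[10, 20, 30],
         [20, 20, 20]]

t = chi_square_statistic(table)
df = (len(table) - 1) * (len(table[0]) - 1)   # (r - 1)(c - 1)
print(t, df)
```

Keeping the expected counts in a separate function makes it easy to inspect them before trusting the chi-square approximation.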

If H_0 is true, we expect the observed counts O_ij to be near the expected counts E_ij, estimated as in formula 5.23 above, using the row and column marginal counts. The asymptotic distribution of T is the chi-square distribution with df = (r − 1)(c − 1) degrees of freedom. As with the chi-square goodness of fit test described in section 5.1.3, the approximation is considered acceptable if the following conditions are met:

i. For df = 1, i.e. for 2 × 2 contingency tables, no E_ij may be smaller than 5;

ii. For df > 1, no E_ij may be smaller than 1 and no more than 20% of the E_ij may be smaller than 5.
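The two conditions above can be checked mechanically. The following is a rough sketch, not a library routine: the function name is hypothetical, the first set of expected counts is the one printed for the SEX × Q7_12 table later in this section, and the 2 × 2 values are made up for illustration.

```python
def approximation_acceptable(expected, df):
    """Apply conditions i and ii to a table of expected counts E_ij."""
    cells = [e for row in expected for e in row]
    if df == 1:
        # Condition i: 2 x 2 tables need every expected count >= 5.
        return all(e >= 5 for e in cells)
    # Condition ii: no cell below 1, at most 20% of the cells below 5.
    below_five = sum(1 for e in cells if e < 5)
    return min(cells) >= 1 and below_five <= 0.20 * len(cells)


# Expected counts of the 2 x 4 SEX x Q7_12 table (df = 3).
expected_sex_q7_12 = [[14.0, 36.8, 30.9, 13.3],
                      [5.0, 13.2, 11.1, 4.7]]

# Hypothetical 2 x 2 expected counts (df = 1) with one small cell.
expected_2x2 = [[3.5, 6.5],
                [10.5, 19.5]]

ok_2x4 = approximation_acceptable(expected_sex_q7_12, df=3)
ok_2x2 = approximation_acceptable(expected_2x2, df=1)
print(ok_2x4, ok_2x2)
```

In the 2 × 4 case only one of the eight cells (12.5%) is below 5, so condition ii holds.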

The SPSS, STATISTICA, MATLAB and R commands for testing r × c contingency tables are indicated in Commands 5.7.

Example 5.11

Q: Consider the male and female populations of the Freshmen dataset. Based on the evidence provided by the respective samples, is it possible to conclude that male and female students behave differently with regard to participating in the “initiation” of their own will?

A: Question 7 (column Q7) of the Freshmen dataset addresses the issue of participating in the initiation of one's own will. The 2 × 5 contingency table, using variables SEX and Q7, has more than 20% of the cells with expected counts below 5 because of the small number of cases ranked 1 and 2. We therefore create a new variable Q7_12 where ranks 1 and 2 are merged into a new rank, coded 12.
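The merging of ranks 1 and 2 can be sketched as a simple recoding step followed by a cross-tabulation. The text performs this step with the SPSS recode command; the Python below is only an illustration, and the handful of (SEX, Q7) answers are hypothetical, not taken from the Freshmen dataset.

```python
from collections import Counter


def merge_ranks(q7):
    """Map ranks 1 and 2 to the merged code 12; leave other ranks as they are."""
    return 12 if q7 in (1, 2) else q7


# Hypothetical (SEX, Q7) answers, for illustration only.
answers = [("male", 1), ("male", 4), ("female", 2), ("female", 5), ("male", 3)]

# Cross-tabulate SEX against the recoded variable Q7_12.
counts = Counter((sex, merge_ranks(q7)) for sex, q7 in answers)
print(counts)
```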

The contingency table for the variables SEX and Q7_12 is shown in Table 5.12. The chi-square value for this table has an observed significance of p = 0.15; therefore, we do not reject the null hypothesis of equal behaviour of male and female students at the 5% level.
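The reported significance can be cross-checked from the observed counts of the SEX × Q7_12 table (male: 18, 36, 29, 12; female: 1, 14, 13, 6). The sketch below uses only the Python standard library. The upper-tail probability of the chi-square distribution is evaluated with the closed-form expressions for integer degrees of freedom, which is a choice made here and not the method used by SPSS. All function names are hypothetical.

```python
import math


def chi_square_statistic(observed):
    """T = sum of (O_ij - E_ij)^2 / E_ij, with E_ij = n_i * c_j / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for row, n_i in zip(observed, row_totals):
        for o, c_j in zip(row, col_totals):
            e = n_i * c_j / n
            total += (o - e) ** 2 / e
    return total


def chi2_upper_tail(x, df):
    """P(X > x) for a chi-square variable with integer df and x >= 0."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        return math.exp(-half) * sum(
            half ** k / math.factorial(k) for k in range(df // 2)
        )
    # Odd df: normal tail (df = 1) plus a finite correction sum.
    series = sum(
        half ** (k - 0.5) / math.gamma(k + 0.5)
        for k in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series


observed = [[18, 36, 29, 12],   # male
            [1, 14, 13, 6]]     # female

t = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p = chi2_upper_tail(t, df)
print(round(t, 2), df, round(p, 2))
```

With three degrees of freedom the statistic is about 5.33 and the upper-tail probability rounds to 0.15, in agreement with the value quoted above.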

Since one of the variables, SEX, is nominal, we can determine the association measures suitable for nominal variables, as we did in section 2.3.6. In this example the phi and uncertainty coefficients both have significances (0.15 and 0.08, respectively) that do not support the rejection of the null hypothesis (no association between the variables) at the 5% level.

Table 5.12. Contingency table obtained with SPSS for the SEX and Q7_12 variables of the freshmen dataset. Q7_12 is created with the SPSS recode command, using Q7. Note that three missing cases are not included.

                               Q7_12
                               3      4      5      12     Total
SEX     male     Count         18     36     29     12     95
                 Expected Count 14.0  36.8   30.9   13.3   95.0
        female   Count         1      14     13     6      34
                 Expected Count 5.0   13.2   11.1   4.7    34.0
Total            Count         19     50     42     18     129
                 Expected Count