Example 14.1 If we focus on two different characteristics of an organism, each controlled by a single gene, and cross a pure strain having genotype AABB with a pure strain having genotype aabb (capital letters denoting dominant alleles and small letters recessive alleles), the resulting genotype will be AaBb. If these first-generation organisms are then crossed among themselves (a dihybrid cross), there will be four phenotypes depending on whether a dominant allele of either type is present. Mendel's laws of inheritance imply that these four phenotypes should have probabilities 9/16, 3/16, 3/16, and 1/16 of arising in any given dihybrid cross.

The article "Linkage Studies of the Tomato" (Trans. Royal Canadian Institute, 1931: 1–19) reports the following data on phenotypes from a dihybrid cross of tall cut-leaf tomatoes with dwarf potato-leaf tomatoes. There are $k = 4$ categories corresponding to the four possible phenotypes, with the null hypothesis being

$$H_0\colon\ p_1 = \frac{9}{16},\quad p_2 = \frac{3}{16},\quad p_3 = \frac{3}{16},\quad p_4 = \frac{1}{16}$$

The expected cell counts are $9n/16$, $3n/16$, $3n/16$, and $n/16$, and the test is based on $k - 1 = 3$ df. The total sample size was $n = 1611$. Observed and expected counts are given in Table 14.2.
Table 14.2 Observed and Expected Cell Counts for Example 14.1

i   Phenotype             $n_i$   $np_{i0}$
1   Tall, cut leaf         926     906.2
2   Tall, potato leaf      288     302.1
3   Dwarf, cut leaf        293     302.1
4   Dwarf, potato leaf     104     100.7
CHAPTER 14 Goodness-of-Fit Tests and Categorical Data Analysis
The contribution to $\chi^2$ from the first cell is

$$\frac{(n_1 - np_{10})^2}{np_{10}} = \frac{(926 - 906.2)^2}{906.2} = .433$$

Cells 2, 3, and 4 contribute .658, .274, and .108, respectively, so $\chi^2 = .433 + .658 + .274 + .108 = 1.473$. A test with significance level .10 requires $\chi^2_{.10,3}$, the number in the 3 df row and .10 column of Appendix Table A.7. This critical value is 6.251. Since 1.473 is not at least 6.251, $H_0$ cannot be rejected even at this rather large level of significance. The data is quite consistent with Mendel's laws.
■
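As a cross-check on the arithmetic in Example 14.1, here is a minimal Python sketch (standard library only) that recomputes the expected counts, the cell contributions, and $\chi^2$ from Table 14.2. In place of Appendix Table A.7 it computes the upper-tail probability of the chi-squared distribution from the closed forms of the regularized incomplete gamma function that exist for integer df, and inverts that by bisection to get $\chi^2_{.10,3}$. The helper names `chi2_sf`, `chi2_critical`, and `chi2_gof` are hypothetical; in practice a library routine such as SciPy's `scipy.stats.chi2.sf` and `chi2.ppf` would normally be used. Because the sketch does not round the expected counts to one decimal before squaring, it gives $\chi^2 \approx 1.469$ rather than the 1.473 obtained above.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-squared with integer df >= 1.

    Uses closed forms of the regularized upper incomplete gamma function
    Q(df/2, x/2), which exist when df/2 is an integer or half-integer.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q = exp(-y) * sum_{j=0}^{df/2 - 1} y**j / j!
        return math.exp(-y) * sum(y**j / math.factorial(j) for j in range(df // 2))
    # Odd df: Q = erfc(sqrt(y)) + exp(-y) * sum_{j=0}^{(df-3)/2} y**(j + 1/2) / Gamma(j + 3/2)
    tail = sum(y ** (j + 0.5) / math.gamma(j + 1.5) for j in range((df - 1) // 2))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def chi2_critical(alpha: float, df: int) -> float:
    """Find x with P(X > x) = alpha by bisection (chi2_sf decreases in x)."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid  # tail area still too large: critical value lies above mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def chi2_gof(observed, null_probs):
    """Return expected counts, per-cell contributions, and the chi-squared statistic."""
    n = sum(observed)
    expected = [n * p for p in null_probs]
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return expected, contributions, sum(contributions)


# Table 14.2 counts and the Mendelian null probabilities 9/16, 3/16, 3/16, 1/16
observed = [926, 288, 293, 104]
null_probs = [9 / 16, 3 / 16, 3 / 16, 1 / 16]

expected, contributions, stat = chi2_gof(observed, null_probs)
df = len(observed) - 1
crit = chi2_critical(0.10, df)
p_value = chi2_sf(stat, df)

print("expected:", [round(e, 1) for e in expected])
print("contributions:", [round(c, 3) for c in contributions])
print(f"chi2 = {stat:.3f}, chi2_.10,{df} = {crit:.3f}, p-value = {p_value:.3f}")
print("reject H0 at .10" if stat >= crit else "do not reject H0 at .10")
```

The bisection recovers the tabled value 6.251, and the p-value comes out near .69, far above .10, in line with the conclusion that $H_0$ cannot be rejected.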
Although we have developed the chi-squared test for situations in which $k > 2$, it can also be used when $k = 2$. The null hypothesis in this case can be stated as $H_0\colon p_1 = p_{10}$, since the relations $p_2 = 1 - p_1$ and $p_{20} = 1 - p_{10}$ make the inclusion of $p_2 = p_{20}$ in $H_0$ redundant. The alternative hypothesis is $H_a\colon p_1 \neq p_{10}$. These hypotheses can also be tested using a two-tailed z test with test statistic

$$Z = \frac{\hat{p}_1 - p_{10}}{\sqrt{p_{10}(1 - p_{10})/n}}, \qquad \hat{p}_1 = \frac{n_1}{n}$$
Surprisingly, the two test procedures are completely equivalent. This is because it can be shown that $Z^2 = \chi^2$ and $(z_{\alpha/2})^2 = \chi^2_{\alpha,1}$, so that $\chi^2 \ge \chi^2_{\alpha,1}$ if and only if (iff) $|Z| \ge z_{\alpha/2}$. If the alternative hypothesis is either $H_a\colon p_1 > p_{10}$ or $H_a\colon p_1 < p_{10}$, the chi-squared test cannot be used. One must then revert to an upper- or lower-tailed z test.
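The identity $Z^2 = \chi^2$ follows from a short piece of algebra: since $n_2 - np_{20} = (n - n_1) - n(1 - p_{10}) = -(n_1 - np_{10})$, both terms of $\chi^2$ share the numerator $(n_1 - np_{10})^2$, and $1/p_{10} + 1/(1 - p_{10}) = 1/[p_{10}(1 - p_{10})]$ collapses the sum to $(n_1 - np_{10})^2/[np_{10}(1 - p_{10})] = Z^2$. The Python sketch below checks this numerically on a few made-up data sets (the counts are hypothetical, not from the text). It also confirms that the two-tailed z p-value, $\operatorname{erfc}(|z|/\sqrt{2})$, equals the upper-tail chi-squared p-value with 1 df, $\operatorname{erfc}(\sqrt{\chi^2/2})$. The helper name `z_and_chi2` is hypothetical.

```python
import math


def z_and_chi2(n1: int, n: int, p10: float):
    """Return (Z, chi-squared) for H0: p1 = p10 with k = 2 categories."""
    p_hat = n1 / n
    z = (p_hat - p10) / math.sqrt(p10 * (1 - p10) / n)
    observed = [n1, n - n1]
    expected = [n * p10, n * (1 - p10)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return z, chi2


# Hypothetical (n1, n, p10) triples, chosen only to exercise the identity
cases = [(60, 100, 0.5), (13, 40, 0.25), (512, 1000, 0.55)]
results = []
for n1, n, p10 in cases:
    z, chi2 = z_and_chi2(n1, n, p10)
    p_two_tailed_z = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for standard normal Z
    p_chi2_1df = math.erfc(math.sqrt(chi2 / 2))  # P(X >= chi2) for X ~ chi-squared(1)
    results.append((z, chi2, p_two_tailed_z, p_chi2_1df))
    print(f"z = {z:.4f}, z^2 = {z * z:.4f}, chi2 = {chi2:.4f}, "
          f"p(z) = {p_two_tailed_z:.4f}, p(chi2) = {p_chi2_1df:.4f}")
```

Because the p-values agree exactly, rejecting when $|Z| \ge z_{\alpha/2}$ and rejecting when $\chi^2 \ge \chi^2_{\alpha,1}$ always lead to the same decision; only a one-sided alternative separates the two procedures.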
As is the case with all test procedures, one must be careful not to confuse statistical significance with practical significance. A computed $\chi^2$ that exceeds $\chi^2_{\alpha,k-1}$ may be a result of a very large sample size rather than any practical differences between the hypothesized $p_{i0}$'s and true $p_i$'s. Thus if $p_{10} = p_{20} = p_{30} = 1/3$, but the true $p_i$'s have values .330, .340, and .330, a large value of $\chi^2$ is sure to arise with a sufficiently large $n$. Before rejecting $H_0$, the $\hat{p}_i$'s should be examined to see whether they suggest a model different from that of $H_0$ from a practical point of view.
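To make the point concrete, the Python sketch below takes the idealized case in which the observed proportions equal the true values .330, .340, .330 exactly (an assumption for illustration; real samples would scatter around them). Then $\chi^2 = n\sum_i (p_i - p_{i0})^2/p_{i0} = .0002\,n$ grows linearly with $n$, while the $\alpha = .05$ critical value with 2 df stays fixed at $-2\ln(.05) \approx 5.991$, since for 2 df the upper-tail probability is exactly $e^{-x/2}$. The helper name `chi2_at_true_props` is hypothetical.

```python
import math

p_null = [1 / 3, 1 / 3, 1 / 3]  # hypothesized p_i0
p_true = [0.330, 0.340, 0.330]  # true p_i from the text


def chi2_at_true_props(n: int) -> float:
    """Chi-squared statistic when observed counts are exactly n * p_true (idealization)."""
    return sum((n * pt - n * p0) ** 2 / (n * p0) for pt, p0 in zip(p_true, p_null))


# Statistic per observation: chi2 = n * delta under the idealization above
delta = sum((pt - p0) ** 2 / p0 for pt, p0 in zip(p_true, p_null))

alpha = 0.05
crit = -2 * math.log(alpha)  # chi2_{alpha,2}; valid only for 2 df, where P(X > x) = exp(-x/2)

# Smallest n for which n * delta strictly exceeds the critical value
n_min = math.floor(crit / delta) + 1

for n in (1_000, 10_000, 100_000, 1_000_000):
    stat = chi2_at_true_props(n)
    print(f"n = {n:>9,}: chi2 = {stat:8.3f}  significant at .05: {stat > crit}")
print(f"critical value = {crit:.3f}, first significant n = {n_min:,}")
```

Under this idealization $\chi^2$ first exceeds the .05 critical value at $n = 29{,}958$, even though no $p_i$ differs from 1/3 by more than .007, which is exactly the situation in which the $\hat{p}_i$'s themselves should be inspected before $H_0$ is rejected.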