Example 14.1 If we focus on two different characteristics of an organism, each controlled by a single gene, and cross a pure strain having genotype AABB with a pure strain having genotype aabb (capital letters denoting dominant alleles and small letters recessive alleles), the resulting genotype will be AaBb. If these first-generation organisms are then crossed among themselves (a dihybrid cross), there will be four phenotypes depending on whether a dominant allele of either type is present. Mendel’s laws of inheritance imply that these four phenotypes should have probabilities $\frac{9}{16}$, $\frac{3}{16}$, $\frac{3}{16}$, and $\frac{1}{16}$ of arising in any given dihybrid cross.

  The article “Linkage Studies of the Tomato” (Trans. Royal Canadian Institute, 1931: 1–19) reports the following data on phenotypes from a dihybrid cross of tall cut-leaf tomatoes with dwarf potato-leaf tomatoes. There are $k = 4$ categories corresponding to the four possible phenotypes, with the null hypothesis being

  $$H_0\colon p_1 = \tfrac{9}{16},\quad p_2 = \tfrac{3}{16},\quad p_3 = \tfrac{3}{16},\quad p_4 = \tfrac{1}{16}$$

  The expected cell counts are $9n/16$, $3n/16$, $3n/16$, and $n/16$, and the test is based on $k - 1 = 3$ df. The total sample size was $n = 1611$. Observed and expected counts are given in Table 14.2.

  Table 14.2 Observed and Expected Cell Counts for Example 14.1

                 Tall,       Tall,          Dwarf,      Dwarf,
                 cut leaf    potato leaf    cut leaf    potato leaf

  $n_i$          926         288            293         104

  $np_{i0}$      906.2       302.1          302.1       100.7
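  The expected counts follow directly from $n$ and the hypothesized probabilities. A minimal Python sketch of that arithmetic, using only the values stated in the example (the variable names are illustrative):

```python
from fractions import Fraction

# Hypothesized Mendelian probabilities under H0 (from the example).
null_probs = [Fraction(9, 16), Fraction(3, 16), Fraction(3, 16), Fraction(1, 16)]
n = 1611  # total sample size reported in the example

# The expected count for category i is n * p_i0.
expected = [float(n * p) for p in null_probs]

for p, e in zip(null_probs, expected):
    print(f"p_i0 = {p}: expected count = {e:.4f} (about {e:.1f})")
```

  Exact fractions are used for the probabilities so that the check that they sum to 1 is not affected by floating-point rounding.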

  CHAPTER 14 Goodness-of-Fit Tests and Categorical Data Analysis

  The contribution to $\chi^2$ from the first cell is

  $$\frac{(n_1 - np_{10})^2}{np_{10}} = \frac{(926 - 906.2)^2}{906.2} = .433$$

  Cells 2, 3, and 4 contribute .658, .274, and .108, respectively, so $\chi^2 = .433 + .658 + .274 + .108 = 1.473$. A test with significance level .10 requires $\chi^2_{.10,3}$, the number in the 3 df row and .10 column of Appendix Table A.7. This critical value is 6.251. Since 1.473 is not at least 6.251, $H_0$ cannot be rejected even at this rather large level of significance. The data is quite consistent with Mendel’s laws.

  ■
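  The calculation in Example 14.1 can be retraced numerically. The sketch below is illustrative only: the observed counts are assumed to be 926, 288, 293, and 104 (values consistent with $n = 1611$ and the cell contributions quoted in the example), the critical value 6.251 is copied from the text rather than computed, and the function name is hypothetical. Expected counts are rounded to one decimal place, as in the example, to reproduce 1.473; with unrounded expected counts the statistic is about 1.469 and the conclusion is unchanged.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# ASSUMPTION: observed counts for the four phenotypes, chosen to be
# consistent with n = 1611 and the contributions quoted in the example.
observed = [926, 288, 293, 104]
null_probs = [9 / 16, 3 / 16, 3 / 16, 1 / 16]  # Mendelian probabilities under H0
n = sum(observed)

exact_expected = [n * p for p in null_probs]
rounded_expected = [round(e, 1) for e in exact_expected]  # as tabulated

stat_rounded = chi_squared_statistic(observed, rounded_expected)
stat_exact = chi_squared_statistic(observed, exact_expected)

critical_value = 6.251  # chi-squared critical value for alpha = .10, 3 df (from the text)
reject_h0 = stat_rounded >= critical_value

print(f"chi-squared (rounded expected counts): {stat_rounded:.3f}")
print(f"chi-squared (exact expected counts):   {stat_exact:.3f}")
print(f"reject H0 at level .10: {reject_h0}")
```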

  Although we have developed the chi-squared test for situations in which $k > 2$, it can also be used when $k = 2$. The null hypothesis in this case can be stated as $H_0\colon p_1 = p_{10}$, since the relations $p_2 = 1 - p_1$ and $p_{20} = 1 - p_{10}$ make the inclusion of $p_2 = p_{20}$ in $H_0$ redundant. The alternative hypothesis is $H_a\colon p_1 \neq p_{10}$. These hypotheses can also be tested using a two-tailed z test with test statistic

  $$Z = \frac{\hat{p}_1 - p_{10}}{\sqrt{p_{10}(1 - p_{10})/n}}$$

  where $\hat{p}_1$ is the sample proportion of observations falling in category 1.

  Surprisingly, the two test procedures are completely equivalent. This is because it can be shown that $Z^2 = \chi^2$ and $(z_{\alpha/2})^2 = \chi^2_{\alpha,1}$, so that $\chi^2 \geq \chi^2_{\alpha,1}$ if and only if (iff) $|Z| \geq z_{\alpha/2}$. If the alternative hypothesis is either $H_a\colon p_1 > p_{10}$ or $H_a\colon p_1 < p_{10}$, the chi-squared test cannot be used. One must then revert to an upper- or lower-tailed z test.
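  The equivalence for $k = 2$ can be checked numerically. The following Python sketch uses made-up illustrative values (a sample of 100 with 40 observations in category 1 and $p_{10} = .5$), which are not from the text. It compares $Z^2$ with $\chi^2$, and $(z_{\alpha/2})^2$ with the 1 df chi-squared critical value, taking 3.841 as the standard tabled value for $\alpha = .05$.

```python
import math
from statistics import NormalDist

# ASSUMPTION: illustrative values, not data from the text.
n = 100        # sample size
n1 = 40        # observed count in category 1
p10 = 0.5      # hypothesized probability of category 1
p20 = 1 - p10  # hypothesized probability of category 2
n2 = n - n1    # observed count in category 2

# Two-tailed z statistic for H0: p1 = p10.
z = (n1 / n - p10) / math.sqrt(p10 * (1 - p10) / n)

# Chi-squared statistic with k = 2 cells. Because n2 - n*p20 = -(n1 - n*p10),
# the two terms combine to (n1 - n*p10)^2 / (n * p10 * p20), which is z^2.
chi2 = (n1 - n * p10) ** 2 / (n * p10) + (n2 - n * p20) ** 2 / (n * p20)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96
chi2_crit_tabled = 3.841                      # tabled chi-squared value, alpha = .05, 1 df

print(f"z^2 = {z ** 2:.4f}, chi-squared = {chi2:.4f}")
print(f"(z_crit)^2 = {z_crit ** 2:.4f}, tabled chi-squared critical value = {chi2_crit_tabled}")
print(f"z test rejects: {abs(z) >= z_crit}, chi-squared test rejects: {chi2 >= chi2_crit_tabled}")
```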

  As is the case with all test procedures, one must be careful not to confuse statistical significance with practical significance. A computed $\chi^2$ that exceeds $\chi^2_{\alpha,k-1}$ may be a result of a very large sample size rather than any practical differences between the hypothesized $p_{i0}$’s and true $p_i$’s. Thus if $p_{10} = p_{20} = p_{30} = \frac{1}{3}$, but the true $p_i$’s have values .330, .340, and .330, a large value of $\chi^2$ is sure to arise with a sufficiently large $n$. Before rejecting $H_0$, the $\hat{p}_i$’s should be examined to see whether they suggest a model different from that of $H_0$ from a practical point of view.
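  To see how sample size alone can drive the statistic, the sketch below evaluates $\chi^2$ in an idealized setting where each observed count equals $n$ times its true probability, ignoring sampling variability (a simplifying assumption, not part of the text). It uses the true values .330, .340, .330 and the hypothesized value 1/3 from the paragraph above. The significance level .05 and the function name are illustrative choices. For 2 df the upper-tail probability of the chi-squared distribution is $e^{-x/2}$, so the critical value is $-2\ln\alpha$.

```python
import math

true_probs = [0.330, 0.340, 0.330]  # true cell probabilities (from the text)
null_probs = [1 / 3, 1 / 3, 1 / 3]  # hypothesized probabilities under H0

def idealized_chi2(n):
    """Chi-squared statistic when every observed count equals n * (true p_i)."""
    return sum((n * p - n * p0) ** 2 / (n * p0)
               for p, p0 in zip(true_probs, null_probs))

alpha = 0.05                           # ASSUMPTION: illustrative significance level
critical_value = -2 * math.log(alpha)  # exact for 2 df, about 5.99

for n in (1_000, 10_000, 100_000):
    stat = idealized_chi2(n)
    print(f"n = {n:>7}: chi-squared = {stat:.2f}, "
          f"exceeds critical value: {stat >= critical_value}")
```

  In this idealized calculation the statistic equals $.0002n$, so it grows in proportion to $n$ even though no true probability differs from 1/3 by more than about .0067.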