$\chi^2$ When Parameters Are Estimated
As before, $k$ will denote the number of categories or cells, and $p_i$ will denote the probability of an observation falling in the $i$th cell. The null hypothesis now states that each $p_i$ is a function of a small number of parameters $\theta_1, \ldots, \theta_m$, with the $\theta_i$'s otherwise unspecified:
$$H_0: p_1 = p_1(\theta), \ldots, p_k = p_k(\theta) \quad \text{where } \theta = (\theta_1, \ldots, \theta_m)$$
$$H_a: \text{the hypothesis } H_0 \text{ is not true} \qquad (14.2)$$
For example, for $H_0$ of (14.1), $m = 1$ (there is only one $\theta$), $p_1(\theta) = \theta^2$, $p_2(\theta) = 2\theta(1 - \theta)$, and $p_3(\theta) = (1 - \theta)^2$.
In the case $k = 2$, there is really only a single rv, $N_1$ (since $N_1 + N_2 = n$), which has a binomial distribution. The joint probability that $N_1 = n_1$ and $N_2 = n_2$ is then
$$P(N_1 = n_1, N_2 = n_2) = \binom{n}{n_1} p_1^{n_1} p_2^{n_2} \propto p_1^{n_1} \cdot p_2^{n_2}$$
where $p_1 + p_2 = 1$ and $n_1 + n_2 = n$. For general $k$, the joint distribution of $N_1, \ldots, N_k$ is the multinomial distribution (Section 5.1) with
$$P(N_1 = n_1, \ldots, N_k = n_k) \propto p_1^{n_1} \cdot p_2^{n_2} \cdots p_k^{n_k} \qquad (14.3)$$
When $H_0$ is true, (14.3) becomes
$$P(N_1 = n_1, \ldots, N_k = n_k) \propto [p_1(\theta)]^{n_1} \cdots [p_k(\theta)]^{n_k} \qquad (14.4)$$
To apply a chi-squared test, $\theta = (\theta_1, \ldots, \theta_m)$ must be estimated.
Method of Estimation
Let $n_1, n_2, \ldots, n_k$ denote the observed values of $N_1, \ldots, N_k$. Then $\hat\theta_1, \ldots, \hat\theta_m$ are those values of the $\theta_i$'s that maximize (14.4).
The resulting estimators $\hat\theta_1, \ldots, \hat\theta_m$ are the maximum likelihood estimators of $\theta_1, \ldots, \theta_m$; this principle of estimation was discussed in Section 6.2.
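The logarithm is increasing, so maximizing (14.4) is equivalent to maximizing its log. Assume each $p_i(\theta)$ is differentiable and the maximum falls in the interior of the parameter space. These are assumptions added here, not stated above. Under them, $\hat\theta_1, \ldots, \hat\theta_m$ solve the $m$ likelihood equations sketched below. The constant is the log of the multinomial coefficient, which does not involve $\theta$.

```latex
\ell(\theta) = \sum_{i=1}^{k} n_i \ln p_i(\theta) + \text{const},
\qquad
\frac{\partial \ell}{\partial \theta_j}
  = \sum_{i=1}^{k} \frac{n_i}{p_i(\theta)} \, \frac{\partial p_i(\theta)}{\partial \theta_j} = 0,
\quad j = 1, \ldots, m
```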
14.2 Goodness-of-Fit tests for Composite hypotheses 629
Example 14.5
In humans there is a blood group, the MN group, that is composed of individuals having one of the three blood types M, MN, and N. Type is determined by two alleles, and there is no dominance, so the three possible genotypes give rise to three phenotypes. A population consisting of individuals in the MN group is in equilibrium if
$$P(M) = p_1 = \theta^2 \qquad P(MN) = p_2 = 2\theta(1 - \theta) \qquad P(N) = p_3 = (1 - \theta)^2$$
for some $\theta$. Suppose a sample from such a population yielded the results shown in Table 14.4.
Table 14.4 Observed Counts for Example 14.5
Type        M      MN     N
Observed    125    225    150    n = 500
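Then
$$[p_1(\theta)]^{n_1}[p_2(\theta)]^{n_2}[p_3(\theta)]^{n_3} = [\theta^2]^{n_1}[2\theta(1 - \theta)]^{n_2}[(1 - \theta)^2]^{n_3} = 2^{n_2} \cdot \theta^{2n_1 + n_2} \cdot (1 - \theta)^{n_2 + 2n_3}$$
Maximizing this with respect to $\theta$ is easiest through its natural logarithm, which is easier to differentiate and is maximized at the same point. Setting the derivative of the logarithm, $(2n_1 + n_2)/\theta - (n_2 + 2n_3)/(1 - \theta)$, equal to zero yields
$$\hat\theta = \frac{2n_1 + n_2}{(2n_1 + n_2) + (n_2 + 2n_3)} = \frac{2n_1 + n_2}{2n}$$
With $n_1 = 125$ and $n_2 = 225$, $\hat\theta = 475/1000 = .475$.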
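As a quick check of the closed form, the following Python sketch maximizes the log of (14.4) numerically and compares the result with $(2n_1 + n_2)/(2n)$. The helper names `mn_log_likelihood` and `golden_section_max` are hypothetical. Golden-section search is just one convenient way to maximize a one-parameter concave log-likelihood; it is not a method taken from the text.

```python
import math


def mn_log_likelihood(theta, n1, n2, n3):
    """Log of (14.4) for the MN model, omitting the constant n2 * ln 2."""
    return (2 * n1 + n2) * math.log(theta) + (n2 + 2 * n3) * math.log(1 - theta)


def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2


n1, n2, n3 = 125, 225, 150  # observed counts from Table 14.4
n = n1 + n2 + n3
theta_closed = (2 * n1 + n2) / (2 * n)  # closed-form MLE
theta_numeric = golden_section_max(
    lambda t: mn_log_likelihood(t, n1, n2, n3), 1e-9, 1 - 1e-9
)
print(theta_closed, round(theta_numeric, 4))
```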
Once $\theta = (\theta_1, \ldots, \theta_m)$ has been estimated by $\hat\theta = (\hat\theta_1, \ldots, \hat\theta_m)$, the estimated expected cell counts are the $np_i(\hat\theta)$'s. These are now used in place of the $np_{i0}$'s of Section 14.1 to specify a $\chi^2$ statistic.
Theorem Under general “regularity” conditions on $\theta_1, \ldots, \theta_m$ and the $p_i(\theta)$'s, if $\theta_1, \ldots, \theta_m$ are estimated by the method of maximum likelihood as described previously and $n$ is large, then
$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed} - \text{estimated expected})^2}{\text{estimated expected}} = \sum_{i=1}^{k} \frac{[N_i - np_i(\hat\theta)]^2}{np_i(\hat\theta)}$$
has approximately a chi-squared distribution with $k - 1 - m$ df when $H_0$ of (14.2) is true. The P-value is therefore (roughly) the area under the $\chi^2_{k-1-m}$ curve to the right of the calculated $\chi^2$. In practice, the test can be used if $np_i(\hat\theta) \ge 5$ for every $i$. Notice that the number of degrees of freedom is reduced by the number of $\theta_i$'s estimated.
Example 14.6 (Example 14.5 continued)
With $\hat\theta = .475$ and $n = 500$, the estimated expected cell counts are $np_1(\hat\theta) = 500(\hat\theta)^2 = 112.81$, $np_2(\hat\theta) = (500)(2)(.475)(.525) = 249.38$, and $np_3(\hat\theta) = 500 - 112.81 - 249.38 = 137.81$. Then
$$\chi^2 = \frac{(125 - 112.81)^2}{112.81} + \frac{(225 - 249.38)^2}{249.38} + \frac{(150 - 137.81)^2}{137.81} = 4.78$$
630 Chapter 14 Goodness-of-Fit tests and Categorical Data analysis
Appendix Table A.11 shows that for df $= 3 - 1 - 1 = 1$, P-value $\approx .029$. Therefore $H_0$ is rejected at significance level .05 (but not at level .01).
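The following is a minimal Python sketch of the statistic in the theorem, applied to Example 14.6. The function name `chi_squared_gof` is hypothetical. It computes $\sum (o - e)^2/e$, sets df $= k - 1 - m$, and enforces the $np_i(\hat\theta) \ge 5$ rule of thumb. For 1 df the upper-tail area has the exact form $P(Z^2 > x) = \operatorname{erfc}(\sqrt{x/2})$, so neither a table nor a statistics library is needed.

```python
import math


def chi_squared_gof(observed, expected, m):
    """Return (chi-squared statistic, df) for k cells with m estimated parameters."""
    if any(e < 5 for e in expected):
        raise ValueError("rule of thumb violated: an estimated expected count is below 5")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - m
    return stat, df


def chi2_sf_1df(x):
    """Upper-tail area of the chi-squared curve with 1 df: P(Z^2 > x) = erfc(sqrt(x/2))."""
    return math.erfc(math.sqrt(x / 2))


observed = [125, 225, 150]  # Table 14.4
n = sum(observed)
theta_hat = 0.475
probs = [theta_hat**2, 2 * theta_hat * (1 - theta_hat), (1 - theta_hat) ** 2]
expected = [n * p for p in probs]  # estimated expected counts n * p_i(theta_hat)
stat, df = chi_squared_gof(observed, expected, m=1)
p_value = chi2_sf_1df(stat)
print(round(stat, 2), df, round(p_value, 3))
```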
Example 14.7
Consider a series of games between two teams, I and II, that terminates as soon as one team has won four games (with no possibility of a tie). A simple probability model for such a series assumes that outcomes of successive games are independent and that the probability of team I winning any particular game is a constant $\theta$. We arbitrarily designate I the better team, so that $\theta \ge .5$. Any particular series can then terminate after 4, 5, 6, or 7 games. Let $p_1(\theta), p_2(\theta), p_3(\theta), p_4(\theta)$ denote the probability of termination in 4, 5, 6, and 7 games, respectively. Then
$$\begin{aligned}
p_1(\theta) &= P(\text{I wins in 4 games}) + P(\text{II wins in 4 games}) = \theta^4 + (1 - \theta)^4 \\
p_2(\theta) &= P(\text{I wins 3 of the first 4 and the fifth}) + P(\text{I loses 3 of the first 4 and the fifth}) \\
&= \binom{4}{3}\theta^3(1 - \theta) \cdot \theta + \binom{4}{1}\theta(1 - \theta)^3 \cdot (1 - \theta) \\
&= 4\theta(1 - \theta)[\theta^3 + (1 - \theta)^3] \\
p_3(\theta) &= 10\theta^2(1 - \theta)^2[\theta^2 + (1 - \theta)^2] \\
p_4(\theta) &= 20\theta^3(1 - \theta)^3
\end{aligned}$$
The article “Seven-Game Series in Sports” by Groeneveld and Meeden (Mathematics Magazine, 1975: 187–192) tested the fit of this model to results of National Hockey League playoffs during the period 1943–1967 (when league membership was stable). The data appears in Table 14.5.
Table 14.5 Observed and Expected Counts for the Simple Model
Cell                            1      2      3      4
Number of games played          4      5      6      7
Observed frequency              15     26     24     18     n = 83
Estimated expected frequency
The estimated expected cell counts are $83p_i(\hat\theta)$, where $\hat\theta$ is the value of $\theta$ that maximizes
$$\{\theta^4 + (1 - \theta)^4\}^{15} \cdot \{4\theta(1 - \theta)[\theta^3 + (1 - \theta)^3]\}^{26} \cdot \{10\theta^2(1 - \theta)^2[\theta^2 + (1 - \theta)^2]\}^{24} \cdot \{20\theta^3(1 - \theta)^3\}^{18} \qquad (14.5)$$
Standard calculus methods fail to yield a nice formula for the maximizing value $\hat\theta$, so it must be computed using numerical methods. The result is $\hat\theta = .654$, from which the $p_i(\hat\theta)$'s and the estimated expected cell counts are computed. The computed value of $\chi^2$ is .360. According to the $k - 1 - m = 4 - 1 - 1 = 2$ df column of Table A.11, P-value $> .10$. There is thus no reason to reject the simple model as applied to the NHL playoff series.
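The numerical maximization of (14.5) can be sketched in Python. This sketch assumes golden-section search on the log-likelihood; the text does not say which numerical method was used, and the helper names are hypothetical. The search is restricted to $\theta \ge .5$ because every $p_i(\theta)$ is unchanged when $\theta$ is replaced by $1 - \theta$. For 2 df the chi-squared upper-tail area is exactly $e^{-\chi^2/2}$. Results computed without intermediate rounding should land close to, but not necessarily exactly on, the reported .654 and .360.

```python
import math


def series_probs(theta):
    """p1..p4: the series ends in 4, 5, 6, or 7 games under the simple model."""
    t, s = theta, 1 - theta
    return [
        t**4 + s**4,
        4 * t * s * (t**3 + s**3),
        10 * t**2 * s**2 * (t**2 + s**2),
        20 * t**3 * s**3,
    ]


def log_lik(theta, counts):
    """Natural logarithm of expression (14.5)."""
    return sum(c * math.log(p) for c, p in zip(counts, series_probs(theta)))


def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return (a + b) / 2


counts = [15, 26, 24, 18]  # Table 14.5, n = 83
n = sum(counts)
theta_hat = golden_section_max(lambda t: log_lik(t, counts), 0.5, 1 - 1e-9)
expected = [n * p for p in series_probs(theta_hat)]
chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
p_value = math.exp(-chi2 / 2)  # exact upper tail for 2 df
print(round(theta_hat, 3), round(chi2, 3), round(p_value, 2))
```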
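The cited article also considered World Series data for the period 1903–1973. For the simple model, $\chi^2 = 5.97$; Table A.11 yields P-value $\approx .05$. At significance level .10, the model is of doubtful validity. The suggested reason for this is that the simple model implies
$$P(\text{series lasts six games} \mid \text{series lasts at least six games}) \ge .5 \qquad (14.6)$$
whereas of the 38 series that actually lasted at least six games, only 13 lasted exactly six. The following alternative model is then introduced:
$$p_1(\theta_1, \theta_2) = p_1(\theta_1) \qquad p_2(\theta_1, \theta_2) = p_2(\theta_1)$$
$$p_3(\theta_1, \theta_2) = [1 - p_1(\theta_1) - p_2(\theta_1)]\,\theta_2 \qquad p_4(\theta_1, \theta_2) = [1 - p_1(\theta_1) - p_2(\theta_1)](1 - \theta_2)$$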
The first two $p_i$'s are identical to the simple model, whereas $\theta_2$ is the conditional probability of (14.6), which can now be any number between 0 and 1. The values of $\hat\theta_1$ and $\hat\theta_2$ that maximize the expression analogous to expression (14.5) are determined numerically as $\hat\theta_1 = .614$, $\hat\theta_2 = .342$. A summary appears in Table 14.6, and $\chi^2 = .384$. Since two parameters are estimated, df $= k - 1 - m = 4 - 1 - 2 = 1$. The P-value considerably exceeds .10, indicating a good fit of the data to this new model.
Table 14.6 Observed and Expected Counts for the More Complex Model
Number of games played          4        5        6        7
Observed frequency
Estimated expected frequency    10.85    18.08    12.68    24.39
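The entries of Table 14.6 can be reproduced from the reported estimates with a short Python sketch. Two inputs are inferred rather than quoted. First, $n = 66$, since the four estimated expected counts sum to 66. Second, the form of the complex model, in which $p_3$ and $p_4$ split the probability of at least six games in the ratio $\theta_2 : (1 - \theta_2)$. The sketch also does three checks:

- It confirms that the simple model forces $P(\text{six} \mid \text{at least six}) = \theta^2 + (1 - \theta)^2 \ge .5$.
- It compares that bound with the observed proportion 13/38, which agrees with $\hat\theta_2 = .342$.
- It evaluates the exact upper-tail areas for $\chi^2 = 5.97$ on 2 df and $\chi^2 = .384$ on 1 df.

```python
import math

theta1_hat, theta2_hat = 0.614, 0.342  # estimates reported in the text
n = 66  # the four estimated expected counts in Table 14.6 sum to 66

t, s = theta1_hat, 1 - theta1_hat
p1 = t**4 + s**4                     # unchanged from the simple model
p2 = 4 * t * s * (t**3 + s**3)       # unchanged from the simple model
at_least_six = 1 - p1 - p2           # P(series lasts at least six games)
p3 = at_least_six * theta2_hat       # theta2 = P(six games | at least six)
p4 = at_least_six * (1 - theta2_hat)
expected = [n * p for p in (p1, p2, p3, p4)]
print([round(e, 2) for e in expected])

# Simple model: P(six | at least six) = theta^2 + (1 - theta)^2, never below .5.
cond_simple = [th**2 + (1 - th) ** 2 for th in (0.5, 0.6, 0.7, 0.8)]
observed_cond = 13 / 38                      # 13 of the 38 series that reached game six
p_simple = math.exp(-5.97 / 2)               # exact upper tail, 2 df (World Series, simple model)
p_complex = math.erfc(math.sqrt(0.384 / 2))  # exact upper tail, 1 df (complex model)
print(round(observed_cond, 3), round(p_simple, 3), round(p_complex, 2))
```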
One of the conditions on the $\theta_i$'s in the theorem is that they be functionally independent of one another. That is, no single $\theta_i$ can be determined from the values of the other $\theta_i$'s, so that $m$ is the number of functionally independent parameters estimated.
A general rule of thumb for degrees of freedom in a chi-squared test is the following.
$$\chi^2 \text{ df} = \begin{pmatrix}\text{number of freely}\\ \text{determined cell counts}\end{pmatrix} - \begin{pmatrix}\text{number of independent}\\ \text{parameters estimated}\end{pmatrix}$$
This rule will be used in connection with several different chi-squared tests in the next section.
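The $k$ cell counts must add to $n$, so only $k - 1$ of them are freely determined, and the rule gives $k - 1 - m$. Applied to the earlier examples:

```latex
\begin{aligned}
\text{Example 14.6 (3 cells, 1 parameter):}\quad & \text{df} = (3 - 1) - 1 = 1 \\
\text{Example 14.7, simple model (4 cells, 1 parameter):}\quad & \text{df} = (4 - 1) - 1 = 2 \\
\text{Example 14.7, complex model (4 cells, 2 parameters):}\quad & \text{df} = (4 - 1) - 2 = 1
\end{aligned}
```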
Goodness of Fit for Discrete Distributions
Many experiments involve observing a random sample $X_1, X_2, \ldots, X_n$ from some discrete distribution. One may then wish to investigate whether the underlying distribution is a member of a particular family, such as the Poisson or negative binomial family. In the case of both a Poisson and a negative binomial distribution, the set of possible values is infinite, so the values must be grouped into $k$ subsets before a chi-squared test can be used. The groupings should be done so that the expected frequency in each cell (group) is at least 5. The last cell will then correspond to $X$ values of $c, c + 1, c + 2, \ldots$ for some value $c$.
This grouping can considerably complicate the computation of the $\hat\theta_i$'s and estimated expected cell counts. This is because the theorem requires that the $\hat\theta_i$'s be obtained from the cell counts $N_1, \ldots, N_k$ rather than the sample values $X_1, \ldots, X_n$.
Example 14.8
Table 14.7 presents count data on the number of Larrea divaricata plants found in each of 48 sampling quadrats, as reported in the article “Some Sampling Characteristics of Plants and Arthropods of the Arizona Desert” (Ecology, 1962: 567–571).
Table 14.7 Observed Counts for Example 14.8
Cell                 1     2     3     4     5
Number of plants     0     1     2     3     ≥4
Frequency            9     9     10    14    6
The article's author fit a Poisson distribution to the data. Let $\mu$ denote the Poisson parameter and suppose for the moment that the six counts in cell 5 were actually 4, 4, 5, 5, 6, 6. Then denoting sample values by $x_1, \ldots, x_{48}$, nine of the $x_i$'s were 0, nine were 1, and so on. The likelihood of the observed sample is
$$\frac{e^{-\mu}\mu^{x_1}}{x_1!} \cdots \frac{e^{-\mu}\mu^{x_{48}}}{x_{48}!} = \frac{e^{-48\mu}\mu^{\sum x_i}}{x_1! \cdots x_{48}!} = \frac{e^{-48\mu}\mu^{101}}{x_1! \cdots x_{48}!}$$
The value of $\mu$ for which this is maximized is $\hat\mu = \sum x_i / n = 101/48 = 2.10$ (the value reported in the article).
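However, the $\hat\mu$ required for $\chi^2$ is obtained by maximizing Expression (14.4) rather than the likelihood of the full sample. The cell probabilities are
$$p_i(\mu) = \frac{e^{-\mu}\mu^{i-1}}{(i-1)!} \quad (i = 1, 2, 3, 4) \qquad p_5(\mu) = 1 - \sum_{i=0}^{3} \frac{e^{-\mu}\mu^{i}}{i!}$$
so the right-hand side of (14.4) becomes
$$\left[\frac{e^{-\mu}\mu^0}{0!}\right]^9 \left[\frac{e^{-\mu}\mu^1}{1!}\right]^9 \left[\frac{e^{-\mu}\mu^2}{2!}\right]^{10} \left[\frac{e^{-\mu}\mu^3}{3!}\right]^{14} \left[1 - \sum_{i=0}^{3}\frac{e^{-\mu}\mu^i}{i!}\right]^6$$
There is no nice formula for $\hat\mu$, the maximizing value of $\mu$, in this latter expression, so it must be obtained numerically.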
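Here is a Python sketch of that numerical step. It assumes golden-section search on the log of the grouped likelihood; any one-dimensional maximizer would do, and the helper names are hypothetical. The cell counts 9, 9, 10, 14, 6 are the exponents in the expression above. The last cell absorbs the tail, so the cell probabilities sum to 1 by construction. In this sketch the grouped estimate comes out a little below the full-sample value $101/48$.

```python
import math


def poisson_cell_probs(mu, c):
    """Probabilities of the cells {0}, {1}, ..., {c-1} and {c, c+1, ...} for Poisson(mu)."""
    probs = [math.exp(-mu) * mu**i / math.factorial(i) for i in range(c)]
    probs.append(1.0 - sum(probs))  # tail cell: X >= c
    return probs


def grouped_log_lik(mu, counts):
    """Log of the right-hand side of (14.4) when only the cell counts are used."""
    probs = poisson_cell_probs(mu, len(counts) - 1)
    return sum(n_i * math.log(p) for n_i, p in zip(counts, probs))


def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return (a + b) / 2


counts = [9, 9, 10, 14, 6]  # cells 0, 1, 2, 3, >= 4 from Table 14.7 (n = 48)
mu_grouped = golden_section_max(lambda m: grouped_log_lik(m, counts), 0.5, 5.0)
mu_full = 101 / 48          # full-sample MLE, sum(x_i) / n
print(round(mu_grouped, 3), round(mu_full, 2))
```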
Because the parameter estimates are usually more difficult to compute from the grouped data than from the full sample, they are typically computed using this latter method. If these “full” estimators are used in the chi-squared statistic, the distribution of the statistic when $H_0$ is true is quite complicated, so the actual P-value cannot be determined. However, the following result usually enables us to reach a conclusion at the desired significance level $\alpha$.
Theorem Let $\hat\theta_1, \ldots, \hat\theta_m$ be the maximum likelihood estimators of $\theta_1, \ldots, \theta_m$ based on the full sample $X_1, \ldots, X_n$, and let $\chi^2$ denote the statistic based on these estimators. Also let
$P_1$ = the P-value for an upper-tailed chi-squared test based on $k - 1 - m$ df
$P_2$ = the P-value for an upper-tailed chi-squared test based on $k - 1$ df
Then it can be shown that
$$P_1 \le \text{P-value} \le P_2 \qquad (14.7)$$
That is, the P-value for the test under consideration is sandwiched between the P-values for two “pure” upper-tailed chi-squared tests based on different df's. The test procedure implied by (14.7) has the unusual feature that under some circumstances judgment must be withheld until more data is available.
Select a significance level $\alpha$. Then
If $\alpha \le P_1$, do not reject $H_0$
If $\alpha \ge P_2$, reject $H_0$    (14.8)
If $P_1 < \alpha < P_2$, withhold judgment
Suppose, for example, that $k = 6$, $m = 2$, and $\alpha = .05$. The two relevant df's are $6 - 1 = 5$ and $6 - 1 - 2 = 3$. Then if $\chi^2 = 7.0$, Table A.11 shows that the P-value for a 3 df test is about .07 and the P-value for a 5 df test exceeds .10. Therefore we would not be able to reject $H_0$, because .05 is at most the smaller of the two pure chi-squared P-values. If, however, $\chi^2 = 15$, then the 3 df P-value is roughly .002 and the 5 df P-value is approximately .01. Because .05 is at least the larger of these pure P-values, we are given license to reject $H_0$. Only if .05 lies between the two pure chi-squared P-values would we not be able to reach a conclusion.
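Rules (14.7) and (14.8) can be sketched in Python. For a fixed $\chi^2$ value the upper-tail area grows with df, so the $k - 1 - m$ df P-value is always the smaller bound. The helpers `chi2_sf` and `decide` are hypothetical. `chi2_sf` uses the standard closed forms for the chi-squared upper tail with integer df: a finite sum times $e^{-x/2}$ for even df, and a normal tail plus a finite sum for odd df. That avoids needing Table A.11 or a statistics library. The loop reproduces the $k = 6$, $m = 2$, $\alpha = .05$ illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail area to the right of x under the chi-squared curve with integer df >= 1."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: 2[1 - Phi(z)] + 2 phi(z) * sum_{r=1}^{(df-1)/2} z^(2r-1) / (1*3*...*(2r-1)), z = sqrt(x)
    z = math.sqrt(x)
    pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = z, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(z / math.sqrt(2)) + 2 * pdf * total


def decide(chi2, k, m, alpha):
    """Three-way rule (14.8) based on the bounds P1 <= P-value <= P2 of (14.7)."""
    p1 = chi2_sf(chi2, k - 1 - m)  # smaller bound (fewer df)
    p2 = chi2_sf(chi2, k - 1)      # larger bound
    if alpha <= p1:
        return "do not reject H0"
    if alpha >= p2:
        return "reject H0"
    return "withhold judgment"


for stat in (7.0, 15.0):
    print(stat, round(chi2_sf(stat, 3), 4), round(chi2_sf(stat, 5), 4),
          decide(stat, k=6, m=2, alpha=0.05))
```

With $\alpha = .10$ instead, `decide(7.0, 6, 2, 0.10)` lands in the withhold-judgment case, because .10 falls between the 3 df and 5 df P-values.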
Example 14.9 (Example 14.8 continued)
Using $\hat\mu = 2.10$, the estimated expected cell counts are computed from $np_i(\hat\mu)$, where $n = 48$. For example,
$$np_1(\hat\mu) = 48 \cdot \frac{e^{-2.1}(2.1)^0}{0!} = (48)(e^{-2.1}) = 5.88$$
Similarly, $np_2(\hat\mu) = 12.34$, $np_3(\hat\mu) = 12.96$, $np_4(\hat\mu) = 9.07$, and $np_5(\hat\mu) = 48 - 5.88 - \cdots - 9.07 = 7.75$. Then
$$\chi^2 = \frac{(9 - 5.88)^2}{5.88} + \cdots + \frac{(6 - 7.75)^2}{7.75} = 6.31$$
The relevant df's are $5 - 1 = 4$ and $5 - 1 - 1 = 3$. The P-value for a 3 df test is about .097 (just under .10), and Table A.11 shows that the P-value for a 4 df test exceeds .10. Therefore at significance level .05, $H_0$ cannot be rejected. The actual P-value is at least the 3 df P-value and therefore certainly exceeds .05. At this level, it is plausible that the actual distribution is Poisson. However, if the selected significance level were instead .10, we'd be in the inconclusive situation, because the P-value could be (slightly) smaller than .10 or larger than .10.
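The arithmetic of Example 14.9 can be checked with a short Python sketch. It takes $\hat\mu = 2.10$ as given and uses the closed-form chi-squared upper tails for 3 df and 4 df. Computed without intermediate rounding, the statistic is about 6.31.

```python
import math

counts = [9, 9, 10, 14, 6]  # cells 0, 1, 2, 3, >= 4
n = sum(counts)             # 48
mu_hat = 2.10               # full-sample MLE from Example 14.8

probs = [math.exp(-mu_hat) * mu_hat**i / math.factorial(i) for i in range(4)]
probs.append(1 - sum(probs))  # P(X >= 4)
expected = [n * p for p in probs]
chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))

# Bounds from (14.7) with k = 5 cells and m = 1 estimated parameter.
z = math.sqrt(chi2)
pdf = math.exp(-chi2 / 2) / math.sqrt(2 * math.pi)
p_lower = math.erfc(z / math.sqrt(2)) + 2 * pdf * z  # 3 df upper-tail area
p_upper = math.exp(-chi2 / 2) * (1 + chi2 / 2)       # 4 df upper-tail area
print([round(e, 2) for e in expected])
print(round(chi2, 2), round(p_lower, 3), round(p_upper, 3))
```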
Sometimes even the maximum likelihood estimates based on the full sample are quite difficult to compute. This is the case, for example, for the two-parameter (generalized) negative binomial distribution. In such situations, method-of-moments estimates are often used, though it is not known to what extent the use of moments estimators affects the null distribution of $\chi^2$.
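The following is a Python sketch of method-of-moments estimates for the negative binomial. It assumes the parameterization with mean $r(1 - p)/p$ and variance $r(1 - p)/p^2$ (number of failures before the $r$th success). Setting these equal to the sample mean $\bar x$ and the moment variance $\hat\sigma^2 = \frac{1}{n}\sum (x_i - \bar x)^2$ gives $\hat p = \bar x/\hat\sigma^2$ and $\hat r = \bar x^2/(\hat\sigma^2 - \bar x)$. The function name and the sample are hypothetical.

```python
import statistics


def nb_method_of_moments(sample):
    """Method-of-moments estimates (r, p) for a negative binomial with
    mean r(1 - p)/p and variance r(1 - p)/p**2 (failures before the r-th success)."""
    xbar = statistics.fmean(sample)
    s2 = statistics.pvariance(sample)  # (1/n) * sum((x - xbar)^2), the moment-based variance
    if s2 <= xbar:
        raise ValueError("these estimates need sample variance > sample mean")
    p_hat = xbar / s2
    r_hat = xbar**2 / (s2 - xbar)
    return r_hat, p_hat


# Hypothetical counts, for illustration only.
sample = [0, 1, 1, 2, 0, 5, 3, 0, 7, 1, 2, 4, 0, 6, 1]
r_hat, p_hat = nb_method_of_moments(sample)
print(round(r_hat, 3), round(p_hat, 3))
```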
Goodness of Fit for Continuous Distributions
The chi-squared test can also be used to test whether the sample comes from a specified family of continuous distributions, such as the exponential family or the normal family. The choice of cells (class intervals) is even more arbitrary in the continuous case than in the discrete case. To ensure that the chi-squared test is valid, the cells should be chosen independently of the sample observations. Once the cells are chosen, it is almost always quite difficult to estimate unspecified parameters (such as $\mu$ and $\sigma$ in the normal case) from the observed cell counts, so instead mle's based on the full sample are computed. The test procedure is again specified by (14.7) and (14.8).
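Below is a minimal Python sketch of the continuous case for the normal family. The data and cut points are hypothetical, and the cut points are fixed before the data are seen. The parameters are estimated by full-sample MLEs: $\hat\mu = \bar x$, and $\hat\sigma$ with divisor $n$. The estimated expected counts come from `statistics.NormalDist`. The resulting $\chi^2$ would then be judged with (14.7) and (14.8), using $k - 1 - m$ and $k - 1$ df.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample and cut points; the cut points are fixed in advance, not read off the data.
sample = [38.2, 41.5, 44.0, 45.3, 46.1, 47.8, 48.4, 49.0, 49.7, 50.2,
          50.9, 51.3, 52.0, 52.6, 53.1, 53.8, 54.5, 55.2, 56.0, 57.4,
          58.3, 59.1, 60.6, 62.0, 63.5, 36.9, 42.7, 44.9, 47.1, 48.8,
          51.7, 53.3, 54.9, 56.6, 58.9, 61.2, 39.4, 45.8, 50.5, 55.7]
cut_points = [45.0, 52.0, 58.0]
edges = [-math.inf] + cut_points + [math.inf]

# Full-sample MLEs for the normal family (sigma uses divisor n).
n = len(sample)
mu_hat = statistics.fmean(sample)
sigma_hat = statistics.pstdev(sample)
dist = NormalDist(mu_hat, sigma_hat)


def cdf(v):
    """Fitted normal cdf that also accepts the infinite outer edges."""
    if v == -math.inf:
        return 0.0
    if v == math.inf:
        return 1.0
    return dist.cdf(v)


cells = list(zip(edges, edges[1:]))
observed = [sum(lo < x <= hi for x in sample) for lo, hi in cells]
expected = [n * (cdf(hi) - cdf(lo)) for lo, hi in cells]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
k, m = len(cells), 2
print(observed, [round(e, 2) for e in expected], round(chi2, 3))
print("compare using df", k - 1 - m, "and", k - 1)
```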