Example 14.7 Consider a series of games between two teams, I and II, that terminates as soon as one team has won four games (with no possibility of a tie). A simple probability model for such a series assumes that outcomes of successive games are independent and that the probability of team I winning any particular game is a constant $u$. We arbitrarily designate I the better team, so that $u \ge .5$. Any particular series can then terminate after 4, 5, 6, or 7 games. Let $p_1(u)$, $p_2(u)$, $p_3(u)$, $p_4(u)$ denote the probability of termination in 4, 5, 6, and 7 games, respectively. Then
\begin{align*}
p_1(u) &= P(\text{I wins in 4 games}) + P(\text{II wins in 4 games}) \\
       &= u^4 + (1-u)^4 \\
p_2(u) &= P(\text{I wins 3 of the first 4 and the fifth}) + P(\text{I loses 3 of the first 4 and the fifth}) \\
       &= \binom{4}{3} u^3 (1-u) \cdot u + \binom{4}{1} u (1-u)^3 (1-u) \\
       &= 4u(1-u)\,[u^3 + (1-u)^3] \\
p_3(u) &= 10u^2(1-u)^2\,[u^2 + (1-u)^2] \\
p_4(u) &= 20u^3(1-u)^3
\end{align*}
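The four cell probabilities all follow one pattern: the eventual winner takes exactly 3 of the first n − 1 games and then game n. The sketch below computes them that way and checks that they sum to 1. The function name `series_length_probs` and the trial value 0.6 are illustrative choices, not part of the example.

```python
from math import comb

def series_length_probs(u):
    """Return [p1, p2, p3, p4]: P(series ends in 4, 5, 6, 7 games).

    u is the (assumed constant) probability that team I wins a single game.
    """
    probs = []
    for n in range(4, 8):
        # The series winner takes 3 of the first n-1 games, then game n.
        ways = comb(n - 1, 3)
        team_i_wins = ways * u**4 * (1 - u) ** (n - 4)
        team_ii_wins = ways * (1 - u) ** 4 * u ** (n - 4)
        probs.append(team_i_wins + team_ii_wins)
    return probs

probs = series_length_probs(0.6)  # 0.6 is an arbitrary illustrative value
print([round(p, 4) for p in probs])
print(sum(probs))  # equals 1 up to floating-point rounding
```

Writing the probabilities through the binomial coefficient keeps the code tied to the counting argument rather than to the simplified closed forms, so the two can be checked against each other.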
The article “Seven-Game Series in Sports” by Groeneveld and Meeden (Mathematics Magazine, 1975: 187–192) tested the fit of this model to results of National Hockey League playoffs during the period 1943–1967 (when league membership was stable). The data appears in Table 14.5.
Table 14.5 Observed and Expected Counts for the Simple Model

Cell                            1                  2                  3                  4
Number of games played          4                  5                  6                  7
Observed frequency              15                 26                 24                 18                 n = 83
Estimated expected frequency    $83p_1(\hat{u})$   $83p_2(\hat{u})$   $83p_3(\hat{u})$   $83p_4(\hat{u})$
The estimated expected cell counts are $83\,p_i(\hat{u})$, where $\hat{u}$ is the value of $u$ that maximizes

\[
\{u^4 + (1-u)^4\}^{15} \cdot \{4u(1-u)\,[u^3 + (1-u)^3]\}^{26} \cdot \{10u^2(1-u)^2\,[u^2 + (1-u)^2]\}^{24} \cdot \{20u^3(1-u)^3\}^{18} \tag{14.5}
\]

Standard calculus methods fail to yield a nice formula for the maximizing value $\hat{u}$, so it must be computed using numerical methods. The result is $\hat{u} = .654$, from which $p_i(\hat{u})$ and the estimated expected cell counts are computed. The computed value of $\chi^2$ is .360, and (since $k - 1 - m = 4 - 1 - 1 = 2$) $\chi^2_{.10,2} = 4.605$.
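The text says only that the maximizer is found by numerical methods. As one possible approach, the sketch below maximizes the logarithm of expression (14.5) by a plain grid search over .5 ≤ u < 1, then forms the estimated expected counts and the chi-squared statistic. The grid search, the step size of .0001, and all function names are assumptions made for illustration; the article's actual numerical method is not stated.

```python
from math import comb, log

observed = [15, 26, 24, 18]  # NHL series lasting 4, 5, 6, 7 games (Table 14.5)
n = sum(observed)            # 83

def cell_probs(u):
    """Simple-model probabilities that the series ends in 4, 5, 6, 7 games."""
    return [
        comb(g - 1, 3) * (u**4 * (1 - u) ** (g - 4) + (1 - u) ** 4 * u ** (g - 4))
        for g in range(4, 8)
    ]

def log_likelihood(u):
    """Log of expression (14.5): sum of observed count times log cell probability."""
    return sum(x * log(p) for x, p in zip(observed, cell_probs(u)))

# Assumed method: brute-force grid search with step .0001 on [.5, 1).
grid = [0.5 + i / 10000 for i in range(5000)]
u_hat = max(grid, key=log_likelihood)

expected = [n * p for p in cell_probs(u_hat)]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(f"u_hat = {u_hat:.3f}")
print("expected counts:", [round(e, 2) for e in expected])
print(f"chi-squared = {chi_sq:.3f}")
```

Working with the log-likelihood rather than the product itself avoids very small numbers and does not change the location of the maximum. A grid search is crude but transparent; a root finder applied to the derivative would serve equally well.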
There is thus no reason to reject the simple model as applied to NHL playoff series.
The cited article also considered World Series data for the period 1903–1973. For the simple model, $\chi^2 = 5.97$, so the model does not seem appropriate. The suggested reason for this is that for the simple model

\[
P(\text{series lasts six games} \mid \text{series lasts at least six games}) \ge .5 \tag{14.6}
\]

whereas of the 38 series that actually lasted at least six games, only 13 lasted exactly six. The following alternative model is then introduced:
\begin{align*}
p_1(u_1, u_2) &= u_1^4 + (1-u_1)^4 \\
p_2(u_1, u_2) &= 4u_1(1-u_1)\,[u_1^3 + (1-u_1)^3] \\
p_3(u_1, u_2) &= 10u_1^2(1-u_1)^2\,u_2 \\
p_4(u_1, u_2) &= 10u_1^2(1-u_1)^2\,(1-u_2)
\end{align*}
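The lower bound of .5 on the conditional probability under the simple model can be checked directly from the simple-model formulas for the six-game and seven-game cells. The following short derivation is a sketch added for clarity; it uses only those two formulas.

```latex
\begin{align*}
P(\text{six games} \mid \text{at least six games})
  &= \frac{p_3(u)}{p_3(u) + p_4(u)} \\
  &= \frac{10u^2(1-u)^2\,[u^2 + (1-u)^2]}
          {10u^2(1-u)^2\,[u^2 + (1-u)^2 + 2u(1-u)]} \\
  &= u^2 + (1-u)^2
     \qquad \text{since } u^2 + (1-u)^2 + 2u(1-u) = 1 \\
  &= \tfrac{1}{2} + 2\left(u - \tfrac{1}{2}\right)^2 \;\ge\; \tfrac{1}{2}.
\end{align*}
```

The observed proportion, 13/38 ≈ .342, falls well below this bound, which is why the alternative model replaces the factor $u^2 + (1-u)^2$ with a free parameter $u_2$.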
CHAPTER 14 Goodness-of-Fit Tests and Categorical Data Analysis
The first two $p_i$’s are identical to the simple model, whereas $u_2$ is the conditional probability of (14.6) (which can now be any number between 0 and 1). The values of $\hat{u}_1$ and $\hat{u}_2$ that maximize the expression analogous to expression (14.5) are determined numerically as $\hat{u}_1 = .614$, $\hat{u}_2 = .342$. A summary appears in Table 14.6, and $\chi^2 = .384$. Since two parameters are estimated,

\[
\text{df} = k - 1 - m = 1
\]

with $\chi^2_{.10,1} = 2.706$, indicating a good fit of the data to this new model.
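Although both estimates are reported as numerical results, the value of $\hat{u}_2$ can also be seen in closed form, because the likelihood for the alternative model separates into a factor involving only $u_1$ and a factor involving only $u_2$. The derivation below is a sketch; $n_1, \dots, n_4$ denote the observed numbers of series lasting 4, 5, 6, and 7 games, a notation introduced here for illustration. The values $n_3 = 13$ and $n_4 = 38 - 13 = 25$ come from the counts quoted in the example.

```latex
\begin{align*}
L(u_1, u_2)
  &= p_1^{\,n_1}\, p_2^{\,n_2}\,
     \bigl[10u_1^2(1-u_1)^2\bigr]^{n_3 + n_4}
     \cdot u_2^{\,n_3}(1-u_2)^{n_4} \\[4pt]
\frac{\partial}{\partial u_2} \ln L
  &= \frac{n_3}{u_2} - \frac{n_4}{1-u_2} = 0
  \quad\Longrightarrow\quad
  \hat{u}_2 = \frac{n_3}{n_3 + n_4} = \frac{13}{38} \approx .342
\end{align*}
```

This agrees with the reported $\hat{u}_2 = .342$; only $\hat{u}_1$ genuinely requires a numerical search.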
Table 14.6 Observed and Expected Counts for the More Complex Model

Number of games played
Observed frequency
Estimated expected frequency
■
One of the conditions on the $u_i$’s in the theorem is that they be functionally independent of one another. That is, no single $u_i$ can be determined from the values of other $u_i$’s, so that $m$ is the number of functionally independent parameters estimated. A general rule of thumb for degrees of freedom in a chi-squared test is the following.
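\[
\chi^2\ \text{df} =
\begin{pmatrix} \text{number of freely} \\ \text{determined cell counts} \end{pmatrix}
-
\begin{pmatrix} \text{number of independent} \\ \text{parameters estimated} \end{pmatrix}
\]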
This rule will be used in connection with several different chi-squared tests in the next section.
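As a small check of the rule against the two fits in Example 14.7, the sketch below computes the degrees of freedom and the corresponding upper-tail critical values at level .10. The function names are hypothetical. The critical-value helper handles only df = 1 and df = 2, where closed forms exist (a chi-squared variable with 1 df is a squared standard normal, and one with 2 df is exponential with mean 2); other cases would need a table or a statistics library.

```python
from math import log
from statistics import NormalDist

def chi_squared_df(k, m):
    """(k - 1) freely determined cell counts minus m estimated parameters."""
    return (k - 1) - m

def upper_critical_value(alpha, df):
    """Upper-tail chi-squared critical value; closed forms for df = 1 or 2 only."""
    if df == 1:
        # Chi-squared with 1 df is the square of a standard normal variable.
        z = NormalDist().inv_cdf(1 - alpha / 2)
        return z * z
    if df == 2:
        # Chi-squared with 2 df is exponential with mean 2.
        return -2 * log(alpha)
    raise ValueError("this sketch handles only df = 1 or df = 2")

# Simple model: k = 4 cells, m = 1 parameter. Alternative model: m = 2.
for m in (1, 2):
    df = chi_squared_df(4, m)
    print(m, df, round(upper_critical_value(0.10, df), 3))
```

With four cells, the total count fixes one cell once the other three are known, which is where the k − 1 comes from; each estimated parameter then removes one more degree of freedom.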