Example 14.7 Consider a series of games between two teams, I and II, that terminates as soon as one team has won four games (with no possibility of a tie). A simple probability model for such a series assumes that outcomes of successive games are independent and that the probability of team I winning any particular game is a constant $\theta$. We arbitrarily designate I the better team, so that $\theta \ge .5$. Any particular series can then terminate after 4, 5, 6, or 7 games. Let $p_1(\theta), p_2(\theta), p_3(\theta), p_4(\theta)$ denote the probability of termination in 4, 5, 6, and 7 games, respectively. Then

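$$
\begin{aligned}
p_1(\theta) &= P(\text{I wins in 4 games}) + P(\text{II wins in 4 games}) \\
            &= \theta^4 + (1-\theta)^4 \\
p_2(\theta) &= P(\text{I wins 3 of the first 4 and the fifth}) + P(\text{I loses 3 of the first 4 and the fifth}) \\
            &= \binom{4}{3}\theta^3(1-\theta)\,\theta + \binom{4}{1}\theta(1-\theta)^3(1-\theta) \\
            &= 4\theta(1-\theta)\left[\theta^3 + (1-\theta)^3\right] \\
p_3(\theta) &= 10\theta^2(1-\theta)^2\left[\theta^2 + (1-\theta)^2\right] \\
p_4(\theta) &= 20\theta^3(1-\theta)^3
\end{aligned}
$$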
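The four termination probabilities can be checked numerically. The following is a minimal sketch, not code from the text: the function name `simple_model_probs` and the trial value 0.6 for the per-game win probability are illustrative assumptions. It builds each probability from the counting argument (the series winner takes the last game and exactly three of the earlier ones), which should agree with the closed forms above.

```python
from math import comb


def simple_model_probs(theta):
    """Return [p1, p2, p3, p4]: chance the series ends in 4, 5, 6, 7 games.

    theta is the probability that team I wins any single game.
    (Hypothetical helper name, introduced for illustration only.)
    """
    probs = []
    for n in (4, 5, 6, 7):
        # The series winner takes game n and exactly 3 of the first n - 1 games.
        ways = comb(n - 1, 3)
        team_one_wins = ways * theta**4 * (1 - theta) ** (n - 4)
        team_two_wins = ways * (1 - theta) ** 4 * theta ** (n - 4)
        probs.append(team_one_wins + team_two_wins)
    return probs


probs = simple_model_probs(0.6)  # 0.6 is an arbitrary illustrative value
print(probs, sum(probs))         # the four probabilities should sum to 1
```

Summing the list is a quick sanity check that the four outcomes exhaust all possibilities.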

The article “Seven-Game Series in Sports” by Groeneveld and Meeden (Mathematics Magazine, 1975: 187–192) tested the fit of this model to results of National Hockey League playoffs during the period 1943–1967 (when league membership was stable). The data appears in Table 14.5.

Table 14.5 Observed and Expected Counts for the Simple Model

Cell                            1    2    3    4
Number of games played          4    5    6    7
Observed frequency             15   26   24   18    n = 83
Estimated expected frequency

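The estimated expected cell counts are $83\,p_i(\hat\theta)$, where $\hat\theta$ is the value of $\theta$ that maximizes

$$
\left\{\theta^4 + (1-\theta)^4\right\}^{15} \cdot \left\{4\theta(1-\theta)\left[\theta^3 + (1-\theta)^3\right]\right\}^{26} \cdot \left\{10\theta^2(1-\theta)^2\left[\theta^2 + (1-\theta)^2\right]\right\}^{24} \cdot \left\{20\theta^3(1-\theta)^3\right\}^{18} \tag{14.5}
$$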

Standard calculus methods fail to yield a nice formula for the maximizing value $\hat\theta$, so it must be computed using numerical methods. The result is $\hat\theta = .654$, from which $p_i(\hat\theta)$ and the estimated expected cell counts are computed. The computed value of $\chi^2$ is .360, and (since $k - 1 - m = 4 - 1 - 1 = 2$) $\chi^2_{.10,2} = 4.605$.

  There is thus no reason to reject the simple model as applied to NHL playoff series.
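The numerical maximization can be sketched in a few lines. This is an illustrative sketch only: the text does not say which numerical method was used, so a crude grid search over the per-game win probability stands in for it here, and the names `log_likelihood` and `grid_mle` are hypothetical. The observed counts 15, 26, 24, 18 come from Table 14.5.

```python
from math import comb, log

OBSERVED = [15, 26, 24, 18]  # NHL series ending in 4, 5, 6, 7 games (Table 14.5)


def simple_model_probs(theta):
    """Termination probabilities for 4, 5, 6, 7 games under the simple model."""
    return [
        comb(n - 1, 3)
        * (theta**4 * (1 - theta) ** (n - 4) + (1 - theta) ** 4 * theta ** (n - 4))
        for n in (4, 5, 6, 7)
    ]


def log_likelihood(theta, observed=OBSERVED):
    # Logarithm of expression (14.5); it has the same maximizer as (14.5).
    return sum(n_i * log(p_i) for n_i, p_i in zip(observed, simple_model_probs(theta)))


def grid_mle(step=0.0001):
    # Assumed method: evaluate on a fine grid over 0.5 <= theta < 1 and keep the best.
    grid = [0.5 + i * step for i in range(int(round(0.5 / step)))]
    return max(grid, key=log_likelihood)


theta_hat = grid_mle()
n = sum(OBSERVED)
expected = [n * p for p in simple_model_probs(theta_hat)]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(OBSERVED, expected))
print(theta_hat, expected, chi_sq)
```

The printed estimate and statistic should land close to the values reported in the text (.654 and .360); the last digit may differ slightly depending on the grid step and on rounding.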

The cited article also considered World Series data for the period 1903–1973. For the simple model, $\chi^2 = 5.97$, which exceeds the critical value 4.605, so the model does not seem appropriate. The suggested reason for this is that for the simple model

$$
P(\text{series lasts six games} \mid \text{series lasts at least six games}) \ge .5 \tag{14.6}
$$

  whereas of the 38 series that actually lasted at least six games, only 13 lasted exactly six. The following alternative model is then introduced:

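$$
\begin{aligned}
p_1(\theta_1, \theta_2) &= \theta_1^4 + (1-\theta_1)^4 \\
p_2(\theta_1, \theta_2) &= 4\theta_1(1-\theta_1)\left[\theta_1^3 + (1-\theta_1)^3\right] \\
p_3(\theta_1, \theta_2) &= 10\theta_1^2(1-\theta_1)^2\,\theta_2 \\
p_4(\theta_1, \theta_2) &= 10\theta_1^2(1-\theta_1)^2\,(1-\theta_2)
\end{aligned}
$$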

  CHAPTER 14 Goodness-of-Fit Tests and Categorical Data Analysis

The first two $p_i$’s are identical to the simple model, whereas $\theta_2$ is the conditional probability of (14.6) (which can now be any number between 0 and 1). The values of $\hat\theta_1$ and $\hat\theta_2$ that maximize the expression analogous to expression (14.5) are determined numerically as $\hat\theta_1 = .614$, $\hat\theta_2 = .342$. A summary appears in Table 14.6, and $\chi^2 = .384$. Since two parameters are estimated,

$$
\text{df} = k - 1 - m = 1
$$

with $\chi^2_{.10,1} = 2.706$, indicating a good fit of the data to this new model.
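The structure of the two-parameter model can be illustrated with a short sketch. The function name `alt_model_probs` is a hypothetical label, and the remark about the second estimate is an observation added here rather than a statement from the text: because the second parameter enters the likelihood only through the six-game and seven-game cells, its maximizer is the proportion of long series that ended in six games, 13 of 38, which agrees with the reported .342.

```python
def alt_model_probs(theta1, theta2):
    """Two-parameter model: [p1, p2, p3, p4] for series of 4, 5, 6, 7 games.

    (Hypothetical helper name, introduced for illustration only.)
    """
    q = theta1 * (1 - theta1)
    p1 = theta1**4 + (1 - theta1) ** 4
    p2 = 4 * q * (theta1**3 + (1 - theta1) ** 3)
    at_least_six = 10 * q**2  # probability the series lasts at least six games
    return [p1, p2, at_least_six * theta2, at_least_six * (1 - theta2)]


# 13 of the 38 series that lasted at least six games ended in exactly six.
theta2_hat = 13 / 38

# 0.614 is the estimate of the first parameter reported in the text.
probs = alt_model_probs(0.614, theta2_hat)
conditional_six = probs[2] / (probs[2] + probs[3])
print(round(theta2_hat, 3), sum(probs), conditional_six)
```

The conditional probability of a six-game series given at least six games equals the second parameter by construction, so it is no longer forced to be at least .5.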

Table 14.6 Observed and Expected Counts for the More Complex Model

Number of games played

  Observed frequency

  Estimated expected frequency

  ■

One of the conditions on the $\theta_i$’s in the theorem is that they be functionally independent of one another. That is, no single $\theta_i$ can be determined from the values of other $\theta_i$’s, so that m is the number of functionally independent parameters estimated. A general rule of thumb for degrees of freedom in a chi-squared test is the following.

$$
\chi^2\ \text{df} = \begin{pmatrix}\text{number of freely}\\ \text{determined cell counts}\end{pmatrix} - \begin{pmatrix}\text{number of independent}\\ \text{parameters estimated}\end{pmatrix}
$$

  This rule will be used in connection with several different chi-squared tests in the next section.
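As a small illustration of the rule applied to the two models of Example 14.7, the sketch below counts degrees of freedom and recovers the two critical values quoted there. The function names are hypothetical, and the critical-value helper is deliberately limited to 1 and 2 degrees of freedom, where elementary closed forms exist; it is not a general chi-squared quantile routine.

```python
from math import log
from statistics import NormalDist


def chi_squared_df(num_cells, num_params):
    # With one sample of fixed size n, the k cell counts sum to n,
    # so only k - 1 of them are freely determined.
    return (num_cells - 1) - num_params


def upper_critical_value(alpha, df):
    """Upper-tail chi-squared critical value, for df = 1 or 2 only."""
    if df == 1:
        # A chi-squared variable with 1 df is the square of a standard normal.
        return NormalDist().inv_cdf(1 - alpha / 2) ** 2
    if df == 2:
        # A chi-squared variable with 2 df is exponential with mean 2.
        return -2 * log(alpha)
    raise ValueError("closed forms are used here only for df = 1 or 2")


for params in (1, 2):  # simple model, then two-parameter model
    df = chi_squared_df(4, params)
    print(df, round(upper_critical_value(0.10, df), 3))
```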