Example 14.3 To see whether the time of onset of labor among expectant mothers is uniformly distributed throughout a 24-hour day, we can divide a day into $k$ periods, each of length $24/k$. The null hypothesis states that $f(x)$ is the uniform pdf on the interval $[0, 24]$, so that $p_{i0} = 1/k$. The article “The Hour of Birth” (British J. of Preventive and Social Medicine, 1953: 43–59) reports on 1186 onset times, which were categorized into $k = 24$ 1-hour intervals beginning at midnight, resulting in cell counts of 52, 73, 89, 88, 68, 47, 58, 47, 48, 53, 47, 34, 21, 31, 40, 24, 37, 31, 47, 34, 36, 44, 78, and 59.
Each expected cell count is $1186 \cdot \frac{1}{24} = 49.42$, and the resulting value of $\chi^2$ is 162.77.
Since $\chi^2_{.01,23} = 41.637$ (df $= k - 1 = 23$), the computed value is highly significant, and the null hypothesis is resoundingly rejected. Generally speaking, it appears that labor is much more likely to commence very late at night than during normal waking hours. ■
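The arithmetic in Example 14.3 can be checked with a few lines of Python. This is a minimal sketch: the helper name `chi_squared_stat` is our own, and the critical value 41.637 is copied from the text's table lookup rather than computed.

```python
def chi_squared_stat(observed, probs):
    """Sum over cells of (observed - expected)^2 / expected, with expected = n * p_i0."""
    n = sum(observed)
    return sum((x - n * p) ** 2 / (n * p) for x, p in zip(observed, probs))

# Cell counts for the 24 one-hour intervals beginning at midnight.
onset_counts = [52, 73, 89, 88, 68, 47, 58, 47, 48, 53, 47, 34,
                21, 31, 40, 24, 37, 31, 47, 34, 36, 44, 78, 59]
k = len(onset_counts)            # 24 cells
null_probs = [1 / k] * k         # uniform H0: p_i0 = 1/k for every cell

n = sum(onset_counts)            # total number of onset times
expected = n / k                 # common expected count n * p_i0
chi2 = chi_squared_stat(onset_counts, null_probs)

# Upper-tail critical value for df = k - 1 = 23 at alpha = .01, as quoted in the text
# (taken from a chi-squared table, not computed here).
critical_01_23 = 41.637

print(f"n = {n}, expected per cell = {expected:.2f}")
print(f"chi-squared = {chi2:.2f}, reject H0 at .01: {chi2 >= critical_01_23}")
```

Run as written, the statistic comes out at about 162.78, which agrees with the 162.77 reported above up to rounding.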
For testing whether a sample comes from a specific normal distribution, the fundamental parameters are $\theta_1 = \mu$ and $\theta_2 = \sigma$, and each $p_{i0}$ will be a function of these parameters.
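To make the dependence of $p_{i0}$ on $\theta_1 = \mu$ and $\theta_2 = \sigma$ concrete, the following sketch computes normal cell probabilities for fixed cut points using the standard-library `statistics.NormalDist`. The function name `normal_cell_probs` and the cut points 90, 100, 110 are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

def normal_cell_probs(cut_points, mu, sigma):
    """Null cell probabilities p_i0(mu, sigma) for a normal H0.

    cut_points are the interior boundaries c_1 < ... < c_{k-1}; the k cells are
    (-inf, c_1), [c_1, c_2), ..., [c_{k-1}, inf). Each p_i0 is a difference of
    normal cdf values, so it changes whenever theta_1 = mu or theta_2 = sigma changes.
    """
    dist = NormalDist(mu, sigma)
    cdf_values = [0.0] + [dist.cdf(c) for c in cut_points] + [1.0]
    return [upper - lower for lower, upper in zip(cdf_values, cdf_values[1:])]

# Hypothetical cut points, chosen only for illustration.
cuts = [90, 100, 110]
probs_a = normal_cell_probs(cuts, mu=100, sigma=10)
probs_b = normal_cell_probs(cuts, mu=105, sigma=10)
print([round(p, 4) for p in probs_a])  # cells symmetric about mu = 100
print([round(p, 4) for p in probs_b])  # same cells, different p_i0 once mu shifts
```

The same cells receive different null probabilities under the two parameter choices, which is exactly why the $p_{i0}$ must be recomputed whenever $\mu$ or $\sigma$ is specified differently.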
Example 14.4 At a certain university, final exams are supposed to last 2 hours. The psychology
department constructed a departmental final for an elementary course that was believed to satisfy the following criteria: (1) actual time taken to complete the exam is normally distributed, (2) $\mu = 100$ min, and (3) exactly 90% of all students will
14.1 Goodness-of-Fit Tests When Category Probabilities Are Completely Specified
finish within the 2-hour period. To see whether this is actually the case, 120 students were randomly selected, and their completion times recorded. It was decided that $k = 8$ intervals should be used. The criteria imply that the 90th percentile of the completion time distribution is $\mu + 1.28\sigma = 120$. Since $\mu = 100$, this implies that $\sigma = 15.63$.
The eight intervals that divide the standard normal scale into eight equally likely segments are $[0, .32)$, $[.32, .675)$, $[.675, 1.15)$, and $[1.15, \infty)$, and their four counterparts are on the other side of 0. For $\mu = 100$ and $\sigma = 15.63$, these intervals become
$[100, 105)$, $[105, 110.55)$, $[110.55, 117.97)$, and $[117.97, \infty)$, together with their four counterparts below 100. Thus $p_{i0} = 1/8 = .125$
$(i = 1, \ldots, 8)$, so each expected cell count is $np_{i0} = 120(.125) = 15$. The observed cell counts were 21, 17, 12, 16, 10, 15, 19, and 10, resulting in a $\chi^2$ of 7.73. Since
$\chi^2_{.10,7} = 12.017$ (df $= k - 1 = 7$) and $7.73 < 12.017$, there is no evidence for concluding that the criteria have not been met.
■
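The steps of Example 14.4 (solving for $\sigma$, finding equiprobable cut points, and computing $\chi^2$) can be sketched with the standard-library `statistics.NormalDist`. This is an illustrative sketch; the variable names are our own, and the critical value 12.017 is copied from the text rather than computed.

```python
from statistics import NormalDist

mu = 100.0
# 90th-percentile condition from the text: mu + 1.28 * sigma = 120.
sigma = (120 - mu) / 1.28            # 15.625, reported as 15.63 in the text

k = 8
std = NormalDist()                   # standard normal distribution
# Interior cut points that split the standard normal scale into k equally
# likely cells: the (i/k)th quantiles for i = 1, ..., k - 1.
z_cuts = [std.inv_cdf(i / k) for i in range(1, k)]
x_cuts = [mu + sigma * z for z in z_cuts]   # same cut points in minutes

observed = [21, 17, 12, 16, 10, 15, 19, 10]
n = sum(observed)                    # number of students sampled
expected = n / k                     # n * p_i0 with p_i0 = 1/8
chi2 = sum((x - expected) ** 2 / expected for x in observed)

critical_10_7 = 12.017               # chi-squared_{.10,7}, from the text's table lookup

print("z cut points:", [round(z, 3) for z in z_cuts])
print("time cut points:", [round(x, 2) for x in x_cuts])
print(f"chi-squared = {chi2:.2f}, reject H0 at .10: {chi2 >= critical_10_7}")
```

Because `inv_cdf` returns unrounded quantiles (about .3186, .6745, and 1.1503) and $\sigma$ is kept at 15.625, the computed cut points can differ slightly from the rounded values in the example. The statistic itself, $116/15 \approx 7.73$, depends only on the observed counts once the equiprobable cells are fixed.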
EXERCISES Section 14.1 (1–11)
1. What conclusion would be appropriate for an upper-tailed chi-squared test in each of the following situations?
a. $\alpha = .05$, df $= 4$, $\chi^2 = 12.25$
b. $\alpha = .01$, df $= 3$, $\chi^2 = 8.54$
c. $\alpha = .10$, df $= 2$, $\chi^2 = 4.36$
d. $\alpha = .01$, $k = 6$, $\chi^2 = 10.20$
2. Say as much as you can about the P-value for an upper-tailed chi-squared test in each of the following situations:
a. $\chi^2 = 7.5$, df $= 2$
b. $\chi^2 = 13.0$, df $= 6$
c. $\chi^2 = 18.0$, df $= 9$
d. $\chi^2 = 21.3$, df $= 5$
e. $\chi^2 = 5.0$, $k = 4$
3. The article “Racial Stereotypes in Children’s Television Commercials” (J. of Adver. Res., 2008: 80–93) reported the following frequencies with which ethnic characters appeared in recorded commercials that aired on Philadelphia television stations.
Ethnicity:   African American   Asian   Caucasian   Hispanic
Frequency:   …                  …       …           6
The 2000 census proportions for these four ethnic groups are .177, .032, .734, and .057, respectively. Does the data suggest that the proportions in commercials are different from the census proportions? Carry out a test of appropriate hypotheses using a significance level of .01, and also say as much as you can about the P-value.
4. It is hypothesized that when homing pigeons are disoriented in a certain manner, they will exhibit no preference for any direction of flight after takeoff (so that the direction $X$ should be uniformly distributed on the interval from 0° to 360°). To test this, 120 pigeons are disoriented, let loose, and the direction of flight of each is recorded; the resulting data follows. Use the chi-squared test at level .10 to see whether the data supports the hypothesis.
Direction:   …
Frequency:   …
5. An information-retrieval system has ten storage locations. Information has been stored with the expectation that the long-run proportion of requests for location $i$ is given by $p_i = (5.5 - |i - 5.5|)/30$. A sample of 200 retrieval requests gave the following frequencies for locations 1–10, respectively: 4, 15, 23, 25, 38, 31, 32, 14, 10, and 8. Use a chi-squared test at significance level .10 to decide whether the data is consistent with the a priori proportions (use the P-value approach).
6. The article “The Gap Between Wine Expert Ratings and Consumer Preferences” (Intl. J. of Wine Business Res., 2008: 335–351) studied differences between expert and consumer ratings by considering medal ratings for wines, which could be gold (G), silver (S), or bronze (B). Three categories were then established: 1. Rating is the same [(G,G), (B,B), (S,S)]; 2. Rating differs by one medal [(G,S), (S,G), (S,B), (B,S)]; and 3. Rating differs by two medals [(G,B), (B,G)]. The observed frequencies for these three categories were 69, 102, and 45, respectively. On the hypothesis of equally likely expert ratings and consumer ratings being assigned completely by chance, each of the nine medal pairs has probability $1/9$. Carry out an appropriate chi-squared test using a significance level of .10 by first obtaining P-value information.
7. Criminologists have long debated whether there is a relationship between weather conditions and the incidence of violent
CHAPTER 14 Goodness-of-Fit Tests and Categorical Data Analysis
crime. The author of the article “Is There a Season for Homicide?” (Criminology, 1988: 287–296) classified 1361 homicides according to season, resulting in the accompanying data. Test the null hypothesis of equal proportions using $\alpha = .01$ by using the chi-squared table to say as much as possible about the P-value.
8. The article “Psychiatric and Alcoholic Admissions Do Not Occur Disproportionately Close to Patients’ Birthdays” (Psychological Reports, 1992: 944–946) focuses on the existence of any relationship between the date of patient admission for treatment of alcoholism and the patient’s birthday. Assuming a 365-day year (i.e., excluding leap year), in the absence of any relation, a patient’s admission date is equally likely to be any one of the 365 possible days. The investigators established four different admission categories: (1) within 7 days of birthday; (2) between 8 and 30 days, inclusive, from the birthday; (3) between 31 and 90 days, inclusive, from the birthday; and (4) more than 90 days from the birthday. A sample of 200 patients gave observed frequencies of 11, 24, 69, and 96 for categories 1, 2, 3, and 4, respectively. State and test the relevant hypotheses using a significance level of .01.
9. The response time of a computer system to a request for a certain type of information is hypothesized to have an exponential distribution with parameter $\lambda = 1$ sec (so if $X$ = response time, the pdf of $X$ under $H_0$ is $f_0(x) = e^{-x}$ for $x \ge 0$).
a. If you had observed $X_1, X_2, \ldots, X_n$ and wanted to use the chi-squared test with five class intervals having equal probability under $H_0$, what would be the resulting class intervals?
b. Carry out the chi-squared test using the following data resulting from a random sample of 40 response times:
10. a. Show that another expression for the chi-squared statistic is
$$\chi^2 = \sum_{i=1}^{k} \frac{N_i^2}{np_{i0}} - n$$
Why is it more efficient to compute $\chi^2$ using this formula?
b. When the null hypothesis is $H_0\colon p_1 = p_2 = \cdots = p_k = 1/k$ (i.e., $p_{i0} = 1/k$ for all $i$), how does the formula of part (a) simplify? Use the simplified expression to calculate $\chi^2$ for the pigeon-direction data in Exercise 4.
11. a. Having obtained a random sample from a population, you wish to use a chi-squared test to decide whether the population distribution is standard normal. If you base the test on six class intervals having equal probability under $H_0$, what should be the class intervals?
b. If you wish to use a chi-squared test to test $H_0$: the population distribution is normal with $\mu = .5$, $\sigma = .002$ and the test is to be based on six equiprobable (under $H_0$) class intervals, what should be these intervals?
c. Use the chi-squared test with the intervals of part (b) to decide, based on the following 45 bolt diameters, whether bolt diameter is a normally distributed variable with $\mu = .5$ in., $\sigma = .002$ in.