($\mu_2$) using a confidence level of 95%. The squared standard deviations $(1.30)^2$ and $(1.68)^2$ enter the estimated standard error, and the resulting interval is $-2.89 \pm .45 = (-3.34, -2.44)$. That is, with 95% confidence, $-3.34 < \mu_1 - \mu_2 < -2.44$. We can therefore be
highly confident that the true average shear strength for the 1/2-in. bolts exceeds that for the 3/8-in. bolts by between 2.44 kip and 3.34 kip. Notice that if we relabel so that $\mu_1$ refers to 1/2-in. bolts and $\mu_2$ to 3/8-in. bolts, the confidence interval is now centered at 2.89 and the value .45 is still subtracted and added to obtain the confidence limits. The resulting interval is (2.44, 3.34), and the interpretation is identical to that for the interval previously calculated.
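The relabeling remark can be checked numerically. The following is a minimal Python sketch rather than the text's own computation: the function name `two_sample_z_interval`, the sample means, and the sample sizes are hypothetical placeholders chosen for illustration. Only the form of the interval (difference of sample means plus or minus a z critical value times the estimated standard error) is taken from the discussion above.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_interval(xbar, ybar, s1, s2, m, n, conf=0.95):
    """Large-sample confidence interval for mu1 - mu2 from summary statistics."""
    # z critical value that captures the central `conf` area of the standard normal
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    center = xbar - ybar
    half_width = z * sqrt(s1**2 / m + s2**2 / n)
    return center - half_width, center + half_width


# Hypothetical means and sample sizes (assumptions, not values from the text);
# the standard deviations 1.30 and 1.68 echo those quoted in the example.
lo, hi = two_sample_z_interval(4.0, 7.0, 1.30, 1.68, 80, 80)

# Relabel the two populations: swap the roles of the two samples.
lo_swapped, hi_swapped = two_sample_z_interval(7.0, 4.0, 1.68, 1.30, 80, 80)

print((lo, hi))
print((lo_swapped, hi_swapped))
```

Swapping the labels negates the center and leaves the half-width unchanged, so the second interval is the mirror image of the first about zero and carries the same information.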
■
If the variances $\sigma_1^2$ and $\sigma_2^2$ are at least approximately known and the investigator uses equal sample sizes, then the common sample size $n$ that yields a $100(1 - \alpha)\%$ interval of width $w$ is

$$n = \frac{4 z_{\alpha/2}^2 (\sigma_1^2 + \sigma_2^2)}{w^2}$$

which will generally have to be rounded up to an integer.
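The sample size formula follows from setting the interval width $2 z_{\alpha/2}\sqrt{(\sigma_1^2 + \sigma_2^2)/n}$ equal to $w$ and solving for $n$. A minimal Python sketch of the calculation is given below; the function name `common_sample_size` and the input values are hypothetical and serve only as an illustration.

```python
from math import ceil
from statistics import NormalDist


def common_sample_size(sigma1, sigma2, width, conf=0.95):
    """Smallest common n giving a two-sample z interval no wider than `width`."""
    # z critical value z_{alpha/2} for the requested confidence level
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n_exact = 4 * z**2 * (sigma1**2 + sigma2**2) / width**2
    # Round up so that the achieved width does not exceed the target
    return ceil(n_exact)


# Hypothetical inputs (assumptions): standard deviations 1.30 and 1.68, target width 0.5
n_needed = common_sample_size(1.30, 1.68, width=0.5)
print(n_needed)
```

Rounding up rather than to the nearest integer guarantees that the resulting interval is no wider than requested.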
EXERCISES Section 9.1 (1–16)
1. An article in the November 1983 Consumer Reports compared various types of batteries. The average lifetimes of Duracell Alkaline AA batteries and Eveready Energizer Alkaline AA batteries were given as 4.1 hours and 4.5 hours, respectively. Suppose these are the population average lifetimes.
a. Let $\bar{X}$ be the sample average lifetime of 100 Duracell batteries and $\bar{Y}$ be the sample average lifetime of 100 Eveready batteries. What is the mean value of $\bar{X} - \bar{Y}$ (i.e., where is the distribution of $\bar{X} - \bar{Y}$ centered)? How does your answer depend on the specified sample sizes?
b. Suppose the population standard deviations of lifetime are 1.8 hours for Duracell batteries and 2.0 hours for Eveready batteries. With the sample sizes given in part (a), what is the variance of the statistic $\bar{X} - \bar{Y}$, and what is its standard deviation?
c. For the sample sizes given in part (a), draw a picture of the approximate distribution curve of $\bar{X} - \bar{Y}$ (include a measurement scale on the horizontal axis). Would the shape of the curve necessarily be the same for sample sizes of 10 batteries of each type? Explain.
9.1 z Tests and Confidence Intervals for a Difference Between Two Population Means
2. The National Health Statistics Reports dated Oct. 22, 2008, included the following information on the heights (in.) for non-Hispanic white females:

Age             Sample Size   Sample Mean   Std. Error of Mean
20–39
60 and older

a. Calculate and interpret a confidence interval at confidence level approximately 95% for the difference between population mean height for the younger women and that for the older women.
b. Let $\mu_1$ denote the population mean height for those aged 20–39 and $\mu_2$ denote the population mean height for those aged 60 and older. Interpret the hypotheses $H_0\colon \mu_1 - \mu_2 = 1$ and $H_a\colon \mu_1 - \mu_2 > 1$, and then carry out a test of these hypotheses at significance level .001 using the rejection region approach.
c. What is the P-value for the test you carried out in (b)? Based on this P-value, would you reject the null hypothesis at any reasonable significance level? Explain.
d. What hypotheses would be appropriate if $\mu_1$ referred to the older age group, $\mu_2$ to the younger age group, and you wanted to see if there was compelling evidence for concluding that the population mean height for younger women exceeded that for older women by more than 1 in.?

3. Let $\mu_1$ denote true average tread life for a premium brand of P205/65R15 radial tire, and let $\mu_2$ denote the true average tread life for an economy brand of the same size. Test $H_0\colon \mu_1 - \mu_2 = 5000$ versus $H_a\colon \mu_1 - \mu_2 > 5000$ at level .01, using the following data: $m = 45$, $\bar{x} = 42{,}500$, $s_1 = 2200$, $n = 45$, $\bar{y} = 36{,}800$, and $s_2 = 1500$.

4. a. Use the data of Example 9.4 to compute a 95% CI for $\mu_1 - \mu_2$. Does the resulting interval suggest that $\mu_1 - \mu_2$ has been precisely estimated?
b. Use the data of Exercise 3 to compute a 95% upper confidence bound for $\mu_1 - \mu_2$.

5. Persons having Reynaud’s syndrome are apt to suffer a sudden impairment of blood circulation in fingers and toes. In an experiment to study the extent of this impairment, each subject immersed a forefinger in water and the resulting heat output (cal/cm$^2$/min) was measured. For $m = 10$ subjects with the syndrome, the average heat output was $\bar{x} = .64$, and for $n = 10$ nonsufferers, the average output was 2.05. Let $\mu_1$ and $\mu_2$ denote the true average heat outputs for the two types of subjects. Assume that the two distributions of heat output are normal with $\sigma_1 = .2$ and $\sigma_2 = .4$.
a. Consider testing $H_0\colon \mu_1 - \mu_2 = -1.0$ versus $H_a\colon \mu_1 - \mu_2 < -1.0$ at level .01. Describe in words what $H_a$ says, and then carry out the test.
b. Compute the P-value for the value of Z obtained in part (a).
c. What is the probability of a type II error when the actual difference between $\mu_1$ and $\mu_2$ is $\mu_1 - \mu_2 = -1.2$?
d. Assuming that $m = n$, what sample sizes are required to ensure that $\beta = .1$ when $\mu_1 - \mu_2 = -1.2$?

6. An experiment to compare the tension bond strength of polymer latex modified mortar (Portland cement mortar to which polymer latex emulsions have been added during mixing) to that of unmodified mortar resulted in $\bar{x} = 18.12$ kgf/cm$^2$ for the modified mortar ($m = 40$) and $\bar{y} = 16.87$ kgf/cm$^2$ for the unmodified mortar ($n = 32$). Let $\mu_1$ and $\mu_2$ be the true average tension bond strengths for the modified and unmodified mortars, respectively. Assume that the bond strength distributions are both normal.
a. Assuming that $\sigma_1 = 1.6$ and $\sigma_2 = 1.4$, test $H_0\colon \mu_1 - \mu_2 = 0$ versus $H_a\colon \mu_1 - \mu_2 > 0$ at level .01.
b. Compute the probability of a type II error for the test of part (a) when $\mu_1 - \mu_2 = 1$.
c. Suppose the investigator decided to use a level .05 test and wished $\beta = .10$ when $\mu_1 - \mu_2 = 1$. If $m = 40$, what value of $n$ is necessary?
d. How would the analysis and conclusion of part (a) change if $\sigma_1$ and $\sigma_2$ were unknown but $s_1 = 1.6$ and $s_2 = 1.4$?

7. Is there any systematic tendency for part-time college faculty to hold their students to different standards than do full-time faculty? The article “Are There Instructional Differences Between Full-Time and Part-Time Faculty?” (College Teaching, 2009: 23–26) reported that for a sample of 125 courses taught by full-time faculty, the mean course GPA was 2.7186 and the standard deviation was .63342, whereas for a sample of 88 courses taught by part-timers, the mean and standard deviation were 2.8639 and .49241, respectively. Does it appear that true average course GPA for part-time faculty differs from that for faculty teaching full-time? Test the appropriate hypotheses at significance level .01 by first obtaining a P-value.

8. Tensile-strength tests were carried out on two different grades of wire rod (“Fluidized Bed Patenting of Wire Rods,” Wire J., June 1977: 56–61), resulting in the accompanying data.

Grade        Sample Size   Sample Mean (kg/mm$^2$)   Sample SD
AISI 1064    $m = 129$     $\bar{x} = 107.6$         $s_1 = 1.3$
AISI 1078    $n = 129$     $\bar{y} = 123.6$         $s_2 = 2.0$

a. Does the data provide compelling evidence for concluding that true average strength for the 1078 grade exceeds that for the 1064 grade by more than 10 kg/mm$^2$? Test the appropriate hypotheses using the P-value approach.
b. Estimate the difference between true average strengths for the two grades in a way that provides information about precision and reliability.

9. The article “Evaluation of a Ventilation Strategy to Prevent Barotrauma in Patients at High Risk for Acute Respiratory Distress Syndrome” (New Engl. J. of Med., 1998: 355–358) reported on an experiment in which 120 patients with similar clinical features were randomly divided into a control group
CHAPTER 9 Inferences Based on Two Samples
and a treatment group, each consisting of 60 patients. The sample mean ICU stay (days) and sample standard deviation for the treatment group were 19.9 and 39.1, respectively, whereas these values for the control group were 13.7 and 15.8.
a. Calculate a point estimate for the difference between true average ICU stay for the treatment and control groups. Does this estimate suggest that there is a significant difference between true average stays under the two conditions?
b. Answer the question posed in part (a) by carrying out a formal test of hypotheses. Is the result different from what you conjectured in part (a)?
c. Does it appear that ICU stay for patients given the ventilation treatment is normally distributed? Explain your reasoning.
d. Estimate true average length of stay for patients given the ventilation treatment in a way that conveys information about precision and reliability.

10. An experiment was performed to compare the fracture toughness of high-purity 18% Ni maraging steel with commercial-purity steel of the same type (Corrosion Science, 1971: 723–736). For $m = 32$ specimens, the sample average toughness was $\bar{x} = 65.6$ for the high-purity steel, whereas for $n = 38$ specimens of commercial steel $\bar{y} = 59.8$. Because the high-purity steel is more expensive, its use for a certain application can be justified only if its fracture toughness exceeds that of commercial-purity steel by more than 5. Suppose that both toughness distributions are normal.
a. Assuming that $\sigma_1 = 1.2$ and $\sigma_2 = 1.1$, test the relevant hypotheses using $\alpha = .001$.
b. Compute $\beta$ for the test conducted in part (a) when $\mu_1 - \mu_2 = 6$.

11. The level of lead in the blood was determined for a sample of 152 male hazardous-waste workers ages 20–30 and also for a sample of 86 female workers, resulting in a mean ± standard error of 5.5 ± 0.3 for the men and 3.8 ± 0.2 for the women (“Temporal Changes in Blood Lead Levels of Hazardous Waste Workers in New Jersey, 1984–1987,” Environ. Monitoring and Assessment, 1993: 99–107). Calculate an estimate of the difference between true average blood lead levels for male and female workers in a way that provides information about reliability and precision.

12. The accompanying table gives summary data on cube compressive strength (N/mm$^2$) for concrete specimens made with a pulverized fuel-ash mix (“A Study of Twenty-Five-Year-Old Pulverized Fuel Ash Concrete Used in Foundation Structures,” Proc. Inst. Civ. Engrs., Mar. 1985: 149–165):

Age (days)   Sample Size   Sample Mean   Sample SD
7            68            26.99         4.89
28           74            35.76         6.43

Calculate and interpret a 99% CI for the difference between true average 7-day strength and true average 28-day strength.

13. A mechanical engineer wishes to compare strength properties of steel beams with similar beams made with a particular alloy. The same number of beams, $n$, of each type will be tested. Each beam will be set in a horizontal position with a support on each end, a force of 2500 lb will be applied at the center, and the deflection will be measured. From past experience with such beams, the engineer is willing to assume that the true standard deviation of deflection for both types of beam is .05 in. Because the alloy is more expensive, the engineer wishes to test at level .01 whether it has smaller average deflection than the steel beam. What value of $n$ is appropriate if the desired type II error probability is .05 when the difference in true average deflection favors the alloy by .04 in.?

14. The level of monoamine oxidase (MAO) activity in blood platelets (nm/mg protein/h) was determined for each individual in a sample of 43 chronic schizophrenics, resulting in $\bar{x} = 2.69$ and $s_1 = 2.30$, as well as for 45 normal subjects, resulting in $\bar{y} = 6.35$ and $s_2 = 4.03$. Does this data strongly suggest that true average MAO activity for normal subjects is more than twice the activity level for schizophrenics? Derive a test procedure and carry out the test using $\alpha = .01$. [Hint: $H_0$ and $H_a$ here have a different form from the three standard cases. Let $\mu_1$ and $\mu_2$ refer to true average MAO activity for schizophrenics and normal subjects, respectively, and consider the parameter $\theta = 2\mu_1 - \mu_2$. Write $H_0$ and $H_a$ in terms of $\theta$, estimate $\theta$, and derive $\hat{\sigma}_{\hat{\theta}}$ (“Reduced Monoamine Oxidase Activity in Blood Platelets from Schizophrenic Patients,” Nature, July 28, 1972: 225–226).]

15. a. Show for the upper-tailed test with $\sigma_1$ and $\sigma_2$ known that as either $m$ or $n$ increases, $\beta$ decreases when $\mu_1 - \mu_2 > \Delta_0$.
b. For the case of equal sample sizes ($m = n$) and fixed $\alpha$, what happens to the necessary sample size $n$ as $\beta$ is decreased, where $\beta$ is the desired type II error probability at a fixed alternative?

16. To decide whether two different types of steel have the same true average fracture toughness values, $n$ specimens of each type are tested, yielding the following results:

Type   Sample Average   Sample SD

Calculate the P-value for the appropriate two-sample z test, assuming that the data was based on $n = 100$. Then repeat the calculation for $n = 400$. Is the small P-value for $n = 400$ indicative of a difference that has practical significance? Would you have been satisfied with just a report of the P-value? Comment briefly.
9.2 The Two-Sample t Test and Confidence Interval