Example 8.2 The drying time of a certain type of paint under specified test conditions is known to be normally distributed with mean value 75 min and standard deviation 9 min. Chemists have proposed a new additive designed to decrease average drying time. It is believed that drying times with this additive will remain normally distributed with $\sigma = 9$. Because of the expense associated with the additive, evidence should strongly suggest an improvement in average drying time before such a conclusion is adopted. Let $\mu$ denote the true average drying time when the additive is used. The appropriate hypotheses are $H_0: \mu = 75$ versus $H_a: \mu < 75$. Only if $H_0$ can be rejected will the additive be declared successful and then be used.
Experimental data is to consist of drying times from $n = 25$ test specimens. Let $X_1, \ldots, X_{25}$ denote the 25 drying times, a random sample of size 25 from a normal distribution with mean value $\mu$ and standard deviation $\sigma = 9$. The sample mean drying time $\bar{X}$ then has a normal distribution with expected value $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 9/\sqrt{25} = 1.80$. When $H_0$ is true, $\mu_{\bar{X}} = 75$, so only an $\bar{x}$ value substantially less than 75 would strongly contradict $H_0$. A reasonable rejection region has the form $\bar{x} \le c$, where the cutoff value $c$ is suitably chosen. Consider the choice $c = 70.8$, so that the test procedure consists of test statistic $\bar{X}$ and rejection region $\bar{x} \le 70.8$. Because the rejection region consists only of small values of the test statistic, the test is said to be lower-tailed. Calculation of $\alpha$ and $\beta$ now involves a routine standardization of $\bar{X}$ followed by reference to the standard normal probabilities of Appendix Table A.3:
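$$
\begin{aligned}
\alpha &= P(\text{type I error}) = P(H_0 \text{ is rejected when it is true}) \\
&= P(\bar{X} \le 70.8 \text{ when } \bar{X} \sim \text{normal with } \mu_{\bar{X}} = 75,\ \sigma_{\bar{X}} = 1.8) \\
&= \Phi\left(\frac{70.8 - 75}{1.8}\right) = \Phi(-2.33) = .01 \\
\beta(72) &= P(\text{type II error when } \mu = 72) \\
&= P(H_0 \text{ is not rejected when it is false because } \mu = 72) \\
&= P(\bar{X} > 70.8 \text{ when } \bar{X} \sim \text{normal with } \mu_{\bar{X}} = 72 \text{ and } \sigma_{\bar{X}} = 1.8) \\
&= 1 - \Phi\left(\frac{70.8 - 72}{1.8}\right) = 1 - \Phi(-.67) = 1 - .2514 = .7486 \\
\beta(70) &= 1 - \Phi\left(\frac{70.8 - 70}{1.8}\right) = .3300 \qquad \beta(67) = .0174
\end{aligned}
$$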
CHAPTER 8 Tests of Hypotheses Based on a Single Sample
For the specified test procedure, only 1% of all experiments carried out as described will result in $H_0$ being rejected when it is actually true. However, the chance of a type II error is very large when $\mu = 72$ (only a small departure from $H_0$), somewhat less when $\mu = 70$, and quite small when $\mu = 67$ (a very substantial departure from $H_0$). These error probabilities are illustrated in Figure 8.1. Notice that $\alpha$ is computed using the probability distribution of the test statistic when $H_0$ is true, whereas determination of $\beta$ requires knowing the test statistic’s distribution when $H_0$ is false.
[Figure 8.1: three panels (a), (b), (c), each showing the density curve of $\bar{X}$ with the cutoff 70.8 marked and a shaded area]
Figure 8.1 $\alpha$ and $\beta$ illustrated for Example 8.2: (a) the distribution of $\bar{X}$ when $\mu = 75$ ($H_0$ true); (b) the distribution of $\bar{X}$ when $\mu = 72$ ($H_0$ false); (c) the distribution of $\bar{X}$ when $\mu = 70$ ($H_0$ false)
As in Example 8.1, if the more realistic null hypothesis $\mu \ge 75$ is considered, there is an $\alpha$ for each parameter value for which $H_0$ is true: $\alpha(75)$, $\alpha(75.8)$, $\alpha(76.5)$, and so on. It is easily verified, though, that $\alpha(75)$ is the largest of all these type I error probabilities. Focusing on the boundary value amounts to working explicitly with the “worst case.”
■
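The error probabilities of Example 8.2 can also be checked numerically. What follows is a minimal Python sketch rather than part of the original example; it assumes Python 3.8 or later (for `statistics.NormalDist`), and the helper name `reject_prob` is purely illustrative. Because it evaluates the normal cdf without first rounding $z$ to two decimals, its results may differ from the Table A.3 values in the third decimal place.

```python
from statistics import NormalDist

SE = 9 / 25 ** 0.5      # sigma / sqrt(n) = 1.8
CUTOFF = 70.8           # reject H0 when xbar <= CUTOFF

def reject_prob(mu, cutoff=CUTOFF, se=SE):
    """P(Xbar <= cutoff) when Xbar ~ N(mu, se^2) (illustrative helper)."""
    return NormalDist(mu, se).cdf(cutoff)

# Type I error probability at the boundary value mu = 75
alpha = reject_prob(75)

# Type II error probabilities: H0 is NOT rejected although mu < 75
betas = {mu: 1 - reject_prob(mu) for mu in (72, 70, 67)}

# alpha(mu) for several values with mu >= 75; it shrinks as mu grows
worst_case = {mu: reject_prob(mu) for mu in (75, 75.8, 76.5)}

print(f"alpha = {alpha:.4f}")
for mu, b in betas.items():
    print(f"beta({mu}) = {b:.4f}")
for mu, a in worst_case.items():
    print(f"alpha({mu}) = {a:.5f}")
```

The last loop illustrates the "worst case" remark: among the values of $\mu$ for which $H_0$ is true, the rejection probability is largest at the boundary value 75.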
The specification of a cutoff value for the rejection region in the examples just considered was somewhat arbitrary. Use of $R_8 = \{8, 9, \ldots, 20\}$ in Example 8.1 gave $\alpha = .102$, $\beta(.3) = .772$, and $\beta(.5) = .132$. Many would think these error probabilities intolerably large. Perhaps they can be decreased by changing the cutoff value.
Example 8.3 (Example 8.1 continued) Let us use the same experiment and test statistic $X$ as previously described in the automobile bumper problem but now consider the rejection region $R_9 = \{9, 10, \ldots, 20\}$. Since $X$ still has a binomial distribution with parameters $n = 20$ and $p$,

$$
\begin{aligned}
\alpha &= P(H_0 \text{ is rejected when } p = .25) \\
&= P(X \ge 9 \text{ when } X \sim \text{Bin}(20, .25)) = 1 - B(8; 20, .25) = .041
\end{aligned}
$$
8.1 Hypotheses and Test Procedures
The type I error probability has been decreased by using the new rejection region. However, a price has been paid for this decrease:
$$
\begin{aligned}
\beta(.3) &= P(H_0 \text{ is not rejected when } p = .3) \\
&= P(X \le 8 \text{ when } X \sim \text{Bin}(20, .3)) = B(8; 20, .3) = .887 \\
\beta(.5) &= B(8; 20, .5) = .252
\end{aligned}
$$

Both these $\beta$’s are larger than the corresponding error probabilities .772 and .132 for the region $R_8$. In retrospect, this is not surprising; $\alpha$ is computed by summing over probabilities of test statistic values in the rejection region, whereas $\beta$ is the probability that $X$ falls in the complement of the rejection region. Making the rejection region smaller must therefore decrease $\alpha$ while increasing $\beta$ for any $p > .25$.
■
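The binomial error probabilities for the two regions $R_8$ and $R_9$ can be reproduced with a short computation. The Python sketch below is an illustration under stated assumptions: the function names `binom_cdf` and `error_probs` are hypothetical helpers, and `binom_cdf(x, n, p)` plays the role of the tabulated cdf $B(x; n, p)$.

```python
from math import comb

def binom_cdf(x, n, p):
    """B(x; n, p) = P(X <= x) for X ~ Bin(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

def error_probs(lower, n=20, p0=0.25, alternatives=(0.3, 0.5)):
    """alpha and beta(p) for the upper-tailed region {lower, ..., n}."""
    alpha = 1 - binom_cdf(lower - 1, n, p0)                     # reject when X >= lower
    betas = {p: binom_cdf(lower - 1, n, p) for p in alternatives}  # X <= lower - 1
    return alpha, betas

results = {lower: error_probs(lower) for lower in (8, 9)}

for lower, (alpha, betas) in results.items():
    beta_text = ", ".join(f"beta({p}) = {b:.3f}" for p, b in betas.items())
    print(f"R_{lower}: alpha = {alpha:.3f}, {beta_text}")
```

Moving the lower end of the region from 8 to 9 removes the value $x = 8$ from the rejection region, which lowers $\alpha$ and raises each $\beta$ by exactly the probability of that value under the corresponding $p$.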
Example 8.4 (Example 8.2 continued) The use of cutoff value $c = 70.8$ in the paint-drying example resulted in a very small value of $\alpha$ (.01) but rather large $\beta$’s. Consider the same experiment and test statistic $\bar{X}$ with the new rejection region $\bar{x} \le 72$. Because $\bar{X}$ is still normally distributed with mean value $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = 1.8$,
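$$
\begin{aligned}
\alpha &= P(H_0 \text{ is rejected when it is true}) \\
&= P(\bar{X} \le 72 \text{ when } \bar{X} \sim N(75, 1.8^2)) \\
&= \Phi\left(\frac{72 - 75}{1.8}\right) = \Phi(-1.67) = .0475 \approx .05 \\
\beta(72) &= P(H_0 \text{ is not rejected when } \mu = 72) \\
&= P(\bar{X} > 72 \text{ when } \bar{X} \text{ is a normal rv with mean 72 and standard deviation 1.8}) \\
&= 1 - \Phi\left(\frac{72 - 72}{1.8}\right) = 1 - \Phi(0) = .5 \\
\beta(70) &= 1 - \Phi\left(\frac{72 - 70}{1.8}\right) = .1335 \qquad \beta(67) = .0027
\end{aligned}
$$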
The change in cutoff value has made the rejection region larger (it includes more $\bar{x}$ values), resulting in a decrease in $\beta$ for each fixed $\mu$ less than 75. However, $\alpha$ for this new region has increased from the previous value .01 to approximately .05. If a type I error probability this large can be tolerated, though, the second region ($c = 72$) is preferable to the first ($c = 70.8$) because of the smaller $\beta$’s.
■
The results of these examples can be generalized in the following manner.
PROPOSITION
Suppose an experiment and a sample size are fixed and a test statistic is chosen. Then decreasing the size of the rejection region to obtain a smaller value of $\alpha$ results in a larger value of $\beta$ for any particular parameter value consistent with $H_a$.
This proposition says that once the test statistic and $n$ are fixed, there is no rejection region that will simultaneously make both $\alpha$ and all $\beta$’s small. A region must be chosen to effect a compromise between $\alpha$ and $\beta$.
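The trade-off stated in the proposition can be made concrete by sweeping the cutoff in the paint-drying setting ($\sigma_{\bar{X}} = 1.8$, lower-tailed region $\bar{x} \le c$). The following Python sketch is only an illustration: the function name `alpha_beta` is hypothetical, the alternative $\mu = 72$ is one arbitrary choice among the values consistent with $H_a$, and the cutoffs other than 70.8 and 72 are assumed values picked for the sweep.

```python
from statistics import NormalDist

def alpha_beta(cutoff, mu0=75.0, mu_alt=72.0, se=1.8):
    """Return (alpha, beta(mu_alt)) for the region xbar <= cutoff."""
    alpha = NormalDist(mu0, se).cdf(cutoff)          # reject although mu = mu0
    beta = 1 - NormalDist(mu_alt, se).cdf(cutoff)    # fail to reject although mu = mu_alt
    return alpha, beta

cutoffs = [70.0, 70.8, 71.5, 72.0, 73.0]   # increasing cutoff = larger rejection region
rows = [(c, *alpha_beta(c)) for c in cutoffs]

print("cutoff   alpha    beta(72)")
for c, a, b in rows:
    print(f"{c:6.1f}  {a:.4f}   {b:.4f}")
```

Reading the table from bottom to top corresponds to shrinking the rejection region: $\alpha$ falls while $\beta(72)$ rises, so no cutoff makes both small at once.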
Because of the suggested guidelines for specifying $H_0$ and $H_a$, a type I error is usually more serious than a type II error (this can always be achieved by proper choice of the hypotheses). The approach adhered to by most statistical practitioners is then to specify the largest value of $\alpha$ that can be tolerated and find a rejection region having that value of $\alpha$ rather than anything smaller. This makes $\beta$ as small as possible subject to the bound on $\alpha$. The resulting value of $\alpha$ is often referred to as the significance level of the test. Traditional levels of significance are .10, .05, and .01,
though the level in any particular problem will depend on the seriousness of a type I error: the more serious this error, the smaller should be the significance level. The corresponding test procedure is called a level $\alpha$ test (e.g., a level .05 test or a level .01 test). A test with significance level $\alpha$ is one for which the type I error probability is controlled at the specified level.
Example 8.5 Again let $\mu$ denote the true average nicotine content of brand B cigarettes. The objective is to test $H_0: \mu = 1.5$ versus $H_a: \mu > 1.5$ based on a random sample $X_1, X_2, \ldots, X_{32}$ of nicotine content. Suppose the distribution of nicotine content is known to be normal with $\sigma = .20$. Then $\bar{X}$ is normally distributed with mean value $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = .20/\sqrt{32} = .0354$.
Rather than use $\bar{X}$ itself as the test statistic, let’s standardize $\bar{X}$, assuming that $H_0$ is true.

$$
\text{Test statistic: } Z = \frac{\bar{X} - 1.5}{\sigma/\sqrt{n}} = \frac{\bar{X} - 1.5}{.0354}
$$

$Z$ expresses the distance between $\bar{X}$ and its expected value when $H_0$ is true as some number of standard deviations. For example, $z = 3$ results from an $\bar{x}$ that is 3 standard deviations larger than we would have expected it to be were $H_0$ true.
Rejecting $H_0$ when $\bar{x}$ “considerably” exceeds 1.5 is equivalent to rejecting $H_0$ when $z$ “considerably” exceeds 0. That is, the form of the rejection region is $z \ge c$. Let’s now determine $c$ so that $\alpha = .05$. When $H_0$ is true, $Z$ has a standard normal distribution. Thus

$$
\begin{aligned}
\alpha &= P(\text{type I error}) = P(\text{rejecting } H_0 \text{ when } H_0 \text{ is true}) \\
&= P(Z \ge c \text{ when } Z \sim N(0, 1))
\end{aligned}
$$

The value $c$ must capture upper-tail area .05 under the $z$ curve. Either from Section 4.3 or directly from Appendix Table A.3, $c = z_{.05} = 1.645$.
Notice that $z \ge 1.645$ is equivalent to $\bar{x} - 1.5 \ge (.0354)(1.645)$, that is, $\bar{x} \ge 1.56$. Then $\beta$ involves the probability that $\bar{X} < 1.56$ and can be calculated for any $\mu$ greater than 1.5.
■
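The steps of Example 8.5 (fix $\alpha$, find the critical value, translate it back to the $\bar{x}$ scale, then evaluate $\beta$) can be sketched in Python as follows. This is an illustration, not part of the original example: it assumes Python 3.8 or later, the function name `beta` is hypothetical, and the alternative values $\mu = 1.55$ and $\mu = 1.60$ are assumptions chosen only to show how $\beta$ would be evaluated.

```python
from statistics import NormalDist

mu0, sigma, n, alpha = 1.5, 0.20, 32, 0.05

se = sigma / n ** 0.5                      # standard deviation of Xbar
z_crit = NormalDist().inv_cdf(1 - alpha)   # captures upper-tail area alpha under the z curve
xbar_crit = mu0 + z_crit * se              # z >= z_crit  <=>  xbar >= xbar_crit

def beta(mu):
    """P(H0 not rejected) = P(Xbar < xbar_crit) when the true mean is mu."""
    return NormalDist(mu, se).cdf(xbar_crit)

print(f"se = {se:.4f}, z_crit = {z_crit:.3f}, xbar_crit = {xbar_crit:.3f}")
for mu in (1.55, 1.60):                    # illustrative alternatives (assumed)
    print(f"beta({mu}) = {beta(mu):.4f}")
```

Evaluating `beta` at the null value 1.5 returns $1 - \alpha = .95$, which is a quick consistency check on the critical value.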
EXERCISES Section 8.1 (1–14)
1. For each of the following assertions, state whether it is a legitimate statistical hypothesis and why:
c. $H: s \le .20$
d. $H: \sigma_1/\sigma_2 < 1$
e. $H: \bar{X} - \bar{Y} = 5$
f. $H: \lambda \le .01$, where $\lambda$ is the parameter of an exponential distribution used to model component lifetime
2. For the following pairs of assertions, indicate which do not comply with our rules for setting up hypotheses and why (the subscripts 1 and 2 differentiate between quantities for two different populations or samples):
a. $H_0: \mu = 100$, $H_a: \mu > 100$
b. $H_0: \sigma = 20$, $H_a: \sigma \le 20$
c. $H_0: p \ne .25$, $H_a: p = .25$
d. $H_0: \mu_1 - \mu_2 = 25$, $H_a: \mu_1 - \mu_2 > 100$
e. $H_0: S_1^2 = S_2^2$, $H_a: S_1^2 \ne S_2^2$
f. $H_0: \mu = 120$, $H_a: \mu = 150$
h. $H_0: p_1 - p_2 = -.1$, $H_a: p_1 - p_2 < -.1$
3. To determine whether the pipe welds in a nuclear power plant meet specifications, a random sample of welds is selected, and tests are conducted on each weld in the sample. Weld strength is measured as the force required to break the weld. Suppose the specifications state that mean strength of welds should exceed 100 lb/in$^2$; the inspection team decides to test $H_0: \mu = 100$ versus $H_a: \mu > 100$. Explain why it might be preferable to use this $H_a$ rather than $\mu < 100$.
4. Let $\mu$ denote the true average radioactivity level (picocuries per liter). The value 5 pCi/L is considered the dividing line between safe and unsafe water. Would you recommend testing $H_0: \mu = 5$ versus $H_a: \mu > 5$ or $H_0: \mu = 5$ versus $H_a: \mu < 5$?
Explain your reasoning. [Hint: Think about the consequences of a type I and type II error for each possibility.]
5. Before agreeing to purchase a large order of polyethylene sheaths for a particular type of high-pressure oil-filled submarine power cable, a company wants to see conclusive evidence that the true standard deviation of sheath thickness is less than .05 mm. What hypotheses should be tested, and why? In this context, what are the type I and type II errors?
6. Many older homes have electrical systems that use fuses rather than circuit breakers. A manufacturer of 40-amp fuses wants to make sure that the mean amperage at which its fuses burn out is in fact 40. If the mean amperage is lower than 40, customers will complain because the fuses require replacement too often. If the mean amperage is higher than 40, the manufacturer might be liable for damage to an electrical system due to fuse malfunction. To verify the amperage of the fuses, a sample of fuses is to be selected and inspected. If a hypothesis test were to be performed on the resulting data, what null and alternative hypotheses would be of interest to the manufacturer? Describe type I and type II errors in the context of this problem situation.
7. Water samples are taken from water used for cooling as it is being discharged from a power plant into a river. It has been determined that as long as the mean temperature of the discharged water is at most 150°F, there will be no negative effects on the river’s ecosystem. To investigate whether the plant is in compliance with regulations that prohibit a mean discharge water temperature above 150°, 50 water samples will be taken at randomly selected times and the temperature of each sample recorded. The resulting data will be used to test the hypotheses $H_0: \mu = 150^\circ$ versus $H_a: \mu > 150^\circ$. In the context of this situation, describe type I and type II errors. Which type of error would you consider more serious? Explain.
8. A regular type of laminate is currently being used by a manufacturer of circuit boards. A special laminate has been developed to reduce warpage. The regular laminate will be used on one sample of specimens and the special laminate on another sample, and the amount of warpage will then be determined for each specimen. The manufacturer will then switch to the special laminate only if it can be demonstrated that the true average amount of warpage for that laminate is less than for the regular laminate. State the relevant hypotheses, and describe the type I and type II errors in the context of this situation.
9. Two different companies have applied to provide cable television service in a certain region. Let $p$ denote the proportion of all potential subscribers who favor the first company over the second. Consider testing $H_0: p = .5$ versus $H_a: p \ne .5$ based on a random sample of 25 individuals. Let $X$ denote the number in the sample who favor the first company and $x$ represent the observed value of $X$.
a. Which of the following rejection regions is most appropriate and why?
$R_1 = \{x: x \le 7 \text{ or } x \ge 18\}$, $R_2 = \{x: x \le 8\}$, $R_3 = \{x: x \ge 17\}$
b. In the context of this problem situation, describe what the type I and type II errors are.
c. What is the probability distribution of the test statistic $X$ when $H_0$ is true? Use it to compute the probability of a type I error.
d. Compute the probability of a type II error for the selected region when $p = .3$, again when $p = .4$, and also for both $p = .6$ and $p = .7$.
e. Using the selected region, what would you conclude if 6 of the 25 queried favored company 1?
10. A mixture of pulverized fuel ash and Portland cement to be used for grouting should have a compressive strength of more than 1300 KN/m$^2$. The mixture will not be used unless experimental evidence indicates conclusively that the strength specification has been met. Suppose compressive strength for specimens of this mixture is normally distributed with $\sigma = 60$. Let $\mu$ denote the true average compressive strength.
a. What are the appropriate null and alternative hypotheses?
b. Let $\bar{X}$ denote the sample average compressive strength for $n = 20$ randomly selected specimens. Consider the test procedure with test statistic $\bar{X}$ and rejection region $\bar{x} \ge 1331.26$. What is the probability distribution of the test statistic when $H_0$ is true? What is the probability of a type I error for the test procedure?
c. What is the probability distribution of the test statistic when $\mu = 1350$? Using the test procedure of part (b), what is the probability that the mixture will be judged unsatisfactory when in fact $\mu = 1350$ (a type II error)?
d. How would you change the test procedure of part (b) to obtain a test with significance level .05? What impact would this change have on the error probability of part (c)?
e. Consider the standardized test statistic $Z = (\bar{X} - 1300)/(\sigma/\sqrt{n}) = (\bar{X} - 1300)/13.42$. What are the values of $Z$ corresponding to the rejection region of part (b)?
11. The calibration of a scale is to be checked by weighing a 10-kg test specimen 25 times. Suppose that the results of different weighings are independent of one another and that the weight on each trial is normally distributed with $\sigma = .200$ kg. Let $\mu$ denote the true average weight reading on the scale.
a. What hypotheses should be tested?
b. Suppose the scale is to be recalibrated if either $\bar{x} \ge 10.1032$ or $\bar{x} \le 9.8968$. What is the probability that recalibration is carried out when it is actually unnecessary?
c. What is the probability that recalibration is judged unnecessary when in fact $\mu = 10.1$? When $\mu = 9.8$?
d. Let $z = (\bar{x} - 10)/(\sigma/\sqrt{n})$. For what value $c$ is the rejection region of part (b) equivalent to the “two-tailed” region of either $z \ge c$ or $z \le -c$?
e. If the sample size were only 10 rather than 25, how should the procedure of part (d) be altered so that $\alpha = .05$?
f. Using the test of part (e), what would you conclude from the following sample data?
g. Reexpress the test procedure of part (b) in terms of the standardized test statistic $Z = (\bar{X} - 10)/(\sigma/\sqrt{n})$.
12. A new design for the braking system on a certain type of car has been proposed. For the current system, the true average braking distance at 40 mph under specified conditions is known to be 120 ft. It is proposed that the new design be implemented only if sample data strongly indicates a reduction in true average braking distance for the new design.
a. Define the parameter of interest and state the relevant hypotheses.
b. Suppose braking distance for the new system is normally distributed with $\sigma = 10$. Let $\bar{X}$ denote the sample average braking distance for a random sample of 36 observations. Which of the following three rejection regions is appropriate: $R_1 = \{\bar{x}: \bar{x} \ge 124.80\}$, $R_2 = \{\bar{x}: \bar{x} \le 115.20\}$, $R_3 = \{\bar{x}: \text{either } \bar{x} \ge 125.13 \text{ or } \bar{x} \le 114.87\}$?
c. What is the significance level for the appropriate region of part (b)? How would you change the region to obtain a test with $\alpha = .001$?
d. What is the probability that the new design is not implemented when its true average braking distance is actually 115 ft and the appropriate region from part (b) is used?
e. Let $Z = (\bar{X} - 120)/(\sigma/\sqrt{n})$. What is the significance level for the rejection region $\{z: z \le -2.33\}$? For the region $\{z: z \le -2.88\}$?
13. Let $X_1, \ldots, X_n$ denote a random sample from a normal population distribution with a known value of $\sigma$.
a. For testing the hypotheses $H_0: \mu = \mu_0$ versus $H_a: \mu > \mu_0$ (where $\mu_0$ is a fixed number), show that the test with test statistic $\bar{X}$ and rejection region $\bar{x} \ge \mu_0 + 2.33\sigma/\sqrt{n}$ has significance level .01.
b. Suppose the procedure of part (a) is used to test $H_0: \mu \le \mu_0$ versus $H_a: \mu > \mu_0$. If $\mu_0 = 100$, $n = 25$, and $\sigma = 5$, what is the probability of committing a type I error when $\mu = 99$? When $\mu = 98$? In general, what can be said about the probability of a type I error when the actual value of $\mu$ is less than $\mu_0$? Verify your assertion.
14. Reconsider the situation of Exercise 11 and suppose the rejection region is $\{\bar{x}: \bar{x} \ge 10.1004 \text{ or } \bar{x} \le 9.8940\} = \{z: z \ge 2.51 \text{ or } z \le -2.65\}$.
a. What is $\alpha$ for this procedure?
b. What is $\beta$ when $\mu = 10.1$? When $\mu = 9.9$? Is this desirable?