Example 8.2 The drying time of a certain type of paint under specified test conditions is known to be normally distributed with mean value 75 min and standard deviation 9 min. Chemists have proposed a new additive designed to decrease average drying time. It is believed that drying times with this additive will remain normally distributed with $\sigma = 9$. Because of the expense associated with the additive, evidence should strongly suggest an improvement in average drying time before such a conclusion is adopted. Let $\mu$ denote the true average drying time when the additive is used. The appropriate hypotheses are $H_0: \mu = 75$ versus $H_a: \mu < 75$. Only if $H_0$ can be rejected will the additive be declared successful and then be used.

  Experimental data is to consist of drying times from $n = 25$ test specimens. Let $X_1, \ldots, X_{25}$ denote the 25 drying times, a random sample of size 25 from a normal distribution with mean value $\mu$ and standard deviation $\sigma = 9$. The sample mean drying time $\bar{X}$ then has a normal distribution with expected value $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 9/\sqrt{25} = 1.80$. When $H_0$ is true, $\mu_{\bar{X}} = 75$, so only an $\bar{x}$ value substantially less than 75 would strongly contradict $H_0$. A reasonable rejection region has the form $\bar{x} \le c$, where the cutoff value $c$ is suitably chosen. Consider the choice $c = 70.8$, so that the test procedure consists of test statistic $\bar{X}$ and rejection region $\bar{x} \le 70.8$. Because the rejection region consists only of small values of the test statistic, the test is said to be lower-tailed. Calculation of $\alpha$ and $\beta$ now involves a routine standardization of $\bar{X}$ followed by reference to the standard normal probabilities of Appendix Table A.3:

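$$
\begin{aligned}
\alpha &= P(\text{type I error}) = P(H_0 \text{ is rejected when it is true}) \\
&= P(\bar{X} \le 70.8 \text{ when } \bar{X} \sim \text{normal with } \mu_{\bar{X}} = 75,\ \sigma_{\bar{X}} = 1.8) \\
&= \Phi\!\left(\frac{70.8 - 75}{1.8}\right) = \Phi(-2.33) = .01 \\[6pt]
\beta(72) &= P(\text{type II error when } \mu = 72) \\
&= P(H_0 \text{ is not rejected when it is false because } \mu = 72) \\
&= P(\bar{X} > 70.8 \text{ when } \bar{X} \sim \text{normal with } \mu_{\bar{X}} = 72 \text{ and } \sigma_{\bar{X}} = 1.8) \\
&= 1 - \Phi\!\left(\frac{70.8 - 72}{1.8}\right) = 1 - \Phi(-.67) = 1 - .2514 = .7486 \\[6pt]
\beta(70) &= 1 - \Phi\!\left(\frac{70.8 - 70}{1.8}\right) = .3300 \qquad \beta(67) = .0174
\end{aligned}
$$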
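  The error probabilities for this test can also be checked numerically rather than from a table. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the names `reject_prob`, `alpha`, and `beta` are illustrative choices, not notation from the text.

```python
from statistics import NormalDist

SIGMA = 9.0             # population standard deviation (min), from the example
N = 25                  # sample size
CUTOFF = 70.8           # reject H0 when the sample mean is <= CUTOFF
SE = SIGMA / N ** 0.5   # standard deviation of the sample mean (1.8)


def reject_prob(mu, cutoff=CUTOFF, se=SE):
    """P(sample mean <= cutoff) when the true mean is mu."""
    return NormalDist(mu, se).cdf(cutoff)


# Type I error probability: H0 rejected although mu = 75.
alpha = reject_prob(75)

# Type II error probabilities: H0 not rejected although mu < 75.
beta = {mu: 1 - reject_prob(mu) for mu in (72, 70, 67)}

print(f"alpha = {alpha:.4f}")
for mu, b in beta.items():
    print(f"beta({mu}) = {b:.4f}")
```

  The computed values should agree with the table-based figures to about two decimal places; small differences arise because Appendix Table A.3 requires rounding each z value to two decimals.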

  CHAPTER 8 Tests of Hypotheses Based on a Single Sample

  For the specified test procedure, only 1% of all experiments carried out as described will result in $H_0$ being rejected when it is actually true. However, the chance of a type II error is very large when $\mu = 72$ (only a small departure from $H_0$), somewhat less when $\mu = 70$, and quite small when $\mu = 67$ (a very substantial departure from $H_0$). These error probabilities are illustrated in Figure 8.1. Notice that $\alpha$ is computed using the probability distribution of the test statistic when $H_0$ is true, whereas determination of $\beta$ requires knowing the test statistic's distribution when $H_0$ is false.

  [Figure 8.1 $\alpha$ and $\beta$ illustrated for Example 8.2: (a) the distribution of $\bar{X}$ when $\mu = 75$ ($H_0$ true); (b) the distribution of $\bar{X}$ when $\mu = 72$ ($H_0$ false); (c) the distribution of $\bar{X}$ when $\mu = 70$ ($H_0$ false). Each panel marks the cutoff 70.8 and shades the area corresponding to the error probability.]

  As in Example 8.1, if the more realistic null hypothesis $\mu \ge 75$ is considered, there is an $\alpha$ for each parameter value for which $H_0$ is true: $\alpha(75)$, $\alpha(75.8)$, $\alpha(76.5)$, and so on. It is easily verified, though, that $\alpha(75)$ is the largest of all these type I error probabilities. Focusing on the boundary value amounts to working explicitly with the "worst case."
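  The "worst case" claim can be verified numerically. The sketch below evaluates the type I error probability of the region $\bar{x} \le 70.8$ at several null values $\mu \ge 75$; the function name `alpha` and the extra value 78 are illustrative assumptions, not part of the example.

```python
from statistics import NormalDist

SE = 9 / 25 ** 0.5   # standard deviation of the sample mean (1.8)
CUTOFF = 70.8        # rejection region: sample mean <= 70.8


def alpha(mu):
    """Type I error probability when the true mean is mu (with mu >= 75)."""
    return NormalDist(mu, SE).cdf(CUTOFF)


# Values of mu for which H0: mu >= 75 is true; 78 is an added illustration.
null_values = [75, 75.8, 76.5, 78]
alphas = [alpha(mu) for mu in null_values]

for mu, a in zip(null_values, alphas):
    print(f"alpha({mu}) = {a:.5f}")
```

  Because the normal cdf evaluated at a fixed cutoff decreases as the mean moves to the right, the printed probabilities shrink as $\mu$ increases, so the boundary value 75 gives the largest one.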

  ■

  The specification of a cutoff value for the rejection region in the examples just considered was somewhat arbitrary. Use of $R_8 = \{8, 9, \ldots, 20\}$ in Example 8.1 gave $\alpha = .102$, $\beta(.3) = .772$, and $\beta(.5) = .132$. Many would think these error probabilities intolerably large. Perhaps they can be decreased by changing the cutoff value.

Example 8.3 (Example 8.1 continued) Let us use the same experiment and test statistic $X$ as previously described in the automobile bumper problem but now consider the rejection region $R_9 = \{9, 10, \ldots, 20\}$. Since $X$ still has a binomial distribution with parameters $n = 20$ and $p$,

$$
\begin{aligned}
\alpha &= P(H_0 \text{ is rejected when } p = .25) \\
&= P(X \ge 9 \text{ when } X \sim \text{Bin}(20, .25)) = 1 - B(8; 20, .25) = .041
\end{aligned}
$$

  8.1 Hypotheses and Test Procedures

  The type I error probability has been decreased by using the new rejection region. However, a price has been paid for this decrease:

$$
\begin{aligned}
\beta(.3) &= P(H_0 \text{ is not rejected when } p = .3) \\
&= P(X \le 8 \text{ when } X \sim \text{Bin}(20, .3)) = B(8; 20, .3) = .887 \\
\beta(.5) &= B(8; 20, .5) = .252
\end{aligned}
$$

  Both these $\beta$'s are larger than the corresponding error probabilities .772 and .132 for the region $R_8$. In retrospect, this is not surprising; $\alpha$ is computed by summing over probabilities of test statistic values in the rejection region, whereas $\beta$ is the probability that $X$ falls in the complement of the rejection region. Making the rejection region smaller must therefore decrease $\alpha$ while increasing $\beta$ for any $p > .25$.
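  The comparison between the two binomial rejection regions can be reproduced directly from the binomial pmf. This is a minimal sketch; `binom_cdf` plays the role of the tabulated $B(x; n, p)$, and `error_probs` is a hypothetical helper name introduced only for illustration.

```python
from math import comb


def binom_cdf(x, n, p):
    """B(x; n, p) = P(X <= x) for X ~ Bin(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))


def error_probs(lower, n=20, p0=0.25, alternatives=(0.3, 0.5)):
    """alpha and beta(p) for the upper-tailed region {lower, ..., n}."""
    # alpha: probability of landing in the region when p = p0.
    alpha = 1 - binom_cdf(lower - 1, n, p0)
    # beta(p): probability of landing outside the region when p is an alternative.
    betas = {p: binom_cdf(lower - 1, n, p) for p in alternatives}
    return alpha, betas


for lower in (8, 9):
    alpha, betas = error_probs(lower)
    rounded = {p: round(b, 3) for p, b in betas.items()}
    print(f"region starts at {lower}: alpha = {alpha:.3f}, betas = {rounded}")
```

  Raising the lower end of the region from 8 to 9 removes the outcome $x = 8$ from the rejection region, which lowers $\alpha$ and raises every $\beta$ by the probability of that single outcome under the relevant value of $p$.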

  ■

Example 8.4 (Example 8.2 continued) The use of cutoff value $c = 70.8$ in the paint-drying example resulted in a very small value of $\alpha$ (.01) but rather large $\beta$'s. Consider the same experiment and test statistic $\bar{X}$ with the new rejection region $\bar{x} \le 72$. Because $\bar{X}$ is still normally distributed with mean value $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = 1.8$,

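$$
\begin{aligned}
\alpha &= P(H_0 \text{ is rejected when it is true}) \\
&= P(\bar{X} \le 72 \text{ when } \bar{X} \sim N(75, 1.8^2)) \\
&= \Phi\!\left(\frac{72 - 75}{1.8}\right) = \Phi(-1.67) = .0475 \approx .05 \\[6pt]
\beta(72) &= P(H_0 \text{ is not rejected when } \mu = 72) \\
&= P(\bar{X} > 72 \text{ when } \bar{X} \text{ is a normal rv with mean 72 and standard deviation 1.8}) \\
&= 1 - \Phi\!\left(\frac{72 - 72}{1.8}\right) = 1 - \Phi(0) = .5 \\[6pt]
\beta(70) &= 1 - \Phi\!\left(\frac{72 - 70}{1.8}\right) = .1335 \qquad \beta(67) = .0027
\end{aligned}
$$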

  The change in cutoff value has made the rejection region larger (it includes more $\bar{x}$ values), resulting in a decrease in $\beta$ for each fixed $\mu$ less than 75. However, $\alpha$ for this new region has increased from the previous value .01 to approximately .05. If a type I error probability this large can be tolerated, though, the second region ($c = 72$) is preferable to the first ($c = 70.8$) because of the smaller $\beta$'s.
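  The trade-off between the two cutoffs can be laid out side by side. The sketch below recomputes $\alpha$ and the $\beta$'s for both regions with `statistics.NormalDist`; the helper name `error_probs` and the dictionary layout are assumptions made for illustration.

```python
from statistics import NormalDist

SE = 9 / 25 ** 0.5   # standard deviation of the sample mean (1.8)


def error_probs(cutoff, mu0=75, alternatives=(72, 70, 67)):
    """alpha and beta(mu) for the lower-tailed region: sample mean <= cutoff."""
    alpha = NormalDist(mu0, SE).cdf(cutoff)
    betas = {mu: 1 - NormalDist(mu, SE).cdf(cutoff) for mu in alternatives}
    return alpha, betas


# Compare the original cutoff (70.8) with the enlarged region (72).
results = {c: error_probs(c) for c in (70.8, 72)}

for c, (alpha, betas) in results.items():
    row = ", ".join(f"beta({mu}) = {b:.4f}" for mu, b in betas.items())
    print(f"c = {c}: alpha = {alpha:.4f}, {row}")
```

  Moving the cutoff from 70.8 to 72 enlarges the rejection region, so the first printed row has the smaller $\alpha$ and the second row has the smaller $\beta$ at every alternative.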

  ■

  The results of these examples can be generalized in the following manner.

  PROPOSITION

  Suppose an experiment and a sample size are fixed and a test statistic is chosen. Then decreasing the size of the rejection region to obtain a smaller value of $\alpha$ results in a larger value of $\beta$ for any particular parameter value consistent with $H_a$.

  This proposition says that once the test statistic and $n$ are fixed, there is no rejection region that will simultaneously make both $\alpha$ and all $\beta$'s small. A region must be chosen to effect a compromise between $\alpha$ and $\beta$.

  Because of the suggested guidelines for specifying $H_0$ and $H_a$, a type I error is usually more serious than a type II error (this can always be achieved by proper choice of the hypotheses). The approach adhered to by most statistical practitioners is then to specify the largest value of $\alpha$ that can be tolerated and find a rejection region having that value of $\alpha$ rather than anything smaller. This makes $\beta$ as small as possible subject to the bound on $\alpha$. The resulting value of $\alpha$ is often referred to as the significance level of the test. Traditional levels of significance are .10, .05, and .01,

  though the level in any particular problem will depend on the seriousness of a type I error: the more serious this error, the smaller should be the significance level. The corresponding test procedure is called a level $\alpha$ test (e.g., a level .05 test or a level .01 test). A test with significance level $\alpha$ is one for which the type I error probability is controlled at the specified level.

Example 8.5 Again let $\mu$ denote the true average nicotine content of brand B cigarettes. The objective is to test $H_0: \mu = 1.5$ versus $H_a: \mu > 1.5$ based on a random sample $X_1, X_2, \ldots, X_{32}$ of nicotine content. Suppose the distribution of nicotine content is known to be normal with $\sigma = .20$. Then $\bar{X}$ is normally distributed with mean value $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = .20/\sqrt{32} = .0354$.

  Rather than use $\bar{X}$ itself as the test statistic, let's standardize $\bar{X}$, assuming that $H_0$ is true.

  Test statistic: $Z = \dfrac{\bar{X} - 1.5}{\sigma/\sqrt{n}} = \dfrac{\bar{X} - 1.5}{.0354}$

  $Z$ expresses the distance between $\bar{X}$ and its expected value when $H_0$ is true as some number of standard deviations. For example, $z = 3$ results from an $\bar{x}$ that is 3 standard deviations larger than we would have expected it to be were $H_0$ true.

  Rejecting $H_0$ when $\bar{x}$ "considerably" exceeds 1.5 is equivalent to rejecting $H_0$ when $z$ "considerably" exceeds 0. That is, the form of the rejection region is $z \ge c$. Let's now determine $c$ so that $\alpha = .05$. When $H_0$ is true, $Z$ has a standard normal distribution. Thus

$$
\begin{aligned}
\alpha &= P(\text{type I error}) = P(\text{rejecting } H_0 \text{ when } H_0 \text{ is true}) \\
&= P(Z \ge c \text{ when } Z \sim N(0, 1))
\end{aligned}
$$

  The value $c$ must capture upper-tail area .05 under the $z$ curve. Either from Section 4.3 or directly from Appendix Table A.3, $c = z_{.05} = 1.645$.

  Notice that $z \ge 1.645$ is equivalent to $\bar{x} - 1.5 \ge (.0354)(1.645)$, that is, $\bar{x} \ge 1.56$. Then $\beta$ involves the probability that $\bar{X} < 1.56$ and can be calculated for any $\mu$ greater than 1.5.
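  The steps of this example (fix $\alpha$, find the critical value, translate it to a cutoff for the sample mean, then evaluate $\beta$) can be sketched in a few lines. The alternative value $\mu = 1.6$ used at the end is an illustrative assumption, as are the variable and function names.

```python
from statistics import NormalDist

MU0 = 1.5        # null value of the mean nicotine content
SIGMA = 0.20     # known population standard deviation
N = 32           # sample size
ALPHA = 0.05     # desired significance level

se = SIGMA / N ** 0.5                      # sigma / sqrt(n), about .0354
z_crit = NormalDist().inv_cdf(1 - ALPHA)   # captures upper-tail area ALPHA
xbar_crit = MU0 + z_crit * se              # equivalent cutoff for the sample mean


def beta(mu):
    """P(sample mean < xbar_crit) when the true mean is mu."""
    return NormalDist(mu, se).cdf(xbar_crit)


print(f"z critical value   = {z_crit:.3f}")
print(f"sample-mean cutoff = {xbar_crit:.3f}")
print(f"beta(1.6)          = {beta(1.6):.4f}")   # mu = 1.6 is illustrative
```

  As a sanity check on the construction, evaluating `beta` at the null value 1.5 returns $1 - \alpha$, since not rejecting $H_0$ when it is true is the complement of a type I error.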

  ■

  EXERCISES Section 8.1 (1–14)

  1. For each of the following assertions, state whether it is a legitimate statistical hypothesis and why:
     c. $H: s \le .20$
     d. $H: \sigma_1/\sigma_2 < 1$
     e. $H: \bar{X} - \bar{Y} = 5$
     f. $H: \lambda \le .01$, where $\lambda$ is the parameter of an exponential distribution used to model component lifetime

  2. For the following pairs of assertions, indicate which do not comply with our rules for setting up hypotheses and why (the subscripts 1 and 2 differentiate between quantities for two different populations or samples):
     a. $H_0: \mu = 100$, $H_a: \mu > 100$
     b. $H_0: \sigma = 20$, $H_a: \sigma \le 20$
     c. $H_0: p \ne .25$, $H_a: p = .25$
     d. $H_0: \mu_1 - \mu_2 = 25$, $H_a: \mu_1 - \mu_2 > 100$
     e. $H_0: S_1^2 = S_2^2$, $H_a: S_1^2 \ne S_2^2$
     f. $H_0: \mu = 120$, $H_a: \mu = 150$
     h. $H_0: p_1 - p_2 = -.1$, $H_a: p_1 - p_2 < -.1$

  3. To determine whether the pipe welds in a nuclear power plant meet specifications, a random sample of welds is selected, and tests are conducted on each weld in the sample. Weld strength is measured as the force required to break the weld. Suppose the specifications state that mean strength of welds should exceed 100 lb/in$^2$; the inspection team decides to test $H_0: \mu = 100$ versus $H_a: \mu > 100$. Explain why it might be preferable to use this $H_a$ rather than $\mu < 100$.

  4. Let $\mu$ denote the true average radioactivity level (picocuries per liter). The value 5 pCi/L is considered the dividing line between safe and unsafe water. Would you recommend testing $H_0: \mu = 5$ versus $H_a: \mu > 5$ or $H_0: \mu = 5$ versus $H_a: \mu < 5$?

  Explain your reasoning. [Hint: Think about the consequences of a type I and type II error for each possibility.]

  5. Before agreeing to purchase a large order of polyethylene sheaths for a particular type of high-pressure oil-filled submarine power cable, a company wants to see conclusive evidence that the true standard deviation of sheath thickness is less than .05 mm. What hypotheses should be tested, and why? In this context, what are the type I and type II errors?

  6. Many older homes have electrical systems that use fuses rather than circuit breakers. A manufacturer of 40-amp fuses wants to make sure that the mean amperage at which its fuses burn out is in fact 40. If the mean amperage is lower than 40, customers will complain because the fuses require replacement too often. If the mean amperage is higher than 40, the manufacturer might be liable for damage to an electrical system due to fuse malfunction. To verify the amperage of the fuses, a sample of fuses is to be selected and inspected. If a hypothesis test were to be performed on the resulting data, what null and alternative hypotheses would be of interest to the manufacturer? Describe type I and type II errors in the context of this problem situation.

  7. Water samples are taken from water used for cooling as it is being discharged from a power plant into a river. It has been determined that as long as the mean temperature of the discharged water is at most 150°F, there will be no negative effects on the river's ecosystem. To investigate whether the plant is in compliance with regulations that prohibit a mean discharge water temperature above 150°, 50 water samples will be taken at randomly selected times and the temperature of each sample recorded. The resulting data will be used to test the hypotheses $H_0: \mu = 150°$ versus $H_a: \mu > 150°$. In the context of this situation, describe type I and type II errors. Which type of error would you consider more serious? Explain.

  8. A regular type of laminate is currently being used by a manufacturer of circuit boards. A special laminate has been developed to reduce warpage. The regular laminate will be used on one sample of specimens and the special laminate on another sample, and the amount of warpage will then be determined for each specimen. The manufacturer will then switch to the special laminate only if it can be demonstrated that the true average amount of warpage for that laminate is less than for the regular laminate. State the relevant hypotheses, and describe the type I and type II errors in the context of this situation.

  9. Two different companies have applied to provide cable television service in a certain region. Let $p$ denote the proportion of all potential subscribers who favor the first company over the second. Consider testing $H_0: p = .5$ versus $H_a: p \ne .5$ based on a random sample of 25 individuals. Let $X$ denote the number in the sample who favor the first company and $x$ represent the observed value of $X$.
     a. Which of the following rejection regions is most appropriate and why?
        $R_1 = \{x: x \le 7 \text{ or } x \ge 18\}$, $R_2 = \{x: x \le 8\}$, $R_3 = \{x: x \ge 17\}$
     b. In the context of this problem situation, describe what the type I and type II errors are.
     c. What is the probability distribution of the test statistic $X$ when $H_0$ is true? Use it to compute the probability of a type I error.
     d. Compute the probability of a type II error for the selected region when $p = .3$, again when $p = .4$, and also for both $p = .6$ and $p = .7$.
     e. Using the selected region, what would you conclude if 6 of the 25 queried favored company 1?

  10. A mixture of pulverized fuel ash and Portland cement to be used for grouting should have a compressive strength of more than 1300 KN/m$^2$. The mixture will not be used unless experimental evidence indicates conclusively that the strength specification has been met. Suppose compressive strength for specimens of this mixture is normally distributed with $\sigma = 60$. Let $\mu$ denote the true average compressive strength.
     a. What are the appropriate null and alternative hypotheses?
     b. Let $\bar{X}$ denote the sample average compressive strength for $n = 10$ randomly selected specimens. Consider the test procedure with test statistic $\bar{X}$ and rejection region $\bar{x} \ge 1331.26$. What is the probability distribution of the test statistic when $H_0$ is true? What is the probability of a type I error for the test procedure?
     c. What is the probability distribution of the test statistic when $\mu = 1350$? Using the test procedure of part (b), what is the probability that the mixture will be judged unsatisfactory when in fact $\mu = 1350$ (a type II error)?
     d. How would you change the test procedure of part (b) to obtain a test with significance level .05? What impact would this change have on the error probability of part (c)?
     e. Consider the standardized test statistic $Z = (\bar{X} - 1300)/(\sigma/\sqrt{n}) = (\bar{X} - 1300)/13.42$. What are the values of $Z$ corresponding to the rejection region of part (b)?

  11. The calibration of a scale is to be checked by weighing a 10-kg test specimen 25 times. Suppose that the results of different weighings are independent of one another and that the weight on each trial is normally distributed with $\sigma = .200$ kg. Let $\mu$ denote the true average weight reading on the scale.
     a. What hypotheses should be tested?
     b. Suppose the scale is to be recalibrated if either $\bar{x} \ge 10.1032$ or $\bar{x} \le 9.8968$. What is the probability that recalibration is carried out when it is actually unnecessary?
     c. What is the probability that recalibration is judged unnecessary when in fact $\mu = 10.1$? When $\mu = 9.8$?
     d. Let $z = (\bar{x} - 10)/(\sigma/\sqrt{n})$. For what value $c$ is the rejection region of part (b) equivalent to the "two-tailed" region of either $z \ge c$ or $z \le -c$?
     e. If the sample size were only 10 rather than 25, how should the procedure of part (d) be altered so that $\alpha = .05$?
     f. Using the test of part (e), what would you conclude from the following sample data?

     g. Reexpress the test procedure of part (b) in terms of the standardized test statistic $Z = (\bar{X} - 10)/(\sigma/\sqrt{n})$.

  12. A new design for the braking system on a certain type of car has been proposed. For the current system, the true average braking distance at 40 mph under specified conditions is known to be 120 ft. It is proposed that the new design be implemented only if sample data strongly indicates a reduction in true average braking distance for the new design.
     a. Define the parameter of interest and state the relevant hypotheses.
     b. Suppose braking distance for the new system is normally distributed with $\sigma = 10$. Let $\bar{X}$ denote the sample average braking distance for a random sample of 36 observations. Which of the following three rejection regions is appropriate: $R_1 = \{\bar{x}: \bar{x} \ge 124.80\}$, $R_2 = \{\bar{x}: \bar{x} \le 115.20\}$, $R_3 = \{\bar{x}: \text{either } \bar{x} \ge 125.13 \text{ or } \bar{x} \le 114.87\}$?
     c. What is the significance level for the appropriate region of part (b)? How would you change the region to obtain a test with $\alpha = .001$?
     d. What is the probability that the new design is not implemented when its true average braking distance is actually 115 ft and the appropriate region from part (b) is used?
     e. Let $Z = (\bar{X} - 120)/(\sigma/\sqrt{n})$. What is the significance level for the rejection region $\{z: z \le -2.33\}$? For the region $\{z: z \le -2.88\}$?

  13. Let $X_1, \ldots, X_n$ denote a random sample from a normal population distribution with a known value of $\sigma$.
     a. For testing the hypotheses $H_0: \mu = \mu_0$ versus $H_a: \mu > \mu_0$ (where $\mu_0$ is a fixed number), show that the test with test statistic $\bar{X}$ and rejection region $\bar{x} \ge \mu_0 + 2.33\sigma/\sqrt{n}$ has significance level .01.
     b. Suppose the procedure of part (a) is used to test $H_0: \mu \le \mu_0$ versus $H_a: \mu > \mu_0$. If $\mu_0 = 100$, $n = 25$, and $\sigma = 5$, what is the probability of committing a type I error when $\mu = 99$? When $\mu = 98$? In general, what can be said about the probability of a type I error when the actual value of $\mu$ is less than $\mu_0$? Verify your assertion.

  14. Reconsider the situation of Exercise 11 and suppose the rejection region is $\{\bar{x}: \bar{x} \ge 10.1004 \text{ or } \bar{x} \le 9.8940\} = \{z: z \ge 2.51 \text{ or } z \le -2.65\}$.
     a. What is $\alpha$ for this procedure?
     b. What is $\beta$ when $\mu = 10.1$? When $\mu = 9.9$? Is this desirable?