Example 8.19 The fuel efficiency (mpg) of any particular new vehicle under specified driving conditions may not be identical to the EPA figure that appears on the vehicle’s sticker. Suppose that four different vehicles of a particular type are to be selected and driven over a certain course, after which the fuel efficiency of each one is to be determined. Let $\mu$ denote the true average fuel efficiency under these conditions.

Consider testing $H_0: \mu = 20$ versus $H_a: \mu > 20$ using the one-sample t test based on the resulting sample. Since the test is based on $n - 1 = 3$ degrees of freedom, the P-value for an upper-tailed test is the area under the t curve with 3 df to the right of the calculated t.

Let’s first suppose that the null hypothesis is true. We asked Minitab to generate 10,000 different samples, each containing 4 observations, from a normal population distribution with mean value $\mu = 20$ and standard deviation $\sigma = 2$. The first sample and resulting summary quantities were

  CHAPTER 8 Tests of Hypotheses Based on a Single Sample

$x_1 = 20.830, \quad x_2 = 22.232, \quad x_3 = 20.276, \quad x_4 = 17.718$

$\bar{x} = 20.264, \quad s = 1.8864, \quad t = \dfrac{20.264 - 20}{1.8864/\sqrt{4}} = .2799$

The P-value is the area under the 3-df t curve to the right of .2799, which according to Minitab is .3989. Using a significance level of .05, the null hypothesis would of course not be rejected. The values of t for the next four samples were $-1.7591$, $.6082$, $-.7020$, and $3.1053$, with corresponding P-values .912, .293, .733, and .0265.
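The calculation for the first sample can be checked with a short script. This is a minimal sketch, not the Minitab procedure used in the text: the helper name `t3_cdf` is hypothetical, and it relies on the closed-form cdf of the t distribution, which in this form holds only for 3 df.

```python
import math
import statistics

def t3_cdf(t):
    """cdf of the t distribution with 3 df (closed form, valid for 3 df only)."""
    u = t / math.sqrt(3)
    return 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi

sample = [20.830, 22.232, 20.276, 17.718]   # first simulated sample from the text
mu0 = 20                                    # null value of the mean
n = len(sample)

xbar = statistics.mean(sample)
s = statistics.stdev(sample)                # sample standard deviation (divisor n - 1)
t = (xbar - mu0) / (s / math.sqrt(n))       # one-sample t statistic
p_value = 1 - t3_cdf(t)                     # upper-tailed area to the right of t

print(f"xbar = {xbar:.3f}, s = {s:.4f}, t = {t:.4f}, P-value = {p_value:.4f}")
```

The printed values should agree with the summary quantities above to the number of decimal places reported. Calling `1 - t3_cdf(3.1053)` gives the P-value for the fifth sample in the same way.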

Figure 8.11(a) shows a histogram of the 10,000 P-values from this simulation experiment. About 4.5% of these P-values are in the first class interval from 0 to .05. Thus when using a significance level of .05, the null hypothesis is rejected in roughly 4.5% of these 10,000 tests. If we continued to generate samples and carry out the test for each sample at significance level .05, in the long run 5% of the P-values would be in the first class interval. This is because when $H_0$ is true and a test with significance level .05 is used, by definition the probability of rejecting $H_0$ is .05.

Looking at the histogram, it appears that the distribution of P-values is relatively flat. In fact, it can be shown that when $H_0$ is true, the probability distribution of the P-value is a uniform distribution on the interval from 0 to 1. That is, the density curve is completely flat on this interval, and thus must have a height of 1 if the total area under the curve is to be 1. Since the area under such a curve to the left of .05 is $(.05)(1) = .05$, we again have that the probability of rejecting $H_0$ when it is true is .05, the chosen significance level.
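The flatness of the P-value distribution under $H_0$ can be explored with a small simulation. The sketch below uses Python's standard library in place of Minitab. The function names, the seed, and the closed-form 3-df tail area are assumptions of this illustration, so the exact proportions will differ from those in Figure 8.11(a).

```python
import math
import random
import statistics

def t3_upper_p(t):
    """Upper-tailed P-value for a t statistic with 3 df (closed form, 3 df only)."""
    u = t / math.sqrt(3)
    return 0.5 - (u / (1 + u * u) + math.atan(u)) / math.pi

def simulate_p_values(mu, sigma=2.0, mu0=20.0, reps=10_000, seed=1):
    """P-values from `reps` samples of size 4 drawn from a normal(mu, sigma) population."""
    n = 4                                   # fixed so that the test has 3 df
    rng = random.Random(seed)
    p_values = []
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        t = (statistics.mean(x) - mu0) / (statistics.stdev(x) / math.sqrt(n))
        p_values.append(t3_upper_p(t))
    return p_values

p_values = simulate_p_values(mu=20.0)       # H0 true: population mean equals 20

# Proportion of P-values falling in each of 20 class intervals of width .05
counts = [0] * 20
for p in p_values:
    counts[min(int(p * 20), 19)] += 1
proportions = [c / len(p_values) for c in counts]

print(f"proportion of P-values below .05: {proportions[0]:.3f}")
print(f"smallest / largest interval proportion: {min(proportions):.3f} / {max(proportions):.3f}")
```

If the P-value is uniform on (0, 1), every interval proportion should be near .05, with the first one estimating the type I error probability.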

Now consider what happens when $H_0$ is false because $\mu = 21$. We again had Minitab generate 10,000 different samples of size 4 (each from a normal distribution with $\mu = 21$ and $\sigma = 2$), calculate $t = (\bar{x} - 20)/(s/\sqrt{4})$ for each one, and then determine the P-value. The first such sample resulted in $\bar{x} = 20.6411$, $s = .49637$, $t = 2.5832$, P-value $= .0408$. Figure 8.11(b) gives a histogram of the resulting P-values. The shape of this histogram is quite different from that of Figure 8.11(a): there is a much greater tendency for the P-value to be small (closer to 0) when $\mu = 21$ than when $\mu = 20$. Again $H_0$ is rejected at significance level .05 whenever the P-value is at most .05 (in the first class interval). Unfortunately, this is the case for only about 19% of the P-values. So only about 19% of the 10,000 tests correctly reject the null hypothesis; for the other 81%, a type II error is committed. The difficulty is that the sample size is quite small and 21 is not very different from the value asserted by the null hypothesis.

Figure 8.11(c) illustrates what happens to the P-value when $H_0$ is false because $\mu = 22$ (still with $n = 4$ and $\sigma = 2$). The histogram is even more concentrated toward values close to 0 than was the case when $\mu = 21$. In general, as $\mu$ moves further to the right of the null value 20, the distribution of the P-value will become more and more concentrated on values close to 0. Even here a bit fewer than 50% of the P-values are smaller than .05. So it is still slightly more likely than not that the null hypothesis is incorrectly not rejected. Only for values of $\mu$ much larger than 20 (e.g., at least 24 or 25) is it highly likely that the P-value will be smaller than .05 and thus give the correct conclusion.
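The rejection rates quoted for the different values of $\mu$ can be approximated by simulation. The following is a rough sketch rather than the Minitab experiment itself. The function name and seed are hypothetical, and instead of computing each P-value it uses the equivalent rule of rejecting when t is at least the tabled critical value $t_{.05,3} = 2.353$.

```python
import math
import random
import statistics

T_CRIT = 2.353   # tabled t critical value with upper-tail area .05 for 3 df

def rejection_rate(mu, sigma=2.0, mu0=20.0, reps=10_000, seed=1):
    """Estimated probability that the upper-tailed level .05 t test (n = 4) rejects H0."""
    n = 4                                   # fixed so that T_CRIT (3 df) applies
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        t = (statistics.mean(x) - mu0) / (statistics.stdev(x) / math.sqrt(n))
        if t >= T_CRIT:                     # same decision as P-value <= .05
            rejections += 1
    return rejections / reps

rates = {mu: rejection_rate(mu) for mu in (20, 21, 22, 24)}
for mu, rate in rates.items():
    print(f"mu = {mu}: estimated rejection rate = {rate:.3f}")
```

For $\mu = 20$ the rate estimates the type I error probability. For the other values it estimates the power, so one minus the rate estimates $\beta$.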

The big idea of this example is that because the value of any test statistic is random, the P-value will also be a random variable and thus have a distribution. The further the actual value of the parameter is from the value specified by the null hypothesis, the more the distribution of the P-value will be concentrated on values close to 0 and the greater the chance that the test will correctly reject $H_0$ (corresponding to smaller $\beta$).

  Figure 8.11 P-value simulation results for Example 8.19

  ■

  EXERCISES Section 8.4 (47–62)

47. For which of the given P-values would the null hypothesis be rejected when performing a level .05 test?
    d. .047
    e. .148

48. Pairs of P-values and significance levels, $\alpha$, are given. For each pair, state whether the observed P-value would lead to rejection of $H_0$ at the given significance level.
    a. P-value = .084, $\alpha = .05$
    b. P-value = .003, $\alpha = .001$
    c. P-value = .498, $\alpha = .05$
    d. P-value = .084, $\alpha = .10$
    e. P-value = .039, $\alpha = .01$
    f. P-value = .218, $\alpha = .10$

49. Let $\mu$ denote the mean reaction time to a certain stimulus. For a large-sample z test of $H_0: \mu = 5$ versus $H_a: \mu > 5$, find the P-value associated with each of the given values of the z test statistic.

50. Newly purchased tires of a certain type are supposed to be filled to a pressure of 30 lb/in$^2$. Let $\mu$ denote the true average pressure. Find the P-value associated with each given z statistic value for testing $H_0: \mu = 30$ versus $H_a: \mu \neq 30$.
    a. $2.10$
    b. $-1.75$
    c. $-.55$
    d. $1.41$
    e. $-5.3$

51. Give as much information as you can about the P-value of a t test in each of the following situations:
    a. Upper-tailed test, df = 8, $t = 2.0$
    b. Lower-tailed test, df = 11, $t = -2.4$
    c. Two-tailed test, df = 15, $t = -1.6$
    d. Upper-tailed test, df = 19, $t = -.4$
    e. Upper-tailed test, df = 5, $t = 5.0$
    f. Two-tailed test, df = 40, $t = -4.8$

52. The paint used to make lines on roads must reflect enough light to be clearly visible at night. Let $\mu$ denote the true average reflectometer reading for a new type of paint under consideration. A test of $H_0: \mu = 20$ versus $H_a: \mu > 20$ will be based on a random sample of size n from a normal population distribution. What conclusion is appropriate in each of the following situations?
    a. $n = 15$, $t = 3.2$, $\alpha = .05$
    b. $n = 9$, $t = 1.8$, $\alpha = .01$
    c. $n = 24$, $t = -.2$

53. Let $\mu$ denote true average serum receptor concentration for all pregnant women. The average for all women is known to be 5.63. The article “Serum Transferrin Receptor for the Detection of Iron Deficiency in Pregnancy” (Amer. J. of Clinical Nutr., 1991: 1077–1081) reports that P-value > .10 for a test of $H_0: \mu = 5.63$ versus $H_a: \mu \neq 5.63$ based on $n = 176$ pregnant women. Using a significance level of .01, what would you conclude?

54. The article “Analysis of Reserve and Regular Bottlings: Why Pay for a Difference Only the Critics Claim to Notice?” (Chance, Summer 2005, pp. 9–15) reported on an experiment to investigate whether wine tasters could distinguish between more expensive reserve wines and their regular counterparts. Wine was presented to tasters in four containers labeled A, B, C, and D, with two of these containing the reserve wine and the other two the regular wine. Each taster randomly selected three of the containers, tasted the selected wines, and indicated which of the three he/she believed was different from the other two. Of the $n = 855$ tasting trials, 346 resulted in correct distinctions (either the one reserve that differed from the two regular wines or the one regular wine that differed from the two reserves). Does this provide compelling evidence for concluding that tasters of this type have some ability to distinguish between reserve and regular wines? State and test the relevant hypotheses using the P-value approach. Are you particularly impressed with the ability of tasters to distinguish between the two types of wine?

55. An aspirin manufacturer fills bottles by weight rather than by count. Since each bottle should contain 100 tablets, the average weight per tablet should be 5 grains. Each of 100 tablets taken from a very large lot is weighed, resulting in a sample average weight per tablet of 4.87 grains and a sample standard deviation of .35 grain. Does this information provide strong evidence for concluding that the company is not filling its bottles as advertised? Test the appropriate hypotheses using $\alpha = .01$ by first computing the P-value and then comparing it to the specified significance level.

56. Because of variability in the manufacturing process, the actual yielding point of a sample of mild steel subjected to increasing stress will usually differ from the theoretical yielding point. Let p denote the true proportion of samples that yield before their theoretical yielding point. If on the basis of a sample it can be concluded that more than 20% of all specimens yield before the theoretical point, the production process will have to be modified.
    a. If 15 of 60 specimens yield before the theoretical point, what is the P-value when the appropriate test is used, and what would you advise the company to do?
    b. If the true percentage of “early yields” is actually 50% (so that the theoretical point is the median of the yield distribution) and a level .01 test is used, what is the probability that the company concludes a modification of the process is necessary?

57. The article “Heavy Drinking and Polydrug Use Among College Students” (J. of Drug Issues, 2008: 445–466) stated that 51 of the 462 college students in a sample had a lifetime abstinence from alcohol. Does this provide strong evidence for concluding that more than 10% of the population sampled had completely abstained from alcohol use? Test the appropriate hypotheses using the P-value method. [Note: The article used more advanced statistical methods to study the use of various drugs among students characterized as light, moderate, and heavy drinkers.]

58. A random sample of soil specimens was obtained, and the amount of organic matter (%) in the soil was determined for each specimen, resulting in the accompanying data (from “Engineering Properties of Soil,” Soil Science, 1998: 93–102).

    The values of the sample mean, sample standard deviation, and (estimated) standard error of the mean are 2.481, 1.616, and .295, respectively. Does this data suggest that the true average percentage of organic matter in such soil is something other than 3%? Carry out a test of the appropriate hypotheses at significance level .10 by first determining the P-value. Would your conclusion be different if $\alpha = .05$ had been used? [Note: A normal probability plot of the data shows an acceptable pattern in light of the reasonably large sample size.]

59. The accompanying data on cube compressive strength (MPa) of concrete specimens appeared in the article “Experimental Study of Recycled Rubber-Filled High-Strength Concrete” (Magazine of Concrete Res., 2009: 549–556):

    a. Is it plausible that the compressive strength for this type of concrete is normally distributed?
    b. Suppose the concrete will be used for a particular application unless there is strong evidence that true average strength is less than 100 MPa. Should the concrete be used? Carry out a test of appropriate hypotheses using the P-value method.

60. A certain pen has been designed so that true average writing lifetime under controlled conditions (involving the use of a writing machine) is at least 10 hours. A random sample of 18 pens is selected, the writing lifetime of each is determined, and a normal probability plot of the resulting data supports the use of a one-sample t test.
    a. What hypotheses should be tested if the investigators believe a priori that the design specification has been satisfied?
    b. What conclusion is appropriate if the hypotheses of part (a) are tested, $t = -2.3$, and $\alpha = .05$?
    c. What conclusion is appropriate if the hypotheses of part (a) are tested, $t = -1.8$, and $\alpha = .01$?
    d. What should be concluded if the hypotheses of part (a) are tested and $t = -3.6$?

61. A spectrophotometer used for measuring CO concentration [ppm (parts per million) by volume] is checked for accuracy by taking readings on a manufactured gas (called span gas) in which the CO concentration is very precisely controlled at 70 ppm. If the readings suggest that the spectrophotometer is not working properly, it will have to be recalibrated. Assume that if it is properly calibrated, measured concentration for span gas samples is normally distributed. On the basis of the six readings (85, 77, 82, 68, 72, and 69), is recalibration necessary? Carry out a test of the relevant hypotheses using the P-value approach with $\alpha = .05$.

62. The relative conductivity of a semiconductor device is determined by the amount of impurity “doped” into the device during its manufacture. A silicon diode to be used for a specific purpose requires an average cut-on voltage of .60 V, and if this is not achieved, the amount of impurity must be adjusted. A sample of diodes was selected and the cut-on voltage was determined. The accompanying SAS output resulted from a request to test the appropriate hypotheses.

    [SAS output: only the column headings T and Prob > |T| survive]

    [Note: SAS explicitly tests $H_0: \mu = 0$, so to test $H_0: \mu = .60$, the null value .60 must be subtracted from each $x_i$; the reported mean is then the average of the $(x_i - .60)$ values. Also, SAS’s P-value is always for a two-tailed test.] What would be concluded for a significance level of .01? .05? .10?
