The Level of Significance of a Statistical Test

We illustrate the calculation of a level of significance with several examples.

EXAMPLE 5.12
Refer to Example 5.7.
a. Determine the level of significance (p-value) for the statistical test and reach a decision concerning the research hypothesis using $\alpha = .01$.
b. If the preset value of $\alpha$ is .05 instead of .01, does your decision concerning $H_a$ change?

Solution
a. The null and alternative hypotheses are

$H_0\colon \mu \le 380$  versus  $H_a\colon \mu > 380$

From the sample data, with $s$ replacing $\sigma$, the computed value of the test statistic is

$z = \dfrac{\bar{y} - 380}{s/\sqrt{n}} = \dfrac{390 - 380}{35.2/\sqrt{50}} = 2.01$

The level of significance for this test (i.e., the weight of evidence for rejecting $H_0$) is the probability of observing a value of $\bar{y}$ greater than or equal to 390, assuming that the null hypothesis is true, that is, $\mu = 380$. This value can be computed by using the $z$-value of the test statistic, 2.01, because

p-value $= P(\bar{y} \ge 390$, assuming $\mu = 380) = P(z \ge 2.01)$

Referring to Table 1 in the Appendix, $P(z \ge 2.01) = 1 - P(z < 2.01) = 1 - .9778 = .0222$. This value is shown by the shaded area in Figure 5.13. Because the p-value is greater than $\alpha$ (.0222 > .01), we fail to reject $H_0$ and conclude that the data do not support the research hypothesis.

Decision Rule for Hypothesis Testing Using the p-Value
1. If the p-value $\le \alpha$, then reject $H_0$.
2. If the p-value $> \alpha$, then fail to reject $H_0$.

FIGURE 5.13 Level of significance for Example 5.12 (the shaded upper-tail area beyond $z = 2.01$ is $p = .0222$)

b. Another person examines the same data but with a preset value of $\alpha = .05$. This person is willing to accept a higher risk of a Type I error, and hence the decision is to reject $H_0$ because the p-value is less than $\alpha$ (.0222 $\le$ .05). It is important to emphasize that the value of $\alpha$ used in the decision rule is preset and not selected after calculating the p-value.

As we can see from Example 5.12, the level of significance represents the probability of observing a sample outcome more contradictory to $H_0$ than the observed sample result. The smaller the value of this probability, the heavier the weight of the sample evidence against $H_0$. For example, a statistical test with a level of significance of $p = .01$ shows more evidence for the rejection of $H_0$ than does another statistical test with $p = .20$.

EXAMPLE 5.13
Refer to Example 5.9. Using a preset value of $\alpha = .05$, is there sufficient evidence in the data to support the research hypothesis?

Solution
The null and alternative hypotheses are

$H_0\colon \mu \ge 33$  versus  $H_a\colon \mu < 33$

From the sample data, with $s$ replacing $\sigma$, the computed value of the test statistic is

$z = \dfrac{\bar{y} - \mu_0}{s/\sqrt{n}} = \dfrac{31.2 - 33}{8.4/\sqrt{35}} = -1.27$

The level of significance for this test statistic is computed by determining which values of $\bar{y}$ are more extreme to $H_0$ than the observed $\bar{y}$. Because $H_a$ specifies $\mu$ less than 33, the values of $\bar{y}$ that would be more extreme to $H_0$ are those values less than 31.2, the observed value. Thus,

p-value $= P(\bar{y} \le 31.2$, assuming $\mu = 33) = P(z \le -1.27) = .1020$

There is considerable evidence to support $H_0$. More precisely, p-value $= .1020 > .05 = \alpha$, and hence we fail to reject $H_0$. Thus, we conclude that there is insufficient evidence (p-value $= .1020$) to support the research hypothesis. Note that this is exactly the same conclusion reached using the traditional approach.
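The one-tailed p-values in Examples 5.12 and 5.13 are easy to verify numerically. The following Python sketch is purely illustrative (it is not part of the original examples); it assumes the SciPy library is available, and the sample mean, standard deviation, and sample size are the values quoted above.

```python
# Illustrative sketch: one-tailed p-values for Examples 5.12 and 5.13 (SciPy assumed).
from scipy.stats import norm
import math

def one_tailed_p(ybar, mu0, s, n, tail):
    """Return the z statistic and p-value for a one-tailed z test.

    tail='upper' corresponds to Ha: mu > mu0 (Case 1);
    tail='lower' corresponds to Ha: mu < mu0 (Case 2).
    """
    z = (ybar - mu0) / (s / math.sqrt(n))
    p = norm.sf(z) if tail == "upper" else norm.cdf(z)
    return z, p

# Example 5.12: Ha: mu > 380, with ybar = 390, s = 35.2, n = 50
print(one_tailed_p(390, 380, 35.2, 50, "upper"))   # z about 2.01, p about .022

# Example 5.13: Ha: mu < 33, with ybar = 31.2, s = 8.4, n = 35
print(one_tailed_p(31.2, 33, 8.4, 35, "lower"))    # z about -1.27, p about .102
```

Any small differences from the tabled values (.0222 and .1020) arise only from rounding the computed $z$ to two decimal places before using Table 1.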
For two-tailed tests, $H_a\colon \mu \ne \mu_0$, we still determine the level of significance by computing the probability of obtaining a sample having a value of the test statistic that is more contradictory to $H_0$ than the observed value of the test statistic. However, for two-tailed research hypotheses, we compute this probability in terms of the magnitude of the distance from $\bar{y}$ to the null value $\mu_0$, because both values of $\bar{y}$ much less than $\mu_0$ and values of $\bar{y}$ much larger than $\mu_0$ contradict $\mu = \mu_0$. Thus, the level of significance is written as

p-value $= P(|\bar{y} - \mu_0| \ge$ observed $|\bar{y} - \mu_0|) = P(|z| \ge |$computed $z|) = 2P(z \ge |$computed $z|)$

To summarize, the level of significance (p-value) can be computed as follows.

Case 1: $H_0\colon \mu \le \mu_0$, $H_a\colon \mu > \mu_0$; p-value $= P(z \ge$ computed $z)$
Case 2: $H_0\colon \mu \ge \mu_0$, $H_a\colon \mu < \mu_0$; p-value $= P(z \le$ computed $z)$
Case 3: $H_0\colon \mu = \mu_0$, $H_a\colon \mu \ne \mu_0$; p-value $= 2P(z \ge |$computed $z|)$

EXAMPLE 5.14
Refer to Example 5.6. Using a preset value of $\alpha = .01$, is there sufficient evidence in the data to support the research hypothesis?

Solution
The null and alternative hypotheses are

$H_0\colon \mu = 190$  versus  $H_a\colon \mu \ne 190$

From the sample data, with $s$ replacing $\sigma$, the computed value of the test statistic is

$z = \dfrac{\bar{y} - \mu_0}{s/\sqrt{n}} = \dfrac{178.2 - 190}{45.3/\sqrt{100}} = -2.60$

The level of significance for this test statistic is computed using the two-tailed formula (Case 3) given above:

p-value $= 2P(z \ge |$computed $z|) = 2P(z \ge |-2.60|) = 2P(z \ge 2.60) = 2(1 - .9953) = .0094$

Because the p-value is very small, there is very little evidence to support $H_0$. More precisely, p-value $= .0094 \le .01 = \alpha$, and hence we reject $H_0$. Thus, there is sufficient evidence (p-value $= .0094$) to support the research hypothesis and conclude that the mean cholesterol level differs from 190. Note that this is exactly the same conclusion reached using the traditional approach.
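As an illustration of Case 3, the two-tailed p-value in Example 5.14 can be reproduced with a short sketch of the same kind; again, SciPy is assumed, and the sample values are those given in the example.

```python
# Illustrative sketch: two-tailed (Case 3) p-value for Example 5.14 (SciPy assumed).
from scipy.stats import norm
import math

ybar, mu0, s, n = 178.2, 190, 45.3, 100
z = (ybar - mu0) / (s / math.sqrt(n))   # computed z, about -2.60
p_value = 2 * norm.sf(abs(z))           # 2 P(z >= |computed z|), about .009
print(round(z, 2), round(p_value, 4))
```

The exact value printed here agrees with the tabled result of .0094 up to the rounding of $z$ to two decimal places.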
There is much to be said in favor of this approach to hypothesis testing. Rather than reaching a decision directly, the statistician or person performing the statistical test presents the experimenter with the weight of evidence for rejecting the null hypothesis. The experimenter can then draw his or her own conclusion. Some experimenters reject a null hypothesis if $p \le .10$, whereas others require $p \le .05$ or $p \le .01$ for rejecting the null hypothesis. The experimenter is left to make the decision based on what he or she believes is enough evidence to indicate rejection of the null hypothesis.

Many professional journals have followed this approach by reporting the results of a statistical test in terms of its level of significance. Thus, we might read that a particular test was significant at the $p = .05$ level or perhaps the $p < .01$ level. By reporting results this way, the reader is left to draw his or her own conclusion.

One word of warning is needed here. The p-value of .05 has become a magic level, and many seem to feel that a particular null hypothesis should not be rejected unless the test achieves the .05 level or lower. This has resulted in part from the decision-based approach with $\alpha$ preset at .05. Try not to fall into this trap when reading journal articles or reporting the results of your statistical tests. After all, statistical significance at a particular level does not dictate importance or practical significance. Rather, it means that a null hypothesis can be rejected with a specified low risk of error.

For example, suppose that a company is interested in determining whether the average number of miles driven per car per month for the sales force has risen above 2,600. Sample data from 400 cars show that $\bar{y} = 2{,}640$ and $s = 35$. For these data, the $z$ statistic for $H_0\colon \mu = 2{,}600$ is $z = 22.86$ (based on $s = 35$); the level of significance is $p < .0000000001$. Thus, even though there has been only about a 1.5% increase in the average monthly miles driven per car, the result is highly statistically significant. Is this increase of any practical significance? Probably not. What we have done is prove conclusively that the mean $\mu$ has increased slightly.

The company should not just examine the size of the p-value. It is very important to also determine the size of the difference between the null value of the population mean $\mu_0$ and the estimated value of the population mean $\bar{y}$. This difference is called the estimated effect size. In this example the estimated effect size would be $\bar{y} - \mu_0 = 2{,}640 - 2{,}600 = 40$ miles driven per month. This is the quantity that the company should consider when attempting to determine if the change in the population mean has practical significance.

Throughout the text we will conduct statistical tests from both the decision-based approach and from the level-of-significance approach to familiarize you with both avenues of thought. For either approach, remember to consider the practical significance of your findings after drawing conclusions based on the statistical test.
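The fleet-mileage illustration above is a convenient place to see, in a few lines of code, how a tiny p-value and a modest effect size can coexist. The sketch below is illustrative only, assumes SciPy, and uses the figures quoted in the text.

```python
# Illustrative sketch: statistical vs. practical significance for the mileage example.
from scipy.stats import norm
import math

ybar, mu0, s, n = 2640, 2600, 35, 400
z = (ybar - mu0) / (s / math.sqrt(n))   # about 22.86: overwhelming statistical evidence
p_value = norm.sf(z)                    # upper-tail p-value, essentially 0
effect_size = ybar - mu0                # estimated effect size: 40 miles per month
print(round(z, 2), p_value, effect_size)
```

The p-value answers only whether the mean has risen; the effect size (40 miles per month, about 1.5% of 2,600) is what speaks to practical importance.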

5.7 Inferences about $\mu$ for a Normal Population, $\sigma$ Unknown

The estimation and test procedures about $\mu$ presented earlier in this chapter were based on the assumption that the population variance was known or that we had enough observations to allow $s$ to be a reasonable estimate of $\sigma$. In this section, we present a test that can be applied when $\sigma$ is unknown, no matter what the sample size, provided the population distribution is approximately normal. In Section 5.8, we will provide inference techniques for the situation where the population distribution is nonnormal.

Consider the following example. Researchers would like to determine the average concentration of a drug in the bloodstream 1 hour after it is given to patients suffering from a rare disease. For this situation, it might be impossible to obtain a random sample of 30 or more observations at a given time. What test procedure could be used in order to make inferences about $\mu$?

W. S. Gosset faced a similar problem around the turn of the century. As a chemist for Guinness Breweries, he was asked to make judgments on the mean quality of various brews, but he was not supplied with large sample sizes to reach his conclusions. Gosset thought that when he used the test statistic

$z = \dfrac{\bar{y} - \mu_0}{\sigma/\sqrt{n}}$

with $\sigma$ replaced by $s$ for small sample sizes, he was falsely rejecting the null hypothesis $H_0\colon \mu = \mu_0$ at a slightly higher rate than that specified by $\alpha$. This problem intrigued him, and he set out to derive the distribution and percentage points of the test statistic $(\bar{y} - \mu_0)/(s/\sqrt{n})$ for $n < 30$. For example, suppose an experimenter sets $\alpha$ at a nominal level, say, .05. Then he or she expects falsely to reject the null hypothesis approximately 1 time in 20. However, Gosset proved that the actual probability of a Type I error for this test was somewhat higher than the nominal level designated by $\alpha$. He published the results of his study under the pen name Student, because at that time it was against company policy for him to publish his results in his own name. The quantity

$\dfrac{\bar{y} - \mu}{s/\sqrt{n}}$

is called the $t$ statistic, and its distribution is called the Student's $t$ distribution or, simply, Student's $t$ (see Figure 5.14). Although this quantity possesses a $t$ distribution only when the sample is selected from a normal population, the $t$ distribution provides a reasonable approximation to its distribution when the sample is selected from a population with a mound-shaped distribution. We summarize the properties of $t$ here.

FIGURE 5.14 PDFs of two $t$ distributions (df = 2 and df = 5) and a standard normal distribution

Properties of Student's t Distribution
1. There are many different $t$ distributions. We specify a particular one by a parameter called the degrees of freedom (df). See Figure 5.14.
2. The $t$ distribution is symmetrical about 0 and hence has mean equal to 0, the same as the $z$ distribution.
3. The $t$ distribution has variance df/(df $-$ 2), and hence is more variable than the $z$ distribution, which has variance equal to 1. See Figure 5.14.
4. As the df increases, the $t$ distribution approaches the $z$ distribution. (Note that as df increases, the variance df/(df $-$ 2) approaches 1.)
5. Thus, with

$t = \dfrac{\bar{y} - \mu}{s/\sqrt{n}}$

we conclude that $t$ has a $t$ distribution with df $= n - 1$, and, as $n$ increases, the distribution of $t$ approaches the distribution of $z$.

The phrase "degrees of freedom" sounds mysterious now, but the idea will eventually become second nature to you. The technical definition requires advanced mathematics, which we will avoid; on a less technical level, the basic idea is that degrees of freedom are pieces of information for estimating $\sigma$ using $s$. The standard deviation $s$ for a sample of $n$ measurements is based on the deviations $y_i - \bar{y}$. Because $\sum (y_i - \bar{y}) = 0$ always, if $n - 1$ of the deviations are known, the last ($n$th) one is fixed mathematically to make the sum equal 0. It is therefore noninformative. Thus, in a sample of $n$ measurements there are $n - 1$ pieces of information (degrees of freedom) about $\sigma$. A second method of explaining degrees of freedom is to recall that $\sigma$ measures the dispersion of the population values about $\mu$, so prior to estimating $\sigma$ we must first estimate $\mu$. Hence, the number of pieces of information (degrees of freedom) in the data that can be used to estimate $\sigma$ is $n - 1$, the number of original data values minus the number of parameters estimated prior to estimating $\sigma$.

Because of the symmetry of $t$, only upper-tail percentage points (probabilities or areas) of the distribution of $t$ have been tabulated; these appear in Table 2 in the Appendix.
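To make these properties concrete, the sketch below computes a $t$ statistic with df $= n - 1$ for a small sample and checks Property 4 by showing how an upper-tail percentile of $t$ approaches the corresponding $z$ value as the degrees of freedom grow. The eight measurements are hypothetical values invented only for illustration (they are not data from this chapter), and SciPy is assumed.

```python
# Illustrative sketch: a small-sample t statistic and p-value when sigma is unknown,
# plus a check of Property 4. The data are hypothetical, for illustration only.
from scipy.stats import t
import math

y = [12.1, 9.8, 11.4, 10.6, 12.9, 10.2, 11.7, 9.5]  # hypothetical drug concentrations
n = len(y)
ybar = sum(y) / n
s = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))  # s is based on n - 1 df

mu0 = 10.0                                  # hypothesized mean under H0
t_stat = (ybar - mu0) / (s / math.sqrt(n))  # t = (ybar - mu0) / (s / sqrt(n))
df = n - 1
p_value = 2 * t.sf(abs(t_stat), df)         # two-tailed p-value from Student's t

print(round(t_stat, 2), round(p_value, 4))  # roughly 2.43 and .045 for these data

# Property 4: the upper .025 percentile of t approaches the z value 1.96 as df grows.
print([round(t.ppf(0.975, d), 3) for d in (2, 5, 30, 1000)])  # [4.303, 2.571, 2.042, 1.962]
```

Note how the percentile shrinks toward 1.96 as df increases, reflecting the extra variability of $t$ (variance df/(df $-$ 2)) disappearing for large samples.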