A Statistical Test for M

being proposed by the person conducting the study, called the research hypothesis, m ⬎ 60 in our example. The second theory is the negation of this hypothesis, called the null hypothesis, m ⱕ 60 in our example. The goal of the study is to decide whether the data tend to support the research hypothesis. A statistical test is based on the concept of proof by contradiction and is composed of the five parts listed here. research hypothesis null hypothesis statistical test 1. Research hypothesis also called the alternative hypothesis, denoted by H a . 2. Null hypothesis, denoted by H .

3.

Test statistics, denoted by T.S. 4. Rejection region, denoted by R.R. 5. Check assumptions and draw conclusions. For example, the Texas AM agricultural extension service wants to determine whether the mean yield per acre in bushels for a particular variety of soybeans has increased during the current year over the mean yield in the previous 2 years when m was 520 bushels per acre. The first step in setting up a statistical test is determining the proper specification of H and H a . The following guidelines will be helpful: 1. The statement that m equals a specific value will always be included in H . The particular value specified for m is called its null value and is denoted m . 2. The statement about m that the researcher is attempting to support or detect with the data from the study is the research hypothesis, H a .

3.

The negation of H a is the null hypothesis, H . 4. The null hypothesis is presumed correct unless there is overwhelming evidence in the data that the research hypothesis is supported. In our example, m is 520. The research statement is that yield in the current year has increased above 520; that is, H a : m ⬎ 520. Note that we will include 520 in the null hypothesis. Thus, the null hypothesis, the negation of H a , is H : m ⱕ 520. To evaluate the research hypothesis, we take the information in the sample data and attempt to determine whether the data support the research hypothesis or the null hypothesis, but we will give the benefit of the doubt to the null hypothesis. After stating the null and research hypotheses, we then obtain a random sample of 1-acre yields from farms throughout the state. The decision to state whether or not the data support the research hypothesis is based on a quantity computed from the sample data called the test statistic. If the population distribution is determined to be mound shaped, a logical choice as a test statistic for m is or some function of . If we select as the test statistic, we know that the sampling distribution of is approximately normal with a mean m and standard deviation provided the population distribution is normal or the sample size is fairly large. We are attempting to decide between H a : m ⬎ 520 or H : m ⱕ 520. The decision will be to either reject H or fail to reject H . In developing our decision rule, we will assume that m ⫽ 520, the null value of m. We will now determine the values of , called the rejection region, which we are very unlikely to observe if m ⫽ 520 or if m is any other value in H . The rejection region contains the values of that support the research hypothesis and contradict the null hypothesis, hence the region of values for that reject the null hypothesis. The rejection region will be the values of in the upper tail of the null distribution m ⫽ 520 of . See Figure 5.5. y y y y y s 兾1n, y y y y test statistic rejection region As with any two-way decision process, we can make an error by falsely rejecting the null hypothesis or by falsely accepting the null hypothesis. We give these errors the special names Type I error and Type II error. Type I error Type II error FIGURE 5.5 Assuming that H is true, contradictory values of are in the upper tail. y = 520 f y Acceptance region Rejection region y Contradictory values of y DEFINITION 5.1 A Type I error is committed if we reject the null hypothesis when it is true. The probability of a Type I error is denoted by the symbol a. DEFINITION 5.2 A Type II error is committed if we accept the null hypothesis when it is false and the research hypothesis is true. The probability of a Type II error is denoted by the symbol b Greek letter beta. The two-way decision process is shown in Table 5.3 with corresponding probabilities associated with each situation. Although it is desirable to determine the acceptance and rejection regions to simultaneously minimize both a and b, this is not possible. The probabilities associated with Type I and Type II errors are inversely related. For a fixed sample size n, as we change the rejection region to increase a, then b decreases, and vice versa. To alleviate what appears to be an impossible bind, the experimenter specifies a tolerable probability for a Type I error of the statistical test. Thus, the experimenter may choose a to be .01, .05, .10, and so on. Specification of a value for a then locates the rejection region. Determination of the associated probability of a Type II error is more complicated and will be delayed until later in the chapter. Let us now see how the choice of a locates the rejection region. Returning to our soybean example, we will reject the null hypothesis for large values of the sample mean . Suppose we have decided to take a sample of n ⫽ 36 one-acre plots, and from these data we compute ⫽ 573 and s ⫽ 124. Can we conclude that the mean yield for all farms is above 520? Before answering this question we must specify A. If we are willing to take the risk that 1 time in 40 we would incorrectly reject the null hypothesis, then y y TABLE 5.3 Two-way decision process Null Hypothesis Decision True False Reject H Type I error Correct a 1 ⫺ b Accept H Correct Type II error 1 ⫺ a b specifying A a ⫽ 1 兾40 ⫽ .025. An appropriate rejection region can be specified for this value of a by referring to the sampling distribution of . Assuming that m ⫽ 520 and n is large enough so that s can be replaced by s, then is normally distributed, with m ⫽ 520 and . Because the shaded area of Figure 5.6a corresponds to a, locating a rejection region with an area of .025 in the right tail of the distribution of is equivalent to determining the value of z that has an area .025 to its right. Referring to Table 1 in the Appendix, this value of z is 1.96. Thus, the rejection region for our example is located 1.96 standard errors above the mean m ⫽ 520. If the observed value of is greater than 1.96 standard errors above m ⫽ 520, we reject the null hypothesis, as shown in Figure 5.6a. The reason that we only need to consider m ⫽ 520 in computing a is that for all other values of m in H — that is, m ⬍ 520—the probability of Type I error would be smaller than the probability of Type I error when m ⫽ 520. This can be seen by examining Figure 5.6b The area under the curve, centered at 500, in the rejection region is less than the area associated with the curve centered at 520. Thus, a for m ⫽ 500 is less than the a for m ⫽ 520—that is, a 500 ⬍ a 520 ⫽ .025. This conclusion can be extended to any value of m less than 520—that is, all values of m in H : m ⱕ 520. EXAMPLE 5.5 Set up all the parts of a statistical test for the soybean example and use the sample data to reach a decision on whether to accept or reject the null hypothesis. Set a ⫽ .025. Assume that s can be estimated by s. Solution The first four parts of the test are as follows. H : m ⱕ 520 H a : m ⬎ 520 T.S.: R.R.: For a ⫽ .025, reject the null hypothesis if lies more than 1.96 standard errors above m ⫽ 520. y y y 1.96s 兾1n y s 兾1n ⬇ 124兾136 ⫽ 20.67 y y FIGURE 5.6a Rejection region for the soybean example when a ⫽ .025 = 520 Rejection region Area equals .025. y 1.96 n f y FIGURE 5.6b Size of rejection region when m ⫽ 500 500 520 f y Rejection region y 1.96 n The computed value of is 573. To determine the number of standard errors that lies above m ⫽ 520, we compute a z score for using the formula Substituting into the formula with s replacing s, we have Check assumptions and draw conclusions: With a sample size of n ⫽ 36, the Central Limit Theorem should hold for the distribution of . Because the observed value of lies more than 1.96, in fact 2.56, standard errors above the hypothesized mean m ⫽ 520, we reject the null hypothesis in favor of the research hypothesis and conclude that the average soybean yield per acre is greater than 520. The statistical test conducted in Example 5.5 is called a one-tailed test because the rejection region is located in only one tail of the distribution of . If our research hypothesis is H a : m ⬍ 520, small values of would indicate rejection of the null hypothesis. This test would also be one-tailed, but the rejection region would be located in the lower tail of the distribution of . Figure 5.7 displays the rejection region for the alternative hypothesis H a : m ⬍ 520 when a ⫽ .025. We can formulate a two-tailed test for the research hypothesis H a : m ⫽ 520, where we are interested in detecting whether the mean yield per acre of soybeans is greater or less than 520. Clearly both large and small values of would contradict the null hypothesis, and we would locate the rejection region in both tails of the distribution of . A two-tailed rejection region for H a : m ⫽ 520 and a ⫽ .05 is shown in Figure 5.8. y y y y y y y z ⫽ y ⫺ m s 兾1n ⫽ 573 ⫺ 520 124 兾136 ⫽ 2.56 z ⫽ y ⫺ m s 兾1n y y y one-tailed test FIGURE 5.7 Rejection region for H a : m ⬍ 520 when a ⫽ .025 for the soybean example = 520 f y Rejection region y = .025 1.96 n FIGURE 5.8 Two-tailed rejection region for H a : m ⫽ 520 when a ⫽ .05 for the soybean example = 520 Rejection region y Area = .025 Area = .025 Rejection region 1.96 n 1.96 n f y two-tailed test EXAMPLE 5.6 Elevated serum cholesterol levels are often associated with cardiovascular disease. Cholesterol levels are often thought to be associated with type of diet, amount of exercise, and genetically related factors. A recent study examined cholesterol levels among recent immigrants from China. Researchers did not have any prior information about this group and wanted to evaluate whether their mean cholesterol level differed from the mean cholesterol level of middle-aged women in the United States. The distribution of cholesterol levels in U.S. women aged 30 –50 is known to be approximately normally distributed with a mean of 190 mgdL. A random sample of n ⫽ 100 female Chinese immigrants aged 30 –50 who had immigrated to the United States in the past year was selected from INS records. They were administered blood tests that yielded cholesterol levels having a mean of 178.2 mgdL and a standard deviation of 45.3 mgdL. Is there significant evidence in the data to demonstrate that the mean cholesterol level of the new immigrants differs from 190 mgdL? Solution The researchers were interested in determining if the mean cholesterol level was different from 190; thus, the research hypothesis for the statistical test is H a : m ⫽ 190. The null hypothesis is the negation of the research hypothesis: H : m ⫽ 190. With a sample size of n ⫽ 100, the Central Limit Theorem should hold and hence the sampling distribution of is approximately normal. Using a ⫽ .05, z a 兾2 ⫽ z .025 ⫽ 1.96. The two-tailed rejection region for this test is given by lower rejection ⫽ 181.1 upper rejection ⫽ 198.9 The two regions are shown in Figure 5.9. m ⫾ 1.96s 兾1n, i.e., 190 ⫾ 1.9645.3兾1100 i.e., 190 ⫾ 8.88 i.e., y FIGURE 5.9 Rejection region for H a : m ⫽ 190 when a ⫽ .05 = 190 y Area = .025 Area = .025 Rejection region Rejection region 1.96 n 1.96 n 181.1 198.9 f y We can observe from Figure 5.9 that ⫽ 178.2 falls into the lower rejection region. Therefore, we conclude there is significant evidence in the data that the mean cholesterol level of middle-aged Chinese immigrants differs from 190 mgdL. Alternatively, we can determine how many standard errors lies away from m ⫽ 190 and compare this value to z a 兾2 ⫽ z .025 ⫽ 1.96. From the data, we compute The observed value for lies more than 1.96 standard errors below the specified mean value of 190, so we reject the null hypothesis in favor of the alternative H a : m ⫽ 190. We have thus reached the same conclusion as we reached using the rejection region. The two methods will always result in the same conclusion. y z ⫽ y ⫺ m s 兾1n ⫽ 178.2 ⫺ 190 45.3 兾1100 ⫽ ⫺ 2.60 y y The mechanics of the statistical test for a population mean can be greatly simplified if we use z rather than as a test statistic. Using H : m ⱕ m where m is some specified value H a : m ⬎ m and the test statistic then for a ⫽ .025 we reject the null hypothesis if z ⱖ 1.96—that is, if lies more than 1.96 standard errors above the mean. Similarly, for a ⫽ .05 and H a : m ⫽ m , we reject the null hypothesis if the computed value of z ⱖ 1.96 or the computed value of z ⱕ ⫺1.96. This is equivalent to rejecting the null hypothesis if the computed value of 兩z兩 ⱖ 1.96. The statistical test for a population mean m is summarized next. Three different sets of hypotheses are given with their corresponding rejection regions. In a given situation, you will choose only one of the three alternatives with its associated rejection region. The tests given are appropriate only when the population distribution is normal with known s. The rejection region will be approximately the correct region even when the population distribution is nonnormal provided the sample size is large; in most cases, n ⱖ 30 is sufficient. We can then apply the results from the Central Limit Theorem with the sample standard deviation s replacing s to conclude that the sampling distribution of is approximately normal. z ⫽ y ⫺ m 兾s兾1n y z ⫽ y ⫺ m s 兾1n y test for a population mean EXAMPLE 5.7 As a part of her evaluation of municipal employees, the city manager audits the parking tickets issued by city parking officers to determine the number of tickets that were contested by the car owner and found to be improperly issued. In past years, the number of improperly issued tickets per officer had a normal distribution with mean m ⫽ 380 and s ⫽ 35.2. Because there has recently been a change in Summary of a Statistical Test for ␮ with a Normal Population Distribution ␴ Known or Large Sample Size n Hypotheses: Case 1. H : m ⱕ m vs. H a : m ⬎ m right-tailed test Case 2. H : m ⱖ m vs. H a : m ⬍ m left-tailed test Case 3. H : m ⫽ m vs. H a : m ⫽ m two-tailed test T.S.: R.R.: For a probability a of a Type I error, Case 1. Reject . Case 2. Reject H if z ⱕ ⫺z a . Case 3. Reject H if 兩z兩 ⱖ z a 兾2 . Note: These procedures are appropriate if the population distribution is normally distributed with s known. In most situations, if n ⱖ 30, then the Central Limit Theorem allows us to use these procedures when the population distribution is nonnormal. Also, if n ⱖ 30, then we can replace s with the sample standard deviation s. The situation in which n ⬍ 30 is presented later in this chapter. H if z ⱖ z a z ⫽ y ⫺ m s 兾1n the city’s parking regulations, the city manager suspects that the mean number of improperly issued tickets has increased. An audit of 50 randomly selected officers is conducted to test whether there has been an increase in improper tickets. Use the sample data given here and a ⫽ .01 to test the research hypothesis that the mean number of improperly issued tickets is greater than 380. The audit generates the following data: n ⫽ 50 and Solution Using the sample data with a ⫽ .01, the 5 parts of a statistical test are as follows. H : m ⱕ 380 H a : m ⬎ 380 T.S.: R.R.: For a ⫽ .01 and a right-tailed test, we reject H if z ⱖ z .01 , where z .01 ⫽ 2.33. Check assumptions and draw conclusions: Because the observed value of z, 2.01, does not exceed 2.33, we might be tempted to accept the null hypothesis that m ⱕ 380. The only problem with this conclusion is that we do not know b, the probability of incorrectly accepting the null hypothesis. To hedge somewhat in situations in which z does not fall in the rejection region and b has not been calculated, we recommend stating that there is insufficient evidence to reject the null hypothesis. To reach a conclusion about whether to accept H , the experimenter would have to compute b. If b is small for reasonable alternative values of m, then H is accepted. Otherwise, the experimenter should conclude that there is insufficient evidence to reject the null hypothesis. We can illustrate the computation of B, the probability of a Type II error, using the data in Example 5.7. If the null hypothesis is H : m ⱕ 380, the probability of incorrectly accepting H will depend on how close the actual mean is to 380. For example, if the actual mean number of improperly issued tickets is 400, we would expect b to be much smaller than if the actual mean is 387. The closer the actual mean is to m the more likely we are to obtain data having a value in the acceptance region. The whole process of determining b for a test is a ‘‘what-if’’ type of process. In practice, we compute the value of b for a number of values of m in the alternative hypothesis H a and plot b versus m in a graph called the OC curve. Alternatively, tests of hypotheses are evaluated by computing the probability that the test rejects false null hypotheses, called the power of the test. We note that power ⫽ 1 ⫺ b. The plot of power versus the value of m is called the power curve. We attempt to design tests that have large values of power and hence small values for b. Let us suppose that the actual mean number of improper tickets is 395 per officer. What is b? With the null and research hypotheses as before, H : m ⱕ 380 H a : m ⬎ 380 and with a ⫽ .01, we use Figure 5.10a to display b. The shaded portion of Figure 5.10a represents b, as this is the probability of falling in the acceptance region when the null hypothesis is false and the actual value of m is 395. The power of the test for detecting that the actual value of m is 395 is 1 ⫺ b, the area in the rejection region. y y z ⫽ y ⫺ m s 兾1n ⫽ 390 ⫺ 380 35.2 兾150 ⫽ 10 35.2 兾7.07 ⫽ 2.01 y ⫽ 390. computing B OC curve power power curve Let us consider two other possible values for m—namely, 387 and 400. The corresponding values of b are shown as the shaded portions of Figures 5.10b and c, respectively; power is the unshaded portion in the rejection region of Figure 5.10b and c. The three situations illustrated in Figure 5.10 confirm what we alluded to earlier; that is, the probability of a Type II error b decreases and hence power increases the further m lies away from the hypothesized means under H . The following notation will facilitate the calculation of b. Let m denote the null value of m and let m a denote the actual value of the mean in H a . Let bm a be the probability of a Type II error if the actual value of the mean is m a and let PWRm a be the power at m a . Note that PWRm a equals 1 ⫺ bm a . Although we never really know the actual mean, we select feasible values of m and determine b for each of these values. This will allow us to determine the probability of a Type II error occurring if one of these feasible values happens to be the actual value of the mean. The decision whether or not to accept H depends on the magnitude of b for one or more reasonable values for m a . Alternatively, researchers calculate the power curve for a test of hypotheses. Recall that the power of the test at m a PWRm a is the probability the test will detect that H is false when the actual value of m is m a . Hence, we want tests of hypotheses in which PWRm a is large when m a is far from m . FIGURE 5.10 The probability b of a Type II error when m ⫽ 395, 387 and 400 n n n For a one-tailed test, H : m ⱕ m or H : m ⱖ m , the value of b at m a is the probability that z is less than This probability is written as The value of bm a is found by looking up the probability corresponding to the number in Table 1 in the Appendix. Formulas for b are given here for one- and two-tailed tests. Examples using these formulas follow. z a ⫺ 冷 m ⫺ m a | 兾s兾1n b m a ⫽ P Bz ⬍ z a ⫺ 冷 m ⫺ m a 冷 s 兾1n R z a ⫺ 冷 m ⫺ m a 冷 s 兾 1n Calculation of ␤ for a One- or Two-Tailed Test about ␮ 1. One-tailed test: PWRm a ⫽ 1 ⫺ bm a 2. Two-tailed test: PWRm a ⫽ 1 ⫺ bm a b m a ⬇ P 冢 z ⱕ z a 兾2 ⫺ 冷 m ⫺ m a | s 兾1n 冣 b m a ⫽ P 冢 z ⱕ z a ⫺ 冷 m ⫺ m a 冷 s 兾1n 冣 EXAMPLE 5.8 Compute b and power for the test in Example 5.7 if the actual mean number of improperly issued tickets is 395. Solution The research hypothesis for Example 5.7 was H a : m ⬎ 380. Using a ⫽ .01 and the computing formula for b with m ⫽ 380 and m a ⫽ 395, we have ⫽ P[z ⬍ 2.33 ⫺ 3.01] ⫽ P[z ⬍ ⫺.68] Referring to Table 1 in the Appendix, the area corresponding to z ⫽ ⫺.68 is .2483. Hence, b395 ⫽ .2483 and PWR395 ⫽ 1 ⫺ .2483 ⫽ .7517. Previously, when did not fall in the rejection region, we concluded that there was insufficient evidence to reject H because b was unknown. Now when falls in the acceptance region, we can compute b corresponding to one or more alternative values for m that appear reasonable in light of the experimental setting. Then provided we are willing to tolerate a probability of falsely accepting the null hypothesis equal to the computed value of b for the alternative values of m considered, our decision is to accept the null hypothesis. Thus, in Example 5.8, if the actual mean number of improperly issued tickets is 395, then there is about a .25 probability 1 in 4 chance of accepting the hypothesis that m is less than or equal to 380 when in fact m equals 395. The city manager would have to analyze the y y b 395 ⫽ P Bz ⬍ z .01 ⫺ 冷 m ⫺ m a 冷 s 兾 1n R ⫽ PBz ⬍ 2.33 ⫺ 冷 380 ⫺ 395 冷 35.2 兾 150 R consequence of making such a decision. If the risk was acceptable then she could state that the audit has determined that the mean number of improperly issued tickets has not increased. If the risk is too great, then the city manager would have to expand the audit by sampling more than 50 officers. In the next section, we will describe how to select the proper value for n. EXAMPLE 5.9 As the public concern about bacterial infections increases, a soap manufacture quickly promoted a new product to meet the demand for an antibacterial soap. This new product has a substantially higher price than the “ordinary soaps” on the market. A consumer testing agency notes that ordinary soap also kills bacteria and questions whether the new antibacterial soap is a substantial improvement over ordinary soap. A procedure for examining the ability of soap to kill bacteria is to place a solution containing the soap onto a petri dish and then add E. coli bacteria. After a 24-hour incubation period, a count of the number of bacteria colonies on the dish is taken. From previous studies using many different brands of ordinary soaps, the mean bacteria count is 33 for ordinary soap products. The consumer group runs the test on the antibacterial soap using 35 petri dishes. The results from the 35 petri dishes is a mean bacterial count of 31.2 with a standard deviation of 8.4. Do the data provide sufficient evidence that the antibacterial soap is more effective than ordinary soap in reducing bacteria counts? Use a ⫽ .05. Solution Let m be the population mean bacterial count for the antibacterial soap and s be the population standard deviation. The 5 parts to our statistical test are as follows. H : m ⱖ 33 H a : m ⬍ 33 T.S.: R.R.: For a ⫽ .05, we will reject the null hypothesis if z ⱕ ⫺ z .05 ⫽ ⫺ 1.645. Check assumptions and draw conclusions: With n ⫽ 35, the sample size is proba- bly large enough that the Central Limit Theorem would justify our assuming that the sampling distribution of is approximately normal. Because the observed value of z, ⫺1.27, is not less than ⫺1.645, the test statistic does not fall in the rejection region. We reserve judgment on accepting H until we calculate the chance of a Type II error b for several values of m falling in the alternative hypothesis, values of m less than 33. In other words, we conclude that there is insufficient evidence to reject the null hypothesis and hence there is not sufficient evidence that the antibacterial soap is more effective than ordinary soap. How- ever, we next need to calculate the chance that the test may have resulted in a Type II error. EXAMPLE 5.10 Refer to Example 5.9. Suppose that the consumer testing agency thinks that the manufacturer of the antibacterial soap will take legal action if the antibacterial soap has a population mean bacterial count that is considerably less than 33, say 28. Thus, the consumer group wants to know the probability of a Type II error in its test if the population mean m is 28 or smaller—that is, determine b28 because b m ⱕ b28 for m ⱕ 28. y z ⫽ y ⫺ m o s 兾 1n ⫽ 31.2 ⫺ 33 8.4 兾 135 ⫽ ⫺ 1.27 Solution Using the computational formula for b with m ⫽ 33, m a ⫽ 28, and a ⫽ .05, we have The area corresponding to z ⫽ ⫺1.88 in Table 1 of the Appendix is .0301. Hence, b 28 ⫽ .0301 and PWR28 ⫽ 1 ⫺ .0301 ⫽ .9699 Because b is relatively small, we accept the null hypothesis and conclude that the antibacterial soap is not more effective than ordinary soap in reducing bacteria counts. The manufacturer of the antibacterial soap wants to determine the chance that the consumer group may have made an error in reaching its conclusions. The manufacturer wants to compute the probability of a Type II error for a selection of potential values of m in H a . This would provide them with an indication of how likely a Type II error may have occurred when in fact the new soap is considerably more effective in reducing bacterial counts in comparison to the mean count for ordinary soap, m ⫽ 33. Repeating the calculations for obtaining b28, we obtain the values in Table 5.4. ⫽ P[z ⱕ ⫺1.88] b 38 ⫽ P Bz ⱕ z .05 ⫺ |m ⫺ m a | s 兾1n R ⫽ PBz ⱕ 1.645 ⫺ |33 ⫺ 28| 8.4 兾135 R TABLE 5.4 Probability of Type II error and power for values of m in H a M 33 32 31 30 29 28 27 26 25 B M .9500 .8266 .5935 .3200 .1206 .0301 .0049 .0005 .0000 PWRM .0500 .1734 .4065 .6800 .8794 .9699 .9951 .9995 .9999 Figure 5.11 is a plot of the bm values in Table 5.4 with a smooth curve through the points. Note that as the value of m decreases, the probability of Type II error decreases to 0 and the corresponding power value increases to 1.0. The company could examine this curve to determine whether the chances of Type II error are reasonable for values of m in H a that are important to the company. From Table 5.4 or Figure 5.11, we observe that b28 ⫽ .0301, a relatively small number. Based on the results from Example 5.9, we find that the test statistic does not fall in the rejection region. The manufacturer has decided that if the true population FIGURE 5.11 Probability of Type II error .3 .4 .5 .6 .7 .8 .9 1.0 Probability of T ype II error .2 .1 25 26 27 28

29 30

31 32 33 Mean mean bacteria count for its antibacterial soap is 29 or less, it would have a product that would be considered a substantial improvement over ordinary soap. Based on the values of the probability of Type II error displayed in Table 5.4, the chance is relatively small that the test run by the consumer agency has resulted in a Type II error for values of the mean bacterial count of 29 or smaller. Thus, the consumer testing agency was relatively certain in reporting that the new antibacterial soap did not decrease the mean bacterial in comparison to ordinary soap. In Section 5.2, we discussed how we measure the effectiveness of interval esti- mates. The effectiveness of a statistical test can be measured by the magnitudes of the Type I and Type II errors, a and bm. When a is preset at a tolerable level by the experimenter, bm a is a function of the sample size for a fixed value of m a . The larger the sample size n, the more information we have concerning m, the less likely we are to make a Type II error, and, hence, the smaller the value of bm a . To illustrate this idea, suppose we are testing the hypotheses H : m ⱕ 84 against H a : m ⬎ 84, where m is the mean of a population having a normal distribution with s ⫽ 1.4. If we take a ⫽ .05, then the probability of Type II errors is plotted in Figure 5.12a for three possible sample sizes n ⫽ 10, 18, and 25. Note that b84.6 becomes smaller as we increase n from 10 to 25. Another relationship of interest is that between a and bm. For a fixed sample size n, if we change the rejection region to increase the value of a, the value of bm a will decrease. This relationship can be observed in Figure 5.12b. Fix the sample size at 25 and plot bm for three different values of a ⫽ .05, .01, .001. We observe that b84.6 becomes smaller as a increases from .001 to .05. A similar set of graphs can be obtained for the power of the test by simply plotting PWRm ⫽ 1 ⫺ bm versus m. The relationships described would be reversed; that is, for fixed a increasing the value of the sample size would increase the value of PWRm and, for fixed sample size, increasing the value of a would increase the value of PWRm. We will consider now the problem of design- ing an experiment for testing hypotheses about m when a is specified and bm a is preset for a fixed value m a . This problem reduces to determining the sample size needed to achieve the fixed values of a and bm a . Note that in those cases in which the determined value of n is too large for the initially specified values of a and b, we can increase our specified value of a and achieve the desired value of bm a with a smaller sample size. FIGURE 5.12 a bm curve for a ⫽ .05, n ⫽ 10, 18, 25. b bm curve for n ⫽ 25, a ⫽ .05, .01, .001 86.0 85.8 85.6 85.4 85.2 85.0 84.8 84.6 84.4 84.2 84.0 1.0 .9 .8 .7 .6 .5 .4 .3 .2 .1 Mean Probability of Type II error 10 18 25 a 86.0 85.8 85.6 85.4 85.2 85.0 84.8 84.6 84.4 84.2 84.0 1.0 .9 .8 .7 .6 .5 .4 .3 .2 .1 Mean Probability of Type II error .05 .01 .001 b

5.5 Choosing the Sample Size for Testing M

The quantity of information available for a statistical test about m is measured by the magnitudes of the Type I and II error probabilities, a and bm, for various values of m in the alternative hypothesis H a . Suppose that we are interested in testing H : m ⱕ m against the alternative H a : m ⬎ m . First, we must specify the value of a. Next we must determine a value of m in the alternative, m 1 , such that if the actual value of the mean is larger than m 1 , then the consequences of making a Type II error would be substantial. Finally, we must select a value for bm 1 , b. Note that for any value of m larger than m 1 , the probability of Type II error will be smaller than bm 1 ; that is, b m ⬍ bm 1 , for all m ⬎ m 1 Let ⌬ ⫽ m 1 ⫺ m . The sample size necessary to meet these requirements is Note: If s 2 is unknown, substitute an estimated value from previous studies or a pilot study to obtain an approximate sample size. The same formula applies when testing H : m ⱖ m against the alternative H a : m ⬍ m , with the exception that we want the probability of a Type II error to be of magnitude b or less when the actual value of m is less than m 1 , a value of the mean in H a ; that is, b m ⬍ b, for all m ⬍ m 1 with ⌬ ⫽ m ⫺ m 1 . EXAMPLE 5.11 A cereal manufacturer produces cereal in boxes having a labeled weight of 12 ounces. The boxes are filled by machines that are set to have a mean fill per box of 16.37 ounces. Because the actual weight of a box filled by these machines has a normal distribution with a standard deviation of approximately .225 ounces, the percentage of boxes having weight less than 16 ounces is 5 using this setting. The manufacturer is concerned that one of its machines is underfilling the boxes and wants to sample boxes from the machine’s output to determine whether the mean weight m is less than 16.37—that is, to test H : m ⱖ 16.37 H a : m ⬍ 16.37 with a ⫽ .05. If the true mean weight is 16.27 or less, the manufacturer needs the probability of failing to detect this underfilling of the boxes with a probability of at most .01, or risk incurring a civil penalty from state regulators. Thus, we need to determine the sample size n such that our test of H versus H a has a ⫽ .05 and b m less than .01 whenever m is less than 16.27 ounces. Solution We have a ⫽ .05, b ⫽ .01, ⌬ ⫽ 16.37 ⫺ 16.27 ⫽ .1, and s ⫽ .225. Using our formula with z .05 ⫽ 1.645 and z .01 ⫽ 2.33, we have Thus, the manufacturer must obtain a random sample of n ⫽ 80 boxes to conduct this test under the specified conditions. n ⫽ .225 2 1.645 ⫹ 2.33 2 .1 2 ⫽ 79.99 ⬇ 80 n ⫽ s 2 z a ⫹ z b 2 ∆ 2 Suppose that after obtaining the sample, we compute ounces. The computed value of the test statistic is Because the rejection region is z ⬍ ⫺1.645, the computed value of z does not fall in the rejection region. What is our conclusion? In similar situations in previous sections, we would have concluded that there is insufficient evidence to reject H . Now, however, knowing that bm ⱕ .01 when m ⱕ 16.27, we would feel safe in our conclusion to accept H : m ⱖ 16.37. Thus, the manufacturer is somewhat secure in concluding that the mean fill from the examined machine is at least 16.37 ounces. With a slight modification of the sample size formula for the one-tailed tests, we can test H : m ⫽ m H a : m ⫽ m for a specified a, b, and ⌬, where b m ⱕ b, whenever 兩m ⫺ m 兩 ⱖ ⌬ Thus, the probability of Type II error is at most b whenever the actual mean differs from m by at least ⌬. A formula for an approximate sample size n when testing a two-sided hypothesis for m is presented here: z ⫽ y ⫺ 16.37 s 兾 1n ⫽ 16.35 ⫺ 16.37 .225 兾 180 ⫽ ⫺ .795 y ⫽ 16.35 Approximate Sample Size for a Two-Sided Test of H : ␮ ⴝ ␮ Note: If s 2 is unknown, substitute an estimated value to get an approximate sample size. n ⬵ s 2 ∆ 2 z a 兾2 ⫹ z b 2

5.6 The Level of Significance of a Statistical Test

In Section 5.4, we introduced hypothesis testing along rather traditional lines: we defined the parts of a statistical test along with the two types of errors and their associated probabilities a and bm a . The problem with this approach is that if other researchers want to apply the results of your study using a different value for a then they must compute a new rejection region before reaching a decision concerning H and H a . An alternative approach to hypothesis testing follows the following steps: specify the null and alternative hypotheses, specify a value for a, collect the sample data, and determine the weight of evidence for rejecting the null hypothesis. This weight, given in terms of a probability, is called the level of significance or p-value of the statistical test. More formally, the level of significance is defined as follows: the probability of obtaining a value of the test statistic that is as likely or more likely to reject H as the actual observed value of the test statistic, assuming that the null hypothesis is true. Thus, if the level of significance is a small value, then the sample data fail to support H and our decision is to reject H . On the other hand, if the level of significance is a large value, then we fail to reject H . We must next decide what is a large or small value for the level of significance. The following decision rule yields results that will always agree with the testing procedures we introduced in Section 5.5. level of significance p-value

A Statistical Test for M

3.

3.

29 30

5.5 Choosing the Sample Size for Testing M

5.6 The Level of Significance of a Statistical Test

Parts

Dokumen yang terkait

Qanun Aceh Nomor 5 Tahun 2010 Tentang Pe

INTRODUCTION An Analysis Of Adjectival Construction On Michael Buble’s Album “To Be Loved 2013".

An Introduction to IG 6th edition

introduction to real analysis third edition robert g bartle and donald r sherbert

An Introduction to Chemical Kinetics

An Introduction to Statistical Analysis in Research WIth Applications in the Biological and Life Sciences

An Introduction to Machine Learning 2nd Edition pdf pdf

Prentice Hall An Introduction To Programming Using Visual Basic 2005 6th Edition Mar 2006 ISBN 0130306541

Starting an Online Business For Dummies, 6th Edition

Methods of Multivariate Analysis Second Edition

Dukungan

Links

A Statistical Test for M

3.

3.

29 30

5.5 Choosing the Sample Size for Testing M

5.6 The Level of Significance of a Statistical Test

Parts

Dokumen yang terkait

Qanun Aceh Nomor 5 Tahun 2010 Tentang Pe

INTRODUCTION An Analysis Of Adjectival Construction On Michael Buble’s Album “To Be Loved 2013".

An Introduction to IG 6th edition

introduction to real analysis third edition robert g bartle and donald r sherbert

An Introduction to Chemical Kinetics

An Introduction to Statistical Analysis in Research WIth Applications in the Biological and Life Sciences

An Introduction to Machine Learning 2nd Edition pdf pdf

Prentice Hall An Introduction To Programming Using Visual Basic 2005 6th Edition Mar 2006 ISBN 0130306541

Starting an Online Business For Dummies, 6th Edition

Methods of Multivariate Analysis Second Edition

Dokumen yang Anda mencari sudah siap untuk unduhkan