7.3 Estimation and Tests for Comparing Two Population Variances

The ratio of the sample variances, $s_1^2/s_2^2$, possesses a probability distribution in repeated sampling that is referred to as an F distribution. The formula for this probability distribution is omitted here, but we will specify its properties.

Properties of the F Distribution

1. Unlike t or z but like $\chi^2$, F can assume only positive values.

2. The F distribution, unlike the normal distribution or the t distribution but like the $\chi^2$ distribution, is nonsymmetrical. See Figure 7.7.

3. There are many F distributions, and each one has a different shape.

We specify a particular one by designating the degrees of freedom associated with $s_1^2$ and $s_2^2$. We denote these quantities by $\mathrm{df}_1$ and $\mathrm{df}_2$, respectively. See Figure 7.7.

4. Tail values for the F distribution are tabulated and appear in Table 8 in the Appendix.

FIGURE 7.7 Densities of two F distributions: $\mathrm{df}_1 = 10$, $\mathrm{df}_2 = 20$ and $\mathrm{df}_1 = 5$, $\mathrm{df}_2 = 10$

Table 8 in the Appendix records upper-tail values of F corresponding to areas $\alpha = .25$, .10, .05, .025, .01, .005, and .001. The degrees of freedom for $s_1^2$, designated by $\mathrm{df}_1$, are indicated across the top of the table; $\mathrm{df}_2$, the degrees of freedom for $s_2^2$, appear in the first column to the left. Values of $\alpha$ are given in the next column. Thus, for $\mathrm{df}_1 = 5$ and $\mathrm{df}_2 = 10$, the critical values of F corresponding to $\alpha = .25$, .10, .05, .025, .01, .005, and .001 are, respectively, 1.59, 2.52, 3.33, 4.24, 5.64, 6.78, and 10.48. It follows that only 5% of the measurements from an F distribution with $\mathrm{df}_1 = 5$ and $\mathrm{df}_2 = 10$ would exceed 3.33 in repeated sampling. See Figure 7.8. Similarly, for $\mathrm{df}_1 = 24$ and $\mathrm{df}_2 = 10$, the critical values of F corresponding to tail areas of $\alpha = .01$ and .001 are, respectively, 4.33 and 7.64.
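These tabled percentiles can also be obtained from statistical software. The short sketch below is a minimal illustration, assuming Python with the SciPy library; it reproduces the upper-tail critical values quoted above for $\mathrm{df}_1 = 5$ and $\mathrm{df}_2 = 10$.

```python
from scipy.stats import f

# Upper-tail critical values F_{alpha, df1, df2}, i.e., P(F >= value) = alpha
df1, df2 = 5, 10
for alpha in [0.25, 0.10, 0.05, 0.025, 0.01, 0.005, 0.001]:
    crit = f.ppf(1 - alpha, dfn=df1, dfd=df2)  # equivalently f.isf(alpha, df1, df2)
    print(f"alpha = {alpha:>5}: F = {crit:.2f}")

# Printed values match Table 8: 1.59, 2.52, 3.33, 4.24, 5.64, 6.78, 10.48
```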

A statistical test comparing $\sigma_1^2$ and $\sigma_2^2$ utilizes the test statistic $s_1^2/s_2^2$. When $\sigma_1^2 = \sigma_2^2$, $\sigma_1^2/\sigma_2^2 = 1$ and $s_1^2/s_2^2$ follows an F distribution with $\mathrm{df}_1 = n_1 - 1$ and $\mathrm{df}_2 = n_2 - 1$. For a one-tailed alternative hypothesis, the designation of which population is 1 and which is 2 is made such that $H_a$ is of the form $\sigma_1^2 > \sigma_2^2$. Then the rejection region is located in the upper tail of the F distribution. We summarize the test procedure next.

FIGURE 7.8 Critical value of the F distribution with $\mathrm{df}_1 = 5$ and $\mathrm{df}_2 = 10$ (upper-tail area = .05)

A Statistical Test Comparing $\sigma_1^2$ and $\sigma_2^2$

$H_0$:  1. $\sigma_1^2 \le \sigma_2^2$    2. $\sigma_1^2 = \sigma_2^2$

$H_a$:  1. $\sigma_1^2 > \sigma_2^2$    2. $\sigma_1^2 \ne \sigma_2^2$

T.S.: $F = s_1^2/s_2^2$

R.R.: For a specified value of $\alpha$ and with $\mathrm{df}_1 = n_1 - 1$ and $\mathrm{df}_2 = n_2 - 1$,

1. Reject $H_0$ if $F \ge F_{\alpha,\mathrm{df}_1,\mathrm{df}_2}$.

2. Reject $H_0$ if $F \le F_{1-\alpha/2,\mathrm{df}_1,\mathrm{df}_2}$ or if $F \ge F_{\alpha/2,\mathrm{df}_1,\mathrm{df}_2}$.

Table 8 in the Appendix provides the upper percentiles of the F distribution. The lower percentiles are obtained from the upper percentiles using the following relationship. Let $F_{\alpha,\mathrm{df}_1,\mathrm{df}_2}$ be the upper $\alpha$ percentile and $F_{1-\alpha,\mathrm{df}_1,\mathrm{df}_2}$ be the lower $\alpha$ percentile of an F distribution with $\mathrm{df}_1$ and $\mathrm{df}_2$. Then

$$F_{1-\alpha,\mathrm{df}_1,\mathrm{df}_2} = \frac{1}{F_{\alpha,\mathrm{df}_2,\mathrm{df}_1}}$$

Note that the degrees of freedom have been reversed for the upper F percentile on the right-hand side of the equation.

EXAMPLE 7.4
Determine the lower .025 percentile for an F distribution with $\mathrm{df}_1 = 7$ and $\mathrm{df}_2 = 10$.

Solution  From Table 8 in the Appendix, the upper .025 percentile for the F distribution with $\mathrm{df}_1 = 10$ and $\mathrm{df}_2 = 7$ is $F_{.025,10,7} = 4.76$. Thus, the lower .025 percentile is given by

$$F_{.975,7,10} = \frac{1}{F_{.025,10,7}} = \frac{1}{4.76} = 0.21$$

EXAMPLE 7.5
In the research study discussed in Chapter 6, we were concerned with assessing the restoration of land damaged by an oil spill. Random samples of 80 tracts from the unaffected and oil spill areas were selected for use in the assessment of how well the oil spill area was restored to its prespill status. Measurements of flora density were taken on each of the 80 tracts. These 80 densities were then used to test whether the unaffected control tracts had a higher mean density than the restored spill sites: $H_a: \mu_{\mathrm{Con}} > \mu_{\mathrm{Spill}}$. A confidence interval was also placed on the effect size $\mu_{\mathrm{Con}} - \mu_{\mathrm{Spill}}$. We mentioned in Chapter 6 that in selecting the test statistic and constructing confidence intervals for $\mu_1 - \mu_2$, we require that the random samples be drawn from normal populations with possibly different means, but the variances need to be equal in order to apply the pooled t procedures. Use the sample data summarized next to test the equality of the population variances for the flora densities. Use $\alpha = .05$.

Control plots: $n_1 = 40$, $\bar{y}_1 = 38.48$, $s_1 = 16.37$
Spill plots: $n_2 = 40$, $\bar{y}_2 = 26.93$, $s_2 = 9.88$

Solution  The four parts of the statistical test of $H_0: \sigma_1^2 = \sigma_2^2$ follow:

$H_0$: $\sigma_1^2 = \sigma_2^2$
$H_a$: $\sigma_1^2 \ne \sigma_2^2$
T.S.: $F = \dfrac{s_1^2}{s_2^2} = \dfrac{(16.37)^2}{(9.88)^2} = 2.75$

Prior to setting the rejection region, we must first determine whether the two random samples appear to be from normally distributed populations. Figures 6.9 and 6.10(a) and (b) indicate that the oil spill sites appear to be selected from a normal distribution. However, the control sites appear to have a distribution somewhat skewed to the left. Although the normality condition is not exactly satisfied, we will still apply the F test to this situation. In the next section, we will introduce a test statistic that is not as sensitive to deviations from normality.

R.R.: For a two-tailed test with $\alpha = .05$, we reject $H_0$ if $F \ge F_{.025,39,39} \approx 1.88$ or if $F \le F_{.975,39,39} \approx 1/1.88 = .53$. We used the values for $\mathrm{df}_1 = \mathrm{df}_2 = 40$ as an approximation, since Table 8 in the Appendix does not have entries for $\mathrm{df}_1 = \mathrm{df}_2 = 39$.

Conclusion: Because $F = 2.75$ exceeds 1.88, we reject $H_0: \sigma_1^2 = \sigma_2^2$ and conclude that the two populations have unequal variances. Thus, our decision to use the separate-variance t test in the analysis of the oil spill data was the correct decision.
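The calculations in Example 7.5 are easy to mirror in software. The sketch below is a minimal illustration, assuming Python with SciPy and treating the summary statistics above as given; it uses the exact percentiles for 39 and 39 degrees of freedom rather than the tabled approximation, and the variable names are ours.

```python
from scipy.stats import f

# Summary statistics from Example 7.5 (control vs. oil spill plots)
n1, s1 = 40, 16.37
n2, s2 = 40, 9.88
alpha = 0.05

F = s1**2 / s2**2                 # test statistic, about 2.75
df1, df2 = n1 - 1, n2 - 1         # 39 and 39

upper = f.ppf(1 - alpha / 2, df1, df2)       # F_{.025,39,39}, close to the tabled 1.88
lower = 1 / f.ppf(1 - alpha / 2, df2, df1)   # F_{.975,39,39}, via the reciprocal rule

reject_H0 = (F >= upper) or (F <= lower)     # True here, so H0 is rejected

# A two-tailed p-value, for reference
p_value = 2 * min(f.cdf(F, df1, df2), f.sf(F, df1, df2))
print(F, lower, upper, reject_H0, p_value)
```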
In Chapter 6, our tests of hypotheses concerned either population means or a shift parameter. For both types of parameters, it was important to provide an estimate of the effect size along with the conclusion of the test of hypotheses. In the case of testing population means, the effect size was in terms of the difference in the two means, $\mu_1 - \mu_2$. When comparing population variances, the appropriate measure is the ratio of the population variances, $\sigma_1^2/\sigma_2^2$. Thus, we need to formulate a confidence interval for the ratio $\sigma_1^2/\sigma_2^2$. A $100(1 - \alpha)\%$ confidence interval for this ratio is given here.

General Confidence Interval for $\sigma_1^2/\sigma_2^2$ with Confidence Coefficient $(1 - \alpha)$

$$\frac{s_1^2}{s_2^2}\,F_L \;\le\; \frac{\sigma_1^2}{\sigma_2^2} \;\le\; \frac{s_1^2}{s_2^2}\,F_U$$

where $F_L = F_{1-\alpha/2,\mathrm{df}_2,\mathrm{df}_1} = 1/F_{\alpha/2,\mathrm{df}_1,\mathrm{df}_2}$ and $F_U = F_{\alpha/2,\mathrm{df}_2,\mathrm{df}_1}$, with $\mathrm{df}_1 = n_1 - 1$ and $\mathrm{df}_2 = n_2 - 1$.

Note: A confidence interval for $\sigma_1/\sigma_2$ is found by taking the square root of the endpoints of the confidence interval for $\sigma_1^2/\sigma_2^2$.

EXAMPLE 7.6
Refer to Example 7.5. We rejected the hypothesis that the variances of flora density for the control and oil spill sites were equal. The researchers would then want to estimate the magnitude of the disagreement in the variances. Using the data in Example 7.5, construct a 95% confidence interval for $\sigma_1^2/\sigma_2^2$.

Solution  The confidence interval for the ratio of the two variances is given by

$$\left(\frac{s_1^2}{s_2^2}\,F_L,\ \frac{s_1^2}{s_2^2}\,F_U\right)$$

where $F_L = F_{1-\alpha/2,n_2-1,n_1-1} = F_{.975,39,39} = 1/F_{.025,39,39} = 1/1.88 = .53$ and $F_U = F_{\alpha/2,n_2-1,n_1-1} = F_{.025,39,39} = 1.88$. Thus, we have the 95% confidence interval

$$\left(\frac{(16.37)^2}{(9.88)^2}(.53),\ \frac{(16.37)^2}{(9.88)^2}(1.88)\right) = (1.45,\ 5.16)$$

Thus, we are 95% confident that the flora density in the control plots is between 1.45 and 5.16 times as variable as the flora density in the oil spill plots.

It should be noted that although our estimation procedure for $\sigma_1^2/\sigma_2^2$ is appropriate for any confidence coefficient $(1 - \alpha)$, Table 8 in the Appendix allows us to construct confidence intervals for $\sigma_1^2/\sigma_2^2$ with the more commonly used confidence coefficients, such as .90, .95, .98, .99, and so on. For more detailed tables of the F distribution, see Pearson and Hartley (1966).
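As a check on Example 7.6, the interval can also be computed directly. The sketch below is a minimal illustration, assuming Python with SciPy; it uses the exact F percentiles for 39 and 39 degrees of freedom, so the endpoints differ slightly from the hand calculation based on the tabled values for 40 and 40.

```python
from scipy.stats import f

n1, s1 = 40, 16.37   # control plots (Example 7.5)
n2, s2 = 40, 9.88    # oil spill plots
alpha = 0.05
df1, df2 = n1 - 1, n2 - 1

ratio = s1**2 / s2**2
F_L = 1 / f.ppf(1 - alpha / 2, df1, df2)   # F_{1-alpha/2, df2, df1} = 1 / F_{alpha/2, df1, df2}
F_U = f.ppf(1 - alpha / 2, df2, df1)       # F_{alpha/2, df2, df1}

ci_variance_ratio = (ratio * F_L, ratio * F_U)            # interval for sigma1^2 / sigma2^2
ci_sd_ratio = tuple(x ** 0.5 for x in ci_variance_ratio)  # interval for sigma1 / sigma2
print(ci_variance_ratio, ci_sd_ratio)
```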
EXAMPLE 7.7
The life length of an electrical component was studied under two operating voltages, 110 and 220. Ten different components were randomly assigned to operate at 110 volts, and 16 different components were randomly assigned to operate at 220 volts. The times to failure (in hundreds of hours) for the 26 components were obtained and yielded the summary statistics in Table 7.3 and the normal probability plots in Figures 7.9 and 7.10.

TABLE 7.3  Life length summary statistics

Voltage   Sample Size   Mean    Standard Deviation
110       10            20.04   .474
220       16             9.99   .233

The researchers wanted to estimate the relative size of the variation in life length under 110 and 220 volts. Use the data to construct a 90% confidence interval for $\sigma_1/\sigma_2$, the ratio of the standard deviations in life length for the components under the two operating voltages.

Solution  Before constructing the confidence interval, it is necessary to check whether the two populations of life lengths are both normally distributed. From the normal probability plots (Figures 7.9 and 7.10), it would appear that both samples of life lengths are from normal distributions.

FIGURE 7.9 Normal probability plot for life length under 110 volts
FIGURE 7.10 Normal probability plot for life length under 220 volts

Next, we need to find the upper and lower $\alpha/2 = .10/2 = .05$ percentiles of the F distribution with $\mathrm{df}_1 = 10 - 1 = 9$ and $\mathrm{df}_2 = 16 - 1 = 15$. From Table 8 in the Appendix, we find

$$F_U = F_{.05,15,9} = 3.01 \qquad\text{and}\qquad F_L = F_{.95,15,9} = \frac{1}{F_{.05,9,15}} = \frac{1}{2.59} = .386$$

Substituting into the confidence interval formula, we have a 90% confidence interval for $\sigma_1^2/\sigma_2^2$:

$$\frac{(.474)^2}{(.233)^2}(.386) \;\le\; \frac{\sigma_1^2}{\sigma_2^2} \;\le\; \frac{(.474)^2}{(.233)^2}(3.01)$$

$$1.5975 \;\le\; \frac{\sigma_1^2}{\sigma_2^2} \;\le\; 12.4569$$

It follows that our 90% confidence interval for $\sigma_1/\sigma_2$ is given by

$$\sqrt{1.5975} \;\le\; \frac{\sigma_1}{\sigma_2} \;\le\; \sqrt{12.4569} \qquad\text{or}\qquad 1.26 \;\le\; \frac{\sigma_1}{\sigma_2} \;\le\; 3.53$$

Thus, we are 90% confident that $\sigma_1$ is between 1.26 and 3.53 times as large as $\sigma_2$.

A simulation study was conducted to investigate the effect on the level of the F test of sampling from heavy-tailed and skewed distributions rather than the required normal distribution. The five distributions were described in Example 7.3. For each pair of sample sizes $(n_1, n_2) = (10, 10)$, $(10, 20)$, or $(20, 20)$, random samples of the specified sizes were selected from one of the five distributions. A test of $H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_a: \sigma_1^2 \ne \sigma_2^2$ was conducted using an F test with $\alpha = .05$. This process was repeated 2,500 times for each of the five distributions and three sets of sample sizes. The results are given in Table 7.4.

TABLE 7.4  Proportion of times $H_0: \sigma_1^2 = \sigma_2^2$ was rejected ($\alpha = .05$)

Sample Sizes   Normal   Uniform   t (df = 5)   Gamma (shape = 1)   Gamma (shape = .1)
(10, 10)       .054     .010      .121         .225                .693
(10, 20)       .056     .0068     .140         .236                .671
(20, 20)       .050     .0044     .150         .264                .673

The values given in Table 7.4 are estimates of the probability of Type I error, $\alpha$, for the F test of equality of two population variances. When the samples are from a normally distributed population, the value of $\alpha$ is nearly equal to the nominal level of .05 for all three pairs of sample sizes. This is to be expected, because the F test was constructed to test hypotheses when the population distributions have normal distributions. However, when the population distribution is a symmetric, short-tailed distribution like the uniform distribution, the value of $\alpha$ is much smaller than the specified value of .05. Thus, the probability of Type II error of the F test would most likely be much larger than what would occur when sampling from normally distributed populations. When the population distributions are symmetric and heavy-tailed, like the t with df = 5, the values of $\alpha$ are two to three times larger than the specified value of .05. Thus, the F test commits many more Type I errors than would be expected when the population distributions are of this type. A similar problem occurs when we sample from skewed population distributions such as the two gamma distributions. In fact, the Type I error rates are extremely large in these situations, thus rendering the F test invalid for these types of distributions.
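The simulation reported in Table 7.4 can be replicated with a few lines of code. The sketch below is a minimal version, assuming Python with NumPy and SciPy; the function name and generator settings are ours rather than the original study's code, and only a few of the distribution and sample-size combinations are shown.

```python
import numpy as np
from scipy.stats import f

def ftest_type1_rate(sampler, n1, n2, alpha=0.05, reps=2500, seed=0):
    """Estimate P(reject H0: sigma1^2 = sigma2^2) when both samples are
    drawn from the same distribution, using the two-tailed F test."""
    rng = np.random.default_rng(seed)
    df1, df2 = n1 - 1, n2 - 1
    upper = f.ppf(1 - alpha / 2, df1, df2)   # upper alpha/2 critical value
    lower = f.ppf(alpha / 2, df1, df2)       # lower alpha/2 critical value
    rejections = 0
    for _ in range(reps):
        y1 = sampler(rng, n1)
        y2 = sampler(rng, n2)
        F = np.var(y1, ddof=1) / np.var(y2, ddof=1)
        rejections += (F >= upper) or (F <= lower)
    return rejections / reps

# A few of the population shapes from Table 7.4 (H0 holds because both
# samples share the same distribution, hence equal variances):
print(ftest_type1_rate(lambda rng, n: rng.normal(size=n), 10, 10))         # near .05
print(ftest_type1_rate(lambda rng, n: rng.uniform(size=n), 10, 10))        # well below .05
print(ftest_type1_rate(lambda rng, n: rng.standard_t(5, size=n), 10, 10))  # above .05
print(ftest_type1_rate(lambda rng, n: rng.gamma(1.0, size=n), 10, 10))     # far above .05
```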

7.4 Tests for Comparing t > 2 Population Variances

In the previous section, we discussed a method for comparing variances from two normally distributed populations based on taking independent random samples from the populations. In many situations, we will need to compare more than two populations. For example, we may want to compare the variability in the level of nutrients of five different suppliers of a feed supplement or the variability in scores of the students using SAT preparatory materials from the three major publishers of those materials. Thus, we need to develop a statistical test that will allow us to compare $t > 2$ population variances. We will consider two procedures. The first procedure, Hartley's test, is very simple to apply but has the restriction that the population distributions must be normally distributed and the sample sizes must be equal. The second procedure, the Brown-Forsythe-Levene (BFL) test, is more complex in its computations but does not restrict the population distributions or the sample sizes. The BFL test can be obtained from many of the statistical software packages; for example, SAS and Minitab both use the BFL test for comparing population variances.
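For readers working in Python rather than SAS or Minitab, one readily available implementation of the BFL approach is SciPy's `levene` function with the median used as the center of each group, which corresponds to the Brown-Forsythe variant. A minimal sketch, using hypothetical placeholder data:

```python
from scipy.stats import levene

# g1, g2, g3 are lists of measurements, one per population; any number of
# groups may be passed. The values below are illustrative placeholders only.
g1 = [4.1, 3.8, 4.4, 4.0, 4.3]
g2 = [3.9, 4.6, 4.2, 4.8, 4.5]
g3 = [5.0, 3.5, 4.9, 4.1, 3.7]

# center='median' gives the Brown-Forsythe version of Levene's test (BFL).
stat, p_value = levene(g1, g2, g3, center='median')
print(stat, p_value)   # a small p-value suggests the variances are not all equal
```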

H. O. Hartley (1950) developed the Hartley $F_{\max}$ test for evaluating the hypotheses

$$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_t^2 \qquad\text{vs.}\qquad H_a: \text{the } \sigma_i^2\text{s are not all equal}$$

The Hartley $F_{\max}$ test requires that we have independent random samples of the same size n from t normally distributed populations. With the exception that we require $n_1 = n_2 = \cdots = n_t = n$, the Hartley test is a logical extension of the F test from the previous section for testing $t = 2$ variances. With $s_i^2$ denoting the sample variance computed from the ith sample, let $s_{\min}^2$ be the smallest of the $s_i^2$s and $s_{\max}^2$ be the largest of the $s_i^2$s. The Hartley test statistic is

$$F_{\max} = \frac{s_{\max}^2}{s_{\min}^2}$$

The test procedure is summarized here.

Hartley's $F_{\max}$ Test for Homogeneity of Population Variances

$H_0$: $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_t^2$ (homogeneity of variances)

$H_a$: Population variances are not all equal

T.S.: $F_{\max} = s_{\max}^2 / s_{\min}^2$

R.R.: For a specified value of $\alpha$, reject $H_0$ if $F_{\max}$ exceeds the tabulated F value (Table 12 in the Appendix) for the specified t and $\mathrm{df}_2 = n - 1$, where n is the common sample size for the t random samples.

Check assumptions and draw conclusions.

We will illustrate the application of the Hartley test with the following example.

EXAMPLE 7.8
Wludyka and Nelson [Technometrics (1997), 39:274-285] describe the following experiment. In the manufacture of soft contact lenses, a monomer is injected into a plastic frame, the monomer is subjected to ultraviolet light and heated (the time, temperature, and light intensity are varied), the frame is removed, and the lens is hydrated. It is thought that temperature can be manipulated to target the power (the strength of the lens), so interest is in comparing the variability in power. The data are coded deviations from target power using monomers from three different suppliers, as shown in Table 7.5. We wish to test $H_0: \sigma_1^2 = \sigma_2^2 = \sigma_3^2$.

TABLE 7.5  Deviations from target power for three suppliers

Supplier   Samples 1 through 9                                                  n   $s_i^2$
1          191.9  189.1  190.9  183.8  185.5  190.9  192.8  188.4  189.0        9   8.69
2          178.2  174.1  170.3  171.6  171.7  174.7  176.0  176.6  172.8        9   6.89
3          218.6  208.4  187.1  199.5  202.0  211.1  197.6  204.4  206.8        9   80.22

Solution  Before conducting the Hartley test, we must check the normality condition. The data are evaluated for normality using the boxplots given in Figure 7.11.

FIGURE 7.11 Boxplot of deviations from target power for three suppliers

All three data sets appear to be from normally distributed populations. Thus, we will apply the Hartley $F_{\max}$ test to the data sets. From Table 12 in the Appendix, with $\alpha = .05$, $t = 3$, and $\mathrm{df}_2 = 9 - 1 = 8$, we have $F_{\max,.05} = 6.00$. Thus, our rejection region will be

R.R.: Reject $H_0$ if $F_{\max} \ge F_{\max,.05} = 6.00$

Because $s_{\min}^2 = \min(8.69, 6.89, 80.22) = 6.89$ and $s_{\max}^2 = \max(8.69, 6.89, 80.22) = 80.22$, we have

$$F_{\max} = \frac{s_{\max}^2}{s_{\min}^2} = \frac{80.22}{6.89} = 11.64 > 6.00$$

Thus, we reject $H_0$ and conclude that the variances are not all equal.

If the sample sizes are not all equal, we can take $n = n_{\max}$, where $n_{\max}$ is the largest sample size. The resulting $F_{\max}$ test no longer has an exact level $\alpha$. In fact, the test is liberal in the sense that the probability of Type I error is slightly more than the nominal value $\alpha$. Thus, the test is more likely to falsely reject $H_0$ than the test having all $n_i$s equal when sampling from normal populations with the variances all equal.
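The arithmetic in Example 7.8 is simple enough to script. The sketch below is a minimal illustration, assuming Python with NumPy; it computes $F_{\max}$ from the raw supplier data, while the critical value 6.00 is taken from Table 12 as quoted above, since the $F_{\max}$ distribution is not part of standard libraries.

```python
import numpy as np

# Deviations from target power for the three suppliers (Table 7.5)
supplier1 = [191.9, 189.1, 190.9, 183.8, 185.5, 190.9, 192.8, 188.4, 189.0]
supplier2 = [178.2, 174.1, 170.3, 171.6, 171.7, 174.7, 176.0, 176.6, 172.8]
supplier3 = [218.6, 208.4, 187.1, 199.5, 202.0, 211.1, 197.6, 204.4, 206.8]

# Sample variances (ddof=1 gives the usual unbiased estimator s_i^2)
variances = [np.var(s, ddof=1) for s in (supplier1, supplier2, supplier3)]

F_max = max(variances) / min(variances)   # Hartley test statistic, about 11.6

F_max_crit = 6.00   # Table 12 value for alpha = .05, t = 3, df2 = 8 (Example 7.8)
reject_H0 = F_max >= F_max_crit           # True: variances are judged not all equal
print(variances, F_max, reject_H0)
```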