2. Jay L. Devore Probability and Statistics

increases, and it increases as the variance $\sigma^2$ increases, since greater underlying variability makes it more difficult to detect any given departure from $H_0$. Because hand computation of $\beta$ and sample size determination for the F test are quite difficult (as in the case of t tests), statisticians have constructed sets of curves from which $\beta$ can be obtained. Sets of curves for numerator df $\nu_1 = 3$ and $\nu_1 = 4$ are displayed in Figure 10.5 and Figure 10.6, respectively. After the values of $\sigma^2$ and the $\alpha_i$'s for which $\beta$ is desired are specified, these are used to compute the value of $\phi$, where $\phi^2 = (J/I)\sum \alpha_i^2/\sigma^2$. We then enter the appropriate set of curves at the value of $\phi$ on the horizontal axis, move up to the curve associated with error df $\nu_2$, and move over to the value of power on the vertical axis. Finally, $\beta = 1 - \text{power}$.

[Figure 10.5 Power curves for the ANOVA F test ($\nu_1 = 3$). Horizontal axis: $\phi$, with separate scales for $\alpha = .05$ and $\alpha = .01$; vertical axis: power $= 1 - \beta$; one curve for each error df $\nu_2 = 6, 7, 8, 9, 10, 12, 15, 20, 30, 60, \infty$ at each $\alpha$.]

[Figure 10.6 Power curves for the ANOVA F test ($\nu_1 = 4$). Same axes and error df values as Figure 10.5.]

From E. S. Pearson and H. O. Hartley, “Charts of the Power Function for Analysis of Variance Tests, Derived from the Non-central F Distribution,” Biometrika, vol. 38, 1951: 112, by permission of Biometrika Trustees.

Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience.
Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.

Example 10.8
The effects of four different heat treatments on yield point (tons/in$^2$) of steel ingots are to be investigated. A total of eight ingots will be cast using each treatment. Suppose the true standard deviation of yield point for any of the four treatments is $\sigma = 1$. How likely is it that $H_0$ will not be rejected at level .05 if three of the treatments have the same expected yield point and the other treatment has an expected yield point that is 1 ton/in$^2$ greater than the common value of the other three (i.e., the fourth yield is on average 1 standard deviation above those for the first three treatments)?

Suppose that $\mu_1 = \mu_2 = \mu_3$ and $\mu_4 = \mu_1 + 1$. Then $\mu = \sum \mu_i/4 = \mu_1 + \tfrac{1}{4}$, so $\alpha_1 = \mu_1 - \mu = -\tfrac{1}{4}$, $\alpha_2 = -\tfrac{1}{4}$, $\alpha_3 = -\tfrac{1}{4}$, $\alpha_4 = \tfrac{3}{4}$, and

$\phi^2 = \frac{8}{4}\left[\left(-\tfrac{1}{4}\right)^2 + \left(-\tfrac{1}{4}\right)^2 + \left(-\tfrac{1}{4}\right)^2 + \left(\tfrac{3}{4}\right)^2\right] = \frac{3}{2}$

and $\phi = 1.22$. Degrees of freedom for the F test are $\nu_1 = I - 1 = 3$ and $\nu_2 = I(J - 1) = 28$, so interpolating visually between $\nu_2 = 20$ and $\nu_2 = 30$ gives power $\approx .47$ and $\beta \approx .53$. This $\beta$ is rather large, so we might decide to increase the value of J. How many ingots of each type would be required to yield $\beta \approx .05$ for the alternative under consideration? By trying different values of J, it can be verified that $J = 24$ will meet the requirement, but any smaller J will not. ■

As an alternative to the use of power curves, the SAS statistical software package has a function that calculates the cumulative area under a noncentral F curve (inputs $F_\alpha$, numerator df, denominator df, and $\phi^2$), and this area is $\beta$. Minitab does this and also something rather different. The user is asked to specify the maximum difference between $\mu_i$'s rather than the individual means. For example, we might wish to calculate the power of the test when $I = 4$, $\mu_1 = 100$, $\mu_2 = 101$, $\mu_3 = 102$, and $\mu_4 = 106$. Then the maximum difference is $106 - 100 = 6$. However, the power depends not only on this maximum difference but on the values of all the $\mu_i$'s. In this situation Minitab calculates the smallest possible value of power subject to $\mu_1 = 100$ and $\mu_4 = 106$, which occurs when the two other $\mu$'s are both halfway between 100 and 106.
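The value of $\phi^2$ for the heat treatment example, and the claim that the halfway configuration is the least favorable one, can be checked numerically. The following Python sketch is illustrative only. The function name `phi_squared` is ours, and because the Minitab illustration does not state $\sigma$ or $J$, the values $\sigma = 1$ and $J = 1$ used for it are placeholder assumptions that change the scale of $\phi^2$ but not the comparison between configurations.

```python
def phi_squared(means, sigma, J):
    """phi^2 = (J / I) * sum(alpha_i^2) / sigma^2 for a common sample size J."""
    I = len(means)
    mu = sum(means) / I                       # average of the treatment means
    sum_alpha_sq = sum((m - mu) ** 2 for m in means)
    return (J / I) * sum_alpha_sq / sigma ** 2

# Heat treatment example: three equal means, the fourth larger by 1 (sigma = 1, J = 8).
phi2_example = phi_squared([0, 0, 0, 1], sigma=1, J=8)
print("phi^2 =", phi2_example, " phi =", round(phi2_example ** 0.5, 2))

# Minitab illustration (sigma = 1 and J = 1 are placeholders, see lead-in).
phi2_stated = phi_squared([100, 101, 102, 106], sigma=1, J=1)
phi2_halfway = phi_squared([100, 103, 103, 106], sigma=1, J=1)
print("stated means:", phi2_stated, " halfway means:", phi2_halfway)

# Search a grid of values for the two middle means; the smallest phi^2 should be at 103, 103.
grid = [100 + 0.5 * k for k in range(13)]     # 100.0, 100.5, ..., 106.0
best = min(
    ((a, b) for a in grid for b in grid),
    key=lambda ab: phi_squared([100, ab[0], ab[1], 106], sigma=1, J=1),
)
print("least favorable middle means on the grid:", best)
```

Because power increases with $\phi$ when everything else is held fixed, the configuration with the smallest $\phi^2$ is the one with the smallest power, which is why a power computed at the halfway configuration serves as a lower bound.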
If this power is .85, then we can say that the power is at least .85 and $\beta$ is at most .15 when the two most extreme $\mu$'s are separated by 6 (the common sample size, $\alpha$, and $\sigma$ must also be specified). The software will also determine the necessary common sample size if maximum difference and minimum power are specified.

Relationship of the F Test to the t Test

When the number of treatments or populations is $I = 2$, all formulas and results connected with the F test still make sense, so ANOVA can be used to test $H_0\colon \mu_1 = \mu_2$ versus $H_a\colon \mu_1 \neq \mu_2$. In this case, a two-tailed, two-sample t test can also be used. In Section 9.3, we mentioned the pooled t test, which requires equal variances, as an alternative to the two-sample t procedure. It can be shown that the single-factor ANOVA F test and the two-tailed pooled t test are equivalent; for any given data set, the P-values for the two tests will be identical, so the same conclusion will be reached by either test.

The two-sample t test is more flexible than the F test when $I = 2$ for two reasons. First, it is valid without the assumption that $\sigma_1 = \sigma_2$; second, it can be used to test $H_a\colon \mu_1 > \mu_2$ (an upper-tailed t test) or $H_a\colon \mu_1 < \mu_2$ as well as $H_a\colon \mu_1 \neq \mu_2$. In the case of $I \geq 3$, there is unfortunately no general test procedure known to have good properties without assuming equal variances.
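The equivalence of the two tests when $I = 2$ shows up numerically as $f = t^2$. Here is a minimal Python sketch of that identity. It uses the permanent molding and die casting samples from the elastic modulus data given later in this section (Example 10.9) purely as convenient input, and the function names are ours.

```python
import math

def pooled_t(x, y):
    """Pooled two-sample t statistic for H0: mu_1 = mu_2."""
    m, n = len(x), len(y)
    xbar, ybar = sum(x) / m, sum(y) / n
    ss_x = sum((v - xbar) ** 2 for v in x)
    ss_y = sum((v - ybar) ** 2 for v in y)
    sp2 = (ss_x + ss_y) / (m + n - 2)         # pooled variance estimate
    return (xbar - ybar) / math.sqrt(sp2 * (1 / m + 1 / n))

def anova_f(groups):
    """Single-factor ANOVA f = MSTr / MSE (sample sizes may differ)."""
    n = sum(len(g) for g in groups)
    num_groups = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    sstr = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    sse = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (sstr / (num_groups - 1)) / (sse / (n - num_groups))

permanent = [45.5, 45.3, 45.4, 44.4, 44.6, 43.9, 44.6, 44.0]
die_cast = [44.2, 43.9, 44.7, 44.2, 44.0, 43.8, 44.6, 43.1]

t_stat = pooled_t(permanent, die_cast)
f_stat = anova_f([permanent, die_cast])
print("t^2 =", t_stat ** 2, " f =", f_stat)   # the two values agree up to rounding error
```

The P-values agree as well, because the square of a t variable with $\nu$ df has an F distribution with 1 and $\nu$ df.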

Unequal Sample Sizes

When the sample sizes from each population or treatment are not equal, let $J_1, J_2, \ldots, J_I$ denote the I sample sizes, and let $n = \sum_i J_i$ denote the total number of observations. The accompanying box gives ANOVA formulas and the test procedure.

$SST = \sum_{i=1}^{I}\sum_{j=1}^{J_i}(X_{ij} - \bar{X}_{..})^2 = \sum_{i=1}^{I}\sum_{j=1}^{J_i} X_{ij}^2 - \frac{1}{n}X_{..}^2$,  df $= n - 1$

$SSTr = \sum_{i=1}^{I}\sum_{j=1}^{J_i}(\bar{X}_{i.} - \bar{X}_{..})^2 = \sum_{i=1}^{I}\frac{1}{J_i}X_{i.}^2 - \frac{1}{n}X_{..}^2$,  df $= I - 1$

$SSE = \sum_{i=1}^{I}\sum_{j=1}^{J_i}(X_{ij} - \bar{X}_{i.})^2 = SST - SSTr$,  df $= \sum (J_i - 1) = n - I$

Test statistic value: $f = \dfrac{MSTr}{MSE}$, where $MSTr = \dfrac{SSTr}{I - 1}$ and $MSE = \dfrac{SSE}{n - I}$

Rejection region: $f \geq F_{\alpha, I-1, n-I}$

Example 10.9
The article “On the Development of a New Approach for the Determination of Yield Strength in Mg-based Alloys” (Light Metal Age, Oct. 1998: 51–53) presented the following data on elastic modulus (GPa) obtained by a new ultrasonic method for specimens of a certain alloy produced using three different casting processes.

Process             Observations                                      J_i    x_i.     x̄_i.
Permanent molding   45.5  45.3  45.4  44.4  44.6  43.9  44.6  44.0     8    357.7    44.71
Die casting         44.2  43.9  44.7  44.2  44.0  43.8  44.6  43.1     8    352.5    44.06
Plaster molding     46.0  45.9  44.8  46.2  45.1  45.5                 6    273.5    45.58
Total                                                                 22    983.7

Let $\mu_1$, $\mu_2$, and $\mu_3$ denote the true average elastic moduli for the three different processes under the given circumstances. The relevant hypotheses are $H_0\colon \mu_1 = \mu_2 = \mu_3$ versus $H_a$: at least two of the $\mu_i$'s are different.
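The sums of squares that the text works out next for these data can be checked with a few lines of Python. This is a sketch of the computing formulas for unequal sample sizes, not a substitute for a statistical package; the function and variable names are ours.

```python
# Elastic modulus (GPa) samples as listed in the example's data table.
samples = [
    [45.5, 45.3, 45.4, 44.4, 44.6, 43.9, 44.6, 44.0],  # permanent molding
    [44.2, 43.9, 44.7, 44.2, 44.0, 43.8, 44.6, 43.1],  # die casting
    [46.0, 45.9, 44.8, 46.2, 45.1, 45.5],              # plaster molding
]

def one_way_anova(groups):
    """Single-factor ANOVA quantities; sample sizes J_i may differ."""
    num_groups = len(groups)                      # I
    n = sum(len(g) for g in groups)               # total number of observations
    grand_total = sum(sum(g) for g in groups)     # x..
    cf = grand_total ** 2 / n                     # correction factor x..^2 / n
    sst = sum(x * x for g in groups for x in g) - cf
    sstr = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    sse = sst - sstr
    mstr = sstr / (num_groups - 1)
    mse = sse / (n - num_groups)
    return {
        "df": (num_groups - 1, n - num_groups),
        "SST": sst, "SSTr": sstr, "SSE": sse,
        "MSTr": mstr, "MSE": mse, "f": mstr / mse,
    }

table = one_way_anova(samples)
for name, value in table.items():
    print(name, value)
```

The resulting f would then be compared with an F critical value for the two df values in `table["df"]`, taken from a table or from software.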
The test statistic is, of course, $F = MSTr/MSE$, based on $I - 1 = 2$ numerator df and $n - I = 22 - 3 = 19$ denominator df. Relevant quantities include

$\sum\sum x_{ij}^2 = 43{,}998.73$    $CF = \dfrac{983.7^2}{22} = 43{,}984.80$

$SST = 43{,}998.73 - 43{,}984.80 = 13.93$

$SSTr = \dfrac{357.7^2}{8} + \dfrac{352.5^2}{8} + \dfrac{273.5^2}{6} - 43{,}984.80 = 7.93$

$SSE = 13.93 - 7.93 = 6.00$

The remaining computations are displayed in the accompanying ANOVA table. Since $F_{.001,2,19} = 10.16 < 12.56 = f$, the P-value is smaller than .001. Thus the null hypothesis should be rejected at any reasonable significance level; there is compelling evidence for concluding that a true average elastic modulus somehow depends on which casting process is used.

Source of Variation    df    Sum of Squares    Mean Square    f
Treatments              2         7.93            3.965       12.56
Error                  19         6.00             .3158
Total                  21        13.93
■

There is more controversy among statisticians regarding which multiple comparisons procedure to use when sample sizes are unequal than there is in the case of equal sample sizes. The procedure that we present here is recommended in the excellent book Beyond ANOVA: Basics of Applied Statistics (see the chapter bibliography) for use when the I sample sizes $J_1, J_2, \ldots, J_I$ are reasonably close to one another (“mild imbalance”). It modifies Tukey’s method by using averages of pairs of $1/J_i$’s in place of $1/J$.

Let

$w_{ij} = Q_{\alpha, I, n-I} \cdot \sqrt{\dfrac{MSE}{2}\left(\dfrac{1}{J_i} + \dfrac{1}{J_j}\right)}$

Then the probability is approximately $1 - \alpha$ that

$\bar{X}_{i.} - \bar{X}_{j.} - w_{ij} \leq \mu_i - \mu_j \leq \bar{X}_{i.} - \bar{X}_{j.} + w_{ij}$

for every i and j ($i = 1, \ldots, I$ and $j = 1, \ldots, I$) with $i \neq j$.

The simultaneous confidence level $100(1 - \alpha)\%$ is only approximate rather than exact as it is with equal sample sizes. Underscoring can still be used, but now the $w_{ij}$ factor used to decide whether $\bar{x}_{i.}$ and $\bar{x}_{j.}$ can be connected will depend on $J_i$ and $J_j$.

Example 10.10 (Example 10.9 continued)
The sample sizes for the elastic modulus data were $J_1 = 8$, $J_2 = 8$, $J_3 = 6$, and $I = 3$, $n - I = 19$, $MSE = .316$. A simultaneous confidence level of approximately 95% requires $Q_{.05,3,19} = 3.59$, from which

$w_{12} = 3.59\sqrt{\dfrac{.316}{2}\left(\dfrac{1}{8} + \dfrac{1}{8}\right)} = .713$,  $w_{13} = .771$,  $w_{23} = .771$

Since $\bar{x}_{1.} - \bar{x}_{2.} = 44.71 - 44.06 = .65 < w_{12}$, $\mu_1$ and $\mu_2$ are judged not significantly different. The accompanying underscoring scheme shows that $\mu_1$ and $\mu_3$ appear to differ significantly, as do $\mu_2$ and $\mu_3$.

2. Die     1. Permanent     3. Plaster
44.06      44.71            45.58
_____________________
■

Data Transformation

The use of ANOVA methods can be invalidated by substantial differences in the variances $\sigma_1^2, \ldots, \sigma_I^2$ (which until now have been assumed equal with common value $\sigma^2$). It sometimes happens that $V(X_{ij}) = \sigma_i^2 = g(\mu_i)$, a known function of $\mu_i$ (so that when $H_0$ is false, the variances are not equal). For example, if $X_{ij}$ has a Poisson distribution with parameter $\lambda_i$ (approximately normal if $\lambda_i \geq 10$), then $\mu_i = \lambda_i$ and $\sigma_i^2 = \lambda_i$, so $g(\mu_i) = \mu_i$ is the known function. In such cases, one can often transform the $X_{ij}$’s to $h(X_{ij})$ so that they will have approximately equal variances (while leaving the transformed variables approximately normal), and then the F test can be used on the transformed observations.
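To see what such a transformation buys in the Poisson case, the variance of the transformed variable can be computed directly from the Poisson probabilities. The sketch below assumes the square-root transformation, which is the choice the text arrives at for Poisson data; the values of $\lambda$, the truncation point of the sum, and the function names are our own illustrative choices.

```python
import math

def poisson_pmf(lam, kmax):
    """Return [P(X = 0), ..., P(X = kmax)] for X ~ Poisson(lam)."""
    probs = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        probs.append(probs[-1] * lam / k)     # P(X = k) = P(X = k - 1) * lam / k
    return probs

def var_of_sqrt(lam):
    """V(sqrt(X)) for X ~ Poisson(lam), by summing over the pmf."""
    kmax = int(lam + 20 * math.sqrt(lam)) + 20    # truncation point, assumed ample
    probs = poisson_pmf(lam, kmax)
    mean_sqrt = sum(math.sqrt(k) * p for k, p in enumerate(probs))
    mean_x = sum(k * p for k, p in enumerate(probs))  # E[(sqrt X)^2] = E[X]
    return mean_x - mean_sqrt ** 2

results = {lam: var_of_sqrt(lam) for lam in (10, 25, 100)}
for lam, v in results.items():
    print(f"lambda = {lam:>3}: V(X) = {lam:>3}, V(sqrt(X)) is about {v:.3f}")
```

The untransformed variances differ by a factor of ten across these three values of $\lambda$, whereas the variances of $\sqrt{X}$ all stay close to 1/4.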
The key idea in choosing $h(\cdot)$ is that often $V[h(X_{ij})] \approx V(X_{ij}) \cdot [h'(\mu_i)]^2 = g(\mu_i) \cdot [h'(\mu_i)]^2$. We now wish to find the function $h(\cdot)$ for which $g(\mu_i) \cdot [h'(\mu_i)]^2 = c$ (a constant) for every i.

PROPOSITION
If $V(X_{ij}) = g(\mu_i)$, a known function of $\mu_i$, then a transformation $h(X_{ij})$ that “stabilizes the variance” so that $V[h(X_{ij})]$ is approximately the same for each i is given by $h(x) \propto \int [g(x)]^{-1/2}\,dx$.

In the Poisson case, $g(x) = x$, so $h(x)$ should be proportional to $\int x^{-1/2}\,dx = 2x^{1/2}$. Thus Poisson data should be transformed to $h(x_{ij}) = \sqrt{x_{ij}}$ before the analysis.

A Random Effects Model

The single-factor problems considered so far have all been assumed to be examples of a fixed effects ANOVA model. By this we mean that the chosen levels of the factor under study are the only ones considered relevant by the experimenter. The single-factor fixed effects model is

$X_{ij} = \mu + \alpha_i + \epsilon_{ij}$,  $\sum \alpha_i = 0$    (10.6)

where the $\epsilon_{ij}$’s are random and both $\mu$ and the $\alpha_i$’s are fixed parameters.

In some single-factor problems, the particular levels studied by the experimenter are chosen, either by design or through sampling, from a large population of levels. For example, to study the effects on task performance time of using different operators on a particular machine, a sample of five operators might be chosen from a large pool of operators. Similarly, the effect of soil pH on the yield of maize plants might be studied by using soils with four specific pH values chosen from among the many possible pH levels. When the levels used are selected at random from a larger population of possible levels, the factor is said to be random rather than fixed, and the fixed effects model (10.6) is no longer appropriate. An analogous random effects model is obtained by replacing the fixed $\alpha_i$’s in (10.6) by random variables.

$X_{ij} = \mu + A_i + \epsilon_{ij}$  with  $E(A_i) = E(\epsilon_{ij}) = 0$,  $V(\epsilon_{ij}) = \sigma^2$,  $V(A_i) = \sigma_A^2$    (10.7)

all $A_i$’s and $\epsilon_{ij}$’s normally distributed and independent of one another.

The condition $E(A_i) = 0$ in (10.7) is similar to the condition $\sum \alpha_i = 0$ in (10.6); it states that the expected or average effect of the ith level measured as a departure from $\mu$ is zero.

For the random effects model (10.7), the hypothesis of no effects due to different levels is $H_0\colon \sigma_A^2 = 0$, which says that different levels of the factor contribute nothing to variability of the response.
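One way to see why this hypothesis is expressed through $\sigma_A^2$ is to write out the variance of a single observation under (10.7). The short derivation below uses only the independence of $A_i$ and $\epsilon_{ij}$ stated in the model; it is added here as a supporting step.

```latex
\begin{align*}
V(X_{ij}) &= V(\mu + A_i + \epsilon_{ij}) \\
          &= V(A_i) + V(\epsilon_{ij})
             && \text{($\mu$ is a constant; $A_i$ and $\epsilon_{ij}$ are independent)} \\
          &= \sigma_A^2 + \sigma^2 .
\end{align*}
```

When $\sigma_A^2 = 0$, the variance of each observation reduces to $\sigma^2$, so the choice of level adds nothing to the variability of the response.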
Although the hypotheses in the single-factor fixed and random effects models are different, they are tested in exactly the same way, by forming $F = MSTr/MSE$ and rejecting $H_0$ if $f \geq F_{\alpha, I-1, n-I}$. This can be justified intuitively by noting that $E(MSE) = \sigma^2$ (as for fixed effects), whereas

$E(MSTr) = \sigma^2 + \dfrac{1}{I - 1}\left(n - \dfrac{\sum J_i^2}{n}\right)\sigma_A^2$    (10.8)

where $J_1, J_2, \ldots, J_I$ are the sample sizes and $n = \sum J_i$. The factor in parentheses on the right side of (10.8) is nonnegative, so again $E(MSTr) = \sigma^2$ if $H_0$ is true and $E(MSTr) > \sigma^2$ if $H_0$ is false.

Example 10.11
The study of nondestructive forces and stresses in materials furnishes important information for efficient engineering design. The article “Zero-Force Travel-Time Parameters for Ultrasonic Head-Waves in Railroad Rail” (Materials Evaluation, 1985: 854–858) reports on a study of travel time for a certain type of wave that results from longitudinal stress of rails used for railroad track. Three measurements were made on each of six rails randomly selected from a population of rails. The investigators used random effects ANOVA to decide whether some variation in travel time could be attributed to “between-rail variability.” The data is given in the accompanying table (each value, in nanoseconds, resulted from subtracting 36.1 μs from the original observation) along with the derived ANOVA table. The value f is highly significant, so $H_0\colon \sigma_A^2 = 0$ is rejected in favor of the conclusion that differences between rails are a source of travel-time variability.

Rail    Observations      x_i.
1:      55    53    54     162
2:      26    37    32      95
3:      78    91    85     254
4:      92   100    96     288
5:      49    51    50     150
6:      80    85    83     248
                    x.. = 1197

Source of Variation    df    Sum of Squares    Mean Square    f
Treatments              5        9310.5          1862.1       115.2
Error                  12         194.0            16.17
Total                  17        9504.5
■

EXERCISES Section 10.3 (22–34)

22. The following data refers to yield of tomatoes (kg/plot) for four different levels of salinity. Salinity level here refers to electrical conductivity (EC), where the chosen levels were EC = 1.6, 3.8, 6.0, and 10.2 nmhos/cm.

 1.6:  59.5  53.3  56.8  63.1  58.7
 3.8:  55.2  59.1  52.8  54.5
 6.0:  51.7  48.8  53.9  49.0
10.2:  44.6  48.5  41.0  47.3  46.1

Use the F test at level $\alpha = .05$ to test for any differences in true average yield due to the different salinity levels.

23. Apply the modified Tukey’s method to the data in Exercise 22 to identify significant differences among the $\mu_i$’s.

24. The accompanying summary data on skeletal-muscle CS activity (nmol/min/mg) appeared in the article “Impact of Lifelong Sedentary Behavior on Mitochondrial Function of Mice Skeletal Muscle” (J. of Gerontology, 2009: 927–939):

               Young    Old Sedentary    Old Active
Sample size      10           8              10
Sample mean    46.68        47.71          58.24
Sample sd       7.16         5.59           8.43

Carry out a test to decide whether true average activity differs for the three groups. If appropriate, investigate differences amongst the means with a multiple comparisons method.

25.
Lipids provide much of the dietary energy in the bodies of infants and young children. There is a growing interest in the quality of the dietary lipid supply during infancy as a major determinant of growth, visual and neural development, and long-term health. The article “Essential Fat Requirements of Preterm Infants” (Amer. J. of Clinical Nutrition, 2000: 245S–250S) reported the following data on total polyunsaturated fats for infants who were randomized to four different feeding regimens: breast milk, corn-oil-based formula, soy-oil-based formula, or soy-and-marine-oil-based formula:

Regimen        Sample Size    Sample Mean    Sample SD
Breast milk         8            43.0           1.5
CO                 13            42.4           1.3
SO                 17            43.1           1.2
SMO                14            43.5           1.2

a. What assumptions must be made about the four total polyunsaturated fat distributions before carrying out a single-factor ANOVA to decide whether there are any differences in true average fat content?
b. Carry out the test suggested in part (a). What can be said about the P-value?

26. Samples of six different brands of diet/imitation margarine were analyzed to determine the level of physiologically active polyunsaturated fatty acids (PAPFUA, in percentages), resulting in the following data:

Imperial         14.1  13.6  14.4  14.3
Parkay           12.8  12.5  13.4  13.0  12.3
Blue Bonnet      13.5  13.4  14.1  14.3
Chiffon          13.2  12.7  12.6  13.9
Mazola           16.8  17.2  16.4  17.3  18.0
Fleischmann’s    18.1  17.2  18.7  18.4

(The preceding numbers are fictitious, but the sample means agree with data reported in the January 1975 issue of Consumer Reports.)
a. Use ANOVA to test for differences among the true average PAPFUA percentages for the different brands.
b. Compute CIs for all $(\mu_i - \mu_j)$’s.
c. Mazola and Fleischmann’s are corn-based, whereas the others are soybean-based. Compute a CI for
$\dfrac{\mu_1 + \mu_2 + \mu_3 + \mu_4}{4} - \dfrac{\mu_5 + \mu_6}{2}$
[Hint: Modify the expression for $V(\hat{\theta})$ that led to (10.5) in the previous section.]

27. Although tea is the world’s most widely consumed beverage after water, little is known about its nutritional value. Folacin is the only B vitamin present in any significant amount in tea, and recent advances in assay methods have made accurate determination of folacin content feasible. Consider the accompanying data on folacin content for randomly selected specimens of the four leading brands of green tea.

1:  7.9  6.2  6.6  8.6  8.9  10.1  9.6
2:  5.7  7.5  9.8  6.1  8.4
3:  6.8  7.5  5.0  7.4  5.3  6.1
4:  6.4  7.1  7.9  4.5  5.0  4.0

(Data is based on “Folacin Content of Tea,” J. of the Amer. Dietetic Assoc., 1983: 627–632.) Does this data suggest that true average folacin content is the same for all brands?
a. Carry out a test using $\alpha = .05$ via the P-value method.
b. Assess the plausibility of any assumptions required for your analysis in part (a).
c. Perform a multiple comparisons analysis to identify significant differences among brands.

28. For a single-factor ANOVA with sample sizes $J_i$ ($i = 1, 2, \ldots, I$), show that $SSTr = \sum J_i(\bar{X}_{i.} - \bar{X}_{..})^2 = \sum J_i\bar{X}_{i.}^2 - n\bar{X}_{..}^2$, where $n = \sum J_i$.

29. When sample sizes are equal ($J_i = J$), the parameters $\alpha_1, \alpha_2, \ldots, \alpha_I$ of the alternative parameterization are restricted by $\sum \alpha_i = 0$. For unequal sample sizes, the most natural restriction is $\sum J_i\alpha_i = 0$. Use this to show that
$E(MSTr) = \sigma^2 + \dfrac{1}{I - 1}\sum J_i\alpha_i^2$
What is $E(MSTr)$ when $H_0$ is true? [This expectation is correct if $\sum J_i\alpha_i = 0$ is replaced by the restriction $\sum \alpha_i = 0$ (or any other single linear restriction on the $\alpha_i$’s used to reduce the model to I independent parameters), but $\sum J_i\alpha_i = 0$ simplifies the algebra and yields natural estimates for the model parameters (in particular, $\hat{\alpha}_i = \bar{x}_{i.} - \bar{x}_{..}$).]

30. Reconsider Example 10.8 involving an investigation of the effects of different heat treatments on the yield point of steel ingots.
a. If $J = 8$ and $\sigma = 1$, what is $\beta$ for a level .05 F test when $\mu_1 = \mu_2$, $\mu_3 = \mu_1 - 1$, and $\mu_4 = \mu_1 + 1$?
b. For the alternative of part (a), what value of J is necessary to obtain $\beta = .05$?
c. If there are $I = 5$ heat treatments, $J = 10$, and $\sigma = 1$, what is $\beta$ for the level .05 F test when four of the $\mu_i$’s are equal and the fifth differs by 1 from the other four?

31. When sample sizes are not equal, the noncentrality parameter is $\sum J_i\alpha_i^2/\sigma^2$ and $\phi^2 = (1/I)\sum J_i\alpha_i^2/\sigma^2$. Referring to Exercise 22, what is the power of the test when $\mu_2 = \mu_3$, $\mu_1 = \mu_2 - \sigma$, and $\mu_4 = \mu_2 + \sigma$?

32. In an experiment to compare the quality of four different brands of magnetic recording tape, five 2400-ft reels of each brand (A–D) were selected and the number of flaws in each reel was determined.

A:  10   5  12  14   8
B:  14  12  17   9   8
C:  13  18  10  15  18
D:  17  16  12  22  14

It is believed that the number of flaws has approximately a Poisson distribution for each brand. Analyze the data at level .01 to see whether the expected number of flaws per reel is the same for each brand.

33. Suppose that $X_{ij}$ is a binomial variable with parameters n and $p_i$ (so approximately normal when $np_i \geq 10$ and $nq_i \geq 10$). Then since $\mu_i = np_i$, $V(X_{ij}) = \sigma_i^2 = np_i(1 - p_i) = \mu_i(1 - \mu_i/n)$. How should the $X_{ij}$’s be transformed so as to stabilize the variance? [Hint: $g(\mu_i) = \mu_i(1 - \mu_i/n)$.]

34. Simplify $E(MSTr)$ for the random effects model when $J_1 = J_2 = \cdots = J_I = J$.

SUPPLEMENTARY EXERCISES (35–46)

35. An experiment was carried out to compare flow rates for four different types of nozzle.
a. Sample sizes were 5, 6, 7, and 6, respectively, and calculations gave $f = 3.68$. State and test the relevant hypotheses using $\alpha = .01$.
b. Analysis of the data using a statistical computer package yielded P-value = .029. At level .01, what would you conclude, and why?

36. The article “Computer-Assisted Instruction Augmented with Planned Teacher/Student Contacts” (J. of Exp. Educ., Winter, 1980–1981: 120–126) compared five different methods for teaching descriptive statistics. The five methods were traditional lecture and discussion (L/D), programmed textbook instruction (R), programmed text with lectures (R/L), computer instruction (C), and computer instruction with lectures (C/L). Forty-five students were randomly assigned, 9 to each method. After completing the course, the students took a 1-hour exam. In addition, a 10-minute retention test was administered 6 weeks later. Summary quantities are given.

             Exam              Retention Test
Method    x̄_i.     s_i       x̄_i.     s_i
L/D       29.3     4.99      30.20    3.82
R         28.0     5.33      28.80    5.26
R/L       30.2     3.33      26.20    4.66
C         32.4     2.94      31.10    4.91
C/L       34.2     2.74      30.20    3.53

The grand mean for the exam was 30.82, and the grand mean for the retention test was 29.30.
a. Does the data suggest that there is a difference among the five teaching methods with respect to true mean exam score? Use $\alpha = .05$.
b. Using a .05 significance level, test the null hypothesis of no difference among the true mean retention test scores for the five different teaching methods.

37. Numerous factors contribute to the smooth running of an electric motor (“Increasing Market Share Through Improved Product and Process Design: An Experimental Approach,” Quality Engineering, 1991: 361–369). In particular, it is desirable to keep motor noise and vibration to a minimum. To study the effect that the brand of bearing has on motor vibration, five different motor bearing brands were examined by installing each type of bearing on different random samples of six motors. The amount of motor vibration (measured in microns) was recorded when each of the 30 motors was running. The data for this study follows. State and test the relevant hypotheses at significance level .05, and then carry out a multiple comparisons analysis if appropriate.

                                            Mean
1:  13.1  15.0  14.0  14.4  14.0  11.6     13.68
2:  16.3  15.7  17.2  14.9  14.4  17.2     15.95
3:  13.7  13.9  12.4  13.8  14.9  13.3     13.67
4:  15.7  13.7  14.4  16.0  13.9  14.7     14.73
5:  13.5  13.4  13.2  12.7  13.4  12.3     13.08

38. An article in the British scientific journal Nature (“Sucrose Induction of Hepatic Hyperplasia in the Rat,” August 25, 1972: 461) reports on an experiment in which each of five groups consisting of six rats was put on a diet with a different carbohydrate. At the conclusion of the experiment, the DNA content of the liver of each rat was determined (mg/g liver), with the following results:

Carbohydrate    x̄_i.
Starch          2.58
Sucrose         2.63
Fructose        2.13
Glucose         2.41
Maltose         2.49

Assuming also that $\sum\sum x_{ij}^2 = 183.4$, does the data indicate that true average DNA content is affected by the type of carbohydrate in the diet? Construct an ANOVA table and use a .05 level of significance.

39. Referring to Exercise 38, construct a t CI for
$\theta = \mu_1 - \dfrac{\mu_2 + \mu_3 + \mu_4 + \mu_5}{4}$
which measures the difference between the average DNA content for the starch diet and the combined average for the four other diets. Does the resulting interval include zero?

40. Refer to Exercise 38. What is $\beta$ for the test when true average DNA content is identical for three of the diets and falls below this common value by 1 standard deviation ($\sigma$) for the other two diets?

41. Four laboratories (1–4) are randomly selected from a large population, and each is asked to make three determinations of the percentage of methyl alcohol in specimens of a compound taken from a single batch. Based on the accompanying data, are differences among laboratories a source of variation in the percentage of methyl alcohol? State and test the relevant hypotheses using significance level .05.

1:  85.06  85.25  84.87
2:  84.99  84.28  84.88
3:  84.48  84.72  85.10
4:  84.10  84.55  84.05

42. The critical flicker frequency (cff) is the highest frequency (in cycles/sec) at which a person can detect the flicker in a flickering light source. At frequencies above the cff, the light source appears to be continuous even though it is actually flickering. An investigation carried out to see whether true average cff depends on iris color yielded the following data (based on the article “The Effects of Iris Color on Critical Flicker Frequency,” J. of General Psych., 1973: 91–95):

                 Iris Color
         1. Brown    2. Green    3. Blue
           26.8        26.4       25.7
           27.9        24.2       27.2
           23.7        28.0       29.9
           25.0        26.9       28.5
           26.3        29.1       29.4
           24.8                   28.3
           25.7
           24.5
J_i          8           5          6
x_i.       204.7       134.6      169.0
x̄_i.       25.59       26.92      28.17

n = 19    x.. = 508.3

a. State and test the relevant hypotheses at significance level .05 by using the F table to obtain an upper and/or lower bound on the P-value. [Hint: $\sum\sum x_{ij}^2 = 13{,}659.67$ and $CF = 13{,}598.36$.]
b. Investigate differences between iris colors with respect to mean cff.

43. Let $c_1, c_2, \ldots, c_I$ be numbers satisfying $\sum c_i = 0$. Then $\sum c_i\mu_i = c_1\mu_1 + \cdots + c_I\mu_I$ is called a contrast in the $\mu_i$’s. Notice that with $c_1 = 1$, $c_2 = -1$, $c_3 = \cdots = c_I = 0$, $\sum c_i\mu_i = \mu_1 - \mu_2$, which implies that every pairwise difference between $\mu_i$’s is a contrast (so is, e.g., $\mu_1 - .5\mu_2 - .5\mu_3$). A method attributed to Scheffé gives simultaneous CIs with simultaneous confidence level $100(1 - \alpha)\%$ for all possible contrasts (an infinite number of them). The interval for $\sum c_i\mu_i$ is

$\sum c_i\bar{x}_{i.} \pm \left(\sum c_i^2/J_i\right)^{1/2} \cdot \left[(I - 1) \cdot MSE \cdot F_{\alpha, I-1, n-I}\right]^{1/2}$

Using the critical flicker frequency data of Exercise 42, calculate the Scheffé intervals for the contrasts $\mu_1 - \mu_2$, $\mu_1 - \mu_3$, $\mu_2 - \mu_3$, and $.5\mu_1 + .5\mu_2 - \mu_3$ (this last contrast compares blue to the average of brown and green). Which contrasts appear to differ significantly from 0, and why?

44. Four types of mortars, namely ordinary cement mortar (OCM), polymer impregnated mortar (PIM), resin mortar (RM), and polymer cement mortar (PCM), were subjected to a compression test to measure strength (MPa). Three strength observations for each mortar type are given in the article “Polymer Mortar Composite Matrices for Maintenance-Free Highly Durable Ferrocement” (J. of Ferrocement, 1984: 337–345) and are reproduced here. Construct an ANOVA table. Using a .05 significance level, determine whether the data suggests that the true mean strength is not the same for all four mortar types. If you determine that the true mean strengths are not all equal, use Tukey’s method to identify the significant differences.

OCM     32.15    35.53    34.20
PIM    126.32   126.80   134.79
RM     117.91   115.02   114.58
PCM     29.09    30.87    29.80

45. Suppose the $x_{ij}$’s are “coded” by $y_{ij} = cx_{ij} + d$. How does the value of the F statistic computed from the $y_{ij}$’s compare to the value computed from the $x_{ij}$’s? Justify your assertion.

46. In Example 10.11, subtract $\bar{x}_{i.}$ from each observation in the ith sample ($i = 1, \ldots, 6$) to obtain a set of 18 residuals. Then construct a normal probability plot and comment on the plausibility of the normality assumption.

Bibliography

Miller, Rupert, Beyond ANOVA: The Basics of Applied Statistics, Wiley, New York, 1986. An excellent source of information about assumption checking and alternative methods of analysis.

Montgomery, Douglas, Design and Analysis of Experiments (7th ed.), Wiley, New York, 2009. A very up-to-date presentation of ANOVA models and methodology.

Neter, John, William Wasserman, and Michael Kutner, Applied Linear Statistical Models (5th ed.), Irwin, Homewood, IL, 2004. The second half of this book contains a very well-presented survey of ANOVA; the level is comparable to that of the present text, but the discussion is more comprehensive, making the book an excellent reference.

Ott, R. Lyman, and Michael Longnecker, An Introduction to Statistical Methods and Data Analysis (6th ed.), Duxbury Press, Belmont, CA, 2010. Includes several chapters on ANOVA methodology that can profitably be read by students desiring a very nonmathematical exposition; there is a good chapter on various multiple comparison methods.

11 Multifactor Analysis of Variance

INTRODUCTION

In the previous chapter, we used the analysis of variance (ANOVA) to test for equality of either I different population means or the true average responses associated with I different levels of a single factor (alternatively referred to as I different treatments). In many experimental situations, there are two or more factors that are of simultaneous interest. This chapter extends the methods of Chapter 10 to investigate such multifactor situations.

In the first two sections, we concentrate on the case of two factors. We will use I to denote the number of levels of the first factor (A) and J to denote the number of levels of the second factor (B). Then there are IJ possible combinations consisting of one level of factor A and one of factor B. Each such combination is called a treatment, so there are IJ different treatments. The number of observations made on treatment (i, j) will be denoted by $K_{ij}$. In Section 11.1, we consider $K_{ij} = 1$. An important special case of this type is a randomized block design, in which a single factor A is of primary interest but another factor, “blocks,” is created to control for extraneous variability in experimental units or subjects. Section 11.2 focuses on the case $K_{ij} = K > 1$, with brief mention of the difficulties associated with unequal $K_{ij}$’s.

Section 11.3 considers experiments involving more than two factors. When the number of factors is large, an experiment consisting of at least one observation for each treatment would be expensive and time consuming. One frequently encountered situation, which we discuss in Section 11.4, is that in which there are p factors, each of which has two levels. There are then $2^p$ different treatments. We consider both the case in which observations are made on all these treatments (a complete design) and the case in which observations are made for only a selected subset of treatments (an incomplete design).
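The treatment counts mentioned in this introduction (IJ treatments for two factors, $2^p$ for p two-level factors) are easy to make concrete by enumeration. The Python sketch below is only an illustration; the function name `treatments`, the level labels "low" and "high", and the particular values I = 3, J = 4, p = 3 are hypothetical choices of ours.

```python
from itertools import product

def treatments(levels_per_factor):
    """All combinations consisting of one level from each factor."""
    return list(product(*levels_per_factor))

# Two factors: A with I = 3 levels and B with J = 4 levels gives I * J treatments.
two_factor = treatments([range(1, 4), range(1, 5)])
print(len(two_factor), "treatments, e.g.", two_factor[:3])

# p = 3 factors, each at two levels, gives 2**p treatments (a complete design).
two_level = treatments([("low", "high")] * 3)
print(len(two_level), "treatments")
for combo in two_level:
    print(combo)
```

An incomplete design in this setting would use only a selected subset of the rows printed for `two_level`.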