Jay L. Devore, Probability and Statistics

11 Multifactor Analysis of Variance

INTRODUCTION

In the previous chapter, we used the analysis of variance (ANOVA) to test for equality of either I different population means or the true average responses associated with I different levels of a single factor (alternatively referred to as I different treatments). In many experimental situations, there are two or more factors that are of simultaneous interest. This chapter extends the methods of Chapter 10 to investigate such multifactor situations.

In the first two sections, we concentrate on the case of two factors. We will use I to denote the number of levels of the first factor (A) and J to denote the number of levels of the second factor (B). Then there are IJ possible combinations consisting of one level of factor A and one of factor B. Each such combination is called a treatment, so there are IJ different treatments. The number of observations made on treatment (i, j) will be denoted by K_ij. In Section 11.1, we consider K_ij = 1. An important special case of this type is a randomized block design, in which a single factor A is of primary interest but another factor, "blocks," is created to control for extraneous variability in experimental units or subjects. Section 11.2 focuses on the case K_ij = K > 1, with brief mention of the difficulties associated with unequal K_ij's. Section 11.3 considers experiments involving more than two factors. When the number of factors is large, an experiment consisting of at least one observation for each treatment would be expensive and time consuming. One frequently encountered situation, which we discuss in Section 11.4, is that in which there are p factors, each of which has two levels. There are then 2^p different treatments. We consider both the case in which observations are made on all these treatments (a complete design) and the case in which observations are made for only a selected subset of treatments (an incomplete design).

Copyright 2010 Cengage Learning.
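The treatment counts IJ and 2^p can be made concrete with a short illustrative snippet; the function name and level labels below are hypothetical, not from the text.

```python
from itertools import product

# Enumerate the IJ treatments formed by crossing I levels of factor A
# with J levels of factor B (a treatment = one (A-level, B-level) pair).
def treatments(levels_a, levels_b):
    return list(product(levels_a, levels_b))

combos = treatments(range(1, 4), range(1, 5))  # I = 3, J = 4
print(len(combos))  # 12 treatments

# With p factors at two levels each there are 2**p treatments:
for p in (2, 5, 10):
    print(p, 2 ** p)
```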
All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.

11.1 Two-Factor ANOVA with K_ij = 1

When factor A consists of I levels and factor B consists of J levels, there are IJ different combinations (pairs) of levels of the two factors, each called a treatment. With K_ij denoting the number of observations on the treatment consisting of factor A at level i and factor B at level j, we restrict attention in this section to the case K_ij = 1, so that the data consists of IJ observations. Our focus is on the fixed effects model, in which the only levels of interest for the two factors are those actually represented in the experiment. Situations in which at least one factor is random are discussed briefly at the end of the section.

Example 11.1  Is it really as easy to remove marks on fabrics from erasable pens as the word erasable might imply? Consider the following data from an experiment to compare three different brands of pens and four different wash treatments with respect to their ability to remove marks on a particular type of fabric (based on "An Assessment of the Effects of Treatment, Time, and Heat on the Removal of Erasable Pen Marks from Cotton and Cotton/Polyester Blend Fabrics," J. of Testing and Evaluation, 1991: 394–397). The response variable is a quantitative indicator of overall specimen color change; the lower this value, the more marks were removed.
                            Washing Treatment
                        1      2      3      4     Total   Average
                 1    .97    .48    .48    .46      2.39     .598
  Brand of Pen   2    .77    .14    .22    .25      1.38     .345
                 3    .67    .39    .57    .19      1.82     .455
       Total         2.41   1.01   1.27    .90      5.59
       Average       .803   .337   .423   .300               .466

Is there any difference in the true average amount of color change due either to the different brands of pens or to the different washing treatments? ■

As in single-factor ANOVA, double subscripts are used to identify random variables and observed values. Let

    X_ij = the random variable (rv) denoting the measurement when factor A is held at level i and factor B is held at level j
    x_ij = the observed value of X_ij

The x_ij's are usually presented in a rectangular table in which the various rows are identified with the levels of factor A and the various columns with the levels of factor B. In the erasable-pen experiment of Example 11.1, the number of levels of factor A is I = 3, the number of levels of factor B is J = 4, x_13 = .48, x_22 = .14, and so on.

Whereas in single-factor ANOVA we were interested only in row means and the grand mean, now we are interested also in column means. Let

    X̄_i· = the average of measurements obtained when factor A is held at level i = (Σ_{j=1}^J X_ij)/J
    X̄_·j = the average of measurements obtained when factor B is held at level j = (Σ_{i=1}^I X_ij)/I
    X̄_·· = the grand mean = (Σ_{i=1}^I Σ_{j=1}^J X_ij)/(IJ)

with observed values x̄_i·, x̄_·j, and x̄_··. Totals rather than averages are denoted by omitting the horizontal bar (so x_·j = Σ_i x_ij, etc.). Intuitively, to see whether there is any effect due to the levels of factor A, we should compare the observed x̄_i·'s with one another; information about the different levels of factor B should come from the x̄_·j's.
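The marginal averages just defined are easy to verify numerically. A minimal plain-Python sketch for the erasable-pen data (values taken from the table above; nothing else assumed):

```python
# Row means (x-bar_i.), column means (x-bar_.j), and grand mean (x-bar_..)
# for the erasable-pen data of Example 11.1.
data = [
    [0.97, 0.48, 0.48, 0.46],  # brand 1
    [0.77, 0.14, 0.22, 0.25],  # brand 2
    [0.67, 0.39, 0.57, 0.19],  # brand 3
]
I, J = len(data), len(data[0])

row_means = [sum(row) / J for row in data]
col_means = [sum(data[i][j] for i in range(I)) / I for j in range(J)]
grand_mean = sum(sum(row) for row in data) / (I * J)

# These agree with the table margins: roughly .598/.345/.455 for the rows,
# .803/.337/.423/.300 for the columns, and .466 overall.
print(row_means, col_means, grand_mean)
```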
The Fixed Effects Model

Proceeding by analogy to single-factor ANOVA, one's first inclination in specifying a model is to let μ_ij = the true average response when factor A is at level i and factor B at level j. This results in IJ mean parameters. Then let

    X_ij = μ_ij + ε_ij

where ε_ij is the random amount by which the observed value differs from its expectation. The ε_ij's are assumed normal and independent with common variance σ². Unfortunately, there is no valid test procedure for this choice of parameters. This is because there are IJ + 1 parameters (the μ_ij's and σ²) but only IJ observations, so after using each x_ij as an estimate of μ_ij, there is no way to estimate σ².

The following alternative model is realistic yet involves relatively few parameters. Assume the existence of I parameters α_1, α_2, …, α_I and J parameters β_1, β_2, …, β_J such that

    X_ij = α_i + β_j + ε_ij    (i = 1, …, I;  j = 1, …, J)        (11.1)

so that

    μ_ij = α_i + β_j                                              (11.2)

Including σ², there are now I + J + 1 model parameters, so if I ≥ 3 and J ≥ 3, there will be fewer parameters than observations (in fact, we will shortly modify (11.2) so that even I = 2 and/or J = 2 will be accommodated).

The model specified in (11.1) and (11.2) is called an additive model because each mean response μ_ij is the sum of an effect due to factor A at level i (α_i) and an effect due to factor B at level j (β_j). The difference between mean responses for factor A at level i and level i′ when B is held at level j is μ_ij − μ_i′j. When the model is additive,

    μ_ij − μ_i′j = (α_i + β_j) − (α_i′ + β_j) = α_i − α_i′

which is independent of the level j of the second factor. A similar result holds for μ_ij − μ_ij′. Thus additivity means that the difference in mean responses for two levels of one of the factors is the same for all levels of the other factor. Figure 11.1(a) shows a set of mean responses that satisfy the condition of additivity. A nonadditive configuration is illustrated in Figure 11.1(b).

[Figure 11.1  Mean responses for two types of model: (a) additive; (b) nonadditive]

Example 11.2 (Example 11.1 continued)  Plotting the observed x_ij's in a manner analogous to that of Figure 11.1 results in Figure 11.2. Although there is some "crossing over" in the observed x_ij's, the pattern is reasonably representative of what would be expected under additivity with just one observation per treatment.

[Figure 11.2  Plot of data from Example 11.1] ■
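Additivity can also be checked numerically: under μ_ij = α_i + β_j, the difference between any two rows of the mean-response table is constant across columns. A small illustrative sketch with a 2 × 2 configuration of means:

```python
# Under an additive model mu[i][j] = alpha[i] + beta[j], row differences
# are the same in every column; for a nonadditive table they are not.
def row_differences(mu, i, i_prime):
    return [a - b for a, b in zip(mu[i], mu[i_prime])]

alpha = [1, 2]
beta = [1, 4]
additive = [[a + b for b in beta] for a in alpha]  # [[2, 5], [3, 6]]
print(row_differences(additive, 1, 0))    # [1, 1]: constant in j

nonadditive = [[2, 5], [3, 7]]            # cannot be written as alpha_i + beta_j
print(row_differences(nonadditive, 1, 0)) # [1, 2]: depends on j
```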
Expression (11.2) is not quite the final model description because the α_i's and β_j's are not uniquely determined. Here are two different configurations of the α_i's and β_j's that yield the same additive μ_ij's:

    α_1 = 1   α_2 = 2   β_1 = 1   β_2 = 4   →   μ_11 = 2   μ_12 = 5   μ_21 = 3   μ_22 = 6
    α_1 = 0   α_2 = 1   β_1 = 2   β_2 = 5   →   μ_11 = 2   μ_12 = 5   μ_21 = 3   μ_22 = 6

By subtracting any constant c from all α_i's and adding c to all β_j's, other configurations corresponding to the same additive model are obtained. This nonuniqueness is eliminated by use of the following model:

    X_ij = μ + α_i + β_j + ε_ij                                   (11.3)

where Σ_{i=1}^I α_i = 0, Σ_{j=1}^J β_j = 0, and the ε_ij's are assumed independent, normally distributed, with mean 0 and common variance σ².

This is analogous to the alternative choice of parameters for single-factor ANOVA discussed in Section 10.3. It is not difficult to verify that (11.3) is an additive model in which the parameters are uniquely determined (for example, for the μ_ij's mentioned previously: μ = 4, α_1 = −.5, α_2 = .5, β_1 = −1.5, and β_2 = 1.5). Notice that there are only I − 1 independently determined α_i's and J − 1 independently determined β_j's. Including μ, (11.3) specifies I + J − 1 mean parameters.

The interpretation of the parameters in (11.3) is straightforward: μ is the true grand mean (mean response averaged over all levels of both factors), α_i is the effect of factor A at level i (measured as a deviation from μ), and β_j is the effect of factor B at level j. Unbiased (and maximum likelihood) estimators for these parameters are

    μ̂ = X̄_··        α̂_i = X̄_i· − X̄_··        β̂_j = X̄_·j − X̄_··

There are two different null hypotheses of interest in a two-factor experiment with K_ij = 1. The first, denoted by H_0A, states that the different levels of factor A have no effect on true average response. The second, denoted by H_0B, asserts that there is no factor B effect:

    H_0A: α_1 = α_2 = … = α_I = 0   versus   H_aA: at least one α_i ≠ 0        (11.4)
    H_0B: β_1 = β_2 = … = β_J = 0   versus   H_aB: at least one β_j ≠ 0

(No factor A effect implies that all α_i's are equal, so they must all be 0 since they sum to 0, and similarly for the β_j's.)
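The estimators μ̂, α̂_i, and β̂_j are just the grand mean and the row/column deviations from it. A plain-Python sketch for the erasable-pen data (data values from Example 11.1; everything else illustrative):

```python
# Estimates for model (11.3): mu-hat = grand mean, alpha-hat_i = row mean
# minus grand mean, beta-hat_j = column mean minus grand mean.
data = [
    [0.97, 0.48, 0.48, 0.46],
    [0.77, 0.14, 0.22, 0.25],
    [0.67, 0.39, 0.57, 0.19],
]
I, J = len(data), len(data[0])

mu_hat = sum(sum(row) for row in data) / (I * J)
alpha_hat = [sum(row) / J - mu_hat for row in data]
beta_hat = [sum(data[i][j] for i in range(I)) / I - mu_hat for j in range(J)]

# The side conditions sum(alpha_i) = 0 and sum(beta_j) = 0 hold automatically
# (up to floating-point rounding):
print(abs(sum(alpha_hat)) < 1e-12, abs(sum(beta_hat)) < 1e-12)  # True True
```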
Test Procedures

The description and analysis follow closely that for single-factor ANOVA. There are now four sums of squares, each with an associated number of df:

DEFINITION

    SST = Σ_{i=1}^I Σ_{j=1}^J (X_ij − X̄_··)²                                   df = IJ − 1
    SSA = Σ_{i=1}^I Σ_{j=1}^J (X̄_i· − X̄_··)² = J Σ_{i=1}^I (X̄_i· − X̄_··)²      df = I − 1
    SSB = Σ_{i=1}^I Σ_{j=1}^J (X̄_·j − X̄_··)² = I Σ_{j=1}^J (X̄_·j − X̄_··)²      df = J − 1
    SSE = Σ_{i=1}^I Σ_{j=1}^J (X_ij − X̄_i· − X̄_·j + X̄_··)²                     df = (I − 1)(J − 1)
                                                                               (11.5)
    The fundamental identity is

    SST = SSA + SSB + SSE                                                      (11.6)

There are computing formulas for SST, SSA, and SSB analogous to those given in Chapter 10 for single-factor ANOVA. But the wide availability of statistical software has rendered these formulas almost obsolete.

The expression for SSE results from replacing μ, α_i, and β_j by their estimators in Σ[X_ij − (μ + α_i + β_j)]². Error df is IJ − (number of mean parameters estimated) = IJ − [1 + (I − 1) + (J − 1)] = (I − 1)(J − 1). Total variation is split into a part (SSE) that is not explained by either the truth or the falsity of H_0A or H_0B and two parts that can be explained by possible falsity of the two null hypotheses. Statistical theory now says that if we form F ratios as in single-factor ANOVA, then when H_0A (H_0B) is true, the corresponding F ratio has an F distribution with numerator df = I − 1 (J − 1) and denominator df = (I − 1)(J − 1):

    Hypotheses           Test Statistic Value     Rejection Region
    H_0A versus H_aA     f_A = MSA/MSE            f_A ≥ F_α,I−1,(I−1)(J−1)
    H_0B versus H_aB     f_B = MSB/MSE            f_B ≥ F_α,J−1,(I−1)(J−1)

Example 11.3 (Example 11.2 continued)  The x̄_i·'s and x̄_·j's for the color-change data are displayed along the margins of the data table given previously. Table 11.1 summarizes the calculations.

    Table 11.1  ANOVA Table for Example 11.3

    Source of Variation        df                    Sum of Squares   Mean Square    f
    Factor A (brand)           I − 1 = 2             SSA = .1282      MSA = .0641    f_A = 4.43
    Factor B (wash treatment)  J − 1 = 3             SSB = .4797      MSB = .1599    f_B = 11.05
    Error                      (I − 1)(J − 1) = 6    SSE = .0868      MSE = .01447
    Total                      IJ − 1 = 11           SST = .6947

The critical value for testing H_0A at level of significance .05 is F_.05,2,6 = 5.14. Since f_A = 4.43 < 5.14, H_0A cannot be rejected at significance level .05. True average color change does not appear to depend on the brand of pen. Because F_.05,3,6 = 4.76 and f_B = 11.05 ≥ 4.76, H_0B is rejected at significance level .05 in favor of the assertion that color change varies with washing treatment. A statistical computer package gives P-values of .066 and .007 for these two tests. ■

Plausibility of the normality and constant variance assumptions can be investigated graphically. Define the predicted values (also called fitted values)

    x̂_ij = μ̂ + α̂_i + β̂_j = x̄_·· + (x̄_i· − x̄_··) + (x̄_·j − x̄_··) = x̄_i· + x̄_·j − x̄_··

and the residuals (the differences between the observations and predicted values)

    x_ij − x̂_ij = x_ij − x̄_i· − x̄_·j + x̄_··

We can check the normality assumption with a normal probability plot of the residuals, and the constant variance assumption with a plot of the residuals against the fitted values. Figure 11.3 shows these plots for the data of Example 11.3.
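The entries of Table 11.1 follow directly from the definitions in (11.5). A self-contained plain-Python sketch (data from Example 11.1; the critical values 5.14 and 4.76 quoted above come from F tables, not from this code):

```python
# Two-factor ANOVA with K_ij = 1: sums of squares, mean squares, F ratios.
data = [
    [0.97, 0.48, 0.48, 0.46],
    [0.77, 0.14, 0.22, 0.25],
    [0.67, 0.39, 0.57, 0.19],
]
I, J = len(data), len(data[0])
grand = sum(sum(row) for row in data) / (I * J)
row = [sum(r) / J for r in data]
col = [sum(data[i][j] for i in range(I)) / I for j in range(J)]

ssa = J * sum((m - grand) ** 2 for m in row)        # df = I - 1
ssb = I * sum((m - grand) ** 2 for m in col)        # df = J - 1
sst = sum((x - grand) ** 2 for r in data for x in r)
sse = sum((data[i][j] - row[i] - col[j] + grand) ** 2
          for i in range(I) for j in range(J))      # df = (I - 1)(J - 1)

msa = ssa / (I - 1)
msb = ssb / (J - 1)
mse = sse / ((I - 1) * (J - 1))
f_a, f_b = msa / mse, msb / mse

print(round(f_a, 2), round(f_b, 2))          # 4.43 11.05, as in Table 11.1
print(abs(sst - (ssa + ssb + sse)) < 1e-12)  # fundamental identity (11.6)
```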

[Figure 11.3  Diagnostic plots from Minitab for Example 11.3: (a) normal probability plot of the residuals; (b) residuals versus the fitted values]

The normal probability plot is reasonably straight, so there is no reason to question normality for this data set. On the plot of the residuals against the fitted values, look for substantial variation in vertical spread when moving from left to right. For example, a narrow range for small fitted values and a wide range for high fitted values would suggest that the variance is higher for larger responses (this happens often, and it can sometimes be cured by replacing each observation by its logarithm). Figure 11.3(b) shows no evidence against the constant variance assumption.

Expected Mean Squares

The plausibility of using the F tests just described is demonstrated by computing the expected mean squares. For the additive model,

    E(MSE) = σ²
    E(MSA) = σ² + (J/(I − 1)) Σ_{i=1}^I α_i²
    E(MSB) = σ² + (I/(J − 1)) Σ_{j=1}^J β_j²

If H_0A is true, MSA is an unbiased estimator of σ², so F_A is a ratio of two unbiased estimators of σ². When H_0A is false, MSA tends to overestimate σ². Thus H_0A should be rejected when the ratio f_A is too large. Similar comments apply to MSB and H_0B.

Multiple Comparisons

After rejecting either H_0A or H_0B, Tukey's procedure can be used to identify significant differences between the levels of the factor under investigation.

1. For comparing levels of factor A, obtain Q_α,I,(I−1)(J−1). For comparing levels of factor B, obtain Q_α,J,(I−1)(J−1).
2. Compute

       w = Q · (estimated standard deviation of the sample means being compared)
         = Q_α,I,(I−1)(J−1) · √(MSE/J)   for factor A comparisons
         = Q_α,J,(I−1)(J−1) · √(MSE/I)   for factor B comparisons

   (because, e.g., the standard deviation of X̄_i· is σ/√J).
3. Arrange the sample means in increasing order, underscore those pairs differing by less than w, and identify pairs not underscored by the same line as corresponding to significantly different levels of the given factor.

Example 11.4 (Example 11.3 continued)  Identification of significant differences among the four washing treatments requires Q_.05,4,6 = 4.90 and w = 4.90 · √(.01447/3) = .340. The four factor B sample means (column averages) are now listed in increasing order, and any pair differing by less than .340 is underscored by a line segment:

    x̄_·4     x̄_·2     x̄_·3     x̄_·1
    .300     .337     .423     .803
    ───────────────────────

Washing treatment 1 appears to differ significantly from the other three treatments, but no other significant differences are identified. In particular, it is not apparent which among treatments 2, 3, and 4 is best at removing marks. ■

Randomized Block Experiments

In using single-factor ANOVA to test for the presence of effects due to the I different treatments under study, once the IJ subjects or experimental units have been chosen, treatments should be allocated in a completely random fashion. That is, J subjects should be chosen at random for the first treatment, then another sample of J chosen at random from the remaining IJ − J subjects for the second treatment, and so on.

It frequently happens, though, that subjects or experimental units exhibit heterogeneity with respect to other characteristics that may affect the observed responses. Then, the presence or absence of a significant F value may be due to this extraneous variation rather than to the presence or absence of factor effects. This is why paired experiments were introduced in Chapter 9. The analogy to a paired experiment when I > 2 is called a randomized block experiment. An extraneous factor, "blocks," is constructed by dividing the IJ units into J groups with I units in each group.
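Step 2 of Tukey's procedure for the factor B comparison in Example 11.4 can be sketched as follows. Q_.05,4,6 = 4.90 is read from a studentized range table, as in the text; the treatment labels are just for display.

```python
import math

# w = Q * sqrt(MSE / I): each column mean averages I = 3 observations.
mse, I = 0.01447, 3
q = 4.90                      # Q_.05,4,6 from a studentized range table
w = q * math.sqrt(mse / I)
print(f"{w:.3f}")             # 0.340

# Flag pairs of washing-treatment means that differ by more than w.
means = {"trt 4": 0.300, "trt 2": 0.337, "trt 3": 0.423, "trt 1": 0.803}
ordered = sorted(means.items(), key=lambda kv: kv[1])
for i in range(len(ordered)):
    for j in range(i + 1, len(ordered)):
        (a, ma), (b, mb) = ordered[i], ordered[j]
        verdict = "differ" if mb - ma > w else "not distinguished"
        print(a, "vs", b, "->", verdict)
```

Only the three pairs involving treatment 1 exceed w, matching the underscoring in Example 11.4.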