Three-Factor ANOVA
11.3 Three-Factor ANOVA
To indicate the nature of models and analyses when ANOVA experiments involve more than two factors, we will focus here on the case of three fixed factors: A, B, and C. The numbers of levels of these factors will be denoted by I, J, and K, respectively, and $L_{ijk}$ = the number of observations made with factor A at level i, factor B at level j, and factor C at level k. The analysis is quite complicated when the $L_{ijk}$'s are not all equal, so we further specialize to $L_{ijk} = L$. Then $X_{ijkl}$ and $x_{ijkl}$ denote the observed value, before and after the experiment is performed, of the lth replication ($l = 1, 2, \ldots, L$) when the three factors are fixed at levels i, j, and k.
To understand the parameters that will appear in the three-factor ANOVA model, first recall that in two-factor ANOVA with replications, $E(X_{ijk}) = \mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}$, where the restrictions $\sum_i \alpha_i = \sum_j \beta_j = 0$, $\sum_i \gamma_{ij} = 0$ for every j, and $\sum_j \gamma_{ij} = 0$ for every i were necessary to obtain a unique set of parameters. If we use dot subscripts on the $\mu_{ij}$'s to denote averaging (rather than summation), then

$$\bar{\mu}_{i\cdot} - \bar{\mu}_{\cdot\cdot} = \frac{1}{J}\sum_j \mu_{ij} - \frac{1}{IJ}\sum_i\sum_j \mu_{ij} = \alpha_i$$

is the effect of factor A at level i averaged over levels of factor B, whereas

$$\mu_{ij} - \bar{\mu}_{\cdot j} = \mu_{ij} - \frac{1}{I}\sum_i \mu_{ij} = \alpha_i + \gamma_{ij}$$

is the effect of factor A at level i specific to factor B at level j. When the effect of A at level i depends on the level of B, there is interaction between the factors, and the $\gamma_{ij}$'s are not all zero. In particular,

$$\mu_{ij} - \bar{\mu}_{\cdot j} - \bar{\mu}_{i\cdot} + \bar{\mu}_{\cdot\cdot} = \gamma_{ij} \tag{11.11}$$
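The averaging identities above can be checked numerically. The following is a minimal sketch, assuming a small table of cell means whose values are made up for illustration; the function name `decompose_two_factor` is hypothetical and not from the text.

```python
def decompose_two_factor(mu_ij):
    """Split cell means mu_ij[i][j] into mu, alpha_i, beta_j, gamma_ij
    using averages (dot subscripts), as in the displays leading to (11.11)."""
    I, J = len(mu_ij), len(mu_ij[0])
    grand = sum(sum(row) for row in mu_ij) / (I * J)              # mu-bar_..
    row_mean = [sum(row) / J for row in mu_ij]                    # mu-bar_i.
    col_mean = [sum(mu_ij[i][j] for i in range(I)) / I            # mu-bar_.j
                for j in range(J)]
    alpha = [m - grand for m in row_mean]
    beta = [m - grand for m in col_mean]
    gamma = [[mu_ij[i][j] - col_mean[j] - row_mean[i] + grand     # (11.11)
              for j in range(J)] for i in range(I)]
    return grand, alpha, beta, gamma


# Hypothetical cell means (illustrative values only, not data from the text).
cell_means = [[10.0, 12.0, 14.0],
              [11.0, 15.0, 19.0]]
mu, alpha, beta, gamma = decompose_two_factor(cell_means)
```

By construction the returned parameters satisfy the sum-to-zero restrictions, and `mu + alpha[i] + beta[j] + gamma[i][j]` reproduces each cell mean.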
The Fixed Effects Model and Test Procedures
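The fixed effects model for three-factor ANOVA with $L_{ijk} = L$ is

$$X_{ijkl} = \mu_{ijk} + \epsilon_{ijkl} \qquad i = 1,\ldots,I,\quad j = 1,\ldots,J,\quad k = 1,\ldots,K,\quad l = 1,\ldots,L$$

where the $\epsilon_{ijkl}$'s are normally distributed with mean 0 and variance $\sigma^2$, and

$$\mu_{ijk} = \mu + \alpha_i + \beta_j + \delta_k + \gamma^{AB}_{ij} + \gamma^{AC}_{ik} + \gamma^{BC}_{jk} + \gamma_{ijk} \tag{11.13}$$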
The restrictions necessary to obtain uniquely defined parameters are that the sum over any subscript of any parameter on the right-hand side of (11.13) equal 0.
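The parameters $\gamma^{AB}_{ij}$, $\gamma^{AC}_{ik}$, and $\gamma^{BC}_{jk}$ are called two-factor interactions, and $\gamma_{ijk}$ is called a three-factor interaction; the $\alpha_i$'s, $\beta_j$'s, and $\delta_k$'s are the main effects parameters. For any fixed level k of the third factor, analogous to (11.11),

$$\mu_{ijk} - \bar{\mu}_{i\cdot k} - \bar{\mu}_{\cdot jk} + \bar{\mu}_{\cdot\cdot k} = \gamma^{AB}_{ij} + \gamma_{ijk}$$

is the interaction of the ith level of A with the jth level of B specific to the kth level of C, whereas

$$\bar{\mu}_{ij\cdot} - \bar{\mu}_{i\cdot\cdot} - \bar{\mu}_{\cdot j\cdot} + \bar{\mu}_{\cdot\cdot\cdot} = \gamma^{AB}_{ij}$$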
is the interaction between A at level i and B at level j averaged over levels of C. If the interaction of A at level i and B at level j does not depend on k, then all $\gamma_{ijk}$'s equal 0. Thus nonzero $\gamma_{ijk}$'s represent nonadditivity of the two-factor $\gamma^{AB}_{ij}$'s over the various levels of the third factor C. If the experiment included more than three factors, there would be corresponding higher-order interaction terms with analogous interpretations. Note that in the previous argument, if we had considered fixing the level of either A or B (rather than C, as was done) and examining the $\gamma_{ijk}$'s, their interpretation would be the same; if any of the interactions of two factors depend on the level of the third factor, then there are nonzero $\gamma_{ijk}$'s.
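The interpretation of the three-factor interaction can be illustrated with a small numerical sketch. It assumes a hypothetical 2 x 2 x 2 layout with made-up parameter values that satisfy the sum-to-zero restrictions; the helper names `ab_interaction_at_k` and `build_cell_means` are illustrative only.

```python
def ab_interaction_at_k(mu, i, j, k):
    """Return mu_ijk - mu-bar_i.k - mu-bar_.jk + mu-bar_..k for mu[i][j][k]."""
    I, J = len(mu), len(mu[0])
    mean_i_k = sum(mu[i][jj][k] for jj in range(J)) / J           # average over j
    mean_j_k = sum(mu[ii][j][k] for ii in range(I)) / I           # average over i
    mean_k = sum(mu[ii][jj][k]
                 for ii in range(I) for jj in range(J)) / (I * J)  # average over i and j
    return mu[i][j][k] - mean_i_k - mean_j_k + mean_k


# Hypothetical parameters (illustrative values, not from the text).
# Every parameter array sums to zero over each of its subscripts.
alpha = [-1.0, 1.0]
beta = [-2.0, 2.0]
delta = [-0.5, 0.5]
g_ab = [[0.75, -0.75], [-0.75, 0.75]]
g_ac = [[0.25, -0.25], [-0.25, 0.25]]
g_bc = [[0.5, -0.5], [-0.5, 0.5]]


def build_cell_means(g_abc_size):
    """Cell means following (11.13); the three-factor term is +/- g_abc_size."""
    return [[[50.0 + alpha[i] + beta[j] + delta[k]
              + g_ab[i][j] + g_ac[i][k] + g_bc[j][k]
              + (-1) ** (i + j + k) * g_abc_size
              for k in range(2)] for j in range(2)] for i in range(2)]


additive = build_cell_means(0.0)       # all gamma_ijk equal 0
nonadditive = build_cell_means(0.5)    # nonzero gamma_ijk

same_over_k = [ab_interaction_at_k(additive, 0, 0, k) for k in range(2)]
differs_over_k = [ab_interaction_at_k(nonadditive, 0, 0, k) for k in range(2)]
```

With no three-factor term, the AB interaction at levels (1, 1) is the same at both levels of C; with the three-factor term present, it changes from one level of C to the other.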
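When $L > 1$, there is a sum of squares for each main effect, each two-factor interaction, and the three-factor interaction. To write these in a way that indicates how sums of squares are defined when there are more than three factors, note that any of the model parameters in (11.13) can be estimated unbiasedly by averaging $X_{ijkl}$ over appropriate subscripts and taking differences. Thus

$$\hat{\mu} = \bar{X}_{\cdot\cdot\cdot\cdot} \qquad \hat{\alpha}_i = \bar{X}_{i\cdot\cdot\cdot} - \bar{X}_{\cdot\cdot\cdot\cdot} \qquad \hat{\gamma}^{AB}_{ij} = \bar{X}_{ij\cdot\cdot} - \bar{X}_{i\cdot\cdot\cdot} - \bar{X}_{\cdot j\cdot\cdot} + \bar{X}_{\cdot\cdot\cdot\cdot}$$

$$\hat{\gamma}_{ijk} = \bar{X}_{ijk\cdot} - \bar{X}_{ij\cdot\cdot} - \bar{X}_{i\cdot k\cdot} - \bar{X}_{\cdot jk\cdot} + \bar{X}_{i\cdot\cdot\cdot} + \bar{X}_{\cdot j\cdot\cdot} + \bar{X}_{\cdot\cdot k\cdot} - \bar{X}_{\cdot\cdot\cdot\cdot}$$

with other main effects and interaction estimators obtained by symmetry.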
DEFINITION
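Relevant sums of squares are

$$\text{SST} = \sum_i\sum_j\sum_k\sum_l (X_{ijkl} - \bar{X}_{\cdot\cdot\cdot\cdot})^2 \qquad \text{df} = IJKL - 1$$

$$\text{SSA} = \sum_i\sum_j\sum_k\sum_l \hat{\alpha}_i^2 = JKL\sum_i (\bar{X}_{i\cdot\cdot\cdot} - \bar{X}_{\cdot\cdot\cdot\cdot})^2 \qquad \text{df} = I - 1$$

$$\text{SSAB} = \sum_i\sum_j\sum_k\sum_l (\hat{\gamma}^{AB}_{ij})^2 = KL\sum_i\sum_j (\bar{X}_{ij\cdot\cdot} - \bar{X}_{i\cdot\cdot\cdot} - \bar{X}_{\cdot j\cdot\cdot} + \bar{X}_{\cdot\cdot\cdot\cdot})^2 \qquad \text{df} = (I - 1)(J - 1)$$

$$\text{SSABC} = \sum_i\sum_j\sum_k\sum_l \hat{\gamma}_{ijk}^2 = L\sum_i\sum_j\sum_k \hat{\gamma}_{ijk}^2 \qquad \text{df} = (I - 1)(J - 1)(K - 1)$$

$$\text{SSE} = \sum_i\sum_j\sum_k\sum_l (X_{ijkl} - \bar{X}_{ijk\cdot})^2 \qquad \text{df} = IJK(L - 1)$$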
with the remaining main effect and two-factor interaction sums of squares obtained by symmetry. SST is the sum of the other eight SSs.
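The definitions can be turned into a short computation. The sketch below assumes balanced data stored as nested lists `x[i][j][k][l]`; the function name `three_factor_ss` and the sample values are hypothetical, chosen only to show that SST equals the sum of the other eight sums of squares.

```python
from itertools import product


def three_factor_ss(x):
    """Sums of squares and df for balanced three-factor data x[i][j][k][l]."""
    I, J, K, L = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    cells = list(product(range(I), range(J), range(K), range(L)))

    def avg(keep):
        """Average x over every subscript position not listed in `keep`."""
        totals, counts = {}, {}
        for i, j, k, l in cells:
            key = tuple((i, j, k, l)[p] for p in keep)
            totals[key] = totals.get(key, 0.0) + x[i][j][k][l]
            counts[key] = counts.get(key, 0) + 1
        return {key: totals[key] / counts[key] for key in totals}

    grand = avg(())[()]                                    # X-bar_....
    a_bar, b_bar, c_bar = avg((0,)), avg((1,)), avg((2,))
    ab_bar, ac_bar, bc_bar = avg((0, 1)), avg((0, 2)), avg((1, 2))
    abc_bar = avg((0, 1, 2))                               # X-bar_ijk.

    ss = dict.fromkeys(["A", "B", "C", "AB", "AC", "BC", "ABC", "E", "T"], 0.0)
    for i, j, k, l in cells:
        xa, xb, xc = a_bar[(i,)], b_bar[(j,)], c_bar[(k,)]
        xab, xac, xbc = ab_bar[(i, j)], ac_bar[(i, k)], bc_bar[(j, k)]
        xabc = abc_bar[(i, j, k)]
        terms = {
            "A": xa - grand,                               # alpha-hat_i
            "B": xb - grand,
            "C": xc - grand,
            "AB": xab - xa - xb + grand,                   # gamma-hat^AB_ij
            "AC": xac - xa - xc + grand,
            "BC": xbc - xb - xc + grand,
            "ABC": xabc - xab - xac - xbc + xa + xb + xc - grand,
            "E": x[i][j][k][l] - xabc,
            "T": x[i][j][k][l] - grand,
        }
        for name, value in terms.items():
            ss[name] += value ** 2                         # sum over i, j, k, l

    df = {"A": I - 1, "B": J - 1, "C": K - 1,
          "AB": (I - 1) * (J - 1), "AC": (I - 1) * (K - 1), "BC": (J - 1) * (K - 1),
          "ABC": (I - 1) * (J - 1) * (K - 1),
          "E": I * J * K * (L - 1), "T": I * J * K * L - 1}
    return ss, df


# Hypothetical 2 x 2 x 2 layout with L = 2 replications (made-up values).
x = [[[[1, 2], [3, 5]], [[2, 4], [6, 7]]],
     [[[3, 3], [5, 8]], [[4, 6], [9, 10]]]]
ss, df = three_factor_ss(x)
```

Dividing each entry of `ss` (other than `"T"`) by the matching entry of `df` gives the corresponding mean square.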
Each sum of squares (excepting SST) when divided by its df gives a mean square. Expected mean squares are
$$E(\text{MSE}) = \sigma^2$$

$$E(\text{MSA}) = \sigma^2 + \frac{JKL}{I - 1}\sum_i \alpha_i^2$$

$$E(\text{MSAB}) = \sigma^2 + \frac{KL}{(I - 1)(J - 1)}\sum_i\sum_j (\gamma^{AB}_{ij})^2$$

$$E(\text{MSABC}) = \sigma^2 + \frac{L}{(I - 1)(J - 1)(K - 1)}\sum_i\sum_j\sum_k \gamma_{ijk}^2$$

with similar expressions for the other expected mean squares. Main effect and interaction hypotheses are tested by forming F ratios with MSE in each denominator.
462 Chapter 11 Multifactor analysis of Variance
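| Null Hypothesis | Test Statistic Value | P-Value Determination |
|---|---|---|
| $H_{0A}$: all $\alpha_i$'s $= 0$ | $f_A = \dfrac{\text{MSA}}{\text{MSE}}$ | Area under the $F_{I-1,\,IJK(L-1)}$ curve to the right of $f_A$ |
| $H_{0AB}$: all $\gamma^{AB}_{ij}$'s $= 0$ | $f_{AB} = \dfrac{\text{MSAB}}{\text{MSE}}$ | Area under the $F_{(I-1)(J-1),\,IJK(L-1)}$ curve to the right of $f_{AB}$ |
| $H_{0ABC}$: all $\gamma_{ijk}$'s $= 0$ | $f_{ABC} = \dfrac{\text{MSABC}}{\text{MSE}}$ | Area under the $F_{(I-1)(J-1)(K-1),\,IJK(L-1)}$ curve to the right of $f_{ABC}$ |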
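To make the P-value column concrete, here is a rough sketch using only the Python standard library. It relies on the standard identity P(F > f) = I_x(df2/2, df1/2) with x = df2/(df2 + df1 f), where I_x is the regularized incomplete beta function, and evaluates the integral by Simpson's rule. The function name `f_upper_tail`, the layout, and the mean squares are all hypothetical; in practice a statistical package would be used instead.

```python
import math


def f_upper_tail(f, df1, df2, steps=2000):
    """Approximate area under the F(df1, df2) curve to the right of f.

    Sketch only: assumes df2 >= 2 so the integrand stays bounded near 0,
    and `steps` must be even for Simpson's rule.
    """
    if f <= 0:
        return 1.0
    a, b = df2 / 2.0, df1 / 2.0
    x = df2 / (df2 + df1 * f)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def density(t):
        # Beta(a, b) density at t, computed on the log scale for stability.
        if t == 0.0:
            return math.exp(-log_beta) if a == 1.0 else 0.0
        return math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t) - log_beta)

    h = x / steps
    total = density(0.0) + density(x)
    for n in range(1, steps):
        total += (4 if n % 2 else 2) * density(n * h)
    return total * h / 3


# Hypothetical layout and mean squares (made-up numbers for illustration).
I, J, K, L = 3, 3, 3, 2
df_error = I * J * K * (L - 1)
ms_a, ms_e = 12.0, 3.0

f_a = ms_a / ms_e                          # test statistic value for H_0A
p_a = f_upper_tail(f_a, I - 1, df_error)   # area to the right of f_a
```

The same call with numerator df (I - 1)(J - 1) or (I - 1)(J - 1)(K - 1) handles the two-factor and three-factor interaction tests.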
Usually the main effect hypotheses are tested only if all interactions are judged not significant.
This analysis assumes that $L_{ijk} = L > 1$. If $L = 1$, then as in the two-factor case, the highest-order interactions must be assumed absent to obtain an MSE that estimates $\sigma^2$. Setting $L = 1$ and disregarding the fourth subscript summation over l, the foregoing formulas for sums of squares are still valid, and error sum of squares is $\text{SSE} = \sum_i\sum_j\sum_k \hat{\gamma}_{ijk}^2$ with $\bar{X}_{ijk\cdot} = X_{ijk}$ in the expression for $\hat{\gamma}_{ijk}$.
Example 11.10 There has been increased interest in recent years in renewable fuels such as biodiesel, a form of diesel fuel derived from vegetable oils and animal fats. Advantages over petroleum diesel include nontoxicity, biodegradability, and lower greenhouse gas emissions. The article “Application of the Full Factorial Design to Optimization of Base-Catalyzed Sunflower Oil Ethanolysis” (Fuel, 2013: 433–442) reported on an investigation of the effects of three factors on the purity (%) of the biodiesel fuel fatty acid ethyl ester (FAEE). The factors and levels are as follows:

A: Reaction temperature: 25°C, 50°C, 75°C
B: Ethanol-to-oil molar ratio
C: Catalyst loading: .75 wt.%, 1.00 wt.%, 1.25 wt.%

The data appears in Table 11.8, where $I = J = K = 3$ and $L = 2$.

Table 11.8 Purity (%) data for Example 11.10
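As a quick check on this layout, the degrees of freedom implied by $I = J = K = 3$ and $L = 2$ can be worked out from the formulas given earlier in the section (a worked sketch, not a reproduction of the ANOVA table):

$$\begin{aligned}
\text{df}_A = \text{df}_B = \text{df}_C &= 3 - 1 = 2 \\
\text{df}_{AB} = \text{df}_{AC} = \text{df}_{BC} &= (3 - 1)(3 - 1) = 4 \\
\text{df}_{ABC} &= (3 - 1)(3 - 1)(3 - 1) = 8 \\
\text{df}_E &= (3)(3)(3)(2 - 1) = 27 \\
\text{df}_T &= (3)(3)(3)(2) - 1 = 53 = 3(2) + 3(4) + 8 + 27
\end{aligned}$$

So each main effect F ratio is referred to the $F_{2,27}$ distribution, each two-factor interaction ratio to $F_{4,27}$, and the three-factor interaction ratio to $F_{8,27}$.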
The resulting ANOVA table is shown in Table 11.9. The P-value for testing $H_{0ABC}$ is .165, which is larger than any sensible significance level. This null hypothesis therefore cannot be rejected; it appears that the extent of interaction between any pair of factors is the same for each level of the remaining factor.

Table 11.9 ANOVA Table for the Purity Data of Example 11.10

Figure 11.8 shows two-factor interaction plots. For example, the dots in the plot appearing in the C row and B column represent the $\bar{x}_{\cdot jk\cdot}$'s, that is, the observations averaged over the levels of the first factor for each combination of levels of the second and third factors. The bottom three dots connected by solid line segments represent the third level of factor C at each level of factor B. The fact that connected line segments are quite close to being parallel is evidence for the absence of BC interactions, and indeed the P-value in Table 11.9 for testing this null hypothesis is .360. However, the P-values for testing $H_{0AB}$ and $H_{0AC}$ are .020 and .000, respectively. So at significance level .05, we are forced to conclude that AB interactions and AC interactions are present. The line segments in the AC interaction plot are clearly not close to being parallel. It appears from the interaction plots that expected purity will be maximized when all factors are at their highest levels. As it happens, this is also the message from the main effects plots, but those cannot generally be trusted when interactions are present.
Figure 11.8 Interaction and main effect plots from MINITAB for Example 11.10 (panels: Interaction Plot for Purity, data means; Main Effects Plot for Purity, data means for factors A, B, and C)
Diagnostic plots for checking the normality and constant variance assumptions can be constructed as described in previous sections. Tukey’s procedure can be used in three-factor (or more) ANOVA. The second subscript on Q is the number of sample means being compared, and the third is degrees of freedom for error.
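As a hedged illustration of those subscripts, suppose the I level means of factor A are to be compared and interactions involving A are judged absent. Assuming Tukey's procedure takes the same form here as in the single-factor and two-factor settings (an assumption, since the formula is not written out in this section), each $\bar{X}_{i\cdot\cdot\cdot}$ averages JKL observations, so the comparison yardstick would be

$$w = Q_{\alpha,\,I,\,IJK(L-1)}\sqrt{\frac{\text{MSE}}{JKL}}$$

and two levels of A would be declared different when their sample means differ by more than w.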
Models with random and mixed effects are also sometimes appropriate. Sums of squares and degrees of freedom are identical to the fixed effects case, but expected mean squares are, of course, different for the random main effects or interactions. A good reference is the book by Douglas Montgomery listed in the chapter bibliography.