10.3 More on Single-Factor ANOVA
We now briefly consider some additional issues relating to single-factor ANOVA. These include an alternative description of the model parameters, β for the F test, the relationship of the test to procedures previously considered, data transformation, a random effects model, and formulas for the case of unequal sample sizes.
The ANOVA Model
The assumptions of single-factor ANOVA can be described succinctly by means of the "model equation"

X_ij = μ_i + ε_ij

where ε_ij represents a random deviation from the population or true treatment mean μ_i. The ε_ij's are assumed to be independent, normally distributed rv's (implying that the X_ij's are also) with E(ε_ij) = 0 [so that E(X_ij) = μ_i] and V(ε_ij) = σ² [from which V(X_ij) = σ² for every i and j]. An alternative description of single-factor ANOVA will give added insight and suggest appropriate generalizations to models involving more than one factor. Define a parameter μ by

μ = (1/I) Σ_{i=1}^{I} μ_i
and the parameters α_1, …, α_I by

α_i = μ_i − μ   (i = 1, …, I)
Then the treatment mean μ_i can be written as μ + α_i, where μ represents the true average overall response in the experiment, and α_i is the effect, measured as a departure from μ, due to the ith treatment. Whereas we initially had I parameters, we now have I + 1 (μ, α_1, …, α_I). However, because Σα_i = 0 (the average departure from the overall mean response is zero), only I of these new parameters are independently
determined, so there are as many independent parameters as there were before. In terms of μ and the α_i's, the model becomes

X_ij = μ + α_i + ε_ij   (i = 1, …, I; j = 1, …, J)
In Chapter 11, we will develop analogous models for multifactor ANOVA. The claim that the μ_i's are identical is equivalent to the equality of the α_i's, and because Σα_i = 0, the null hypothesis becomes

H_0: α_1 = α_2 = … = α_I = 0
Recall that MSTr is an unbiased estimator of σ² when H_0 is true but otherwise tends to overestimate σ². Here is a more precise result:

E(MSTr) = σ² + (J/(I − 1)) Σ α_i²

When H_0 is true, Σα_i² = 0, so E(MSTr) = σ² (MSE is unbiased whether or not H_0 is true). If Σα_i² is used as a measure of the extent to which H_0 is false, then a larger value of Σα_i² will result in a greater tendency for MSTr to overestimate σ². In the next chapter, formulas for expected mean squares for multifactor models will be used to suggest how to form F ratios to test various hypotheses.
Proof of the Formula for E(MSTr)  For any rv Y, E(Y²) = V(Y) + [E(Y)]², so

E(SSTr) = E[(1/J) Σ_i X_i·² − (1/(IJ)) X_··²]
        = (1/J) Σ_i E(X_i·²) − (1/(IJ)) E(X_··²)
        = (1/J) Σ_i {V(X_i·) + [E(X_i·)]²} − (1/(IJ)) {V(X_··) + [E(X_··)]²}
        = (1/J) Σ_i {Jσ² + [J(μ + α_i)]²} − (1/(IJ)) {IJσ² + (IJμ)²}
        = Iσ² + IJμ² + 2μJ Σ_i α_i + J Σ_i α_i² − σ² − IJμ²
        = (I − 1)σ² + J Σ_i α_i²   (since Σ_i α_i = 0)

The result then follows from the relationship MSTr = SSTr/(I − 1). ∎
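The expected-mean-square result can be checked numerically. The sketch below (the values of I, J, σ, μ, and the α_i's are illustrative choices, not from the text) averages MSTr over many simulated data sets and compares it to σ² + J Σα_i²/(I − 1).

```python
import numpy as np

# Monte Carlo check of E(MSTr) = sigma^2 + J * sum(alpha_i^2) / (I - 1).
# I, J, sigma, mu, and the alpha_i values below are illustrative choices.
rng = np.random.default_rng(0)
I, J, sigma, mu = 4, 8, 1.0, 5.0
alpha = np.array([-1.0, -1.0, 1.0, 1.0])      # effects summing to zero

reps = 20_000
mstr = np.empty(reps)
for r in range(reps):
    # X_ij = mu + alpha_i + eps_ij with eps_ij ~ N(0, sigma^2)
    x = mu + alpha[:, None] + rng.normal(0.0, sigma, size=(I, J))
    sstr = J * ((x.mean(axis=1) - x.mean()) ** 2).sum()
    mstr[r] = sstr / (I - 1)

theory = sigma**2 + J * (alpha**2).sum() / (I - 1)    # 1 + 32/3
print(mstr.mean(), theory)
```

With these choices the simulated average lands close to the theoretical value 1 + 32/3 ≈ 11.67.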
β for the F Test
Consider a set of parameter values α_1, α_2, …, α_I for which H_0 is not true. The probability of a type II error, β, is the probability that H_0 is not rejected when that set is the set of true values. One might think that β would have to be determined separately for each different configuration of α_i's. Fortunately, since β for the F test depends on the α_i's and σ² only through Σα_i²/σ², it can be simultaneously evaluated for many different alternatives. For example, Σα_i² = 4 for each of the following sets of α_i's for which H_0 is false, so β is identical for all three alternatives:
1. α_1 = −1, α_2 = −1, α_3 = 1, α_4 = 1
2. α_1 = −√2, α_2 = √2, α_3 = 0, α_4 = 0
3. α_1 = −√3, α_2 = √(1/3), α_3 = √(1/3), α_4 = √(1/3)
The quantity J Σα_i²/σ² is called the noncentrality parameter for one-way ANOVA (because when H_0 is false the test statistic has a noncentral F distribution with this as one of its parameters), and β is a decreasing function of the value of this parameter. Thus, for fixed values of σ² and J, the null hypothesis is more likely to be
rejected for alternatives far from H_0 (large Σα_i²) than for alternatives close to H_0. For a fixed value of Σα_i², β decreases as the sample size J on each treatment increases, and it increases as the variance σ² increases (since greater underlying variability makes it more difficult to detect any given departure from H_0).
Because hand computation of β and sample size determination for the F test are quite difficult (as in the case of t tests), statisticians have constructed sets of curves from which β can be obtained. Sets of curves for numerator df ν_1 = 3 and ν_1 = 4 are displayed in Figure 10.6 and Figure 10.7, respectively. After the values of σ² and the α_i's for which β is desired are specified, these are used to compute the value of φ, where φ² = (J/I) Σα_i²/σ². We then enter the appropriate set of curves at
[Figure 10.6  Power curves for the ANOVA F test (ν_1 = 3)]

[Figure 10.7  Power curves for the ANOVA F test (ν_1 = 4)]

From E. S. Pearson and H. O. Hartley, "Charts of the Power Function for Analysis of Variance Tests, Derived from the Non-central F Distribution," Biometrika, vol. 38, 1951: 112.
the value of φ on the horizontal axis, move up to the curve associated with error df ν_2, and move over to the value of power on the vertical axis. Finally, β = 1 − power.
Example 10.8
The effects of four different heat treatments on yield point (tons/in²) of steel ingots are to be investigated. A total of eight ingots will be cast using each treatment. Suppose the true standard deviation of yield point for any of the four treatments is σ = 1. How likely is it that H_0 will not be rejected at level .05 if three of the treatments have the same expected yield point and the other treatment has an expected yield point that is 1 ton/in² greater than the common value of the other three (i.e., the fourth yield is on average 1 standard deviation above those for the first three treatments)?
Suppose that μ_1 = μ_2 = μ_3 and μ_4 = μ_1 + 1, so that μ = (Σμ_i)/4 = μ_1 + 1/4. Then α_1 = μ_1 − μ = −1/4, α_2 = −1/4, α_3 = −1/4, α_4 = 3/4, so

φ² = (8/4)[(−1/4)² + (−1/4)² + (−1/4)² + (3/4)²]/1² = 3/2

and φ = 1.22. Degrees of freedom for the F test are ν_1 = I − 1 = 3 and ν_2 = I(J − 1) = 28, so interpolating visually between ν_2 = 20 and ν_2 = 30 gives power ≈ .47 and β ≈ .53. This β is rather large, so we might decide to increase the value of J. How many ingots of each type would be required to yield β ≈ .05 for the alternative under consideration? By trying different values of J, it can be verified that J = 24 will meet the requirement, but any smaller J will not.
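Chart lookups like the one in Example 10.8 can also be done directly with software. A sketch using SciPy's noncentral F distribution (the noncentrality parameter is λ = J Σα_i²/σ² = Iφ²; the SciPy calls are an assumption about the reader's environment, not something the text prescribes):

```python
from scipy.stats import f, ncf

# Example 10.8: I = 4 treatments, J = 8 ingots each, sigma = 1,
# alpha = (-1/4, -1/4, -1/4, 3/4), so noncentrality = J*sum(alpha^2) = 6.
I, J, sigma = 4, 8, 1.0
alpha = [-0.25, -0.25, -0.25, 0.75]
nc = J * sum(a**2 for a in alpha) / sigma**2        # 6.0

df1, df2 = I - 1, I * (J - 1)                       # 3, 28
f_crit = f.ppf(0.95, df1, df2)                      # level .05 cutoff
beta = ncf.cdf(f_crit, df1, df2, nc)                # P(fail to reject H0)
print(round(1 - beta, 2))   # power; the charts gave roughly .47

# With J = 24, the noncentrality grows to 18 and beta drops to about .05:
beta24 = ncf.cdf(f.ppf(0.95, 3, 4 * 23), 3, 4 * 23, 24 * 0.75)
print(round(1 - beta24, 2))
```

The exact noncentral-F answer agrees with the visual interpolation from the power curves to about two decimal places.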
As an alternative to the use of power curves, the SAS statistical software package has a function that calculates the cumulative area under a noncentral F curve (inputs F_α, numerator df, denominator df, and φ²), and this area is β. Minitab does this and also something rather different. The user is asked to specify the maximum difference between μ_i's rather than the individual means. For example, we might wish to calculate the power of the test when I = 4, μ_1 = 100, μ_2 = 101, μ_3 = 102, and μ_4 = 106. Then the maximum difference is 106 − 100 = 6. However, the power depends not only on this maximum difference but on the values of all the μ_i's. In this situation Minitab calculates the smallest possible value of power subject to μ_1 = 100 and μ_4 = 106, which occurs when the two other μ's are both halfway between 100 and 106. If this power is .85, then we can say that the power is at least .85 and β is at most .15 when the two most extreme μ's are separated by 6 (the common sample size, α, and σ must also be specified). The software will also determine the necessary common sample size if maximum difference and minimum power are specified.
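The "smallest possible power" convention can be checked numerically: with the two extreme means pinned at 100 and 106, power is an increasing function of Σα_i², so the least-favorable configuration minimizes that sum. A small grid search (a pure illustration, not Minitab's algorithm) confirms the halfway point:

```python
# Fix mu_1 = 100 and mu_4 = 106, and put the two middle means at a common
# value t; the sum of squared effects -- and hence the power, which
# increases with it -- is smallest when t is halfway between the extremes.
def sum_sq_effects(mus):
    m = sum(mus) / len(mus)
    return sum((mu - m) ** 2 for mu in mus)

grid = [100 + 0.25 * k for k in range(25)]          # t from 100 to 106
worst_t = min(grid, key=lambda t: sum_sq_effects([100.0, t, t, 106.0]))
print(worst_t)  # 103.0
```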
Relationship of the F Test to the t Test
When the number of treatments or populations is I = 2, all formulas and results connected with the F test still make sense, so ANOVA can be used to test H_0: μ_1 = μ_2 versus H_a: μ_1 ≠ μ_2. In this case, a two-tailed, two-sample t test can also be used. In Section 9.3, we mentioned the pooled t test, which requires equal variances, as an alternative to the two-sample t procedure. It can be shown that the single-factor ANOVA F test and the two-tailed pooled t test are equivalent; for any given data set, the P-values for the two tests will be identical, so the same conclusion will be reached by either test.
The two-sample t test is more flexible than the F test when I = 2 for two reasons. First, it is valid without the assumption that σ_1 = σ_2; second, it can be used to test H_a: μ_1 > μ_2 (an upper-tailed t test) or H_a: μ_1 < μ_2 as well as H_a: μ_1 ≠ μ_2. In the case of I ≥ 3, there is unfortunately no general test procedure known to have good properties without assuming equal variances.
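The equivalence for I = 2 is easy to confirm numerically: the ANOVA F statistic equals the square of the pooled t statistic, and the two P-values match. A sketch with invented data:

```python
from scipy import stats

# Two small illustrative samples (invented data, not from the text).
x = [20.1, 19.8, 20.5, 20.3, 19.9]
y = [21.0, 20.7, 21.3, 20.9, 21.1]

t_stat, p_t = stats.ttest_ind(x, y)    # pooled t (equal_var=True by default)
f_stat, p_f = stats.f_oneway(x, y)     # single-factor ANOVA with I = 2

print(f_stat, t_stat**2)   # equal up to floating-point rounding
print(p_f, p_t)            # identical P-values
```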
Unequal Sample Sizes
When the sample sizes from each population or treatment are not equal, let J_1, J_2, …, J_I denote the I sample sizes, and let n = Σ_i J_i denote the total number of observations. The accompanying box gives ANOVA formulas and the test procedure.
SST = Σ_{i=1}^{I} Σ_{j=1}^{J_i} (X_ij − X̄_··)² = Σ_i Σ_j X_ij² − (1/n) X_··²      df = n − 1

SSTr = Σ_{i=1}^{I} Σ_{j=1}^{J_i} (X̄_i· − X̄_··)² = Σ_i (1/J_i) X_i·² − (1/n) X_··²   df = I − 1

SSE = SST − SSTr                                                                 df = n − I

Test statistic:

F = MSTr/MSE   where MSTr = SSTr/(I − 1) and MSE = SSE/(n − I)
Statistical theory says that the test statistic has an F distribution with numerator df I − 1 and denominator df n − I when H_0 is true. As in the case of equal sample sizes, the larger the value of F, the stronger is the evidence against H_0. Therefore the test is upper-tailed; the P-value is the area under the F_{I−1, n−I} curve to the right of f.
Example 10.9
The article "On the Development of a New Approach for the Determination of Yield Strength in Mg-based Alloys" (Light Metal Age, Oct. 1998: 51–53) presented the following data on elastic modulus (GPa) obtained by a new ultrasonic method for specimens of a certain alloy produced using three different casting processes.
Process                                                        J_i    x_i·    x̄_i·
Permanent molding   45.5 45.3 45.4 44.4 44.6 43.9 44.6 44.0     8    357.7   44.71
Die casting         44.2 43.9 44.7 44.2 44.0 43.8 44.6 43.1     8    352.5   44.06
Plaster molding     46.0 45.9 44.8 46.2 45.1 45.5               6    273.5   45.58
Let μ_1, μ_2, and μ_3 denote the true average elastic moduli for the three different processes under the given circumstances. The relevant hypotheses are H_0: μ_1 = μ_2 = μ_3 versus H_a: at least two of the μ_i's are different. The test statistic is, of course, F = MSTr/MSE, based on I − 1 = 2 numerator df and n − I = 22 − 3 = 19 denominator df. Relevant quantities include

ΣΣ x_ij² = 43,998.73    CF = x_··²/n = (983.7)²/22 = 43,984.80

from which SST = 13.93, SSTr = 7.93, and SSE = 6.00.
The remaining computations are displayed in the accompanying ANOVA table. Since F_{.001,2,19} = 10.16 < 12.56 = f, the P-value is smaller than .001. Thus the null
hypothesis should be rejected at any reasonable significance level; there is compelling evidence for concluding that true average elastic modulus somehow depends on which casting process is used.
Source of Variation    df    Sum of Squares    Mean Square       f
Treatments              2         7.93            3.965        12.56
Error                  19         6.00             .316
Total                  21        13.93
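The table's F ratio can be reproduced from the raw data; here is a sketch using SciPy's one-way ANOVA routine (assumed available in the reader's environment):

```python
from scipy.stats import f_oneway

# Elastic modulus (GPa) data from Example 10.9, with unequal sample sizes.
permanent = [45.5, 45.3, 45.4, 44.4, 44.6, 43.9, 44.6, 44.0]
die       = [44.2, 43.9, 44.7, 44.2, 44.0, 43.8, 44.6, 43.1]
plaster   = [46.0, 45.9, 44.8, 46.2, 45.1, 45.5]

f_stat, p_value = f_oneway(permanent, die, plaster)
print(round(f_stat, 2), p_value)   # f close to 12.56, P-value below .001
```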
There is more controversy among statisticians regarding which multiple comparisons procedure to use when sample sizes are unequal than there is in the case of equal sample sizes. The procedure that we present here is recommended in the excellent book Beyond ANOVA: Basics of Applied Statistics (see the chapter bibliography) for use when the I sample sizes J_1, J_2, …, J_I are reasonably close to one another ("mild imbalance"). It modifies Tukey's method by using averages of pairs of 1/J_i's in place of 1/J.
Let

w_ij = Q_{α,I,n−I} · √[(MSE/2)(1/J_i + 1/J_j)]

Then the probability is approximately 1 − α that

X̄_i· − X̄_j· − w_ij ≤ μ_i − μ_j ≤ X̄_i· − X̄_j· + w_ij

for every i and j (i = 1, …, I and j = 1, …, I) with i ≠ j.
The simultaneous confidence level 100(1 − α)% is only approximate rather than exact as it is with equal sample sizes. Underscoring can still be used, but now the w_ij factor used to decide whether x̄_i· and x̄_j· can be connected will depend on J_i and J_j.
Example 10.10 (Example 10.9 continued)  The sample sizes for the elastic modulus data were J_1 = 8, J_2 = 8, J_3 = 6, and I = 3, n − I = 19, MSE = .316. A simultaneous confidence level of approximately 95% requires Q_{.05,3,19} = 3.59, from which

w_12 = 3.59 √[(.316/2)(1/8 + 1/8)] = .713    w_13 = w_23 = 3.59 √[(.316/2)(1/8 + 1/6)] = .771

Since x̄_1· − x̄_2· = 44.71 − 44.06 = .65 < w_12, μ_1 and μ_2 are judged not significantly different. The accompanying underscoring scheme shows that μ_1 and μ_3 appear to differ significantly, as do μ_2 and μ_3.

2. Die       1. Permanent      3. Plaster
   44.06        44.71             45.58
---------------------------
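The w_ij margins in Example 10.10 can be recomputed with SciPy's studentized range distribution (scipy.stats.studentized_range, available in recent SciPy versions; this dependency is an assumption, not something the text uses):

```python
from math import sqrt
from scipy.stats import studentized_range

# Tukey-Kramer-style margins for the Example 10.10 data.
mse, df_err, I = 0.316, 19, 3
J = {1: 8, 2: 8, 3: 6}          # sample sizes J_1, J_2, J_3

q = studentized_range.ppf(0.95, I, df_err)   # Q_{.05,3,19}, tabled as 3.59

def w(i, j):
    # w_ij = Q * sqrt((MSE/2) * (1/J_i + 1/J_j))
    return q * sqrt((mse / 2) * (1 / J[i] + 1 / J[j]))

print(round(q, 2), round(w(1, 2), 3), round(w(1, 3), 3))  # near 3.59, .713, .771
```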
Data Transformation
The use of ANOVA methods can be invalidated by substantial differences in the variances σ_1², …, σ_I² (which until now have been assumed equal with common value σ²). It sometimes happens that V(X_ij) = σ_i² = g(μ_i), a known function of μ_i (so that when
H_0 is false, the variances are not equal). For example, if X_ij has a Poisson distribution with parameter λ_i (approximately normal if λ_i ≥ 10), then μ_i = λ_i and σ_i² = λ_i, so g(μ_i) = μ_i is the known function. In such cases, one can often transform the X_ij's to h(X_ij) so that they will have approximately equal variances (while leaving the transformed variables approximately normal), and then the F test can be used on the transformed observations. The key idea in choosing h(·) is that often V[h(X_ij)] ≈ V(X_ij) · [h′(μ_i)]² = g(μ_i) · [h′(μ_i)]². We now wish to find the function h(·) for which g(μ_i) · [h′(μ_i)]² = c (a constant) for every i.
PROPOSITION  If V(X_ij) = g(μ_i), a known function of μ_i, then a transformation h(X_ij) that "stabilizes the variance" so that V[h(X_ij)] is approximately the same for each i is given by h(x) ∝ ∫ [g(x)]^(−1/2) dx.
In the Poisson case, g(x) = x, so h(x) should be proportional to ∫ x^(−1/2) dx = 2x^(1/2). Thus Poisson data should be transformed to h(x_ij) = √x_ij before the analysis.
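A short simulation illustrates the stabilization: whatever λ is, Var(√X) for Poisson X stays near 1/4, while Var(X) = λ grows with λ. (The λ values below are illustrative choices.)

```python
import numpy as np

# Poisson variance grows with lambda, but the square-root transform
# keeps the variance of sqrt(X) close to 1/4 for every lambda.
rng = np.random.default_rng(1)
for lam in (10, 25, 100):
    x = rng.poisson(lam, size=200_000)
    print(lam, round(x.var(), 1), round(np.sqrt(x).var(), 3))
```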