
4.5.3 Two-Way ANOVA

In the two-way ANOVA test we consider that the variable being tested, X, is categorised by two independent factors, say Factor 1 and Factor 2; we then say that X depends on these two factors.

Assuming that Factor 1 has c categories and Factor 2 has r categories, and that there is only one random observation for every combination of categories of the factors, we get the situation shown in Table 4.19. The means for the Factor 1 categories are denoted x̄_1., x̄_2., ..., x̄_c.; the means for the Factor 2 categories are denoted x̄_.1, x̄_.2, ..., x̄_.r. The total mean for all observations is denoted x̄_.. .

Note that the situation shown in Table 4.19 constitutes a generalisation to multiple samples of the comparison of means for two paired samples described in section 4.4.3.3. One can, for instance, view the cases as being paired according to Factor 2 and compare the means for Factor 1. The inverse situation is, of course, also possible.

Table 4.19. Two-way ANOVA dataset showing the means along the columns, along the rows and the global mean.

                           Factor 1
Factor 2      1        2       ...      c       Mean
    1        x_11     x_21     ...     x_c1     x̄_.1
    2        x_12     x_22     ...     x_c2     x̄_.2
   ...       ...      ...      ...     ...      ...
    r        x_1r     x_2r     ...     x_cr     x̄_.r
  Mean       x̄_1.     x̄_2.     ...     x̄_c.     x̄_..

Following the ANOVA approach of breaking down the total sum of squares (see formulas 4.22 through 4.30), we are now interested in reflecting the dispersion of the means along the rows and along the columns. This can be done as follows:


$$
\begin{aligned}
\mathrm{SST} &= \sum_{i=1}^{c}\sum_{j=1}^{r}\left(x_{ij}-\bar{x}_{..}\right)^{2} \\
&= r\sum_{i=1}^{c}\left(\bar{x}_{i.}-\bar{x}_{..}\right)^{2}
 + c\sum_{j=1}^{r}\left(\bar{x}_{.j}-\bar{x}_{..}\right)^{2}
 + \sum_{i=1}^{c}\sum_{j=1}^{r}\left(x_{ij}-\bar{x}_{i.}-\bar{x}_{.j}+\bar{x}_{..}\right)^{2} \\
&= \mathrm{SSC} + \mathrm{SSR} + \mathrm{SSE}.
\end{aligned}
\qquad (4.40)
$$

Besides the term SST described in the previous section, the sums of squares have the following interpretation:

1. SSC represents the sum of squares, or dispersion, along the columns, like the previous SSB. The variance along the columns is v_c = SSC/(c − 1); it has c − 1 degrees of freedom and is the point estimate of σ² + r σ_c².

2. SSR represents the dispersion along the rows, i.e., it is the row version of the previous SSB. The variance along the rows is v_r = SSR/(r − 1); it has r − 1 degrees of freedom and is the point estimate of σ² + c σ_r².

3. SSE represents the residual dispersion, or experimental error. The experimental variance associated with the randomness of the experiment is v_e = SSE/[(c − 1)(r − 1)]; it has (c − 1)(r − 1) degrees of freedom and is the point estimate of σ².

Note that formula 4.40 can only be obtained when c and r are constant along the rows and along the columns, respectively. This corresponds to the so-called orthogonal experiment.
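As a numerical check of the decomposition in formula 4.40, the following sketch computes SST, SSC, SSR and SSE for a small single-observation table and confirms that the three terms sum to SST. The 3×4 data values are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical data table: rows = Factor 2 categories (r = 3),
# columns = Factor 1 categories (c = 4), one observation per cell,
# laid out as in Table 4.19.
x = np.array([
    [5.0, 6.0, 7.0, 6.0],
    [4.0, 5.0, 6.0, 7.0],
    [6.0, 7.0, 8.0, 8.0],
])
r, c = x.shape

grand = x.mean()                 # x_..  (global mean)
col_means = x.mean(axis=0)       # x_i.  (one mean per Factor 1 category)
row_means = x.mean(axis=1)       # x_.j  (one mean per Factor 2 category)

SST = ((x - grand) ** 2).sum()
SSC = r * ((col_means - grand) ** 2).sum()
SSR = c * ((row_means - grand) ** 2).sum()
SSE = ((x - col_means[None, :] - row_means[:, None] + grand) ** 2).sum()

# For an orthogonal experiment the decomposition is exact (up to rounding):
print(SST, SSC + SSR + SSE)
```

The residual term uses broadcasting to subtract the column mean and row mean from every cell at once, mirroring the last double sum of formula 4.40.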

In the situation shown in Table 4.19, it is possible to consider every cell value as a random case from a population with mean µ_ij, such that:

$$
\mu_{ij} = \mu + \mu_{i.} + \mu_{.j}\,, \quad \text{with} \quad \sum_{i=1}^{c}\mu_{i.} = 0 \ \text{ and } \ \sum_{j=1}^{r}\mu_{.j} = 0\,, \qquad (4.41)
$$

i.e., the mean of the population corresponding to cell ij is obtained by adding to a global mean µ the means along the columns and along the rows. The sum of the means along the columns, as well as the sum of the means along the rows, is zero; therefore, when computing the mean of all cells we obtain the global mean µ. It is assumed that the variance is σ² for all cell populations.
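A quick numerical illustration of model 4.41, using hypothetical values: we construct cell-population means from a global mean plus zero-sum column and row effects, and check that averaging over all cells recovers the global mean:

```python
import numpy as np

# Hypothetical additive-effects model (formula 4.41): a global mean plus
# column effects and row effects that each sum to zero.
mu = 10.0
mu_col = np.array([-1.0, 0.5, 0.5])   # mu_i. for c = 3 columns, sums to 0
mu_row = np.array([2.0, -2.0])        # mu_.j for r = 2 rows, sums to 0

# Cell-population means mu_ij = mu + mu_i. + mu_.j
mu_cells = mu + mu_col[None, :] + mu_row[:, None]

# Averaging all cells recovers the global mean, since the effects cancel.
print(mu_cells.mean())
```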

In this single-observation, additive-effects model one can, therefore, treat the effects along the columns and along the rows independently, testing the following null hypotheses:

H01: There are no column effects, i.e., µ_i. = 0 for every i.

H02: There are no row effects, i.e., µ_.j = 0 for every j.

The null hypothesis H01 is tested using the ratio v_c/v_e, which, under the assumptions of independent sampling on normal distributions and with equal variances, follows the F_{c−1,(c−1)(r−1)} distribution. Similarly, and under the same assumptions, the null hypothesis H02 is tested using the ratio v_r/v_e and the F_{r−1,(c−1)(r−1)} distribution.

Let us now consider the more general situation where, for each combination of column and row categories, several values are available. This repeated-measurements experiment allows us to analyse the data more fully. We assume that the number of repeated measurements per table cell (combination of column and row categories) is a constant, n, corresponding to the so-called factorial experiment. An example of this sort of experiment is shown in Figure 4.18.

Now the breakdown of the total sum of squares expressed by equation 4.40 does not generally apply, and has to be rewritten as:

SST = SSC + SSR + SSI + SSE,

with:

1. $\mathrm{SST} = \sum_{i=1}^{c}\sum_{j=1}^{r}\sum_{k=1}^{n}\left(x_{ijk}-\bar{x}_{...}\right)^{2}$.

Total sum of squares computed for all n cases in every combination of the c × r categories, characterising the dispersion of all cases around the global mean. The cases are denoted x_ijk, where k is the case index in each ij cell (one of the c × r categories with n cases).

2. $\mathrm{SSC} = rn\sum_{i=1}^{c}\left(\bar{x}_{i..}-\bar{x}_{...}\right)^{2}$.

Sum of squares representing the dispersion along the columns. The variance along the columns is v_c = SSC/(c − 1); it has c − 1 degrees of freedom and is the point estimate of σ² + rn σ_c².

3. $\mathrm{SSR} = cn\sum_{j=1}^{r}\left(\bar{x}_{.j.}-\bar{x}_{...}\right)^{2}$.

Sum of squares representing the dispersion along the rows. The variance along the rows is v_r = SSR/(r − 1); it has r − 1 degrees of freedom and is the point estimate of σ² + cn σ_r².

4. Besides the dispersion along the columns and along the rows, one must also consider the dispersion of the column-row combinations, given by the following sum of squares, known as the subtotal or model sum of squares (similar to SSW in the one-way ANOVA):

$\mathrm{SSS} = n\sum_{i=1}^{c}\sum_{j=1}^{r}\left(\bar{x}_{ij.}-\bar{x}_{...}\right)^{2}$.