Graphical Methods and Model Checking
13.9 Graphical Methods and Model Checking
In several chapters, we make reference to graphical procedures for displaying data and analytical results. In early chapters, we used stem-and-leaf and box-and-whisker plots as visuals to aid in summarizing samples. We used similar diagnostics to better understand the data in two-sample problems in Chapter 10. In Chapter 11 we introduced the notion of residual plots to detect violations of standard assumptions. In recent years, much attention in data analysis has centered on graphical methods. Like regression, analysis of variance lends itself to graphics that aid in summarizing data as well as detecting violations. For example, a simple plot of the raw observations around each treatment mean can give the analyst a feel for variability between sample means and within samples. Figure 13.7 depicts such a plot for the aggregate data of Table 13.1. From the appearance of the plot one may even gain graphical insight into which aggregates (if any) stand out from the others. It is clear that aggregate 4 stands out from the others. Aggregates 3 and 5 certainly form a homogeneous group, as do aggregates 1 and 2.
Figure 13.7: Plot of data around the mean for the aggregate data of Table 13.1.
Figure 13.8: Plot of residuals for five aggregates, using data in Table 13.1.
As in the case of regression, residuals can be helpful in analysis of variance in providing a diagnostic that may detect violations of assumptions. To form the residuals, we merely need to consider the model of the one-factor problem, namely
$$y_{ij} = \mu_i + \epsilon_{ij}.$$
It is straightforward to determine that the estimate of $\mu_i$ is $\bar{y}_{i.}$. Hence, the $ij$th residual is $y_{ij} - \bar{y}_{i.}$. This is easily extendable to the randomized complete block model. It may be instructive to have the residuals plotted for each aggregate in order to gain some insight regarding the homogeneous variance assumption. This plot is shown in Figure 13.8.
Trends in plots such as these may reveal difficulties in some situations, particularly when the violation of a particular assumption is graphic. In the case of Figure 13.8, the residuals seem to indicate that the within-treatment variances are reasonably homogeneous apart from aggregate 1. There is some graphical evidence that the variance for aggregate 1 is larger than the rest.
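As a rough numerical companion to a plot such as Figure 13.8, the one-factor residuals $y_{ij} - \bar{y}_{i.}$ can be computed directly. The following is a minimal sketch in Python with NumPy; the function name `one_factor_residuals` and the three small samples are hypothetical and are not the aggregate data of Table 13.1.

```python
import numpy as np


def one_factor_residuals(groups):
    """Return the residuals y_ij - ybar_i. for each treatment group."""
    return [np.asarray(g, dtype=float) - np.mean(g) for g in groups]


# Hypothetical observations for three treatments (NOT the data of Table 13.1).
groups = [
    [10.0, 12.0, 11.0, 15.0],
    [20.0, 21.0, 19.0, 20.0],
    [30.0, 29.0, 31.0, 30.0],
]

residuals = one_factor_residuals(groups)

# Sample variance of the residuals within each treatment. Markedly unequal
# values would cast doubt on the homogeneous variance assumption.
within_var = [float(np.var(r, ddof=1)) for r in residuals]

for i, (r, v) in enumerate(zip(residuals, within_var), start=1):
    print(f"treatment {i}: residuals={r.tolist()}, sample variance={v:.3f}")
```

In this made-up example the first treatment shows a visibly larger within-treatment variance than the other two, which is the kind of pattern one would look for in a residual plot.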
What Is a Residual for an RCB Design?
The randomized complete block design is another experimental situation in which graphical displays can make the analyst feel comfortable with an “ideal picture” or perhaps highlight difficulties. Recall that the model for the randomized complete block design is
$$y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}, \quad i = 1, \ldots, k, \quad j = 1, \ldots, b,$$
with the imposed constraints
$$\sum_{i=1}^{k} \alpha_i = 0, \qquad \sum_{j=1}^{b} \beta_j = 0.$$
To determine what indeed constitutes a residual, consider that
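$$\alpha_i = \mu_{i.} - \mu, \qquad \beta_j = \mu_{.j} - \mu,$$
and that $\mu$ is estimated by $\bar{y}_{..}$, $\mu_{i.}$ is estimated by $\bar{y}_{i.}$, and $\mu_{.j}$ is estimated by $\bar{y}_{.j}$. As a result, the predicted or fitted value $\hat{y}_{ij}$ is given by
$$\hat{y}_{ij} = \hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j = \bar{y}_{i.} + \bar{y}_{.j} - \bar{y}_{..},$$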
and thus the residual at the (i, j) observation is given by
$$y_{ij} - \hat{y}_{ij} = y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..}.$$
Note that $\hat{y}_{ij}$, the fitted value, is an estimate of the mean $\mu_{ij}$. This is consistent with the partitioning of variability given in Theorem 13.3, where the error sum of squares is
$$SSE = \sum_{i=1}^{k} \sum_{j=1}^{b} (y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})^2.$$
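The fitted values, residuals, and error sum of squares for a randomized complete block layout can be sketched as follows. This is an illustrative Python/NumPy example only: the function name `rcb_fit` is hypothetical, the table is assumed to have treatments in rows and blocks in columns, and the numbers are made up rather than taken from any example in the text.

```python
import numpy as np


def rcb_fit(y):
    """Fitted values and residuals for a k x b table.

    Rows are treatments (i), columns are blocks (j).
    """
    y = np.asarray(y, dtype=float)
    treat_means = y.mean(axis=1, keepdims=True)  # ybar_i.
    block_means = y.mean(axis=0, keepdims=True)  # ybar_.j
    grand_mean = y.mean()                        # ybar_..
    fitted = treat_means + block_means - grand_mean
    residuals = y - fitted
    return fitted, residuals


# Hypothetical 3 x 4 layout (3 treatments, 4 blocks); not data from the text.
y = [
    [10.0, 12.0, 14.0, 16.0],
    [11.0, 14.0, 15.0, 16.0],
    [15.0, 16.0, 19.0, 22.0],
]

fitted, residuals = rcb_fit(y)

# Error sum of squares: the sum of squared residuals over all cells.
sse = float(np.sum(residuals ** 2))

print(np.round(residuals, 3))
print(f"SSE = {sse:.3f}")
```

Because the treatment, block, and grand means are all subtracted out, the residuals in such a table sum to zero across every row and every column, which is a convenient check on the arithmetic.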
The visual displays in the randomized complete block design involve plotting the residuals separately for each treatment and for each block. The analyst should expect roughly equal variability if the homogeneous variance assumption holds. The reader should recall that in Chapter 12 we discussed plotting residuals for the purpose of detecting model misspecification. In the case of the randomized complete block design, a serious model misspecification may be related to our assumption of additivity (i.e., no interaction). If no interaction is present, a random pattern should appear.
Consider the data of Example 13.6, in which treatments are four machines and blocks are six operators. Figures 13.9 and 13.10 give the residual plots for separate treatments and separate blocks. Figure 13.11 shows a plot of the residuals against the fitted values. Figure 13.9 reveals that the error variance may not be the same for all machines. The same may be true for error variance for each of the six operators. However, two unusually large residuals appear to produce the apparent difficulty. Figure 13.11 is a plot of residuals that shows reasonable evidence of random behavior. However, the two large residuals displayed earlier still stand out.
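The way a single unusual observation can distort the apparent spread for one treatment and one block can be illustrated numerically. The sketch below is an assumption-laden Python/NumPy example: the helper names `rcb_residuals` and `largest_residuals` are hypothetical, and the 4 x 6 table is an artificial additive layout with one deliberately perturbed cell, not the machine and operator data of Example 13.6.

```python
import numpy as np


def rcb_residuals(y):
    """Residuals y_ij - ybar_i. - ybar_.j + ybar_.. for a k x b table."""
    y = np.asarray(y, dtype=float)
    return (y - y.mean(axis=1, keepdims=True)
              - y.mean(axis=0, keepdims=True) + y.mean())


def largest_residuals(res, top=2):
    """Return (treatment, block, residual) for the largest |residuals|.

    Indices are 1-based to match the i, j notation of the text.
    """
    order = np.argsort(np.abs(res), axis=None)[::-1][:top]
    rows, cols = np.unravel_index(order, res.shape)
    return [(int(i) + 1, int(j) + 1, float(res[i, j]))
            for i, j in zip(rows, cols)]


# Artificial 4 x 6 table: exactly additive, then one cell is inflated.
treat_eff = np.array([0.0, 1.0, 2.0, 3.0])
block_eff = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
y = 40.0 + treat_eff[:, None] + block_eff[None, :]
y[1, 2] += 6.0  # hypothetical outlying observation (treatment 2, block 3)

res = rcb_residuals(y)

# Spread of the residuals within each treatment and within each block.
spread_by_treatment = res.std(axis=1, ddof=1)
spread_by_block = res.std(axis=0, ddof=1)
top = largest_residuals(res, top=2)

print("spread by treatment:", np.round(spread_by_treatment, 3))
print("spread by block:    ", np.round(spread_by_block, 3))
print("largest residuals:  ", top)
```

In this artificial table the perturbed cell produces the largest residual and inflates the spread for its own treatment and its own block, even though every other cell follows the additive model exactly. Apparent heterogeneity of variance in plots by treatment or by block may therefore trace back to a small number of observations.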
Figure 13.9: Residual plot for the four machines for the data of Example 13.6.
Figure 13.10: Residual plot for the six operators for the data of Example 13.6.
Figure 13.11: Residuals plotted against fitted values for the data of Example 13.6.