Factorial Experiments in a Regression Setting
15.4 Factorial Experiments in a Regression Setting
Thus far in this chapter, we have mostly confined our discussion of the analysis of data from a $2^k$ factorial to the method of analysis of variance. The only reference to an alternative analysis resides in Exercise 15.9. Indeed, this exercise introduces much of what motivates the present section. There are situations in which model fitting is important and the factors under study can be controlled. For example, a biologist may wish to study the growth of a certain type of algae in water, and so a model that expresses units of algae as a function of the amount of a pollutant and, say, time would be very helpful. Thus, the study involves a factorial experiment in a laboratory setting in which concentration of the pollutant and time are the factors. As we shall discuss later in this section, a more precise model can be fitted if the factors are controlled in a factorial array, with the $2^k$ factorial often being a useful choice. In many biological and chemical processes, the levels of the regressor variables can and should be controlled.
Recall that the regression model employed in Chapter 12 can be written in matrix notation as
$$y = X\beta + \epsilon.$$
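The $X$ matrix is referred to as the model matrix. Suppose, for example, that a $2^3$ factorial experiment is employed with the variables temperature, humidity, and pressure, each run at two levels; pressure (psi), for instance, has a low level of 1000.
The familiar $+1$, $-1$ levels can be generated by centering each variable at the midpoint of its two levels and scaling by half the distance between them. The centers are 175 for temperature, 17.5 for humidity, and 1250 for pressure. For pressure, a low level of 1000 psi and a center of 1250 psi give a half-distance of 250 psi, so that
$$x_3 = \frac{\text{pressure} - 1250}{250};$$
$x_1$ and $x_2$ are obtained in the same way from temperature $- 175$ and humidity $- 17.5$.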
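The centering and scaling step can be sketched in a few lines of Python. This is only an illustration: the helper name `to_design_units` is hypothetical, and the high pressure level of 1500 psi is an assumption implied by a low level of 1000 psi and a center of 1250 psi.

```python
def to_design_units(value, low, high):
    """Map a natural-unit level to coded design units (low -> -1, high -> +1)."""
    center = (low + high) / 2.0       # midpoint of the two levels
    half_range = (high - low) / 2.0   # distance from the midpoint to either level
    return (value - center) / half_range


# Pressure: low = 1000 psi, ASSUMED high = 1500 psi (center 1250 psi).
low_coded = to_design_units(1000, 1000, 1500)    # -1.0
high_coded = to_design_units(1500, 1000, 1500)   # 1.0
print(low_coded, high_coded)
```

The same helper applies to any two-level quantitative factor once its low and high settings are known.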
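As a result, the $X$ matrix becomes
$$X = \begin{bmatrix} 1 & -1 & -1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & 1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & 1 \\ 1 & -1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix} \quad \begin{matrix} (1) \\ a \\ b \\ ab \\ c \\ ac \\ bc \\ abc \end{matrix}$$
where the columns correspond to the intercept, $x_1$, $x_2$, and $x_3$, and the design identification of each run is shown at the right.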
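It is now seen that the contrasts illustrated and discussed in Section 15.2 are directly related to regression coefficients. Notice that all the columns of the $X$ matrix in our $2^3$ example are orthogonal, so that $X'X = 8I$. As a result, the computation of regression coefficients as described in Section 12.3 becomes
$$b = \begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix} = (X'X)^{-1}X'y = \frac{1}{8}X'y = \frac{1}{8}\begin{bmatrix} (1) + a + b + ab + c + ac + bc + abc \\ a + ab + ac + abc - (1) - b - c - bc \\ b + ab + bc + abc - (1) - a - c - ac \\ c + ac + bc + abc - (1) - a - b - ab \end{bmatrix},$$
where $a$, $ab$, and so on, are response measures.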
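The role of orthogonality can be checked numerically. The sketch below, which assumes NumPy is available, builds a $2^3$ model matrix in coded units, confirms that $X'X = 8I$, and shows that the least squares coefficients equal the contrasts divided by 8. The response vector is arbitrary random data used only for illustration; it is not taken from the text.

```python
import itertools
import numpy as np

# Coded levels of x1, x2, x3 for the 8 runs, with x1 varying fastest:
# (1), a, b, ab, c, ac, bc, abc.
runs = np.array(list(itertools.product([-1, 1], repeat=3)))[:, ::-1]
X = np.column_stack([np.ones(8), runs])   # prepend the intercept column

XtX = X.T @ X                             # 8 * identity when columns are orthogonal

rng = np.random.default_rng(0)
y = rng.normal(size=8)                    # arbitrary responses, illustration only

b_contrast = X.T @ y / 8                  # each coefficient is a contrast over 8
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(XtX, 8 * np.eye(4)))    # True
print(np.allclose(b_contrast, b_lstsq))   # True
```

Because $X'X$ is diagonal, no matrix inversion is really needed: each coefficient depends only on its own column of $X$.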
One can now see that the notion of calculated main effects, which has been emphasized throughout this chapter with $2^k$ factorials, is related to coefficients in a fitted regression model when factors are quantitative. In fact, for a $2^k$ with, say, $n$ experimental runs per design point, the relationships between effects and regression coefficients are as follows:
$$\text{Effect} = \frac{\text{contrast}}{2^{k-1}(n)}, \qquad \text{Regression coefficient} = \frac{\text{contrast}}{2^{k}(n)} = \frac{\text{effect}}{2}.$$
This relationship should make sense to the reader, since a regression coefficient $b_j$ is an average rate of change in response per unit change in $x_j$. Of course, as one goes from $-1$ to $+1$ in $x_j$ (low to high), the design variable changes by 2 units.
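The two relationships can be expressed as a small Python function. This is a sketch only: the function name `effect_and_coefficient` is hypothetical, and the contrast value 25 with $k = 2$ and $n = 1$ is chosen purely for illustration.

```python
def effect_and_coefficient(contrast, k, n):
    """Return (effect, regression coefficient) for a 2^k design with n runs per point."""
    effect = contrast / (2 ** (k - 1) * n)      # contrast / (2^(k-1) n)
    coefficient = contrast / (2 ** k * n)       # contrast / (2^k n) = effect / 2
    return effect, coefficient


effect, coefficient = effect_and_coefficient(25.0, k=2, n=1)
print(effect, coefficient)   # 12.5 6.25
```

The coefficient is always half the effect because the effect measures the change over the full low-to-high span of 2 design units.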
Example 15.2: Consider an experiment where an engineer desires to fit a linear regression of yield $y$ against holding time $x_1$ and flexing time $x_2$ in a certain chemical system. All other factors are held fixed. The data in the natural units are given in Table 15.8. Estimate the multiple linear regression model.
Solution: The fitted regression model is
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2.$$
Table 15.8: Data for Example 15.2
Holding Time (hr)    Flexing Time (hr)    Yield (%)
0.5                  0.10                 28
0.8                  0.10                 39
0.5                  0.20                 32
0.8                  0.20                 46
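The design units are
$$x_1 = \frac{\text{holding time} - 0.65}{0.15}, \qquad x_2 = \frac{\text{flexing time} - 0.15}{0.05},$$
and the $X$ matrix is
$$X = \begin{bmatrix} 1 & -1 & -1 \\ 1 & 1 & -1 \\ 1 & -1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \quad \begin{matrix} (1) \\ a \\ b \\ ab \end{matrix}$$
(columns: intercept, $x_1$, $x_2$),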
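with the regression coefficients
$$\begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix} = (X'X)^{-1}X'y = \begin{bmatrix} \dfrac{(1) + a + b + ab}{4} \\[2mm] \dfrac{a + ab - (1) - b}{4} \\[2mm] \dfrac{b + ab - (1) - a}{4} \end{bmatrix} = \begin{bmatrix} 36.25 \\ 6.25 \\ 2.75 \end{bmatrix}.$$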
Thus, the least squares regression equation is
$$\hat{y} = 36.25 + 6.25x_1 + 2.75x_2.$$
This example provides an illustration of the use of the two-level factorial experiment in a regression setting. The four experimental runs in the $2^2$ design were used to calculate a regression equation, with the obvious interpretation of the regression coefficients. The value $b_1 = 6.25$ represents the estimated increase in response (percent yield) per design unit change (0.15 hour) in holding time. The value $b_2 = 2.75$ represents a similar rate of change for flexing time (per 0.05 hour).
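The calculation in Example 15.2 can be reproduced with NumPy as a check. In this sketch the natural-unit levels (0.5 and 0.8 hr, 0.10 and 0.20 hr) are inferred from the stated centers and scale factors, and the yields 28, 39, 32, 46 for runs (1), a, b, ab are the values implied by the coefficients reported in the text; both should be treated as assumptions.

```python
import numpy as np

# ASSUMED data, inferred from the centering constants and reported coefficients.
holding = np.array([0.5, 0.8, 0.5, 0.8])       # holding time (hr)
flexing = np.array([0.10, 0.10, 0.20, 0.20])   # flexing time (hr)
y = np.array([28.0, 39.0, 32.0, 46.0])         # yield (%), runs (1), a, b, ab

# Convert to design units as in the example.
x1 = (holding - 0.65) / 0.15
x2 = (flexing - 0.15) / 0.05

X = np.column_stack([np.ones(4), x1, x2])      # model matrix
b = np.linalg.solve(X.T @ X, X.T @ y)          # normal equations (X'X) b = X'y

print(np.round(b, 2))                          # approximately 36.25, 6.25, 2.75
```

Solving the normal equations directly is acceptable here because $X'X$ is a multiple of the identity; for less well-conditioned designs a routine such as `np.linalg.lstsq` is usually preferable.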
Interaction in the Regression Model
The interaction contrasts discussed in Section 15.2 have definite interpretations in the regression context. In fact, interactions are accounted for in regression models by product terms. For example, in Example 15.2, the model with interaction is
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2,$$
with $b_0$, $b_1$, $b_2$ as before and
$$b_{12} = \frac{ab + (1) - a - b}{4} = 0.75.$$
Thus, the regression equation expressing two linear main effects and interaction is
$$\hat{y} = 36.25 + 6.25x_1 + 2.75x_2 + 0.75x_1 x_2.$$
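A short NumPy sketch shows how the product term enters the model matrix and that, because the added column is orthogonal to the others, the main-effect coefficients do not change. The yields 28, 39, 32, 46 for runs (1), a, b, ab are assumed values implied by the coefficients reported in the text.

```python
import numpy as np

x1 = np.array([-1.0, 1.0, -1.0, 1.0])      # coded holding time
x2 = np.array([-1.0, -1.0, 1.0, 1.0])      # coded flexing time
y = np.array([28.0, 39.0, 32.0, 46.0])     # ASSUMED yields for (1), a, b, ab

X_main = np.column_stack([np.ones(4), x1, x2])   # main effects only
X_full = np.column_stack([X_main, x1 * x2])      # add the interaction column

b_main, *_ = np.linalg.lstsq(X_main, y, rcond=None)
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

print(np.round(b_main, 2))   # main-effects fit
print(np.round(b_full, 2))   # same first three values, plus b12
```

With four runs and four coefficients the interaction model is saturated, so it reproduces the four observed yields exactly and leaves no degrees of freedom for error.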
The regression context provides a framework in which the reader should better understand the advantage of orthogonality that is enjoyed by the $2^k$ factorial. In Section 15.2, the merits of orthogonality were discussed from the point of view of analysis of variance of the data in a $2^k$ factorial experiment. It was pointed out that orthogonality among effects leads to independence among the sums of squares. The presence of regression variables certainly does not rule out the use of analysis of variance; in fact, f-tests are conducted just as they were described in Section 15.2. A distinction must be made, however: in the case of ANOVA, the hypotheses evolve from population means, while in the regression case, the hypotheses involve regression coefficients.
For instance, consider the experimental design in Exercise 15.2 on page 609. Each factor is continuous. Suppose that the levels are
A ($x_1$): 20% 40%
B ($x_2$): 5 lb/sec 10 lb/sec
C ($x_3$): 5 5.5
and we have, for design levels,
$$x_1 = \frac{\%\ \text{solids} - 30}{10}, \qquad x_2 = \frac{\text{flow rate} - 7.5}{2.5}, \qquad x_3 = \frac{\text{pH} - 5.25}{0.25}.$$
Suppose that it is of interest to fit a multiple regression model in which all linear coefficients and available interactions are to be considered. In addition, the engineer wants to obtain some insight into what levels of the factors will maximize cleansing (i.e., maximize the response). This problem will be the subject of Case Study 15.2.
Case Study 15.2: Coal Cleansing Experiment¹: Figure 15.9 represents annotated computer printout for the regression analysis for the fitted model
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_{12} x_1 x_2 + b_{13} x_1 x_3 + b_{23} x_2 x_3 + b_{123} x_1 x_2 x_3,$$
where $x_1$, $x_2$, and $x_3$ are percent solids, flow rate, and pH of the system, respectively. The computer system used is SAS PROC REG. Note the parameter estimates, standard errors, and P-values in the printout. The parameter estimates represent coefficients in the model. All model coefficients are significant except the $x_2 x_3$ term (BC interaction). Note also that residuals, confidence intervals, and prediction intervals appear as discussed in the regression material in Chapters 11 and 12.
The reader can use the values of the model coefficients and predicted values from the printout to ascertain what combination of the factors results in maximum cleansing efficiency. Factor A (percent solids circulated) has a large positive coefficient, suggesting a high value for percent solids. In addition, a low value for factor C (pH of the tank) is suggested. Though the B main effect (flow rate of the polymer) coefficient is positive, the rather large positive coefficient of
1 See Exercise 15.2.
Dependent Variable: Y

Analysis of Variance
Source             DF    Sum of Squares
Corrected Total    15    492.43704

Root MSE           0.52465
Dependent Mean    12.75188
Coeff Var          4.11429

Parameter Estimates
Variable    DF    Parameter Estimate
C            1    -1.41563
AB           1    -0.59938
AC           1    -0.52813

Obs    Dependent Variable
1      4.6500
Figure 15.9: SAS printout for data of Case Study 15.2.
$x_1 x_2 x_3$ (ABC) suggests that flow rate should be at the low level to enhance efficiency. Indeed, the regression model generated in the SAS printout suggests that the combination of factors that may produce optimum results, or perhaps suggest direction for further experimentation, is given by
A: high level
B: low level
C: low level
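The reasoning behind this recommendation can be sketched by evaluating a fitted model at all eight corners of the design and picking the largest prediction. In the Python sketch below, the coefficients for C, AB, and AC and the intercept (equal to the dependent mean in an orthogonal design) are taken from the printout; the values for A, B, BC, and ABC are HYPOTHETICAL placeholders chosen only to match the signs and relative sizes described in the text, so the predicted values themselves carry no meaning.

```python
import itertools

coef = {
    "1": 12.75188,    # intercept = dependent mean (orthogonal coded design)
    "A": 4.0,         # HYPOTHETICAL: "large positive" per the text
    "B": 0.8,         # HYPOTHETICAL: positive per the text
    "C": -1.41563,    # from the printout
    "AB": -0.59938,   # from the printout
    "AC": -0.52813,   # from the printout
    "BC": 0.0,        # HYPOTHETICAL: term reported as not significant
    "ABC": 2.0,       # HYPOTHETICAL: "rather large positive" per the text
}


def predict(a, b, c):
    """Predicted response at coded levels a, b, c (each -1 or +1)."""
    return (coef["1"] + coef["A"] * a + coef["B"] * b + coef["C"] * c
            + coef["AB"] * a * b + coef["AC"] * a * c + coef["BC"] * b * c
            + coef["ABC"] * a * b * c)


corners = list(itertools.product([-1, 1], repeat=3))
best = max(corners, key=lambda levels: predict(*levels))
print(best)   # (1, -1, -1): A high, B low, C low under these placeholder values
```

With A high and C low, the product $x_1 x_3$ equals $-1$, so a positive ABC coefficient rewards a low level of B, which is the argument made in the text.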
15.5 The Orthogonal Design