Factorial Experiments in a Regression Setting

15.4 Factorial Experiments in a Regression Setting

  Thus far in this chapter, we have mostly confined our discussion of the analysis of data from a $2^k$ factorial to the method of analysis of variance. The only reference to an alternative analysis resides in Exercise 15.9. Indeed, this exercise introduces much of what motivates the present section. There are situations in which model fitting is important and the factors under study can be controlled. For example, a biologist may wish to study the growth of a certain type of algae in water, and so a model that expresses units of algae as a function of the amount of a pollutant and, say, time would be very helpful. Thus, the study involves a factorial experiment in a laboratory setting in which concentration of the pollutant and time are the factors. As we shall discuss later in this section, a more precise model can be fitted if the factors are controlled in a factorial array, with the $2^k$ factorial often being a useful choice. In many biological and chemical processes, the levels of the regressor variables can and should be controlled.

  Recall that the regression model employed in Chapter 12 can be written in matrix notation as

  $$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}.$$

  The X matrix is referred to as the model matrix. Suppose, for example, that a $2^3$ factorial experiment is employed with the variables

  …

  Pressure (psi): 1000, …

  The familiar +1, −1 levels can be generated through the following centering and scaling to design units:

  $$x_1 = \ldots, \qquad x_2 = \frac{\ldots - 17.5}{\ldots}, \qquad x_3 = \frac{\text{pressure} - 1250}{\ldots}.$$
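  The centering and scaling step can be sketched in a few lines of Python. This is only an illustration: the helper name `to_design_units` is hypothetical, and the high pressure level of 1500 psi is an assumption inferred from the low level of 1000 psi and the centering constant 1250.

```python
def to_design_units(value, low, high):
    """Center and scale a natural-unit setting so that low -> -1 and high -> +1."""
    center = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return (value - center) / half_range


# Low level 1000 psi comes from the text; 1500 psi is an assumed high level
# chosen so that the center is 1250, matching the centering constant.
for psi in (1000, 1250, 1500):
    print(psi, to_design_units(psi, low=1000, high=1500))
```

  Any setting between the two levels maps into the interval from −1 to +1, with the midpoint landing at 0.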


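  As a result, the X matrix becomes (each row labeled with its design identification)

  $$
  \mathbf{X} \;=\;
  \begin{array}{rrrr|l}
  1 & x_1 & x_2 & x_3 & \text{Design Identification} \\ \hline
  1 & -1 & -1 & -1 & (1) \\
  1 & 1 & -1 & -1 & a \\
  1 & -1 & 1 & -1 & b \\
  1 & -1 & -1 & 1 & c \\
  1 & 1 & 1 & -1 & ab \\
  1 & 1 & -1 & 1 & ac \\
  1 & -1 & 1 & 1 & bc \\
  1 & 1 & 1 & 1 & abc
  \end{array}
  $$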

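  It is now seen that the contrasts illustrated and discussed in Section 15.2 are directly related to regression coefficients. Notice that all the columns of the X matrix in our $2^3$ example are orthogonal. As a result, the computation of regression coefficients as described in Section 12.3 becomes

  $$
  \mathbf{b} = \begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix}
  = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}
  = \frac{1}{8}\,\mathbf{I}\,\mathbf{X}'\mathbf{y}
  = \frac{1}{8}
  \begin{bmatrix}
  a + ab + ac + abc + (1) + b + c + bc \\
  a + ab + ac + abc - (1) - b - c - bc \\
  b + ab + bc + abc - (1) - a - c - ac \\
  c + ac + bc + abc - (1) - a - b - ab
  \end{bmatrix},
  $$

  where a, ab, and so on, are response measures.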
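  The effect of orthogonality on the least squares computation can be checked numerically. The sketch below builds a $2^3$ model matrix with NumPy and compares the shortcut X'y/8 with a general least squares solve. The response values are hypothetical and the row order differs from the standard (1), a, b, ... order, which does not affect the result.

```python
import itertools

import numpy as np

# Model matrix for a 2^3 design: intercept column plus x1, x2, x3 at -1/+1.
runs = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)
X = np.column_stack([np.ones(len(runs)), runs])

# Orthogonal columns of +/-1 entries give X'X = 8 * I.
gram = X.T @ X

# Hypothetical responses, one per run (not data from the text).
y = np.array([10.0, 12.0, 11.0, 15.0, 9.0, 14.0, 13.0, 18.0])

# Because (X'X)^{-1} = I / 8, each coefficient is a contrast divided by 8.
b_orthogonal = X.T @ y / len(y)

# General least squares solution for comparison.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(gram)
print(b_orthogonal, b_lstsq)
```

  No matrix inversion is needed in the orthogonal case, which is why each coefficient can be written directly as a contrast of the responses.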

  One can now see that the notion of calculated main effects, which has been emphasized throughout this chapter with $2^k$ factorials, is related to coefficients in a fitted regression model when factors are quantitative. In fact, for a $2^k$ with, say, n experimental runs per design point, the relationships between effects and regression coefficients are as follows:

  $$\text{Effect} = \frac{\text{contrast}}{2^{k-1}(n)}, \qquad \text{Regression coefficient} = \frac{\text{contrast}}{2^{k}(n)} = \frac{\text{effect}}{2}.$$

  This relationship should make sense to the reader, since a regression coefficient $b_j$ is an average rate of change in response per unit change in $x_j$. Of course, as one goes from −1 to +1 in $x_j$ (low to high), the design variable changes by 2 units.
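  The factor of 2 between an effect and its regression coefficient can be illustrated with a small simulation. The sketch below assumes a hypothetical replicated $2^2$ design (k = 2, n = 3) with simulated responses. None of the numbers come from the text.

```python
import numpy as np

k, n = 2, 3  # hypothetical: 2^2 design with 3 runs per design point

# Design columns in the order (1), a, b, ab, each point repeated n times.
x1 = np.repeat([-1.0, 1.0, -1.0, 1.0], n)
x2 = np.repeat([-1.0, -1.0, 1.0, 1.0], n)

# Simulated responses (illustrative only).
rng = np.random.default_rng(0)
y = 20 + 4 * x1 + 1.5 * x2 + rng.normal(scale=0.5, size=x1.size)

# Contrast for factor A: responses at the high level minus those at the low level.
contrast_a = np.sum(x1 * y)
effect_a = contrast_a / (2 ** (k - 1) * n)
coef_a = contrast_a / (2 ** k * n)

# The same coefficient from an ordinary least squares fit.
X = np.column_stack([np.ones_like(y), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

print(effect_a, coef_a, b[1])
```

  The effect measures the change in mean response across the full low-to-high range (2 design units), while the coefficient measures the change per single design unit.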

  Example 15.2: Consider an experiment where an engineer desires to fit a linear regression of yield y against holding time $x_1$ and flexing time $x_2$ in a certain chemical system. All other factors are held fixed. The data in the natural units are given in Table 15.8. Estimate the multiple linear regression model.

  Solution: The fitted regression model is

  $$\hat{y} = b_0 + b_1 x_1 + b_2 x_2.$$

  Chapter 15: $2^k$ Factorial Experiments and Fractions

  Table 15.8: Data for Example 15.2

  Holding Time (hr)    Flexing Time (hr)    Yield (%)

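  The design units are

  $$x_1 = \frac{\text{holding time} - 0.65}{0.15}, \qquad x_2 = \frac{\text{flexing time} - \ldots}{\ldots},$$

  and the X matrix is

  $$
  \mathbf{X} \;=\;
  \begin{array}{rrr|l}
  1 & x_1 & x_2 & \\ \hline
  1 & -1 & -1 & (1) \\
  1 & 1 & -1 & a \\
  1 & -1 & 1 & b \\
  1 & 1 & 1 & ab
  \end{array}
  $$

  with the regression coefficients

  $$
  \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix}
  = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}
  = \begin{bmatrix}
  \dfrac{(1) + a + b + ab}{4} \\[2mm]
  \dfrac{a + ab - (1) - b}{4} \\[2mm]
  \dfrac{b + ab - (1) - a}{4}
  \end{bmatrix}
  = \begin{bmatrix} 36.25 \\ 6.25 \\ 2.75 \end{bmatrix}.
  $$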

  Thus, the least squares regression equation is

  $$\hat{y} = 36.25 + 6.25x_1 + 2.75x_2.$$
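  The arithmetic of Example 15.2 can be checked numerically. The sketch below assumes the yields 28, 39, 32, and 46 for the runs (1), a, b, and ab. Only 39 and 32 appear explicitly in this section's interaction arithmetic; the other two are inferred here so that the quoted coefficients 36.25 and 6.25 are reproduced, and they should be treated as an assumption.

```python
import numpy as np

# Assumed yields for runs (1), a, b, ab (28 and 46 are inferred, see above).
y = np.array([28.0, 39.0, 32.0, 46.0])

# Model matrix in design units: intercept, x1 (holding time), x2 (flexing time).
X = np.array(
    [
        [1.0, -1.0, -1.0],  # (1)
        [1.0, 1.0, -1.0],   # a
        [1.0, -1.0, 1.0],   # b
        [1.0, 1.0, 1.0],    # ab
    ]
)

# Orthogonal columns: (X'X)^{-1} = I / 4, so b = X'y / 4.
b = X.T @ y / 4
print(b)
```

  The three entries of `b` correspond to $b_0$, $b_1$, and $b_2$ in the fitted equation.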

  This example provides an illustration of the use of the two-level factorial experiment in a regression setting. The four experimental runs in the $2^2$ design were used to calculate a regression equation, with the obvious interpretation of the regression coefficients. The value $b_1 = 6.25$ represents the estimated increase in response (percent yield) per design unit change (0.15 hour) in holding time. The value $b_2 = 2.75$ represents a similar rate of change for flexing time.

  Interaction in the Regression Model

  The interaction contrasts discussed in Section 15.2 have definite interpretations in the regression context. In fact, interactions are accounted for in regression models by product terms. For example, in Example 15.2, the model with interaction is

  $$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2$$

  with $b_0$, $b_1$, $b_2$ as before and

  $$b_{12} = \frac{ab + (1) - a - b}{4} = \frac{46 + 28 - 39 - 32}{4} = 0.75.$$


  Thus, the regression equation expressing two linear main effects and interaction is

  $$\hat{y} = 36.25 + 6.25x_1 + 2.75x_2 + 0.75x_1 x_2.$$
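  Adding the product column $x_1 x_2$ to the model matrix gives the interaction coefficient by the same contrast arithmetic. The sketch below again assumes the yields 28, 39, 32, and 46 for the runs (1), a, b, and ab, of which 28 and 46 are inferred from the quoted coefficients rather than read from the data table.

```python
import numpy as np

# Assumed yields for runs (1), a, b, ab (28 and 46 are inferred values).
y = np.array([28.0, 39.0, 32.0, 46.0])
x1 = np.array([-1.0, 1.0, -1.0, 1.0])
x2 = np.array([-1.0, -1.0, 1.0, 1.0])

# Interaction enters the model through the product column x1 * x2.
X = np.column_stack([np.ones(4), x1, x2, x1 * x2])

# All four columns are mutually orthogonal, so b = X'y / 4.
b = X.T @ y / 4

# With four coefficients and four runs the model is saturated:
# the fitted values reproduce the observations exactly.
fitted = X @ b
print(b)
print(fitted)
```

  Because the main-effect columns are orthogonal to the product column, $b_0$, $b_1$, and $b_2$ do not change when the interaction term is added.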

  The regression context provides a framework in which the reader should better understand the advantage of orthogonality that is enjoyed by the $2^k$ factorial. In Section 15.2, the merits of orthogonality were discussed from the point of view of analysis of variance of the data in a $2^k$ factorial experiment. It was pointed out that orthogonality among effects leads to independence among the sums of squares. The presence of regression variables certainly does not rule out the use of analysis of variance. In fact, f-tests are conducted just as they were described in Section 15.2. A distinction must be made, however. In the case of ANOVA, the hypotheses evolve from population means, while in the regression case, the hypotheses involve regression coefficients.
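  One consequence of orthogonality is that the single-degree-of-freedom sums of squares, each equal to (contrast)² divided by $2^k n$, add up to the corrected total sum of squares when the model is saturated. The sketch below illustrates this for an unreplicated $2^2$ design, assuming the yields 28, 39, 32, and 46 for the runs (1), a, b, and ab (the first and last are inferred values, not taken from a data table).

```python
import numpy as np

# Assumed yields for runs (1), a, b, ab.
y = np.array([28.0, 39.0, 32.0, 46.0])
x1 = np.array([-1.0, 1.0, -1.0, 1.0])
x2 = np.array([-1.0, -1.0, 1.0, 1.0])

# Contrasts for the two main effects and the interaction.
contrasts = {"A": x1 @ y, "B": x2 @ y, "AB": (x1 * x2) @ y}

# Each sum of squares is contrast^2 / (2^k * n), here with k = 2 and n = 1.
ss = {name: c ** 2 / 4 for name, c in contrasts.items()}

# Corrected total sum of squares.
ss_total = np.sum((y - y.mean()) ** 2)

print(ss)
print(sum(ss.values()), ss_total)
```

  The decomposition is additive only because the design columns are orthogonal. With correlated regressors, the individual sums of squares would depend on the order in which terms enter the model.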

  For instance, consider the experimental design in Exercise 15.2 on page 609. Each factor is continuous and is run at two levels, so that we have, for design levels,

  $$x_1 = \frac{\text{solids} - 30}{\ldots}, \qquad x_2 = \frac{\text{flow rate} - 7.5}{\ldots}, \qquad x_3 = \frac{\text{pH} - 5.25}{\ldots}.$$

  Suppose that it is of interest to fit a multiple regression model in which all linear coefficients and available interactions are to be considered. In addition, the engineer wants to obtain some insight into what levels of the factors will maximize cleansing (i.e., maximize the response). This problem will be the subject of Case Study 15.2.

  Case Study 15.2: Coal Cleansing Experiment¹: Figure 15.9 represents annotated computer printout for the regression analysis for the fitted model

  $$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_{12} x_1 x_2 + b_{13} x_1 x_3 + b_{23} x_2 x_3 + b_{123} x_1 x_2 x_3,$$

  where $x_1$, $x_2$, and $x_3$ are percent solids, flow rate, and pH of the system, respectively. The computer system used is SAS PROC REG.

  Note the parameter estimates, standard errors, and P-values in the printout. The parameter estimates represent coefficients in the model. All model coefficients are significant except the $x_2 x_3$ term (BC interaction). Note also that residuals, confidence intervals, and prediction intervals appear as discussed in the regression material in Chapters 11 and 12.

  Figure 15.9: SAS printout for data of Case Study 15.2. The printout, for dependent variable Y, reports the analysis of variance (Corrected Total: DF 15, Sum of Squares 492.43704), summary statistics (Dependent Mean 12.75188), the parameter estimates with standard errors, t values, and P-values (among them C: −1.41563, AB: −0.59938, AC: −0.52813), and, for each observation, the predicted value, the standard error of the mean prediction, 95% confidence limits on the mean, 95% prediction limits, and the residual.

  The reader can use the values of the model coefficients and predicted values from the printout to ascertain what combination of the factors results in maximum cleansing efficiency. Factor A (percent solids circulated) has a large positive coefficient, suggesting a high value for percent solids. In addition, a low value for factor C (pH of the tank) is suggested. Though the B main effect (flow rate of the polymer) coefficient is positive, the rather large positive coefficient of $x_1 x_2 x_3$ (ABC) suggests that flow rate should be at the low level to enhance efficiency. Indeed, the regression model generated in the SAS printout suggests that the combination of factors that may produce optimum results, or perhaps suggest direction for further experimentation, is given by

  A: high level

  B: low level

  C: low level

  ¹ See Exercise 15.2.
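  The search for the best factor combination can be sketched by evaluating a fitted model at all eight corners of the design region. In the sketch below, the intercept is taken as the dependent mean 12.75188 (in a balanced ±1 design the intercept equals the mean response), and the C, AB, and AC coefficients are the values quoted from the printout. The A, B, BC, and ABC coefficients are hypothetical placeholders chosen only to match the qualitative description in the text (A large and positive, B positive, BC negligible, ABC rather large and positive).

```python
import itertools

coef = {
    "1": 12.75188,  # intercept, equal to the dependent mean
    "A": 4.0,       # hypothetical placeholder
    "B": 0.8,       # hypothetical placeholder
    "C": -1.41563,  # from the printout
    "AB": -0.59938, # from the printout
    "AC": -0.52813, # from the printout
    "BC": 0.0,      # hypothetical placeholder (term described as not significant)
    "ABC": 2.0,     # hypothetical placeholder
}


def predict(a, b, c):
    """Predicted response at design-unit settings a, b, c (each -1 or +1)."""
    return (
        coef["1"]
        + coef["A"] * a + coef["B"] * b + coef["C"] * c
        + coef["AB"] * a * b + coef["AC"] * a * c + coef["BC"] * b * c
        + coef["ABC"] * a * b * c
    )


# Evaluate the model at every corner of the 2^3 design and keep the largest.
corners = list(itertools.product([-1, 1], repeat=3))
best = max(corners, key=lambda corner: predict(*corner))
print(best, predict(*best))
```

  With these placeholder values the largest predicted response occurs at A high, B low, C low, which agrees with the combination stated in the text. The actual predicted values depend on the full set of estimates in the printout.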

  15.5 The Orthogonal Design