12.7 Special Case of Orthogonality (Optional)
Prior to our original development of the general linear regression problem, the assumption was made that the independent variables are measured without error and are often controlled by the experimenter. Quite often they occur as a result of an elaborately designed experiment. In fact, we can increase the effectiveness of the resulting prediction equation with the use of a suitable experimental plan.
Suppose that we once again consider the X matrix as defined in Section 12.3. We can rewrite it as
X = [1, x_1, x_2, ..., x_k],

where 1 represents a column of ones and x_j is a column vector representing the levels of x_j. If

x'_p x_q = 0,   p ≠ q,

the variables x_p and x_q are said to be orthogonal to each other. There are certain obvious advantages to having a completely orthogonal situation, in which x'_p x_q = 0 for all pairs p ≠ q and, in addition,

Σ_{i=1}^n x_ji = 0,   j = 1, 2, ..., k.
The resulting X'X is a diagonal matrix, and the normal equations in Section 12.3 reduce to

nb_0 = Σ_{i=1}^n y_i,   b_1 Σ_{i=1}^n x_1i^2 = Σ_{i=1}^n x_1i y_i,   ...,   b_k Σ_{i=1}^n x_ki^2 = Σ_{i=1}^n x_ki y_i.
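Because X'X is diagonal, each coefficient can be computed on its own, without solving a coupled system. A minimal numerical sketch of both facts, using Python with numpy and a hypothetical response vector (the data are for illustration only, not from the text):

```python
import numpy as np
from itertools import product

# Columns of a full 2^3 factorial design in -1/+1 coding.
design = np.array(list(product([-1, 1], repeat=3)), dtype=float)
X = np.column_stack([np.ones(8), design])  # [1, x1, x2, x3]

# X'X is diagonal, so the normal equations decouple.
XtX = X.T @ X
assert np.allclose(XtX, np.diag(np.diag(XtX)))

# Hypothetical response vector, for illustration only.
y = np.array([3.1, 4.0, 5.2, 6.1, 3.9, 4.8, 6.0, 7.1])

# Decoupled solution: b_j = sum_i(x_ji * y_i) / sum_i(x_ji^2) ...
b_direct = (X.T @ y) / np.diag(XtX)

# ... agrees with the usual least-squares solution.
b_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(b_direct, b_ls)
```

Note that b_0 reduces to the sample mean of y, exactly as the first normal equation indicates.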
An important advantage is that one is easily able to partition SSR into single-degree-of-freedom components, each of which corresponds to the amount of variation in Y accounted for by a given controlled variable. In the orthogonal situation, we can write

SSR = Σ_{i=1}^n (ŷ_i − ȳ)^2 = Σ_{i=1}^n (b_0 + b_1 x_1i + ··· + b_k x_ki − b_0)^2
    = b_1^2 Σ x_1i^2 + b_2^2 Σ x_2i^2 + ··· + b_k^2 Σ x_ki^2
    = R(β_1) + R(β_2) + ··· + R(β_k),

where R(β_j) = b_j^2 Σ x_ji^2. (Note that ȳ = b_0 here, since Σ_i x_ji = 0 for each j.)
The quantity R(β_i) is the amount of the regression sum of squares associated with a model involving a single independent variable x_i. To test simultaneously for the significance of a set of m variables in an orthogonal situation, the regression sum of squares becomes

R(β_1, β_2, ..., β_m | β_{m+1}, β_{m+2}, ..., β_k) = R(β_1) + R(β_2) + ··· + R(β_m),

and thus we have the further simplification

R(β_1 | β_2, β_3, ..., β_k) = R(β_1)

when evaluating a single independent variable. Therefore, the contribution of a given variable or set of variables is found by ignoring the other variables in the model. Independent evaluations of the worth of the individual variables are accomplished using analysis-of-variance techniques, as given in Table 12.4. The total variation in the response is partitioned into single-degree-of-freedom components plus the error term with n − k − 1 degrees of freedom. Each computed f-value is used to test one of the hypotheses

H_0: β_i = 0,
H_1: β_i ≠ 0,

by comparing with the critical point f_α(1, n − k − 1) or merely interpreting the P-value computed from the f-distribution.
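The simplification R(β_1 | β_2, ..., β_k) = R(β_1) is easy to check numerically. A short sketch in Python with numpy (the response values are hypothetical, chosen only to illustrate the identity for an orthogonal 2^3 design):

```python
import numpy as np
from itertools import product

# Orthogonal 2^3 design and a hypothetical response (illustration only).
design = np.array(list(product([-1, 1], repeat=3)), dtype=float)
X = np.column_stack([np.ones(8), design])  # [1, x1, x2, x3]
y = np.array([3.1, 4.0, 5.2, 6.1, 3.9, 4.8, 6.0, 7.1])

def ssr(Xm, y):
    """Regression sum of squares for a model matrix Xm that includes an intercept."""
    b, *_ = np.linalg.lstsq(Xm, y, rcond=None)
    return np.sum((Xm @ b - y.mean()) ** 2)

# R(beta1 | beta2, beta3): drop in SSR when x1 is removed from the full model.
r1_given_rest = ssr(X, y) - ssr(X[:, [0, 2, 3]], y)

# R(beta1): SSR of the model containing x1 alone.
r1_alone = ssr(X[:, [0, 1]], y)

# Under orthogonality the two coincide; both equal b1^2 * sum(x1^2).
assert np.isclose(r1_given_rest, r1_alone)
```

With correlated (non-orthogonal) columns the two quantities would generally differ, which is why the order of entry matters in the general case.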
Table 12.4: Analysis of Variance for Orthogonal Variables

Source of    Sum of                    Degrees of   Mean                 Computed
Variation    Squares                   Freedom      Square               f
β_1          R(β_1) = b_1^2 Σ x_1i^2   1            R(β_1)               R(β_1)/s^2
β_2          R(β_2) = b_2^2 Σ x_2i^2   1            R(β_2)               R(β_2)/s^2
 ...          ...                       ...          ...                  ...
β_k          R(β_k) = b_k^2 Σ x_ki^2   1            R(β_k)               R(β_k)/s^2
Error        SSE                       n − k − 1    s^2 = SSE/(n−k−1)
Total        SST = S_yy                n − 1
Example 12.8: Suppose that a scientist takes experimental data on the radius of a propellant grain Y as a function of powder temperature x 1 , extrusion rate x 2 , and die temperature x 3 . Fit a linear regression model for predicting grain radius, and determine the effectiveness of each variable in the model. The data are given in Table 12.5.
Table 12.5: Data for Example 12.8

Grain Radius, y   Powder Temperature   Extrusion Rate   Die Temperature
 82               150 (−1)             12 (−1)          220 (−1)
 93               190 (+1)             12 (−1)          220 (−1)
114               150 (−1)             24 (+1)          220 (−1)
124               150 (−1)             12 (−1)          250 (+1)
111               190 (+1)             24 (+1)          220 (−1)
129               190 (+1)             12 (−1)          250 (+1)
157               150 (−1)             24 (+1)          250 (+1)
164               190 (+1)             24 (+1)          250 (+1)

Solution: Note that each variable is controlled at two levels, and the experiment is composed of the eight possible combinations. The data on the independent variables are coded for convenience by means of the following formulas:
x_1 = (powder temperature − 170)/20,
x_2 = (extrusion rate − 18)/6,
x_3 = (die temperature − 235)/15.

The resulting levels of x_1, x_2, and x_3 take on the values −1 and +1 as indicated in the table of data. This particular experimental design affords the orthogonality that we want to illustrate here. (A more thorough treatment of this type of experimental layout appears in Chapter 15.) The X matrix is

      [ 1  -1  -1  -1 ]
      [ 1   1  -1  -1 ]
      [ 1  -1   1  -1 ]
X  =  [ 1  -1  -1   1 ]
      [ 1   1   1  -1 ]
      [ 1   1  -1   1 ]
      [ 1  -1   1   1 ]
      [ 1   1   1   1 ]

and the orthogonality conditions are readily verified.
We can now compute the coefficients:

b_0 = (1/8) Σ y_i = 974/8 = 121.75,
b_1 = (Σ x_1i y_i)/(Σ x_1i^2) = 20/8 = 2.5,
b_2 = 118/8 = 14.75,
b_3 = 174/8 = 21.75,

so in terms of the coded variables, the prediction equation is

ŷ = 121.75 + 2.5 x_1 + 14.75 x_2 + 21.75 x_3.

The analysis of variance in Table 12.6 shows independent contributions to SSR for
each variable. The results, when compared to the f_0.05(1, 4) critical point of 7.71, indicate that x_1 does not contribute significantly at the 0.05 level, whereas variables x_2 and x_3 are significant. In this example, the estimate for σ^2 is 23.1250. As in the single-independent-variable case, it should be pointed out that this estimate does not solely contain experimental error variation unless the postulated model is correct. Otherwise, the estimate is "contaminated" by lack of fit in addition to pure error, and the lack of fit can be separated out only if we obtain multiple experimental observations for the various (x_1, x_2, x_3) combinations.

Table 12.6: Analysis of Variance for Grain Radius Data

Source of    Sum of     Degrees of   Mean       Computed
Variation    Squares    Freedom      Square     f          P-Value
x_1            50.00    1              50.00      2.16     0.2156
x_2          1740.50    1            1740.50     75.26     0.0010
x_3          3784.50    1            3784.50    163.65     0.0002
Error          92.50    4              23.125
Total        5667.50    7
Since x 1 is not significant, it can simply be eliminated from the model without altering the effects of the other variables. Note that x 2 and x 3 both impact the grain radius in a positive fashion, with x 3 being the more important factor based on the smallness of its P-value.
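The arithmetic in this example is straightforward to reproduce. A short check in Python with numpy (assumed available), recomputing the coefficients, the single-degree-of-freedom components, and the f-values from the coded data of Table 12.5:

```python
import numpy as np

# Coded levels (x1, x2, x3) and grain radius y from Table 12.5.
x = np.array([
    [-1, -1, -1],
    [ 1, -1, -1],
    [-1,  1, -1],
    [-1, -1,  1],
    [ 1,  1, -1],
    [ 1, -1,  1],
    [-1,  1,  1],
    [ 1,  1,  1],
], dtype=float)
y = np.array([82, 93, 114, 124, 111, 129, 157, 164], dtype=float)

n, k = y.size, 3
b0 = y.mean()                          # 121.75
b = x.T @ y / (x ** 2).sum(axis=0)     # 2.5, 14.75, 21.75

# Single-degree-of-freedom components R(beta_j) = b_j^2 * sum_i x_ji^2.
r = b ** 2 * (x ** 2).sum(axis=0)      # 50.0, 1740.5, 3784.5

sse = np.sum((y - (b0 + x @ b)) ** 2)  # 92.5
s2 = sse / (n - k - 1)                 # 23.125
f = r / s2                             # approx. 2.16, 75.26, 163.65
```

Note how each R(β_j) is obtained directly from its coefficient, with no refitting of reduced models; this is the orthogonality at work.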
Exercises

12.31 Compute and interpret the coefficient of multiple determination for the variables of Exercise 12.1 on page 450.

12.32 Test whether the regression explained by the model in Exercise 12.1 on page 450 is significant at the 0.01 level of significance.

12.33 Test whether the regression explained by the model in Exercise 12.5 on page 450 is significant at the 0.01 level of significance.

12.34 For the model of Exercise 12.5 on page 450, test the hypothesis

H_0: β_1 = β_2 = 0,
H_1: β_1 and β_2 are not both zero.

12.35 Repeat Exercise 12.17 on page 461 using an F-statistic.

12.36 A small experiment was conducted to fit a multiple regression equation relating the yield y to temperature x_1, reaction time x_2, and concentration of one of the reactants x_3. Two levels of each variable were chosen, and measurements corresponding to the coded independent variables were recorded as follows:

   y    x_1   x_2   x_3
 7.6    −1    −1    −1
 8.4     1    −1    −1
 9.2    −1     1    −1
10.3    −1    −1     1
 9.8     1     1    −1
11.1     1    −1     1
10.2    −1     1     1
12.6     1     1     1

(a) Using the coded variables, estimate the multiple linear regression equation

μ_Y|x_1,x_2,x_3 = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_3.

(b) Partition SSR, the regression sum of squares, into three single-degree-of-freedom components attributable to x_1, x_2, and x_3, respectively. Show an analysis-of-variance table, indicating significance tests on each variable.

12.37 Consider the electric power data of Exercise 12.5 on page 450. Test H_0: β_1 = β_2 = 0, making use of R(β_1, β_2 | β_3, β_4). Give a P-value, and draw conclusions.

12.38 Consider the data for Exercise 12.36. Compute the following:

R(β_1 | β_0, β_2, β_3),   R(β_1 | β_0),
R(β_2 | β_0, β_1, β_3),   R(β_2 | β_0),
R(β_3 | β_0, β_1, β_2),   R(β_1, β_2 | β_3).

Comment.

12.39 Consider the data of Exercise 11.55 on page 437. Fit a regression model using weight and drive ratio as explanatory variables. Compare this model with the SLR (simple linear regression) model using weight alone. Use R^2, R^2_adj, and any t-statistics (or F-statistics) you may need to compare the SLR with the multiple regression model.

12.40 Consider Example 12.4. Figure 12.1 on page 459 displays a SAS printout of an analysis of the model containing variables x_1, x_2, and x_3. Focus on the confidence interval of the mean response μ_Y at the (x_1, x_2, x_3) locations representing the 13 data points. Consider an item in the printout indicated by C.V. This is the coefficient of variation, which is defined by

C.V. = (s/ȳ) · 100,

where s = √(s^2) is the root mean squared error. The coefficient of variation is often used as yet another criterion for comparing competing models. It is a scale-free quantity which expresses the estimate of σ, namely s, as a percent of the average response ȳ. In competition for the "best" among a group of competing models, one strives for the model with a small value of C.V. Do a regression analysis of the data set shown in Example 12.4 but eliminate x_3. Compare the full (x_1, x_2, x_3) model with the restricted (x_1, x_2) model and focus on two criteria: (i) C.V.; (ii) the widths of the confidence intervals on μ_Y. For the second criterion you may want to use the average width. Comment.

12.41 Consider Example 12.3 on page 447. Compare the two competing models.

First order:   y_i = β_0 + β_1 x_1i + β_2 x_2i + ε_i,
Second order:  y_i = β_0 + β_1 x_1i + β_2 x_2i + β_11 x_1i^2 + β_22 x_2i^2 + β_12 x_1i x_2i + ε_i.

Use R^2_adj in your comparison. Test H_0: β_11 = β_22 = β_12 = 0. In addition, use the C.V. discussed in Exercise 12.40.
12.42 In Example 12.8, a case is made for eliminating x_1, powder temperature, from the model, since the P-value based on the F-test is 0.2156 while the P-values for x_2 and x_3 are near zero.
(a) Reduce the model by eliminating x_1, thereby producing a full and a restricted (or reduced) model, and compare them on the basis of R^2_adj.
(b) Compare the full and restricted models using the width of the 95% prediction intervals on a new observation. The better of the two models is the one with the tighter prediction intervals. Use the average width of the prediction intervals.

12.43 Consider the data of Exercise 12.13 on page 452. Can the response, wear, be explained adequately by a single variable (either viscosity or load) in an SLR rather than with the full two-variable regression? Justify your answer thoroughly through tests of hypotheses as well as comparison of the three competing models.

12.44 For the data set given in Exercise 12.16 on page 453, can the response be explained adequately by any two regressor variables? Discuss.