Example 13.12 The article “How to Optimize and Control the Wire Bonding Process: Part II” (Solid State Technology, Jan. 1991: 67–72) described an experiment carried out to assess the impact of the variables $x_1 =$ force (gm), $x_2 =$ power (mW), $x_3 =$ temperature (°C), and $x_4 =$ time (msec) on $y =$ ball bond shear strength (gm). The following data were generated to be consistent with the information given in the article:
A statistical computer package gave the following least squares estimates:
$\hat{\beta}_0 = -37.48$  $\hat{\beta}_1 = .2117$  $\hat{\beta}_2 = .4983$  $\hat{\beta}_3 = .1297$  $\hat{\beta}_4 = .2583$
Thus we estimate that .1297 gm is the average change in strength associated with a 1-degree increase in temperature when the other three predictors are held fixed; the other estimated coefficients are interpreted in a similar manner.
The estimated regression equation is $\hat{y} = -37.48 + .2117x_1 + .4983x_2 + .1297x_3 + .2583x_4$
A point prediction of strength resulting from a force of 35 gm, power of 75 mW, temperature of 200°C, and time of 20 msec is
$$\hat{y} = -37.48 + (.2117)(35) + (.4983)(75) + (.1297)(200) + (.2583)(20) = 38.41 \text{ gm}$$
(From the text Statistics for Engineering Problem Solving by Stephen Vardeman, an excellent exposition of the territory covered by our book, albeit at a somewhat higher level.)
This is also a point estimate of the mean value of strength for the specified values of force, power, temperature, and time.
■
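As a quick arithmetic check, here is a short Python sketch that reproduces the point prediction from the estimated coefficients in Example 13.12. The array names and the `predict` helper are our own illustrative choices, not output from any particular statistical package.

```python
import numpy as np

# Estimated coefficients from Example 13.12 (intercept first)
beta_hat = np.array([-37.48, 0.2117, 0.4983, 0.1297, 0.2583])

def predict(x, coef=beta_hat):
    """Evaluate the estimated regression function at predictor values x = (x1, ..., xk)."""
    return coef[0] + np.dot(coef[1:], x)

# force = 35 gm, power = 75 mW, temperature = 200 deg C, time = 20 msec
x_new = np.array([35.0, 75.0, 200.0, 20.0])
print(round(predict(x_new), 2))   # 38.41, matching the text
```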
$R^2$ and $\hat{\sigma}^2$
Predicted or fitted values, residuals, and the various sums of squares are calculated as in simple linear and polynomial regression. The predicted value $\hat{y}_1$ results from substituting the values of the various predictors from the first observation into the estimated regression function:
$$\hat{y}_1 = \hat{\beta}_0 + \hat{\beta}_1 x_{11} + \hat{\beta}_2 x_{21} + \cdots + \hat{\beta}_k x_{k1}$$
The remaining predicted values $\hat{y}_2, \ldots, \hat{y}_n$ come from substituting values of the predictors from the 2nd, 3rd, ..., and finally nth observations into the estimated function. For example, the values of the four predictors for the last observation in Example 13.12 are $x_{1,30} = 35$, $x_{2,30} = 75$, $x_{3,30} = 200$, and $x_{4,30} = 20$, so
$$\hat{y}_{30} = -37.48 + .2117(35) + .4983(75) + .1297(200) + .2583(20) = 38.41$$
The residuals $y_1 - \hat{y}_1, \ldots, y_n - \hat{y}_n$ are the differences between the observed and predicted values. The last residual in Example 13.12 is $40.3 - 38.41 = 1.89$. The closer the residuals are to 0, the better the job our estimated regression function is doing in making predictions corresponding to observations in the sample.
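A small Python sketch of these calculations follows. Because the wire-bonding data set is not reproduced here, it uses a tiny made-up data set with two predictors; the numbers are purely illustrative.

```python
import numpy as np

# Made-up data: n = 4 observations of k = 2 predictors and a response y
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
y = np.array([3.1, 2.9, 7.2, 6.8])

# Least squares fit: prepend a column of 1s so the first coefficient is the intercept
X1 = np.column_stack([np.ones(len(y)), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)

y_hat = X1 @ beta_hat      # predicted (fitted) values
resid = y - y_hat          # residuals: observed minus predicted
print(y_hat, resid)
```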
Error or residual sum of squares is $SSE = \sum (y_i - \hat{y}_i)^2$. It is again interpreted as a measure of how much variation in the observed y values is not explained by (not attributed to) the model relationship. The number of df associated with SSE is $n - (k + 1)$ because $k + 1$ df are lost in estimating the $k + 1$ $\beta$ coefficients. Total sum of squares, a measure of total variation in the observed y values, is $SST = \sum (y_i - \bar{y})^2$. Regression sum of squares $SSR = \sum (\hat{y}_i - \bar{y})^2 = SST - SSE$ is a measure of explained variation. Then the coefficient of multiple determination $R^2$ is
$$R^2 = 1 - \frac{SSE}{SST} = \frac{SSR}{SST}$$
It is interpreted as the proportion of observed y variation that can be explained by the multiple regression model fit to the data.
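The sums of squares and $R^2$ can be computed directly from these definitions; the sketch below repeats the illustrative fit from the previous snippet so that it runs on its own.

```python
import numpy as np

# Same made-up data and least squares fit as in the previous sketch
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([3.1, 2.9, 7.2, 6.8])
X1 = np.column_stack([np.ones(len(y)), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ beta_hat

SSE = np.sum((y - y_hat) ** 2)            # error (residual) sum of squares
SST = np.sum((y - y.mean()) ** 2)         # total sum of squares
SSR = np.sum((y_hat - y.mean()) ** 2)     # regression sum of squares (= SST - SSE for a fit with intercept)

R2 = 1 - SSE / SST                        # equivalently SSR / SST
print(R2)
```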
Because there is no preliminary picture of multiple regression data analogous to a scatter plot for bivariate data, the coefficient of multiple determination is our first indication of whether the chosen model is successful in explaining y variation.
Unfortunately, there is a problem with $R^2$: its value can be inflated by adding lots of predictors into the model even if most of these predictors are rather frivolous. For example, suppose y is the sale price of a house. Then sensible predictors include $x_1 =$ the interior size of the house, $x_2 =$ the size of the lot on which the house sits, $x_3 =$ the number of bedrooms, $x_4 =$ the number of bathrooms, and $x_5 =$ the house’s age. Now suppose we add in $x_6 =$ the diameter of the doorknob on the coat closet, $x_7 =$ the thickness of the cutting board in the kitchen, $x_8 =$ the thickness of the patio slab, and so on. Unless we are very unlucky in our choice of predictors, using $n - 1$ predictors (one fewer than the sample size) will yield $R^2 = 1$. So the objective in multiple regression is not simply to explain most of the observed y variation, but to do so using a model with relatively few predictors that are easily interpreted.
It is thus desirable to adjust $R^2$, as was done in polynomial regression, to take account of the size of the model:
$$R_a^2 = 1 - \frac{SSE/(n - (k + 1))}{SST/(n - 1)} = 1 - \frac{n - 1}{n - (k + 1)} \cdot \frac{SSE}{SST}$$
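A minimal sketch of the adjustment, assuming SSE, SST, the sample size n, and the number of predictors k are already available from a fit like the ones above (the numbers passed in are illustrative only):

```python
def adjusted_r2(SSE, SST, n, k):
    """Adjusted R^2: 1 - [SSE / (n - (k + 1))] / [SST / (n - 1)]."""
    return 1 - (SSE / (n - (k + 1))) / (SST / (n - 1))

# Illustrative values only
print(adjusted_r2(SSE=20.0, SST=100.0, n=30, k=4))   # 0.768
```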
Because the ratio in front of SSE/SST exceeds 1, $R_a^2$ is smaller than $R^2$. Furthermore, the larger the number of predictors k relative to the sample size n, the smaller $R_a^2$ will be relative to $R^2$. Adjusted $R^2$ can even be negative, whereas $R^2$ itself must be between 0 and 1. A value of $R_a^2$ that is substantially smaller than $R^2$ itself is a warning that the model may contain too many predictors.
The positive square root of $R^2$ is called the multiple correlation coefficient and is denoted by R. It can be shown that R is the sample correlation coefficient calculated from the $(\hat{y}_i, y_i)$ pairs (that is, use $\hat{y}_i$ in place of $x_i$ in the formula for r from Section 12.5).
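This relationship is easy to verify numerically; the sketch below reuses the made-up fit from the earlier snippets and compares $\sqrt{R^2}$ with the sample correlation of the $(\hat{y}_i, y_i)$ pairs.

```python
import numpy as np

# Same made-up data and least squares fit as in the earlier sketches
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([3.1, 2.9, 7.2, 6.8])
X1 = np.column_stack([np.ones(len(y)), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ beta_hat

R2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
R_from_corr = np.corrcoef(y_hat, y)[0, 1]   # correlation of the (y_hat, y) pairs

print(np.sqrt(R2), R_from_corr)             # the two agree up to rounding
```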
SSE is also the basis for estimating the remaining model parameter:
$$\hat{\sigma}^2 = s^2 = \frac{SSE}{n - (k + 1)} = MSE$$
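In code, the estimate is just SSE divided by its degrees of freedom; a small sketch with illustrative inputs:

```python
import math

def mse(SSE, n, k):
    """Estimate of sigma^2: s^2 = SSE / (n - (k + 1)) = MSE."""
    return SSE / (n - (k + 1))

# Illustrative values only (n observations, k predictors)
s2 = mse(SSE=20.0, n=30, k=4)
s = math.sqrt(s2)       # estimated sigma (residual standard deviation)
print(s2, s)            # 0.8  0.894...
```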