
Example 13.12 The article “How to Optimize and Control the Wire Bonding Process: Part II” (Solid State Technology, Jan. 1991: 67–72) described an experiment carried out to assess the impact of the variables $x_1 =$ force (gm), $x_2 =$ power (mW), $x_3 =$ temperature (°C), and $x_4 =$ time (msec) on $y =$ ball bond shear strength (gm). The following data was generated to be consistent with the information given in the article:

  A statistical computer package gave the following least squares estimates:

$$\hat{\beta}_0 = -37.48 \qquad \hat{\beta}_1 = .2117 \qquad \hat{\beta}_2 = .4983 \qquad \hat{\beta}_3 = .1297 \qquad \hat{\beta}_4 = .2583$$

  Thus we estimate that .1297 gm is the average change in strength associated with a 1-degree increase in temperature when the other three predictors are held fixed; the other estimated coefficients are interpreted in a similar manner.

The estimated regression equation is
$$y = -37.48 + .2117x_1 + .4983x_2 + .1297x_3 + .2583x_4$$

A point prediction of strength resulting from a force of 35 gm, power of 75 mW, temperature of 200°C, and time of 20 msec is

$$\hat{y} = -37.48 + (.2117)(35) + (.4983)(75) + (.1297)(200) + (.2583)(20) = 38.41 \text{ gm}$$

This is also a point estimate of the mean value of strength for the specified values of force, power, temperature, and time.

■

(From the book Statistics for Engineering Problem Solving by Stephen Vardeman, an excellent exposition of the territory covered by our book, albeit at a somewhat higher level.)
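As a quick check on the arithmetic, here is a minimal Python sketch (ours, not part of the original example) that reproduces the point prediction from the estimated coefficients; the variable names are our own choices.

```python
# Estimated coefficients from Example 13.12: (intercept, force, power, temperature, time)
beta_hat = [-37.48, .2117, .4983, .1297, .2583]
x_new = [35, 75, 200, 20]   # force 35 gm, power 75 mW, temperature 200 °C, time 20 msec

# y-hat = beta0 + beta1*x1 + beta2*x2 + beta3*x3 + beta4*x4
y_hat = beta_hat[0] + sum(b * x for b, x in zip(beta_hat[1:], x_new))
print(round(y_hat, 2))      # 38.41 (gm)
```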

$R^2$ and $\hat{\sigma}^2$

Predicted or fitted values, residuals, and the various sums of squares are calculated as in simple linear and polynomial regression. The predicted value $\hat{y}_1$ results from substituting the values of the various predictors from the first observation into the estimated regression function:

$$\hat{y}_1 = \hat{\beta}_0 + \hat{\beta}_1 x_{11} + \hat{\beta}_2 x_{21} + \cdots + \hat{\beta}_k x_{k1}$$

The remaining predicted values $\hat{y}_2, \ldots, \hat{y}_n$ come from substituting values of the predictors from the 2nd, 3rd, ..., and finally nth observations into the estimated function. For example, the values of the four predictors for the last observation in Example 13.12 are $x_{1,30} = 35$, $x_{2,30} = 75$, $x_{3,30} = 200$, and $x_{4,30} = 20$, so

$$\hat{y}_{30} = -37.48 + .2117(35) + .4983(75) + .1297(200) + .2583(20) = 38.41$$

The residuals $y_1 - \hat{y}_1, \ldots, y_n - \hat{y}_n$ are the differences between the observed and predicted values. The last residual in Example 13.12 is $40.3 - 38.41 = 1.89$. The closer the residuals are to 0, the better the job our estimated regression function is doing in making predictions corresponding to observations in the sample.
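The same calculation extends directly to fitted values and residuals in code. The sketch below is ours rather than the text’s, and it uses only the single observation quoted above, since the full data table is not reproduced here; NumPy and the variable names are assumptions.

```python
import numpy as np

# Estimated coefficients from Example 13.12: (beta0, beta1, beta2, beta3, beta4)
beta_hat = np.array([-37.48, .2117, .4983, .1297, .2583])

# Predictor values and observed strength for the 30th (last) observation
x_30 = np.array([35, 75, 200, 20])   # force, power, temperature, time
y_30 = 40.3                          # observed ball bond shear strength (gm)

y_hat_30 = beta_hat[0] + beta_hat[1:] @ x_30   # fitted (predicted) value
residual_30 = y_30 - y_hat_30                  # observed minus predicted

print(round(y_hat_30, 2), round(residual_30, 2))   # 38.41 and 1.89
```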

Error or residual sum of squares is $\text{SSE} = \sum (y_i - \hat{y}_i)^2$. It is again interpreted as a measure of how much variation in the observed y values is not explained by (not attributed to) the model relationship. The number of df associated with SSE is $n - (k + 1)$ because $k + 1$ df are lost in estimating the $k + 1$ $\beta$ coefficients. Total sum of squares, a measure of total variation in the observed y values, is $\text{SST} = \sum (y_i - \bar{y})^2$. Regression sum of squares $\text{SSR} = \sum (\hat{y}_i - \bar{y})^2 = \text{SST} - \text{SSE}$ is a measure of explained variation. Then the coefficient of multiple determination $R^2$ is

$$R^2 = 1 - \frac{\text{SSE}}{\text{SST}} = \frac{\text{SSR}}{\text{SST}}$$

  It is interpreted as the proportion of observed y variation that can be explained by the multiple regression model fit to the data.
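These sums of squares translate into a few lines of Python; the helper below is our own sketch (not from the text), and it assumes y and y_hat are NumPy arrays of observed and fitted values.

```python
import numpy as np

def sums_of_squares(y, y_hat):
    """Return SSE, SST, SSR, and R^2 for observed y and fitted y_hat."""
    sse = np.sum((y - y_hat) ** 2)          # error (residual) sum of squares
    sst = np.sum((y - np.mean(y)) ** 2)     # total sum of squares
    ssr = sst - sse                         # regression (explained) sum of squares
    r2 = 1 - sse / sst                      # coefficient of multiple determination
    return sse, sst, ssr, r2
```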

  Because there is no preliminary picture of multiple regression data analogous to a scatter plot for bivariate data, the coefficient of multiple determination is our first indication of whether the chosen model is successful in explaining y variation.

Unfortunately, there is a problem with $R^2$: Its value can be inflated by adding lots of predictors into the model even if most of these predictors are rather frivolous. For example, suppose y is the sale price of a house. Then sensible predictors include $x_1 =$ the interior size of the house, $x_2 =$ the size of the lot on which the house sits, $x_3 =$ the number of bedrooms, $x_4 =$ the number of bathrooms, and $x_5 =$ the house’s age. Now suppose we add in $x_6 =$ the diameter of the doorknob on the coat closet, $x_7 =$ the thickness of the cutting board in the kitchen, $x_8 =$ the thickness of the patio slab, and so on. Unless we are very unlucky in our choice of predictors, using $n - 1$ predictors (one fewer than the sample size) will yield $R^2 = 1$. So the objective in multiple regression is not simply to explain most of the observed y variation, but to do so using a model with relatively few predictors that are easily interpreted.
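The claim that piling on frivolous predictors drives $R^2$ toward 1 is easy to see by simulation. The following sketch is ours, not the text’s: it regresses a pure-noise response on increasing numbers of pure-noise predictors and reports $R^2$ for each fit.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
y = rng.normal(size=n)                      # a response with no real structure

for k in range(1, n):                       # k = number of noise predictors
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    print(f"k = {k:2d}  R^2 = {1 - sse / sst:.3f}")
# R^2 climbs toward 1.000 as k approaches n - 1, even though every
# predictor is unrelated to y.
```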

It is thus desirable to adjust $R^2$, as was done in polynomial regression, to take account of the size of the model:

$$R_a^2 = 1 - \frac{\text{SSE}/(n - (k + 1))}{\text{SST}/(n - 1)} = 1 - \frac{n - 1}{n - (k + 1)} \cdot \frac{\text{SSE}}{\text{SST}}$$


Because the ratio in front of $\text{SSE}/\text{SST}$ exceeds 1, $R_a^2$ is smaller than $R^2$. Furthermore, the larger the number of predictors k relative to the sample size n, the smaller $R_a^2$ will be relative to $R^2$. Adjusted $R^2$ can even be negative, whereas $R^2$ itself must be between 0 and 1. A value of $R_a^2$ that is substantially smaller than $R^2$ itself is a warning that the model may contain too many predictors.
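A short sketch of the adjustment (our own helper function; the numbers in the comments are hypothetical, chosen only to show how the penalty grows with k):

```python
def adjusted_r2(sse, sst, n, k):
    """R_a^2 = 1 - [SSE/(n - (k + 1))] / [SST/(n - 1)]."""
    return 1 - (sse / (n - (k + 1))) / (sst / (n - 1))

# Hypothetical illustration: SSE = 20 and SST = 100 give R^2 = .80, but
# adjusted_r2(20, 100, n=30, k=4)   -> about .77   (mild penalty)
# adjusted_r2(20, 100, n=30, k=20)  -> about .36   (severe penalty)
```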

The positive square root of $R^2$ is called the multiple correlation coefficient and is denoted by R. It can be shown that R is the sample correlation coefficient calculated from the $(\hat{y}_i, y_i)$ pairs (that is, use $\hat{y}_i$ in place of $x_i$ in the formula for r from Section 12.5).
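That identity can be checked numerically for any least squares fit that includes an intercept; the simulated data below are our own assumptions, not from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])          # intercept + 3 predictors
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

R = np.sqrt(1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))
r_pairs = np.corrcoef(y_hat, y)[0, 1]    # sample correlation of the (y_hat_i, y_i) pairs
print(R, r_pairs)                        # the two values agree (up to rounding)
```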

  SSE is also the basis for estimating the remaining model parameter:

$$\hat{\sigma}^2 = s^2 = \frac{\text{SSE}}{n - (k + 1)} = \text{MSE}$$
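In code, the estimate is just SSE divided by its degrees of freedom; the helper below is ours, and the numbers in the comment are hypothetical.

```python
def error_variance_estimate(sse, n, k):
    """sigma-hat^2 = s^2 = SSE / (n - (k + 1)) = MSE."""
    return sse / (n - (k + 1))

# Hypothetical illustration: SSE = 20 with n = 30 observations and k = 4
# predictors gives s^2 = 20 / 25 = 0.8, so s is about .89.
```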