Example 13.10 The article “A Method for Improving the Accuracy of Polynomial Regression Analysis” (J. of Quality Tech., 1971: 149–155) reports the following data on $x$ = cure temperature (°F) and $y$ = ultimate shear strength of a rubber compound (psi), with $\bar{x} = 297.13$:

  A computer analysis yielded the results shown in Table 13.4.

Table 13.4 Estimated Coefficients and Standard Deviations for Example 13.10

Parameter   Estimate   Estimated SD   Parameter   Estimate   Estimated SD

The estimated regression function using the original model is $y = -26{,}219.64 + 189.21x - .3312x^2$, whereas for the centered model the function is $y = 759.36 - 7.61(x - 297.13) - .3312(x - 297.13)^2$. These estimated functions are identical; the only difference is that different parameters have been estimated for the two models. The estimated standard deviations indicate clearly that $\beta_0^*$ and $\beta_1^*$ have been more accurately estimated than $\beta_0$ and $\beta_1$. The quadratic parameters are identical ($\beta_2 = \beta_2^*$), as can be seen by comparing the $x^2$ term in (13.14) with the original model. We emphasize again that a major benefit of centering is the gain in computational accuracy, not only in quadratic but also in higher-degree models.
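The claim that the two estimated functions are identical can be checked by expanding the centered form. The following is a minimal Python sketch, not part of the original example; the helper name `uncenter_quadratic` is hypothetical, and the inputs are the rounded centered estimates and $\bar{x}$ quoted above.

```python
def uncenter_quadratic(b0_star, b1_star, b2_star, x_bar):
    """Expand b0* + b1*(x - x_bar) + b2*(x - x_bar)**2 into b0 + b1*x + b2*x**2."""
    b2 = b2_star
    b1 = b1_star - 2.0 * b2_star * x_bar
    b0 = b0_star - b1_star * x_bar + b2_star * x_bar ** 2
    return b0, b1, b2


# Centered estimates and x-bar as reported in Example 13.10 (rounded values).
b0, b1, b2 = uncenter_quadratic(759.36, -7.61, -0.3312, 297.13)


def centered(x):
    """Estimated function written in the centered parameterization."""
    return 759.36 - 7.61 * (x - 297.13) - 0.3312 * (x - 297.13) ** 2


def expanded(x):
    """The same function written in powers of x."""
    return b0 + b1 * x + b2 * x ** 2


# Largest disagreement between the two forms over a few illustrative temperatures.
max_gap = max(abs(centered(x) - expanded(x)) for x in (280.0, 290.0, 300.0, 310.0))
print(b0, b1, b2, max_gap)
```

The expanded coefficients agree with the reported 189.21 and, up to the rounding in the published centered estimates, with the reported intercept. Note how large the uncentered intercept is compared with the centered one; a small rounding change in the quadratic coefficient shifts it by several units.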

■

The book by Neter et al., listed in the chapter bibliography, is a good source for more information about polynomial regression.

  EXERCISES Section 13.3 (26–35)

26. The article “Physical Properties of Cumin Seed” (J. of Agric. Engr. Res., 1996: 93–98) considered a quadratic regression of $y$ = bulk density on $x$ = moisture content. Data from a graph in the article follows, along with Minitab output from the quadratic fit.

  CHAPTER 13 Nonlinear and Multiple Regression

The regression equation is
bulkdens = 403 + 16.2 moiscont - 0.706 contsqd

S = 10.15   R-Sq = 93.8%   R-Sq(adj) = 89.6%

Analysis of Variance
Regression
Residual Error
Total

Obs   moiscont   bulkdens   Fit   StDev Fit   Residual   St Resid

a. Does a scatter plot of the data appear consistent with the quadratic regression model?

b. What proportion of observed variation in density can be attributed to the model relationship?

c. Calculate a 95% CI for true average density when moisture content is 13.7.

d. The last line of output is from a request for estimation and prediction information when moisture content is 14. Calculate a 99% PI for density when moisture content is 14.

e. Does the quadratic predictor appear to provide useful information? Test the appropriate hypotheses at significance level .05.

27. The following data on $y$ = glucose concentration (g/L) and $x$ = fermentation time (days) for a particular blend of malt liquor was read from a scatter plot in the article “Improving Fermentation Productivity with Reverse Osmosis” (Food Tech., 1984: 92–96):

x    1   2   3   4   5   6   7   8
y   74  54  52  51  52  53  58  71

a. Verify that a scatter plot of the data is consistent with the choice of a quadratic regression model.

b. The estimated quadratic regression equation is $y = 84.482 - 15.875x + 1.7679x^2$. Predict the value of glucose concentration for a fermentation time of 6 days, and compute the corresponding residual.

c. Using SSE = 61.77, what proportion of observed variation can be attributed to the quadratic regression relationship?

d. The $n = 8$ standardized residuals based on the quadratic model are 1.91, −1.95, −.25, .58, .90, .04, −.66, and .20. Construct a plot of the standardized residuals versus $x$ and a normal probability plot. Do the plots exhibit any troublesome features?

e. The estimated standard deviation of $\hat{\mu}_{Y \cdot 6}$ (that is, of $\hat{\beta}_0 + \hat{\beta}_1(6) + \hat{\beta}_2(36)$) is 1.69. Compute a 95% CI for $\mu_{Y \cdot 6}$.

f. Compute a 95% PI for a glucose concentration observation made after 6 days of fermentation time.

28. The viscosity ($y$) of an oil was measured by a cone and plate viscometer at six different cone speeds ($x$). It was assumed that a quadratic regression model was appropriate, and the estimated regression function resulting from the $n = 6$ observations was

$y = -113.0937 + 3.3684x - .01780x^2$

a. Estimate $\mu_{Y \cdot 75}$, the expected viscosity when speed is 75.

b. What viscosity would you predict for a cone speed of

c. If $\sum y_i^2 = 8386.43$, $\sum y_i = 210.70$, $\sum x_i y_i = 17{,}002.00$, and $\sum x_i^2 y_i = 1{,}419{,}780$, compute SSE $[= \sum y_i^2 - \hat{\beta}_0 \sum y_i - \hat{\beta}_1 \sum x_i y_i - \hat{\beta}_2 \sum x_i^2 y_i]$.

d. From part (c), SST $= 8386.43 - (210.70)^2/6 = 987.35$. Using SSE computed in part (c), what is the computed value of $R^2$?

e. If the estimated standard deviation of $\hat{\beta}_2$ is $s_{\hat{\beta}_2} = .00226$, test $H_0\colon \beta_2 = 0$ versus $H_a\colon \beta_2 \neq 0$ at level .01, and interpret the result.

29. High-alumina refractory castables have been extensively investigated in recent years because of their significant advantages over other refractory brick of the same class: lower production and application costs, versatility, and performance at high temperatures. The accompanying data on $x$ = viscosity (MPa·s) and $y$ = free-flow (%) was read from a graph in the article “Processing of Zero-Cement Self-Flow Alumina Castables” (The Amer. Ceramic Soc. Bull., 1998: 60–66):

The authors of the cited paper related these two variables using a quadratic regression model. The estimated regression function is $y = -295.96 + 2.1885x - .0031662x^2$.

a. Compute the predicted values and residuals, and then SSE and $s^2$.

b. Compute and interpret the coefficient of multiple determination.

c. The estimated SD of $\hat{\beta}_2$ is $s_{\hat{\beta}_2} = .0004835$. Does the quadratic predictor belong in the regression model?

d. The estimated SD of $\hat{\beta}_1$ is .4050. Use this and the information in (c) to obtain joint CIs for the linear and quadratic regression coefficients with a joint confidence level of (at least) 95%.

e. The estimated SD of $\hat{\mu}_{Y \cdot 400}$ is 1.198. Calculate a 95% CI for true average free-flow when viscosity = 400 and also

  13.3 Polynomial Regression

a 95% PI for free-flow resulting from a single observation made when viscosity = 400, and compare the intervals.

30. The accompanying data was extracted from the article “Effects of Cold and Warm Temperatures on Springback of Aluminum-Magnesium Alloy 5083-H111” (J. of Engr. Manuf., 2009: 427–431). The response variable is yield strength (MPa), and the predictor is temperature (°C).

Here is Minitab output from fitting the quadratic regression model (a graph in the cited paper suggests that the authors did this):

Predictor   Coef   SE Coef   T   P

S = 3.44398   R-Sq = 98.1%   R-Sq(adj) = 96.3%

Analysis of Variance
Regression
Residual Error   2   23.72   11.86
Total

a. What proportion of observed variation in strength can be attributed to the model relationship?

b. Carry out a test of hypotheses at significance level .05 to decide if the quadratic predictor provides useful information over and above that provided by the linear predictor.

c. For a temperature value of 100, $\hat{y} = 134.07$, $s_{\hat{Y}} =$
Estimate true average strength when temperature is 100, in a way that conveys information about precision and reliability.

d. Use the information in (c) to predict strength for a single observation to be made when temperature is 100, and do so in a way that conveys information about precision and reliability. Then compare this prediction to the estimate obtained in (c).

31. The accompanying data on $y$ = energy output (W) and $x$ = temperature difference (°K) was provided by the authors of the article “Comparison of Energy and Exergy Efficiency for Solar Box and Parabolic Cookers” (J. of Energy Engr., 2007: 53–62).

x   23.20  23.50  23.52  24.30  25.10  26.20  27.40  28.10  29.30  30.60  31.50  32.01
y

x   32.63  33.23  33.62  34.18  35.43  35.62  36.16  36.23  36.89  37.90  39.10  41.66
y

The article’s authors fit a cubic regression model to the data. Here is Minitab output from such a fit.

The regression equation is
y = -134 + 12.7 x - 0.377 x^2 + 0.00359 x^3

Predictor   Coef        SE Coef     T       P
x3          0.0035861   0.0002529   14.18   0.000

S = 0.168354   R-Sq = 98.0%   R-Sq(adj) = 97.7%

Analysis of Variance
Residual Error   20
Total

a. What proportion of observed variation in energy output can be attributed to the model relationship?

b. Fitting a quadratic model to the data results in $R^2 = .780$. Calculate adjusted $R^2$ for this model and compare to adjusted $R^2$ for the cubic model.

c. Does the cubic predictor appear to provide useful information about $y$ over and above that provided by the linear and quadratic predictors? State and test the appropriate hypotheses.

d. When $x = 30$, $s_{\hat{Y}} = .0611$. Obtain a 95% CI for true average energy output in this case, and also a 95% PI for a single energy output to be observed when temperature difference is 30. Hint: $s_{\hat{Y}} = .0611$.

e. Interpret the hypotheses $H_0\colon \mu_{Y \cdot 35} = 5$ versus $H_a\colon \mu_{Y \cdot 35} \neq 5$, and then carry out a test at significance level .05 using the fact that when $x = 35$, $s_{\hat{Y}} = .0523$.

32. The following data is a subset of data obtained in an experiment to study the relationship between $x$ = soil pH and $y$ = A1. Concentration/EC (“Root Responses of Three Gramineae Species to Soil Acidity in an Oxisol and an Ultisol,” Soil Science, 1973: 295–302):

A cubic model was proposed in the article, but the version of Minitab used by the author of the present text refused to


include the $x^3$ term in the model, stating that “$x^3$ is highly correlated with other predictor variables.” To remedy this, $\bar{x} = 4.3456$ was subtracted from each $x$ value to yield $x' = x - \bar{x}$. A cubic regression was then requested to fit the model having regression function

$y = \beta_0^* + \beta_1^* x' + \beta_2^* (x')^2 + \beta_3^* (x')^3$

The following computer output resulted:

Parameter   Estimate   Estimated SD

a. What is the estimated regression function for the “centered” model?

b. What is the estimated value of the coefficient $\beta_3$ in the “uncentered” model with regression function $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$? What is the estimate of $\beta_2$?

c. Using the cubic model, what value of $y$ would you predict when soil pH is 4.5?

d. Carry out a test to decide whether the cubic term should be retained in the model.

33. In many polynomial regression problems, rather than fitting a “centered” regression function using $x' = x - \bar{x}$, computational accuracy can be improved by using a function of the standardized independent variable $x' = (x - \bar{x})/s_x$, where $s_x$ is the standard deviation of the $x_i$’s. Consider fitting the cubic regression function $y = \beta_0^* + \beta_1^* x' + \beta_2^* (x')^2 + \beta_3^* (x')^3$ to the following data resulting from a study of the relation between thrust efficiency $y$ of supersonic propelling rockets and the half-divergence angle $x$ of the rocket nozzle (“More on Correlating Data,” CHEMTECH, 1976: 266–270):

x   5  10  15  20  25  30  35
y

Parameter     Estimate   Estimated SD
$\beta_0^*$   .9671      .0026
$\beta_1^*$   .0502
$\beta_2^*$   .0176

a. What value of $y$ would you predict when the half-divergence angle is 20? When $x = 25$?

b. What is the estimated regression function $\hat{\beta}_0 + \hat{\beta}_1 x + \hat{\beta}_2 x^2 + \hat{\beta}_3 x^3$ for the “unstandardized” model?

c. Use a level .05 test to decide whether the cubic term should be deleted from the model.

d. What can you say about the relationship between SSEs and $R^2$’s for the standardized and unstandardized models? Explain.

e. SSE for the cubic model is .00006300, whereas for a quadratic model SSE is .00014367. Compute $R^2$ for each model. Does the difference between the two suggest that the cubic term can be deleted?

34. The following data resulted from an experiment to assess the potential of unburnt colliery spoil as a medium for plant growth. The variables are $x$ = acid extractable cations and $y$ = exchangeable acidity/total cation exchange capacity (“Exchangeable Acidity in Unburnt Colliery Spoil,” Nature,

Standardizing the independent variable $x$ to obtain $x' = (x - \bar{x})/s_x$ and fitting the regression function $y = \beta_0^* + \beta_1^* x' + \beta_2^* (x')^2$ yielded the accompanying computer output.

Parameter     Estimate   Estimated SD
$\beta_0^*$   .8733
$\beta_1^*$   .3255
$\beta_2^*$   .0448

a. Estimate $\mu_{Y \cdot 50}$.

b. Compute the value of the coefficient of multiple determination. (See Exercise 28(c).)

c. What is the estimated regression function $\hat{\beta}_0 + \hat{\beta}_1 x + \hat{\beta}_2 x^2$ using the unstandardized variable $x$?

d. What is the estimated standard deviation of $\hat{\beta}_2$ computed in part (c)?

e. Carry out a test using the standardized estimates to decide whether the quadratic term should be retained in the model. Repeat using the unstandardized estimates. Do your conclusions differ?

35. The article “The Respiration in Air and in Water of the Limpets Patella caerulea and Patella lusitanica” (Comp. Biochemistry and Physiology, 1975: 407–411) proposed a simple power model for the relationship between respiration rate $y$ and temperature $x$ for P. caerulea in air. However, a plot of $\ln(y)$ versus $x$ exhibits a curved pattern. Fit the quadratic power model $Y = \alpha e^{\beta x + \gamma x^2} \cdot \epsilon$ to the accompanying data.

x
y
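Taking logarithms turns the quadratic power model of Exercise 35 into a quadratic regression of $\ln(y)$ on $x$: $\ln(Y) = \ln(\alpha) + \beta x + \gamma x^2 + \ln(\epsilon)$. The Python sketch below illustrates that transformation under stated assumptions. The exercise's data values are not available here, so the `x` and `y` lists are synthetic, generated without noise from assumed parameters $\alpha = 2$, $\beta = 0.1$, $\gamma = -0.002$. The function name `fit_quadratic_power_model` is hypothetical, and the normal equations are solved directly only because the problem is small.

```python
import math


def fit_quadratic_power_model(x, y):
    """Least-squares fit of ln(y) = ln(alpha) + beta*x + gamma*x**2."""
    # Design-matrix rows [1, x, x^2] and the log-transformed response.
    rows = [[1.0, xi, xi * xi] for xi in x]
    z = [math.log(yi) for yi in y]
    # Normal equations (X'X) b = X'z, stored as an augmented 3x4 matrix.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * zi for r, zi in zip(rows, z))] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    # Back substitution for the three coefficients.
    b = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        s = a[i][3] - sum(a[i][j] * b[j] for j in range(i + 1, 3))
        b[i] = s / a[i][i]
    # Transform the intercept back to the multiplicative constant alpha.
    return math.exp(b[0]), b[1], b[2]


# Synthetic, noise-free data from assumed parameters (NOT the exercise's data).
x = [10.0, 15.0, 20.0, 25.0, 30.0]
y = [2.0 * math.exp(0.1 * xi - 0.002 * xi * xi) for xi in x]

alpha_hat, beta_hat, gamma_hat = fit_quadratic_power_model(x, y)
print(alpha_hat, beta_hat, gamma_hat)
```

With real data the fit would not be exact, and centering $x$ before fitting, as discussed in this section, would be a reasonable way to improve the conditioning of the normal equations.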

  13.4 Multiple Regression Analysis