Example 13.10 The article “A Method for Improving the Accuracy of Polynomial Regression
Analysis” (J. of Quality Tech., 1971: 149–155) reports the following data on $x$ = cure temperature (°F) and $y$ = ultimate shear strength of a rubber compound (psi), with $\bar{x} = 297.13$:
A computer analysis yielded the results shown in Table 13.4.
Table 13.4 Estimated Coefficients and Standard Deviations for Example 13.10
Parameter   Estimate   Estimated SD   Parameter   Estimate   Estimated SD
The estimated regression function using the original model is $y = -26{,}219.64 + 189.21x - .3312x^2$,
whereas for the centered model the function is $y = 759.36 - 7.61(x - 297.13) - .3312(x - 297.13)^2$.
These estimated functions are identical; the only difference is that different parameters have been estimated for the two models.
The estimated standard deviations indicate clearly that $\beta_0^*$ and $\beta_1^*$ have been more accurately estimated than $\beta_0$ and $\beta_1$.
The quadratic parameters are identical ($\beta_2 = \beta_2^*$), as can be seen by comparing the $x^2$ term in (13.14) with the original model.
We emphasize again that a major benefit of centering is the gain in computational accuracy, not only in quadratic but also in higher-degree models.
■ The book by Neter et al., listed in the chapter bibliography, is a good source
for more information about polynomial regression.
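
As a quick numerical check on Example 13.10, the centered quadratic can be expanded back into powers of $x$ and compared with the coefficients reported for the original model. The sketch below assumes the coefficients quoted in the example, read with the centered linear and quadratic terms negative (759.36, −7.61, −.3312) and $\bar{x} = 297.13$. The function name `uncenter_quadratic` is hypothetical and chosen only for illustration. Because the published estimates are rounded, the expanded intercept should be expected to match the reported value only approximately.

```python
def uncenter_quadratic(b0_star, b1_star, b2_star, x_bar):
    """Expand b0* + b1*(x - x_bar) + b2*(x - x_bar)^2 into b0 + b1*x + b2*x^2."""
    b2 = b2_star                                            # quadratic coefficient is unchanged
    b1 = b1_star - 2.0 * b2_star * x_bar                    # collect the terms linear in x
    b0 = b0_star - b1_star * x_bar + b2_star * x_bar ** 2   # collect the constant terms
    return b0, b1, b2


def centered_fit(x, b0_star=759.36, b1_star=-7.61, b2_star=-0.3312, x_bar=297.13):
    """Evaluate the centered estimated regression function of Example 13.10."""
    d = x - x_bar
    return b0_star + b1_star * d + b2_star * d ** 2


# Coefficients as quoted in the example (rounded values from the text).
b0, b1, b2 = uncenter_quadratic(759.36, -7.61, -0.3312, 297.13)

# Compare the expanded form with the centered form at an arbitrary temperature.
x_check = 300.0
expanded_value = b0 + b1 * x_check + b2 * x_check ** 2
centered_value = centered_fit(x_check)

print("expanded coefficients:", round(b0, 2), round(b1, 2), b2)
print("difference at x = 300:", abs(expanded_value - centered_value))
```

The expanded linear coefficient agrees with the reported 189.21 to the precision shown, and the intercept differs from the reported value only by a rounding-sized amount, which is consistent with the statement that the two models describe the same fitted curve.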
EXERCISES Section 13.3 (26–35)
26. The article “Physical Properties of Cumin Seed” (J. of Agric. Engr. Res., 1996: 93–98) considered a quadratic regression of $y$ = bulk density on $x$ = moisture content. Data from a graph in the article follows, along with Minitab output from the quadratic fit.
CHAPTER 13 Nonlinear and Multiple Regression
The regression equation is
bulkdens = 403 + 16.2 moiscont - 0.706 contsqd

S = 10.15   R-Sq = 93.8%   R-Sq(adj) = 89.6%

Analysis of Variance
Regression
Residual Error
Total

Obs  moiscont  bulkdens  Fit  StDev Fit  Residual  St Resid
                                         8.21      0.98
                                         6.93      0.85

a. Does a scatter plot of the data appear consistent with the quadratic regression model?
b. What proportion of observed variation in density can be attributed to the model relationship?
c. Calculate a 95% CI for true average density when moisture content is 13.7.
d. The last line of output is from a request for estimation and prediction information when moisture content is 14. Calculate a 99% PI for density when moisture content is 14.
e. Does the quadratic predictor appear to provide useful information? Test the appropriate hypotheses at significance level .05.

27. The following data on $y$ = glucose concentration (g/L) and $x$ = fermentation time (days) for a particular blend of malt liquor was read from a scatter plot in the article “Improving Fermentation Productivity with Reverse Osmosis” (Food Tech., 1984: 92–96):

x   1   2   3   4   5   6   7   8
y  74  54  52  51  52  53  58  71

a. Verify that a scatter plot of the data is consistent with the choice of a quadratic regression model.
b. The estimated quadratic regression equation is $y = 84.482 - 15.875x + 1.7679x^2$. Predict the value of glucose concentration for a fermentation time of 6 days, and compute the corresponding residual.
c. Using SSE = 61.77, what proportion of observed variation can be attributed to the quadratic regression relationship?
d. The $n = 8$ standardized residuals based on the quadratic model are 1.91, −1.95, −.25, .58, .90, .04, −.66, and .20. Construct a plot of the standardized residuals versus $x$ and a normal probability plot. Do the plots exhibit any troublesome features?
e. The estimated standard deviation of $\hat{\mu}_{Y \cdot 6}$ (that is, of $\hat{\beta}_0 + \hat{\beta}_1(6) + \hat{\beta}_2(36)$) is 1.69. Compute a 95% CI for $\mu_{Y \cdot 6}$.
f. Compute a 95% PI for a glucose concentration observation made after 6 days of fermentation time.

28. The viscosity ($y$) of an oil was measured by a cone and plate viscometer at six different cone speeds ($x$). It was assumed that a quadratic regression model was appropriate, and the estimated regression function resulting from the $n = 6$ observations was

$y = -113.0937 + 3.3684x - .01780x^2$

a. Estimate $\mu_{Y \cdot 75}$, the expected viscosity when speed is 75.
b. What viscosity would you predict for a cone speed of …?
c. If $\sum y_i^2 = 8386.43$, $\sum y_i = 210.70$, $\sum x_i y_i = 17{,}002.00$, and $\sum x_i^2 y_i = 1{,}419{,}780$, compute SSE $[= \sum y_i^2 - \hat{\beta}_0 \sum y_i - \hat{\beta}_1 \sum x_i y_i - \hat{\beta}_2 \sum x_i^2 y_i]$.
d. From part (c), SST $= 8386.43 - (210.70)^2/6 = 987.35$. Using SSE computed in part (c), what is the computed value of $R^2$?
e. If the estimated standard deviation of $\hat{\beta}_2$ is $s_{\hat{\beta}_2} = .00226$, test $H_0\colon \beta_2 = 0$ versus $H_a\colon \beta_2 \neq 0$ at level .01, and interpret the result.

29. High-alumina refractory castables have been extensively investigated in recent years because of their significant advantages over other refractory brick of the same class: lower production and application costs, versatility, and performance at high temperatures. The accompanying data on $x$ = viscosity (MPa · s) and $y$ = free-flow (%) was read from a graph in the article “Processing of Zero-Cement Self-Flow Alumina Castables” (The Amer. Ceramic Soc. Bull., 1998: 60–66):

The authors of the cited paper related these two variables using a quadratic regression model. The estimated regression function is $y = -295.96 + 2.1885x - .0031662x^2$.
a. Compute the predicted values and residuals, and then SSE and $s^2$.
b. Compute and interpret the coefficient of multiple determination.
c. The estimated SD of $\hat{\beta}_2$ is $s_{\hat{\beta}_2} = .0004835$. Does the quadratic predictor belong in the regression model?
d. The estimated SD of $\hat{\beta}_1$ is .4050. Use this and the information in (c) to obtain joint CIs for the linear and quadratic regression coefficients with a joint confidence level of (at least) 95%.
e. The estimated SD of $\hat{\mu}_{Y \cdot 400}$ is 1.198. Calculate a 95% CI for true average free-flow when viscosity = 400 and also
13.3 Polynomial Regression
a 95% PI for free-flow resulting from a single observation made when viscosity = 400, and compare the intervals.

30. The accompanying data was extracted from the article “Effects of Cold and Warm Temperatures on Springback of Aluminum-Magnesium Alloy 5083-H111” (J. of Engr. Manuf., 2009: 427–431). The response variable is yield strength (MPa), and the predictor is temperature (°C).

Here is Minitab output from fitting the quadratic regression model (a graph in the cited paper suggests that the authors did this):

Predictor  Coef  SE Coef  T  P

S = 3.44398   R-Sq = 98.1%   R-Sq(adj) = 96.3%

Analysis of Variance
Regression
Residual Error   2   23.72   11.86
Total

a. What proportion of observed variation in strength can be attributed to the model relationship?
b. Carry out a test of hypotheses at significance level .05 to decide if the quadratic predictor provides useful information over and above that provided by the linear predictor.
c. For a temperature of 100, $\hat{y} = 134.07$, $s_{\hat{Y}} = $ … Estimate true average strength when temperature is 100, in a way that conveys information about precision and reliability.
d. Use the information in (c) to predict strength for a single observation to be made when temperature is 100, and do so in a way that conveys information about precision and reliability. Then compare this prediction to the estimate obtained in (c).

31. The accompanying data on $y$ = energy output (W) and $x$ = temperature difference (°K) was provided by the authors of the article “Comparison of Energy and Exergy Efficiency for Solar Box and Parabolic Cookers” (J. of Energy Engr., 2007: 53–62).

x 23.20 23.50 23.52 24.30 25.10 26.20 27.40 28.10 29.30 30.60 31.50 32.01
y
x 32.63 33.23 33.62 34.18 35.43 35.62 36.16 36.23 36.89 37.90 39.10 41.66
y

The article’s authors fit a cubic regression model to the data. Here is Minitab output from such a fit.

The regression equation is
y = -134 + 12.7 x - 0.377 x2 + 0.00359 x3

Predictor  Coef       SE Coef    T      P
x3         0.0035861  0.0002529  14.18  0.000

S = 0.168354   R-Sq = 98.0%   R-Sq(adj) = 97.7%

Analysis of Variance
Residual Error  20
Total

a. What proportion of observed variation in energy output can be attributed to the model relationship?
b. Fitting a quadratic model to the data results in $R^2 = .780$. Calculate adjusted $R^2$ for this model and compare to adjusted $R^2$ for the cubic model.
c. Does the cubic predictor appear to provide useful information about $y$ over and above that provided by the linear and quadratic predictors? State and test the appropriate hypotheses.
d. When $x = 30$, $s_{\hat{Y}} = .0611$. Obtain a 95% CI for true average energy output in this case, and also a 95% PI for a single energy output to be observed when temperature difference is 30. Hint: $s_{\hat{Y}}$ …
e. Interpret the hypotheses $H_0\colon \mu_{Y \cdot 35} = 5$ versus $H_a\colon \mu_{Y \cdot 35} \neq 5$, and then carry out a test at significance level .05 using the fact that when $x = 35$, $s_{\hat{Y}} = .0523$.

32. The following data is a subset of data obtained in an experiment to study the relationship between $x$ = soil pH and $y$ = A1. Concentration/EC (“Root Responses of Three Gramineae Species to Soil Acidity in an Oxisol and an Ultisol,” Soil Science, 1973: 295–302):

A cubic model was proposed in the article, but the version of Minitab used by the author of the present text refused to
include the $x^3$ term in the model, stating that “$x^3$ is highly correlated with other predictor variables.” To remedy this, $\bar{x} = 4.3456$ was subtracted from each $x$ value to yield $x' = x - \bar{x}$. A cubic regression was then requested to fit the model having regression function

$y = \beta_0^* + \beta_1^* x' + \beta_2^* (x')^2 + \beta_3^* (x')^3$

The following computer output resulted:

Parameter   Estimate   Estimated SD

a. What is the estimated regression function for the “centered” model?
b. What is the estimated value of the coefficient $\beta_3$ in the “uncentered” model with regression function $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$? What is the estimate of $\beta_2$?
c. Using the cubic model, what value of $y$ would you predict when soil pH is 4.5?
d. Carry out a test to decide whether the cubic term should be retained in the model.

33. In many polynomial regression problems, rather than fitting a “centered” regression function using $x' = x - \bar{x}$, computational accuracy can be improved by using a function of the standardized independent variable $x' = (x - \bar{x})/s_x$, where $s_x$ is the standard deviation of the $x_i$’s. Consider fitting the cubic regression function $y = \beta_0^* + \beta_1^* x' + \beta_2^* (x')^2 + \beta_3^* (x')^3$ to the following data resulting from a study of the relation between thrust efficiency $y$ of supersonic propelling rockets and the half-divergence angle $x$ of the rocket nozzle (“More on Correlating Data,” CHEMTECH, 1976: 266–270):

x   5  10  15  20  25  30  35
y

Parameter     Estimate   Estimated SD
$\beta_0^*$   .9671      .0026
$\beta_1^*$   .0502
$\beta_2^*$   .0176

a. What value of $y$ would you predict when the half-divergence angle is 20? When $x = 25$?
b. What is the estimated regression function $\hat{\beta}_0 + \hat{\beta}_1 x + \hat{\beta}_2 x^2 + \hat{\beta}_3 x^3$ for the “unstandardized” model?
c. Use a level .05 test to decide whether the cubic term should be deleted from the model.
d. What can you say about the relationship between SSEs and $R^2$’s for the standardized and unstandardized models? Explain.
e. SSE for the cubic model is .00006300, whereas for a quadratic model SSE is .00014367. Compute $R^2$ for each model. Does the difference between the two suggest that the cubic term can be deleted?

34. The following data resulted from an experiment to assess the potential of unburnt colliery spoil as a medium for plant growth. The variables are $x$ = acid extractable cations and $y$ = exchangeable acidity/total cation exchange capacity (“Exchangeable Acidity in Unburnt Colliery Spoil,” Nature, …):

Standardizing the independent variable $x$ to obtain $x' = (x - \bar{x})/s_x$ and fitting the regression function $y = \beta_0^* + \beta_1^* x' + \beta_2^* (x')^2$ yielded the accompanying computer output.

Parameter     Estimate   Estimated SD
$\beta_0^*$   .8733
$\beta_1^*$   .3255
$\beta_2^*$   .0448

a. Estimate $\mu_{Y \cdot 50}$.
b. Compute the value of the coefficient of multiple determination. (See Exercise 28(c).)
c. What is the estimated regression function $\hat{\beta}_0 + \hat{\beta}_1 x + \hat{\beta}_2 x^2$ using the unstandardized variable $x$?
d. What is the estimated standard deviation of $\hat{\beta}_2$ computed in part (c)?
e. Carry out a test using the standardized estimates to decide whether the quadratic term should be retained in the model. Repeat using the unstandardized estimates. Do your conclusions differ?

35. The article “The Respiration in Air and in Water of the Limpets Patella caerulea and Patella lusitanica” (Comp. Biochemistry and Physiology, 1975: 407–411) proposed a simple power model for the relationship between respiration rate $y$ and temperature $x$ for P. caerulea in air. However, a plot of $\ln(y)$ versus $x$ exhibits a curved pattern. Fit the quadratic power model $Y = \alpha e^{\beta x + \gamma x^2} \cdot \epsilon$ to the accompanying data.

x
y
13.4 Multiple Regression Analysis