Example 13.10 The article “A Method for Improving the Accuracy of Polynomial Regression Analysis” (J. of Quality Tech., 1971: 149–155) reports the following data on x = cure temperature (°F) and y = ultimate shear strength of a rubber compound (psi), with $\bar{x} = 297.13$:
A computer analysis yielded the results shown in Table 13.3.

Table 13.3 Estimated Coefficients and Standard Deviations for Example 13.10

Parameter     Estimate       Estimated SD      Parameter       Estimate    Estimated SD
$\beta_0$     -26,219.64     11,912.78         $\beta_0^*$     759.36      23.20
$\beta_1$     189.21         80.25             $\beta_1^*$     -7.61       1.43
$\beta_2$     -.3312         .1350             $\beta_2^*$     -.3312      .1350
The estimated regression function using the original model is $y = -26{,}219.64 + 189.21x - .3312x^2$, whereas for the centered model the function is $y = 759.36 - 7.61(x - 297.13) - .3312(x - 297.13)^2$. These estimated functions are identical; the only difference is that different parameters have been estimated for the two models. The estimated standard deviations indicate clearly that $\beta_0^*$ and $\beta_1^*$ have been more accurately estimated than $\beta_0$ and $\beta_1$. The quadratic parameters are identical ($\beta_2 = \beta_2^*$), as can be seen by comparing the $x^2$ term in (13.14) with the original model. We emphasize again that a major benefit of centering is the gain in computational accuracy, not only in quadratic but also in higher-degree models.
■
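The claim in Example 13.10 that the centered and original fits describe the same function can be checked by expanding the centered polynomial. The Python sketch below is illustrative only: the function names (`uncenter_quadratic`, `original`, `centered`) are hypothetical, and the inputs are the rounded estimates from Table 13.3, so agreement is expected only up to rounding of those published values.

```python
def uncenter_quadratic(b0_star, b1_star, b2_star, xbar):
    """Expand b0* + b1*(x - xbar) + b2*(x - xbar)^2 into b0 + b1*x + b2*x^2."""
    b0 = b0_star - b1_star * xbar + b2_star * xbar ** 2
    b1 = b1_star - 2 * b2_star * xbar
    b2 = b2_star  # the quadratic coefficient is unchanged by centering
    return b0, b1, b2


# Rounded centered-model estimates and x-bar from Example 13.10.
XBAR = 297.13
b0, b1, b2 = uncenter_quadratic(759.36, -7.61, -0.3312, XBAR)


def original(x):
    """Estimated regression function, original (uncentered) parameterization."""
    return -26219.64 + 189.21 * x - 0.3312 * x ** 2


def centered(x):
    """Estimated regression function, centered parameterization."""
    d = x - XBAR
    return 759.36 - 7.61 * d - 0.3312 * d ** 2


# Largest disagreement between the two forms at a few temperatures
# (illustrative x values, not the article's data).
max_gap = max(abs(original(x) - centered(x)) for x in (280, 300, 320))

print("recovered coefficients:", b0, b1, b2)
print("largest gap between the two fitted functions:", max_gap)
```

The recovered coefficients land close to the original-model column of Table 13.3, and the two fitted functions differ by well under 1 psi at the temperatures tried; the small gap comes from rounding in the printed estimates.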
The book by Neter et al., listed in the chapter bibliography, is a good source for more information about polynomial regression.
EXERCISES Section 13.3 (26–35)
26. The article “Physical Properties of Cumin Seed” (J. of Agric. Engr. Res., 1996: 93–98) considered a quadratic regression of y = bulk density on x = moisture content. Data from a graph in the article follows, along with Minitab output from the quadratic fit.
13.3 Polynomial Regression 569
The regression equation is
bulkdens = 403 + 16.2 moiscont - 0.706 contsqd

Predictor    Coef    StDev    T    P
Constant
moiscont

R-Sq = 93.8%    R-Sq(adj) = 89.6%

Analysis of Variance
Source            DF    SS    MS    F    P
Regression
Residual Error     3

Obs  moiscont  bulkdens  Fit  StDev Fit  Residual  St Resid

a. Does a scatterplot of the data appear consistent with the quadratic regression model?
b. What proportion of observed variation in density can be attributed to the model relationship?
c. Calculate a 95% CI for true average density when moisture content is 13.7.
d. The last line of output is from a request for estimation and prediction information when moisture content is 14. Calculate a 99% PI for density when moisture content is 14.
e. Does the quadratic predictor appear to provide useful information? Test the appropriate hypotheses at significance level .05.

27. The following data on y = glucose concentration (g/L) and x = fermentation time (days) for a particular blend of malt liquor was read from a scatterplot in the article “Improving Fermentation Productivity with Reverse Osmosis” (Food Tech., 1984: 92–96):

x    1    2    3    4    5    6    7    8
y   74   54   52   51   52   53   58   71

a. Verify that a scatterplot of the data is consistent with the choice of a quadratic regression model.
b. The estimated quadratic regression equation is $y = 84.482 - 15.875x + 1.7679x^2$. Predict the value of glucose concentration for a fermentation time of 6 days, and compute the corresponding residual.
c. Using SSE = 61.77, what proportion of observed variation can be attributed to the quadratic regression relationship?
d. The n = 8 standardized residuals based on the quadratic model are 1.91, -1.95, -.25, .58, .90, .04, -.66, and .20. Construct a plot of the standardized residuals versus x and a normal probability plot. Do the plots exhibit any troublesome features?
e. The estimated standard deviation of $\hat{\mu}_{Y \cdot 6}$, that is, $\hat{\beta}_0 + \hat{\beta}_1(6) + \hat{\beta}_2(36)$, is 1.69. Compute a 95% CI for $\mu_{Y \cdot 6}$.
f. Compute a 95% PI for a glucose concentration observation made after 6 days of fermentation time.

28. The viscosity (y) of an oil was measured by a cone and plate viscometer at six different cone speeds (x). It was assumed that a quadratic regression model was appropriate, and the estimated regression function resulting from the n = 6 observations was

$y = -113.0937 + 3.3684x - .01780x^2$

a. Estimate $\mu_{Y \cdot 75}$, the expected viscosity when speed is 75.
b. What viscosity would you predict for a cone speed of
c. If $\sum y_i^2 = 8386.43$, $\sum y_i = 210.70$, $\sum x_i y_i = 17{,}002.00$, and $\sum x_i^2 y_i = 1{,}419{,}780$, compute SSE $[= \sum y_i^2 - \hat{\beta}_0 \sum y_i - \hat{\beta}_1 \sum x_i y_i - \hat{\beta}_2 \sum x_i^2 y_i]$ and s.
d. From part (c), SST $= 8386.43 - (210.70)^2/6 = 987.35$. Using SSE computed in part (c), what is the computed value of $R^2$?
e. If the estimated standard deviation of $\hat{\beta}_2$ is $s_{\hat{\beta}_2} = .00226$, test $H_0: \beta_2 = 0$ versus $H_a: \beta_2 \neq 0$ at level .01, and interpret the result.

29. High-alumina refractory castables have been extensively investigated in recent years because of their significant advantages over other refractory brick of the same class: lower production and application costs, versatility, and performance at high temperatures. The accompanying data on x = viscosity (MPa · s) and y = free-flow (%) was read from a graph in the article “Processing of Zero-Cement Self-Flow Alumina Castables” (The Amer. Ceramic Soc. Bull., 1998: 60–66):

x   351   367   373   400   402   456   484
y    81    83    79    75    70    43    22

The authors of the cited paper related these two variables using a quadratic regression model. The estimated regression function is $y = -295.96 + 2.1885x - .0031662x^2$.
a. Compute the predicted values and residuals, and then SSE and $s^2$.
b. Compute and interpret the coefficient of multiple determination.
c. The estimated SD of $\hat{\beta}_2$ is $s_{\hat{\beta}_2} = .0004835$. Does the quadratic predictor belong in the regression model?
d. The estimated SD of $\hat{\beta}_1$ is .4050. Use this and the information in (c) to obtain joint CIs for the linear and quadratic regression coefficients with a joint confidence level of (at least) 95%.
e. The estimated SD of $\hat{\mu}_{Y \cdot 400}$ is 1.198. Calculate a 95% CI for true average free-flow when viscosity = 400
and also a 95% PI for free-flow resulting from a single observation made when viscosity = 400, and compare the intervals.

30. The accompanying data was extracted from the article “Effects of Cold and Warm Temperatures on Springback of Aluminum-Magnesium Alloy 5083-H111” (J. of Engr. Manuf., 2009: 427–431). The response variable is yield strength (MPa), and the predictor is temperature (°C).

x   -50   25   100   200   300

Here is Minitab output from fitting the quadratic regression model (a graph in the cited paper suggests that the authors did this):

Predictor    Coef          SE Coef      T        P
Constant
temp
             -0.0010050    0.0001213    -8.29    0.014

S = 3.44398    R-Sq = 98.1%    R-Sq(adj) = 96.3%

Analysis of Variance
Residual Error    2
Total

a. What proportion of observed variation in strength can be attributed to the model relationship?
b. Carry out a test of hypotheses at significance level .05 to decide if the quadratic predictor provides useful information over and above that provided by the linear predictor.
c. For a temperature value of 100, $\hat{y} = 134.07$, $s_{\hat{Y}} = 2.38$. Estimate true average strength when temperature is 100, in a way that conveys information about precision and reliability.
d. Use the information in (c) to predict strength for a single observation to be made when temperature is 100, and do so in a way that conveys information about precision and reliability. Then compare this prediction to the estimate obtained in (c).

31. The accompanying data on y = energy output (W) and x = temperature difference (°K) was provided by the authors of the article “Comparison of Energy and Exergy Efficiency for Solar Box and Parabolic Cookers” (J. of Energy Engr., 2007: 53–62). The article’s authors fit a cubic regression model to the data. Here is Minitab output from such a fit.

x   23.20  23.50  23.52  24.30  25.10  26.20  27.40  28.10  29.30  30.60  31.50  32.01
y    3.78   4.12   4.24   5.35   5.87   6.02   6.12   6.41   6.62   6.43   6.13   5.92
x   32.63  33.23  33.62  34.18  35.43  35.62  36.16  36.23  36.89  37.90  39.10  41.66
y    5.64   5.45   5.21   4.98   4.65   4.50   4.34   4.03   3.92   3.65   3.02   2.89

The regression equation is
y = -134 + 12.7 x - 0.377 x2 + 0.00359 x3

Predictor    Coef        SE Coef     T         P
Constant
x2           -0.37652    0.02444     -15.41    0.000

S = 0.168354    R-Sq = 98.0%    R-Sq(adj) = 97.7%

Analysis of Variance
Source
Residual Error    20    0.5669    0.0283
Total

a. What proportion of observed variation in energy output can be attributed to the model relationship?
b. Fitting a quadratic model to the data results in $R^2 = .780$. Calculate adjusted $R^2$ for this model and compare to adjusted $R^2$ for the cubic model.
c. Does the cubic predictor appear to provide useful information about y over and above that provided by the linear and quadratic predictors? State and test the appropriate hypotheses.
d. When x = 30, $s_{\hat{Y}} = .0611$. Obtain a 95% CI for true average energy output in this case, and also a 95% PI for a single energy output to be observed when temperature difference is 30. [Hint: $s_{\hat{Y}} = .0611$.]
e. Interpret the hypotheses $H_0: \mu_{Y \cdot 35} = 5$ versus $H_a: \mu_{Y \cdot 35} \neq 5$, and then carry out a test at significance level .05 using the fact that when x = 35, $s_{\hat{Y}} = .0523$.

32. The following data is a subset of data obtained in an experiment to study the relationship between x = soil pH and y = A1 Concentration/EC (“Root Responses of Three Gramineae Species to Soil Acidity in an Oxisol and an Ultisol,” Soil Science, 1973: 295–302):

x   4.01   4.07   4.08   4.10   4.18
y
x
y   .76   .40   .45   .39   .30
x
y   .20   .24   .10   .13   .07   .04

A cubic model was proposed in the article, but the version of Minitab used by the author of the present text
refused to include the $x^3$ term in the model, stating that “$x^3$ is highly correlated with other predictor variables.” To remedy this, $\bar{x} = 4.3456$ was subtracted from each x value to yield $x' = x - \bar{x}$. A cubic regression was then requested to fit the model having regression function

$y = \beta_0^* + \beta_1^* x' + \beta_2^* (x')^2 + \beta_3^* (x')^3$

The following computer output resulted:

Parameter       Estimate    Estimated SD
$\beta_0^*$     .3463       .0366
$\beta_1^*$     -1.2933     .2535

a. What is the estimated regression function for the “centered” model?
b. What is the estimated value of the coefficient $\beta_3$ in the “uncentered” model with regression function $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$? What is the estimate of $\beta_2$?
c. Using the cubic model, what value of y would you predict when soil pH is 4.5?
d. Carry out a test to decide whether the cubic term should be retained in the model.

33. In many polynomial regression problems, rather than fitting a “centered” regression function using $x' = x - \bar{x}$, computational accuracy can be improved by using a function of the standardized independent variable $x' = (x - \bar{x})/s_x$, where $s_x$ is the standard deviation of the $x_i$’s. Consider fitting the cubic regression function $y = \beta_0^* + \beta_1^* x' + \beta_2^* (x')^2 + \beta_3^* (x')^3$ to the following data resulting from a study of the relation between thrust efficiency y of supersonic propelling rockets and the half-divergence angle x of the rocket nozzle (“More on Correlating Data,” CHEMTECH, 1976: 266–270):

x   5      10     15     20     25     30     35
y   .985   .996   .988   .962   .940   .915   .878

Parameter       Estimate    Estimated SD
$\beta_0^*$     .9671       .0026
$\beta_1^*$
$\beta_2^*$     -.0176      .0023
$\beta_3^*$     .0062       .0031

a. What value of y would you predict when the half-divergence angle is 20? When x = 25?
b. What is the estimated regression function $\hat{\beta}_0 + \hat{\beta}_1 x + \hat{\beta}_2 x^2 + \hat{\beta}_3 x^3$ for the “unstandardized” model?
c. Use a level .05 test to decide whether the cubic term should be deleted from the model.
d. What can you say about the relationship between SSEs and $R^2$’s for the standardized and unstandardized models? Explain.
e. SSE for the cubic model is .00006300, whereas for a quadratic model SSE is .00014367. Compute $R^2$ for each model. Does the difference between the two suggest that the cubic term can be deleted?

34. The following data resulted from an experiment to assess the potential of unburnt colliery spoil as a medium for plant growth. The variables are x = acid extractable cations and y = exchangeable acidity/total cation exchange capacity (“Exchangeable Acidity in Unburnt Colliery Spoil,” Nature, 1969: 161):

y
x
y   .91   .78   .69   .52   .48   .55

Standardizing the independent variable x to obtain $x' = (x - \bar{x})/s_x$ and fitting the regression function $y = \beta_0^* + \beta_1^* x' + \beta_2^* (x')^2$ yielded the accompanying computer output.

Parameter       Estimate    Estimated SD
$\beta_0^*$     .8733       .0421
$\beta_1^*$     -.3255      .0316
$\beta_2^*$     .0448       .0319

a. Estimate $\mu_{Y \cdot 50}$.
b. Compute the value of the coefficient of multiple determination. (See Exercise 28(c).)
c. What is the estimated regression function $\hat{\beta}_0 + \hat{\beta}_1 x + \hat{\beta}_2 x^2$ using the unstandardized variable x?
d. What is the estimated standard deviation of $\hat{\beta}_2$ computed in part (c)?
e. Carry out a test using the standardized estimates to decide whether the quadratic term should be retained in the model. Repeat using the unstandardized estimates. Do your conclusions differ?

35. The article “The Respiration in Air and in Water of the Limpets Patella caerulea and Patella lusitanica” (Comp. Biochemistry and Physiology, 1975: 407–411) proposed a simple power model for the relationship between respiration rate y and temperature x for P. caerulea in air. However, a plot of ln(y) versus x exhibits a curved pattern. Fit the quadratic power model $Y = \alpha e^{\beta x + \gamma x^2} \cdot \epsilon$ to the accompanying data.

y   37.1   70.1   109.7   177.2   222.6
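
One possible route to a model of the form $Y = \alpha e^{\beta x + \gamma x^2} \cdot \epsilon$ is to take logarithms, giving $\ln Y = \ln \alpha + \beta x + \gamma x^2 + \ln \epsilon$, which is an ordinary quadratic regression of ln(y) on x. The Python sketch below illustrates that idea under stated assumptions: the helper names (`fit_quadratic`, `fit_quadratic_power`) are hypothetical, and the data are synthetic, noise-free values generated from made-up parameters. They are not the limpet data.

```python
import math


def fit_quadratic(xs, ys):
    """Least squares fit of y = b0 + b1*x + b2*x^2 via the normal equations."""
    # Center x first (the section's advice) so the 3x3 system is better conditioned.
    xbar = sum(xs) / len(xs)
    u = [x - xbar for x in xs]
    # Power sums for X'X and cross sums for X'y, with columns 1, u, u^2.
    s = [sum(ui ** k for ui in u) for k in range(5)]
    t = [sum((ui ** k) * yi for ui, yi in zip(u, ys)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    c0, c1, c2 = (a[i][3] / a[i][i] for i in range(3))
    # Convert the centered coefficients back to the original x scale.
    return c0 - c1 * xbar + c2 * xbar ** 2, c1 - 2 * c2 * xbar, c2


def fit_quadratic_power(xs, ys):
    """Fit Y = alpha * exp(beta*x + gamma*x^2) by regressing ln(y) on x and x^2."""
    b0, b1, b2 = fit_quadratic(xs, [math.log(y) for y in ys])
    return math.exp(b0), b1, b2  # alpha, beta, gamma


# Synthetic illustration only: made-up parameters and x values, no noise.
alpha, beta, gamma = 20.0, 0.15, -0.002
xs = [10, 15, 20, 25, 30]
ys = [alpha * math.exp(beta * x + gamma * x * x) for x in xs]

a_hat, b_hat, g_hat = fit_quadratic_power(xs, ys)
print("alpha, beta, gamma estimates:", a_hat, b_hat, g_hat)
```

Because the synthetic responses contain no noise, the fit should recover the made-up parameters almost exactly; with real data the estimates minimize squared error on the log scale rather than on the original scale.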
572 Chapter 13 Nonlinear and Multiple Regression