c. Y = α + β·log(x) + ε, so that x′ = log(x) yields a linear model.

d. Y = α + β·(1/x) + ε, so that x′ = 1/x yields a linear model.

  The additive exponential and power models, Y = αe^{βx} + ε and Y = αx^β + ε, are not intrinsically linear. Notice that both (a) and (b) require a transformation on Y and, as a result, a transformation on the error variable ε. In fact, if ε has a lognormal distribution (see Chapter 4) with E(ε) = e^{σ²/2} and V(ε) = τ² independent of x, then the transformed models for both (a) and (b) will satisfy all the assumptions of Chapter 12 regarding the linear probabilistic model; this in turn implies that all inferences for the parameters of the transformed model based on these assumptions will be valid. If σ² is small, μ_{Y·x} ≈ αe^{βx} in (a) or αx^β in (b).
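As a quick numerical check of the lognormal moment fact used here (a simulation sketch, not part of the text): if ln(ε) is normal with mean 0 and variance σ², then E(ε) = e^{σ²/2}, not 1.

```python
import math
import random

random.seed(1)
sigma = 0.5

# Draw lognormal errors: eps = e^Z with Z ~ N(0, sigma^2)
eps = [math.exp(random.gauss(0.0, sigma)) for _ in range(200_000)]

# The sample mean should be close to e^{sigma^2 / 2}, not to 1
mean = sum(eps) / len(eps)
print(mean, math.exp(sigma ** 2 / 2))  # the two values should be close
```

This is why a multiplicative error with median 1 still shifts the mean of Y upward on the original scale.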

  The major advantage of an intrinsically linear model is that the parameters β₀ and β₁ of the transformed model can be immediately estimated using the principle of least squares, simply by substituting x′ and y′ into the estimating formulas:

  β̂₁ = [Σx′ᵢy′ᵢ − (Σx′ᵢ)(Σy′ᵢ)/n] / [Σx′ᵢ² − (Σx′ᵢ)²/n]   β̂₀ = (Σy′ᵢ − β̂₁Σx′ᵢ)/n = ȳ′ − β̂₁x̄′    (13.5)

  Parameters of the original nonlinear model can then be estimated by transforming β̂₀ and/or β̂₁ back if necessary. Once a prediction interval for y′ when x′ = x′* has been calculated, reversing the transformation gives a PI for y itself. In cases (a) and (b), when σ² is small, an approximate CI for μ_{Y·x} results from taking antilogs of the limits in the CI for β₀ + β₁x′ (strictly speaking, taking antilogs gives a CI for the median of the Y distribution, i.e., for μ̃_{Y·x}. Because the lognormal distribution is positively skewed, μ > μ̃; the two are approximately equal if σ² is close to 0.)
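The transform–estimate–back-transform recipe can be sketched in a few lines; the (x, y) values below are invented for illustration (generated exactly from y = 2x^1.5, so the fit should recover the parameters):

```python
import math

# Hypothetical data generated from y = 2 * x^1.5 with no error
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [2.0 * x ** 1.5 for x in xs]

# Transform: x' = ln(x), y' = ln(y) linearizes y = alpha * x^beta
xp = [math.log(x) for x in xs]
yp = [math.log(y) for y in ys]
n = len(xs)

# Least squares estimates of the transformed parameters, as in (13.5)
sx, sy = sum(xp), sum(yp)
sxx = sum(v * v for v in xp)
sxy = sum(u * v for u, v in zip(xp, yp))
b1 = (sxy - sx * sy / n) / (sxx - sx ** 2 / n)  # beta1-hat
b0 = (sy - b1 * sx) / n                         # beta0-hat = ln(alpha-hat)

# Back-transform to the parameters of the original power model
beta_hat = b1
alpha_hat = math.exp(b0)
print(alpha_hat, beta_hat)  # ≈ 2.0 and 1.5, up to rounding
```

With error-free data the transformed fit is exact; with real data the same code gives the least squares line for the transformed points.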

  Example 13.3  Taylor’s equation for tool life y as a function of cutting time x states that xy^c = k or, equivalently, that y = αx^β. The article “The Effect of Experimental Error on the Determination of Optimum Metal Cutting Conditions” (J. of Engr. for Industry, 1967: 315–322) observes that the relationship is not exact (deterministic) and that the parameters α and β must be estimated from data. Thus an appropriate model is the multiplicative power model Y = αx^β·ε, which the author fit to the accompanying data consisting of 12 carbide tool life observations (Table 13.2). In addition to the x, y, x′, and y′ values, the predicted transformed values (ŷ′ᵢ) and the predicted values on the original scale (ŷᵢ, after transforming back) are given.

  The summary statistics for fitting a straight line to the transformed data are Σx′ᵢ = 74.41200, Σy′ᵢ = 26.22601, Σx′ᵢ² = 461.75874, Σy′ᵢ² = 67.74609, and Σx′ᵢy′ᵢ = 160.84601. The estimated values of β and α, the parameters of the power function model, are β̂ = β̂₁ = −5.3996 and α̂ = e^{β̂₀} = 3.094491530 × 10^{15}. Thus the estimated

  CHAPTER 13 Nonlinear and Multiple Regression

  Table 13.2  Data for Example 13.3 (columns: x, y, x′ = ln(x), y′ = ln(y); data values omitted)

  regression function is μ̂_{Y·x} ≈ 3.094491530 × 10^{15} · x^{−5.3996}. To recapture Taylor’s (estimated) equation, set y = 3.094491530 × 10^{15} · x^{−5.3996}, whence xy^{.185} = 740.
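The estimates quoted here follow from the summary sums via the formulas of (13.5); this is just a sketch of the arithmetic, not code from the article:

```python
import math

n = 12
sx, sy = 74.41200, 26.22601       # sum of x'_i, sum of y'_i
sxx, sxy = 461.75874, 160.84601   # sum of x'_i^2, sum of x'_i y'_i

# Slope and intercept on the transformed (log-log) scale
b1 = (sxy - sx * sy / n) / (sxx - sx ** 2 / n)
b0 = (sy - b1 * sx) / n

# Back-transform the intercept to get alpha-hat = e^{beta0-hat}
alpha_hat = math.exp(b0)
print(round(b1, 4))   # -> -5.3996
print(alpha_hat)      # ≈ 3.0945e+15
```

The reported β̂₁ = −5.3996 and α̂ ≈ 3.0945 × 10^{15} are reproduced to the precision of the rounded sums.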

  Figure 13.4(a) gives a plot of the standardized residuals from the linear regression using transformed variables (for which r² = .922); there is no apparent pattern in the plot, though one standardized residual is a bit large, and the residuals look as they should for a simple linear regression. Figure 13.4(b) pictures a plot of ŷ versus y, which indicates satisfactory predictions on the original scale.

  To obtain a confidence interval for median tool life when cutting time is 500, we transform x = 500 to x′ = 6.21461. Then β̂₀ + β̂₁x′ = 2.1120, and a 95% CI for β₀ + β₁(6.21461) is (from Section 12.4) 2.1120 ± (2.228)(.0824) = (1.928, 2.296). The 95% CI for μ̃_{Y·500} is then obtained by taking antilogs: (e^{1.928}, e^{2.296}) = (6.876, 9.930).

  It is easily checked that for the transformed data s² = σ̂² ≈ .081. Because this is quite small, (6.876, 9.930) is an approximate interval for μ_{Y·500}.
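The interval arithmetic can be checked directly; a sketch using the values quoted above (the critical value 2.228 and estimated standard error .0824 come from the Section 12.4 formulas):

```python
import math

est = 2.1120         # beta0-hat + beta1-hat * x' at x' = ln(500)
hw = 2.228 * 0.0824  # t critical value times estimated standard error

# CI on the transformed (log) scale, then antilogs for median tool life
lo, hi = est - hw, est + hw
ci = (math.exp(lo), math.exp(hi))
print(ci)  # ≈ (6.88, 9.93)
```

Tiny discrepancies with the printed (6.876, 9.930) come from rounding the limits before exponentiating.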

  Figure 13.4  (a) Standardized residuals versus x′ from Example 13.3; (b) ŷ versus y from Example 13.3

  ■

  Example 13.4  In the article “Ethylene Synthesis in Lettuce Seeds: Its Physiological Significance” (Plant Physiology, 1972: 719–722), the ethylene content of lettuce seeds (y, in nL/g dry wt) was studied as a function of exposure time (x, in min) to

  13.2 Regression with Transformed Variables

  an ethylene absorbent. Figure 13.5 presents both a scatter plot of the data and a plot of the residuals generated from a linear regression of y on x. Both plots show a strong curved pattern, suggesting that a transformation to achieve linearity is appropriate. In addition, a linear regression gives negative predictions for x = 90 and x = 100.

  Figure 13.5 (a) Scatter plot; (b) residual plot from linear regression for the data in Example 13.4

  The author did not give any argument for a theoretical model, but his plot of y′ = ln(y) versus x shows a strong linear relationship, suggesting that an exponential function will provide a good fit to the data. Table 13.3 shows the data values and other information from a linear regression of y′ on x. The estimates of the parameters of the linear model are β̂₁ = −.0323 and β̂₀ = 5.941, with r² = .995. The estimated regression function for the exponential model is

  μ̂_{Y·x} ≈ e^{β̂₀} · e^{β̂₁x} = 380.3e^{−.0323x}

  The predicted values ŷᵢ can be obtained either by substitution of xᵢ (i = 1, …, n) into μ̂_{Y·x} or else by computing ŷᵢ = e^{ŷ′ᵢ}, where the ŷ′ᵢ’s are the predictions from the transformed straight-line model. Figure 13.6 presents both a plot of e* versus x (the standardized residuals from the linear regression on the transformed scale) and a plot of ŷ versus y. These plots support the choice of an exponential model.
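The two routes to ŷ are algebraically identical, since e^{β̂₀}·e^{β̂₁x} = e^{β̂₀ + β̂₁x}; a quick check with the fitted coefficients (x = 50 is an arbitrary illustrative exposure time, not a data value from the article):

```python
import math

b0, b1 = 5.941, -0.0323  # fitted intercept and slope on the log scale
x = 50.0                 # arbitrary exposure time for illustration

y_from_mu = math.exp(b0) * math.exp(b1 * x)  # substitute x into mu-hat
y_from_pred = math.exp(b0 + b1 * x)          # exponentiate y-hat' directly
print(y_from_mu, y_from_pred)  # ≈ 75.6 either way
```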

  Table 13.3  Data for Example 13.4 (columns: x, y, y′ = ln(y), ŷ′, ŷ = e^{ŷ′}; data values omitted)


  Figure 13.6  Plot of (a) standardized residuals (after transforming) versus x; (b) ŷ versus y for the data in Example 13.4

  ■

  In analyzing transformed data, one should keep in mind the following points:

  1. Estimating β₁ and β₀ as in (13.5) and then transforming back to obtain estimates of the original parameters is not equivalent to using the principle of least squares directly on the original model. Thus, for the exponential model, we could estimate α and β by minimizing Σ(yᵢ − αe^{βxᵢ})². Iterative computation would be necessary. In general, α̂ ≠ e^{β̂₀} and β̂ ≠ β̂₁.

  2. If the chosen model is not intrinsically linear, the approach summarized in (13.5) cannot be used. Instead, least squares (or some other fitting procedure) would have to be applied to the untransformed model. Thus, for the additive exponential model Y = αe^{βx} + ε, least squares would involve minimizing Σ(yᵢ − αe^{βxᵢ})². Taking partial derivatives with respect to α and β results in two nonlinear normal equations in α and β; these equations must then be solved using an iterative procedure.
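One standard iterative procedure for minimizing Σ(yᵢ − αe^{βxᵢ})² is Gauss–Newton; the sketch below uses a damped step and synthetic data generated from known α and β purely to illustrate the iteration (it is not the method or data of any example here):

```python
import math

# Synthetic data from y = 5 * exp(0.3 x) with no error, so the
# iteration should recover alpha = 5, beta = 0.3
xs = [float(i) for i in range(10)]
ys = [5.0 * math.exp(0.3 * x) for x in xs]

def sse(a, b):
    return sum((y - a * math.exp(b * x)) ** 2 for x, y in zip(xs, ys))

a, b = 4.0, 0.25  # rough starting values
for _ in range(100):
    # Jacobian columns of f = a*e^{bx}: df/da = e^{bx}, df/db = a*x*e^{bx}
    J = [(math.exp(b * x), a * x * math.exp(b * x)) for x in xs]
    r = [y - a * math.exp(b * x) for x, y in zip(xs, ys)]
    # Normal equations (J^T J) delta = J^T r, solved as a 2x2 system
    g11 = sum(j1 * j1 for j1, _ in J)
    g12 = sum(j1 * j2 for j1, j2 in J)
    g22 = sum(j2 * j2 for _, j2 in J)
    c1 = sum(j1 * ri for (j1, _), ri in zip(J, r))
    c2 = sum(j2 * ri for (_, j2), ri in zip(J, r))
    det = g11 * g22 - g12 * g12
    da = (g22 * c1 - g12 * c2) / det
    db = (g11 * c2 - g12 * c1) / det
    step = 1.0  # damping: halve the step until SSE does not increase
    while sse(a + step * da, b + step * db) > sse(a, b) and step > 1e-8:
        step /= 2
    a, b = a + step * da, b + step * db

print(a, b)  # ≈ 5.0 and 0.3
```

Each pass linearizes the model around the current (α, β) and solves the resulting two normal equations, which is exactly the "iterative procedure" the nonlinear normal equations demand.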

  3. When the transformed linear model satisfies all the assumptions listed in Chapter 12, the method of least squares yields best estimates of the transformed parameters. However, estimates of the original parameters may not be best in any sense, though they will be reasonable. For example, in the exponential model, the estimator α̂ = e^{β̂₀} will not be unbiased, though it will be the maximum likelihood estimator of α if the transformed error variable ε′ = ln(ε) is normally distributed. Using least squares directly (without transforming) could yield better estimates.

  4. If a transformation on y has been made and one wishes to use the standard formulas to test hypotheses or construct CIs, the error variable of the transformed model should be at least approximately normally distributed. To check this, the residuals from the transformed regression should be examined.

  5. When y is transformed, the r² value from the resulting regression refers to variation in the y′ᵢ’s explained by the transformed regression model. Although a high value of r² here indicates a good fit of the estimated original nonlinear model to the observed yᵢ’s, r² does not refer to these original observations. Perhaps the best way to assess the quality of the fit is to compute the predicted values ŷ′ᵢ using the transformed model, transform them back to the original y scale to obtain ŷᵢ, and then plot ŷ versus y. A good fit is then evidenced by points close to the 45° line.

  One could compute SSE = Σ(yᵢ − ŷᵢ)² as a numerical measure of the goodness of fit. When the model was linear, we compared this to SST = Σ(yᵢ − ȳ)², the total variation about the horizontal line at height ȳ; this led to r². In the nonlinear case, though, it is not necessarily informative to measure total variation in this way, so an r² value is not as useful as in the linear case.
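The back-transform-and-compare check in point 5 amounts to a few lines; the y values and log-scale predictions ŷ′ below are invented for illustration:

```python
import math

# Hypothetical observations and predictions from a log-scale fit
ys = [10.0, 22.0, 51.0, 98.0]
yhat_prime = [math.log(11.0), math.log(20.0),
              math.log(52.0), math.log(100.0)]

# Transform predictions back to the original scale: y-hat = e^{y-hat'}
yhat = [math.exp(v) for v in yhat_prime]

# Numerical goodness-of-fit measure on the original y scale
sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhat))
print(yhat)  # ≈ [11.0, 20.0, 52.0, 100.0]
print(sse)   # ≈ 1 + 4 + 1 + 4 = 10
```

Plotting the yhat values against ys (points near the 45° line) gives the graphical version of the same check.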
