Regression with Transformed Variables
13.2 Regression with Transformed Variables
The necessity for an alternative to the linear model $Y = \beta_0 + \beta_1 x + \epsilon$ may be suggested either by a theoretical argument or else by examining diagnostic plots from a linear regression analysis. In either case, settling on a model whose parameters can be easily estimated is desirable. An important class of such models is specified by means of functions that are “intrinsically linear.”
DEFINITION
A function relating y to x is intrinsically linear if, by means of a transformation on x and/or y, the function can be expressed as $y' = \beta_0 + \beta_1 x'$, where $x'$ = the transformed independent variable and $y'$ = the transformed dependent variable.
Four of the most useful intrinsically linear functions are given in Table 13.1. In each case, the appropriate transformation is either a log transformation—either base 10 or natural logarithm (base e)—or a reciprocal transformation. Representative graphs of the four functions appear in Figure 13.3.
For an exponential function relationship, only y is transformed to achieve linearity, whereas for a power function relationship, both x and y are transformed. Because the variable x is in the exponent in an exponential relationship, y increases (if $\beta > 0$) or decreases (if $\beta < 0$) much more rapidly as x increases than is the case for the power function, though over a short interval of x values it can be difficult to differentiate between the two functions. Examples of functions that are not intrinsically linear are $y = \alpha + \gamma e^{\beta x}$ and $y = \alpha + \gamma x^{\beta}$.

Table 13.1 Useful Intrinsically Linear Functions

| Function | Transformation(s) to Linearize | Linear Form |
| --- | --- | --- |
| a. Exponential: $y = \alpha e^{\beta x}$ | $y' = \ln(y)$ | $y' = \ln(\alpha) + \beta x$ |
| b. Power: $y = \alpha x^{\beta}$ | $y' = \log(y)$, $x' = \log(x)$ | $y' = \log(\alpha) + \beta x'$ |
| c. $y = \alpha + \beta \cdot \log(x)$ | $x' = \log(x)$ | $y = \alpha + \beta x'$ |
| d. Reciprocal: $y = \alpha + \beta \cdot \frac{1}{x}$ | $x' = \frac{1}{x}$ | $y = \alpha + \beta x'$ |

When $\log(\cdot)$ appears, either a base 10 or a base e logarithm can be used.
Figure 13.3 Graphs of the intrinsically linear functions given in Table 13.1
Intrinsically linear functions lead directly to probabilistic models that, though not linear in x as a function, have parameters whose values are easily estimated using ordinary least squares.
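As a rough illustration of how the four transformations of Table 13.1 reduce each function to a straight line, the following sketch generates exact (error-free) values from each function and recovers the parameters by ordinary least squares on the transformed variables. The parameter values, the x grid, and the helper name `ols` are assumptions made for illustration only; natural logarithms are used throughout.

```python
import math

def ols(xs, ys):
    """Least squares (intercept, slope) for a straight-line fit of ys on xs."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    slope = (sxy - sx * sy / n) / (sxx - sx ** 2 / n)
    intercept = (sy - slope * sx) / n
    return intercept, slope

# Hypothetical parameter values and x grid, chosen only for illustration.
alpha, beta = 2.0, 0.5
xs = [1.0, 2.0, 3.0, 4.0, 5.0]

# (a) Exponential: regress ln(y) on x; intercept = ln(alpha), slope = beta.
b0, b1 = ols(xs, [math.log(alpha * math.exp(beta * x)) for x in xs])
exp_fit = (math.exp(b0), b1)

# (b) Power: regress ln(y) on ln(x); intercept = ln(alpha), slope = beta.
b0, b1 = ols([math.log(x) for x in xs],
             [math.log(alpha * x ** beta) for x in xs])
power_fit = (math.exp(b0), b1)

# (c) Logarithmic: regress y on ln(x); intercept = alpha, slope = beta.
log_fit = ols([math.log(x) for x in xs],
              [alpha + beta * math.log(x) for x in xs])

# (d) Reciprocal: regress y on 1/x; intercept = alpha, slope = beta.
recip_fit = ols([1 / x for x in xs], [alpha + beta / x for x in xs])

for name, fit in [("exponential", exp_fit), ("power", power_fit),
                  ("logarithmic", log_fit), ("reciprocal", recip_fit)]:
    print(name, round(fit[0], 6), round(fit[1], 6))
```

Because the generated values contain no error, each fit should return the assumed $\alpha$ and $\beta$ up to floating-point rounding; with real data the same steps give estimates rather than exact values.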
DEFINITION
A probabilistic model relating Y to x is intrinsically linear if, by means of a transformation on Y and/or x, it can be reduced to a linear probabilistic model $Y' = \beta_0 + \beta_1 x' + \epsilon'$.
The intrinsically linear probabilistic models that correspond to the four functions of Table 13.1 are as follows:
a. $Y = \alpha e^{\beta x} \cdot \epsilon$, a multiplicative exponential model, from which $\ln(Y) = Y' = \beta_0 + \beta_1 x' + \epsilon'$ with $x' = x$, $\beta_0 = \ln(\alpha)$, $\beta_1 = \beta$, and $\epsilon' = \ln(\epsilon)$.
b. $Y = \alpha x^{\beta} \cdot \epsilon$, a multiplicative power model, so that $\log(Y) = Y' = \beta_0 + \beta_1 x' + \epsilon'$ with $x' = \log(x)$, $\beta_0 = \log(\alpha)$, $\beta_1 = \beta$, and $\epsilon' = \log(\epsilon)$.
c. $Y = \alpha + \beta \log(x) + \epsilon$, so that $x' = \log(x)$ immediately linearizes the model.
d. $Y = \alpha + \beta \cdot \frac{1}{x} + \epsilon$, so that $x' = 1/x$ yields a linear model.
The additive exponential and power models, $Y = \alpha e^{\beta x} + \epsilon$ and $Y = \alpha x^{\beta} + \epsilon$, are not intrinsically linear. Notice that both (a) and (b) require a transformation on Y and, as a result, a transformation on the error variable $\epsilon$. In fact, if $\epsilon$ has a lognormal distribution (see Chapter 4) with $E(\epsilon) = e^{\sigma^2/2}$ and $V(\epsilon) = \tau^2$ independent of x, then the transformed models for both (a) and (b) will satisfy all the assumptions of Chapter 12 regarding the linear probabilistic model; this in turn implies that all inferences for the parameters of the transformed model based on these assumptions will be valid. If $\sigma^2$ is small, $\mu_{Y \cdot x} \approx \alpha e^{\beta x}$ in (a) or $\alpha x^{\beta}$ in (b).
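The small-$\sigma^2$ approximation can be seen from a short calculation for model (a). This is a sketch under the assumption that $E(\epsilon) = e^{\sigma^2/2}$ arises from $\ln(\epsilon)$ being normal with mean 0 and variance $\sigma^2$, so that the median of $\epsilon$ is 1; the notation $\tilde{\mu}_{Y \cdot x}$ for the median of Y at x is introduced here for convenience.

```latex
\begin{align*}
\mu_{Y \cdot x} &= E\!\left(\alpha e^{\beta x} \cdot \epsilon\right)
                 = \alpha e^{\beta x}\, E(\epsilon)
                 = \alpha e^{\beta x}\, e^{\sigma^2/2}, \\
\tilde{\mu}_{Y \cdot x} &= \alpha e^{\beta x} \cdot \operatorname{median}(\epsilon)
                 = \alpha e^{\beta x}, \\
e^{\sigma^2/2} &= 1 + \frac{\sigma^2}{2} + \cdots \approx 1
                 \quad \text{when } \sigma^2 \text{ is small.}
\end{align*}
```

The same argument applies to model (b) with $\alpha x^{\beta}$ in place of $\alpha e^{\beta x}$.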
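The major advantage of an intrinsically linear model is that the parameters $\beta_0$ and $\beta_1$ of the transformed model can be immediately estimated using the principle of least squares simply by substituting $x'$ and $y'$ into the estimating formulas:

$$\hat{\beta}_1 = \frac{\sum x_i' y_i' - \left(\sum x_i'\right)\left(\sum y_i'\right)/n}{\sum (x_i')^2 - \left(\sum x_i'\right)^2/n}, \qquad \hat{\beta}_0 = \frac{\sum y_i' - \hat{\beta}_1 \sum x_i'}{n}$$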
Parameters of the original nonlinear model can then be estimated by transforming back $\hat{\beta}_0$ and/or $\hat{\beta}_1$ if necessary. Once a prediction interval for $y'$ when $x' = x'^{*}$ has been calculated, reversing the transformation gives a PI for y itself. In cases (a) and (b), when $\sigma^2$ is small, an approximate CI for $\mu_{Y \cdot x^*}$ results from taking antilogs of the limits in the CI for $\beta_0 + \beta_1 x'^{*}$. (Strictly speaking, taking antilogs gives a CI for the median of the Y distribution, i.e., for $\tilde{\mu}_{Y \cdot x^*}$. Because the lognormal distribution is positively skewed, $\mu > \tilde{\mu}$; the two are approximately equal if $\sigma^2$ is close to 0.)
Example 13.3 Taylor’s equation for tool life y as a function of cutting time x states that $xy^{c} = k$ or, equivalently, that $y = \alpha x^{\beta}$ (see the Wikipedia entry on Tool wear for more information). The article “The Effect of Experimental Error on the Determination of Optimum Metal Cutting Conditions” (J. of Engr. for Industry, 1967: 315–322) observes that the relationship is not exact (deterministic) and that the parameters $\alpha$ and $\beta$ must be estimated from data. Thus an appropriate model is the multiplicative power model $Y = \alpha \cdot x^{\beta} \cdot \epsilon$, which the author fit to the accompanying data consisting of 12 carbide tool life observations (Table 13.2). In addition to the x, y, $x'$, and $y'$ values, the predicted transformed values ($\hat{y}'$) and the predicted values on the original scale ($\hat{y}$, after transforming back) are given.
The summary statistics for fitting a straight line to the transformed data are $\sum x_i' = 74.41200$, $\sum y_i' = 26.22601$, $\sum x_i'^2 = 461.75874$, $\sum y_i'^2 = 67.74609$, and $\sum x_i' y_i' = 160.84601$, so

$$\hat{\beta}_1 = \frac{160.84601 - (74.41200)(26.22601)/12}{461.75874 - (74.41200)^2/12} = -5.3996$$

$$\hat{\beta}_0 = \frac{26.22601 - (-5.3996)(74.41200)}{12} = 35.6684$$
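The arithmetic above can be reproduced directly from the quoted summary statistics. The following is a minimal sketch; the variable names are illustrative and not taken from the source.

```python
# Summary statistics for the transformed data, as quoted in the text.
n = 12
sum_x = 74.41200      # sum of x'_i
sum_y = 26.22601      # sum of y'_i
sum_xx = 461.75874    # sum of x'_i squared
sum_xy = 160.84601    # sum of x'_i * y'_i

# Corrected sums of cross-products and squares.
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_xx - sum_x ** 2 / n

# Least squares slope and intercept for the transformed (log-log) model.
beta1_hat = s_xy / s_xx
beta0_hat = (sum_y - beta1_hat * sum_x) / n

print(round(beta1_hat, 4), round(beta0_hat, 4))
```

Note that `s_xx` is a small difference of two large numbers, so the slope is sensitive to rounding in the summary statistics; carrying all five decimal places matters here.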
Table 13.2 Data for Example 13.3 (columns: x, y, $x' = \ln(x)$, $y' = \ln(y)$)

The estimated values of $\alpha$ and $\beta$, the parameters of the power function model, are $\hat{\beta} = \hat{\beta}_1 = -5.3996$ and $\hat{\alpha} = e^{\hat{\beta}_0} = 3.094491530 \cdot 10^{15}$. Thus the estimated regression function is $\hat{\mu}_{Y \cdot x} \approx 3.094491530 \cdot 10^{15} \cdot x^{-5.3996}$. To recapture Taylor’s (estimated) equation, set $y = 3.094491530 \cdot 10^{15} \cdot x^{-5.3996}$, whence $xy^{.185} = 740$.
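The step from the fitted power function back to Taylor’s form follows from raising both sides of $y = \alpha x^{\beta}$ to the power $-1/\beta$, which gives $x y^{-1/\beta} = \alpha^{-1/\beta}$, so that $c = -1/\beta$ and $k = \alpha^{c}$. A minimal numerical check, using the rounded estimates quoted in the text (variable names are illustrative):

```python
import math

# Rounded least squares estimates from the transformed fit, as quoted in the text.
beta0_hat = 35.6684
beta1_hat = -5.3996

# Back-transform the intercept to the multiplicative constant of the power model.
alpha_hat = math.exp(beta0_hat)

# Taylor's form x * y**c = k, obtained from y = alpha * x**beta.
c = -1 / beta1_hat
k = alpha_hat ** c

print(f"alpha_hat = {alpha_hat:.4e}, c = {c:.3f}, k = {k:.1f}")
```

With these rounded inputs the computed k comes out slightly below 740, which is consistent with the rounded value given in the text.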
Figure 13.4(a) gives a plot of the standardized residuals from the linear regression using transformed variables (for which $r^2 = .922$); there is no apparent pattern in the plot, though one standardized residual is a bit large, and the residuals look as they should for a simple linear regression. Figure 13.4(b) pictures a plot of $\hat{y}$ versus y, which indicates satisfactory predictions on the original scale.
To obtain a confidence interval for median tool life when cutting time is 500, we transform $x = 500$ to $x' = 6.21461$. Then $\hat{\beta}_0 + \hat{\beta}_1 x' = 2.1120$, and a 95% CI for $\beta_0 + \beta_1(6.21461)$ is (from Section 12.4) $2.1120 \pm (2.228)(.0824) = (1.928, 2.296)$. The 95% CI for $\tilde{\mu}_{Y \cdot 500}$ is then obtained by taking antilogs: $(e^{1.928}, e^{2.296}) = (6.876, 9.930)$. It is easily checked that for the transformed data $s^2 = \hat{\sigma}^2 \approx .081$. Because this is quite small, (6.876, 9.930) is an approximate interval for $\mu_{Y \cdot 500}$.
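The interval calculation can be sketched as follows. The t critical value 2.228 and the estimated standard deviation .0824 of the fitted value are taken from the text as given; they are not recomputed here because the raw data are not reproduced. Variable names are illustrative.

```python
import math

# Rounded estimates and interval ingredients, as quoted in the text.
beta0_hat, beta1_hat = 35.6684, -5.3996
t_crit = 2.228     # t critical value used in the text (10 df, 95% two-sided)
se_fit = 0.0824    # estimated standard deviation of the fitted value at x' = ln(500)

x_star = 500
x_prime = math.log(x_star)               # transformed cutting time

# Point estimate and 95% CI on the transformed (log) scale.
fit = beta0_hat + beta1_hat * x_prime
half_width = t_crit * se_fit
ci_log = (fit - half_width, fit + half_width)

# Taking antilogs gives a CI for median tool life at x = 500.
ci_median = (math.exp(ci_log[0]), math.exp(ci_log[1]))

print([round(v, 3) for v in ci_log], [round(v, 3) for v in ci_median])
```

Because the limits are exponentiated without first rounding them to three decimals, the lower endpoint may differ from the text’s 6.876 in the third decimal place.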
Figure 13.4 (a) Standardized residuals versus $x'$ from Example 13.3; (b) $\hat{y}$ versus y from Example 13.3
The accompanying data on x = length of a scamp (mm) and y = mercury content