Regression with Transformed Variables

13.2 Regression with Transformed Variables

  The necessity for an alternative to the linear model $Y = \beta_0 + \beta_1 x + \epsilon$ may be suggested either by a theoretical argument or by examining diagnostic plots from a linear regression analysis. In either case, settling on a model whose parameters can be easily estimated is desirable. An important class of such models is specified by means of functions that are “intrinsically linear.”

  DEFINITION A function relating $y$ to $x$ is intrinsically linear if, by means of a transformation on $x$ and/or $y$, the function can be expressed as $y' = \beta_0 + \beta_1 x'$, where $x'$ = the transformed independent variable and $y'$ = the transformed dependent variable.

  Four of the most useful intrinsically linear functions are given in Table 13.1. In each case, the appropriate transformation is either a log transformation—either base 10 or natural logarithm (base e)—or a reciprocal transformation. Representative graphs of the four functions appear in Figure 13.3.

  For an exponential function relationship, only $y$ is transformed to achieve linearity, whereas for a power function relationship, both $x$ and $y$ are transformed. Because the variable $x$ is in the exponent in an exponential relationship, $y$ increases (if $\beta > 0$) or decreases (if $\beta < 0$) much more rapidly as $x$ increases than is the case for the power function, though over a short interval of $x$ values it can be difficult to differentiate between the two functions. Examples of functions that are not intrinsically linear are $y = \alpha + \gamma e^{\beta x}$ and $y = \alpha + \gamma x^{\beta}$.

  Table 13.1 Useful Intrinsically Linear Functions

  Function | Transformation(s) to Linearize | Linear Form
  ---------------------------------------------------------------
  a. Exponential: $y = \alpha e^{\beta x}$ | $y' = \ln(y)$ | $y' = \ln(\alpha) + \beta x$
  b. Power: $y = \alpha x^{\beta}$ | $y' = \log(y)$, $x' = \log(x)$ | $y' = \log(\alpha) + \beta x'$
  c. $y = \alpha + \beta \cdot \log(x)$ | $x' = \log(x)$ | $y = \alpha + \beta x'$
  d. Reciprocal: $y = \alpha + \beta \cdot \frac{1}{x}$ | $x' = \frac{1}{x}$ | $y = \alpha + \beta x'$

  When $\log(\cdot)$ appears, either a base 10 or a base $e$ logarithm can be used.

  Figure 13.3 Graphs of the intrinsically linear functions given in Table 13.1
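  The transformations in Table 13.1 can be checked numerically. The following is a minimal sketch, not taken from the text: the helper `fit_line` and the parameter values `alpha = 2.0`, `beta = 0.5` are assumptions chosen for illustration. It generates exact (noise-free) values from each of the four functions, applies the listed transformation, and fits a straight line to confirm that the original parameters are recovered.

```python
import math

def fit_line(xs, ys):
    """Least squares straight line; returns (intercept, slope)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    slope = (sxy - sx * sy / n) / (sxx - sx * sx / n)
    intercept = (sy - slope * sx) / n
    return intercept, slope

# Hypothetical parameter values, chosen only for illustration.
alpha, beta = 2.0, 0.5
xs = [1.0, 2.0, 3.0, 4.0, 5.0]

# (a) Exponential: regress ln(y) on x; intercept is ln(alpha).
ys = [alpha * math.exp(beta * x) for x in xs]
b0, b1 = fit_line(xs, [math.log(y) for y in ys])
exp_params = (math.exp(b0), b1)

# (b) Power: regress ln(y) on ln(x); intercept is ln(alpha).
ys = [alpha * x ** beta for x in xs]
b0, b1 = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
power_params = (math.exp(b0), b1)

# (c) Logarithmic: regress y on ln(x); no back-transformation needed.
ys = [alpha + beta * math.log(x) for x in xs]
log_params = fit_line([math.log(x) for x in xs], ys)

# (d) Reciprocal: regress y on 1/x; no back-transformation needed.
ys = [alpha + beta / x for x in xs]
recip_params = fit_line([1.0 / x for x in xs], ys)

for name, params in [("exponential", exp_params), ("power", power_params),
                     ("log", log_params), ("reciprocal", recip_params)]:
    print(name, params)
```

  Natural logarithms are used throughout the sketch; base 10 logarithms would work equally well provided the intercept is transformed back with the matching base.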

  Intrinsically linear functions lead directly to probabilistic models that, though not linear in x as a function, have parameters whose values are easily estimated using ordinary least squares.

  DEFINITION

  A probabilistic model relating $Y$ to $x$ is intrinsically linear if, by means of a transformation on $Y$ and/or $x$, it can be reduced to a linear probabilistic model $Y' = \beta_0 + \beta_1 x' + \epsilon'$.

  The intrinsically linear probabilistic models that correspond to the four functions of Table 13.1 are as follows:

  a. $Y = \alpha e^{\beta x} \cdot \epsilon$, a multiplicative exponential model, from which $\ln(Y) = Y' = \beta_0 + \beta_1 x' + \epsilon'$ with $x' = x$, $\beta_0 = \ln(\alpha)$, $\beta_1 = \beta$, and $\epsilon' = \ln(\epsilon)$.

  b. $Y = \alpha x^{\beta} \cdot \epsilon$, a multiplicative power model, so that $\log(Y) = Y' = \beta_0 + \beta_1 x' + \epsilon'$ with $x' = \log(x)$, $\beta_0 = \log(\alpha)$, $\beta_1 = \beta$, and $\epsilon' = \log(\epsilon)$.

  c. $Y = \alpha + \beta \log(x) + \epsilon$, so that $x' = \log(x)$ immediately linearizes the model.

  d. $Y = \alpha + \beta \cdot \frac{1}{x} + \epsilon$, so that $x' = 1/x$ yields a linear model.

  The additive exponential and power models, $Y = \alpha e^{\beta x} + \epsilon$ and $Y = \alpha x^{\beta} + \epsilon$, are not intrinsically linear. Notice that both (a) and (b) require a transformation on $Y$ and, as a result, a transformation on the error variable $\epsilon$. In fact, if $\epsilon$ has a lognormal distribution (see Chapter 4) with $E(\epsilon) = e^{\sigma^2/2}$ and $V(\epsilon) = \tau^2$ independent of $x$, then the transformed models for both (a) and (b) will satisfy all the assumptions of Chapter 12 regarding the linear probabilistic model; this in turn implies that all inferences for the parameters of the transformed model based on these assumptions will be valid. If $\sigma^2$ is small, $\mu_{Y \cdot x} \approx \alpha e^{\beta x}$ in (a) or $\alpha x^{\beta}$ in (b).
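  To illustrate why the multiplicative form matters, here is a small simulation sketch. Everything in it is an assumption made for illustration (the parameter values, the sample size, the seed, and the helper `fit_line`); none of it comes from the text. It draws data from the multiplicative exponential model (a) with a lognormal error, so that $\ln(\epsilon)$ is normal with mean 0 and standard deviation `sigma`, and then estimates the parameters by least squares on the transformed scale.

```python
import math
import random

def fit_line(xs, ys):
    """Least squares straight line; returns (intercept, slope)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    slope = (sxy - sx * sy / n) / (sxx - sx * sx / n)
    intercept = (sy - slope * sx) / n
    return intercept, slope

random.seed(0)  # arbitrary seed, for reproducibility only

# Hypothetical true values for the simulation.
alpha, beta, sigma = 3.0, 0.4, 0.1
n = 200
xs = [4.0 * i / (n - 1) for i in range(n)]

# Multiplicative model: Y = alpha * exp(beta * x) * eps, with eps lognormal.
ys = [alpha * math.exp(beta * x) * random.lognormvariate(0.0, sigma)
      for x in xs]

# Taking logs gives ln(Y) = ln(alpha) + beta * x + ln(eps), a linear model
# with normal errors, so ordinary least squares applies directly.
b0, b1 = fit_line(xs, [math.log(y) for y in ys])
alpha_hat, beta_hat = math.exp(b0), b1

print("alpha_hat =", alpha_hat, "beta_hat =", beta_hat)
```

  With a small `sigma`, the back-transformed curve `alpha_hat * exp(beta_hat * x)` should lie close to the true mean function, in line with the approximation stated above.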

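  The major advantage of an intrinsically linear model is that the parameters $\beta_0$ and $\beta_1$ of the transformed model can be immediately estimated using the principle of least squares simply by substituting $x'$ and $y'$ into the estimating formulas:

  $$\hat{\beta}_1 = \frac{\sum x_i' y_i' - \left(\sum x_i'\right)\left(\sum y_i'\right)/n}{\sum (x_i')^2 - \left(\sum x_i'\right)^2/n}, \qquad \hat{\beta}_0 = \frac{\sum y_i' - \hat{\beta}_1 \sum x_i'}{n}$$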

  Parameters of the original nonlinear model can then be estimated by transforming back $\hat{\beta}_0$ and/or $\hat{\beta}_1$ if necessary. Once a prediction interval for $y'$ at a specified value $x' = x'^{*}$ has been calculated, reversing the transformation gives a PI for $y$ itself. In cases (a) and (b), when $\sigma^2$ is small, an approximate CI for $\mu_{Y \cdot x}$ results from taking antilogs of the limits in the CI for $\beta_0 + \beta_1 x'$. (Strictly speaking, taking antilogs gives a CI for the median of the $Y$ distribution, i.e., for $\tilde{\mu}_{Y \cdot x}$. Because the lognormal distribution is positively skewed, $\mu > \tilde{\mu}$; the two are approximately equal if $\sigma^2$ is close to 0.)

  Example 13.3 Taylor’s equation for tool life $y$ as a function of cutting time $x$ states that $x y^{c} = k$ or, equivalently, that $y = \alpha x^{\beta}$ (see the Wikipedia entry on Tool wear for more information). The article “The Effect of Experimental Error on the Determination of Optimum Metal Cutting Conditions” (J. of Engr. for Industry, 1967: 315–322) observes that the relationship is not exact (deterministic) and that the parameters $\alpha$ and $\beta$ must be estimated from data. Thus an appropriate model is the multiplicative power model $Y = \alpha \cdot x^{\beta} \cdot \epsilon$, which the author fit to the accompanying data consisting of 12 carbide tool life observations (Table 13.2). In addition to the $x$, $y$, $x'$, and $y'$ values, the predicted transformed values ($\hat{y}'$) and the predicted values on the original scale ($\hat{y}$, after transforming back) are given.

  The summary statistics for fitting a straight line to the transformed data are $\sum x_i' = 74.41200$, $\sum y_i' = 26.22601$, $\sum (x_i')^2 = 461.75874$, $\sum (y_i')^2 = 67.74609$, and $\sum x_i' y_i' = 160.84601$, so

  $$\hat{\beta}_1 = \frac{160.84601 - (74.41200)(26.22601)/12}{461.75874 - (74.41200)^2/12} = -5.3996$$

  $$\hat{\beta}_0 = \frac{26.22601 - (-5.3996)(74.41200)}{12} = 35.6684$$
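  The arithmetic above can be reproduced directly from the five summary statistics. The short sketch below is an illustration rather than part of the original example; the variable names are assumptions, while the numerical inputs are the sums quoted in the text.

```python
# Summary statistics for the transformed data (x' = ln x, y' = ln y),
# as quoted in Example 13.3.
n = 12
sum_x = 74.41200     # sum of x'
sum_y = 26.22601     # sum of y'
sum_xx = 461.75874   # sum of x'^2
sum_xy = 160.84601   # sum of x' * y'

# Least squares slope and intercept on the transformed scale.
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_xx - sum_x ** 2 / n
b1 = s_xy / s_xx
b0 = (sum_y - b1 * sum_x) / n

print(round(b1, 4), round(b0, 4))
```

  Because the denominator `s_xx` is a small difference of two large numbers, the sums need to be carried to the five decimal places given in the text; rounding them further would change the estimates noticeably.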

  The estimated values of $\alpha$ and $\beta$, the parameters of the power function model, are $\hat{\beta} = \hat{\beta}_1 = -5.3996$ and $\hat{\alpha} = e^{\hat{\beta}_0} = 3.094491530 \cdot 10^{15}$. Thus the estimated regression function is $\hat{\mu}_{Y \cdot x} \approx 3.094491530 \cdot 10^{15} \cdot x^{-5.3996}$. To recapture Taylor’s (estimated) equation, set $y = 3.094491530 \cdot 10^{15} \cdot x^{-5.3996}$, whence $x y^{.185} = 740$.

  Table 13.2 Data for Example 13.3

  Columns: $x$, $y$, $x' = \ln(x)$, $y' = \ln(y)$
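  The step from the fitted power function back to Taylor’s form can be made explicit. Solving $y = \alpha x^{\beta}$ for $x$ gives $x y^{-1/\beta} = \alpha^{-1/\beta}$, so $c = -1/\beta$ and $k = \alpha^{-1/\beta}$. The sketch below applies this to the estimates from the example; the variable names are assumptions introduced for illustration.

```python
import math

# Estimates on the transformed scale, as reported in Example 13.3.
b0 = 35.6684   # estimate of ln(alpha)
b1 = -5.3996   # estimate of beta

# Back-transform the intercept to get the power-function coefficient.
alpha_hat = math.exp(b0)

# Taylor's form x * y**c = k follows from y = alpha * x**beta:
# c = -1/beta and k = alpha**(-1/beta) = exp(-b0/b1).
c_hat = -1.0 / b1
k_hat = math.exp(-b0 / b1)

print(alpha_hat, c_hat, k_hat)
```

  Computing `k_hat` as `exp(-b0 / b1)` rather than `alpha_hat ** c_hat` avoids raising a number of order $10^{15}$ to a fractional power, although both routes agree to the precision reported in the text.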

  Figure 13.4(a) gives a plot of the standardized residuals from the linear regression using transformed variables (for which $r^2 = .922$); there is no apparent pattern in the plot, though one standardized residual is a bit large, and the residuals look as they should for a simple linear regression. Figure 13.4(b) pictures a plot of $\hat{y}$ versus $y$, which indicates satisfactory predictions on the original scale.

  To obtain a confidence interval for median tool life when cutting time is 500, we transform $x = 500$ to $x' = 6.21461$. Then $\hat{\beta}_0 + \hat{\beta}_1 x' = 2.1120$, and a 95% CI for $\beta_0 + \beta_1(6.21461)$ is (from Section 12.4) $2.1120 \pm (2.228)(.0824) = (1.928, 2.296)$. The 95% CI for $\tilde{\mu}_{Y \cdot 500}$ is then obtained by taking antilogs: $(e^{1.928}, e^{2.296}) = (6.876, 9.930)$. It is easily checked that for the transformed data $s^2 = \hat{\sigma}^2 \approx .081$. Because this is quite small, (6.876, 9.930) is an approximate interval for $\mu_{Y \cdot 500}$.
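  The interval can be reconstructed from the summary statistics of the example. The sketch below is illustrative only and rests on two assumptions: that the Section 12.4 interval uses the usual estimated standard deviation $s\sqrt{1/n + (x'^{*} - \bar{x}')^2 / S_{x'x'}}$ for the fitted value, and that the $t$ critical value 2.228 (10 degrees of freedom) is taken directly from the text rather than computed. Variable names are hypothetical.

```python
import math

# Summary statistics for the transformed data, as quoted in Example 13.3.
n = 12
sum_x, sum_y = 74.41200, 26.22601
sum_xx, sum_yy, sum_xy = 461.75874, 67.74609, 160.84601

s_xx = sum_xx - sum_x ** 2 / n
s_yy = sum_yy - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

b1 = s_xy / s_xx
b0 = (sum_y - b1 * sum_x) / n

# Error sum of squares, variance estimate, and coefficient of determination.
sse = s_yy - b1 * s_xy
s2 = sse / (n - 2)
r2 = 1.0 - sse / s_yy

# Fitted value and its estimated standard deviation at x = 500.
x_star = math.log(500.0)
fit = b0 + b1 * x_star
se_fit = math.sqrt(s2 * (1.0 / n + (x_star - sum_x / n) ** 2 / s_xx))

t_crit = 2.228  # t critical value for 95% confidence, 10 df, from the text
lower, upper = fit - t_crit * se_fit, fit + t_crit * se_fit

# Antilogs give an interval for the median (approximately the mean
# when s2 is small) on the original scale.
ci_original = (math.exp(lower), math.exp(upper))

print(s2, r2, se_fit)
print((lower, upper), ci_original)
```

  The results should land close to the values reported in the text; small differences in the last digit can arise because the text rounds intermediate quantities before taking antilogs.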

  Figure 13.4 (a) Standardized residuals versus $x'$ from Example 13.3; (b) $\hat{y}$ versus $y$ from Example 13.3

  The accompanying data on $x$ = length of a scamp (mm) and $y$ = mercury content
