
13.5 Other Issues in Multiple Regression

  In this section, we touch upon a number of issues that may arise when a multiple regression analysis is carried out. Consult the chapter references for a more extensive treatment of any particular topic.

  Transformations

  Sometimes theoretical considerations suggest a nonlinear relation between a dependent variable and two or more independent variables; on other occasions, diagnostic plots indicate that some type of nonlinear function should be used. Frequently a transformation will linearize the model.

  Example 13.18

  Natural single crystal diamond has been widely used in ultraprecision machining. However, its application to the cutting of ferrous metals has been problematic due to significant tool wear. The article "Investigation on Frictional Wear of Single Crystal Diamond Against Ferrous Metals" (Intl. J. of Refractory Metals and Hard Materials, 2013: 174–179) presented the accompanying data on $x_1$ = mechanical force (N), $x_2$ = sliding velocity (m/s), $x_3$ = carbon content (%), and $y$ = graphitized degree, a measure of diamond wear.

  The investigators proposed and fit the multiplicative power regression model

$$Y = \alpha\, x_1^{\beta_1} x_2^{\beta_2} x_3^{\beta_3}\, \varepsilon$$

  Taking the natural logarithm of both sides of this equation gives

$$\ln(Y) = \ln(\alpha) + \beta_1 \ln(x_1) + \beta_2 \ln(x_2) + \beta_3 \ln(x_3) + \ln(\varepsilon) \tag{13.21}$$

  which is our general additive multiple regression equation with the dependent variable being the natural log of graphitized degree and predictors $\ln(x_1)$, $\ln(x_2)$, and $\ln(x_3)$.
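  The log transformation turns the power model into an ordinary multiple regression, so any least-squares routine can fit it. The following is a minimal sketch that assumes only the Python standard library. The helper names (`solve_linear`, `ols`, `fit_power_model`, `simulate`) are hypothetical. The article's data are not reproduced here, so the example generates synthetic data from a power model with made-up parameters and lognormal errors. It then checks that fitting on the log scale recovers those parameters.

```python
import math
import random


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] from the normal equations X'X b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear(xtx, xty)


def fit_power_model(xs, y):
    """Fit Y = alpha * x1^b1 * ... * xk^bk * eps by regressing ln(y) on ln(x1), ..., ln(xk).

    Returns (alpha_hat, [b1_hat, ..., bk_hat]), where alpha_hat = exp(intercept estimate).
    """
    log_rows = [[math.log(v) for v in row] for row in xs]
    log_y = [math.log(v) for v in y]
    coef = ols(log_rows, log_y)
    return math.exp(coef[0]), coef[1:]


def simulate(n, alpha, betas, sigma, seed=1):
    """Synthetic data from the power model with ln(eps) ~ N(0, sigma^2), i.e. lognormal eps."""
    rng = random.Random(seed)
    xs, y = [], []
    for _ in range(n):
        x = [rng.uniform(5, 40), rng.uniform(0.5, 2.0), rng.uniform(0.1, 0.8)]
        eps = math.exp(rng.gauss(0.0, sigma))
        y.append(alpha * math.prod(xi ** b for xi, b in zip(x, betas)) * eps)
        xs.append(x)
    return xs, y


if __name__ == "__main__":
    xs, y = simulate(100, alpha=0.08, betas=[0.37, 0.59, -0.02], sigma=0.03)
    alpha_hat, betas_hat = fit_power_model(xs, y)
    print(round(alpha_hat, 4), [round(b, 3) for b in betas_hat])
```

  Solving the normal equations directly is adequate for a handful of predictors. For larger or ill-conditioned problems, a numerical library's least-squares solver would be the more robust choice.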

  Presuming that $\varepsilon$ in the original model equation has a lognormal distribution, the random error $\ln(\varepsilon)$ in the transformed model will be normally distributed. The plausibility of this assumption can be checked with a normal probability plot of the standardized residuals resulting from fitting the transformed model.
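  A normal probability plot pairs the ordered residuals with standard normal percentiles. The sketch below assumes the (i − .5)/n percentile convention and uses made-up residuals. `normal_plot_points` and `pearson_r` are hypothetical helpers. The correlation is offered only as a rough numerical summary of how straight the plot is, not as a formal test.

```python
import math
from statistics import NormalDist


def normal_plot_points(residuals):
    """Return (z percentile, ordered residual) pairs.

    The i-th smallest residual is paired with the standard normal percentile
    at p = (i - 0.5)/n.
    """
    n = len(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(z, sorted(residuals)))


def pearson_r(u, v):
    """Sample correlation; values near 1 indicate a nearly straight probability plot."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


# Hypothetical standardized residuals, for illustration only
resid = [0.42, -1.10, 0.05, 1.35, -0.60, -0.22, 0.88, -0.31, -0.47]
pts = normal_plot_points(resid)
print(round(pearson_r([z for z, _ in pts], [r for _, r in pts]), 3))
```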

  Table 13.4 shows Minitab output from fitting (13.21). The $R^2$ value is quite impressive: about 98% of the observed variation in ln(y) can be attributed to the model relationship. Adjusted $R^2$ is only slightly smaller than $R^2$ itself. Furthermore, the P-value for the model utility F test is reported as .000 (the area under the $F_{3,5}$ curve to the right of 81.16), implying a useful relationship between ln(y) and at least one of the three predictors. The point estimates of $\beta_1$, $\beta_2$, and $\beta_3$ are .36557, .59366, and −.02074, respectively. The point estimate of $\ln(\alpha)$ is −2.53727, so the point estimate of $\alpha$ itself is $e^{-2.53727} = .079082$. The estimated original regression function is then

$$.079\, x_1^{.366}\, x_2^{.594}\, x_3^{-.021}$$

  This function appears in the cited article.
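  The summary quantities quoted above are arithmetically consistent with one another. The check below assumes n = 9. That value is inferred rather than stated: with k = 3, the $F_{3,5}$ reference distribution means n − k − 1 = 5. Both adjusted $R^2$ and the F ratio can be written in terms of $R^2$, and the intercept estimate is back-transformed by exponentiation. The function names are hypothetical.

```python
import math


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors fit to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def model_utility_f(r2, n, k):
    """Model utility F ratio written in terms of R^2: (R^2/k) / ((1 - R^2)/(n - k - 1))."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def r2_from_f(f, n, k):
    """Invert model_utility_f: the R^2 implied by a reported F ratio."""
    return k * f / (k * f + n - k - 1)


n = 9  # inferred, not stated: an F_{3,5} reference distribution means n - k - 1 = 5 with k = 3

print(round(adjusted_r2(0.980, n, 3), 3))      # 0.968, matching R-Sq(adj) = 96.8%
print(round(model_utility_f(0.980, n, 3), 1))  # 81.7; 81.16 presumably reflects unrounded R^2
print(round(r2_from_f(81.16, n, 3), 4))        # R^2 implied by the reported F = 81.16
print(round(r2_from_f(107.87, n, 2), 4))       # R^2 implied by the two-predictor refit
print(round(math.exp(-2.53727), 6))            # alpha_hat = e^(-2.53727) = 0.079082
```

  Recomputing F from the rounded $R^2$ = .980 gives about 81.7 rather than 81.16. Inverting the relationship shows that the reported F corresponds to an $R^2$ of about .9799, which rounds to the printed 98.0%.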


  Table 13.4 Minitab output for the transformed regression in Example 13.18

  The regression equation is
  ln(y) = -2.54 + 0.366 ln(x1) + 0.594 ln(x2) - 0.0207 ln(x3)

  Predictor        Coef        P
  Constant     -2.53727
  ln(x1)        0.36557
  ln(x2)        0.59366
  ln(x3)       -0.02074    0.246

  S = 0.0372066   R-Sq = 98.0%   R-Sq(adj) = 96.8%

  Analysis of Variance
  Source           DF       F       P
  Regression        3   81.16   0.000
  Residual Error    5

  Predicted Values for New Observations
  Fit        95% PI
  -1.4134    (-1.5150, -1.3118)

  A point prediction of graphitized degree when force = 20, velocity = 1, and carbon content = .25 requires that we first obtain a point prediction of ln(Y). We do this by substituting ln(20), ln(1) = 0, and ln(.25) into the estimated regression equation in Table 13.4. The result is $\ln(\hat{y}) = -1.4134$, which appears in the last line of the Minitab output. Then $\hat{y} = e^{-1.4134} = .243$. Similarly, the output gives a 95% PI for ln(Y), so a 95% PI for Y itself is $(e^{-1.5150}, e^{-1.3118}) = (.220, .269)$.
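  The prediction steps can be reproduced from the coefficient estimates quoted in the text. The sketch below uses those rounded estimates, so the last digit may differ slightly from the Minitab fit. The PI endpoints for ln(Y) are the ones implied by the interval for Y. `predict_ln_y` is a hypothetical helper.

```python
import math

# Coefficient estimates quoted in the text for the fit of (13.21)
COEF = {"const": -2.53727, "b1": 0.36557, "b2": 0.59366, "b3": -0.02074}


def predict_ln_y(force, velocity, carbon):
    """Point prediction of ln(Y) from the fitted log-log model."""
    return (COEF["const"]
            + COEF["b1"] * math.log(force)
            + COEF["b2"] * math.log(velocity)
            + COEF["b3"] * math.log(carbon))


ln_y_hat = predict_ln_y(20, 1, 0.25)   # about -1.4134
y_hat = math.exp(ln_y_hat)             # about .243

# 95% PI for ln(Y) from the output; exp is increasing, so exponentiating
# the endpoints gives a 95% PI for Y itself.
pi_ln = (-1.5150, -1.3118)
pi_y = tuple(math.exp(v) for v in pi_ln)
print(round(ln_y_hat, 4), round(y_hat, 3), tuple(round(v, 3) for v in pi_y))
```

  Exponentiating the interval endpoints is valid because $e^u$ is strictly increasing. The event that ln(Y) lies between the two endpoints is therefore the same as the event that Y lies between their exponentials.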

  The normal probability plot of Figure 13.20 exhibits a substantial linear pattern, supporting the normality assumption for $\ln(\varepsilon)$. In addition, the plot of standardized residuals versus predicted values [of ln(y)] shows no pattern other than pure randomness, indicating no violation of the model assumptions. However, looking back at Table 13.4, the P-value for testing $H_0\colon \beta_3 = 0$ is .246. Thus it appears that as long as $\ln(x_1)$ and $\ln(x_2)$ remain in the model, the natural log of carbon content contains no useful additional information about the response variable. Deleting that predictor and refitting gives $R^2 = .973$ and a model utility F ratio of 107.87. The estimates of $\beta_1$ and $\beta_2$ are almost identical to those for the three-predictor model. Also, the multiple exponential regression model $Y = \alpha e^{\beta_1 x_1 + \beta_2 x_2} \varepsilon$ fits the data about as well as the power model does. For that model, ln(Y) is regressed against $x_1$ and $x_2$ rather than against $\ln(x_1)$ and $\ln(x_2)$. None of this was mentioned in the cited article.

  Figure 13.20 Standardized residual plot (standardized residual versus fitted value) and normal probability plot for Example 13.18

  ■

  The logistic regression model was introduced in Section 13.2 to relate a dichotomous variable y to a single predictor. This model can be extended in an obvious way to incorporate more than one predictor. The probability of success p is now a function of the predictors $x_1, x_2, \ldots, x_k$:

$$p(x_1, \ldots, x_k) = \frac{e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k}}$$

  Simple algebra yields an expression for the odds:

$$\frac{p(x_1, \ldots, x_k)}{1 - p(x_1, \ldots, x_k)} = e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k}$$
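  The probability and odds formulas above translate directly into code. The sketch below uses hypothetical coefficients and predictor values (none are from the text). It checks numerically that $p/(1-p)$ equals $e$ raised to the linear predictor.

```python
import math


def linear_predictor(beta0, betas, xs):
    """eta = beta0 + beta1*x1 + ... + betak*xk."""
    return beta0 + sum(b * x for b, x in zip(betas, xs))


def success_prob(beta0, betas, xs):
    """p(x1, ..., xk) = e^eta / (1 + e^eta), computed as 1 / (1 + e^(-eta)).

    This form is algebraically identical and cannot overflow for large positive eta.
    """
    return 1.0 / (1.0 + math.exp(-linear_predictor(beta0, betas, xs)))


def odds(p):
    """Odds of success, p / (1 - p)."""
    return p / (1.0 - p)


# Hypothetical coefficients and predictor values, chosen only for illustration
beta0, betas = -1.0, [0.5, -0.02, 0.65]
x = [1.2, 25.0, 1.0]
p = success_prob(beta0, betas, x)
print(round(p, 4), round(odds(p), 4), round(math.exp(linear_predictor(beta0, betas, x)), 4))
```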

  The interpretation of $\beta_i$ ($i = 1, \ldots, k$) is analogous to the interpretation of $\beta_1$ in the logit function containing only a single predictor x. That is, the following argument shows that the odds change by the multiplicative factor $e^{\beta_i}$ when $x_i$ increases by 1 unit and all other predictors remain fixed:

$$\begin{aligned}
\frac{p(x_1, \ldots, x_i + 1, \ldots, x_k)}{1 - p(x_1, \ldots, x_i + 1, \ldots, x_k)}
&= e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_i (x_i + 1) + \cdots + \beta_k x_k} \\
&= e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_i x_i + \cdots + \beta_k x_k + \beta_i} \\
&= e^{\beta_i} \cdot \frac{p(x_1, \ldots, x_k)}{1 - p(x_1, \ldots, x_k)}
\end{aligned}$$
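  The multiplicative-factor argument can also be checked numerically. Raising one predictor by 1 while holding the others fixed should multiply the odds by $e^{\beta_i}$, whatever the starting values. The sketch below uses hypothetical coefficients. The third predictor plays the role of a 0/1 indicator, such as smoking status.

```python
import math


def odds_at(beta0, betas, xs):
    """Odds p/(1 - p) for the multiple logistic model at predictor values xs."""
    eta = beta0 + sum(b * x for b, x in zip(betas, xs))
    p = 1.0 / (1.0 + math.exp(-eta))
    return p / (1.0 - p)


def odds_ratio_for_unit_increase(beta0, betas, xs, i):
    """Ratio of the odds when x_i is raised by 1 and all other predictors are held fixed."""
    bumped = list(xs)
    bumped[i] += 1.0
    return odds_at(beta0, betas, bumped) / odds_at(beta0, betas, xs)


beta0, betas = -1.0, [0.5, -0.02, 0.65]  # hypothetical coefficients
starts = ([1.0, 30.0, 0.0], [3.0, 20.0, 0.0], [0.5, 40.0, 0.0])
for start in starts:
    ratios = [odds_ratio_for_unit_increase(beta0, betas, start, i) for i in range(3)]
    print([round(r, 4) for r in ratios], [round(math.exp(b), 4) for b in betas])
```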

  Again, statistical software must be used to estimate parameters, calculate relevant standard deviations, and provide other inferential information.

  Example 13.19

  Data were obtained from 189 women who gave birth during a particular period at the Bayside Medical Center in Springfield, MA, in order to identify factors associated with low birth weight (LBW). The accompanying Minitab output resulted from a logistic regression in which the dependent variable indicated whether (1) or not (0) a child had low birth weight (< 2500 g). The predictors were weight of the mother at her last menstrual period, age of the mother, and an indicator variable for whether (1) or not (0) the mother had smoked during pregnancy.

  Logistic Regression Table

  Predictor   Coef   SE Coef   Z   P   Odds Ratio   95% CI Lower   95% CI Upper

  It appears that age is not an important predictor of LBW, provided that the two other predictors are retained. The other two predictors do appear to be informative. The point estimate of the odds ratio associated with smoking status is 1.92. This is the ratio of the odds of LBW for a smoker to the odds for a nonsmoker, where odds = P(Y = 1)/P(Y = 0). At the 95% confidence level, the odds of a low-birth-weight child could be as much as 3.7 times as high for a smoker as for a nonsmoker.

  ■
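  By the argument given earlier for the multiple logistic model, the estimated odds ratio for a 0/1 indicator such as smoking status is $e^{\hat{\beta}}$ for that predictor. The sketch below forms such an estimate together with a Wald-type interval, $\exp(\hat{\beta} \mp z \cdot SE)$. This is one standard construction; we assume, rather than confirm, that the interval in the Minitab output was computed this way. The coefficient and standard error are hypothetical, and `odds_ratio_with_ci` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def odds_ratio_with_ci(coef, se, level=0.95):
    """Odds ratio e^coef with a Wald-type interval exp(coef -/+ z * se)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    return math.exp(coef), (math.exp(coef - z * se), math.exp(coef + z * se))


# Hypothetical coefficient and standard error (not the values from Example 13.19)
est, (lower, upper) = odds_ratio_with_ci(coef=0.5, se=0.25)
print(round(est, 3), round(lower, 3), round(upper, 3))
```

  Because the interval is symmetric on the log scale, it is not symmetric around the odds ratio itself. The upper limit lies farther from the point estimate than the lower limit does.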

  Please see one of the chapter references for more information on logistic regression, including methods for assessing model effectiveness and adequacy.
