Estimating the Coefficients

12.2 Estimating the Coefficients

  In this section, we obtain the least squares estimators of the parameters β 0 ,β 1 ,...,β k

  by fitting the multiple linear regression model

  μ Y |x 1 ,x 2 ,...,x k =β 0 +β 1 x 1 + ···+β k x k

  to the data points

  {(x 1i ,x 2i ,...,x ki ,y i );

  i = 1, 2, . . . , n and n > k },

  where y i is the observed response to the values x 1i ,x 2i ,...,x ki of the k independent

  variables x 1 ,x 2 ,...,x k . Each observation (x 1i ,x 2i ,...,x ki ,y i ) is assumed to satisfy

  the following equation.

  Multiple Linear

  y i =β 0 +β 1 x 1i +β 2 x 2i + ···+β k x ki + i

  Regression Model or

  y i =ˆ y i +e i =b 0 +b 1 x 1i +b 2 x 2i + ···+b k x ki +e i ,

  where i and e i are the random error and residual, respectively, associated with the response y i and fitted value ˆ y i .

  As in the case of simple linear regression, it is assumed that the i are independent

  and identically distributed with mean 0 and common variance σ 2 . In using the concept of least squares to arrive at estimates b 0 ,b 1 ,...,b k , we

  minimize the expression

  −b 2 0 −b 1 x 1i −b 2 x 2i −···−b k x ki ) .

  i=1

  i=1

  Differentiating SSE in turn with respect to b 0 ,b 1 ,...,b k and equating to zero, we

  generate the set of k + 1 normal equations for multiple linear regression.

  12.2 Estimating the Coefficients

  n

  Normal Estimation n n n

  ···+b k

  x ki = y i

  Equations for

  Multiple Linear i=1

  +b 2 x 1i x 2i + ···+b k

  ···+b 2 k x

  These equations can be solved for b 0 ,b 1 ,b 2 ,...,b k by any appropriate method for

  solving systems of linear equations. Most statistical software can be used to obtain numerical solutions of the above equations.

  Example 12.1:

  A study was done on a diesel-powered light-duty pickup truck to see if humidity, air temperature, and barometric pressure influence emission of nitrous oxide (in ppm). Emission measurements were taken at different times, with varying experimental conditions. The data are given in Table 12.2. The model is

  μ Y |x 1 ,x 2 ,x 3 =β 0 +β 1 x 1 +β 2 x 2 +β 3 x 3 ,

  or, equivalently,

  y i =β 0 +β 1 x 1i +β 2 x 2i +β 3 x 3i + i ,

  i = 1, 2, . . . , 20.

  Fit this multiple linear regression model to the given data and then estimate the amount of nitrous oxide emitted for the conditions where humidity is 50, tem- perature is 76 ◦

  F, and barometric pressure is 29.30.

  Table 12.1: Data for Example 12.1

  Nitrous

  Humidity, Temp., Pressure,

  Nitrous

  Humidity, Temp., Pressure,

  Oxide, y

  x 1 x 2 x 3 Oxide, y

  Source: Charles T. Hare, “Light-Duty Diesel Emission Correction Factors for Ambient Conditions,” EPA-6002-77- 116. U.S. Environmental Protection Agency.

  Solution : The solution of the set of estimating equations yields the unique estimates

  b 0 = −3.507778, b 1 = −0.002625, b 2 = 0.000799, b 3 = 0.154155.

  Chapter 12 Multiple Linear Regression and Certain Nonlinear Regression Models

  Therefore, the regression equation is

  ˆ y= −3.507778 − 0.002625 x 1 + 0.000799 x 2 + 0.154155 x 3 .

  For 50 humidity, a temperature of 76 ◦

  F, and a barometric pressure of 29.30, the

  estimated amount of nitrous oxide emitted is

  ˆ y= −3.507778 − 0.002625(50.0) + 0.000799(76.0) + 0.1541553(29.30)

  = 0.9384 ppm.

  Polynomial Regression

  Now suppose that we wish to fit the polynomial equation

  μ Y |x =β 0 +β 1 x+β 2 x 2 + ···+β r x r

  to the n pairs of observations {(x i ,y i ); i = 1, 2, . . . , n }. Each observation, y i , satisfies the equation

  y i =β 0 +β 1 x i +β 2 x 2 i + ···+β r x r i + i

  or

  y i =ˆ y i +e i =b 0 +b

  1 x i +b 2 x i + ···+b r x i +e i ,

  2 r

  where r is the degree of the polynomial and i and e i are again the random error and residual associated with the response y i and fitted value ˆ y i , respectively. Here, the number of pairs, n, must be at least as large as r + 1, the number of parameters to be estimated.

  Notice that the polynomial model can be considered a special case of the more

  general multiple linear regression model, where we set x 1 = x, x 2 =x 2 ,...,x r =x r .

  The normal equations assume the same form as those given on page 445. They are

  then solved for b 0 ,b 1 ,b 2 ,...,b r .

  Example 12.2: Given the data

  x

  0 1 2 3 4 5 6 7 8 9 y 9.1 7.3 3.2 4.6 4.8 2.9 5.7 7.1 8.8 10.2

  fit a regression curve of the form μ

  2 Y |x =β 0 +β 1 x+β 2 x and then estimate μ Y |2 .

  Solution : From the data given, we find that

  10b 0 +

  45 b 1 + 285 b 2 = 63.7, 45b 0 + 285b 1 + 2025 b 2 = 307.3, 285b 0 + 2025 b 1 + 15,333b 2 = 2153.3.

  Solving these normal equations, we obtain

  b 0 = 8.698, b 1 = −2.341, b 2 = 0.288.

  Therefore,

  ˆ y = 8.698

  − 2.341x + 0.288x 2 .

  12.3 Linear Regression Model Using Matrices

  When x = 2, our estimate of μ Y |2 is

  y = 8.698 ˆ

  − (2.341)(2) + (0.288)(2 2 ) = 5.168.

  Example 12.3: The data in Table 12.2 represent the percent of impurities that resulted for various

  temperatures and sterilizing times during a reaction associated with the manufac- turing of a certain beverage. Estimate the regression coefficients in the polynomial model

  y i =β 0 +β 1 x 1i +β 2 x 2i +β 11 x 2 1i +β 22 x 2 2i +β 12 x 1i x 2i + i ,

  for i = 1, 2, . . . , 18.

  Table 12.2: Data for Example 12.3

  Sterilizing

  Temperature, x 1 ( ◦ C)

  Time, x 2 (min)

  Solution : Using the normal equations, we obtain

  b 0 = 56.4411,

  b 1 = −0.36190,

  b 2 = −2.75299,

  and our estimated regression equation is

  y = 56.4411 ˆ

  1 2 + 0.00081x 2 + 0.08173x − 0.36190x 2 − 2.75299x 1 + 0.00314x 1 x 2 .

  Many of the principles and procedures associated with the estimation of poly- nomial regression functions fall into the category of response surface methodol- ogy , a collection of techniques that have been used quite successfully by scientists

  and engineers in many fields. The x 2 i are called pure quadratic terms, and the

  x i x j (i = j) are called interaction terms. Such problems as selecting a proper experimental design, particularly in cases where a large number of variables are

  in the model, and choosing optimum operating conditions for x 1 ,x 2 ,...,x k are

  often approached through the use of these methods. For an extensive exposure, the reader is referred to Response Surface Methodology: Process and Product Opti- mization Using Designed Experiments by Myers, Montgomery, and Anderson-Cook (2009; see the Bibliography).