12.2 Estimating the Coefficients
In this section, we obtain the least squares estimators of the parameters $\beta_0, \beta_1, \ldots, \beta_k$ by fitting the multiple linear regression model
\[ \mu_{Y|x_1, x_2, \ldots, x_k} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k \]
to the data points
\[ \{(x_{1i}, x_{2i}, \ldots, x_{ki}, y_i);\ i = 1, 2, \ldots, n \text{ and } n > k\}, \]
where $y_i$ is the observed response to the values $x_{1i}, x_{2i}, \ldots, x_{ki}$ of the $k$ independent variables $x_1, x_2, \ldots, x_k$. Each observation $(x_{1i}, x_{2i}, \ldots, x_{ki}, y_i)$ is assumed to satisfy the following equation.
Multiple Linear Regression Model
\[ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \epsilon_i \]
or
\[ y_i = \hat{y}_i + e_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \cdots + b_k x_{ki} + e_i, \]
where $\epsilon_i$ and $e_i$ are the random error and residual, respectively, associated with the response $y_i$ and fitted value $\hat{y}_i$.
As in the case of simple linear regression, it is assumed that the $\epsilon_i$ are independent and identically distributed with mean 0 and common variance $\sigma^2$.
In using the concept of least squares to arrive at estimates $b_0, b_1, \ldots, b_k$, we minimize the expression
\[ SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_{1i} - b_2 x_{2i} - \cdots - b_k x_{ki})^2. \]
Differentiating $SSE$ in turn with respect to $b_0, b_1, \ldots, b_k$ and equating to zero, we generate the set of $k + 1$ normal equations for multiple linear regression.
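The differentiation step can be written out explicitly. The following is a sketch of the intermediate result, with the index $j$ introduced here only for illustration:

```latex
\frac{\partial\, SSE}{\partial b_0}
  = -2 \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_{1i} - \cdots - b_k x_{ki} \right) = 0,
\qquad
\frac{\partial\, SSE}{\partial b_j}
  = -2 \sum_{i=1}^{n} x_{ji} \left( y_i - b_0 - b_1 x_{1i} - \cdots - b_k x_{ki} \right) = 0,
\quad j = 1, 2, \ldots, k.
```

Dividing each equation by $-2$ and moving the terms that involve $y_i$ to the right-hand side gives one linear equation in $b_0, b_1, \ldots, b_k$ for each of the $k + 1$ coefficients.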
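Normal Estimation Equations for Multiple Linear Regression
\[
\begin{aligned}
n b_0 + b_1 \sum_{i=1}^{n} x_{1i} + b_2 \sum_{i=1}^{n} x_{2i} + \cdots + b_k \sum_{i=1}^{n} x_{ki} &= \sum_{i=1}^{n} y_i \\
b_0 \sum_{i=1}^{n} x_{1i} + b_1 \sum_{i=1}^{n} x_{1i}^2 + b_2 \sum_{i=1}^{n} x_{1i} x_{2i} + \cdots + b_k \sum_{i=1}^{n} x_{1i} x_{ki} &= \sum_{i=1}^{n} x_{1i} y_i \\
&\ \,\vdots \\
b_0 \sum_{i=1}^{n} x_{ki} + b_1 \sum_{i=1}^{n} x_{ki} x_{1i} + b_2 \sum_{i=1}^{n} x_{ki} x_{2i} + \cdots + b_k \sum_{i=1}^{n} x_{ki}^2 &= \sum_{i=1}^{n} x_{ki} y_i
\end{aligned}
\]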
These equations can be solved for b 0 ,b 1 ,b 2 ,...,b k by any appropriate method for solving systems of linear equations. Most statistical software can be used to obtain numerical solutions of the above equations.
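As one illustration of "any appropriate method," the sketch below builds the normal equations from raw data and solves them by Gaussian elimination with partial pivoting. It is a minimal example under stated assumptions, not a recommended numerical procedure: the function names `normal_equations` and `solve_linear_system` are hypothetical, and the five data points are made up so that they satisfy $y = 1 + 2x_1 + 3x_2$ exactly.

```python
def normal_equations(rows, ys):
    """Build A b = g for the least squares fit of y on the given regressors.

    rows: list of observations, each a list [x_1i, ..., x_ki]
    ys:   list of observed responses y_i
    """
    # Prepend the constant regressor 1 so that b_0 is estimated too.
    design = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(design[0])  # p = k + 1 unknown coefficients
    # A[r][c] is the sum over i of (regressor r) * (regressor c).
    A = [[sum(d[r] * d[c] for d in design) for c in range(p)] for r in range(p)]
    # g[r] is the sum over i of (regressor r) * y_i.
    g = [sum(d[r] * y for d, y in zip(design, ys)) for r in range(p)]
    return A, g


def solve_linear_system(A, g):
    """Solve A b = g by Gaussian elimination with partial pivoting."""
    n = len(g)
    # Augmented matrix [A | g], copied so the inputs are left untouched.
    M = [[float(v) for v in A[r]] + [float(g[r])] for r in range(n)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper triangular system.
    b = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * b[c] for c in range(r + 1, n))
        b[r] = (M[r][n] - tail) / M[r][r]
    return b


# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2 (no error term).
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
ys = [1, 3, 4, 6, 8]
A, g = normal_equations(rows, ys)
coefficients = solve_linear_system(A, g)  # approximately [1.0, 2.0, 3.0]
```

Because the made-up responses contain no error, the fitted coefficients reproduce the generating values; with real data the same steps return the least squares estimates instead.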
Example 12.1:
A study was done on a diesel-powered light-duty pickup truck to see if humidity, air temperature, and barometric pressure influence emission of nitrous oxide (in ppm). Emission measurements were taken at different times, with varying experimental conditions. The data are given in Table 12.1. The model is
\[ \mu_{Y|x_1, x_2, x_3} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3, \]
or, equivalently,
\[ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \epsilon_i, \quad i = 1, 2, \ldots, 20. \]
Fit this multiple linear regression model to the given data and then estimate the amount of nitrous oxide emitted for the conditions where humidity is 50%, temperature is 76°F, and barometric pressure is 29.30.
Table 12.1: Data for Example 12.1
Nitrous Oxide, $y$    Humidity, $x_1$    Temp., $x_2$    Pressure, $x_3$
Source: Charles T. Hare, “Light-Duty Diesel Emission Correction Factors for Ambient Conditions,” EPA-600/2-77-116. U.S. Environmental Protection Agency.
Solution: The solution of the set of estimating equations yields the unique estimates
\[ b_0 = -3.507778, \quad b_1 = -0.002625, \quad b_2 = 0.000799, \quad b_3 = 0.154155. \]
Therefore, the regression equation is
\[ \hat{y} = -3.507778 - 0.002625 x_1 + 0.000799 x_2 + 0.154155 x_3. \]
For 50% humidity, a temperature of 76°F, and a barometric pressure of 29.30, the estimated amount of nitrous oxide emitted is
\[ \hat{y} = -3.507778 - 0.002625(50.0) + 0.000799(76.0) + 0.154155(29.30) = 0.9384 \text{ ppm}. \]
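The arithmetic of this prediction can be checked with a few lines of Python. This is only a sketch: the helper name `predict` is hypothetical, and the coefficients are the rounded estimates reported in the solution.

```python
# Estimates b_0, b_1, b_2, b_3 reported in the solution of Example 12.1.
b = [-3.507778, -0.002625, 0.000799, 0.154155]


def predict(b, x):
    """Return b_0 + b_1*x_1 + ... + b_k*x_k for one setting x = [x_1, ..., x_k]."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))


# Humidity 50%, temperature 76 degrees F, barometric pressure 29.30.
y_hat = predict(b, [50.0, 76.0, 29.30])
print(round(y_hat, 4))  # prints 0.9384
```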
Polynomial Regression
Now suppose that we wish to fit the polynomial equation
\[ \mu_{Y|x} = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_r x^r \]
to the $n$ pairs of observations $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$. Each observation, $y_i$, satisfies the equation
\[ y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_r x_i^r + \epsilon_i \]
or
\[ y_i = \hat{y}_i + e_i = b_0 + b_1 x_i + b_2 x_i^2 + \cdots + b_r x_i^r + e_i, \]
where $r$ is the degree of the polynomial and $\epsilon_i$ and $e_i$ are again the random error and residual associated with the response $y_i$ and fitted value $\hat{y}_i$, respectively. Here, the number of pairs, $n$, must be at least as large as $r + 1$, the number of parameters to be estimated.
Notice that the polynomial model can be considered a special case of the more general multiple linear regression model, where we set $x_1 = x, x_2 = x^2, \ldots, x_r = x^r$. The normal equations assume the same form as those given earlier in this section for multiple linear regression. They are then solved for $b_0, b_1, b_2, \ldots, b_r$.
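Under the substitution $x_j = x^j$, every sum in the normal equations becomes a sum of powers of $x_i$. As a sketch, with the row index $j$ and column index $l$ introduced here only for illustration, the $r + 1$ equations can be written compactly as:

```latex
\sum_{l=0}^{r} b_l \sum_{i=1}^{n} x_i^{\,j+l} = \sum_{i=1}^{n} x_i^{\,j} y_i,
\qquad j = 0, 1, \ldots, r.
```

For $j = 0$ the coefficient of $b_0$ is $\sum_{i=1}^{n} x_i^0 = n$, which matches the first normal equation of the multiple linear regression case.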
Example 12.2: Given the data
x    0     1     2     3     4     5     6     7     8     9
y    9.1   7.3   3.2   4.6   4.8   2.9   5.7   7.1   8.8   10.2
fit a regression curve of the form $\mu_{Y|x} = \beta_0 + \beta_1 x + \beta_2 x^2$ and then estimate $\mu_{Y|2}$.
Solution: From the data given, we find that
\[
\begin{aligned}
10 b_0 + 45 b_1 + 285 b_2 &= 63.7, \\
45 b_0 + 285 b_1 + 2025 b_2 &= 307.3, \\
285 b_0 + 2025 b_1 + 15{,}333 b_2 &= 2153.3.
\end{aligned}
\]
Solving these normal equations, we obtain
\[ b_0 = 8.698, \quad b_1 = -2.341, \quad b_2 = 0.288. \]
Therefore,
\[ \hat{y} = 8.698 - 2.341 x + 0.288 x^2. \]
When $x = 2$, our estimate of $\mu_{Y|2}$ is
\[ \hat{y} = 8.698 - (2.341)(2) + (0.288)(2^2) = 5.168. \]
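The numbers in Example 12.2 can be reproduced from the ten data pairs. The sketch below computes the power sums, assembles the three normal equations, and solves them with Cramer's rule. Cramer's rule is chosen here only because the system is 3 by 3; it is not the method the text prescribes, and the helper names `det3` and `replace_column` are hypothetical.

```python
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [9.1, 7.3, 3.2, 4.6, 4.8, 2.9, 5.7, 7.1, 8.8, 10.2]

# Power sums S[m] = sum of x_i^m, and right-hand sides T[j] = sum of x_i^j * y_i.
S = [sum(xi ** m for xi in x) for m in range(5)]
T = [sum(xi ** j * yi for xi, yi in zip(x, y)) for j in range(3)]

# Row j of the normal equations: b0*S[j] + b1*S[j+1] + b2*S[j+2] = T[j].
A = [[S[j + l] for l in range(3)] for j in range(3)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def replace_column(m, col, values):
    """Return a copy of m with column `col` replaced by `values`."""
    return [[values[r] if c == col else m[r][c] for c in range(3)]
            for r in range(3)]


# Cramer's rule: b_j = det(A with column j replaced by T) / det(A).
d = det3(A)
b = [det3(replace_column(A, j, T)) / d for j in range(3)]

# Estimate of the mean response at x = 2.
estimate_at_2 = b[0] + b[1] * 2 + b[2] * 2 ** 2
```

The power sums come out as 10, 45, 285, 2025, and 15,333, and the coefficients round to 8.698, −2.341, and 0.288. Using the unrounded coefficients, the estimate at $x = 2$ is about 5.1685, which agrees with the 5.168 obtained from the rounded values.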
Example 12.3: The data in Table 12.2 represent the percent of impurities that resulted for various temperatures and sterilizing times during a reaction associated with the manufacturing of a certain beverage. Estimate the regression coefficients in the polynomial model
\[ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_{11} x_{1i}^2 + \beta_{22} x_{2i}^2 + \beta_{12} x_{1i} x_{2i} + \epsilon_i, \]
for $i = 1, 2, \ldots, 18$.
Table 12.2: Data for Example 12.3
Sterilizing Time, $x_2$ (min)    Temperature, $x_1$ (°C)
21.66 17.98 16.44
Solution: Using the normal equations, we obtain
\[ b_0 = 56.4411, \quad b_1 = -0.36190, \quad b_2 = -2.75299, \quad b_{11} = 0.00081, \quad b_{22} = 0.08173, \quad b_{12} = 0.00314, \]
and our estimated regression equation is
\[ \hat{y} = 56.4411 - 0.36190 x_1 - 2.75299 x_2 + 0.00081 x_1^2 + 0.08173 x_2^2 + 0.00314 x_1 x_2. \]
Many of the principles and procedures associated with the estimation of polynomial regression functions fall into the category of response surface methodology, a collection of techniques that have been used quite successfully by scientists and engineers in many fields. The $x_i^2$ are called pure quadratic terms, and the $x_i x_j$ ($i \neq j$) are called interaction terms. Such problems as selecting a proper experimental design, particularly in cases where a large number of variables are in the model, and choosing optimum operating conditions for $x_1, x_2, \ldots, x_k$ are often approached through the use of these methods. For an extensive exposure, the reader is referred to Response Surface Methodology: Process and Product Optimization Using Designed Experiments by Myers, Montgomery, and Anderson-Cook (2009; see the Bibliography).
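To make the structure of the second-order model in Example 12.3 concrete, the sketch below expands a pair $(x_1, x_2)$ into its linear, pure quadratic, and interaction terms and evaluates the fitted equation. The function names are hypothetical, and the evaluation point (100 °C, 20 min) is an assumed setting chosen only for illustration; it is not taken from the text.

```python
# Fitted coefficients from Example 12.3, in the order
# b_0, b_1, b_2, b_11, b_22, b_12.
b = [56.4411, -0.36190, -2.75299, 0.00081, 0.08173, 0.00314]


def second_order_terms(x1, x2):
    """Regressors of the second-order model in two variables.

    Order: constant, linear terms, pure quadratic terms, interaction term.
    """
    return [1.0, x1, x2, x1 ** 2, x2 ** 2, x1 * x2]


def fitted_impurity(x1, x2):
    """Evaluate the estimated regression equation at (x1, x2)."""
    return sum(bj * tj for bj, tj in zip(b, second_order_terms(x1, x2)))


# Hypothetical operating point: temperature 100 (deg C), sterilizing time 20 (min).
y_hat = fitted_impurity(100.0, 20.0)
```

Writing the model this way shows that it is still linear in the coefficients: the squared and cross-product terms are simply additional regressors, so the same normal equations apply.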