12.3 Linear Regression Model Using Matrices
In fitting a multiple linear regression model, particularly when the number of variables exceeds two, a knowledge of matrix theory can facilitate the mathematical manipulations considerably. Suppose that the experimenter has k independent variables $x_1, x_2, \ldots, x_k$ and n observations $y_1, y_2, \ldots, y_n$, each of which can be expressed by the equation
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \epsilon_i.$$
This model essentially represents n equations describing how the response values are generated in the scientific process. Using matrix notation, we can write the following equation:
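General Linear Model
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon},$$
where
$$
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
\mathbf{X} = \begin{bmatrix}
1 & x_{11} & x_{21} & \cdots & x_{k1} \\
1 & x_{12} & x_{22} & \cdots & x_{k2} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{1n} & x_{2n} & \cdots & x_{kn}
\end{bmatrix}, \quad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad
\boldsymbol{\epsilon} = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}.
$$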
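To make the layout of X concrete, the following is a minimal NumPy sketch that prepends the column of ones to a table of regressor values. The function name `design_matrix` and the numbers in `x_demo` are illustrative assumptions, not part of the text.

```python
import numpy as np

def design_matrix(x):
    """Return the n x (k+1) matrix X: a leading column of ones, then the k regressors."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:              # a single regressor supplied as a flat sequence
        x = x.reshape(-1, 1)
    ones = np.ones((x.shape[0], 1))
    return np.hstack([ones, x])

# Made-up values: n = 4 observations on k = 2 regressors (one row per observation).
x_demo = [[1.0, 2.0],
          [2.0, 1.0],
          [3.0, 4.0],
          [4.0, 3.0]]
X_demo = design_matrix(x_demo)
print(X_demo.shape)   # n rows, k + 1 columns
```

Row i of `X_demo` holds a 1 followed by the regressor values for observation i, matching the rows of X described in this section.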
Then the least squares method for estimation of β, illustrated in Section 12.2, involves finding b for which
$$SSE = (\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b})$$
is minimized. This minimization process involves solving for b in the equation
$$\frac{\partial}{\partial \mathbf{b}}(SSE) = \mathbf{0}.$$
We will not present the details regarding solution of the equations above. The result reduces to the solution of b in
$$(\mathbf{X}'\mathbf{X})\mathbf{b} = \mathbf{X}'\mathbf{y}.$$
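For readers who want the omitted steps, one possible outline is sketched here. It is not taken from the text; it assumes the standard matrix-calculus identities for differentiating a linear form and a quadratic form, and it uses the fact that X'X is symmetric.

```latex
\begin{align*}
SSE &= (\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b})
     = \mathbf{y}'\mathbf{y} - 2\,\mathbf{b}'\mathbf{X}'\mathbf{y}
       + \mathbf{b}'\mathbf{X}'\mathbf{X}\mathbf{b}, \\
\frac{\partial}{\partial \mathbf{b}}(SSE)
    &= -2\,\mathbf{X}'\mathbf{y} + 2\,\mathbf{X}'\mathbf{X}\mathbf{b}, \\
\frac{\partial}{\partial \mathbf{b}}(SSE) = \mathbf{0}
    &\;\Longrightarrow\; (\mathbf{X}'\mathbf{X})\mathbf{b} = \mathbf{X}'\mathbf{y}.
\end{align*}
```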
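Notice the nature of the X matrix. Apart from the initial element, the ith row represents the x-values that give rise to the response $y_i$. Writing
$$
\mathbf{A} = \mathbf{X}'\mathbf{X} = \begin{bmatrix}
n & \sum x_{1i} & \sum x_{2i} & \cdots & \sum x_{ki} \\
\sum x_{1i} & \sum x_{1i}^2 & \sum x_{1i}x_{2i} & \cdots & \sum x_{1i}x_{ki} \\
\vdots & \vdots & \vdots & & \vdots \\
\sum x_{ki} & \sum x_{ki}x_{1i} & \sum x_{ki}x_{2i} & \cdots & \sum x_{ki}^2
\end{bmatrix}
\quad \text{and} \quad
\mathbf{g} = \mathbf{X}'\mathbf{y} = \begin{bmatrix}
\sum y_i \\ \sum x_{1i}y_i \\ \vdots \\ \sum x_{ki}y_i
\end{bmatrix},
$$
where each sum runs over $i = 1, 2, \ldots, n$, allows the normal equations to be put in the matrix form
$$\mathbf{A}\mathbf{b} = \mathbf{g}.$$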
If the matrix A is nonsingular, we can write the solution for the regression coefficients as
$$\mathbf{b} = \mathbf{A}^{-1}\mathbf{g} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}.$$
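The solution of the normal equations can be sketched numerically as follows. This is a minimal illustration under stated assumptions: the helper name `fit_normal_equations` and the data are made up, and the data are generated without an error term so that the known coefficients (1, 2, 3) should be recovered. The sketch solves the linear system directly rather than forming the inverse explicitly, and cross-checks the result against NumPy's general least squares routine.

```python
import numpy as np

def fit_normal_equations(X, y):
    """Solve (X'X) b = X'y for b; X must already contain the column of ones."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = X.T @ X                    # (k+1) x (k+1) matrix A = X'X
    g = X.T @ y                    # (k+1)-vector g = X'y
    return np.linalg.solve(A, g)   # raises LinAlgError if A is exactly singular

# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2 (no error term).
X_demo = np.array([[1.0, 1.0, 2.0],
                   [1.0, 2.0, 1.0],
                   [1.0, 3.0, 4.0],
                   [1.0, 4.0, 3.0],
                   [1.0, 5.0, 5.0]])
y_demo = X_demo @ np.array([1.0, 2.0, 3.0])

b_demo = fit_normal_equations(X_demo, y_demo)
b_lstsq, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)  # independent cross-check
print(b_demo)
print(b_lstsq)
```

Both calls should agree up to floating-point error when A is nonsingular. Solving the system is generally preferred to computing the inverse when only b is needed.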
Thus, we can obtain the prediction equation or regression equation by solving a set of k + 1 equations in a like number of unknowns. This involves the inversion of the (k + 1) × (k + 1) matrix $\mathbf{X}'\mathbf{X}$. Techniques for inverting this matrix are explained in most textbooks on elementary determinants and matrices. Of course, there are many high-speed computer packages available for multiple regression problems, packages that not only print out estimates of the regression coefficients but also provide other information relevant to making inferences concerning the regression equation.
Example 12.4: The percent survival rate of sperm in a certain type of animal semen, after storage, was measured at various combinations of concentrations of three materials used to increase chance of survival. The data are given in Table 12.3. Estimate the multiple linear regression model for the given data.
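Table 12.3: Data for Example 12.4

y (% survival)   x1 (weight %)   x2 (weight %)   x3 (weight %)
26.5             1.70            5.30            8.20

Solution: The least squares estimating equations, $(\mathbf{X}'\mathbf{X})\mathbf{b} = \mathbf{X}'\mathbf{y}$, are
$$
\begin{bmatrix}
n & \sum x_{1i} & 81.82 & \sum x_{3i} \\
\sum x_{1i} & \sum x_{1i}^2 & 360.6621 & 522.0780 \\
81.82 & 360.6621 & 576.7264 & 728.3100 \\
\sum x_{3i} & 522.0780 & 728.3100 & \sum x_{3i}^2
\end{bmatrix}
\begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix}
=
\begin{bmatrix} \sum y_i \\ 1877.567 \\ 2246.661 \\ 3337.780 \end{bmatrix}.
$$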
From a computer readout we obtain the elements of the inverse matrix
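$$
(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix}
8.0648 & -0.0826 & -0.0942 & -0.7905 \\
-0.0826 & c_{11} & c_{12} & 0.0037 \\
-0.0942 & c_{12} & 0.0166 & -0.0021 \\
-0.7905 & 0.0037 & -0.0021 & 0.0886
\end{bmatrix},
$$
where $c_{11}$ and $c_{12}$ stand for the two elements of the symmetric inverse whose values are not shown, and then, using the relation $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$, the estimated regression coefficients are obtained as
$$b_0 = 39.1574, \quad b_1 = 1.0161, \quad b_2 = -1.8616, \quad b_3 = -0.3433.$$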
Hence, our estimated regression equation is
$$\hat{y} = 39.1574 + 1.0161x_1 - 1.8616x_2 - 0.3433x_3.$$
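As a quick arithmetic check on Example 12.4, the sketch below substitutes the reported coefficients into the normal equation whose right-hand side is 2246.661, then evaluates the fitted equation at the data row (y, x1, x2, x3) = (26.5, 1.70, 5.30, 8.20) from Table 12.3. All numbers are copied from the example; the variable names are illustrative assumptions.

```python
import numpy as np

# Reported coefficients b0, b1, b2, b3 from Example 12.4.
b = np.array([39.1574, 1.0161, -1.8616, -0.3433])

# Row of X'X and element of X'y for the normal equation associated with b2.
row_x2 = np.array([81.82, 360.6621, 576.7264, 728.3100])
g_x2 = 2246.661

lhs = row_x2 @ b      # left-hand side of that normal equation
print(lhs, g_x2)      # the two values should agree up to rounding of b

# Fitted value and residual at one observed row of Table 12.3.
x_obs = np.array([1.0, 1.70, 5.30, 8.20])   # leading 1 multiplies b0
y_obs = 26.5
y_hat = x_obs @ b
residual = y_obs - y_hat
print(round(y_hat, 2), round(residual, 2))
```

The left-hand side differs from 2246.661 only in the third decimal place, which is consistent with the coefficients having been rounded to four decimals. The fitted value for this row is roughly 28.2, so the model overpredicts this observation by about 1.7 percentage points.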
Exercises
12.1 A set of experimental runs was made to determine a way of predicting cooking time y at various values of oven width x1 and flue temperature x2. The coded data were recorded as follows:

y        x1      x2
6.40     1.32    1.15
15.05    2.69    3.40
18.75    3.56    4.10
30.25    4.41    8.75
44.85    5.35    14.82
48.94    6.20    15.15
51.55    7.12    15.32
61.50    8.87    18.18
…        10.65   40.40

Estimate the multiple linear regression equation
$$\mu_{Y|x_1,x_2} = \beta_0 + \beta_1 x_1 + \beta_2 x_2.$$

12.2 In Applied Spectroscopy, the infrared reflectance spectra properties of a viscous liquid used in the electronics industry as a lubricant were studied. The designed experiment consisted of the effect of band frequency x1 and film thickness x2 on optical density y using a Perkin-Elmer Model 621 infrared spectrometer.
(Source: Pacansky, J., England, C. D., and Wattman,

Estimate the multiple linear regression equation
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2.$$

12.3 Suppose in Review Exercise 11.53 on page 437 that we were also given the number of class periods missed by the 12 students taking the chemistry course. The complete data are shown.

Student   Chemistry Grade, y   Test Score, x1   Classes Missed, x2
3         76                   55               5
4         90                   65               2
5         85                   55               6
6         87                   70               3
7         94                   65               2
8         98                   70               5
9         81                   55               4
10        91                   70               3
11        76                   50               1

(a) Fit a multiple linear regression equation of the form
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2.$$
(b) Estimate the chemistry grade for a student who has an intelligence test score of 60 and missed 4 classes.

12.4 An experiment was conducted to determine if the weight of an animal can be predicted after a given period of time on the basis of the initial weight of the animal and the amount of feed that was eaten. The following data, measured in kilograms, were recorded:

Final Weight, y   Initial Weight, x1   Feed Weight, x2

(a) Fit a multiple regression equation of the form
$$\mu_{Y|x_1,x_2} = \beta_0 + \beta_1 x_1 + \beta_2 x_2.$$
(b) Predict the final weight of an animal having an initial weight of 35 kilograms that is given 250 kilograms of feed.

12.5 The electric power consumed each month by a chemical plant is thought to be related to the average ambient temperature x1, the number of days in the month x2, the average product purity x3, and the tons of product produced x4. The past year's historical data are available and are presented in the following table.

(a) Fit a multiple linear regression model using the above data set.
(b) Predict power consumption for a month in which x1 = 75°F, x2 = 24 days, x3 = 90, and x4 = 98 tons.
12.6 An experiment was conducted on a new model of a particular make of automobile to determine the stopping distance at various speeds. The following data were recorded.

Speed, v (km/hr)
Stopping Distance, d (m)   16   26   41   62   88   119

(a) Fit a multiple regression curve of the form $\mu_{D|v} = \beta_0 + \beta_1 v + \beta_2 v^2$.
(b) Estimate the stopping distance when the car is traveling at 70 kilometers per hour.

12.7 An experiment was conducted in order to determine if cerebral blood flow in human beings can be predicted from arterial oxygen tension (millimeters of mercury). Fifteen patients participated in the study, and the following data were collected:

Blood Flow, y   Arterial Oxygen Tension, x

Estimate the quadratic regression equation
$$\mu_{Y|x} = \beta_0 + \beta_1 x + \beta_2 x^2.$$

12.8 The following is a set of coded experimental data on the compressive strength of a particular alloy at various values of the concentration of some additive:

Strength, y

(a) Estimate the quadratic regression equation $\mu_{Y|x} = \beta_0 + \beta_1 x + \beta_2 x^2$.
(b) Test for lack of fit of the model.

12.9 (a) Fit a multiple regression equation of the form $\mu_{Y|x} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ to the data of Example 11.8 on page 420.
(b) Estimate the yield of the chemical reaction for a temperature of 225°C.

12.10 The following data are given:

1   4   5   3   2   3   4

(a) Fit the cubic model $\mu_{Y|x} = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$.
(b) Predict Y when x = 2.
12.11 An experiment was conducted to study the size of squid eaten by sharks and tuna. The regressor variables are characteristics of the beaks of the squid. The data are given as follows:

x1   x2   x3   x4   x5   y

In the study, the regressor variables and response considered are
x1 = rostral length, in inches,
x2 = wing length, in inches,
x3 = rostral to notch length, in inches,
x4 = notch to wing length, in inches,
x5 = width, in inches,
y = weight, in pounds.
Estimate the multiple linear regression equation
$$\mu_{Y|x_1,x_2,x_3,x_4,x_5} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5.$$

12.12 The following data reflect information from 17 U.S. Naval hospitals at various sites around the world. The regressors are workload variables, that is, items that result in the need for personnel in a hospital. A brief description of the variables is as follows:
y = monthly labor-hours,
x1 = average daily patient load,
x2 = monthly X-ray exposures,
x3 = monthly occupied bed-days,
x4 = eligible population in the area/1000,
x5 = average length of patient's stay, in days.

Site   x1       x2       x3          x4      x5      y
…      …        …        …           43.3    5.62    1854.17
…      …        …        3921.00     103.7   4.88    3741.40
…      …        …        3865.67     126.8   5.50    4026.52
…      …        …        7684.10     157.7   7.00    10,343.81
15     409.20   34,703   12,446.33   169.4   10.75   11,732.17
16     463.70   39,204   14,098.40   331.4   7.05    15,414.94
17     510.22   86,533   15,524.00   371.6   6.35    18,854.45

The goal here is to produce an empirical equation that will estimate (or predict) personnel needs for Naval hospitals. Estimate the multiple linear regression equation
$$\mu_{Y|x_1,x_2,x_3,x_4,x_5} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5.$$

12.13 A study was performed on a type of bearing to find the relationship of amount of wear y to x1 = oil viscosity and x2 = load. The following data were obtained. (From Response Surface Methodology, Myers, Montgomery, and Anderson-Cook, 2009.)

(a) Estimate the unknown parameters of the multiple linear regression equation
$$\mu_{Y|x_1,x_2} = \beta_0 + \beta_1 x_1 + \beta_2 x_2.$$
(b) Predict wear when oil viscosity is 20 and load is

12.14 Eleven student teachers took part in an evaluation program designed to measure teacher effectiveness and determine what factors are important. The response measure was a quantitative evaluation of the teacher. The regressor variables were scores on four standardized tests given to each teacher. The data are as follows:

y   x1   x2   x3   x4

Estimate the multiple linear regression equation
$$\mu_{Y|x_1,x_2,x_3,x_4} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4.$$

12.15 The personnel department of a certain industrial firm used 12 subjects in a study to determine the relationship between job performance rating (y) and scores on four tests. The data are as follows:

y      x1     x2     x3     x4
11.2   56.5   71.0   38.5   43.0
19.3   81.2   84.0   47.5   60.2
24.5   88.0   86.2   47.4   62.0

Estimate the regression coefficients in the model
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4.$$

12.16 An engineer at a semiconductor company wants to model the relationship between the gain or hFE of a device (y) and three parameters: emitter-RS (x1), base-RS (x2), and emitter-to-base-RS (x3). The data are shown below:

x1, Emitter-RS   x2, Base-RS   x3, E-B-RS   y, hFE

(Data from Myers, Montgomery, and Anderson-Cook,

(a) Fit a multiple linear regression to the data.
(b) Predict hFE when x1 = 14, x2 = 220, and x3 = 5.