Chapter 12 Multiple Linear Regression and Certain Nonlinear Regression Models

12.3 Linear Regression Model Using Matrices

In fitting a multiple linear regression model, particularly when the number of variables exceeds two, a knowledge of matrix theory can facilitate the mathematical manipulations considerably. Suppose that the experimenter has $k$ independent variables $x_1, x_2, \ldots, x_k$ and $n$ observations $y_1, y_2, \ldots, y_n$, each of which can be expressed by the equation

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \epsilon_i.$$

  This model essentially represents n equations describing how the response values are generated in the scientific process. Using matrix notation, we can write the following equation:

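General Linear Model

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon},$$

where

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} 1 & x_{11} & x_{21} & \cdots & x_{k1} \\ 1 & x_{12} & x_{22} & \cdots & x_{k2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{1n} & x_{2n} & \cdots & x_{kn} \end{bmatrix}, \quad \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad \boldsymbol{\epsilon} = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}.$$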

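Then the least squares method for estimation of $\boldsymbol{\beta}$, illustrated in Section 12.2, involves finding $\mathbf{b}$ for which

$$SSE = (\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b})$$

is minimized. This minimization process involves solving for $\mathbf{b}$ in the equation

$$\frac{\partial (SSE)}{\partial \mathbf{b}} = \mathbf{0}.$$

We will not present the details regarding solution of the equations above. The result reduces to the solution of $\mathbf{b}$ in

$$(\mathbf{X}'\mathbf{X})\mathbf{b} = \mathbf{X}'\mathbf{y}.$$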
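For readers who want the omitted step, here is a brief sketch. It uses two standard matrix-calculus identities, $\partial(\mathbf{b}'\mathbf{c})/\partial\mathbf{b} = \mathbf{c}$ and $\partial(\mathbf{b}'\mathbf{M}\mathbf{b})/\partial\mathbf{b} = 2\mathbf{M}\mathbf{b}$ for a symmetric matrix $\mathbf{M}$, together with the fact that the scalar $\mathbf{y}'\mathbf{X}\mathbf{b}$ equals its transpose $\mathbf{b}'\mathbf{X}'\mathbf{y}$:

$$SSE = \mathbf{y}'\mathbf{y} - 2\mathbf{b}'\mathbf{X}'\mathbf{y} + \mathbf{b}'\mathbf{X}'\mathbf{X}\mathbf{b},$$

$$\frac{\partial (SSE)}{\partial \mathbf{b}} = -2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{0} \quad\Longrightarrow\quad (\mathbf{X}'\mathbf{X})\mathbf{b} = \mathbf{X}'\mathbf{y}.$$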

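Notice the nature of the $\mathbf{X}$ matrix. Apart from the initial element, the $i$th row represents the $x$-values that give rise to the response $y_i$. Writing

$$\mathbf{A} = \mathbf{X}'\mathbf{X} = \begin{bmatrix} n & \sum_{i=1}^{n} x_{1i} & \cdots & \sum_{i=1}^{n} x_{ki} \\ \sum_{i=1}^{n} x_{1i} & \sum_{i=1}^{n} x_{1i}^2 & \cdots & \sum_{i=1}^{n} x_{1i}x_{ki} \\ \vdots & \vdots & & \vdots \\ \sum_{i=1}^{n} x_{ki} & \sum_{i=1}^{n} x_{ki}x_{1i} & \cdots & \sum_{i=1}^{n} x_{ki}^2 \end{bmatrix} \quad\text{and}\quad \mathbf{g} = \mathbf{X}'\mathbf{y} = \begin{bmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_{1i} y_i \\ \vdots \\ \sum_{i=1}^{n} x_{ki} y_i \end{bmatrix}$$

allows the normal equations to be put in the matrix form

$$\mathbf{A}\mathbf{b} = \mathbf{g}.$$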
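To make the structure of $\mathbf{A}$ and $\mathbf{g}$ concrete, the following Python sketch (assuming NumPy is available; the function name `normal_equation_parts` and the four-point data set are hypothetical, chosen only for illustration) builds $\mathbf{X}$ with its leading column of ones and forms both products. The first row of $\mathbf{A}$ is $n$ followed by the sums $\sum x_{1i}, \ldots, \sum x_{ki}$.

```python
import numpy as np


def normal_equation_parts(x_columns, y):
    """Return X, A = X'X and g = X'y for the model y = X b + error.

    x_columns is a list of k sequences, one per independent variable,
    each holding the n observed values x_j1, ..., x_jn.
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    # Leading column of ones multiplies b0; column j holds the values of x_j.
    X = np.column_stack([np.ones(n)] + [np.asarray(c, dtype=float) for c in x_columns])
    A = X.T @ X  # (k+1) x (k+1) symmetric matrix of sums and cross-products
    g = X.T @ y  # sums: sum y_i, sum x_1i y_i, ..., sum x_ki y_i
    return X, A, g


# Hypothetical data: n = 4 observations on k = 2 variables.
X, A, g = normal_equation_parts([[1, 2, 3, 4], [2, 1, 4, 3]], [3, 4, 8, 9])
print(A)  # first row: n = 4, sum of x1 = 10, sum of x2 = 10
print(g)  # [24. 71. 69.]
```

For this small data set, $\mathbf{A}$ has rows $(4, 10, 10)$, $(10, 30, 28)$, $(10, 28, 30)$, and it is symmetric, as every $\mathbf{X}'\mathbf{X}$ must be.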


If the matrix $\mathbf{A}$ is nonsingular, we can write the solution for the regression coefficients as

$$\mathbf{b} = \mathbf{A}^{-1}\mathbf{g} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}.$$

Thus, we can obtain the prediction equation or regression equation by solving a set of $k + 1$ equations in a like number of unknowns. This involves the inversion of the $(k + 1) \times (k + 1)$ matrix $\mathbf{X}'\mathbf{X}$. Techniques for inverting this matrix are explained in most textbooks on elementary determinants and matrices. Of course, there are many high-speed computer packages available for multiple regression problems, packages that not only print out estimates of the regression coefficients but also provide other information relevant to making inferences concerning the regression equation.
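As a sketch of one way to carry out this computation in software, the Python code below (assuming NumPy; the helper `least_squares_coefficients` and the synthetic data are hypothetical) solves $(\mathbf{X}'\mathbf{X})\mathbf{b} = \mathbf{X}'\mathbf{y}$ with `np.linalg.solve` rather than forming $(\mathbf{X}'\mathbf{X})^{-1}$ explicitly, cross-checks the result against `np.linalg.lstsq`, and confirms that $\mathbf{X}'(\mathbf{y} - \mathbf{X}\mathbf{b}) = \mathbf{0}$ at the solution. It ends by showing that when one column of $\mathbf{X}$ is a linear combination of others, $\mathbf{X}'\mathbf{X}$ loses rank, so its inverse does not exist.

```python
import numpy as np


def least_squares_coefficients(X, y):
    """Solve the normal equations (X'X) b = X'y for b.

    np.linalg.solve factors X'X instead of forming (X'X)^{-1} explicitly,
    which is cheaper and usually more accurate than inverting.
    """
    A = X.T @ X  # the (k+1) x (k+1) matrix X'X
    g = X.T @ y  # the vector X'y
    return np.linalg.solve(A, g)


# Synthetic (hypothetical) data lying exactly on y = 2 + 3*x1 - x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
y = 2 + 3 * x1 - x2
X = np.column_stack([np.ones_like(x1), x1, x2])

b = least_squares_coefficients(X, y)
print(b)  # approximately [ 2.  3. -1.]

# Cross-check with NumPy's least squares routine, which works on X directly.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(b, b_lstsq))  # True

# At the solution the gradient of SSE, -2 X'(y - Xb), vanishes up to rounding.
print(np.allclose(X.T @ (y - X @ b), 0.0))  # True

# If a column of X is a linear combination of others (here 2*x1 is appended),
# X'X is singular: its rank falls below its number of columns.
X_bad = np.column_stack([X, 2 * x1])
print(np.linalg.matrix_rank(X_bad.T @ X_bad), "of", X_bad.shape[1])  # 3 of 4
```

Solving the linear system is generally preferred to explicit inversion for numerical accuracy; the inverse itself is still useful when its individual elements are needed, as in Example 12.4.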

Example 12.4: The percent survival rate of sperm in a certain type of animal semen, after storage, was measured at various combinations of concentrations of three materials used to increase the chance of survival. The data are given in Table 12.3. Estimate the multiple linear regression model for the given data.

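Table 12.3: Data for Example 12.4

y (% survival)   x1 (weight %)   x2 (weight %)   x3 (weight %)
26.5             1.70            5.30            8.20
⋮                ⋮               ⋮               ⋮

Solution: The least squares estimating equations, $(\mathbf{X}'\mathbf{X})\mathbf{b} = \mathbf{X}'\mathbf{y}$, are

$$\begin{bmatrix} \ldots & \ldots & 81.82 & \ldots \\ \ldots & \ldots & 360.6621 & 522.0780 \\ 81.82 & 360.6621 & 576.7264 & 728.3100 \\ \ldots & 522.0780 & 728.3100 & \ldots \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} \ldots \\ 1877.567 \\ 2246.661 \\ 3337.780 \end{bmatrix}.$$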

From a computer readout we obtain the elements of the inverse matrix

$$(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} 8.0648 & -0.0826 & -0.0942 & -0.7905 \\ -0.0826 & \ldots & \ldots & 0.0037 \\ -0.0942 & \ldots & 0.0166 & -0.0021 \\ -0.7905 & 0.0037 & -0.0021 & 0.0886 \end{bmatrix},$$

and then, using the relation $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$, the estimated regression coefficients are obtained as

$$b_0 = 39.1574, \quad b_1 = 1.0161, \quad b_2 = -1.8616, \quad b_3 = -0.3433.$$

  Hence, our estimated regression equation is

$$\hat{y} = 39.1574 + 1.0161x_1 - 1.8616x_2 - 0.3433x_3.$$
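The estimated equation can be used directly to compute fitted values and residuals. The short sketch below (plain Python; the helper `predict` is hypothetical) evaluates $\hat{y}$ at the row of Table 12.3 given above, $x_1 = 1.70$, $x_2 = 5.30$, $x_3 = 8.20$, where the observed survival is $y = 26.5$.

```python
# Estimated coefficients from Example 12.4: b0, b1, b2, b3.
b = [39.1574, 1.0161, -1.8616, -0.3433]


def predict(coefs, x):
    """Evaluate y-hat = b0 + b1*x1 + ... + bk*xk at one point x = (x1, ..., xk)."""
    return coefs[0] + sum(bj * xj for bj, xj in zip(coefs[1:], x))


# The row of Table 12.3 shown above: y = 26.5 at x1 = 1.70, x2 = 5.30, x3 = 8.20.
y_obs = 26.5
y_hat = predict(b, [1.70, 5.30, 8.20])
residual = y_obs - y_hat
print(round(y_hat, 4), round(residual, 4))  # 28.2032 -1.7032
```

The fitted survival at this combination is about 28.20%, so the model overestimates this particular observation by about 1.70 percentage points.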

  Exercises

12.1 A set of experimental runs was made to determine a way of predicting cooking time $y$ at various values of oven width $x_1$ and flue temperature $x_2$. The coded data were recorded as follows:

y        x1       x2
6.40     1.32     1.15
15.05    2.69     3.40
18.75    3.56     4.10
30.25    4.41     8.75
44.85    5.35     14.82
48.94    6.20     15.15
51.55    7.12     15.32
61.50    8.87     18.18
⋮        ⋮        ⋮
…        10.65    40.40

Estimate the multiple linear regression equation

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2.$$

12.2 In Applied Spectroscopy, the infrared reflectance spectra properties of a viscous liquid used in the electronics industry as a lubricant were studied. The designed experiment consisted of the effect of band frequency $x_1$ and film thickness $x_2$ on optical density $y$ using a Perkin-Elmer Model 621 infrared spectrometer. (Source: Pacansky, J., England, C. D., and Wattman, …)

[Data table: y, x1, x2]

Estimate the multiple linear regression equation

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2.$$

12.3 Suppose in Review Exercise 11.53 on page 437 that we were also given the number of class periods missed by the 12 students taking the chemistry course. The complete data are shown.

Student   Chemistry Grade, y   Test Score, x1   Classes Missed, x2
⋮         ⋮                    ⋮                ⋮
3         76                   55               5
4         90                   65               2
5         85                   55               6
6         87                   70               3
7         94                   65               2
8         98                   70               5
9         81                   55               4
10        91                   70               3
11        76                   50               1
⋮         ⋮                    ⋮                ⋮

(a) Fit a multiple linear regression equation of the form $\hat{y} = b_0 + b_1 x_1 + b_2 x_2$.
(b) Estimate the chemistry grade for a student who has an intelligence test score of 60 and missed 4 classes.

12.4 An experiment was conducted to determine if the weight of an animal can be predicted after a given period of time on the basis of the initial weight of the animal and the amount of feed that was eaten. The following data, measured in kilograms, were recorded:

[Data table: Final Weight, y; Initial Weight, x1; Feed Weight, x2]

(a) Fit a multiple regression equation of the form

$$\mu_{Y|x_1,x_2} = \beta_0 + \beta_1 x_1 + \beta_2 x_2.$$

(b) Predict the final weight of an animal having an initial weight of 35 kilograms that is given 250 kilograms of feed.

12.5 The electric power consumed each month by a chemical plant is thought to be related to the average ambient temperature $x_1$, the number of days in the month $x_2$, the average product purity $x_3$, and the tons of product produced $x_4$. The past year's historical data are available and are presented in the following table.

[Data table: y, x1, x2, x3, x4]

(a) Fit a multiple linear regression model using the above data set.
(b) Predict power consumption for a month in which $x_1 = 75^\circ$F, $x_2 = 24$ days, $x_3 = 90$, and $x_4 = 98$ tons.

12.6 An experiment was conducted on a new model of a particular make of automobile to determine the stopping distance at various speeds. The following data were recorded.

Speed, v (km/hr)            …    …    …    …    …    …
Stopping Distance, d (m)    16   26   41   62   88   119

(a) Fit a multiple regression curve of the form $\mu_{D|v} = \beta_0 + \beta_1 v + \beta_2 v^2$.
(b) Estimate the stopping distance when the car is traveling at 70 kilometers per hour.
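Although the curve in Exercise 12.6 is quadratic in $v$, it is linear in $\beta_0$, $\beta_1$, $\beta_2$, so the matrix approach of this section applies unchanged once $\mathbf{X}$ has columns $1$, $v$, $v^2$. The Python sketch below (assuming NumPy; the helper name is hypothetical, and the speeds and distances are made up for illustration, not the data of the exercise) builds that matrix with `np.vander` and solves the normal equations.

```python
import numpy as np


def polynomial_design_matrix(x, degree):
    """Return the n x (degree+1) matrix with columns 1, x, x^2, ..., x^degree."""
    return np.vander(np.asarray(x, dtype=float), N=degree + 1, increasing=True)


# Made-up speeds (km/hr) and distances (m) lying exactly on d = 1 + 0.2 v + 0.01 v^2;
# these are NOT the data of Exercise 12.6.
v = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
d = 1 + 0.2 * v + 0.01 * v**2

X = polynomial_design_matrix(v, degree=2)
b = np.linalg.solve(X.T @ X, X.T @ d)  # normal equations (X'X) b = X'd
print(b)  # approximately [1.   0.2  0.01]

# A prediction at a new speed uses the same column layout.
d_70 = polynomial_design_matrix([70.0], degree=2) @ b
print(d_70)  # approximately [64.]
```

The same construction with columns $1, x, x^2, x^3$ handles a cubic model; only the `degree` argument changes.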

12.7 An experiment was conducted in order to determine if cerebral blood flow in human beings can be predicted from arterial oxygen tension (millimeters of mercury). Fifteen patients participated in the study, and the following data were collected:

[Data table: Blood Flow, y; Arterial Oxygen Tension, x]

Estimate the quadratic regression equation

$$\mu_{Y|x} = \beta_0 + \beta_1 x + \beta_2 x^2.$$

12.8 The following is a set of coded experimental data on the compressive strength of a particular alloy at various values of the concentration of some additive:

[Data table: Concentration, x; Strength, y]

(a) Estimate the quadratic regression equation $\mu_{Y|x} = \beta_0 + \beta_1 x + \beta_2 x^2$.
(b) Test for lack of fit of the model.
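A lack-of-fit test requires repeated observations at some of the $x$ values. The error sum of squares is split into a pure-error part (replicates about their own group means, with $n - m$ degrees of freedom for $m$ distinct $x$ values) and a lack-of-fit part (the remainder, with $m - p$ degrees of freedom for $p$ model parameters), and $f = [SS_{LOF}/(m-p)]/[SS_{PE}/(n-m)]$ is compared with the $F$ distribution at those degrees of freedom. The following is a minimal Python sketch, assuming NumPy; the helper `lack_of_fit_test` and the data are hypothetical.

```python
import numpy as np


def lack_of_fit_test(x, y, degree):
    """Split SSE of a polynomial fit into lack-of-fit and pure-error parts.

    Returns (ss_lof, df_lof, ss_pe, df_pe, f_stat). f_stat should be compared
    with an F critical value at (df_lof, df_pe) degrees of freedom.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.vander(x, N=degree + 1, increasing=True)  # columns 1, x, ..., x^degree
    b = np.linalg.solve(X.T @ X, X.T @ y)
    sse = float(np.sum((y - X @ b) ** 2))

    levels = np.unique(x)
    n, m, p = len(y), len(levels), degree + 1
    if m <= p or n <= m:
        raise ValueError("need more distinct x values than parameters, and some replicates")

    # Pure error: spread of the replicates about their own group mean.
    ss_pe = sum(float(np.sum((y[x == lv] - y[x == lv].mean()) ** 2)) for lv in levels)
    ss_lof = sse - ss_pe
    df_lof, df_pe = m - p, n - m
    f_stat = (ss_lof / df_lof) / (ss_pe / df_pe)
    return ss_lof, df_lof, ss_pe, df_pe, f_stat


# Hypothetical duplicates at five levels whose means lie exactly on
# y = 10 + 2x - 0.3x^2, so a quadratic should show no lack of fit.
x = np.repeat([1.0, 2.0, 3.0, 4.0, 5.0], 2)
y = 10 + 2 * x - 0.3 * x**2 + np.tile([0.5, -0.5], 5)
ss_lof, df_lof, ss_pe, df_pe, f_stat = lack_of_fit_test(x, y, degree=2)
print(df_lof, df_pe, ss_pe, f_stat)  # 2 and 5 df; SS_PE about 2.5; f essentially 0
```

Because these hypothetical group means lie exactly on a quadratic, the lack-of-fit sum of squares is zero up to rounding; with real data, $f$ would be judged against a tabled critical value.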

12.9 (a) Fit a multiple regression equation of the form $\mu_{Y|x} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ to the data of Example 11.8 on page 420.
(b) Estimate the yield of the chemical reaction for a temperature of $225^\circ$C.

12.10 The following data are given:

x    …   …   …   …   …   …   …
y    1   4   5   3   2   3   4

(a) Fit the cubic model $\mu_{Y|x} = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$.
(b) Predict $Y$ when $x = 2$.

12.11 An experiment was conducted to study the size of squid eaten by sharks and tuna. The regressor variables are characteristics of the beaks of the squid. The data are given as follows:

[Data table: x1, x2, x3, x4, x5, y]

In the study, the regressor variables and response considered are

x1 = rostral length, in inches,
x2 = wing length, in inches,
x3 = rostral to notch length, in inches,
x4 = notch to wing length, in inches,
x5 = width, in inches,
y = weight, in pounds.

Estimate the multiple linear regression equation

$$\mu_{Y|x_1,x_2,x_3,x_4,x_5} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5.$$

12.12 The following data reflect information from 17 U.S. Naval hospitals at various sites around the world. The regressors are workload variables, that is, items that result in the need for personnel in a hospital. A brief description of the variables is as follows:

y = monthly labor-hours,
x1 = average daily patient load,
x2 = monthly X-ray exposures,
x3 = monthly occupied bed-days,
x4 = eligible population in the area/1000,
x5 = average length of patient's stay, in days.

Site   x1       x2       x3          x4      x5      y
⋮      ⋮        ⋮        ⋮           ⋮       ⋮       ⋮
…      …        …        …           43.3    5.62    1854.17
…      …        …        3921.00     103.7   4.88    3741.40
…      …        …        3865.67     126.8   5.50    4026.52
…      …        …        7684.10     157.7   7.00    10,343.81
⋮      ⋮        ⋮        ⋮           ⋮       ⋮       ⋮
15     409.20   34,703   12,446.33   169.4   10.75   11,732.17
16     463.70   39,204   14,098.40   331.4   7.05    15,414.94
17     510.22   86,533   15,524.00   371.6   6.35    18,854.45

The goal here is to produce an empirical equation that will estimate (or predict) personnel needs for Naval hospitals. Estimate the multiple linear regression equation

$$\mu_{Y|x_1,x_2,x_3,x_4,x_5} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5.$$

12.13 A study was performed on a type of bearing to find the relationship of amount of wear $y$ to $x_1$ = oil viscosity and $x_2$ = load. The following data were obtained. (From Response Surface Methodology, Myers, Montgomery, and Anderson-Cook, 2009.)

[Data table: y, x1, x2]

(a) Estimate the unknown parameters of the multiple linear regression equation

$$\mu_{Y|x_1,x_2} = \beta_0 + \beta_1 x_1 + \beta_2 x_2.$$

(b) Predict wear when oil viscosity is 20 and load is …

12.14 Eleven student teachers took part in an evaluation program designed to measure teacher effectiveness and determine what factors are important. The response measure was a quantitative evaluation of the teacher. The regressor variables were scores on four standardized tests given to each teacher. The data are as follows:

[Data table: y, x1, x2, x3, x4]

Estimate the multiple linear regression equation

$$\mu_{Y|x_1,x_2,x_3,x_4} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4.$$

12.15 The personnel department of a certain industrial firm used 12 subjects in a study to determine the relationship between job performance rating ($y$) and scores on four tests. The data are as follows:

y      x1     x2     x3     x4
11.2   56.5   71.0   38.5   43.0
⋮      ⋮      ⋮      ⋮      ⋮
19.3   81.2   84.0   47.5   60.2
24.5   88.0   86.2   47.4   62.0
⋮      ⋮      ⋮      ⋮      ⋮

Estimate the regression coefficients in the model

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4.$$

12.16 An engineer at a semiconductor company wants to model the relationship between the gain or hFE of a device ($y$) and three parameters: emitter-RS ($x_1$), base-RS ($x_2$), and emitter-to-base-RS ($x_3$). The data are shown below:

[Data table: x1, Emitter-RS; x2, Base-RS; x3, E-B-RS; y, hFE]

(Data from Myers, Montgomery, and Anderson-Cook, …)

(a) Fit a multiple linear regression to the data.
(b) Predict hFE when $x_1 = 14$, $x_2 = 220$, and $x_3 = 5$.