
12.7 Special Case of Orthogonality (Optional)

In our original development of the general linear regression problem, the assumption was made that the independent variables are measured without error and are often controlled by the experimenter. Quite often they occur as a result of an elaborately designed experiment. In fact, we can increase the effectiveness of the resulting prediction equation with the use of a suitable experimental plan.

Suppose that we once again consider the X matrix as defined in Section 12.3. We can rewrite it as

    X = [1, x_1, x_2, . . . , x_k],

where 1 represents a column of ones and x_j is a column vector representing the levels of x_j. If

    x_p' x_q = 0,   p ≠ q,

the variables x_p and x_q are said to be orthogonal to each other. There are certain obvious advantages to having a completely orthogonal situation, where x_p' x_q = 0 for p ≠ q and, in addition,

    Σ_{i=1}^{n} x_{ji} = 0,   j = 1, 2, . . . , k.

The resulting X'X is a diagonal matrix, and the normal equations in Section 12.3 reduce to

    n b_0 = Σ_{i=1}^{n} y_i,   b_1 Σ_{i=1}^{n} x_{1i}² = Σ_{i=1}^{n} x_{1i} y_i,   . . . ,   b_k Σ_{i=1}^{n} x_{ki}² = Σ_{i=1}^{n} x_{ki} y_i.
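To make the decoupling concrete, here is a minimal sketch in Python (the ±1 design columns and the response values are hypothetical, not data from the text) showing that, once the orthogonality conditions hold, each coefficient is obtained from its own normal equation alone:

```python
# Sketch: normal equations decouple under orthogonality.
# The 2^3 design columns and responses below are illustrative only.

n = 8
x1 = [-1, 1, -1, 1, -1, 1, -1, 1]
x2 = [-1, -1, 1, 1, -1, -1, 1, 1]
x3 = [-1, -1, -1, -1, 1, 1, 1, 1]
y  = [3.0, 5.0, 4.0, 6.0, 7.0, 9.0, 8.0, 10.0]   # hypothetical responses

# Orthogonality: x_p' x_q = 0 for p != q, and each column sums to zero.
cols = [x1, x2, x3]
for p in range(3):
    assert sum(cols[p]) == 0
    for q in range(p + 1, 3):
        assert sum(a * b for a, b in zip(cols[p], cols[q])) == 0

# Each normal equation involves only one coefficient:
#   n b0 = sum(y_i),   b_j sum(x_ji^2) = sum(x_ji y_i)
b0 = sum(y) / n
b = [sum(a * c for a, c in zip(xj, y)) / sum(a * a for a in xj)
     for xj in cols]
```

No matrix inversion is needed: because X'X is diagonal, each b_j depends only on its own column of X and on y.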

An important advantage is that one is easily able to partition SSR into single- degree-of-freedom components, each of which corresponds to the amount of variation in Y accounted for by a given controlled variable. In the orthogonal situation, we can write

    SSR = Σ_{i=1}^{n} (ŷ_i − ȳ)² = Σ_{i=1}^{n} (b_0 + b_1 x_{1i} + ··· + b_k x_{ki} − b_0)²
        = b_1² Σ_{i=1}^{n} x_{1i}² + b_2² Σ_{i=1}^{n} x_{2i}² + ··· + b_k² Σ_{i=1}^{n} x_{ki}²
        = R(β_1) + R(β_2) + ··· + R(β_k).
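This partition is easy to verify numerically. The sketch below (Python, with a hypothetical ±1 design and hypothetical responses, not data from the text) checks that SSR computed from the fitted values equals the sum of the single-degree-of-freedom components b_j² Σ x_{ji}²:

```python
# Verify SSR = b1^2*sum(x1^2) + b2^2*sum(x2^2) + b3^2*sum(x3^2)
# for an orthogonal design. Design and responses are illustrative only.

n = 8
x1 = [-1, 1, -1, 1, -1, 1, -1, 1]
x2 = [-1, -1, 1, 1, -1, -1, 1, 1]
x3 = [-1, -1, -1, -1, 1, 1, 1, 1]
y  = [3.0, 5.0, 4.0, 6.0, 7.0, 9.0, 8.0, 10.0]

cols = [x1, x2, x3]
b0 = sum(y) / n                     # equals ybar, since the columns sum to zero
b  = [sum(a * c for a, c in zip(xj, y)) / sum(a * a for a in xj) for xj in cols]

# SSR from the fitted values:
yhat = [b0 + sum(bj * xj[i] for bj, xj in zip(b, cols)) for i in range(n)]
ssr = sum((yh - b0) ** 2 for yh in yhat)

# Single-degree-of-freedom components R(beta_j) = b_j^2 * sum(x_ji^2):
r = [bj ** 2 * sum(a * a for a in xj) for bj, xj in zip(b, cols)]

assert abs(ssr - sum(r)) < 1e-9
```

Each term r[j] is the amount of SSR attributable to its own variable, independently of the others.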

The quantity R(β_i) is the amount of the regression sum of squares associated with a model involving a single independent variable x_i. To test simultaneously for the significance of a set of m variables in an orthogonal situation, the regression sum of squares becomes

    R(β_1, β_2, . . . , β_m | β_{m+1}, β_{m+2}, . . . , β_k) = R(β_1) + R(β_2) + ··· + R(β_m),

and thus we have the further simplification

    R(β_1 | β_2, β_3, . . . , β_k) = R(β_1)

when evaluating a single independent variable. Therefore, the contribution of a given variable or set of variables is essentially found by ignoring the other variables in the model. Independent evaluations of the worth of the individual variables are accomplished using analysis-of-variance techniques, as given in Table 12.4. The total variation in the response is partitioned into single-degree-of-freedom components plus the error term with n − k − 1 degrees of freedom. Each computed f-value is used to test one of the hypotheses

    H_0: β_i = 0,   H_1: β_i ≠ 0,   i = 1, 2, . . . , k,

by comparing with the critical point f_α(1, n − k − 1) or merely interpreting the P-value computed from the f-distribution.
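The whole single-degree-of-freedom ANOVA can be sketched in a few lines. In the Python sketch below, the design and responses are hypothetical (not data from the text), and the 0.05 critical point f_0.05(1, 4) = 7.71 is taken from a standard F table:

```python
# Sketch of the single-degree-of-freedom ANOVA for an orthogonal +/-1 design.
# Design and responses are hypothetical, not data from the text.

n = 8
x1 = [-1, 1, -1, 1, -1, 1, -1, 1]
x2 = [-1, -1, 1, 1, -1, -1, 1, 1]
x3 = [-1, -1, -1, -1, 1, 1, 1, 1]
y  = [3.0, 5.0, 4.0, 6.0, 7.0, 9.0, 8.0, 12.0]
cols = [x1, x2, x3]
k = len(cols)

ybar = sum(y) / n
b = [sum(a * c for a, c in zip(xj, y)) / sum(a * a for a in xj) for xj in cols]

r   = [bj ** 2 * sum(a * a for a in xj) for bj, xj in zip(b, cols)]  # R(beta_j)
sst = sum((yi - ybar) ** 2 for yi in y)
sse = sst - sum(r)                      # error SS, with n - k - 1 = 4 df
s2  = sse / (n - k - 1)                 # error mean square

# Each H0: beta_j = 0 is tested with f_j = R(beta_j)/s^2 on (1, n-k-1) df;
# f_0.05(1, 4) = 7.71 is the tabled 0.05 critical point.
f = [rj / s2 for rj in r]
significant = [fj > 7.71 for fj in f]
```

Note that the error sum of squares falls out by subtraction, SST − Σ R(β_j), exactly as in the partition of the total variation described above.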


Table 12.4: Analysis of Variance for Orthogonal Variables

    Source of    Sum of                     Degrees of   Mean               Computed
    Variation    Squares                    Freedom      Square             f
    β_1          R(β_1) = b_1² Σ x_{1i}²        1        R(β_1)             R(β_1)/s²
    β_2          R(β_2) = b_2² Σ x_{2i}²        1        R(β_2)             R(β_2)/s²
    ...          ...                          ...        ...                ...
    β_k          R(β_k) = b_k² Σ x_{ki}²        1        R(β_k)             R(β_k)/s²
    Error        SSE                        n − k − 1    s² = SSE/(n−k−1)
    Total        SST = S_yy                 n − 1

Example 12.8: Suppose that a scientist takes experimental data on the radius of a propellant grain Y as a function of powder temperature x 1 , extrusion rate x 2 , and die temperature x 3 . Fit a linear regression model for predicting grain radius, and determine the effectiveness of each variable in the model. The data are given in Table 12.5.

Table 12.5: Data for Example 12.8

    [Data table: eight observations of grain radius y, one at each combination of the two levels of powder temperature (150, 190), extrusion rate (12, 24), and die temperature (220, 250), with the coded levels (−1) and (+1) indicated alongside each setting.]

Solution: Note that each variable is controlled at two levels, and the experiment is composed of the eight possible combinations. The data on the independent variables are coded for convenience by means of the following formulas:

    x_1 = (powder temperature − 170) / 20,
    x_2 = (extrusion rate − 18) / 6,
    x_3 = (die temperature − 235) / 15.

The resulting levels of x_1, x_2, and x_3 take on the values −1 and +1, as indicated in the table of data. This particular experimental design affords the orthogonality that we want to illustrate here. (A more thorough treatment of this type of experimental layout appears in Chapter 15.) The X matrix is

        ⎡ 1  −1  −1  −1 ⎤
        ⎢ 1  +1  −1  −1 ⎥
        ⎢ 1  −1  +1  −1 ⎥
    X = ⎢ 1  +1  +1  −1 ⎥
        ⎢ 1  −1  −1  +1 ⎥
        ⎢ 1  +1  −1  +1 ⎥
        ⎢ 1  −1  +1  +1 ⎥
        ⎣ 1  +1  +1  +1 ⎦

and the orthogonality conditions are readily verified.
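The verification amounts to checking that X'X is 8 times the identity matrix. A minimal Python sketch (the row ordering of the eight sign combinations is illustrative):

```python
from itertools import product

# All eight (+/-1) combinations of a 2^3 design; row order is illustrative.
rows = [(1, a, b, c) for a, b, c in product((-1, 1), repeat=3)]

# X'X: with the leading column of ones, the product is 8 times the identity,
# so every pair of distinct columns is orthogonal and each column of signs
# sums to zero.
xtx = [[sum(row[p] * row[q] for row in rows) for q in range(4)]
       for p in range(4)]

assert xtx == [[8 if p == q else 0 for q in range(4)] for p in range(4)]
```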

We can now compute the coefficients. Since Σ_{i=1}^{8} x_{ji}² = 8 for each coded variable,

    b_0 = (1/8) Σ_{i=1}^{8} y_i = 121.75,          b_1 = (1/8) Σ_{i=1}^{8} x_{1i} y_i = 2.5,
    b_2 = (1/8) Σ_{i=1}^{8} x_{2i} y_i = 14.75,    b_3 = (1/8) Σ_{i=1}^{8} x_{3i} y_i = 21.75,

so in terms of the coded variables, the prediction equation is

    ŷ = 121.75 + 2.5 x_1 + 14.75 x_2 + 21.75 x_3.

The analysis of variance in Table 12.6 shows independent contributions to SSR for each variable. The results, when compared to the f_{0.05}(1, 4) critical point of 7.71, indicate that x_1 does not contribute significantly at the 0.05 level, whereas variables x_2 and x_3 are significant. In this example, the estimate for σ² is 23.1250. As in the single-independent-variable case, it should be pointed out that this estimate does not solely contain experimental error variation unless the postulated model is correct. Otherwise, the estimate is "contaminated" by lack of fit in addition to pure error, and the lack of fit can be separated out only if we obtain multiple experimental observations for the various (x_1, x_2, x_3) combinations.

Table 12.6: Analysis of Variance for Grain Radius Data

    Source of    Sum of      Degrees of   Mean        Computed
    Variation    Squares     Freedom      Square      f           P-Value
    β_1            50.00         1          50.00       2.16      0.2156
    β_2          1740.50         1        1740.50      75.26      0.0010
    β_3          3784.50         1        3784.50     163.65      0.0002
    Error          92.50         4          23.1250
    Total        5667.50         7

Since x 1 is not significant, it can simply be eliminated from the model without altering the effects of the other variables. Note that x 2 and x 3 both impact the grain radius in a positive fashion, with x 3 being the more important factor based on the smallness of its P-value.
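The arithmetic behind this conclusion can be retraced directly from the quantities quoted in the example. The sketch below (Python) uses only the fitted coefficients, the error mean square s² = 23.1250, and the fact that Σ x_{ji}² = 8 for each coded ±1 column; the critical point f_0.05(1, 4) = 7.71 is the tabled value cited above:

```python
# Retrace the Example 12.8 ANOVA from quantities stated in the text.
# With coded +/-1 columns, sum(x_ji^2) = 8, so R(beta_j) = b_j^2 * 8.
b  = {"x1": 2.5, "x2": 14.75, "x3": 21.75}   # fitted coefficients
s2 = 23.1250                                  # error mean square, 4 df

r = {name: bj ** 2 * 8 for name, bj in b.items()}    # single-df sums of squares
f = {name: rj / s2 for name, rj in r.items()}        # computed f-values

# Compare with f_0.05(1, 4) = 7.71: x1 is not significant; x2 and x3 are.
verdict = {name: fj > 7.71 for name, fj in f.items()}
```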


Exercises

12.31 Compute and interpret the coefficient of multiple determination for the variables of Exercise 12.1 on page 450.

12.32 Test whether the regression explained by the model in Exercise 12.1 on page 450 is significant at the 0.01 level of significance.

12.33 Test whether the regression explained by the model in Exercise 12.5 on page 450 is significant at the 0.01 level of significance.

12.34 For the model of Exercise 12.5 on page 450, test the hypothesis

    H_0: β_1 = β_2 = 0,
    H_1: β_1 and β_2 are not both zero.

12.35 Repeat Exercise 12.17 on page 461 using an F-statistic.

12.36 A small experiment was conducted to fit a multiple regression equation relating the yield y to temperature x_1, reaction time x_2, and concentration of one of the reactants x_3. Two levels of each variable were chosen, and measurements corresponding to the coded independent variables were recorded as follows:

    [Data table not fully recovered: eight values of y at coded levels −1 and +1 of x_1, x_2, and x_3; the surviving entries include y = 9.2, 10.3, 10.2, and 12.6.]

(a) Using the coded variables, estimate the multiple linear regression equation

    μ_{Y | x_1, x_2, x_3} = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_3.

(b) Partition SSR, the regression sum of squares, into three single-degree-of-freedom components attributable to x_1, x_2, and x_3, respectively. Show an analysis-of-variance table, indicating significance tests on each variable.

12.37 Consider the electric power data of Exercise 12.5 on page 450. Test H_0: β_1 = β_2 = 0, making use of R(β_1, β_2 | β_3, β_4). Give a P-value, and draw conclusions.

12.38 Consider the data for Exercise 12.36. Compute the following:

    R(β_1 | β_0, β_2, β_3),   R(β_1 | β_0),
    R(β_2 | β_0, β_1, β_3),   R(β_2 | β_0, β_1),
    R(β_3 | β_0, β_1, β_2),   R(β_1, β_2 | β_3).

Comment.

12.39 Consider the data of Exercise 11.55 on page 437. Fit a regression model using weight and drive ratio as explanatory variables. Compare this model with the SLR (simple linear regression) model using weight alone. Use R², R²_adj, and any t-statistics (or F-statistics) you may need to compare the SLR with the multiple regression model.

12.40 Consider Example 12.4. Figure 12.1 on page 459 displays a SAS printout of an analysis of the model containing variables x_1, x_2, and x_3. Focus on the confidence interval of the mean response μ_Y at the (x_1, x_2, x_3) locations representing the 13 data points. Consider an item in the printout indicated by C.V. This is the coefficient of variation, which is defined by

    C.V. = (s / ȳ) · 100,

where s = √s² is the root mean squared error. The coefficient of variation is often used as yet another criterion for comparing competing models. It is a scale-free quantity which expresses the estimate of σ, namely s, as a percent of the average response ȳ. In competition for the "best" among a group of competing models, one strives for the model with a small value of C.V. Do a regression analysis of the data set shown in Example 12.4 but eliminate x_3. Compare the full (x_1, x_2, x_3) model with the restricted (x_1, x_2) model and focus on two criteria: (i) C.V.; (ii) the widths of the confidence intervals on μ_Y. For the second criterion you may want to use the average width. Comment.

12.41 Consider Example 12.3 on page 447. Compare the two competing models

    First order:  y_i = β_0 + β_1 x_{1i} + β_2 x_{2i} + ε_i,
    Second order: y_i = β_0 + β_1 x_{1i} + β_2 x_{2i} + β_{11} x_{1i}² + β_{22} x_{2i}² + β_{12} x_{1i} x_{2i} + ε_i.

Use R²_adj in your comparison. Test H_0: β_{11} = β_{22} = β_{12} = 0. In addition, use the C.V. discussed in Exercise 12.40.

12.42 In Example 12.8, a case is made for eliminating x_1, powder temperature, from the model since the P-value based on the F-test is 0.2156 while the P-values for x_2 and x_3 are near zero.
(a) Reduce the model by eliminating x_1, thereby producing a full and a restricted (or reduced) model, and compare them on the basis of R²_adj.
(b) Compare the full and restricted models using the widths of the 95% prediction intervals on a new observation. The better of the two models is that with the tighter prediction intervals. Use the average of the widths of the prediction intervals.

12.43 Consider the data of Exercise 12.13 on page 452. Can the response, wear, be explained adequately by a single variable (either viscosity or load) in an SLR rather than with the full two-variable regression? Justify your answer thoroughly through tests of hypotheses as well as comparison of the three competing models.

12.44 For the data set given in Exercise 12.16 on page 453, can the response be explained adequately by any two regressor variables? Discuss.
