Test for Linearity of Regression: Data with Repeated Observations

11.9 Test for Linearity of Regression: Data with Repeated Observations

In certain kinds of experimental situations, the researcher has the capability of obtaining repeated observations on the response for each value of x. Although these repetitions are not necessary to estimate $\beta_0$ and $\beta_1$, they enable the experimenter to obtain quantitative information concerning the appropriateness of the model. In fact, if repeated observations are generated, the experimenter can make a significance test to aid in determining whether or not the model is adequate.


Figure 11.14: MINITAB printout of simple linear regression for chemical oxygen demand reduction data; part I. (Values surviving from the printout: regression equation COD = 3.83 + 0.904 Per_Red; Per_Red row 0.90364, 0.05012, 18.03, 0.000; S = 3.22954; R-Sq = 91.3; R-Sq(adj) = 91.0; Residual Error 31.)

Let us select a random sample of n observations using k distinct values of x, say $x_1, x_2, \ldots, x_k$, such that the sample contains $n_1$ observed values of the random variable $Y_1$ corresponding to $x_1$, $n_2$ observed values of $Y_2$ corresponding to $x_2$, ..., $n_k$ observed values of $Y_k$ corresponding to $x_k$. Of necessity, $n = \sum_{i=1}^{k} n_i$.

Chapter 11 Simple Linear Regression and Correlation

Figure 11.15: MINITAB printout of simple linear regression for chemical oxygen demand reduction data; part II.

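We define

$y_{ij}$ = the jth value of the random variable $Y_i$,

$T_{i.} = \sum_{j=1}^{n_i} y_{ij}$ = the total of the observations at $x_i$,

$\bar{y}_{i.} = T_{i.}/n_i$ = the mean of the observations at $x_i$.

Hence, if $n_4 = 3$ measurements of Y were made corresponding to $x = x_4$, we would indicate these observations by $y_{41}$, $y_{42}$, and $y_{43}$. Then $T_{4.} = y_{41} + y_{42} + y_{43}$.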
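The notation above can be illustrated with a small sketch. The data values here are hypothetical and chosen only for illustration; the dictionary names `n_i`, `T_i`, and `ybar_i` are assumptions that mirror $n_i$, $T_{i.}$, and $\bar{y}_{i.}$.

```python
from collections import defaultdict

# Hypothetical data: n = 6 observations taken at k = 3 distinct x values.
x = [1.0, 1.0, 2.0, 2.0, 2.0, 3.0]
y = [2.1, 1.9, 4.2, 3.8, 4.0, 6.1]

# Collect the repeated responses y_ij observed at each distinct x_i.
groups = defaultdict(list)
for xi, yi in zip(x, y):
    groups[xi].append(yi)

n_i = {xv: len(ys) for xv, ys in groups.items()}    # group sizes n_i
T_i = {xv: sum(ys) for xv, ys in groups.items()}    # group totals T_i.
ybar_i = {xv: T_i[xv] / n_i[xv] for xv in groups}   # group means

n = sum(n_i.values())  # n equals the sum of the n_i
k = len(groups)        # number of distinct x values
```

Grouping by the exact value of x assumes the repeated settings are recorded identically; measured x values that differ slightly would need rounding or binning first.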

Concept of Lack of Fit

The error sum of squares consists of two parts: the amount due to the variation between the values of Y within given values of x and a component that is normally called the lack-of-fit contribution. The first component reflects mere random variation, or pure experimental error, while the second component is a measure of the systematic variation brought about by higher-order terms. In our case, these are terms in x other than the linear, or first-order, contribution. Note that in choosing a linear model we are essentially assuming that this second component does not exist and hence our error sum of squares is completely due to random errors. If this should be the case, then $s^2 = SSE/(n-2)$ is an unbiased estimate of $\sigma^2$. However, if the model does not adequately fit the data, then the error sum of squares is inflated and produces a biased estimate of $\sigma^2$. Whether or not the model fits the data, an unbiased estimate of $\sigma^2$ can always be obtained when we have repeated observations simply by computing

$$s_i^2 = \frac{\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i.})^2}{n_i - 1}, \qquad i = 1, 2, \ldots, k,$$

for each of the k distinct values of x, where $\bar{y}_{i.}$ is the mean of the $n_i$ observations at $x_i$, and then pooling these variances to get

$$s^2 = \frac{\sum_{i=1}^{k} (n_i - 1)\, s_i^2}{n - k} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i.})^2}{n - k}.$$

The numerator of $s^2$ is a measure of the pure experimental error. A computational procedure for separating the error sum of squares into the two components representing pure error and lack of fit is as follows:

Computation of Lack-of-Fit Sum of Squares

1. Compute the pure error sum of squares

$$SSE(\text{pure}) = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i.})^2,$$

where $\bar{y}_{i.}$ is the mean of the $n_i$ observations at $x_i$. This sum of squares has n − k degrees of freedom associated with it, and the resulting mean square is our unbiased estimate $s^2$ of $\sigma^2$.

2. Subtract the pure error sum of squares from the error sum of squares SSE, thereby obtaining the sum of squares due to lack of fit. The degrees of freedom for lack of fit are obtained by simple subtraction: (n − 2) − (n − k) = k − 2.
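The two steps above can be sketched in code. This is a minimal illustration using only the Python standard library, not a definitive implementation; the function name `lack_of_fit_partition` and the data are hypothetical, and the line is fitted by ordinary least squares.

```python
from collections import defaultdict

def lack_of_fit_partition(x, y):
    """Split the SSE of a least-squares line into pure error and lack of fit."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # least squares slope
    b0 = ybar - b1 * xbar     # least squares intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

    # Step 1: pure error, the within-group variation at each distinct x.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    k = len(groups)
    sse_pure = 0.0
    for ys in groups.values():
        mean = sum(ys) / len(ys)
        sse_pure += sum((v - mean) ** 2 for v in ys)

    # Step 2: lack of fit is what remains of SSE after removing pure error.
    return {
        "SSE": sse,
        "SSE_pure": sse_pure,
        "SSE_lof": sse - sse_pure,
        "df_pure": n - k,
        "df_lof": k - 2,
    }

# Hypothetical data: responses close to x**2, two replicates at each x.
x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [1.1, 0.9, 4.2, 3.8, 9.1, 8.9, 16.2, 15.8]
parts = lack_of_fit_partition(x, y)

# Ratio of the lack-of-fit mean square to the pure error mean square.
f = (parts["SSE_lof"] / parts["df_lof"]) / (parts["SSE_pure"] / parts["df_pure"])
```

For these made-up data the pure error sum of squares is 0.2 on 4 degrees of freedom and the lack-of-fit sum of squares is 8.0 on 2 degrees of freedom, so f is about 80. That is far above the tabled value $f_{0.05}(2, 4) = 6.94$, as expected when a straight line is fitted to curved data.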

The computations required for testing hypotheses in a regression problem with repeated measurements on the response may be summarized as shown in Table 11.3.

Figures 11.16 and 11.17 display the sample points for the “correct model” and “incorrect model” situations. In Figure 11.16, where the $\mu_{Y|x}$ fall on a straight line, there is no lack of fit when a linear model is assumed, so the sample variation around the regression line is pure error resulting from the variation that occurs among repeated observations. In Figure 11.17, where the $\mu_{Y|x}$ clearly do not fall on a straight line, the lack of fit from erroneously choosing a linear model accounts for a large portion of the variation around the regression line, supplementing the pure error.


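Table 11.3: Analysis of Variance for Testing Linearity of Regression

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | Computed f |
|---|---|---|---|---|
| Regression | SSR | 1 | SSR | SSR / s² |
| Error | SSE | n − 2 | | |
| Lack of fit | SSE − SSE(pure) | k − 2 | [SSE − SSE(pure)] / (k − 2) | [SSE − SSE(pure)] / [s²(k − 2)] |
| Pure error | SSE(pure) | n − k | s² = SSE(pure) / (n − k) | |
| Total | SST | n − 1 | | |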

Figure 11.16: Correct linear model with no lack-of-fit component.

Figure 11.17: Incorrect linear model with lack-of-fit component.

What Is the Importance in Detecting Lack of Fit?

The concept of lack of fit is extremely important in applications of regression analysis. In fact, the need to construct or design an experiment that will account for lack of fit becomes more critical as the problem and the underlying mechanism involved become more complicated. Surely, one cannot always be certain that his or her postulated structure, in this case the linear regression model, is correct or even an adequate representation. The following example shows how the error sum of squares is partitioned into the two components representing pure error and lack of fit. The adequacy of the model is tested at the α-level of significance by comparing the lack-of-fit mean square divided by $s^2$ with $f_\alpha(k-2,\, n-k)$.
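The comparison just described can be written out explicitly. The following is a restatement rather than a new result; it assumes normally distributed errors, and the hypothesis labels $H_0$ and $H_1$ are notation introduced here for illustration.

```latex
\begin{align*}
H_0 &: \text{the regression is linear in } x, \\
H_1 &: \text{the regression is not linear in } x, \\[4pt]
f &= \frac{[SSE - SSE(\text{pure})]/(k-2)}{SSE(\text{pure})/(n-k)}
   = \frac{SSE - SSE(\text{pure})}{s^2\,(k-2)}, \\[4pt]
&\text{reject } H_0 \text{ at level } \alpha \text{ if } f > f_\alpha(k-2,\; n-k).
\end{align*}
```

A large value of f indicates that the variation of the group means around the fitted line exceeds what pure experimental error alone would explain.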

Example 11.8: Observations of the yield of a chemical reaction taken at various temperatures are recorded in Table 11.4. Estimate the linear model $\mu_{Y|x} = \beta_0 + \beta_1 x$ and test for lack of fit.

Solution: Results of the computations are shown in Table 11.5.

Conclusion: The partitioning of the total variation in this manner reveals a significant variation accounted for by the linear model and an insignificant amount of variation due to lack of fit. Thus, the experimental data do not seem to suggest the need to consider terms higher than first order in the model, and the null hypothesis is not rejected.


Table 11.4: Data for Example 11.8

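Table 11.5: Analysis of Variance on Yield-Temperature Data (table body not recoverable; the surviving headers are source of variation, sum of squares, degrees of freedom, computed f, and P-values, with rows for lack of fit and pure error)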

Annotated Computer Printout for Test for Lack of Fit

Figure 11.18 is an annotated computer printout showing analysis of the data of Example 11.8 with SAS. Note the “LOF” with 2 degrees of freedom, representing the quadratic and cubic contribution to the model, and the P-value of 0.22, suggesting that the linear (first-order) model is adequate.

Figure 11.18: SAS printout, showing analysis of data of Example 11.8. (Dependent variable: yield; printout values not recoverable.)

Exercises

11.31 Test for linearity of regression in Exercise 11.3 on page 398. Use a 0.05 level of significance. Comment.

11.32 Test for linearity of regression in Exercise 11.8 on page 399. Comment.

11.33 Suppose we have a linear equation through the origin (Exercise 11.28) $\mu_{Y|x} = \beta x$.

(a) Estimate the regression line passing through the origin for the following data:

(Data table not recoverable.)

(b) Suppose it is not known whether the true regression should pass through the origin. Estimate the linear model $\mu_{Y|x} = \beta_0 + \beta_1 x$ and test the hypothesis that $\beta_0 = 0$, at the 0.10 level of significance, against the alternative that $\beta_0 \neq 0$.

11.34 Use an analysis-of-variance approach to test the hypothesis that $\beta_1 = 0$ against the alternative hypothesis $\beta_1 \neq 0$ in Exercise 11.5 on page 398 at the 0.05 level of significance.

11.35 The following data are a result of an investigation as to the effect of reaction temperature x on percent conversion of a chemical process y. (See Myers, Montgomery and Anderson-Cook, 2009.) Fit a simple linear regression, and use a lack-of-fit test to determine if the model is adequate. Discuss.

(Data table with columns Observation; Temperature (°C), x; Conversion (%), y. Only fragments survive: 43, 78, 69, 73, 78, 65.)

11.36 Transistor gain between emitter and collector in an integrated circuit device (hFE) is related to two variables (Myers, Montgomery and Anderson-Cook, 2009) that can be controlled at the deposition process, emitter drive-in time ($x_1$, in minutes) and emitter dose ($x_2$, in ions × 10^14). Fourteen samples were observed following deposition, and the resulting data are shown in the table below. We will consider linear regression models using gain as the response and emitter drive-in time or emitter dose as the regressor variable.

(Data table with columns Obs.; $x_1$ (drive-in time, min); $x_2$ (dose, ions × 10^14); y (gain, or hFE). Entries not recoverable.)

(a) Determine if emitter drive-in time influences gain in a linear relationship. That is, test $H_0: \beta_1 = 0$, where $\beta_1$ is the slope of the regressor variable.

(b) Do a lack-of-fit test to determine if the linear relationship is adequate. Draw conclusions.

(c) Determine if emitter dose influences gain in a linear relationship. Which regressor variable is the better predictor of gain?

11.37 Organophosphate (OP) compounds are used as pesticides. However, it is important to study their effect on species that are exposed to them. In the laboratory study Some Effects of Organophosphate Pesticides on Wildlife Species, by the Department of Fisheries and Wildlife at Virginia Tech, an experiment was conducted in which different dosages of a particular OP pesticide were administered to 5 groups of 5 mice (peromysius leucopus). The 25 mice were females of similar age and condition. One group received no chemical. The basic response y was a measure of activity in the brain. It was postulated that brain activity would decrease with an increase in OP dosage. The data are as follows:

(Data table with columns Animal; Dose, x (mg/kg body weight); Activity, y. Only one row survives: 11, 4.6, 10.6.)

(a) Using the model $Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, find the least squares estimates of $\beta_0$ and $\beta_1$.

(b) Construct an analysis-of-variance table in which the lack of fit and pure error have been separated. Determine if the lack of fit is significant at the 0.05 level. Interpret the results.

11.38 Heat treating is often used to carburize metal parts such as gears. The thickness of the carburized layer is considered an important feature of the gear, and it contributes to the overall reliability of the part. Because of the critical nature of this feature, a lab test is performed on each furnace load. The test is a destructive one, where an actual part is cross sectioned and soaked in a chemical for a period of time. This test involves running a carbon analysis on the surface of both the gear pitch (top of the gear tooth) and the gear root (between the gear teeth). The data below are the results of the pitch carbon-analysis test for 19

(Data table with columns Soak Time and Pitch, repeated in two panels. Entries not recoverable.)

(a) Fit a simple linear regression relating the pitch carbon analysis y against soak time. Test $H_0: \beta_1 = 0$.

(b) If the hypothesis in part (a) is rejected, determine if the linear model is adequate.

11.39 A regression model is desired relating temperature and the proportion of impurities passing through solid helium. Temperature is listed in degrees centigrade. The data are as follows:

(Data table with columns Temperature (°C) and Proportion of Impurities. Only fragments survive: temperatures −265.0, −270.0, −272.5, −272.6; proportions 0.475, 0.705, 0.935.)

(a) Fit a linear regression model.

(b) Does it appear that the proportion of impurities passing through helium increases as the temperature approaches −273 degrees centigrade?

(d) Based on the information above, does the linear model seem appropriate? What additional information would you need to better answer that ques-

11.40 It is of interest to study the effect of population size in various cities in the United States on ozone concentrations. The data consist of the 1999 population in millions and the amount of ozone present per hour in ppb (parts per billion). The data are as follows.

(Data table including the column Ozone (ppb/hour), y. Entries not recoverable.)

(a) Fit the linear regression model relating ozone concentration to population. Test $H_0: \beta_1 = 0$ using the ANOVA approach.

(b) Do a test for lack of fit. Is the linear model appropriate based on the results of your test?

... square error in the F-test. Do the results change? Comment on the advantage of each test.

11.41 Evaluating nitrogen deposition from the atmosphere is a major role of the National Atmospheric Deposition Program (NADP), a partnership of many agencies. NADP is studying atmospheric deposition and its effect on agricultural crops, forest surface waters, and other resources. Nitrogen oxides may affect the ozone in the atmosphere and the amount of pure nitrogen in the air we breathe. The data are as follows:

(Data table with columns Year and Nitrogen Oxide. Only fragments survive: 2.77, 1984, 4.39.)

(a) Plot the data.

(b) Fit a linear regression model and find $R^2$.

... across time?

11.42 For a particular variety of plant, researchers wanted to develop a formula for predicting the quantity of seeds (in grams) as a function of the density of plants. They conducted a study with four levels of the factor x, the number of plants per plot. Four replications were used for each level of x. The data are shown as follows:

(Data table with columns Plants per Plot, x and Quantity of Seeds, y. Entries not recoverable.)

Is a simple linear regression model adequate for analyzing this data set?
