
11.5 Inferences Concerning the Regression Coefficients

Aside from merely estimating the linear relationship between x and Y for purposes of prediction, the experimenter may also be interested in drawing certain inferences about the slope and intercept. In order to allow for the testing of hypotheses and the construction of confidence intervals on β_0 and β_1, one must be willing to make the further assumption that each ε_i, i = 1, 2, ..., n, is normally distributed. This assumption implies that Y_1, Y_2, ..., Y_n are also normally distributed, each with probability distribution n(y_i; β_0 + β_1 x_i, σ).

From Section 11.4 we know that B_1 follows a normal distribution. It turns out that under the normality assumption, a result very much analogous to that given in Theorem 8.4 allows us to conclude that (n − 2)S^2/σ^2 is a chi-squared variable with n − 2 degrees of freedom, independent of the random variable B_1. Theorem 8.5 then assures us that the statistic

$$ T = \frac{(B_1 - \beta_1)/(\sigma/\sqrt{S_{xx}})}{\sqrt{S^2/\sigma^2}} = \frac{B_1 - \beta_1}{S/\sqrt{S_{xx}}} $$

has a t-distribution with n − 2 degrees of freedom. The statistic T can be used to construct a 100(1 − α)% confidence interval for the coefficient β_1.

Confidence Interval

A 100(1 − α)% confidence interval for the parameter β_1 in the regression line μ_{Y|x} = β_0 + β_1 x is

$$ b_1 - t_{\alpha/2}\,\frac{s}{\sqrt{S_{xx}}} < \beta_1 < b_1 + t_{\alpha/2}\,\frac{s}{\sqrt{S_{xx}}}, $$

where t_{α/2} is a value of the t-distribution with n − 2 degrees of freedom.

Example 11.2: Find a 95% confidence interval for β_1 in the regression line μ_{Y|x} = β_0 + β_1 x, based on the pollution data of Table 11.1.

Solution: From the results given in Example 11.1 we find that S_xx = 4152.18 and S_xy = 3752.09. In addition, we find that S_yy = 3713.88. Recall that b_1 = 0.903643. Hence,

$$ s^2 = \frac{S_{yy} - b_1 S_{xy}}{n - 2} = \frac{3713.88 - (0.903643)(3752.09)}{31} = 10.4299. $$

Therefore, taking the square root, we obtain s = 3.2295. Using Table A.4, we find t_{0.025} ≈ 2.045 for 31 degrees of freedom. Therefore, a 95% confidence interval for β_1 is

$$ 0.903643 - \frac{(2.045)(3.2295)}{\sqrt{4152.18}} < \beta_1 < 0.903643 + \frac{(2.045)(3.2295)}{\sqrt{4152.18}}, $$

which simplifies to

0.8012 < β_1 < 1.0061.
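The interval arithmetic above is easy to reproduce from the summary statistics alone. The short Python sketch below is ours, not part of the text; it assumes only the quantities S_xx, S_xy, S_yy, b_1, and n quoted in Examples 11.1 and 11.2, and scipy supplies the t critical value, which differs slightly from the tabled 2.045 used in the text.

```python
from scipy import stats

# Summary statistics quoted in Examples 11.1 and 11.2
n, Sxx, Sxy, Syy, b1 = 33, 4152.18, 3752.09, 3713.88, 0.903643

# s^2 = (Syy - b1*Sxy) / (n - 2), the estimate of sigma^2
s = ((Syy - b1 * Sxy) / (n - 2)) ** 0.5        # approximately 3.2295

# 95% confidence interval for beta_1
t_crit = stats.t.ppf(0.975, df=n - 2)          # approximately 2.04 (the text uses the tabled 2.045)
half_width = t_crit * s / Sxx ** 0.5
print(f"{b1 - half_width:.4f} < beta_1 < {b1 + half_width:.4f}")
# agrees with 0.8012 < beta_1 < 1.0061 to about three decimal places
```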


  Hypothesis Testing on the Slope

To test the null hypothesis H_0 that β_1 = β_10 against a suitable alternative, we again use the t-distribution with n − 2 degrees of freedom to establish a critical region and then base our decision on the value of

$$ t = \frac{b_1 - \beta_{10}}{s/\sqrt{S_{xx}}}. $$

The method is illustrated by the following example.

Example 11.3: Using the estimated value b_1 = 0.903643 of Example 11.1, test the hypothesis that β_1 = 1.0 against the alternative that β_1 < 1.0.

Solution: The hypotheses are H_0: β_1 = 1.0 and H_1: β_1 < 1.0. So

$$ t = \frac{0.903643 - 1.0}{3.2295/\sqrt{4152.18}} = -1.92, $$

with n − 2 = 31 degrees of freedom (P ≈ 0.03).

Decision: The t-value is significant at the 0.03 level, suggesting strong evidence that β_1 < 1.0.
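The same one-sided test can be carried out from the summary statistics; the following is a minimal sketch (hypothetical variable names, not part of the original example).

```python
from scipy import stats

# Summary statistics from Examples 11.1 and 11.2
n, Sxx, b1, s = 33, 4152.18, 0.903643, 3.2295
beta10 = 1.0                               # hypothesized slope under H0

se_b1 = s / Sxx ** 0.5                     # standard error of b1, about 0.0501
t = (b1 - beta10) / se_b1                  # about -1.92
p_value = stats.t.cdf(t, df=n - 2)         # lower-tail P-value for H1: beta_1 < 1.0
print(f"t = {t:.2f}, P = {p_value:.3f}")   # P is roughly 0.03
```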

One important t-test on the slope is the test of the hypothesis

H_0: β_1 = 0 versus H_1: β_1 ≠ 0.

When the null hypothesis is not rejected, the conclusion is that there is no significant linear relationship between E(Y) and the independent variable x. The plot of the data for Example 11.1 would suggest that a linear relationship exists. However, in some applications in which σ^2 is large and thus considerable "noise" is present in the data, a plot, while useful, may not produce clear information for the researcher. Rejection of H_0 above implies that a significant linear regression exists.

Figure 11.7 displays a MINITAB printout showing the t-test for

H_0: β_1 = 0 versus H_1: β_1 ≠ 0,

for the data of Example 11.1. Note the regression coefficient (Coef), standard error (SE Coef), t-value (T), and P-value (P). The null hypothesis is rejected. Clearly, there is a significant linear relationship between mean chemical oxygen demand reduction and solids reduction. Note that the t-statistic is computed as

$$ t = \frac{\text{coefficient}}{\text{standard error}} = \frac{b_1}{s/\sqrt{S_{xx}}}. $$

The failure to reject H_0: β_1 = 0 suggests that there is no linear relationship between Y and x. Figure 11.8 is an illustration of the implication of this result. It may mean that changing x has little impact on changes in Y, as seen in (a). However, it may also indicate that the true relationship is nonlinear, as indicated by (b).

When H_0: β_1 = 0 is rejected, there is an implication that the linear term in x residing in the model explains a significant portion of variability in Y. The two plots in Figure 11.9 illustrate possible scenarios.


Regression Analysis: COD versus Per_Red

The regression equation is
COD = 3.83 + 0.904 Per_Red

Predictor      Coef   SE Coef      T      P
Constant      3.830     1.768   2.17  0.038
Per_Red     0.90364   0.05012  18.03  0.000

S = 3.22954   R-Sq = 91.3%   R-Sq(adj) = 91.0%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       1  3390.6  3390.6  325.08  0.000
Residual Error  31   323.3    10.4
Total           32  3713.9

Figure 11.7: MINITAB printout for t-test for data of Example 11.1.
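Readers working in Python rather than MINITAB can obtain an equivalent summary with the statsmodels package. The sketch below is ours, not part of the original text; the two arrays are our transcription of the 33 observations of Table 11.1 (not reprinted in this section), and they are consistent with the summary statistics quoted in the examples (for instance, the sum of the squared x-values is 41,086 and S_xx = 4152.18).

```python
import statsmodels.api as sm

# Table 11.1: solids reduction x (%) and chemical oxygen demand reduction y (%)
x = [3, 7, 11, 15, 18, 27, 29, 30, 30, 31, 31, 32, 33, 33, 34, 36, 36,
     36, 37, 38, 39, 39, 39, 40, 41, 42, 42, 43, 44, 45, 46, 47, 50]
y = [5, 11, 21, 16, 16, 28, 27, 25, 35, 30, 40, 32, 34, 32, 34, 37, 38,
     34, 36, 38, 37, 36, 45, 39, 41, 40, 44, 37, 44, 46, 46, 49, 51]

X = sm.add_constant(x)            # design matrix with an intercept column
model = sm.OLS(y, X).fit()        # ordinary least squares fit
print(model.summary())            # coefficients, SE Coef, t, P, R-squared, ANOVA quantities
```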

Figure 11.8: The hypothesis H_0: β_1 = 0 is not rejected.

As depicted in Figure 11.9(a), rejection of H_0 may suggest that the relationship is, indeed, linear. As indicated in (b), it may suggest that while the model does contain a linear effect, a better representation may be found by including a polynomial (perhaps quadratic) term (i.e., terms that supplement the linear term).

Figure 11.9: The hypothesis H_0: β_1 = 0 is rejected.
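For situations like Figure 11.9(b), the straight-line fit can be supplemented with a quadratic term. The sketch below is illustrative only: it uses simulated data (not from the text) and numpy's polynomial fitting to show that the extra term reduces the error sum of squares when curvature is present.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 40)
y = 2.0 + 0.5 * x + 0.3 * x**2 + rng.normal(scale=1.0, size=x.size)  # curved relationship

linear = np.polyfit(x, y, deg=1)                        # straight-line fit
quadratic = np.polyfit(x, y, deg=2)                     # adds the x^2 term

sse_linear = np.sum((y - np.polyval(linear, x)) ** 2)
sse_quadratic = np.sum((y - np.polyval(quadratic, x)) ** 2)
print(sse_linear, sse_quadratic)                        # the quadratic fit has the smaller SSE
```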

  Statistical Inference on the Intercept

Confidence intervals and hypothesis testing on the coefficient β_0 may be established from the fact that B_0 is also normally distributed. It is not difficult to show that

$$ T = \frac{B_0 - \beta_0}{S\sqrt{\sum_{i=1}^{n} x_i^2/(n S_{xx})}} $$

has a t-distribution with n − 2 degrees of freedom, from which we may construct a 100(1 − α)% confidence interval for β_0.

Confidence Interval

A 100(1 − α)% confidence interval for the parameter β_0 in the regression line μ_{Y|x} = β_0 + β_1 x is

$$ b_0 - t_{\alpha/2}\,\frac{s}{\sqrt{n S_{xx}}}\sqrt{\sum_{i=1}^{n} x_i^2} < \beta_0 < b_0 + t_{\alpha/2}\,\frac{s}{\sqrt{n S_{xx}}}\sqrt{\sum_{i=1}^{n} x_i^2}, $$

where t_{α/2} is a value of the t-distribution with n − 2 degrees of freedom.

Example 11.4: Find a 95% confidence interval for β_0 in the regression line μ_{Y|x} = β_0 + β_1 x, based on the data of Table 11.1.

Solution: In Examples 11.1 and 11.2, we found that

S_xx = 4152.18 and s = 3.2295.

From Example 11.1 we had

$$ \sum_{i=1}^{n} x_i^2 = 41{,}086 $$

and b_0 = 3.829633. Using Table A.4, we find t_{0.025} ≈ 2.045 for 31 degrees of freedom. Therefore, a 95% confidence interval for β_0 is

$$ 3.829633 - \frac{(2.045)(3.2295)\sqrt{41{,}086}}{\sqrt{(33)(4152.18)}} < \beta_0 < 3.829633 + \frac{(2.045)(3.2295)\sqrt{41{,}086}}{\sqrt{(33)(4152.18)}}, $$

which simplifies to 0.2132 < β_0 < 7.4461.
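Example 11.4 can be reproduced in the same way from the summary quantities; a minimal sketch (variable names ours) follows.

```python
from scipy import stats

# Summary quantities from Examples 11.1, 11.2, and 11.4
n, Sxx, sum_x2 = 33, 4152.18, 41086
b0, s = 3.829633, 3.2295

t_crit = stats.t.ppf(0.975, df=n - 2)          # approximately 2.04 (the text uses the tabled 2.045)
se_b0 = s * (sum_x2 / (n * Sxx)) ** 0.5        # standard error of b0, about 1.77
half_width = t_crit * se_b0
print(f"{b0 - half_width:.4f} < beta_0 < {b0 + half_width:.4f}")
# close to the text's 0.2132 < beta_0 < 7.4461, which uses the tabled critical value
```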


To test the null hypothesis H_0 that β_0 = β_00 against a suitable alternative, we can use the t-distribution with n − 2 degrees of freedom to establish a critical region and then base our decision on the value of

$$ t = \frac{b_0 - \beta_{00}}{s\sqrt{\sum_{i=1}^{n} x_i^2/(n S_{xx})}}. $$

Example 11.5: Using the estimated value b_0 = 3.829633 of Example 11.1, test the hypothesis that β_0 = 0 at the 0.05 level of significance against the alternative that β_0 ≠ 0.

Solution: The hypotheses are H_0: β_0 = 0 and H_1: β_0 ≠ 0. So

$$ t = \frac{3.829633 - 0}{3.2295\sqrt{41{,}086/[(33)(4152.18)]}} = 2.17, $$

with 31 degrees of freedom. Thus the P-value ≈ 0.038, and we conclude that β_0 ≠ 0. Note that this t-value is merely Coef/SE Coef, as we see in the MINITAB printout in Figure 11.7. The SE Coef is the standard error of the estimated intercept.
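The same standard error yields the test statistic of Example 11.5; a short sketch (names ours):

```python
from scipy import stats

n, Sxx, sum_x2 = 33, 4152.18, 41086
b0, s = 3.829633, 3.2295

se_b0 = s * (sum_x2 / (n * Sxx)) ** 0.5        # the SE Coef of the intercept, about 1.77
t = b0 / se_b0                                 # about 2.17
p_value = 2 * stats.t.sf(abs(t), df=n - 2)     # two-sided P-value, about 0.038
print(f"t = {t:.2f}, P = {p_value:.3f}")
```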

  A Measure of Quality of Fit: Coefficient of Determination

Note in Figure 11.7 that an item denoted by R-Sq is given with a value of 91.3%. This quantity, R^2, is called the coefficient of determination. This quantity is a measure of the proportion of variability explained by the fitted model. In Section 11.8, we shall introduce the notion of an analysis-of-variance approach to hypothesis testing in regression. The analysis-of-variance approach makes use of the error sum of squares

$$ SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

and the total corrected sum of squares

$$ SST = \sum_{i=1}^{n} (y_i - \bar{y})^2. $$

The latter represents the variation in the response values that ideally would be explained by the model. The SSE value is the variation due to error, or variation unexplained. Clearly, if SSE = 0, all variation is explained. The quantity that represents variation explained is SST − SSE.

The R^2 is

$$ \text{Coefficient of determination:}\qquad R^2 = 1 - \frac{SSE}{SST}. $$

Note that if the fit is perfect, all residuals are zero, and thus R^2 = 1.0. But if SSE is only slightly smaller than SST, R^2 ≈ 0. Note from the printout in Figure 11.7 that the coefficient of determination suggests that the model fit to the data explains 91.3% of the variability observed in the response, the reduction in chemical oxygen demand.
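The 91.3% figure follows directly from the sums of squares already computed in Example 11.2; a short sketch, using the identity SSE = S_yy − b_1 S_xy for the straight-line fit:

```python
# Sums of squares from Examples 11.1 and 11.2
Syy, Sxy, b1 = 3713.88, 3752.09, 0.903643

SST = Syy                        # total corrected sum of squares
SSE = Syy - b1 * Sxy             # error sum of squares, about 323.3
R2 = 1 - SSE / SST
print(f"R-squared = {R2:.3f}")   # about 0.913, i.e., 91.3%
```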

Figure 11.10 provides an illustration of a good fit (R^2 ≈ 1.0) in plot (a) and a poor fit (R^2 ≈ 0) in plot (b).

Figure 11.10: Plots depicting a very good fit and a poor fit.

Pitfalls in the Use of R^2

Analysts quote values of R^2 quite often, perhaps due to its simplicity. However, there are pitfalls in its interpretation. The reliability of R^2 is a function of the size of the regression data set and the type of application. Clearly, 0 ≤ R^2 ≤ 1, and the upper bound is achieved when the fit to the data is perfect (i.e., all of the residuals are zero). What is an acceptable value for R^2? This is a difficult question to answer. A chemist, charged with doing a linear calibration of a high-precision piece of equipment, certainly expects to experience a very high R^2-value (perhaps exceeding 0.99), while a behavioral scientist, dealing in data impacted by variability in human behavior, may feel fortunate to experience an R^2 as large as 0.70. An experienced model fitter senses when a value is large enough, given the situation confronted. Clearly, some scientific phenomena lend themselves to modeling with more precision than others.

The R^2 criterion is dangerous to use for comparing competing models for the same data set. Adding additional terms to the model (e.g., an additional regressor) decreases SSE and thus increases R^2 (or at least does not decrease it). This implies that R^2 can be made artificially high by an unwise practice of overfitting (i.e., the inclusion of too many model terms). Thus, the inevitable increase in R^2 enjoyed by adding an additional term does not imply the additional term was needed. In fact, the simple model may be superior for predicting response values. The role of overfitting and its influence on prediction capability will be discussed at length in Chapter 12 as we visit the notion of models involving more than a single regressor. Suffice it to say at this point that one should not subscribe to a model selection process that solely involves the consideration of R^2.
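The monotone behavior described above is easy to demonstrate numerically. The sketch below uses simulated data (not from the text) and ordinary least squares via numpy to show that appending even a pure-noise regressor never lowers R^2.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 33
x = rng.uniform(0, 50, size=n)
y = 4.0 + 0.9 * x + rng.normal(scale=3.0, size=n)    # a true straight-line relationship

def r_squared(design, response):
    """R^2 from a least-squares fit of the response on the given design matrix."""
    beta, *_ = np.linalg.lstsq(design, response, rcond=None)
    sse = np.sum((response - design @ beta) ** 2)
    sst = np.sum((response - response.mean()) ** 2)
    return 1 - sse / sst

X1 = np.column_stack([np.ones(n), x])                # intercept + x
X2 = np.column_stack([X1, rng.normal(size=n)])       # add a regressor of pure noise
print(r_squared(X1, y) <= r_squared(X2, y))          # True: R^2 cannot decrease
```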