12.2 Estimating Model Parameters
We will assume in this and the next several sections that the variables x and y are related according to the simple linear regression model. The values of $\beta_0$, $\beta_1$, and $\sigma^2$ will almost never be known to an investigator. Instead, sample data consisting of n observed pairs $(x_1, y_1), \ldots, (x_n, y_n)$ will be available, from which the model parameters and the true regression line itself can be estimated. These observations are assumed to have been obtained independently of one another. That is, $y_i$ is the observed value of $Y_i$, where $Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ and the n deviations $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ are independent rv's. Independence of $Y_1, Y_2, \ldots, Y_n$ follows from independence of the $\epsilon_i$'s.
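To make the model concrete, a set of observed pairs can be simulated from it. The parameter values below ($\beta_0 = 2$, $\beta_1 = 0.5$, $\sigma = 1$) and the x values are assumptions chosen purely for illustration:

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical true parameter values -- assumptions for this sketch only
beta0, beta1, sigma = 2.0, 0.5, 1.0

# Fixed x values at which responses are observed
x_vals = [10, 15, 20, 25, 30, 35, 40]

# Each Y_i = beta0 + beta1 * x_i + eps_i, where the eps_i are
# independent normal deviations with mean 0 and standard deviation sigma
pairs = [(x, beta0 + beta1 * x + random.gauss(0, sigma)) for x in x_vals]
```

Each simulated point sits a random vertical distance $\epsilon_i$ above or below the true line, which is exactly the scatter pattern described next.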
According to the model, the observed points will be distributed about the true regression line in a random manner. Figure 12.6 shows a typical plot of observed pairs along with two candidates for the estimated regression line.
Intuitively, the line $y = a_0 + a_1 x$ is not a reasonable estimate of the true line $y = \beta_0 + \beta_1 x$ because, if $y = a_0 + a_1 x$ were the true line, the observed points would almost surely have been closer to this line. The line $y = b_0 + b_1 x$ is a more plausible estimate because the observed points are scattered rather closely about this line.
Figure 12.6 Two different estimates of the true regression line
Figure 12.6 and the foregoing discussion suggest that our estimate of $y = \beta_0 + \beta_1 x$ should be a line that provides in some sense a best fit to the observed
data points. This is what motivates the principle of least squares, which can be traced back to the German mathematician Gauss (1777–1855). According to this principle,
a line provides a good fit to the data if the vertical distances (deviations) from the observed points to the line are small (see Figure 12.7). The measure of the goodness of fit is the sum of the squares of these deviations. The best-fit line is then the one having the smallest possible sum of squared deviations.
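This criterion is easy to evaluate directly. The sketch below, using made-up data, computes the sum of squared vertical deviations for two candidate lines; the line lying closer to the points yields the smaller value, mirroring the comparison of the two candidate lines in Figure 12.6:

```python
# Sum of squared vertical deviations of the points from the line y = b0 + b1*x
def sse(pairs, b0, b1):
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in pairs)

# Made-up data for illustration
pairs = [(1, 2.1), (2, 2.9), (3, 4.2), (4, 4.8)]

close_fit = sse(pairs, 1.0, 1.0)   # candidate line passing near the points
poor_fit = sse(pairs, 3.0, -0.5)   # candidate line far from the points
```

For these points the first candidate gives a far smaller sum of squared deviations, so the least squares principle prefers it.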
Principle of Least Squares
The vertical deviation of the point $(x_i, y_i)$ from the line $y = b_0 + b_1 x$ is

height of point $-$ height of line $= y_i - (b_0 + b_1 x_i)$

The sum of squared vertical deviations from the points $(x_1, y_1), \ldots, (x_n, y_n)$ to the line is then

$$f(b_0, b_1) = \sum_{i=1}^{n} [y_i - (b_0 + b_1 x_i)]^2$$

The point estimates of $\beta_0$ and $\beta_1$, denoted by $\hat{\beta}_0$ and $\hat{\beta}_1$ and called the least squares estimates, are those values that minimize $f(b_0, b_1)$. That is, $\hat{\beta}_0$ and $\hat{\beta}_1$ are such that $f(\hat{\beta}_0, \hat{\beta}_1) \le f(b_0, b_1)$ for any $b_0$ and $b_1$. The estimated regression line or least squares line is then the line whose equation is $y = \hat{\beta}_0 + \hat{\beta}_1 x$.
Figure 12.7 Deviations of observed data from the line $y = b_0 + b_1 x$ (time to failure (hr) versus applied stress (kg/mm²))
The minimizing values of $b_0$ and $b_1$ are found by taking partial derivatives of $f(b_0, b_1)$ with respect to both $b_0$ and $b_1$, equating them both to zero [analogously to $f'(b) = 0$ in univariate calculus], and solving the equations

$$\frac{\partial f(b_0, b_1)}{\partial b_0} = \sum 2[y_i - (b_0 + b_1 x_i)](-1) = 0$$

$$\frac{\partial f(b_0, b_1)}{\partial b_1} = \sum 2[y_i - (b_0 + b_1 x_i)](-x_i) = 0$$

Cancellation of the $-2$ factor and rearrangement gives the following system of equations, called the normal equations:

$$n b_0 + \left(\sum x_i\right) b_1 = \sum y_i$$

$$\left(\sum x_i\right) b_0 + \left(\sum x_i^2\right) b_1 = \sum x_i y_i$$
These equations are linear in the two unknowns $b_0$ and $b_1$. Provided that not all $x_i$'s are identical, the least squares estimates are the unique solution to this system.
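Because the normal equations form a 2×2 linear system, they can be solved directly. The following sketch (function name hypothetical) does so by Cramer's rule; the determinant is nonzero exactly when the $x_i$ are not all identical:

```python
def solve_normal_equations(pairs):
    """Solve the 2x2 normal-equation system for (b0, b1) by Cramer's rule.

    Illustrative sketch only; requires that not all x_i be identical.
    """
    n = len(pairs)
    sx = sum(x for x, _ in pairs)       # sum of x_i
    sy = sum(y for _, y in pairs)       # sum of y_i
    sxx = sum(x * x for x, _ in pairs)  # sum of x_i^2
    sxy = sum(x * y for x, y in pairs)  # sum of x_i * y_i
    # Normal equations:
    #   n*b0  + sx*b1  = sy
    #   sx*b0 + sxx*b1 = sxy
    det = n * sxx - sx * sx             # nonzero when the x_i are not all equal
    b1 = (n * sxy - sx * sy) / det
    b0 = (sy - b1 * sx) / n
    return b0, b1

# Points lying exactly on y = 1 + 2x recover that line exactly
b0, b1 = solve_normal_equations([(0, 1), (1, 3), (2, 5)])
```

As a sanity check, data falling exactly on a line must return that line, since its sum of squared deviations is zero.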
The least squares estimate of the slope coefficient $\beta_1$ of the true regression line is

$$b_1 = \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}} \qquad (12.2)$$

Computing formulas for the numerator and denominator of $\hat{\beta}_1$ are

$$S_{xy} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n} \qquad S_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$

The least squares estimate of the intercept $\beta_0$ of the true regression line is

$$b_0 = \hat{\beta}_0 = \frac{\sum y_i - \hat{\beta}_1 \sum x_i}{n} = \bar{y} - \hat{\beta}_1 \bar{x}$$
The computational formulas for $S_{xy}$ and $S_{xx}$ require only the summary statistics $\sum x_i$, $\sum y_i$, $\sum x_i^2$, and $\sum x_i y_i$ ($\sum y_i^2$ will be needed shortly). In computing $\hat{\beta}_0$, use extra digits in $\hat{\beta}_1$ because, if $\bar{x}$ is large in magnitude, rounding will affect the final answer. In practice, the use of a statistical software package is preferable to hand calculation and hand-drawn plots. Once again, be sure that the scatterplot shows a linear pattern with relatively homogeneous variation before fitting the simple linear regression model.
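The computational formulas translate directly into code. The sketch below (function name hypothetical, data made up) obtains $\hat{\beta}_1 = S_{xy}/S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ from the summary statistics alone:

```python
def least_squares_estimates(pairs):
    """Compute b1 = S_xy / S_xx and b0 = ybar - b1 * xbar.

    Hypothetical helper illustrating the computational formulas;
    real analyses would normally use a statistical package.
    """
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    # Computational formulas built from the summary statistics only
    s_xy = sum(x * y for x, y in pairs) - sum_x * sum_y / n
    s_xx = sum(x * x for x, _ in pairs) - sum_x ** 2 / n
    b1 = s_xy / s_xx               # slope estimate, Eq. (12.2)
    b0 = (sum_y - b1 * sum_x) / n  # intercept estimate
    return b0, b1

# Small made-up data set: S_xy = 5.5, S_xx = 5, so the slope is 1.1
b0, b1 = least_squares_estimates([(1, 1), (2, 3), (3, 2), (4, 5)])
```

Note that carrying the sums at full precision, as the code does, is what the warning about extra digits in $\hat{\beta}_1$ amounts to: rounding the slope before computing the intercept can noticeably shift $\hat{\beta}_0$ when $\bar{x}$ is large.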