Least Squares and the Fitted Model

11.3 Least Squares and the Fitted Model

In this section, we discuss the method of fitting an estimated regression line to the data. This is tantamount to the determination of estimates $b_0$ for $\beta_0$ and $b_1$ for $\beta_1$. This of course allows for the computation of predicted values from the fitted line $\hat{y} = b_0 + b_1 x$ and other types of analyses and diagnostic information that will ascertain the strength of the relationship and the adequacy of the fitted model. Before we discuss the method of least squares estimation, it is important to introduce the concept of a residual. A residual is essentially an error in the fit of the model $\hat{y} = b_0 + b_1 x$.

Residual: Error in Fit. Given a set of regression data $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$ and a fitted model, $\hat{y}_i = b_0 + b_1 x_i$, the $i$th residual $e_i$ is given by

$$e_i = y_i - \hat{y}_i, \qquad i = 1, 2, \ldots, n.$$

Obviously, if the $n$ residuals are large in magnitude, then the fit of the model is not good. Small residuals are a sign of a good fit. Another relationship that is useful at times is the following:

$$y_i = b_0 + b_1 x_i + e_i.$$

This equation clarifies the distinction between the residuals, $e_i$, and the conceptual model errors, $\epsilon_i$. One must bear in mind that whereas the $\epsilon_i$ are not observed, the $e_i$ not only are observed but also play an important role in the total analysis.
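As a small illustration of the residual definition, the following Python sketch computes $e_i = y_i - \hat{y}_i$ and confirms that each observation decomposes as $y_i = b_0 + b_1 x_i + e_i$. The data and the coefficients `b0` and `b1` are hypothetical values chosen only for illustration; they are not estimated here and do not come from the text.

```python
# Hypothetical data and hypothetical fitted coefficients (illustration only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
b0, b1 = 0.0, 2.0  # assumed intercept and slope of an already fitted line


def residuals(xs, ys, b0, b1):
    """Return the residuals e_i = y_i - (b0 + b1 * x_i)."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


e = residuals(xs, ys, b0, b1)

# Each observation is recovered as fitted value plus residual.
reconstructed = [b0 + b1 * x + e_i for x, e_i in zip(xs, e)]

for x, y, e_i in zip(xs, ys, e):
    print(f"x = {x}, y = {y}, residual = {e_i:+.2f}")
```

Unlike the model errors $\epsilon_i$, these residuals can be computed directly because they depend only on the data and the fitted coefficients.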

Figure 11.5 depicts the line fit to this set of data, namely $\hat{y} = b_0 + b_1 x$, and the line reflecting the model $\mu_{Y|x} = \beta_0 + \beta_1 x$. Now, of course, $\beta_0$ and $\beta_1$ are unknown parameters. The fitted line is an estimate of the line produced by the statistical model. Keep in mind that the line $\mu_{Y|x} = \beta_0 + \beta_1 x$ is not known.

Figure 11.5: Comparing $\epsilon_i$ with the residual, $e_i$. [Plot showing a data point $(x_i, y_i)$, the fitted line $\hat{y} = b_0 + b_1 x$, and the model line $\mu_{Y|x} = \beta_0 + \beta_1 x$.]

The Method of Least Squares

We shall find $b_0$ and $b_1$, the estimates of $\beta_0$ and $\beta_1$, so that the sum of the squares of the residuals is a minimum. The residual sum of squares is often called the sum of squares of the errors about the regression line and is denoted by SSE. This minimization procedure for estimating the parameters is called the method of least squares. Hence, we shall find $b_0$ and $b_1$ so as to minimize

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2.$$

Differentiating SSE with respect to $b_0$ and $b_1$, we have

$$\frac{\partial(SSE)}{\partial b_0} = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i), \qquad \frac{\partial(SSE)}{\partial b_1} = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)\, x_i.$$

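Setting the partial derivatives equal to zero and rearranging the terms, we obtain the equations (called the normal equations)

$$n b_0 + b_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i, \qquad b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i,$$

which may be solved simultaneously to yield computing formulas for $b_0$ and $b_1$.

Estimating the Regression Coefficients. Given the sample $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$, the least squares estimates $b_0$ and $b_1$ of the regression coefficients $\beta_0$ and $\beta_1$ are computed from the formulas

$$b_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

and

$$b_0 = \frac{\sum_{i=1}^{n} y_i - b_1 \sum_{i=1}^{n} x_i}{n} = \bar{y} - b_1 \bar{x}.$$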
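The least squares estimates can be sketched in a few lines of Python. The function below takes the slope as the ratio of $\sum (x_i - \bar{x})(y_i - \bar{y})$ to $\sum (x_i - \bar{x})^2$ and the intercept as $\bar{y} - b_1 \bar{x}$, and then checks that the result satisfies both normal equations. The function name `least_squares_fit` and the data are hypothetical, introduced here only for illustration.

```python
def least_squares_fit(xs, ys):
    """Return (b0, b1), the least squares intercept and slope."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
b0, b1 = least_squares_fit(xs, ys)

# Left-hand sides of the two normal equations.
n = len(xs)
normal_1 = n * b0 + b1 * sum(xs)                          # should equal sum of y
normal_2 = b0 * sum(xs) + b1 * sum(x * x for x in xs)     # should equal sum of x*y

print(b0, b1)
```

The deviation form is used here rather than the raw-sum form because it tends to lose less precision in floating-point arithmetic when the $x$ values are large; both forms are algebraically equivalent.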

The calculations of $b_0$ and $b_1$, using the data of Table 11.1, are illustrated by the following example.

Example 11.1: Estimate the regression line for the pollution data of Table 11.1.

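Solution: We find that

$$\sum_{i=1}^{33} x_i = 1104, \quad \sum_{i=1}^{33} y_i = 1124, \quad \sum_{i=1}^{33} x_i y_i = 41{,}355, \quad \sum_{i=1}^{33} x_i^2 = 41{,}086.$$

Therefore, with the $n = 33$ observations of Table 11.1,

$$b_1 = \frac{(33)(41{,}355) - (1104)(1124)}{(33)(41{,}086) - (1104)^2} = 0.903643 \quad \text{and} \quad b_0 = \frac{1124 - (0.903643)(1104)}{33} = 3.829633.$$

Thus, the estimated regression line is given by

$$\hat{y} = 3.8296 + 0.9036x.$$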
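The arithmetic of Example 11.1 can be checked with a short Python sketch that works only from the four summary sums. The sample size `n = 33` is an assumption taken from Table 11.1, which is not reproduced in this section; the variable names are hypothetical.

```python
# Summary sums quoted in Example 11.1.
n = 33            # assumption: number of observations in Table 11.1
sum_x = 1104
sum_y = 1124
sum_xy = 41_355
sum_x2 = 41_086

# Raw-sum forms of the least squares formulas.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = (sum_y - b1 * sum_x) / n

print(f"y_hat = {b0:.4f} + {b1:.4f} x")  # prints: y_hat = 3.8296 + 0.9036 x
```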

Using the regression line of Example 11.1, we would predict a 31% reduction in the chemical oxygen demand when the reduction in the total solids is 30%. The 31% reduction in the chemical oxygen demand may be interpreted as an estimate of the population mean $\mu_{Y|30}$ or as an estimate of a new observation when the reduction in total solids is 30%. Such estimates, however, are subject to error. Even if the experiment were controlled so that the reduction in total solids was 30%, it is unlikely that we would measure a reduction in the chemical oxygen demand exactly equal to 31%. In fact, the original data recorded in Table 11.1 show that measurements of 25% and 35% were recorded for the reduction in oxygen demand when the reduction in total solids was kept at 30%.
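To make the prediction and its error concrete, the sketch below evaluates the fitted line of Example 11.1 at $x = 30$ and compares the result with the two observed values of 25 and 35 mentioned above. The coefficients are the rounded values from the example; the function name `predict` is hypothetical.

```python
# Rounded coefficients of the fitted line from Example 11.1.
b0, b1 = 3.8296, 0.9036


def predict(x):
    """Predicted value on the fitted line y_hat = b0 + b1 * x."""
    return b0 + b1 * x


y_hat_30 = predict(30)

# Observed reductions in oxygen demand at x = 30, as quoted in the text.
observed = [25, 35]
errors = [y - y_hat_30 for y in observed]

print(f"prediction at x = 30: {y_hat_30:.4f}")
for y, err in zip(observed, errors):
    print(f"observed {y}, residual {err:+.4f}")
```

The prediction is about 30.94, which rounds to the 31 quoted in the text, while the two observations at $x = 30$ fall roughly 5.9 below and 4.1 above the line.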

What Is Good about Least Squares?

It should be noted that the least squares criterion is designed to provide a fitted line that results in a “closeness” between the line and the plotted points. There are many ways of measuring closeness. For example, one may wish to determine $b_0$ and $b_1$ for which $\sum_{i=1}^{n} |y_i - \hat{y}_i|$ is minimized or for which $\sum_{i=1}^{n} |y_i - \hat{y}_i|^{1.5}$ is minimized.

These are both viable and reasonable methods. Note that both of these, as well as the least squares procedure, result in forcing residuals to be “small” in some sense. One should remember that the residuals are the empirical counterpart to the $\epsilon$ values. Figure 11.6 illustrates a set of residuals. One should note that the fitted line has predicted values as points on the line and hence the residuals are vertical deviations from points to the line. As a result, the least squares procedure produces a line that minimizes the sum of squares of vertical deviations from the points to the line.
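The following Python sketch illustrates that the three closeness criteria need not agree on which line is best. It evaluates $\sum |y_i - \hat{y}_i|^p$ for $p = 1$, $1.5$, and $2$ on two candidate lines: the least squares line and the line $y = x$. The data set, with four points on $y = x$ and one outlier, is hypothetical and chosen only to make the contrast visible; the function name `criterion` is likewise an assumption.

```python
def criterion(xs, ys, b0, b1, power):
    """Sum of |y_i - (b0 + b1 * x_i)| ** power over all data points."""
    return sum(abs(y - (b0 + b1 * x)) ** power for x, y in zip(xs, ys))


# Hypothetical data: four points on y = x plus one outlier.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 2.0, 3.0, 14.0]

# Least squares line from the usual formulas.
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

results = {}
for power in (1, 1.5, 2):
    ls_value = criterion(xs, ys, b0, b1, power)     # least squares line
    alt_value = criterion(xs, ys, 0.0, 1.0, power)  # the line y = x
    results[power] = (ls_value, alt_value)
    print(f"p = {power}: least squares line {ls_value:.3f}, line y = x {alt_value:.3f}")
```

For these data the least squares line has the smaller sum of squared residuals (40 versus 100), but the line $y = x$ has the smaller sum of absolute residuals (10 versus 12), so the choice of criterion changes which line counts as the closer fit.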

Figure 11.6: Residuals as vertical deviations.

Chapter 11 Simple Linear Regression and Correlation

Exercises

11.1 A study was conducted at Virginia Tech to determine if certain static arm-strength measures have an influence on the “dynamic lift” characteristics of an individual. Twenty-five individuals were subjected to strength tests and then were asked to perform a weightlifting test in which weight was dynamically lifted overhead. The data are given here.

[Table partially recovered; blank entries are missing.]

Individual   Strength, x   Lift, y
2            19.3          48.3
3            19.5          88.3
4            19.7          75.0
5            22.9          91.7
8            26.8          65.0
19           36.0          88.3
21           40.4
22           44.3
23           44.6          91.7
25           55.9          71.7

(a) Estimate $\beta_0$ and $\beta_1$ for the linear regression curve $\mu_{Y|x} = \beta_0 + \beta_1 x$.
(b) Find a point estimate of $\mu_{Y|30}$.

11.2 The grades of a class of 9 students on a midterm report (x) and on the final examination (y) are as follows:

[Table: x, y]

(a) Estimate the linear regression line.
(b) Estimate the final examination grade of a student who received a grade of 85 on the midterm report.

11.3 The amounts of a chemical compound y that dissolved in 100 grams of water at various temperatures x were recorded as follows:

[Table: x (°C), y (grams)]

(a) Find the equation of the regression line.
(b) Graph the line on a scatter diagram.
(c) Estimate the amount of chemical that will dissolve in 100 grams of water at 50°C.

11.4 The following data were collected to determine the relationship between pressure and the corresponding scale reading for the purpose of calibration.

[Table: Pressure, x (lb/sq in.); Scale Reading, y]

(a) Find the equation of the regression line.
(b) The purpose of calibration in this application is to estimate pressure from an observed scale reading. Estimate the pressure for a scale reading of 54 using $\hat{x} = (54 - b_0)/b_1$.

11.5 A study was made on the amount of converted sugar in a certain process at various temperatures. The data were coded and recorded as follows:

[Table: Temperature, x; Converted Sugar, y]

(a) Estimate the linear regression line.
(b) Estimate the mean amount of converted sugar produced when the coded temperature is 1.75.

11.6 In a certain type of metal test specimen, the normal stress on a specimen is known to be functionally related to the shear resistance. The following is a set of coded experimental data on the two variables:

[Table: Normal Stress, x; Shear Resistance, y]

(a) Estimate the regression line $\mu_{Y|x} = \beta_0 + \beta_1 x$.
(b) Estimate the shear resistance for a normal stress of …

11.7 The following is a portion of a classic data set called the “pilot plot data” in Fitting Equations to Data by Daniel and Wood, published in 1971. The response y is the acid content of material produced by titration, whereas the regressor x is the organic acid content produced by extraction and weighing.

[Table: y, x]

(a) Plot the data; does it appear that a simple linear regression will be a suitable model?
(b) Fit a simple linear regression; estimate a slope and intercept.

11.8 A mathematics placement test is given to all entering freshmen at a small college. A student who receives a grade below 35 is denied admission to the regular mathematics course and placed in a remedial class. The placement test scores and the final grades for 20 students who took the regular course were recorded.

[Table partially recovered.]

Placement Test   Course Grade
50               79

(a) Plot a scatter diagram.
(b) Find the equation of the regression line to predict course grades from placement test scores.
(d) If 60 is the minimum passing grade, below which placement test score should students in the future be denied admission to this course?

11.9 A study was made by a retail merchant to determine the relation between weekly advertising expenditures and sales.

[Table: Advertising Costs ($); Sales ($)]

(a) Plot a scatter diagram.
(b) Find the equation of the regression line to predict weekly sales from advertising expenditures.
(c) Estimate the weekly sales when advertising costs are $35.
(d) Plot the residuals versus advertising costs. Comment.

11.10 The following data are the selling prices z of a certain make and model of used car w years old. Fit a curve of the form $\mu_{z|w} = \gamma \delta^{w}$ by means of the nonlinear sample regression equation $\hat{z} = c d^{w}$. [Hint: Write $\ln \hat{z} = \ln c + (\ln d) w = b_0 + b_1 w$.]

[Table: w (years), z (dollars)]

11.11 The thrust of an engine (y) is a function of exhaust temperature (x) in °F when other important variables are held constant. Consider the following data:

[Table: y, x (°F)]

(a) Plot the data.
(b) Fit a simple linear regression to the data and plot the line through the data.

11.12 A study was done to study the effect of ambient temperature x on the electric power consumed by a chemical plant y. Other factors were held constant, and the data were collected from an experimental pilot plant.

[Table: y (BTU), x (°F); surviving entries: 31, 60, 34, 74]

(a) Plot the data.
(b) Estimate the slope and intercept in a simple linear regression model.
(c) Predict power consumption for an ambient temperature of 65°F.

11.13 A study of the amount of rainfall and the quantity of air pollution removed produced the following data:

[Table: Daily Rainfall, x (0.01 cm); Particulate Removed, y (μg/m³)]

(a) Find the equation of the regression line to predict the particulate removed from the amount of daily rainfall.
(b) Estimate the amount of particulate removed when the daily rainfall is x = 4.8 units.

11.14 A professor in the School of Business in a university polled a dozen colleagues about the number of professional meetings they attended in the past five years (x) and the number of papers they submitted to refereed journals (y) during the same period. The summary data are given as follows:

$$n = 12, \quad \bar{x} = 4, \quad \bar{y} = 12, \quad \sum_{i=1}^{n} x_i^2 = 232, \quad \sum_{i=1}^{n} x_i y_i = 318.$$

Fit a simple linear regression model between x and y by finding out the estimates of intercept and slope. Comment on whether attending more professional meetings would result in publishing more papers.
