Least Squares and the Fitted Model

11.3 Least Squares and the Fitted Model

In this section, we discuss the method of fitting an estimated regression line to the data. This is tantamount to determining estimates $b_0$ for $\beta_0$ and $b_1$ for $\beta_1$. These estimates allow for the computation of predicted values from the fitted line $\hat{y} = b_0 + b_1 x$ and for other types of analyses and diagnostic information that will ascertain the strength of the relationship and the adequacy of the fitted model. Before we discuss the method of least squares estimation, it is important to introduce the concept of a residual. A residual is essentially an error in the fit of the model $\hat{y} = b_0 + b_1 x$.

Residual: Error in Fit. Given a set of regression data $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$ and a fitted model, $\hat{y}_i = b_0 + b_1 x_i$, the $i$th residual $e_i$ is given by

$$e_i = y_i - \hat{y}_i, \quad i = 1, 2, \ldots, n.$$
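As a minimal sketch of this definition, the residuals for a given line can be computed directly in Python. The data and the coefficients `b0 = 1.0`, `b1 = 2.0` below are hypothetical and chosen only to keep the arithmetic easy to follow; the line is not claimed to be the best fit.

```python
def residuals(xs, ys, b0, b1):
    """Return e_i = y_i - (b0 + b1 * x_i) for each data point."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Hypothetical data and a hypothetical fitted line y-hat = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.5, 4.5, 7.5, 8.5]

e = residuals(xs, ys, b0=1.0, b1=2.0)
print(e)  # [0.5, -0.5, 0.5, -0.5]
```

Each residual is the observed value minus the value predicted by the line at the same $x_i$, so a positive residual means the point lies above the line.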

If the residuals are large in magnitude, then the fit of the model is not good; small residuals are a sign of a good fit. Another relationship that is useful at times is the following:

$$y_i = b_0 + b_1 x_i + e_i.$$

This equation clarifies the distinction between the residuals, $e_i$, and the conceptual model errors, $\epsilon_i$. One must bear in mind that whereas the $\epsilon_i$ are not observed, the $e_i$ not only are observed but also play an important role in the total analysis.

Figure 11.5 depicts the line fit to this set of data, namely $\hat{y} = b_0 + b_1 x$, and the line reflecting the model $\mu_{Y|x} = \beta_0 + \beta_1 x$. Now, of course, $\beta_0$ and $\beta_1$ are unknown parameters. The fitted line is an estimate of the line produced by the statistical model. Keep in mind that the line $\mu_{Y|x} = \beta_0 + \beta_1 x$ is not known.

Figure 11.5: Comparing $\epsilon_i$ with the residual, $e_i$. The plot shows a data point $(x_i, y_i)$, the fitted line $\hat{y} = b_0 + b_1 x$, and the model line $\mu_{Y|x} = \beta_0 + \beta_1 x$.
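The distinction between $\epsilon_i$ and $e_i$ can be made concrete with a small Python sketch. Everything numeric here is hypothetical: the "true" parameters `beta0` and `beta1`, the error values `eps`, and the fitted coefficients `b0` and `b1` are made up for illustration, since in practice the true line and the errors are never observed.

```python
# Hypothetical "true" model parameters (unknown in any real analysis).
beta0, beta1 = 2.0, 0.5

xs = [1.0, 2.0, 3.0, 4.0]
eps = [0.3, -0.2, 0.1, -0.4]  # hypothetical model errors, never observed in practice

# Observations generated from the model y_i = beta0 + beta1 * x_i + eps_i.
ys = [beta0 + beta1 * x + err for x, err in zip(xs, eps)]

# Hypothetical fitted line; it differs slightly from the true line.
b0, b1 = 2.1, 0.45

# Residuals are measured from the fitted line, not from the true line.
e = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

for x, err, res in zip(xs, eps, e):
    print(f"x={x:.0f}  model error={err:+.2f}  residual={res:+.2f}")
```

The two columns differ at every point, and the gap between them at $x_i$ equals $(\beta_0 - b_0) + (\beta_1 - b_1)x_i$, the vertical distance between the two lines.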

The Method of Least Squares

We shall find $b_0$ and $b_1$, the estimates of $\beta_0$ and $\beta_1$, so that the sum of the squares of the residuals is a minimum. The residual sum of squares is often called the sum of squares of the errors about the regression line and is denoted by SSE. This minimization procedure for estimating the parameters is called the method of least squares. Hence, we shall find $b_0$ and $b_1$ so as to minimize

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2.$$
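To see what is being minimized, the following Python sketch evaluates SSE, the sum of squared residuals, for two candidate lines on the same data. The data and both candidate lines are hypothetical; the point is only that SSE is a function of the pair $(b_0, b_1)$.

```python
def sse(xs, ys, b0, b1):
    """Sum of squared residuals about the line y-hat = b0 + b1 * x."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

# Hypothetical data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.5, 4.5, 7.5, 8.5]

print(sse(xs, ys, b0=1.0, b1=2.0))  # 1.0
print(sse(xs, ys, b0=0.0, b1=2.0))  # 5.0
```

The first line gives the smaller SSE of the two, so it is the closer fit in the least squares sense. The method of least squares looks for the pair $(b_0, b_1)$ that makes this quantity as small as possible.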

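Differentiating SSE with respect to $b_0$ and $b_1$, we have

$$\frac{\partial(SSE)}{\partial b_0} = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i), \qquad \frac{\partial(SSE)}{\partial b_1} = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)\, x_i.$$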

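Setting the partial derivatives equal to zero and rearranging the terms, we obtain the equations (called the normal equations)

$$n b_0 + b_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i, \qquad b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i,$$

which may be solved simultaneously to yield computing formulas for $b_0$ and $b_1$.

Estimating the Regression Coefficients. Given the sample $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$, the least squares estimates $b_0$ and $b_1$ of the regression coefficients $\beta_0$ and $\beta_1$ are computed from the formulas

$$b_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

and

$$b_0 = \frac{\sum_{i=1}^{n} y_i - b_1 \sum_{i=1}^{n} x_i}{n} = \bar{y} - b_1 \bar{x}.$$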
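The following Python sketch computes the least squares estimates from the deviation form of the slope, $b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$, with $b_0 = \bar{y} - b_1 \bar{x}$. It then checks the two conditions obtained by setting the partial derivatives of SSE to zero, namely $\sum e_i = 0$ and $\sum x_i e_i = 0$. The data are hypothetical and the function name `least_squares` is chosen for illustration.

```python
def least_squares(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1

# Hypothetical data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.5, 4.5, 7.5, 8.5]

b0, b1 = least_squares(xs, ys)
print(round(b0, 4), round(b1, 4))  # 1.5 1.8

# Residuals from the fitted line satisfy both zero-derivative conditions
# (up to floating-point rounding).
e = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
sum_e = sum(e)
sum_xe = sum(x * ei for x, ei in zip(xs, e))
print(abs(sum_e) < 1e-9, abs(sum_xe) < 1e-9)  # True True
```

The function assumes the $x_i$ are not all equal; otherwise the denominator is zero and no unique slope exists.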

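The calculations of $b_0$ and $b_1$, using the data of Table 11.1, are illustrated by the following example.

Example 11.1: Estimate the regression line for the pollution data of Table 11.1.

Solution: For the $n = 33$ observations of Table 11.1,

$$\sum_{i=1}^{33} x_i = 1104, \quad \sum_{i=1}^{33} y_i = 1124, \quad \sum_{i=1}^{33} x_i y_i = 41{,}355, \quad \sum_{i=1}^{33} x_i^2 = 41{,}086.$$

Therefore,

$$b_1 = \frac{(33)(41{,}355) - (1104)(1124)}{(33)(41{,}086) - (1104)^2} \approx 0.9036$$

and

$$b_0 = \frac{1124 - b_1 (1104)}{33} \approx 3.8296.$$

Thus, the estimated regression line is given by

$$\hat{y} = 3.8296 + 0.9036x.$$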
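The arithmetic in Example 11.1 can be checked with a short Python sketch that plugs the reported sums into the computing formulas. The sample size `n = 33` is an assumption here: it is taken as the number of observations in Table 11.1, which is not reproduced in this section, and it is the value consistent with the reported estimates.

```python
# Summary statistics reported in Example 11.1.
n = 33          # assumed number of observations in Table 11.1
sum_x = 1104
sum_y = 1124
sum_xy = 41355
sum_x2 = 41086

# Computing formulas for the least squares estimates.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = (sum_y - b1 * sum_x) / n

print(f"{b0:.4f} {b1:.4f}")  # 3.8296 0.9036

# Predicted reduction in chemical oxygen demand at x = 30.
y_hat_30 = b0 + b1 * 30
print(f"{y_hat_30:.2f}")  # 30.94
```

Working from the sums rather than the raw data is convenient when only summary statistics are available.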

Using the regression line of Example 11.1, we would predict a 31% reduction in the chemical oxygen demand when the reduction in the total solids is 30%, since $\hat{y} = 3.8296 + 0.9036(30) \approx 30.94$. The 31% reduction in the chemical oxygen demand may be interpreted as an estimate of the population mean $\mu_{Y|30}$ or as an estimate of a new observation when the reduction in total solids is 30%. Such estimates, however, are subject to error. Even if the experiment were controlled so that the reduction in total solids was 30%, it is unlikely that we would measure a reduction in the chemical oxygen demand exactly equal to 31%. In fact, the original data recorded in Table 11.1 show that measurements of 25% and 35% were recorded for the reduction in oxygen demand when the reduction in total solids was kept at 30%.

What Is Good about Least Squares?

It should be noted that the least squares criterion is designed to provide a fitted line that results in a “closeness” between the line and the plotted points. There are many ways of measuring closeness. For example, one may wish to determine $b_0$ and $b_1$ for which $\sum_{i=1}^{n} |y_i - \hat{y}_i|$ is minimized or for which $\sum_{i=1}^{n} |y_i - \hat{y}_i|^{1.5}$ is minimized.

These are both viable and reasonable methods. Note that both of these, as well as the least squares procedure, result in forcing residuals to be “small” in some sense. One should remember that the residuals are the empirical counterpart to the $\epsilon$ values. Figure 11.6 illustrates a set of residuals. One should note that the fitted line has predicted values as points on the line and hence the residuals are vertical deviations from points to the line. As a result, the least squares procedure produces a line that minimizes the sum of squares of vertical deviations from the points to the line.

Figure 11.6: Residuals as vertical deviations. The plot shows the data points and the fitted line $\hat{y} = b_0 + b_1 x$ on axes $x$ and $y$.
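The different closeness criteria can rank candidate lines differently. As a rough illustration, the Python sketch below evaluates $\sum |y_i - \hat{y}_i|^p$ for $p = 1$, $1.5$, and $2$ on two lines. The data are hypothetical and include one unusually large observation; the first line is the least squares line for these data, and the second is a hypothetical alternative that passes exactly through the other four points.

```python
def closeness(xs, ys, b0, b1, p):
    """Sum of |y_i - (b0 + b1 * x_i)|**p over all data points."""
    return sum(abs(y - (b0 + b1 * x)) ** p for x, y in zip(xs, ys))

# Hypothetical data; the last point is far from the trend of the others.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 20.0]

ls_line = (-4.0, 4.0)   # least squares line for these data
alt_line = (0.0, 2.0)   # hypothetical alternative through the first four points

for p in (1, 1.5, 2):
    ls_value = closeness(xs, ys, ls_line[0], ls_line[1], p)
    alt_value = closeness(xs, ys, alt_line[0], alt_line[1], p)
    print(f"p={p}: least squares line {ls_value:.3f}, alternative line {alt_value:.3f}")
```

With $p = 2$ the least squares line has the smaller value (40 versus 100), as it must, but with $p = 1$ the alternative line has the smaller value (10 versus 12). Squaring penalizes a single large vertical deviation more heavily than taking absolute values does.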


Exercises

11.1 A study was conducted at Virginia Tech to determine if certain static arm-strength measures have an influence on the “dynamic lift” characteristics of an individual. Twenty-five individuals were subjected to strength tests and then were asked to perform a weightlifting test in which weight was dynamically lifted overhead. The data are given here.

Individual   Arm Strength, x   Dynamic Lift, y
...
2            19.3              48.3
3            19.5              88.3
4            19.7              75.0
5            22.9              91.7
...
8            26.8              65.0
9            27.6              75.0
10           28.1              88.3
11           28.2              68.3
12           28.7              96.7
13           29.0              76.7
14           29.6              78.3
15           29.9              60.0
16           29.9              71.7
17           30.3              85.0
18           31.3              85.0
19           36.0              88.3
...
25           55.9              71.7

(a) Estimate $\beta_0$ and $\beta_1$ for the linear regression curve $\mu_{Y|x} = \beta_0 + \beta_1 x$.
(b) Find a point estimate of $\mu_{Y|30}$.
(c) Plot the residuals versus the x’s (arm strength). Comment.

11.2 The grades of a class of 9 students on a midterm report (x) and on the final examination (y) are as follows:

x   77  50  71  72  81  94  96  99  67
y   82  66  78  34  47  85  99  99  68

(a) Estimate the linear regression line.
(b) Estimate the final examination grade of a student who received a grade of 85 on the midterm report.

11.3 The amounts of a chemical compound y that dissolved in 100 grams of water at various temperatures x were recorded as follows:

x (°C)   y (grams)
0         8   6   8
15       12  10  14
30       25  21  24
45       31  33  28
60       44  39  42
...

(a) Find the equation of the regression line.
(b) Graph the line on a scatter diagram.
(c) Estimate the amount of chemical that will dissolve in 100 grams of water at 50°C.

11.4 The following data were collected to determine the relationship between pressure and the corresponding scale reading for the purpose of calibration.

Pressure, x (lb/sq in.)   Scale Reading, y
10                        13
10                        18
10                        16
10                        15
10                        20
50                        86
50                        90
50                        88
50                        88
50                        92

(a) Find the equation of the regression line.
(b) The purpose of calibration in this application is to estimate pressure from an observed scale reading. Estimate the pressure for a scale reading of 54 using $\hat{x} = (54 - b_0)/b_1$.

11.5 A study was made on the amount of converted sugar in a certain process at various temperatures. The data were coded and recorded as follows:

Temperature, x   Converted Sugar, y
1.0               8.1
...
1.3               9.8
1.4               9.5
...
1.7              10.2
1.8               9.3
1.9               9.2
2.0              10.5

(a) Estimate the linear regression line.
(b) Estimate the mean amount of converted sugar produced when the coded temperature is 1.75.
(c) Plot the residuals versus temperature. Comment.

11.6 In a certain type of metal test specimen, the normal stress on a specimen is known to be functionally related to the shear resistance. The following is a set of coded experimental data on the two variables:

Normal Stress, x   Shear Resistance, y
...

(a) Estimate the regression line $\mu_{Y|x} = \beta_0 + \beta_1 x$.
(b) Estimate the shear resistance for a normal stress of ...

11.7 The following is a portion of a classic data set called the “pilot plot data” in Fitting Equations to Data by Daniel and Wood, published in 1971. The response y is the acid content of material produced by titration, whereas the regressor x is the organic acid content produced by extraction and weighing.

...

(a) Plot the data; does it appear that a simple linear regression will be a suitable model?
(b) Fit a simple linear regression; estimate a slope and intercept.
(c) Graph the regression line on the plot in (a).

11.8 A mathematics placement test is given to all entering freshmen at a small college. A student who receives a grade below 35 is denied admission to the regular mathematics course and placed in a remedial class. The placement test scores and the final grades for 20 students who took the regular course were recorded.

Placement Test   Course Grade
50               53
35               41
...
50               68
...
50               79

(a) Plot a scatter diagram.
(b) Find the equation of the regression line to predict course grades from placement test scores.
(c) Graph the line on the scatter diagram.
(d) If 60 is the minimum passing grade, below which placement test score should students in the future be denied admission to this course?

11.9 A study was made by a retail merchant to determine the relation between weekly advertising expenditures and sales.

Advertising Costs ($)   Sales ($)
40                      385
...
40                      525
25                      480
50                      510

(a) Plot a scatter diagram.
(b) Find the equation of the regression line to predict weekly sales from advertising expenditures.
(c) Estimate the weekly sales when advertising costs are $35.
(d) Plot the residuals versus advertising costs. Comment.

11.10 The following data are the selling prices z of a certain make and model of used car w years old. Fit a curve of the form $\mu_{z|w} = \gamma \delta^{w}$ by means of the nonlinear sample regression equation $\hat{z} = c d^{w}$. [Hint: Write $\ln \hat{z} = \ln c + (\ln d) w = b_0 + b_1 w$.]

w (years)   z (dollars)
...
3           5395
...

11.11 The thrust of an engine (y) is a function of exhaust temperature (x) in °F when other important variables are held constant. Consider the following data.

... 1820 ...

(a) Plot the data.
(b) Fit a simple linear regression to the data and plot the line through the data.

11.12 A study was done to study the effect of ambient temperature x on the electric power consumed by a chemical plant y. Other factors were held constant, and the data were collected from an experimental pilot plant.

y (BTU)   x (°F)      y (BTU)   x (°F)
250       ...         ...       31
285       ...         ...       60
320       ...         ...       34
295       ...         ...       74

(a) Plot the data.
(b) Estimate the slope and intercept in a simple linear regression model.
(c) Predict power consumption for an ambient temperature of 65°F.

11.13 A study of the amount of rainfall and the quantity of air pollution removed produced the following data:

Daily Rainfall, x (0.01 cm)   Particulate Removed, y (μg/m³)
...
3.8                           132
2.1                           141
7.5                           108

(a) Find the equation of the regression line to predict the particulate removed from the amount of daily rainfall.
(b) Estimate the amount of particulate removed when the daily rainfall is x = 4.8 units.

11.14 A professor in the School of Business in a university polled a dozen colleagues about the number of professional meetings they attended in the past five years (x) and the number of papers they submitted to refereed journals (y) during the same period. The summary data are given as follows:

$$n = 12, \quad \bar{x} = 4, \quad \bar{y} = 12, \quad \sum_{i=1}^{n} x_i^2 = 232, \quad \sum_{i=1}^{n} x_i y_i = 318.$$

Fit a simple linear regression model between x and y by finding the estimates of intercept and slope. Comment on whether attending more professional meetings would result in publishing more papers.
