Chapter 11 Simple Linear Regression and Correlation

11.1 (a) ∑xᵢ = 778.7, ∑yᵢ = 2050.0, ∑xᵢ² = 26,591.63, ∑xᵢyᵢ = 65,164.04, n = 25.

(b) Using the equation ŷ = 64.53 + 0.5609x with x = 30, we find ŷ = 64.53 + (0.5609)(30) = 81.36.

(c) Residuals appear to be random, as desired. [Residual plot against arm strength omitted.]
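The computation in parts (a) and (b) can be reproduced directly from the summary sums. A minimal sketch (not part of the original solution; the variable names are mine):

```python
# Least-squares fit for Exercise 11.1 from the summary sums in part (a).
n = 25
sum_x, sum_y = 778.7, 2050.0
sum_x2, sum_xy = 26591.63, 65164.04

Sxx = sum_x2 - sum_x**2 / n          # corrected sum of squares for x
Sxy = sum_xy - sum_x * sum_y / n     # corrected sum of cross products
b = Sxy / Sxx                        # slope, about 0.5609
a = sum_y / n - b * sum_x / n        # intercept, about 64.53

y_hat_30 = a + b * 30                # prediction at x = 30
print(round(b, 4), round(a, 2), round(y_hat_30, 2))
```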

11.2 (a) ∑xᵢ = 707, ∑yᵢ = 658, ∑xᵢ² = 57,557, ∑xᵢyᵢ = 53,258, n = 9. Hence

b = [(9)(53,258) − (707)(658)] / [(9)(57,557) − (707)²] = 14,116/18,164 = 0.7771,
a = [658 − (0.7771)(707)]/9 = 12.0623.

Hence ŷ = 12.0623 + 0.7771x.

(b) For x = 85, ŷ = 12.0623 + (0.7771)(85) = 78.

11.3 (a) ∑xᵢ = 16.5, ∑yᵢ = 100.4, ∑xᵢ² = 25.85, ∑xᵢyᵢ = 152.59, n = 11. Therefore, b = 1.8091 and a = 6.4136.

Hence ŷ = 6.4136 + 1.8091x.

(b) For x = 1.75, ŷ = 6.4136 + (1.8091)(1.75) = 9.580.

(c) Residuals appear to be random, as desired.

11.4 (a) ∑xᵢ = 311.6, ∑yᵢ = 297.2, ∑xᵢ² = 8134.26, ∑xᵢyᵢ = 7687.76, n = 12.

Hence ŷ = 42.582 − 0.6861x.

(b) At x = 24.5, ŷ = 42.582 − (0.6861)(24.5) = 25.772.

11.5 (a) ∑xᵢ = 675, ∑yᵢ = 488, ∑xᵢ² = 37,125, ∑xᵢyᵢ = 25,005, n = 18. Therefore, b = 0.5676 and a = 5.8254.

Hence ŷ = 5.8254 + 0.5676x.

Solutions for Exercises in Chapter 11

(b) The scatter plot and the regression line ŷ = 5.8254 + 0.5676x are shown in the figure (omitted).

(c) For x = 50, ŷ = 5.8254 + (0.5676)(50) = 34.205 grams.

11.6 (a) The scatter plot of course grade against placement test score, with the regression line ŷ = 32.5059 + 0.4711x, is shown in the figure (omitted).

(b) ∑xᵢ = 1110, ∑yᵢ = 1173, ∑xᵢ² = 67,100, ∑xᵢyᵢ = 67,690, n = 20. Therefore, b = 0.4711 and a = 32.5059.

Hence ŷ = 32.5059 + 0.4711x.

(c) See part (a).

(d) For ŷ = 60, we solve 60 = 32.5059 + 0.4711x to obtain x = 58.36. Therefore, students scoring below 59 should be denied admission.

11.7 (a) The scatter plot of sales against advertising costs, with the regression line ŷ = 343.706 + 3.221x, is shown in the figure (omitted).

(b) ∑xᵢ = 410, ∑yᵢ = 5445, ∑xᵢ² = 15,650, ∑xᵢyᵢ = 191,325, n = 12. Therefore, b = 3.2208 and a = 343.7056.

Hence ŷ = 343.7056 + 3.2208x.

(c) When x = $35, ŷ = 343.7056 + (3.2208)(35) = $456.43.

(d) Residuals appear to be random, as desired. [Residual plot against advertising costs omitted.]

11.8 (a) ŷ = −1.70 + 1.81x. (b) x̂ = (54 + 1.70)/1.81 = 30.77.

11.9 (a) ∑xᵢ = 45, ∑yᵢ = 1094, ∑xᵢ² = 244.26, ∑xᵢyᵢ = 5348.2, n = 9.

Hence ŷ = 153.1755 − 6.3240x.

(b) For x = 4.8, ŷ = 153.1755 − (6.3240)(4.8) = 123.

11.10 (a) ẑ = cdʷ, so ln ẑ = ln c + (ln d)w; setting y = ln z, x = w, a = ln c, and b = ln d gives the linear form ŷ = a + bx, with data

x = w:      1       2       2       3       5       5
y = ln z:   8.7562  8.6473  8.6570  8.5932  8.5142  8.4960

∑xᵢ = 18, ∑yᵢ = 51.6639, ∑xᵢ² = 68, ∑xᵢyᵢ = 154.1954, n = 6, which give b = −0.0569 and a = 8.7814.

Now c = e^8.7814 = 6511.3364 and d = e^−0.0569 = 0.9447, so ẑ = 6511.3364(0.9447)ʷ.

(b) For w = 4, ẑ = 6511.3364(0.9447)⁴ = $5186.16.
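The log-transformed fit above can be checked numerically. A sketch (assuming the six (w, ln z) pairs tabulated in part (a)):

```python
import math

# Fit ln z = ln c + (ln d) w by least squares, then back-transform
# to the exponential model z = c * d**w (Exercise 11.10).
w = [1, 2, 2, 3, 5, 5]
y = [8.7562, 8.6473, 8.6570, 8.5932, 8.5142, 8.4960]   # y = ln z
n = len(w)

xbar, ybar = sum(w) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in w)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(w, y))

b = Sxy / Sxx              # ln d, about -0.0569
a = ybar - b * xbar        # ln c, about 8.7814
c, d = math.exp(a), math.exp(b)

z4 = c * d ** 4            # predicted value at w = 4, about $5186
print(round(c, 1), round(d, 4), round(z4, 1))
```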

11.11 (a) The scatter plot and the regression line ŷ = −1847.633 + 3.653x are shown in the figure (omitted).

(b) ∑xᵢ = 14,292, ∑yᵢ = 35,578, ∑xᵢ² = 22,954,054, ∑xᵢyᵢ = 57,441,610, n = 9.

Hence ŷ = −1847.69 + 3.6529x.

11.12 (a) The scatter plot of power consumed against temperature, with the regression line ŷ = 218.255 + 1.384x, is shown in the figure (omitted).

(b) ∑xᵢ = 401, ∑yᵢ = 2301, ∑xᵢ² = 22,495, ∑xᵢyᵢ = 118,652, n = 8. Therefore, b = 1.3839 and a = 218.26.

Hence ŷ = 218.26 + 1.3839x.

(c) For x = 65°F, ŷ = 218.26 + (1.3839)(65) = 308.21.

11.13 (a) The scatter plot and the regression line are shown in the figure (omitted). A simple linear model seems suitable for the data.

(b) ∑xᵢ = 999, ∑yᵢ = 670, ∑xᵢ² = 119,969, ∑xᵢyᵢ = 74,058, n = 10. Therefore, b = 0.3533 and a = 31.71.

Hence ŷ = 31.71 + 0.3533x.

(c) See (a).


11.14 From the data summary we obtain b = −6.45 and a = 37.8; hence ŷ = 37.8 − 6.45x. It appears that attending professional meetings would not result in publishing more papers.

11.15 The least squares estimator A of α is a linear combination of normally distributed random variables and is thus normal as well. Furthermore,

E(A) = E(Ȳ − Bx̄) = E(Ȳ) − x̄E(B) = α + βx̄ − βx̄ = α,

σ²_A = σ²_{Ȳ−Bx̄} = σ²_Ȳ + x̄²σ²_B − 2x̄σ_{Ȳ,B} = σ²/n + x̄²σ²/∑ᵢ(xᵢ − x̄)² = σ²∑ᵢxᵢ² / [n∑ᵢ(xᵢ − x̄)²],

since σ_{Ȳ,B} = 0 (see Exercise 11.16).

11.16 We have

Cov(Ȳ, B) = E[(Ȳ − μ_Ȳ)(B − β)]
= E{ [(1/n)∑ᵢ(Yᵢ − μ_{Yᵢ})][∑ⱼ(xⱼ − x̄)(Yⱼ − μ_{Yⱼ})] } / ∑ⱼ(xⱼ − x̄)²
= [1/(n∑ⱼ(xⱼ − x̄)²)] { ∑ᵢ(xᵢ − x̄)E(Yᵢ − μ_{Yᵢ})² + ∑_{i≠j}(xⱼ − x̄)E[(Yᵢ − μ_{Yᵢ})(Yⱼ − μ_{Yⱼ})] }
= [1/(n∑ⱼ(xⱼ − x̄)²)] { ∑ᵢ(xᵢ − x̄)σ²_{Yᵢ} + ∑_{i≠j}(xⱼ − x̄)Cov(Yᵢ, Yⱼ) }.

Now σ²_{Yᵢ} = σ² for all i, and Cov(Yᵢ, Yⱼ) = 0 for i ≠ j. Therefore,

Cov(Ȳ, B) = σ²∑ᵢ(xᵢ − x̄) / [n∑ᵢ(xᵢ − x̄)²] = 0,

since ∑ᵢ(xᵢ − x̄) = 0.
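The key step is that B = ∑cᵢYᵢ with weights cᵢ = (xᵢ − x̄)/Sxx that sum to zero, so Cov(Ȳ, B) = σ²∑cᵢ/n = 0. A numeric sanity check (a sketch; the x vector is arbitrary and mine):

```python
# Verify that the weights defining B sum to zero, which forces Cov(Ybar, B) = 0.
x = [2.0, 3.5, 5.0, 7.5, 9.0]          # any x values will do
xbar = sum(x) / len(x)
Sxx = sum((xi - xbar) ** 2 for xi in x)
c = [(xi - xbar) / Sxx for xi in x]    # B = sum(c_i * Y_i)
print(abs(sum(c)) < 1e-12)             # True: the covariance vanishes
```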

156 Chapter 11 Simple Linear Regression and Correlation

11.17 Sxx = 26,591.63 − (778.7)²/25 = 2336.6824, Syy = 172,891.46 − (2050)²/25 = 4791.46, Sxy = 65,164.04 − (778.7)(2050)/25 = 1310.64, and b = 0.5609.

(a) s² = [4791.46 − (0.5609)(1310.64)]/23 = 176.4.

(b) The hypotheses are

H₀: β = 0,
H₁: β ≠ 0.

α = 0.05. Critical region: t < −2.069 or t > 2.069.
Computation: t = 0.5609/(13.28/√2336.6824) = 2.04.
Decision: Do not reject H₀.

11.18 Sxx = 57,557 − (707)²/9 = 2018.2222, Syy = 51,980 − (658)²/9 = 3872.8889, Sxy = 53,258 − (707)(658)/9 = 1568.4444, a = 12.0623 and b = 0.7771.

(a) s² = [3872.8889 − (0.7771)(1568.4444)]/7 = 379.15.

(b) Since s = 19.472 and t₀.₀₂₅ = 2.365 for 7 degrees of freedom, a 95% confidence interval for α is

12.0623 ± (2.365)(19.472)√{57,557/[(9)(2018.2222)]},

which implies −69.91 < α < 94.04.
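The interval in part (b) can be checked numerically. A sketch using the S-values above (the tabled value t₀.₀₂₅,₇ = 2.365 is taken from the text rather than computed):

```python
import math

# s^2 from the corrected sums, then the 95% CI for alpha (Exercise 11.18).
n = 9
sum_x2 = 57557.0
Sxx, Syy, Sxy = 2018.2222, 3872.8889, 1568.4444
a, b = 12.0623, 0.7771

s2 = (Syy - b * Sxy) / (n - 2)        # about 379.15
s = math.sqrt(s2)                     # about 19.47
half = 2.365 * s * math.sqrt(sum_x2 / (n * Sxx))
lo, hi = a - half, a + half           # roughly (-69.9, 94.0)
print(round(lo, 2), round(hi, 2))
```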

11.19 Sxx = 25.85 − (16.5)²/11 = 1.1, Syy = 923.58 − (100.4)²/11 = 7.2018, Sxy = 152.59 − (16.5)(100.4)/11 = 1.99, a = 6.4136 and b = 1.8091.

(a) s² = [7.2018 − (1.8091)(1.99)]/9 = 0.40.

(b) Since s = 0.632 and t₀.₀₂₅ = 2.262 for 9 degrees of freedom, a 95% confidence interval for α is

6.4136 ± (2.262)(0.632)√{25.85/[(11)(1.1)]},

which implies 4.324 < α < 8.503.

11.20 Sxx = 8134.26 − (311.6)²/12 = 43.0467, Syy = 7407.80 − (297.2)²/12 = 47.1467, Sxy = 7687.76 − (311.6)(297.2)/12 = −29.5333, a = 42.5818 and b = −0.6861.

(a) s² = [47.1467 − (−0.6861)(−29.5333)]/10 = 2.688.

(b) Since s = 1.640 and t₀.₀₀₅ = 3.169 for 10 degrees of freedom, a 99% confidence interval for α is

42.5818 ± (3.169)(1.640)√{8134.26/[(12)(43.0467)]},

which implies 21.958 < α < 63.205.

(c) −0.6861 ± (3.169)(1.640)/√43.0467 implies −1.478 < β < 0.106.

11.21 Sxx = 37,125 − (675)²/18 = 11,812.5, Syy = 17,142 − (488)²/18 = 3911.7778, Sxy = 25,005 − (675)(488)/18 = 6705, a = 5.8254 and b = 0.5676.

(a) s² = [3911.7778 − (0.5676)(6705)]/16 = 6.626.

(b) Since s = 2.574 and t₀.₀₀₅ = 2.921 for 16 degrees of freedom, a 99% confidence interval for α is

5.8254 ± (2.921)(2.574)√{37,125/[(18)(11,812.5)]},

which implies 2.686 < α < 8.968.

(c) 0.5676 ± (2.921)(2.574)/√11,812.5 implies 0.498 < β < 0.637.

11.22 The hypotheses are

H₀: α = 10,
H₁: α > 10.

α = 0.05. Critical region: t > 1.734.
Computations: Sxx = 67,100 − (1110)²/20 = 5495, Syy = 74,725 − (1173)²/20 = 5928.55, Sxy = 67,690 − (1110)(1173)/20 = 2588.5, s² = [5928.55 − (0.4711)(2588.5)]/18 = 261.617, and then s = 16.175. Now

t = (32.5059 − 10) / {(16.175)√[67,100/((20)(5495))]} = 1.78.

Decision: Reject H₀ and claim α > 10.

11.23 The hypotheses are

H₀: β = 6,
H₁: β < 6.

α = 0.025. Critical region: t < −2.228.
Computations: Sxx = 15,650 − (410)²/12 = 1641.667, Syy = 2,512,925 − (5445)²/12 = 42,256.25, Sxy = 191,325 − (410)(5445)/12 = 5287.5, s² = [42,256.25 − (3.221)(5287.5)]/10 = 2522.521, and then s = 50.225. Now

t = (3.2208 − 6)/(50.225/√1641.667) = −2.24.

Decision: Reject H₀ and claim β < 6.

11.24 Using the value s = 19.472 from Exercise 11.18(a) and the fact that ŷ₀ = 74.230 when x₀ = 80, and x̄ = 78.556, we have

74.230 ± (2.365)(19.472)√[1/9 + (80 − 78.556)²/2018.2222],

which simplifies to 58.808 < μ_{Y|80} < 89.652.

11.25 Using the value s = 1.640 from Exercise 11.20(a) and the fact that ŷ₀ = 25.7724 when x₀ = 24.5, and x̄ = 25.9667, we have

(a) 25.7724 ± (2.228)(1.640)√[1/12 + (−1.4667)²/43.0467] = 25.7724 ± 1.3341, which implies 24.438 < μ_{Y|24.5} < 27.106.

(b) 25.7724 ± (2.228)(1.640)√[1 + 1/12 + (−1.4667)²/43.0467] = 25.7724 ± 3.8898, which implies 21.883 < y₀ < 29.662.
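The difference between the two half-widths (the extra 1 under the radical for a single future observation) can be checked with a short sketch; t₀.₀₂₅,₁₀ = 2.228 is taken from the text rather than computed:

```python
import math

# Confidence interval for the mean response vs. prediction interval for a
# single new response at x0 = 24.5 (Exercise 11.25).
n, s, Sxx = 12, 1.640, 43.0467
y0, dev = 25.7724, -1.4667            # fitted value and x0 - xbar

ci_half = 2.228 * s * math.sqrt(1/n + dev**2 / Sxx)       # about 1.334
pi_half = 2.228 * s * math.sqrt(1 + 1/n + dev**2 / Sxx)   # about 3.890
print(round(y0 - ci_half, 3), round(y0 + ci_half, 3))
print(round(y0 - pi_half, 3), round(y0 + pi_half, 3))
```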

11.26 95% confidence bands are obtained by plotting the limits ŷ ± t₀.₀₂₅ s√[1/n + (x₀ − x̄)²/Sxx] over the range of x. [Plot of the bands for converted sugar omitted.]

11.27 Using the value s = 0.632 from Exercise 11.19(a) and the fact that ŷ₀ = 9.308 when x₀ = 1.6, and x̄ = 1.5, we have

9.308 ± (2.262)(0.632)√[1 + 1/11 + (1.6 − 1.5)²/1.1],

which implies 7.809 < y₀ < 10.807.


11.28 Using the value s = 2.574 from Exercise 11.21(a) and the fact that ŷ₀ = 34.205 when x₀ = 50, and x̄ = 37.5, we have

(a) 34.205 ± (2.921)(2.574)√[1/18 + (12.5)²/11,812.5] = 34.205 ± 1.9719, which implies 32.23 < μ_{Y|50} < 36.18.

(b) 34.205 ± (2.921)(2.574)√[1 + 1/18 + (12.5)²/11,812.5] = 34.205 ± 7.7729, which implies 26.43 < y₀ < 41.98.

11.29 (a) 17.1812.

(b) The goal of 30 mpg is unlikely based on the confidence interval for mean mpg, (27.95, 29.60).

(c) Based on the prediction interval, the Lexus ES300 should exceed 18 mpg.

11.30 It is easy to see that

∑ᵢ(yᵢ − ŷᵢ) = ∑ᵢ(yᵢ − a − bxᵢ) = ∑ᵢ[yᵢ − (ȳ − bx̄) − bxᵢ] = ∑ᵢ(yᵢ − ȳ) − b∑ᵢ(xᵢ − x̄) = 0,

since a = ȳ − bx̄.
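A numeric check of this identity (a sketch with made-up data; the fact itself follows from the algebra above):

```python
# With an intercept in the model, least-squares residuals sum to zero.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
     / sum((xi - xbar) ** 2 for xi in x))
a = ybar - b * xbar
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(abs(sum(resid)) < 1e-9)   # True
```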

11.31 When there are only two data points x₁ ≠ x₂, we know from Exercise 11.30 that (y₁ − ŷ₁) + (y₂ − ŷ₂) = 0. On the other hand, by the method of least squares on page 395, we also know that x₁(y₁ − ŷ₁) + x₂(y₂ − ŷ₂) = 0. Together these equations yield (x₂ − x₁)(y₂ − ŷ₂) = 0 and hence y₂ − ŷ₂ = 0. Therefore, y₁ − ŷ₁ = 0 as well. So,

SSE = (y₁ − ŷ₁)² + (y₂ − ŷ₂)² = 0.

Since R² = 1 − SSE/SST, we have R² = 1.

11.32 (a) Suppose that the fitted model is ŷ = bx. Then SSE = ∑ᵢ(yᵢ − bxᵢ)². Taking the derivative with respect to b and setting it to zero, we have

−2∑ᵢ xᵢ(yᵢ − bxᵢ) = 0, which implies b = ∑ᵢ xᵢyᵢ / ∑ᵢ xᵢ².

(b) Since the Yᵢ's are independent,

σ²_B = Var(∑ᵢ xᵢYᵢ / ∑ᵢ xᵢ²) = ∑ᵢ xᵢ² Var(Yᵢ) / (∑ᵢ xᵢ²)² = σ² / ∑ᵢ xᵢ².

(c) E(B) = ∑ᵢ xᵢE(Yᵢ) / ∑ᵢ xᵢ² = ∑ᵢ xᵢ(βxᵢ) / ∑ᵢ xᵢ² = β, so B is an unbiased estimator of β.
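The closed form in part (a) needs only two sums. A sketch using the summary values of Exercise 11.33, which applies this result:

```python
# Regression through the origin: b = sum(x_i * y_i) / sum(x_i^2).
sum_xy, sum_x2 = 5564.0, 1629.0     # sums from Exercise 11.33(b)
b = sum_xy / sum_x2
print(round(b, 4))                  # 3.4156
```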

11.33 (a) The scatter plot of the data, with the fitted line ŷ = 3.416x, is shown in the figure (omitted).

(b) ∑ᵢ xᵢ² = 1629 and ∑ᵢ xᵢyᵢ = 5564. Hence b = 5564/1629 = 3.4156. So ŷ = 3.4156x.

(c) See (a).

(d) Since there is only one regression coefficient, β, to be estimated, the degrees of freedom in estimating σ² is n − 1. So,

s² = ∑ᵢ(yᵢ − ŷᵢ)²/(n − 1).

(e) Var(ŷᵢ) = Var(Bxᵢ) = xᵢ² Var(B) = xᵢ²σ² / ∑ⱼ xⱼ².

(f) The plot of the prediction limits is shown in the figure (omitted).

11.34 Using part (e) of Exercise 11.33, we can see that the variance of a prediction y₀ at x₀ is

σ²_{y₀} = σ²(1 + x₀²/∑ᵢ xᵢ²).

Hence the 95% prediction limits are given by ŷ₀ ± t₀.₀₂₅ s√(1 + x₀²/1629), which implies 74.45 < y₀ < 96.27.

11.35 (a) As shown in Exercise 11.32, the least squares estimator of β is b = ∑ᵢ xᵢyᵢ / ∑ᵢ xᵢ².

(b) Since ∑ᵢ xᵢyᵢ = 197.59 and ∑ᵢ xᵢ² = 98.64, then b = 197.59/98.64 = 2.003 and ŷ = 2.003x.

11.36 It can be calculated that b = 1.929 and a = 0.349, and hence ŷ = 0.349 + 1.929x when the intercept is in the model. To test the hypotheses

H₀: α = 0,
H₁: α ≠ 0,

with a 0.10 level of significance, we have the critical regions t < −2.132 or t > 2.132.
Computations: s² = 0.0957 and t = 0.349 divided by its standard error, which falls between the critical values.
Decision: Fail to reject H₀; the intercept appears to be zero.

11.37 Now, since the true model has been changed,

E(B) = ∑ᵢ(x₁ᵢ − x̄₁)E(Yᵢ) / ∑ᵢ(x₁ᵢ − x̄₁)² = ∑ᵢ(x₁ᵢ − x̄₁)(α + βx₁ᵢ + γx₂ᵢ) / ∑ᵢ(x₁ᵢ − x̄₁)² = β + γ∑ᵢ(x₁ᵢ − x̄₁)x₂ᵢ / ∑ᵢ(x₁ᵢ − x̄₁)².

11.38 The hypotheses are

H₀: β = 0,
H₁: β ≠ 0.

Level of significance: 0.05. Critical region: f > 5.12.
Computations: SSR = bSxy = (1.8091)(1.99) = 3.60 and SSE = Syy − SSR = 7.20 − 3.60 = 3.60.

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   Computed f
Regression                 3.60                1                3.60          9.00
Error                      3.60                9                0.40
Total                      7.20               10

Decision: Reject H₀.

11.39 (a) Sxx = 1058, Syy = 198.76, Sxy = −363.63, b = Sxy/Sxx = −0.34370, and a = ȳ − bx̄.

(b) The hypotheses are

H₀: the regression is linear in x,
H₁: the regression is nonlinear in x.

α = 0.05. Critical region: f > 3.10 with 3 and 20 degrees of freedom.
Computations: SST = Syy = 198.76, SSR = bSxy = 124.98 and SSE = Syy − SSR = 73.78. Since T₁. = 51.1, T₂. = 51.5, T₃. = 49.3, T₄. = 37.0 and T₅. = 22.1, then

SSE(pure) = ∑ᵢ∑ⱼ y²ᵢⱼ − ∑ᵢ T²ᵢ./nᵢ = 69.33.

Hence the "Lack-of-fit SS" is 73.78 − 69.33 = 4.45.

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   Computed f
Regression               124.98               1               124.98
Lack of fit                4.45               3                 1.48         0.43
Pure error                69.33              20                 3.47
Total                    198.76              24

Decision: Do not reject H₀.

11.40 The hypotheses are

H₀: the regression is linear in x,
H₁: the regression is nonlinear in x.

α = 0.05. Critical region: f > 3.26 with 4 and 12 degrees of freedom.
Computations: SST = Syy = 3911.78, SSR = bSxy = 3805.89 and SSE = Syy − SSR = 105.89. Also,

SSE(pure) = ∑ᵢ∑ⱼ y²ᵢⱼ − ∑ᵢ T²ᵢ./3 = 69.33,

and the "Lack-of-fit SS" is 105.89 − 69.33 = 36.56.

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   Computed f
Regression              3805.89               1              3805.89
Lack of fit               36.56               4                 9.14         1.58
Pure error                69.33              12                 5.78
Total                   3911.78              17

Decision: Do not reject H₀; the lack-of-fit test is insignificant.
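The SSE = SSE(pure) + Lack-of-fit SS decomposition used here can be sketched in code. The data below are made up for illustration (replicated x levels are required); the mechanics match the computations in Exercises 11.39 through 11.41:

```python
from collections import defaultdict

# Lack-of-fit decomposition on a small synthetic dataset with replicates.
x = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
y = [1.2, 1.4, 2.9, 3.1, 4.2, 4.0, 5.3, 5.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
Syy = sum((yi - ybar) ** 2 for yi in y)

b = Sxy / Sxx
SSR = b * Sxy              # regression sum of squares
SSE = Syy - SSR            # residual sum of squares

# Pure-error SS: variation of y within each distinct x level.
groups = defaultdict(list)
for xi, yi in zip(x, y):
    groups[xi].append(yi)
SSE_pure = sum(sum((yi - sum(g) / len(g)) ** 2 for yi in g)
               for g in groups.values())
SS_lof = SSE - SSE_pure    # lack-of-fit sum of squares

k = len(groups)                                  # distinct x levels
f = (SS_lof / (k - 2)) / (SSE_pure / (n - k))    # lack-of-fit F statistic
print(round(SSE, 4), round(SSE_pure, 4), round(SS_lof, 4), round(f, 2))
```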

11.41 The hypotheses are

H₀: the regression is linear in x,
H₁: the regression is nonlinear in x.

α = 0.05. Critical region: f > 3.00 with 6 and 12 degrees of freedom.
Computations: SST = Syy = 5928.55, SSR = bSxy = 1219.35 and SSE = Syy − SSR = 4709.20. Also, SSE(pure) = 3020.67, and the "Lack-of-fit SS" is 4709.20 − 3020.67 = 1688.53.

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   Computed f
Regression              1219.35               1              1219.35
Lack of fit             1688.53               6               281.42         1.12
Pure error              3020.67              12               251.72
Total                   5928.55              19

Decision: Do not reject H₀; the lack-of-fit test is insignificant.

11.42 (a) t = 2.679 and 0.01 < P(T > 2.679) < 0.015; hence 0.02 < P-value < 0.03. There is strong evidence that the slope is not 0. Hence emitter drive-in time influences gain in a positive linear fashion.

(b) f = 56.41, which gives strong evidence that the lack-of-fit test is significant. Hence the linear model is not adequate.

(c) Emitter drive-in time does not influence gain in a linear fashion. A better model is a quadratic one using emitter drive-in time to explain the variability in gain.


11.43 ŷ = −21.0280 + 0.4072x; f_LOF = 1.71 with a P-value = 0.2517. Hence, the lack-of-fit test is insignificant and the linear model is adequate.

11.44 (a) ŷ = 0.011571 + 0.006462x with t = 7.532 and P(T > 7.532) < 0.0005. Hence the P-value < 0.001; the slope is significantly different from 0 in the linear regression model.

(b) f_LOF = 14.02 with P-value < 0.0001. The lack-of-fit test is significant and the linear model does not appear to be the best model.

11.45 (a) ŷ = −11.3251 − 0.0449 × temperature.

(b) Yes.

(c) 0.9355.

(d) The proportion of impurities does depend on temperature. [Plot of proportion of impurity against temperature omitted.] However, based on the plot, it does not appear that the dependence is linear. If there were replicates, a lack-of-fit test could be performed.

11.46 (a) ŷ = 125.9729 + 1.7337 × population; the P-value for the regression is 0.0023.

(b) f₆,₂ = 0.49 and P-value = 0.7912; the linear model appears to be adequate based on the lack-of-fit test.

(c) f₁,₂ = 11.96 and P-value = 0.0744. The results do not change. The pure-error test is not as sensitive because of the loss of error degrees of freedom.

11.47 (a) The figure is shown in the plot (omitted).

(b) ŷ = −175.9025 + 0.0902 × year; R² = 0.3322.

(c) There is definitely a relationship between year and nitrogen oxide. It does not appear to be linear.


11.48 The ANOVA for the lack-of-fit test gives a computed f = 9.88 with P-value = 0.0029.

Decision: Reject H₀; the lack-of-fit test is significant.

11.49 Sxx = 36,354 − 35,882.667 = 471.333, Syy = 38,254 − 37,762.667 = 491.333, and Sxy = 36,926 − 36,810.667 = 115.333. So,

r = 115.333/√[(471.333)(491.333)] = 0.240.

11.50 The hypotheses are

H₀: ρ = 0,
H₁: ρ ≠ 0.

α = 0.05. Critical region: t < −2.776 or t > 2.776.
Computations: t = 0.240√4 / √(1 − 0.240²) = 0.49.
Decision: Do not reject H₀.

11.51 Since b = Sxy/Sxx, we can write t = b/(s/√Sxx) = b√Sxx/s. Also, b = r√(Syy/Sxx), so that

s² = (Syy − bSxy)/(n − 2) = (Syy − r²Syy)/(n − 2) = (1 − r²)Syy/(n − 2),

and hence

t = b√Sxx/s = r√Syy / √[(1 − r²)Syy/(n − 2)] = r√(n − 2)/√(1 − r²).


11.52 (a) Sxx = 128.6602 − (32.68)²/9 = 9.9955, Syy = 7980.83 − (266.7)²/9 = 77.62, and Sxy = 990.268 − (32.68)(266.7)/9 = 21.8507. So,

r = 21.8507/√[(9.9955)(77.62)] = 0.784.

(b) The hypotheses are

H₀: ρ = 0,
H₁: ρ > 0.

α = 0.01. Critical region: t > 2.998.
Computations: t = 0.784√7 / √(1 − 0.784²) = 3.34.
Decision: Reject H₀; ρ > 0.

(c) (0.784)²(100%) = 61.5%.
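The correlation and its t statistic can be checked with a short sketch using the S-values of part (a):

```python
import math

# Sample correlation from the corrected sums and the t statistic for
# H0: rho = 0 against H1: rho > 0 (Exercise 11.52).
n = 9
Sxx, Syy, Sxy = 9.9955, 77.62, 21.8507

r = Sxy / math.sqrt(Sxx * Syy)                    # about 0.784
t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)   # about 3.34
print(round(r, 3), round(t, 2))
```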

11.53 (a) From the data of Exercise 11.1 we can calculate

Sxx = 26,591.63 − (778.7)²/25 = 2336.6824,
Syy = 172,891.46 − (2050)²/25 = 4791.46,
Sxy = 65,164.04 − (778.7)(2050)/25 = 1310.64.

Therefore, r = 1310.64/√[(2336.6824)(4791.46)] = 0.392.

(b) The hypotheses are

H₀: ρ = 0,
H₁: ρ ≠ 0.

α = 0.05. Critical region: t < −2.069 or t > 2.069.
Computations: t = 0.392√23 / √(1 − 0.392²) = 2.04.
Decision: Fail to reject H₀ at level 0.05. However, the P-value = 0.0530, which is marginal.

11.54 (a) From the data of Exercise 11.9 we find Sxx = 244.26 − (45)²/9 = 19.26, Syy = 133,786 − (1094)²/9 = 804.2222, and Sxy = 5348.2 − (45)(1094)/9 = −121.8. So,

r = −121.8/√[(19.26)(804.2222)] = −0.979.

(b) The hypotheses are

H₀: ρ = −0.5,
H₁: ρ < −0.5.

α = 0.025. Critical region: z < −1.96.
Computations: z = (√6/2) ln{[(0.021)(1.5)] / [(1.979)(0.5)]} = −4.22.
Decision: Reject H₀; ρ < −0.5.

(c) (−0.979)²(100%) = 95.8%.
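The Fisher z-transform statistic in part (b) can be verified numerically. A sketch, using the general form z = (√(n − 3)/2) ln{[(1 + r)(1 − ρ₀)] / [(1 − r)(1 + ρ₀)]}:

```python
import math

# Fisher z-transform test of H0: rho = -0.5 vs H1: rho < -0.5 (Exercise 11.54b).
n, r, rho0 = 9, -0.979, -0.5
z = (math.sqrt(n - 3) / 2) * math.log(
    ((1 + r) * (1 - rho0)) / ((1 - r) * (1 + rho0)))
print(round(z, 2))   # about -4.22
```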

11.55 Using the value s = 16.175 from Exercise 11.6 and the fact that ŷ₀ = 48.994 when x₀ = 35, and x̄ = 55.5, we have

(a) 48.994 ± (2.101)(16.175)√[1/20 + (−20.5)²/5495], which implies 36.908 < μ_{Y|35} < 61.080.

(b) 48.994 ± (2.101)(16.175)√[1 + 1/20 + (−20.5)²/5495], which implies 12.925 < y₀ < 85.063.

11.56 The fitted model can be derived as ŷ = 3667.3968 − 47.3289x. The hypotheses are

H₀: β = 0,
H₁: β ≠ 0.

t = −0.30 with P-value = 0.77. Hence H₀ cannot be rejected.

11.57 (a) Sxx = 729.18 − (118.6)²/20 = 25.882, Sxy = 1714.62 − (118.6)(281.1)/20 = 47.697, so b = Sxy/Sxx = 1.8429, and a = ȳ − bx̄ = 3.1266. Hence ŷ = 3.1266 + 1.8429x.

(b) The hypotheses are

H₀: the regression is linear in x,
H₁: the regression is not linear in x.

α = 0.05. Critical region: f > 3.07 with 8 and 10 degrees of freedom.
Computations: SST = 138.3695, SSR = 87.9008, SSE = 50.4687, SSE(pure) = 16.375, and Lack-of-fit SS = 34.0937.

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   Computed f
Regression               87.9008              1               87.9008
Lack of fit              34.0937              8                4.2617        2.60
Pure error               16.3750             10                1.6375
Total                   138.3695             19

The P-value = 0.0791. The linear model is adequate at the level 0.05.

11.58 Using the value s = 50.225 and the fact that ŷ₀ = $488.644 when x₀ = $45, and x̄ = $34.167, we have

(a) 488.644 ± (1.812)(50.225)√[1/12 + (10.833)²/1641.667], which implies 452.835 < μ_{Y|45} < 524.453.

(b) 488.644 ± (1.812)(50.225)√[1 + 1/12 + (10.833)²/1641.667], which implies 390.845 < y₀ < 586.443.

11.59 (a) ŷ = 7.3598 + 135.4034x.

(b) SS(pure error) = 52,941.06; f_LOF = 0.46 with P-value = 0.64. The lack-of-fit test is insignificant.

(c) No.

11.60 (a) Sxx = 672.9167, Syy = 728.25, Sxy = 603.75 and

r = 603.75/√[(672.9167)(728.25)] = 0.862,

which means that (0.862)²(100%) = 74.3% of the total variation of the values of Y in our sample is accounted for by a linear relationship with the values of X.

(b) To estimate and test hypotheses on ρ, X and Y are assumed to be random variables from a bivariate normal distribution.

(c) The hypotheses are

H₀: ρ = 0.5,
H₁: ρ > 0.5.

α = 0.01. Critical region: z > 2.33.
Computations: z = (3/2) ln{[(1.862)(0.5)] / [(0.138)(1.5)]} = 2.26.
Decision: Fail to reject H₀ at level 0.01, since 2.26 < 2.33.

11.61 s² = ∑ᵢ(yᵢ − ŷᵢ)²/(n − 2). Using the centered model yᵢ = α + β(xᵢ − x̄) + εᵢ, with fitted values ŷᵢ = ȳ + b(xᵢ − x̄), we have

(n − 2)E(S²) = E{ ∑ᵢ [α + β(xᵢ − x̄) + εᵢ − ȳ − b(xᵢ − x̄)]² }
= ∑ᵢ E[ (α − ȳ)² + (β − b)²(xᵢ − x̄)² + εᵢ² − 2b(xᵢ − x̄)εᵢ − 2ȳεᵢ ]

(the other cross-product terms have expectation 0). Evaluating each term gives σ² + σ² + nσ² − 2σ² − 2σ² = (n − 2)σ², so E(S²) = σ².

11.62 (a) The confidence interval is an interval on the mean sale price for a given buyer's bid. The prediction interval is an interval on a future observed sale price for a given buyer's bid.

(b) The standard errors of the prediction of sale price depend on the value of the buyer's bid.

(c) Observations 4, 9, 10, and 17 have the lowest standard errors of prediction. These observations have buyer's bids very close to the mean.

11.63 (a) The residual plot appears to have a pattern rather than random scatter. The R² is only 0.82.

(b) The log model has an R² of 0.84. There is still a pattern in the residuals.

(c) The model using gallons per 100 miles has the best R², 0.85, and its residuals appear to be more random. This model is the best of the three models attempted. Perhaps a better model could be found.

11.64 (a) The plot of the data with the least squares fitted line is shown in the figure (omitted).

(b) Yes.

(c) ŷ = 61.5133 + 0.1139x. The lack-of-fit test gives a P-value = 0.533.

(d) The results in (c) show that the linear model is adequate.

11.65 (a) ŷ = 90.8904 − 0.0513x.

(b) The t-value in testing H₀: β = 0 is −6.533, which results in a P-value < 0.0001. Hence, the time it takes to run two miles has a significant influence on maximum oxygen uptake.

(c) The residual graph shows that there may be some systematic behavior of the residuals, and hence the residuals are not completely random.


11.66 Let Y*ᵢ = Yᵢ − α, for i = 1, 2, …, n. The model Yᵢ = α + βxᵢ + εᵢ is equivalent to Y*ᵢ = βxᵢ + εᵢ. This is a "regression through the origin" model, which is studied in Exercise 11.32.

(a) Using the result from Exercise 11.32(a), we have

b = ∑ᵢ xᵢ(yᵢ − α)/∑ᵢ xᵢ² = (∑ᵢ xᵢyᵢ − nx̄α)/∑ᵢ xᵢ².

(b) Also, from Exercise 11.32(b) we have σ²_B = σ²/∑ᵢ xᵢ².

11.67 SSE = ∑ᵢ(yᵢ − βxᵢ)². Taking the derivative with respect to β and setting it to 0, we get

∑ᵢ xᵢ(yᵢ − bxᵢ) = 0, or ∑ᵢ xᵢ(yᵢ − ŷᵢ) = 0.

This is the only equation we can get using the least squares method. Hence in general, ∑ᵢ(yᵢ − ŷᵢ) = 0 does not hold for a model without an intercept.
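This can be seen concretely with a small numeric sketch (the data are made up for illustration): the single normal equation forces ∑xᵢeᵢ = 0, while ∑eᵢ is generally nonzero.

```python
# No-intercept least squares: residuals are orthogonal to x but need not sum to zero.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.5, 2.0, 4.5, 6.0]
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
resid = [yi - b * xi for xi, yi in zip(x, y)]
print(abs(sum(xi * ei for xi, ei in zip(x, resid))) < 1e-9)   # True
print(abs(sum(resid)) > 1e-6)                                 # True for these data
```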