
Example 12.10 Reconsider the data on x = burner area liberation rate and y = NO$_x$ emission rate

from Exercise 12.19 in the previous section. There are 14 observations, made at the x values 100, 125, 125, 150, 150, 200, 200, 250, 250, 300, 300, 350, 400, and 400, respectively. Suppose that the slope and intercept of the true regression line are $\beta_1 = 1.70$ and $\beta_0 = -50$, with $\sigma = 35$ (consistent with the values $\hat\beta_1 = 1.7114$, $\hat\beta_0 = -45.55$, $s = 36.75$). We proceeded to generate a sample of random deviations $\tilde\varepsilon_1, \ldots, \tilde\varepsilon_{14}$ from a normal distribution with mean 0 and standard deviation 35 and then added $\tilde\varepsilon_i$ to $\beta_0 + \beta_1 x_i$ to obtain 14 corresponding y values. Regression calculations were then carried out to obtain the estimated slope, intercept, and standard deviation. This process was repeated a total of 20 times, resulting in the values given in Table 12.1.

  Table 12.1 Simulation Results for Example 12.10

There is clearly variation in the values of the estimated slope and estimated intercept, as well as in the estimated standard deviation. The equation of the least squares line thus varies from one sample to the next. Figure 12.13 on page 492 shows a dotplot of the estimated slopes as well as graphs of the true regression line and the 20 sample regression lines. ■
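A minimal Python sketch of how such a simulation could be carried out, assuming NumPy is available; the function and variable names below are illustrative choices, not from the text (which used S-Plus).

```python
import numpy as np

rng = np.random.default_rng()

# x values from Exercise 12.19 and the true parameters assumed in Example 12.10
x = np.array([100, 125, 125, 150, 150, 200, 200, 250,
              250, 300, 300, 350, 400, 400], dtype=float)
beta0, beta1, sigma = -50.0, 1.70, 35.0

def simulate_once(x, beta0, beta1, sigma, rng):
    """Generate one set of y values and return (slope, intercept, s)."""
    eps = rng.normal(0.0, sigma, size=x.size)        # random deviations
    y = beta0 + beta1 * x + eps                       # 14 simulated responses
    xbar, ybar = x.mean(), y.mean()
    Sxx = np.sum((x - xbar) ** 2)
    b1 = np.sum((x - xbar) * (y - ybar)) / Sxx        # estimated slope
    b0 = ybar - b1 * xbar                             # estimated intercept
    s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (x.size - 2))
    return b1, b0, s

# Repeat 20 times, as in Table 12.1
results = [simulate_once(x, beta0, beta1, sigma, rng) for _ in range(20)]
for b1, b0, s in results:
    print(f"{b1:8.4f} {b0:10.2f} {s:8.2f}")
```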

The slope $\beta_1$ of the population regression line is the true average change in the dependent variable y associated with a 1-unit increase in the independent variable x. The slope of the least squares line, $\hat\beta_1$, gives a point estimate of $\beta_1$. In the same way that a confidence interval for $\mu$ and procedures for testing hypotheses about $\mu$ were based on properties of the sampling distribution of $\bar X$, further inferences about $\beta_1$ are based on thinking of $\hat\beta_1$ as a statistic and investigating its sampling distribution.

The values of the $x_i$'s are assumed to be chosen before the experiment is performed, so only the $Y_i$'s are random. The estimators (statistics, and thus random variables) for $\beta_0$ and $\beta_1$ are obtained by replacing $y_i$ by $Y_i$ in (12.2) and (12.3):

$$\hat\beta_1 = \frac{\sum (x_i - \bar x)(Y_i - \bar Y)}{\sum (x_i - \bar x)^2} \qquad\qquad \hat\beta_0 = \frac{\sum Y_i - \hat\beta_1 \sum x_i}{n} = \bar Y - \hat\beta_1 \bar x$$

Similarly, the estimator for $\sigma^2$ results from replacing each $y_i$ in the formula for $s^2$ by the rv $Y_i$:

$$\hat\sigma^2 = S^2 = \frac{\sum Y_i^2 - \hat\beta_0 \sum Y_i - \hat\beta_1 \sum x_i Y_i}{n - 2}$$
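As a check on this computational form, the sketch below (a hedged illustration assuming NumPy; the helper name sigma2_hat is mine, not the text's) computes $S^2$ both from the formula above and from SSE/(n - 2) and confirms that the two agree.

```python
import numpy as np

def sigma2_hat(x, y):
    """Estimate sigma^2 two equivalent ways and confirm they agree."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    xbar, ybar = x.mean(), y.mean()
    b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    b0 = ybar - b1 * xbar
    # Computational formula: (sum Y_i^2 - b0*sum Y_i - b1*sum x_i*Y_i) / (n - 2)
    s2 = (np.sum(y ** 2) - b0 * np.sum(y) - b1 * np.sum(x * y)) / (n - 2)
    # Defining form: SSE / (n - 2)
    sse = np.sum((y - (b0 + b1 * x)) ** 2)
    assert np.isclose(s2, sse / (n - 2))
    return s2
```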


Figure 12.13 Simulation results from Example 12.10: (a) dotplot of estimated slopes; (b) graphs of the true regression line and 20 least squares lines (from S-Plus)

The denominator of $\hat\beta_1$, $S_{xx} = \sum (x_i - \bar x)^2$, depends only on the $x_i$'s and not on the $Y_i$'s, so it is a constant. Then because $\sum (x_i - \bar x)\bar Y = \bar Y \sum (x_i - \bar x) = \bar Y \cdot 0 = 0$, the slope estimator can be written as

$$\hat\beta_1 = \frac{\sum (x_i - \bar x) Y_i}{S_{xx}} = \sum c_i Y_i \qquad \text{where } c_i = \frac{x_i - \bar x}{S_{xx}}$$

That is, $\hat\beta_1$ is a linear function of the independent rv's $Y_1, Y_2, \ldots, Y_n$, each of which is normally distributed. Invoking properties of a linear function of random variables discussed in Section 5.5 leads to the following results.
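This representation can be checked numerically. In the sketch below (NumPy assumed, with simulated y values used purely for illustration), the weighted sum $\sum c_i Y_i$ reproduces the usual least squares slope.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.array([100, 125, 125, 150, 150, 200, 200, 250,
              250, 300, 300, 350, 400, 400], dtype=float)
y = -50 + 1.70 * x + rng.normal(0, 35, size=x.size)   # illustrative responses

xbar = x.mean()
Sxx = np.sum((x - xbar) ** 2)
c = (x - xbar) / Sxx                 # fixed weights: depend only on the x_i's

slope_as_linear_combination = np.sum(c * y)                          # sum c_i Y_i
slope_least_squares = np.sum((x - xbar) * (y - y.mean())) / Sxx      # usual formula

print(np.isclose(slope_as_linear_combination, slope_least_squares))  # True
```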

  PROPOSITION

1. The mean value of $\hat\beta_1$ is $E(\hat\beta_1) = \mu_{\hat\beta_1} = \beta_1$, so $\hat\beta_1$ is an unbiased estimator of $\beta_1$ (the distribution of $\hat\beta_1$ is always centered at the value of $\beta_1$).


2. The variance and standard deviation of $\hat\beta_1$ are

$$V(\hat\beta_1) = \sigma^2_{\hat\beta_1} = \frac{\sigma^2}{S_{xx}} \qquad\qquad \sigma_{\hat\beta_1} = \frac{\sigma}{\sqrt{S_{xx}}} \tag{12.4}$$

where $S_{xx} = \sum (x_i - \bar x)^2 = \sum x_i^2 - (\sum x_i)^2 / n$. Replacing $\sigma$ by its estimate $s$ gives an estimate for $\sigma_{\hat\beta_1}$ (the estimated standard deviation, i.e., estimated standard error, of $\hat\beta_1$):

$$s_{\hat\beta_1} = \frac{s}{\sqrt{S_{xx}}}$$

(This estimate can also be denoted by $\hat\sigma_{\hat\beta_1}$.)

3. The estimator $\hat\beta_1$ has a normal distribution (because it is a linear function of independent normal rv's).
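A quick numerical check of parts 1 and 2 of the proposition; this is a sketch under the stated model, using the x values and parameters of Example 12.10 with a replication count chosen here only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([100, 125, 125, 150, 150, 200, 200, 250,
              250, 300, 300, 350, 400, 400], dtype=float)
beta0, beta1, sigma = -50.0, 1.70, 35.0
Sxx = np.sum((x - x.mean()) ** 2)

# Simulate many samples and record the estimated slope from each
slopes = np.empty(100_000)
for i in range(slopes.size):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=x.size)
    slopes[i] = np.sum((x - x.mean()) * (y - y.mean())) / Sxx

print(slopes.mean())           # approximately beta1 = 1.70 (unbiasedness)
print(slopes.std())            # approximately sigma / sqrt(Sxx)
print(sigma / np.sqrt(Sxx))    # theoretical value from (12.4)
```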

According to (12.4), the variance of $\hat\beta_1$ equals the variance $\sigma^2$ of the random error term (or, equivalently, of any $Y_i$) divided by $\sum (x_i - \bar x)^2$. This denominator is a measure of how spread out the $x_i$'s are about $\bar x$. We conclude that making observations at $x_i$ values that are quite spread out results in a more precise estimator of the slope parameter (smaller variance of $\hat\beta_1$), whereas $x_i$ values all close to one another imply a highly variable estimator. Of course, if the $x_i$'s are spread out too far, a linear model may not be appropriate throughout the range of observation.
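To illustrate, the short sketch below compares $\sigma_{\hat\beta_1} = \sigma/\sqrt{S_{xx}}$ for a spread-out design and a clustered design; the clustered x values are hypothetical, chosen only for contrast with the design of Example 12.10.

```python
import numpy as np

def sd_of_slope(x, sigma=35.0):
    """Standard deviation of the slope estimator: sigma / sqrt(Sxx)."""
    x = np.asarray(x, float)
    return sigma / np.sqrt(np.sum((x - x.mean()) ** 2))

spread_out = [100, 125, 125, 150, 150, 200, 200, 250, 250, 300, 300, 350, 400, 400]
clustered  = [230, 235, 240, 240, 245, 245, 250, 250, 255, 255, 260, 260, 265, 270]

print(sd_of_slope(spread_out))   # smaller: x values far from their mean
print(sd_of_slope(clustered))    # larger: x values bunched together
```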

Many inferential procedures discussed previously were based on standardizing an estimator by first subtracting its mean value and then dividing by its estimated standard deviation. In particular, test procedures and a CI for the mean $\mu$ of a normal population utilized the fact that the standardized variable $(\bar X - \mu)/(S/\sqrt{n})$, that is, $(\bar X - \mu)/\hat\sigma_{\bar X}$, had a t distribution with $n - 1$ df. A similar result here provides the key to further inferences concerning $\beta_1$.

  THEOREM

The assumptions of the simple linear regression model imply that the standardized variable

$$T = \frac{\hat\beta_1 - \beta_1}{S / \sqrt{S_{xx}}} = \frac{\hat\beta_1 - \beta_1}{S_{\hat\beta_1}}$$

has a t distribution with $n - 2$ df.
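A sketch of how this standardized variable could be computed from data, assuming NumPy and SciPy are available; the function name and the two-sided p-value are illustrative choices, not prescribed by the theorem.

```python
import numpy as np
from scipy import stats

def slope_t_statistic(x, y, beta1_0):
    """Compute T = (beta1_hat - beta1_0) / (s / sqrt(Sxx)), which has
    a t distribution with n - 2 df under the model assumptions."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    xbar = x.mean()
    Sxx = np.sum((x - xbar) ** 2)
    b1 = np.sum((x - xbar) * (y - y.mean())) / Sxx
    b0 = y.mean() - b1 * xbar
    s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
    t = (b1 - beta1_0) / (s / np.sqrt(Sxx))
    p_two_sided = 2 * stats.t.sf(abs(t), df=n - 2)
    return t, p_two_sided
```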