Example 12.10 Reconsider the data on x = burner area liberation rate and y = NO$_x$ emission rate from Exercise 12.19 in the previous section. There are 14 observations, made at the x values 100, 125, 125, 150, 150, 200, 200, 250, 250, 300, 300, 350, 400, and 400, respectively. Suppose that the slope and intercept of the true regression line are $\beta_1 = 1.70$ and $\beta_0 = -50$, with $\sigma = 35$ (consistent with the values $\hat{\beta}_1 = 1.7114$, $\hat{\beta}_0 = -45.55$, $s = 36.75$). We proceeded to generate a sample of random deviations $\tilde{\epsilon}_1, \ldots, \tilde{\epsilon}_{14}$ from a normal distribution with mean 0 and standard deviation 35 and then added $\tilde{\epsilon}_i$ to $\beta_0 + \beta_1 x_i$ to obtain 14 corresponding y values.
Regression calculations were then carried out to obtain the estimated slope, intercept, and standard deviation. This process was repeated a total of 20 times, resulting in the values given in Table 12.1.
Table 12.1 Simulation Results for Example 12.10
There is clearly variation in values of the estimated slope and estimated intercept, as well as the estimated standard deviation. The equation of the least squares line thus varies from one sample to the next. Figure 12.13 on page 492 shows a dotplot of the estimated slopes as well as graphs of the true regression line and the 20 sample regression lines.
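The sampling experiment described in this example can be reproduced with a short simulation. The sketch below uses NumPy with the true parameter values stated in the example; the random seed is an arbitrary choice, so the 20 estimates will differ from those in Table 12.1 while showing the same kind of sample-to-sample variation:

```python
import numpy as np

# x values from Example 12.10 (14 observations, fixed by the experimenter)
x = np.array([100, 125, 125, 150, 150, 200, 200,
              250, 250, 300, 300, 350, 400, 400], dtype=float)

# True regression line and error standard deviation from the example
beta0, beta1, sigma = -50.0, 1.70, 35.0

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for reproducibility

slopes = []
for _ in range(20):
    eps = rng.normal(0.0, sigma, size=x.size)  # random deviations
    y = beta0 + beta1 * x + eps                # simulated y values
    b1, b0 = np.polyfit(x, y, deg=1)           # least squares slope and intercept
    slopes.append(b1)

# The estimated slope varies from one simulated sample to the next,
# but the estimates cluster around the true value beta1 = 1.70.
print(min(slopes), max(slopes))
```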
■

The slope $\beta_1$ of the population regression line is the true average change in the dependent variable y associated with a 1-unit increase in the independent variable x. The slope of the least squares line, $\hat{\beta}_1$, gives a point estimate of $\beta_1$. In the same way that a confidence interval for $\mu$ and procedures for testing hypotheses about $\mu$ were based on properties of the sampling distribution of $\bar{X}$, further inferences about $\beta_1$ are based on thinking of $\hat{\beta}_1$ as a statistic and investigating its sampling distribution.
The values of the $x_i$'s are assumed to be chosen before the experiment is performed, so only the $Y_i$'s are random. The estimators (statistics, and thus random variables) for $\beta_0$ and $\beta_1$ are obtained by replacing $y_i$ by $Y_i$ in (12.2) and (12.3):

$$\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(Y_i - \bar{Y})}{\sum (x_i - \bar{x})^2} \qquad \hat{\beta}_0 = \frac{\sum Y_i - \hat{\beta}_1 \sum x_i}{n}$$
Similarly, the estimator for $\sigma^2$ results from replacing each $y_i$ in the formula for $s^2$ by the rv $Y_i$:

$$\hat{\sigma}^2 = S^2 = \frac{\sum Y_i^2 - \hat{\beta}_0 \sum Y_i - \hat{\beta}_1 \sum x_i Y_i}{n - 2}$$
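This computational form of $S^2$ can be checked against the defining form, the sum of squared residuals divided by $n - 2$. The brief NumPy sketch below uses simulated data with illustrative x values and parameters (not data from the text):

```python
import numpy as np

# Illustrative design and parameters, for demonstration only
x = np.array([100, 150, 200, 250, 300, 350, 400], dtype=float)
rng = np.random.default_rng(3)
y = -50 + 1.7 * x + rng.normal(0, 35, size=x.size)

n = x.size
b1, b0 = np.polyfit(x, y, deg=1)  # least squares estimates

# Computational form: (sum y_i^2 - b0*sum y_i - b1*sum x_i y_i) / (n - 2)
s2_comp = (np.sum(y**2) - b0 * np.sum(y) - b1 * np.sum(x * y)) / (n - 2)

# Defining form: sum of squared residuals over n - 2
s2_resid = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)

assert np.isclose(s2_comp, s2_resid)  # the two forms agree
```

The agreement follows from the normal equations satisfied by the least squares estimates.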
[Figure 12.13 Simulation results from Example 12.10: (a) dotplot of estimated slopes; (b) graphs of the true regression line and 20 least squares lines (from S-Plus)]
The denominator of $\hat{\beta}_1$, $S_{xx} = \sum (x_i - \bar{x})^2$, depends only on the $x_i$'s and not on the $Y_i$'s, so it is a constant. Then because $\sum (x_i - \bar{x})\bar{Y} = \bar{Y} \sum (x_i - \bar{x}) = \bar{Y} \cdot 0 = 0$, the slope estimator can be written as

$$\hat{\beta}_1 = \sum c_i Y_i \quad \text{where } c_i = \frac{x_i - \bar{x}}{S_{xx}}$$
That is, $\hat{\beta}_1$ is a linear function of the independent rv's $Y_1, Y_2, \ldots, Y_n$, each of which is normally distributed. Invoking properties of a linear function of random variables discussed in Section 5.5 leads to the following results.
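The representation of $\hat{\beta}_1$ as the linear combination $\sum c_i Y_i$ can be verified numerically. The following sketch (illustrative x values and parameters, not from the text) compares it with the slope from an ordinary least squares fit:

```python
import numpy as np

# Any fixed design and simulated responses will do for the check
x = np.array([100, 125, 150, 200, 250, 300, 350, 400], dtype=float)
rng = np.random.default_rng(1)
y = -50 + 1.7 * x + rng.normal(0, 35, size=x.size)

Sxx = np.sum((x - x.mean()) ** 2)
c = (x - x.mean()) / Sxx       # weights c_i depend only on the x_i's
b1_linear = np.sum(c * y)      # slope as a linear combination of the responses

b1_ls, _ = np.polyfit(x, y, deg=1)   # ordinary least squares slope

assert np.isclose(b1_linear, b1_ls)  # identical, up to floating point
```

Because the weights $c_i$ are constants, this makes the distributional results of the proposition below a direct application of linear-combination rules.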
PROPOSITION
1. The mean value of $\hat{\beta}_1$ is $E(\hat{\beta}_1) = \mu_{\hat{\beta}_1} = \beta_1$, so $\hat{\beta}_1$ is an unbiased estimator of $\beta_1$ (the distribution of $\hat{\beta}_1$ is always centered at the value of $\beta_1$).
2. The variance and standard deviation of $\hat{\beta}_1$ are

$$V(\hat{\beta}_1) = \sigma_{\hat{\beta}_1}^2 = \frac{\sigma^2}{S_{xx}} \qquad \sigma_{\hat{\beta}_1} = \frac{\sigma}{\sqrt{S_{xx}}} \tag{12.4}$$
where $S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - (\sum x_i)^2 / n$. Replacing $\sigma$ by its estimate $s$ gives an estimate for $\sigma_{\hat{\beta}_1}$ (the estimated standard deviation, i.e., estimated standard error, of $\hat{\beta}_1$):

$$s_{\hat{\beta}_1} = \frac{s}{\sqrt{S_{xx}}}$$

(This estimate can also be denoted by $\hat{\sigma}_{\hat{\beta}_1}$.)
3. The estimator $\hat{\beta}_1$ has a normal distribution (because it is a linear function of independent normal rv's).
According to (12.4), the variance of $\hat{\beta}_1$ equals the variance $\sigma^2$ of the random error term (or, equivalently, of any $Y_i$) divided by $\sum (x_i - \bar{x})^2$. This denominator is a measure of how spread out the $x_i$'s are about $\bar{x}$. We conclude that making observations at $x_i$ values that are quite spread out results in a more precise estimator of the slope parameter (smaller variance of $\hat{\beta}_1$), whereas $x_i$ values all close to one another imply a highly variable estimator. Of course, if the $x_i$'s are spread out too far, a linear model may not be appropriate throughout the range of observation.
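The effect of the spread of the $x_i$'s can be made concrete by evaluating $\sigma_{\hat{\beta}_1} = \sigma / \sqrt{S_{xx}}$ for two hypothetical designs with the same number of observations, one spread out and one tightly clustered (both designs and $\sigma$ are illustrative choices, not data from the text):

```python
import numpy as np

sigma = 35.0  # illustrative error standard deviation

x_spread = np.array([100, 150, 200, 250, 300, 350, 400], dtype=float)
x_close  = np.array([230, 235, 240, 245, 250, 255, 260], dtype=float)

def slope_sd(x, sigma):
    """Standard deviation of the slope estimator, sigma / sqrt(S_xx)."""
    Sxx = np.sum((x - x.mean()) ** 2)
    return sigma / np.sqrt(Sxx)

print(slope_sd(x_spread, sigma))  # small: precise slope estimator
print(slope_sd(x_close, sigma))   # much larger: highly variable estimator
```

For these two designs the clustered $x$'s give a slope standard deviation exactly 10 times larger, since $S_{xx}$ shrinks by a factor of 100.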
Many inferential procedures discussed previously were based on standardizing an estimator by first subtracting its mean value and then dividing by its estimated standard deviation. In particular, test procedures and a CI for the mean $\mu$ of a normal population utilized the fact that the standardized variable $(\bar{X} - \mu)/(S/\sqrt{n})$, that is, $(\bar{X} - \mu)/S_{\bar{X}}$, had a t distribution with $n - 1$ df. A similar result here provides the key to further inferences concerning $\beta_1$.
THEOREM
The assumptions of the simple linear regression model imply that the standardized variable

$$T = \frac{\hat{\beta}_1 - \beta_1}{S/\sqrt{S_{xx}}} = \frac{\hat{\beta}_1 - \beta_1}{S_{\hat{\beta}_1}}$$

has a t distribution with $n - 2$ df.
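As a numerical illustration, the standardized variable can be computed directly from the formulas of this section; the sketch below simulates one sample under the model of Example 12.10 (the seed and data are illustrative) and evaluates $T$ at the true slope:

```python
import numpy as np

# Design and assumed parameter values from Example 12.10
x = np.array([100, 125, 125, 150, 150, 200, 200,
              250, 250, 300, 300, 350, 400, 400], dtype=float)
rng = np.random.default_rng(2)
y = -50 + 1.7 * x + rng.normal(0, 35, size=x.size)

n = x.size
b1, b0 = np.polyfit(x, y, deg=1)
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))  # estimate of sigma, based on n-2 df
Sxx = np.sum((x - x.mean()) ** 2)
se_b1 = s / np.sqrt(Sxx)                   # estimated standard error of the slope

# Standardized variable; under the model it has a t distribution with n-2 = 12 df
t = (b1 - 1.7) / se_b1
print(t)
```

Repeating this over many simulated samples (rather than one) would reproduce the t distribution with 12 df empirically.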