What factors determine variance and covariance?

1. $\sigma^2$: greater uncertainty about the $y_t$ values means greater uncertainty about $b_1$, $b_2$, and their relationship.
2. The more spread out the $x_t$ values are, the more confidence we have in $b_1$, $b_2$, etc.
3. The larger the sample size, $T$, the smaller the variances and covariances.
4. The variance of $b_1$ is large when the (squared) $x_t$ values are far from zero in either direction.
5. Changing the slope, $b_2$, has no effect on the intercept, $b_1$, when the sample mean is zero. But if the sample mean is positive, the covariance between $b_1$ and $b_2$ will be negative, and vice versa.
Gauss-Markov Theorem

Under the first five assumptions of the simple linear regression model, the ordinary least squares estimators $b_1$ and $b_2$ have the smallest variance of all linear and unbiased estimators of $\beta_1$ and $\beta_2$. This means that $b_1$ and $b_2$ are the Best Linear Unbiased Estimators (BLUE) of $\beta_1$ and $\beta_2$.
Implications of Gauss-Markov

1. $b_1$ and $b_2$ are "best" within the class of linear and unbiased estimators.
2. "Best" means smallest variance within the class of linear, unbiased estimators.
3. All of the first five assumptions must hold to satisfy Gauss-Markov.
4. Gauss-Markov does not require assumption six: normality.
5. Gauss-Markov is not based on the least squares principle as such, but on the properties of $b_1$ and $b_2$ (linearity and unbiasedness).
Gauss-Markov implications, continued

6. If we are not satisfied with restricting our estimation to the class of linear and unbiased estimators, we should ignore the Gauss-Markov Theorem and use some nonlinear and/or biased estimator instead. Note: a biased or nonlinear estimator could have smaller variance than those satisfying Gauss-Markov.
7. Gauss-Markov applies to the $b_1$ and $b_2$ estimators and not to particular sample values (estimates) of $b_1$ and $b_2$.
Probability Distribution of the Least Squares Estimators

$$b_2 \sim N\!\left(\beta_2,\ \frac{\sigma^2}{\sum (x_t - \bar{x})^2}\right)$$

$$b_1 \sim N\!\left(\beta_1,\ \frac{\sigma^2 \sum x_t^2}{T \sum (x_t - \bar{x})^2}\right)$$
$y_t$ and $\varepsilon_t$ normally distributed

The least squares estimator of $\beta_2$ can be expressed as a linear combination of the $y_t$'s:

$$b_2 = \sum w_t y_t \quad\text{where}\quad w_t = \frac{x_t - \bar{x}}{\sum (x_t - \bar{x})^2}$$

$$b_1 = \bar{y} - b_2 \bar{x}$$

This means that $b_1$ and $b_2$ are normal, since linear combinations of normals are normal.
$b_1$ and $b_2$ normally distributed under the Central Limit Theorem

If the first five Gauss-Markov assumptions hold, and the sample size, $T$, is sufficiently large, then the least squares estimators, $b_1$ and $b_2$, have a distribution that approximates the normal distribution with greater accuracy the larger the value of the sample size, $T$.
Consistency

We would like our estimators, $b_1$ and $b_2$, to collapse onto the true population values, $\beta_1$ and $\beta_2$, as the sample size, $T$, goes to infinity. One way to achieve this consistency property is for the variances of $b_1$ and $b_2$ to go to zero as $T$ goes to infinity. Since the formulas for the variances of the least squares estimators $b_1$ and $b_2$ show that their variances do, in fact, go to zero, $b_1$ and $b_2$ are consistent estimators of $\beta_1$ and $\beta_2$.
Estimating the variance of the error term, $\sigma^2$

$$\hat{e}_t = y_t - b_1 - b_2 x_t$$

$$\hat{\sigma}^2 = \frac{\sum_{t=1}^{T} \hat{e}_t^2}{T - 2}$$

$\hat{\sigma}^2$ is an unbiased estimator of $\sigma^2$.
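To make the computation concrete, here is a minimal NumPy sketch of this estimator; the data values are hypothetical and serve only to illustrate the formula:

```python
import numpy as np

# Hypothetical sample data (T = 5 observations).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
T = len(y)

# Least squares slope and intercept.
b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b1 = y.mean() - b2 * x.mean()

# Residuals and the unbiased error-variance estimate (T - 2 degrees of freedom).
e_hat = y - b1 - b2 * x
sigma2_hat = np.sum(e_hat**2) / (T - 2)
print(sigma2_hat)
```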
The Least Squares Predictor, $\hat{y}_o$

Given a value of the explanatory variable, $x_o$, we would like to predict a value of the dependent variable, $y_o$. The least squares predictor is:

$$\hat{y}_o = b_1 + b_2 x_o$$
Chapter 5: Inference in the Simple Regression Model
Assumptions of the Simple Linear Regression Model

1. $y_t = \beta_1 + \beta_2 x_t + \varepsilon_t$
2. $E(\varepsilon_t) = 0 \;\Leftrightarrow\; E(y_t) = \beta_1 + \beta_2 x_t$
3. $\mathrm{var}(\varepsilon_t) = \sigma^2 = \mathrm{var}(y_t)$
4. $\mathrm{cov}(\varepsilon_i, \varepsilon_j) = \mathrm{cov}(y_i, y_j) = 0$
5. $x_t \neq c$ for every observation ($x$ is not constant)
6. $\varepsilon_t \sim N(0, \sigma^2) \;\Leftrightarrow\; y_t \sim N(\beta_1 + \beta_2 x_t,\ \sigma^2)$
Probability Distribution of the Least Squares Estimators

$$b_1 \sim N\!\left(\beta_1,\ \frac{\sigma^2 \sum x_t^2}{T \sum (x_t - \bar{x})^2}\right) \qquad b_2 \sim N\!\left(\beta_2,\ \frac{\sigma^2}{\sum (x_t - \bar{x})^2}\right)$$
Error Variance Estimation

Unbiased estimator of the error variance:
$$\hat{\sigma}^2 = \frac{\sum \hat{e}_t^2}{T - 2}$$

Transform to a chi-square distribution:
$$\frac{(T-2)\,\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{T-2}$$
We make a correct decision if:
• The null hypothesis is false and we decide to reject it.
• The null hypothesis is true and we decide not to reject it.

Our decision is incorrect if:
• The null hypothesis is true and we decide to reject it. This is a type I error.
• The null hypothesis is false and we decide not to reject it. This is a type II error.
$$b_2 \sim N\!\left(\beta_2,\ \frac{\sigma^2}{\sum (x_t - \bar{x})^2}\right)$$

Create a standardized normal random variable, $Z$, by subtracting the mean of $b_2$ and dividing by its standard deviation:

$$Z = \frac{b_2 - \beta_2}{\sqrt{\mathrm{var}(b_2)}} \sim N(0,1)$$
Simple Linear Regression

$$y_t = \beta_1 + \beta_2 x_t + \varepsilon_t \quad\text{where } E(\varepsilon_t) = 0$$

Since $E(y_t) = \beta_1 + \beta_2 x_t$,
$$y_t \sim N(\beta_1 + \beta_2 x_t,\ \sigma^2)$$

and $\varepsilon_t = y_t - \beta_1 - \beta_2 x_t$. Therefore, $\varepsilon_t \sim N(0, \sigma^2)$.
Create a Chi-Square

$\varepsilon_t \sim N(0, \sigma^2)$, but we want $N(0,1)$:
$$\frac{\varepsilon_t}{\sigma} \sim N(0,1) \quad\text{(standard normal)}$$
$$\left(\frac{\varepsilon_t}{\sigma}\right)^2 \sim \chi^2_{(1)} \quad\text{(chi-square)}$$
Sum of Chi-Squares

$$\sum_{t=1}^{T}\left(\frac{\varepsilon_t}{\sigma}\right)^2 = \left(\frac{\varepsilon_1}{\sigma}\right)^2 + \left(\frac{\varepsilon_2}{\sigma}\right)^2 + \cdots + \left(\frac{\varepsilon_T}{\sigma}\right)^2 = \chi^2_{(1)} + \chi^2_{(1)} + \cdots + \chi^2_{(1)}$$

Therefore,
$$\sum_{t=1}^{T}\left(\frac{\varepsilon_t}{\sigma}\right)^2 \sim \chi^2_{(T)}$$
Chi-Square Degrees of Freedom

Since the errors $\varepsilon_t = y_t - \beta_1 - \beta_2 x_t$ are not observable, we estimate them with the sample residuals $\hat{e}_t = y_t - b_1 - b_2 x_t$.

Unlike the errors, the sample residuals are not independent, since they use up two degrees of freedom by using $b_1$ and $b_2$ to estimate $\beta_1$ and $\beta_2$. We get only $T - 2$ degrees of freedom instead of $T$.
Student-t Distribution

$$t = \frac{Z}{\sqrt{V/m}} \sim t_m$$

where $Z \sim N(0,1)$ and $V \sim \chi^2_m$.
$$t = \frac{Z}{\sqrt{V/(T-2)}} \sim t_{T-2}$$

where
$$Z = \frac{b_2 - \beta_2}{\sqrt{\mathrm{var}(b_2)}} \qquad\text{and}\qquad \mathrm{var}(b_2) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}$$
$$t = \frac{Z}{\sqrt{V/(T-2)}} = \frac{\dfrac{b_2 - \beta_2}{\sqrt{\mathrm{var}(b_2)}}}{\sqrt{\dfrac{(T-2)\,\hat{\sigma}^2/\sigma^2}{T-2}}} \qquad\text{where}\qquad V = \frac{(T-2)\,\hat{\sigma}^2}{\sigma^2}$$
Substituting $\mathrm{var}(b_2) = \dfrac{\sigma^2}{\sum (x_i - \bar{x})^2}$:

$$t = \frac{\dfrac{b_2 - \beta_2}{\sqrt{\sigma^2 / \sum (x_i - \bar{x})^2}}}{\sqrt{\dfrac{(T-2)\,\hat{\sigma}^2/\sigma^2}{T-2}}} = \frac{b_2 - \beta_2}{\sqrt{\hat{\sigma}^2 / \sum (x_i - \bar{x})^2}}$$

Notice the cancellations: both $\sigma^2$ and $T-2$ drop out.
$$t = \frac{b_2 - \beta_2}{\sqrt{\hat{\sigma}^2 / \sum (x_i - \bar{x})^2}} = \frac{b_2 - \beta_2}{\sqrt{\widehat{\mathrm{var}}(b_2)}} = \frac{b_2 - \beta_2}{\mathrm{se}(b_2)}$$
Student's t-statistic

$$t = \frac{b_2 - \beta_2}{\mathrm{se}(b_2)} \sim t_{T-2}$$

$t$ has a Student-t distribution with $T - 2$ degrees of freedom.
[Figure 5.1: the Student-t distribution $f(t)$, with critical values $-t_c$ and $t_c$, central area $1-\alpha$, and area $\alpha/2$ in each tail; the tails form the rejection region for a two-sided test.]
Probability statements

$$P(-t_c \le t \le t_c) = 1 - \alpha$$
$$P(t < -t_c) = P(t > t_c) = \frac{\alpha}{2}$$
$$P\!\left(-t_c \le \frac{b_2 - \beta_2}{\mathrm{se}(b_2)} \le t_c\right) = 1 - \alpha$$
Confidence Intervals

Two-sided $(1-\alpha)\times 100\%$ confidence interval for $\beta_1$:
$$\left[\,b_1 - t_{\alpha/2}\,\mathrm{se}(b_1),\ \ b_1 + t_{\alpha/2}\,\mathrm{se}(b_1)\,\right]$$

Two-sided $(1-\alpha)\times 100\%$ confidence interval for $\beta_2$:
$$\left[\,b_2 - t_{\alpha/2}\,\mathrm{se}(b_2),\ \ b_2 + t_{\alpha/2}\,\mathrm{se}(b_2)\,\right]$$
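As a sketch of how such an interval is computed in practice, the following uses SciPy's t quantiles; the estimate and standard error echo the Chapter 6 example, and the sample size is an assumption for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical estimate, standard error, and sample size.
b2, se_b2, T = 0.1283, 0.0305, 40
alpha = 0.05

# Critical value from the t distribution with T - 2 degrees of freedom.
t_crit = stats.t.ppf(1 - alpha / 2, df=T - 2)
ci = (b2 - t_crit * se_b2, b2 + t_crit * se_b2)
print(ci)
```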
Student-t vs. Normal Distribution

1. Both are symmetric, bell-shaped distributions.
2. The Student-t distribution has fatter tails than the normal.
3. The Student-t converges to the normal as the degrees of freedom go to infinity.
4. The Student-t depends on the degrees of freedom (df).
5. The normal is a good approximation of the Student-t, to the first few decimal places, when df > 30 or so.
Hypothesis Tests

1. A null hypothesis, $H_0$.
2. An alternative hypothesis, $H_1$.
3. A test statistic.
4. A rejection region.
Rejection Rules

1. Two-sided test: if the value of the test statistic falls in the critical region in either tail of the t-distribution, then we reject the null hypothesis in favor of the alternative.
2. Left-tail test: if the value of the test statistic falls in the critical region in the left tail of the t-distribution, then we reject the null hypothesis in favor of the alternative.
3. Right-tail test: if the value of the test statistic falls in the critical region in the right tail of the t-distribution, then we reject the null hypothesis in favor of the alternative.
Format for Hypothesis Testing

1. Determine the null and alternative hypotheses.
2. Specify the test statistic and its distribution as if the null hypothesis were true.
3. Select $\alpha$ and determine the rejection region.
4. Calculate the sample value of the test statistic.
5. State your conclusion.
Practical vs. Statistical Significance in Economics

Practically but not statistically significant: when the sample size is very small, a large average gap between the salaries of men and women might not be statistically significant.

Statistically but not practically significant: when the sample size is very large, a tiny correlation (say, $\rho = 0.00000001$) between the winning numbers in the PowerBall Lottery and the Dow-Jones Stock Market Index might be statistically significant.
Type I and Type II Errors

Type I error: we make the mistake of rejecting the null hypothesis when it is true.
$\alpha$ = P(rejecting $H_0$ when it is true).

Type II error: we make the mistake of failing to reject the null hypothesis when it is false.
$\beta$ = P(failing to reject $H_0$ when it is false).
Prediction Intervals

A $(1-\alpha)\times 100\%$ prediction interval for $y_o$ is:
$$\hat{y}_o \pm t_c\,\mathrm{se}(f)$$

where $f = \hat{y}_o - y_o$ is the forecast error, $\mathrm{se}(f) = \sqrt{\widehat{\mathrm{var}}(f)}$, and
$$\widehat{\mathrm{var}}(f) = \hat{\sigma}^2 \left[\,1 + \frac{1}{T} + \frac{(x_o - \bar{x})^2}{\sum (x_t - \bar{x})^2}\,\right]$$
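A minimal sketch of this prediction-interval formula, again on hypothetical data:

```python
import numpy as np
from scipy import stats

# Hypothetical sample and a new point x_o.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
x_o, T, alpha = 6.0, len(y), 0.05

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b1 = y.mean() - b2 * x.mean()
sigma2_hat = np.sum((y - b1 - b2 * x)**2) / (T - 2)

# Forecast-error variance and the (1 - alpha) prediction interval.
var_f = sigma2_hat * (1 + 1/T + (x_o - x.mean())**2 / np.sum((x - x.mean())**2))
t_c = stats.t.ppf(1 - alpha/2, df=T - 2)
y_o_hat = b1 + b2 * x_o
print(y_o_hat - t_c * np.sqrt(var_f), y_o_hat + t_c * np.sqrt(var_f))
```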
Chapter 6: The Simple Linear Regression Model
Explaining Variation in $y_t$

Predicting $y_t$ without any explanatory variables:
$$y_t = \beta_1 + e_t$$
$$\sum_{t=1}^{T} e_t^2 = \sum_{t=1}^{T} (y_t - \beta_1)^2$$
$$\frac{\partial \sum_{t=1}^{T} e_t^2}{\partial \beta_1} = -2 \sum_{t=1}^{T} (y_t - b_1) = 0$$
$$\sum_{t=1}^{T} y_t - T b_1 = 0 \quad\Rightarrow\quad b_1 = \bar{y}$$

Why not simply use $\bar{y}$? The minimization shows that the best constant predictor is exactly $b_1 = \bar{y}$, so $\bar{y}$ is the natural baseline.
Explaining Variation in $y_t$

$$y_t = b_1 + b_2 x_t + \hat{e}_t$$

Explained variation:
$$\hat{y}_t = b_1 + b_2 x_t$$

Unexplained variation:
$$\hat{e}_t = y_t - \hat{y}_t = y_t - b_1 - b_2 x_t$$
Explaining Variation in $y_t$

$$y_t = \hat{y}_t + \hat{e}_t$$

Using $\bar{y}$ as the baseline:
$$y_t - \bar{y} = (\hat{y}_t - \bar{y}) + \hat{e}_t$$
$$\sum_{t=1}^{T} (y_t - \bar{y})^2 = \sum_{t=1}^{T} (\hat{y}_t - \bar{y})^2 + \sum_{t=1}^{T} \hat{e}_t^2$$

SST = SSR + SSE (the cross-product term drops out).
Total Variation in $y_t$

SST = total sum of squares. SST measures the variation of $y_t$ around $\bar{y}$:
$$SST = \sum_{t=1}^{T} (y_t - \bar{y})^2$$
Explained Variation in $y_t$

SSR = regression sum of squares. Fitted $\hat{y}_t$ values: $\hat{y}_t = b_1 + b_2 x_t$. SSR measures the variation of $\hat{y}_t$ around $\bar{y}$:
$$SSR = \sum_{t=1}^{T} (\hat{y}_t - \bar{y})^2$$
Unexplained Variation in $y_t$

SSE = error sum of squares. $\hat{e}_t = y_t - \hat{y}_t = y_t - b_1 - b_2 x_t$. SSE measures the variation of $y_t$ around $\hat{y}_t$:
$$SSE = \sum_{t=1}^{T} (y_t - \hat{y}_t)^2 = \sum_{t=1}^{T} \hat{e}_t^2$$
Analysis of Variance Table

Table 6.1: Analysis of Variance Table

Source of Variation   DF    Sum of Squares   Mean Square
Explained             1     SSR              SSR/1
Unexplained           T-2   SSE              SSE/(T-2)  [= $\hat{\sigma}^2$]
Total                 T-1   SST
Coefficient of Determination

What proportion of the variation in $y_t$ is explained?
$$R^2 = \frac{SSR}{SST}, \qquad 0 \le R^2 \le 1$$
Coefficient of Determination

SST = SSR + SSE. Dividing by SST:
$$\frac{SST}{SST} = \frac{SSR}{SST} + \frac{SSE}{SST} \quad\Rightarrow\quad 1 = \frac{SSR}{SST} + \frac{SSE}{SST}$$
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
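The decomposition is easy to verify numerically; a short sketch on hypothetical data:

```python
import numpy as np

# Hypothetical sample.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b1 = y.mean() - b2 * x.mean()
y_hat = b1 + b2 * x

# Sum-of-squares decomposition and R-squared.
SST = np.sum((y - y.mean())**2)
SSR = np.sum((y_hat - y.mean())**2)
SSE = np.sum((y - y_hat)**2)
print(SST, SSR + SSE)            # equal up to rounding
print(SSR / SST, 1 - SSE / SST)  # two equivalent R-squared formulas
```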
Coefficient of Determination

$R^2$ is only a descriptive measure. $R^2$ does not measure the quality of the regression model. Focusing solely on maximizing $R^2$ is not a good idea.
Correlation Analysis

Population: $\rho = \dfrac{\mathrm{cov}(X,Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}}$

Sample: $r = \dfrac{\widehat{\mathrm{cov}}(X,Y)}{\sqrt{\widehat{\mathrm{var}}(X)\,\widehat{\mathrm{var}}(Y)}}$
Correlation Analysis

$$\widehat{\mathrm{var}}(X) = \frac{\sum_{t=1}^{T} (x_t - \bar{x})^2}{T - 1} \qquad \widehat{\mathrm{var}}(Y) = \frac{\sum_{t=1}^{T} (y_t - \bar{y})^2}{T - 1}$$

$$\widehat{\mathrm{cov}}(X,Y) = \frac{\sum_{t=1}^{T} (x_t - \bar{x})(y_t - \bar{y})}{T - 1}$$
Correlation Analysis

Sample correlation coefficient:
$$r = \frac{\sum_{t=1}^{T} (x_t - \bar{x})(y_t - \bar{y})}{\sqrt{\sum_{t=1}^{T} (x_t - \bar{x})^2 \sum_{t=1}^{T} (y_t - \bar{y})^2}}$$
Correlation Analysis and $R^2$

For simple linear regression analysis:
$$r^2 = R^2$$

$R^2$ also equals the squared sample correlation between $y_t$ and $\hat{y}_t$, measuring "goodness of fit".
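A quick numerical check of $r^2 = R^2$, using the same hypothetical sample as the earlier sketches:

```python
import numpy as np

# Hypothetical sample; reuses the fit from the sketch above.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r_xy = np.corrcoef(x, y)[0, 1]

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
y_hat = (y.mean() - b2 * x.mean()) + b2 * x
R2 = np.sum((y_hat - y.mean())**2) / np.sum((y - y.mean())**2)

print(r_xy**2, R2)  # identical in simple regression
```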
Regression Computer Output

Typical computer output of regression estimates:

Table 6.2: Computer-Generated Least Squares Results

Variable    Parameter Estimate   Standard Error   T for H0: Parameter = 0   Prob > |T|
INTERCEPT   40.7676              22.1387          1.841                     0.0734
X           0.1283               0.0305           4.201                     0.0002
Regression Computer Output

$$b_1 = 40.7676 \qquad b_2 = 0.1283$$
$$\mathrm{se}(b_1) = \sqrt{\widehat{\mathrm{var}}(b_1)} = \sqrt{490.12} = 22.1387$$
$$\mathrm{se}(b_2) = \sqrt{\widehat{\mathrm{var}}(b_2)} = \sqrt{0.0009326} = 0.0305$$
$$t = \frac{b_1}{\mathrm{se}(b_1)} = \frac{40.7676}{22.1387} = 1.84 \qquad t = \frac{b_2}{\mathrm{se}(b_2)} = \frac{0.1283}{0.0305} = 4.20$$
Regression Computer Output

Sources of variation in the dependent variable:

Table 6.3: Analysis of Variance Table

Source        DF   Sum of Squares   Mean Square
Explained     1    25221.2229       25221.2229
Unexplained   38   54311.3314       1429.2455
Total         39   79532.5544

R-square: 0.3171
Regression Computer Output

$$SSE = \sum \hat{e}_t^2 = 54311 \qquad SST = \sum (y_t - \bar{y})^2 = 79532 \qquad SSR = \sum (\hat{y}_t - \bar{y})^2 = 25221$$
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} = 0.317$$
$$\hat{\sigma}^2 = \frac{SSE}{T-2} = 1429.2455$$
Reporting Regression Results

$$\hat{y}_t = 40.7676 + 0.1283\,x_t$$
(s.e.) (22.1387) (0.0305)

$$\hat{y}_t = 40.7676 + 0.1283\,x_t$$
(t) (1.84) (4.20)
Reporting Regression Results

$R^2 = 0.317$. This $R^2$ value may seem low, but it is typical in studies involving cross-sectional data analyzed at the individual or micro level. A considerably higher $R^2$ value would be expected in studies involving time-series data analyzed at an aggregate or macro level.
Effects of Scaling the Data

Changing the scale of $x$:
$$y_t = \beta_1 + \beta_2 x_t + e_t = \beta_1 + (c\,\beta_2)\left(\frac{x_t}{c}\right) + e_t = \beta_1 + \beta_2^* x_t^* + e_t$$

where $\beta_2^* = c\,\beta_2$ and $x_t^* = x_t / c$.

The estimated coefficient and standard error change, but the other statistics are unchanged.
Effects of Scaling the Data

Changing the scale of $y$:
$$\frac{y_t}{c} = \frac{\beta_1}{c} + \frac{\beta_2}{c}\,x_t + \frac{e_t}{c} \quad\Rightarrow\quad y_t^* = \beta_1^* + \beta_2^* x_t + e_t^*$$

where $\beta_1^* = \beta_1/c$, $\beta_2^* = \beta_2/c$, $y_t^* = y_t/c$, and $e_t^* = e_t/c$.

All statistics are changed except the t-statistics and the $R^2$ value.
Effects of Scaling the Data

Changing the scale of $x$ and $y$ (by the same factor $c$):
$$\frac{y_t}{c} = \frac{\beta_1}{c} + \beta_2\,\frac{x_t}{c} + \frac{e_t}{c} \quad\Rightarrow\quad y_t^* = \beta_1^* + \beta_2 x_t^* + e_t^*$$

where $\beta_1^* = \beta_1/c$, $y_t^* = y_t/c$, $x_t^* = x_t/c$, and $e_t^* = e_t/c$.

No change in $R^2$, the t-statistics, or the regression results for $\beta_2$, but all other statistics change.
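These scaling claims can be checked directly; here is a sketch with a hypothetical helper ols() that reports the slope, its standard error, and the slope t-ratio:

```python
import numpy as np

def ols(x, y):
    """Return the slope, its standard error, and the slope t-ratio."""
    T = len(y)
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    b1 = y.mean() - b2 * x.mean()
    sigma2 = np.sum((y - b1 - b2 * x)**2) / (T - 2)
    se_b2 = np.sqrt(sigma2 / np.sum((x - x.mean())**2))
    return b2, se_b2, b2 / se_b2

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
c = 100.0

print(ols(x, y))      # original scale
print(ols(x / c, y))  # rescaled x: slope and se scale by c, t-ratio unchanged
print(ols(x, y / c))  # rescaled y: slope and se scale by 1/c, t-ratio unchanged
```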
Functional Forms

The term "linear" in a simple regression model does not mean a linear relationship between the variables, but a model in which the parameters enter the model in a linear way.
Linear vs. Nonlinear

Linear statistical models (linear in the parameters):
$$y_t = \beta_1 + \beta_2 x_t + e_t \qquad \ln(y_t) = \beta_1 + \beta_2 x_t + e_t$$
$$y_t = \beta_1 + \beta_2 \ln(x_t) + e_t \qquad y_t = \beta_1 + \beta_2 x_t^2 + e_t$$

Nonlinear statistical models (nonlinear in the parameters):
$$y_t = \beta_1 + \beta_2 x_t^{\beta_3} + e_t \qquad y_t = \beta_1 + \beta_2 x_t + \exp(\beta_3 x_t) + e_t$$
[Figure: Linear vs. Nonlinear; a nonlinear relationship between food expenditure ($y$) and income ($x$).]
Useful Functional Forms

1. Linear   2. Reciprocal
3. Log-Log  4. Log-Linear
5. Linear-Log  6. Log-Inverse

Look at each form and its slope and elasticity.
Useful Functional Forms: Linear

$$y_t = \beta_1 + \beta_2 x_t + e_t$$
slope: $\beta_2$; elasticity: $\beta_2\,\dfrac{x_t}{y_t}$
Useful Functional Forms: Reciprocal

$$y_t = \beta_1 + \beta_2\,\frac{1}{x_t} + e_t$$
slope: $-\beta_2\,\dfrac{1}{x_t^2}$; elasticity: $-\beta_2\,\dfrac{1}{x_t y_t}$
Useful Functional Forms: Log-Log

$$\ln(y_t) = \beta_1 + \beta_2 \ln(x_t) + e_t$$
slope: $\beta_2\,\dfrac{y_t}{x_t}$; elasticity: $\beta_2$
Useful Functional Forms: Log-Linear

$$\ln(y_t) = \beta_1 + \beta_2 x_t + e_t$$
slope: $\beta_2\,y_t$; elasticity: $\beta_2\,x_t$
Useful Functional Forms: Linear-Log

$$y_t = \beta_1 + \beta_2 \ln(x_t) + e_t$$
slope: $\beta_2\,\dfrac{1}{x_t}$; elasticity: $\beta_2\,\dfrac{1}{y_t}$
Useful Functional Forms: Log-Inverse

$$\ln(y_t) = \beta_1 - \beta_2\,\frac{1}{x_t} + e_t$$
slope: $\beta_2\,\dfrac{y_t}{x_t^2}$; elasticity: $\beta_2\,\dfrac{1}{x_t}$
Error Term Properties

1. $E(e_t) = 0$
2. $\mathrm{var}(e_t) = \sigma^2$
3. $\mathrm{cov}(e_i, e_j) = 0$
4. $e_t \sim N(0, \sigma^2)$
Economic Models

1. Demand Models   2. Supply Models
3. Production Functions   4. Cost Functions
5. Phillips Curve
Economic Models

1. Demand Models: quantity demanded ($y^d$) and price ($x$), constant elasticity:
$$\ln(y_t^d) = \beta_1 + \beta_2 \ln(x_t) + e_t$$
2. Supply Models: quantity supplied ($y^s$) and price ($x$), constant elasticity:
$$\ln(y_t^s) = \beta_1 + \beta_2 \ln(x_t) + e_t$$
3. Production Functions: output ($y$) and input ($x$), constant elasticity. Cobb-Douglas production function:
$$\ln(y_t) = \beta_1 + \beta_2 \ln(x_t) + e_t$$
4a. Cost Functions: total cost ($y$) and output ($x$):
$$y_t = \beta_1 + \beta_2 x_t^2 + e_t$$
4b. Cost Functions: average cost ($y/x$) and output ($x$):
$$\frac{y_t}{x_t} = \frac{\beta_1}{x_t} + \beta_2 x_t + \frac{e_t}{x_t}$$
5. Phillips Curve: wage rate $w_t$, time $t$, and the unemployment rate $u_t$:
$$\%\Delta w_t = \frac{w_t - w_{t-1}}{w_{t-1}} = \gamma\alpha + \gamma\eta\,\frac{1}{u_t}$$
nonlinear in both variables and parameters.
Chapter 7: The Multiple Regression Model
Two Explanatory Variables

$$y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + e_t$$

The $x_t$'s affect $y_t$ separately:
$$\frac{\partial y_t}{\partial x_{t2}} = \beta_2 \qquad \frac{\partial y_t}{\partial x_{t3}} = \beta_3$$

But least squares estimation of $\beta_2$ now depends upon both $x_{t2}$ and $x_{t3}$.
Correlated Variables

$$y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + e_t$$

$y_t$ = output; $x_{t2}$ = capital; $x_{t3}$ = labor.

Suppose there are always 5 workers per machine. If the number of workers per machine is never varied, it becomes impossible to tell whether the machines or the workers are responsible for changes in output.
The General Model

$$y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + \cdots + \beta_K x_{tK} + e_t$$

The parameter $\beta_1$ is the intercept (constant) term. The "variable" attached to $\beta_1$ is $x_{t1} = 1$. Usually, the number of explanatory variables is said to be $K - 1$ (ignoring $x_{t1} = 1$), while the number of parameters is $K$, namely $\beta_1, \ldots, \beta_K$.
Statistical Properties of $e_t$

1. $E(e_t) = 0$
2. $\mathrm{var}(e_t) = \sigma^2$
3. $\mathrm{cov}(e_t, e_s) = 0$ for $t \neq s$
4. $e_t \sim N(0, \sigma^2)$
Statistical Properties of $y_t$

1. $E(y_t) = \beta_1 + \beta_2 x_{t2} + \cdots + \beta_K x_{tK}$
2. $\mathrm{var}(y_t) = \mathrm{var}(e_t) = \sigma^2$
3. $\mathrm{cov}(y_t, y_s) = \mathrm{cov}(e_t, e_s) = 0$ for $t \neq s$
4. $y_t \sim N(\beta_1 + \beta_2 x_{t2} + \cdots + \beta_K x_{tK},\ \sigma^2)$
Assumptions

1. $y_t = \beta_1 + \beta_2 x_{t2} + \cdots + \beta_K x_{tK} + e_t$
2. $E(y_t) = \beta_1 + \beta_2 x_{t2} + \cdots + \beta_K x_{tK}$
3. $\mathrm{var}(y_t) = \mathrm{var}(e_t) = \sigma^2$
4. $\mathrm{cov}(y_t, y_s) = \mathrm{cov}(e_t, e_s) = 0$ for $t \neq s$
5. The values of $x_{tk}$ are not random.
6. $y_t \sim N(\beta_1 + \beta_2 x_{t2} + \cdots + \beta_K x_{tK},\ \sigma^2)$
Least Squares Estimation

$$y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + e_t$$
$$S \equiv S(\beta_1, \beta_2, \beta_3) = \sum_{t=1}^{T} \left(y_t - \beta_1 - \beta_2 x_{t2} - \beta_3 x_{t3}\right)^2$$

Define the variables in deviation-from-mean form:
$$y_t^* = y_t - \bar{y} \qquad x_{t2}^* = x_{t2} - \bar{x}_2 \qquad x_{t3}^* = x_{t3} - \bar{x}_3$$
Least Squares Estimators

$$b_1 = \bar{y} - b_2 \bar{x}_2 - b_3 \bar{x}_3$$

$$b_2 = \frac{\left(\sum y_t^* x_{t2}^*\right)\left(\sum x_{t3}^{*2}\right) - \left(\sum y_t^* x_{t3}^*\right)\left(\sum x_{t2}^* x_{t3}^*\right)}{\left(\sum x_{t2}^{*2}\right)\left(\sum x_{t3}^{*2}\right) - \left(\sum x_{t2}^* x_{t3}^*\right)^2}$$

$$b_3 = \frac{\left(\sum y_t^* x_{t3}^*\right)\left(\sum x_{t2}^{*2}\right) - \left(\sum y_t^* x_{t2}^*\right)\left(\sum x_{t2}^* x_{t3}^*\right)}{\left(\sum x_{t2}^{*2}\right)\left(\sum x_{t3}^{*2}\right) - \left(\sum x_{t2}^* x_{t3}^*\right)^2}$$
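A sketch of these deviation-form formulas on hypothetical data, cross-checked against the matrix least squares solution:

```python
import numpy as np

# Hypothetical data with two regressors.
x2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x3 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([3.1, 3.9, 7.0, 7.8, 11.2, 11.9])

# Deviation-from-mean form.
ys, x2s, x3s = y - y.mean(), x2 - x2.mean(), x3 - x3.mean()

den = np.sum(x2s**2) * np.sum(x3s**2) - np.sum(x2s * x3s)**2
b2 = (np.sum(ys * x2s) * np.sum(x3s**2) - np.sum(ys * x3s) * np.sum(x2s * x3s)) / den
b3 = (np.sum(ys * x3s) * np.sum(x2s**2) - np.sum(ys * x2s) * np.sum(x2s * x3s)) / den
b1 = y.mean() - b2 * x2.mean() - b3 * x3.mean()

# Cross-check against the normal-equations (matrix) solution.
X = np.column_stack([np.ones_like(y), x2, x3])
print(np.array([b1, b2, b3]), np.linalg.lstsq(X, y, rcond=None)[0])
```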
Dangers of Extrapolation

Statistical models generally are good only "within the relevant range". This means that extending them to extreme data values outside the range of the original data often leads to poor and sometimes ridiculous results. If height is normally distributed and the normal ranges from minus infinity to plus infinity, pity the man minus three feet tall.
Error Variance Estimation

Unbiased estimator of the error variance:
$$\hat{\sigma}^2 = \frac{\sum \hat{e}_t^2}{T - K}$$

Transform to a chi-square distribution:
$$\frac{(T-K)\,\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{T-K}$$
Gauss-Markov Theorem

Under the assumptions of the multiple regression model, the ordinary least squares estimators have the smallest variance of all linear and unbiased estimators. This means that the least squares estimators are the Best Linear Unbiased Estimators (BLUE).
Variances

$$y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + e_t$$

$$\mathrm{var}(b_2) = \frac{\sigma^2}{(1 - r_{23}^2) \sum (x_{t2} - \bar{x}_2)^2} \qquad \mathrm{var}(b_3) = \frac{\sigma^2}{(1 - r_{23}^2) \sum (x_{t3} - \bar{x}_3)^2}$$

where
$$r_{23} = \frac{\sum (x_{t2} - \bar{x}_2)(x_{t3} - \bar{x}_3)}{\sqrt{\sum (x_{t2} - \bar{x}_2)^2 \sum (x_{t3} - \bar{x}_3)^2}}$$

When $r_{23} = 0$ these reduce to the simple regression formulas.
Variance Decomposition

The variance of an estimator is smaller when:
1. The error variance, $\sigma^2$, is smaller.
2. The sample size, $T$, is larger (more terms in $\sum_{t=1}^{T} (x_{t2} - \bar{x}_2)^2$).
3. The variable's values are more spread out (larger $(x_{t2} - \bar{x}_2)^2$).
4. The correlation, $r_{23}$, is close to zero.
Covariances

$$y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + e_t$$

$$\mathrm{cov}(b_2, b_3) = \frac{-\,r_{23}\,\sigma^2}{(1 - r_{23}^2)\sqrt{\sum (x_{t2} - \bar{x}_2)^2}\,\sqrt{\sum (x_{t3} - \bar{x}_3)^2}}$$

where
$$r_{23} = \frac{\sum (x_{t2} - \bar{x}_2)(x_{t3} - \bar{x}_3)}{\sqrt{\sum (x_{t2} - \bar{x}_2)^2 \sum (x_{t3} - \bar{x}_3)^2}}$$
Covariance Decomposition

The covariance between any two estimators is larger in absolute value when:
1. The error variance, $\sigma^2$, is larger.
2. The sample size, $T$, is smaller.
3. The values of the variables are less spread out.
4. The correlation, $r_{23}$, is high.
Var-Cov Matrix

$$y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + e_t$$

The least squares estimators $b_1$, $b_2$, and $b_3$ have covariance matrix:

$$\mathrm{cov}(b_1, b_2, b_3) = \begin{bmatrix} \mathrm{var}(b_1) & \mathrm{cov}(b_1,b_2) & \mathrm{cov}(b_1,b_3) \\ \mathrm{cov}(b_1,b_2) & \mathrm{var}(b_2) & \mathrm{cov}(b_2,b_3) \\ \mathrm{cov}(b_1,b_3) & \mathrm{cov}(b_2,b_3) & \mathrm{var}(b_3) \end{bmatrix}$$
Normality

$$y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + \cdots + \beta_K x_{tK} + e_t$$

$e_t \sim N(0, \sigma^2)$ implies and is implied by $y_t \sim N(\beta_1 + \beta_2 x_{t2} + \cdots + \beta_K x_{tK},\ \sigma^2)$.

Since $b_k$ is a linear function of the $y_t$'s:
$$b_k \sim N(\beta_k,\ \mathrm{var}(b_k))$$
$$z = \frac{b_k - \beta_k}{\sqrt{\mathrm{var}(b_k)}} \sim N(0,1) \qquad \text{for } k = 1, 2, \ldots, K$$
Student-t

Since the population variance of $b_k$, $\mathrm{var}(b_k)$, is generally unknown, we estimate it with $\widehat{\mathrm{var}}(b_k)$, which uses $\hat{\sigma}^2$ instead of $\sigma^2$:

$$t = \frac{b_k - \beta_k}{\sqrt{\widehat{\mathrm{var}}(b_k)}} = \frac{b_k - \beta_k}{\mathrm{se}(b_k)}$$

$t$ has a Student-t distribution with df = $T - K$.
Interval Estimation

$$P\!\left(-t_c \le \frac{b_k - \beta_k}{\mathrm{se}(b_k)} \le t_c\right) = 1 - \alpha$$

$t_c$ is the critical value for $T - K$ degrees of freedom such that $P(t \ge t_c) = \alpha/2$.

$$P\!\left(b_k - t_c\,\mathrm{se}(b_k) \le \beta_k \le b_k + t_c\,\mathrm{se}(b_k)\right) = 1 - \alpha$$

Interval endpoints: $\left[\,b_k - t_c\,\mathrm{se}(b_k),\ \ b_k + t_c\,\mathrm{se}(b_k)\,\right]$
Chapter 8: Hypothesis Testing and Nonsample Information
Chapter 8: Overview

1. Student-t Tests
2. Goodness-of-Fit
3. F-Tests
4. ANOVA Table
5. Nonsample Information
6. Collinearity
7. Prediction
Student-t Test

$$y_t = \beta_1 + \beta_2 X_{t2} + \beta_3 X_{t3} + \beta_4 X_{t4} + e_t$$

Student-t tests can be used to test any linear combination of the regression coefficients:

$$H_0: \beta_2 + \beta_3 + \beta_4 = 1 \qquad H_0: \beta_1 = 0$$
$$H_0: 3\beta_2 - 7\beta_3 = 21 \qquad H_0: \beta_2 - \beta_3 \le 5$$

Every such t-test has exactly $T - K$ degrees of freedom, where $K$ = number of coefficients estimated (including the intercept).
One-Tail Test

$$y_t = \beta_1 + \beta_2 X_{t2} + \beta_3 X_{t3} + \beta_4 X_{t4} + e_t$$
$$H_0: \beta_3 \le 0 \qquad H_1: \beta_3 > 0$$
$$t = \frac{b_3}{\mathrm{se}(b_3)} \sim t_{T-K} \qquad df = T - K = T - 4$$

Reject $H_0$ if $t > t_c$, the right-tail critical value with area $\alpha$ (area $1 - \alpha$ to its left).
Two-Tail Test

$$y_t = \beta_1 + \beta_2 X_{t2} + \beta_3 X_{t3} + \beta_4 X_{t4} + e_t$$
$$H_0: \beta_2 = 0 \qquad H_1: \beta_2 \neq 0$$
$$t = \frac{b_2}{\mathrm{se}(b_2)} \sim t_{T-K} \qquad df = T - K = T - 4$$

Reject $H_0$ if $t < -t_c$ or $t > t_c$, with area $\alpha/2$ in each tail.
Goodness-of-Fit

Coefficient of determination:
$$R^2 = \frac{SSR}{SST} = \frac{\sum_{t=1}^{T} (\hat{y}_t - \bar{y})^2}{\sum_{t=1}^{T} (y_t - \bar{y})^2}, \qquad 0 \le R^2 \le 1$$
Adjusted R-Squared

Original coefficient of determination:
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

Adjusted coefficient of determination:
$$\bar{R}^2 = 1 - \frac{SSE/(T-K)}{SST/(T-1)}$$
Computer Output

Table 8.2: Summary of Least Squares Results

Variable      Coefficient   Std Error   t-value   p-value
constant      104.79        6.48        16.17     0.000
price         -6.642        3.191       -2.081    0.042
advertising   2.984         0.167       17.868    0.000

$$t = \frac{b_2}{\mathrm{se}(b_2)} = \frac{-6.642}{3.191} = -2.081$$
Reporting Your Results

Reporting standard errors:
$$\hat{y}_t = 104.79 - 6.642\,X_{t2} + 2.984\,X_{t3}$$
(s.e.) (6.48) (3.191) (0.167)

Reporting t-statistics:
$$\hat{y}_t = 104.79 - 6.642\,X_{t2} + 2.984\,X_{t3}$$
(t) (16.17) (-2.081) (17.868)
Single Restriction F-Test

$$y_t = \beta_1 + \beta_2 X_{t2} + \beta_3 X_{t3} + \beta_4 X_{t4} + e_t$$
$$H_0: \beta_2 = 0 \qquad H_1: \beta_2 \neq 0$$
$$df_d = T - K = 49 \qquad df_n = J = 1$$
$$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(T-K)} = \frac{(1964.758 - 1805.168)/1}{1805.168/(52-3)} = 4.33$$

By definition this is the t-statistic squared: $t = -2.081$, so $F = t^2 = 4.33$.
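A minimal sketch of the F computation using the slide's SSE values ($T$, $K$, and $J$ as given):

```python
from scipy import stats

# Values from the slide: restricted and unrestricted sums of squared errors.
SSE_R, SSE_U = 1964.758, 1805.168
T, K, J = 52, 3, 1  # sample size, unrestricted coefficients, restrictions

F = ((SSE_R - SSE_U) / J) / (SSE_U / (T - K))
p_value = 1 - stats.f.cdf(F, J, T - K)
print(F, p_value)  # F is about 4.33
```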
Multiple Restriction F-Test

$$y_t = \beta_1 + \beta_2 X_{t2} + \beta_3 X_{t3} + \beta_4 X_{t4} + e_t$$
$$H_0: \beta_2 = 0,\ \beta_4 = 0 \qquad H_1: H_0 \text{ not true}$$
$$df_d = T - K = 49 \qquad df_n = J = 2$$
$$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(T-K)}$$

First run the restricted regression, dropping $X_{t2}$ and $X_{t4}$, to get $SSE_R$. Next run the unrestricted regression to get $SSE_U$.
F-Tests

$$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(T-K)}$$

F-tests of this type are always right-tailed, even for left-sided or two-sided hypotheses, because any deviation from the null will make the F value bigger (shift it rightward). Reject $H_0$ if $F > F_c$, the right-tail critical value with area $\alpha$.
F-Test of Entire Equation

$$y_t = \beta_1 + \beta_2 X_{t2} + \beta_3 X_{t3} + e_t$$
$$H_0: \beta_2 = \beta_3 = 0 \qquad H_1: H_0 \text{ not true}$$
$$df_d = T - K = 49 \qquad df_n = J = 2$$
$$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(T-K)} = \frac{(13581.35 - 1805.168)/2}{1805.168/(52-3)} = 159.828$$

At $\alpha = 0.05$, $F_c = 3.187$, so we reject $H_0$. We ignore $\beta_1$. Why? Because the overall test concerns the explanatory variables, not the intercept.
ANOVA Table

Table 8.3: Analysis of Variance Table

Source        DF   Sum of Squares   Mean Square   F-Value
Explained     2    11776.18         5888.09       159.828
Unexplained   49   1805.168         36.84
Total         51   13581.35

p-value: 0.0001

$$R^2 = \frac{SSR}{SST} = \frac{11776.18}{13581.35} = 0.867$$
Nonsample Information

A certain production process is known to be Cobb-Douglas with constant returns to scale:
$$\ln(y_t) = \beta_1 + \beta_2 \ln(X_{t2}) + \beta_3 \ln(X_{t3}) + \beta_4 \ln(X_{t4}) + e_t$$
where $\beta_2 + \beta_3 + \beta_4 = 1$, so $\beta_4 = 1 - \beta_2 - \beta_3$. Substituting:
$$\ln\!\left(\frac{y_t}{X_{t4}}\right) = \beta_1 + \beta_2 \ln\!\left(\frac{X_{t2}}{X_{t4}}\right) + \beta_3 \ln\!\left(\frac{X_{t3}}{X_{t4}}\right) + e_t$$

Run least squares on the transformed model. Interpret the coefficients the same as in the original model.
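A sketch of imposing the constant-returns restriction by transformation, on hypothetical input data:

```python
import numpy as np

# Hypothetical inputs for a constant-returns Cobb-Douglas fit.
y  = np.array([10.0, 14.0, 18.0, 25.0, 30.0, 41.0])
X2 = np.array([2.0, 3.0, 4.0, 6.0, 7.0, 10.0])
X3 = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 5.0])
X4 = np.array([3.0, 3.0, 5.0, 6.0, 7.0, 9.0])

# Impose beta2 + beta3 + beta4 = 1 by dividing through by X4.
ly  = np.log(y / X4)
lx2 = np.log(X2 / X4)
lx3 = np.log(X3 / X4)

X = np.column_stack([np.ones_like(ly), lx2, lx3])
b1, b2, b3 = np.linalg.lstsq(X, ly, rcond=None)[0]
b4 = 1.0 - b2 - b3  # recovered from the restriction
print(b1, b2, b3, b4)
```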
Collinear Variables

The term "independent variable" means an explanatory variable is independent of the error term, but not necessarily independent of other explanatory variables. Since economists typically have no control over the implicit "experimental design", explanatory variables tend to move together, which often makes sorting out their separate influences rather problematic.
Effects of Collinearity

A high degree of collinearity will produce:
1. No least squares output when collinearity is exact.
2. Large standard errors and wide confidence intervals.
3. Insignificant t-values even with a high $R^2$ and a significant F-value.
4. Estimates sensitive to deletion or addition of a few observations or "insignificant" variables.
5. Good "within-sample" (same proportions) but poor "out-of-sample" (different proportions) prediction.
Identifying Collinearity

Evidence of high collinearity includes:
1. A high pairwise correlation between two explanatory variables.
2. A high R-squared when regressing one explanatory variable at a time on each of the remaining explanatory variables.
3. A statistically significant F-value when the t-values are statistically insignificant.
4. An R-squared that doesn't fall by much when dropping any of the explanatory variables.
Mitigating Collinearity

Since high collinearity is not a violation of any least squares assumption, but rather a lack of adequate information in the sample:
1. Collect more data with better information.
2. Impose economic restrictions as appropriate.
3. Impose statistical restrictions when justified.
4. If all else fails, at least point out that the poor model performance might be due to the collinearity problem (or it might not).
Prediction

$$y_t = \beta_1 + \beta_2 X_{t2} + \beta_3 X_{t3} + e_t$$

Given a set of values for the explanatory variables, $(1, X_{02}, X_{03})$, the best linear unbiased predictor of $y$ is:
$$\hat{y} = b_1 + b_2 X_{02} + b_3 X_{03}$$

This predictor is unbiased in the sense that the average value of the forecast error is zero.
Chapter 9: Extensions of the Multiple Regression Model
Topics for This Chapter

1. Intercept Dummy Variables
2. Slope Dummy Variables
3. Different Intercepts and Slopes
4. Testing Qualitative Effects
5. Are Two Regressions Equal?
6. Interaction Effects
7. Dummy Dependent Variables
Intercept Dummy Variables

Dummy variables are binary (0, 1): $D_t = 1$ if red car, $D_t = 0$ otherwise.

$$y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + e_t$$

$y_t$ = speed of car in miles per hour; $X_t$ = age of car in years.

Police claim: red cars travel faster. $H_0: \beta_3 = 0$ vs. $H_1: \beta_3 > 0$.
$$y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + e_t$$

red cars: $y_t = (\beta_1 + \beta_3) + \beta_2 X_t + e_t$
other cars: $y_t = \beta_1 + \beta_2 X_t + e_t$

[Figure: miles per hour vs. age in years; two parallel lines with common slope $\beta_2$, intercept $\beta_1 + \beta_3$ for red cars and $\beta_1$ for other cars.]
Slope Dummy Variables

$$y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t X_t + e_t$$

Stock portfolio ($D_t = 1$): $y_t = \beta_1 + (\beta_2 + \beta_3) X_t + e_t$
Bond portfolio ($D_t = 0$): $y_t = \beta_1 + \beta_2 X_t + e_t$

$\beta_1$ = initial investment. [Figure: value of portfolio vs. years; the stock line has slope $\beta_2 + \beta_3$, the bond line slope $\beta_2$, with common intercept $\beta_1$.]
Different Intercepts and Slopes

$$y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + \beta_4 D_t X_t + e_t$$

"miracle" seed ($D_t = 1$): $y_t = (\beta_1 + \beta_3) + (\beta_2 + \beta_4) X_t + e_t$
regular seed ($D_t = 0$): $y_t = \beta_1 + \beta_2 X_t + e_t$

[Figure: harvest weight of corn vs. rainfall; the "miracle" line has intercept $\beta_1 + \beta_3$ and slope $\beta_2 + \beta_4$, the regular line intercept $\beta_1$ and slope $\beta_2$.]
Testing for discrimination in starting wage

$$y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + e_t$$

For men ($D_t = 1$): $y_t = (\beta_1 + \beta_3) + \beta_2 X_t + e_t$. For women ($D_t = 0$): $y_t = \beta_1 + \beta_2 X_t + e_t$.

$y_t$ = wage rate; $X_t$ = years of experience. $H_0: \beta_3 = 0$ vs. $H_1: \beta_3 > 0$.
$$y_t = \beta_1 + \beta_5 X_t + \beta_6 D_t X_t + e_t$$

For men ($D_t = 1$): $y_t = \beta_1 + (\beta_5 + \beta_6) X_t + e_t$. For women ($D_t = 0$): $y_t = \beta_1 + \beta_5 X_t + e_t$.

Men and women have the same starting wage, $\beta_1$, but their wage rates increase at different rates (difference = $\beta_6$). $\beta_6 > 0$ means that men's wage rates are increasing faster than women's wage rates. ($y_t$ = wage rate; $X_t$ = years of experience.)
An Ineffective Affirmative Action Plan

$$y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + \beta_4 D_t X_t + e_t$$

Men ($D_t = 1$): $y_t = (\beta_1 + \beta_3) + (\beta_2 + \beta_4) X_t + e_t$. Women ($D_t = 0$): $y_t = \beta_1 + \beta_2 X_t + e_t$.

Women are given a higher starting wage, $\beta_1$, while men get the lower starting wage, $\beta_1 + \beta_3$, with $\beta_3 < 0$ (women are started at a higher wage). But men get a faster rate of increase in their wages, $\beta_2 + \beta_4$, which is higher than the rate of increase for women, $\beta_2$, since $\beta_4 > 0$. ($y_t$ = wage rate; $X_t$ = years of experience.)
Testing Qualitative Effects

1. Test for differences in intercept.
2. Test for differences in slope.
3. Test for differences in both intercept and slope.
$$Y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + \beta_4 D_t X_t + e_t \qquad (\text{men: } D_t = 1;\ \text{women: } D_t = 0)$$

Testing for discrimination in starting wage (intercept): $H_0: \beta_3 \le 0$ vs. $H_1: \beta_3 > 0$,
$$t = \frac{b_3 - 0}{\sqrt{\widehat{\mathrm{var}}(b_3)}} \sim t_{T-4}$$

Testing for discrimination in wage increases (slope): $H_0: \beta_4 \le 0$ vs. $H_1: \beta_4 > 0$,
$$t = \frac{b_4 - 0}{\sqrt{\widehat{\mathrm{var}}(b_4)}} \sim t_{T-4}$$
Testing the intercept and slope jointly: $H_0: \beta_3 = \beta_4 = 0$; $H_1$: otherwise.

$$SSE_R = \sum_{t=1}^{T} (y_t - b_1 - b_2 X_t)^2$$
$$SSE_U = \sum_{t=1}^{T} (y_t - b_1 - b_2 X_t - b_3 D_t - b_4 D_t X_t)^2$$

$$\frac{(SSE_R - SSE_U)/2}{SSE_U/(T-4)} \sim F_{2,\,T-4}$$
Are Two Regressions Equal? (variations of "The Chow Test")

I. Assuming equal variances (pooling):

$$y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + \beta_4 D_t X_t + e_t \qquad (\text{men: } D_t = 1;\ \text{women: } D_t = 0)$$

$H_0: \beta_3 = \beta_4 = 0$ vs. $H_1$: otherwise. $y_t$ = wage rate; $X_t$ = years of experience. This model assumes equal wage rate variance for men and women.
II. Allowing for unequal variances (running three regressions):

Everyone: $y_t = \beta_1 + \beta_2 X_t + e_t$ gives $SSE_R$ (forcing men and women to have the same $\beta_1$, $\beta_2$).
Men only: $y_{tm} = \delta_1 + \delta_2 X_{tm} + e_{tm}$ gives $SSE_m$.
Women only: $y_{tw} = \gamma_1 + \gamma_2 X_{tw} + e_{tw}$ gives $SSE_w$ (allowing men and women to be different).

$$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(T-K)} \qquad \text{where } SSE_U = SSE_m + SSE_w$$

$J$ = number of restrictions; $K$ = number of unrestricted coefficients. Here $J = 2$ and $K = 4$.
Interaction Variables

1. Interaction Dummies
2. Polynomial Terms (a special case of continuous interaction)
3. Interaction Among Continuous Variables
1. Interaction Dummies

Wage gap between men and women: $y_t$ = wage rate; $X_t$ = experience.
For men: $M_t = 1$ ($M_t = 0$ for women). For black: $B_t = 1$ ($B_t = 0$ for nonblack).

No interaction (wage gap assumed the same across races):
$$y_t = \beta_1 + \beta_2 X_t + \beta_3 M_t + \beta_4 B_t + e_t$$

Interaction (wage gap depends on race):
$$y_t = \beta_1 + \beta_2 X_t + \beta_3 M_t + \beta_4 B_t + \beta_5 M_t B_t + e_t$$
2. Polynomial Terms

Linear in parameters but nonlinear in variables:
$$y_t = \beta_1 + \beta_2 X_t + \beta_3 X_t^2 + \beta_4 X_t^3 + e_t$$

$y_t$ = income; $X_t$ = age. People retire at different ages, or not at all.

[Figure: polynomial regression of income on age over the range 20 to 90.]
Polynomial Regression

$$y_t = \beta_1 + \beta_2 X_t + \beta_3 X_t^2 + \beta_4 X_t^3 + e_t$$

Rate at which income changes as we age:
$$\frac{\partial y_t}{\partial X_t} = \beta_2 + 2\beta_3 X_t + 3\beta_4 X_t^2$$

The slope changes as $X_t$ changes.
3. Continuous Interaction

$$y_t = \beta_1 + \beta_2 Z_t + \beta_3 B_t + \beta_4 Z_t B_t + e_t$$

Exam grade = f(sleep: $Z_t$, study time: $B_t$). Sleep and study time do not act independently: more study time will be more effective when combined with more sleep, and less effective when combined with less sleep.
Your mind sorts things out while you sleep (when you have things to sort out).

$$y_t = \beta_1 + \beta_2 Z_t + \beta_3 B_t + \beta_4 Z_t B_t + e_t$$

Exam grade = f(sleep: $Z_t$, study time: $B_t$). The continuous interaction gives:
$$\frac{\partial y_t}{\partial B_t} = \beta_3 + \beta_4 Z_t \qquad \frac{\partial y_t}{\partial Z_t} = \beta_2 + \beta_4 B_t$$

Your studying is more effective with more sleep.
$$y_t = \beta_1 + \beta_2 Z_t + \beta_3 B_t + \beta_4 Z_t B_t + e_t$$

Exam grade = f(sleep: $Z_t$, study time: $B_t$). If $Z_t + B_t = 24$ hours, then $B_t = 24 - Z_t$:
$$y_t = \beta_1 + \beta_2 Z_t + \beta_3 (24 - Z_t) + \beta_4 Z_t (24 - Z_t) + e_t$$
$$y_t = (\beta_1 + 24\beta_3) + (\beta_2 - \beta_3 + 24\beta_4) Z_t - \beta_4 Z_t^2 + e_t$$
$$y_t = \delta_1 + \delta_2 Z_t + \delta_3 Z_t^2 + e_t$$

Sleep needed to maximize your exam grade:
$$\frac{\partial y_t}{\partial Z_t} = \delta_2 + 2\delta_3 Z_t = 0 \quad\Rightarrow\quad Z_t = \frac{-\delta_2}{2\delta_3}$$
where $\delta_2 > 0$ and $\delta_3 < 0$.
Dummy Dependent Variables

1. Linear Probability Model
2. Probit Model
3. Logit Model
Linear Probability Model

$$y_i = \beta_1 + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i4} + e_i$$

$y_i$ = 1 if quits job, 0 if does not quit.
$X_{i2}$ = total hours of work each week; $X_{i3}$ = weekly paycheck; $X_{i4}$ = hourly pay ($X_{i3}$ divided by $X_{i2}$).
Linear Probability Model

Read predicted values of $y_i$ off the regression line:
$$\hat{y}_i = b_1 + b_2 X_{i2} + b_3 X_{i3} + b_4 X_{i4}$$

[Figure: $\hat{y}_i$ plotted against $X_{i2}$ (total hours of work each week); a straight line crossing the levels $y = 0$ and $y = 1$.]
Problems with the Linear Probability Model:

1. Probability estimates are sometimes less than zero or greater than one.
2. Heteroskedasticity is present, in that the model generates a nonconstant error variance.
Probit Model

Latent variable: $z_i = \beta_1 + \beta_2 X_{i2} + \cdots$

Normal probability density function:
$$f(z_i) = \frac{1}{\sqrt{2\pi}}\, e^{-z_i^2/2}$$

Normal cumulative probability function:
$$F(z_i) = P[\,Z \le z_i\,] = \int_{-\infty}^{z_i} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du$$
Probit Model

Since $z_i = \beta_1 + \beta_2 X_{i2} + \cdots$, we can substitute in to get:
$$p_i = P[\,Z \le \beta_1 + \beta_2 X_{i2}\,] = F(\beta_1 + \beta_2 X_{i2})$$

[Figure: probit probability of quitting ($y = 1$) as an S-shaped curve in $X_{i2}$, total hours of work each week.]
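A sketch of the probit probability using SciPy's standard normal CDF; the coefficients here are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical coefficients and a grid of weekly work hours.
beta1, beta2 = -4.0, 0.1
X2 = np.linspace(0.0, 80.0, 5)

# Probit probability: standard normal CDF of the linear index.
p = stats.norm.cdf(beta1 + beta2 * X2)
print(p)  # rises from near 0 toward 1 as hours grow
```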
Logit Model

Define $p_i$, the probability of quitting the job:
$$p_i = \frac{1}{1 + e^{-(\beta_1 + \beta_2 X_{i2} + \cdots)}}$$

For $\beta_2 > 0$, $p_i$ will approach 1 as $X_{i2} \to +\infty$ and approach 0 as $X_{i2} \to -\infty$.
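The logit counterpart, a sketch with the same hypothetical index:

```python
import numpy as np

def logit_prob(index):
    """Logistic CDF: the probability implied by the logit index."""
    return 1.0 / (1.0 + np.exp(-index))

# Hypothetical coefficients; same index form as the probit sketch.
beta1, beta2 = -4.0, 0.1
X2 = np.linspace(0.0, 80.0, 5)
print(logit_prob(beta1 + beta2 * X2))
```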
Logit Model

[Figure: logit probability of quitting, $p_i = 1 / (1 + e^{-(\beta_1 + \beta_2 X_{i2} + \cdots)})$, as an S-shaped curve between 0 and 1 in $X_{i2}$, total hours of work each week.]
Maximum Likelihood

Maximum likelihood estimation (MLE) is used to estimate probit and logit functions. The small-sample properties of the MLE are not known, but in large samples the MLE is normally distributed, and it is consistent and asymptotically efficient.
Chapter 10: Heteroskedasticity
The Nature of Heteroskedasticity

Heteroskedasticity is a systematic pattern in the errors in which the variances of the errors are not constant. Ordinary least squares assumes that all observations are equally reliable. For efficiency (accurate estimation and prediction), reweight observations to ensure equal error variance.
Regression Model

$$y_t = \beta_1 + \beta_2 x_t + e_t$$

zero mean: $E(e_t) = 0$
homoskedasticity: $\mathrm{var}(e_t) = \sigma^2$
nonautocorrelation: $\mathrm{cov}(e_t, e_s) = 0$ for $t \neq s$
heteroskedasticity: $\mathrm{var}(e_t) = \sigma_t^2$
[Figure: homoskedastic pattern of errors; a consumption vs. income scatter with constant spread around the regression line.]

[Figure: the homoskedastic case; identical error densities $f(y_t)$ centered on the regression line at income levels $x_1, x_2, x_3, x_4$.]
[Figure: heteroskedastic pattern of errors; a consumption vs. income scatter whose spread widens as income rises.]

[Figure: the heteroskedastic case; the error densities $f(y_t)$ spread out as income rises, contrasting "rich people" with "poor people".]
Properties of Least Squares

1. Least squares is still linear and unbiased.
2. Least squares is not efficient.
3. The usual formulas give incorrect standard errors for least squares.
4. Confidence intervals and hypothesis tests based on the usual standard errors are wrong.
$$y_t = \beta_1 + \beta_2 x_t + e_t \qquad \text{heteroskedasticity: } \mathrm{var}(e_t) = \sigma_t^2$$

Incorrect formula for the least squares variance:
$$\mathrm{var}(b_2) = \frac{\sigma^2}{\sum (x_t - \bar{x})^2}$$

Correct formula for the least squares variance:
$$\mathrm{var}(b_2) = \frac{\sum \sigma_t^2 (x_t - \bar{x})^2}{\left[\sum (x_t - \bar{x})^2\right]^2}$$
Hal White's Standard Errors

White's estimator of the least squares variance:
$$\widehat{\mathrm{var}}(b_2) = \frac{\sum \hat{e}_t^2 (x_t - \bar{x})^2}{\left[\sum (x_t - \bar{x})^2\right]^2}$$

In large samples, White's standard error (the square root of the estimated variance) is a correct (accurate, consistent) measure.
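A sketch of White's variance formula on simulated heteroskedastic data:

```python
import numpy as np

# Simulated sample whose error spread grows with x.
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 50)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.3 * x)

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b1 = y.mean() - b2 * x.mean()
e_hat = y - b1 - b2 * x

# White's heteroskedasticity-consistent variance for the slope.
dx = x - x.mean()
var_white = np.sum(e_hat**2 * dx**2) / np.sum(dx**2)**2
print(np.sqrt(var_white))  # White standard error of b2
```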
Two Types of Heteroskedasticity

1. Proportional heteroskedasticity (a continuous function of $x_t$, for example).
2. Partitioned heteroskedasticity (discrete categories/groups).
Proportional Heteroskedasticity

$$y_t = \beta_1 + \beta_2 x_t + e_t$$
$$E(e_t) = 0 \qquad \mathrm{var}(e_t) = \sigma_t^2 \qquad \mathrm{cov}(e_t, e_s) = 0 \ (t \neq s)$$

The variance is assumed to be proportional to the value of $x_t$:
$$\sigma_t^2 = \sigma^2 x_t$$
$$y_t = \beta_1 + \beta_2 x_t + e_t \qquad \mathrm{var}(e_t) = \sigma_t^2$$

variance: $\sigma_t^2 = \sigma^2 x_t$; standard deviation: $\sigma_t = \sigma \sqrt{x_t}$ (proportional to $\sqrt{x_t}$).

To correct for heteroskedasticity, divide the model by $\sqrt{x_t}$:
$$\frac{y_t}{\sqrt{x_t}} = \beta_1 \frac{1}{\sqrt{x_t}} + \beta_2 \frac{x_t}{\sqrt{x_t}} + \frac{e_t}{\sqrt{x_t}}$$
$$\frac{y_t}{\sqrt{x_t}} = \beta_1 \frac{1}{\sqrt{x_t}} + \beta_2 \frac{x_t}{\sqrt{x_t}} + \frac{e_t}{\sqrt{x_t}} \quad\Rightarrow\quad y_t^* = \beta_1 x_{t1}^* + \beta_2 x_{t2}^* + e_t^*$$

$$\mathrm{var}(e_t^*) = \mathrm{var}\!\left(\frac{e_t}{\sqrt{x_t}}\right) = \frac{1}{x_t}\,\mathrm{var}(e_t) = \frac{1}{x_t}\,\sigma^2 x_t = \sigma^2$$

$e_t$ is heteroskedastic, but $e_t^*$ is homoskedastic.
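A sketch of this correction: divide every term (including the intercept column) by $\sqrt{x_t}$ and run least squares on the transformed model, using simulated data:

```python
import numpy as np

# Simulated data with variance proportional to x.
rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 50)
y = 2.0 + 0.5 * x + rng.normal(0.0, np.sqrt(0.2 * x))

# Transform: divide every term by sqrt(x).
w = 1.0 / np.sqrt(x)
Xs = np.column_stack([w, x * w])  # transformed intercept and slope columns
ys = y * w

b1, b2 = np.linalg.lstsq(Xs, ys, rcond=None)[0]
print(b1, b2)  # generalized (weighted) least squares estimates
```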
1. Decide which variable is proportional to the