Generalized Least Squares
These steps describe weighted least squares:
1. Decide which variable is proportional to the heteroskedasticity (x_t in the previous example).
2. Divide all terms in the original model by the square root of that variable (that is, divide by √x_t).
3. Run least squares on the transformed model, which has new y_t*, x_t1* and x_t2* variables but no intercept.
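A minimal sketch of these three steps, assuming the simple model y_t = β1 + β2 x_t + e_t with var(e_t) = σ² x_t; the data below are simulated purely for illustration:

```python
import numpy as np

# Sketch of weighted least squares, assuming var(e_t) = sigma^2 * x_t.
rng = np.random.default_rng(0)
T = 100
x = rng.uniform(1.0, 10.0, T)
y = 2.0 + 1.5 * x + rng.normal(0.0, 1.0, T) * np.sqrt(x)  # error sd grows with x

w = np.sqrt(x)                   # step 2: divide all terms by sqrt(x_t)
y_star = y / w                   # new y_t*
x1_star = 1.0 / w                # new x_t1* (the transformed intercept)
x2_star = x / w                  # new x_t2* (= sqrt(x_t))
X_star = np.column_stack([x1_star, x2_star])  # step 3: no intercept column
b1, b2 = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
print(b1, b2)                    # GLS estimates of beta1 and beta2
```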
Partitioned Heteroskedasticity

y_t = β1 + β2 x_t + e_t,  t = 1, ..., 100

y_t = bushels per acre of corn
x_t = gallons of water per acre (rain or other)

error variance of "field" corn: var(e_t) = σ1²,  t = 1, ..., 80
error variance of "sweet" corn: var(e_t) = σ2²,  t = 81, ..., 100
Reweighting Each Group's Observations

"field" corn: y_t = β1 + β2 x_t + e_t,  var(e_t) = σ1²,  t = 1, ..., 80

y_t/σ1 = β1(1/σ1) + β2(x_t/σ1) + e_t/σ1

"sweet" corn: y_t = β1 + β2 x_t + e_t,  var(e_t) = σ2²,  t = 81, ..., 100

y_t/σ2 = β1(1/σ2) + β2(x_t/σ2) + e_t/σ2
Apply Generalized Least Squares
Run least squares separately on the data for each group.
σ̂1² provides an estimator of σ1² using the 80 observations on "field" corn.
σ̂2² provides an estimator of σ2² using the 20 observations on "sweet" corn.
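A sketch of this partitioned GLS procedure, using simulated data in place of the corn sample (observations 0-79 play the role of "field" corn, 80-99 of "sweet" corn):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(10, 50, 100)
sigma = np.where(np.arange(100) < 80, 2.0, 6.0)   # sigma1 = 2, sigma2 = 6
y = 30.0 + 0.5 * x + rng.normal(0, 1, 100) * sigma

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

X = np.column_stack([np.ones(100), x])
# Run least squares separately on each group; estimate each group's variance.
s2 = []
for rows in (slice(0, 80), slice(80, 100)):
    b = ols(X[rows], y[rows])
    e = y[rows] - X[rows] @ b
    s2.append(e @ e / (len(e) - 2))               # sigma-hat^2 for the group
# Reweight each group's observations by its estimated standard deviation.
w = np.where(np.arange(100) < 80, np.sqrt(s2[0]), np.sqrt(s2[1]))
b_gls = ols(X / w[:, None], y / w)
print(b_gls)
```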
Detecting Heteroskedasticity
Determine the existence and nature of heteroskedasticity:
1. Residual plots provide information on the exact nature of the heteroskedasticity (partitioned or proportional) to aid in correcting for it.
2. The Goldfeld-Quandt test checks for the presence of heteroskedasticity.
Residual Plots

[Scatter plot of residuals ê_t against x_t, with the spread of the residuals widening as x_t increases.]

Plot residuals against one variable at a time, after sorting the data by that variable, to try to find a heteroskedastic pattern in the data.
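A short plotting sketch, assuming x and y are already-loaded NumPy arrays:

```python
import numpy as np
import matplotlib.pyplot as plt

# Plot least squares residuals against the sorted regressor to look for
# a heteroskedastic (fanning-out) pattern.
order = np.argsort(x)                       # sort the data by x_t
X = np.column_stack([np.ones_like(x), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
plt.scatter(x[order], resid[order], s=10)
plt.axhline(0.0, linewidth=1)
plt.xlabel("x_t"); plt.ylabel("residual e_t")
plt.show()
```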
Goldfeld-Quandt Test
The Goldfeld-Quandt test can be used to detect heteroskedasticity in either the proportional case or for comparing two groups in the discrete case.
For proportional heteroskedasticity, it is first necessary to determine which variable (such as x_t) is proportional to the error variance. Then sort the data from the largest to smallest values of that variable.
Goldfeld-Quandt Test Statistic

H_o: σ1² = σ2²  vs.  H_1: σ1² ≠ σ2²

GQ = σ̂1² / σ̂2²  ~  F[T1 − K1, T2 − K2]

In the proportional case, drop the middle r observations, where r ≈ T/6, then run separate least squares regressions on the first T1 observations and the last T2 observations.
Small values of GQ support H_o, while large values support H_1. Use the F table.
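A sketch of the GQ statistic, assuming the data have already been sorted on the suspect variable and the middle r ≈ T/6 observations dropped; the p-value is one-sided against the alternative σ1² > σ2²:

```python
import numpy as np
from scipy import stats

def goldfeld_quandt(X1, y1, X2, y2):
    """Return the GQ statistic and p-value from two sub-sample regressions."""
    def sse_df(X, y):
        b = np.linalg.lstsq(X, y, rcond=None)[0]
        e = y - X @ b
        return e @ e, len(y) - X.shape[1]       # SSE and T_i - K_i
    sse1, df1 = sse_df(X1, y1)
    sse2, df2 = sse_df(X2, y2)
    gq = (sse1 / df1) / (sse2 / df2)            # sigma1-hat^2 / sigma2-hat^2
    return gq, stats.f.sf(gq, df1, df2)
```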
More General Model
The structure of the heteroskedasticity could be more complicated:

σ_t² = σ² exp{α1 z_t1 + α2 z_t2}

z_t1 and z_t2 are any observable variables upon which we believe the variance could depend.
Note: The function exp{·} ensures that σ_t² is positive.
More General Model (continued)

σ_t² = σ² exp{α1 z_t1 + α2 z_t2}

Taking logarithms:
ln σ_t² = ln σ² + α1 z_t1 + α2 z_t2
ln σ_t² = α + α1 z_t1 + α2 z_t2,  where α = ln σ²

Using the least squares residuals ê_t, estimate:
ln ê_t² = α + α1 z_t1 + α2 z_t2 + ν_t

H_o: α1 = 0, α2 = 0  vs.  H_1: α1 ≠ 0 and/or α2 ≠ 0, tested with the usual F test.
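A sketch of this auxiliary regression and its F test, assuming the OLS residuals e and the variance regressors z1, z2 are NumPy arrays already in hand:

```python
import numpy as np

Z = np.column_stack([np.ones_like(e), z1, z2])
lny = np.log(e**2)                          # ln(e_t^2)
a = np.linalg.lstsq(Z, lny, rcond=None)[0]  # (alpha, alpha1, alpha2)
# Usual F test of H_o: alpha1 = alpha2 = 0 in the auxiliary model.
resid = lny - Z @ a
sse_u = resid @ resid
sse_r = np.sum((lny - lny.mean())**2)       # restricted: intercept only
T, K, J = len(e), 3, 2
F = ((sse_r - sse_u) / J) / (sse_u / (T - K))
print(F)
```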
Autocorrelation
Chapter 11
The Nature of Autocorrelation
For efficiency (accurate estimation and prediction), all systematic information needs to be incorporated into the regression model.
Autocorrelation is a systematic pattern in the errors that can be either attracting (positive autocorrelation) or repelling (negative autocorrelation).
[Plots of residuals e_t against time t under positive, no, and negative autocorrelation.]
Positive autocorrelation: the residuals cross the zero line too seldom (attracting).
No autocorrelation: the residuals cross the zero line randomly.
Negative autocorrelation: the residuals cross the zero line too often (repelling).
Regression Model

y_t = β1 + β2 x_t + e_t

zero mean: E(e_t) = 0
homoskedasticity: var(e_t) = σ²
nonautocorrelation: cov(e_t, e_s) = 0 for t ≠ s
autocorrelation: cov(e_t, e_s) ≠ 0 for t ≠ s
Order of Autocorrelation

y_t = β1 + β2 x_t + e_t

1st order: e_t = ρ e_{t−1} + ν_t
2nd order: e_t = ρ1 e_{t−1} + ρ2 e_{t−2} + ν_t
3rd order: e_t = ρ1 e_{t−1} + ρ2 e_{t−2} + ρ3 e_{t−3} + ν_t

We will assume first-order autocorrelation, AR(1):
e_t = ρ e_{t−1} + ν_t
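A small simulation sketch of the AR(1) error process just assumed; the parameter values are illustrative only:

```python
import numpy as np

# Simulate first-order autocorrelated errors e_t = rho*e_{t-1} + nu_t.
rng = np.random.default_rng(2)
T, rho = 200, 0.8
nu = rng.normal(0.0, 1.0, T)
e = np.zeros(T)
for t in range(1, T):
    e[t] = rho * e[t - 1] + nu[t]       # AR(1) recursion
# The sample lag-1 autocorrelation should be near rho:
print(np.corrcoef(e[1:], e[:-1])[0, 1])
```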
First Order Autocorrelation

y_t = β1 + β2 x_t + e_t
e_t = ρ e_{t−1} + ν_t,  where −1 < ρ < 1

E(ν_t) = 0,  var(ν_t) = σ_ν²,  cov(ν_t, ν_s) = 0 for t ≠ s

These assumptions about ν_t imply the following about e_t:

E(e_t) = 0
var(e_t) = σ_e² = σ_ν² / (1 − ρ²)
cov(e_t, e_{t−k}) = σ_e² ρ^k for k > 0
corr(e_t, e_{t−k}) = ρ^k for k > 0
Autocorrelation creates some problems for least squares:
1. The least squares estimator is still linear and unbiased, but it is not efficient.
2. The formulas normally used to compute the least squares standard errors are no longer correct, and confidence intervals and hypothesis tests using them will be wrong.
Generalized Least Squares

AR(1): e_t = ρ e_{t−1} + ν_t

y_t = β1 + β2 x_t + e_t

Substitute in for e_t:
y_t = β1 + β2 x_t + ρ e_{t−1} + ν_t

Now we need to get rid of e_{t−1} (continued).
y_t = β1 + β2 x_t + e_t implies e_t = y_t − β1 − β2 x_t.

Lag the errors once: e_{t−1} = y_{t−1} − β1 − β2 x_{t−1}.

Substituting this in:
y_t = β1 + β2 x_t + ρ(y_{t−1} − β1 − β2 x_{t−1}) + ν_t
(continued)
y_t = β1 + β2 x_t + ρ(y_{t−1} − β1 − β2 x_{t−1}) + ν_t
y_t = β1 + β2 x_t + ρ y_{t−1} − ρβ1 − ρβ2 x_{t−1} + ν_t
y_t − ρ y_{t−1} = β1(1 − ρ) + β2(x_t − ρ x_{t−1}) + ν_t

y_t* = β1* + β2 x_t2* + ν_t
where y_t* = y_t − ρ y_{t−1}, β1* = β1(1 − ρ), and x_t2* = x_t − ρ x_{t−1}.
y_t* = β1* + β2 x_t2* + ν_t
where y_t* = y_t − ρ y_{t−1}, β1* = β1(1 − ρ), and x_t2* = x_t − ρ x_{t−1}.

Problems estimating this model with least squares:
1. One observation is used up in creating the transformed (lagged) variables, leaving only T − 1 observations for estimating the model.
2. The value of ρ is not known. We must find some way to estimate it.
Recovering the 1st Observation
Dropping the 1st observation and applying least squares is not the best linear unbiased estimation method. Efficiency is lost because the variance of the error associated with the 1st observation is not equal to that of the other errors.
This is a special case of the heteroskedasticity problem, except that here all errors are assumed to have equal variance except the 1st error.
Recovering the 1st Observation
The 1st observation should fit the original model as:
y_1 = β1 + β2 x_1 + e_1, with error variance var(e_1) = σ_e² = σ_ν²/(1 − ρ²).
We could include this as the 1st observation for our estimation procedure, but we must first transform it so that it has the same error variance as the other observations.
Note: The other observations all have error variance σ_ν².
y_1 = β1 + β2 x_1 + e_1, with error variance var(e_1) = σ_e² = σ_ν²/(1 − ρ²). The other observations all have error variance σ_ν².

Given any constant c: var(c e_1) = c² var(e_1).
If c = √(1 − ρ²), then:
var(√(1 − ρ²) e_1) = (1 − ρ²) var(e_1) = (1 − ρ²) σ_e² = (1 − ρ²) σ_ν²/(1 − ρ²) = σ_ν²
The transformation ν_1 = √(1 − ρ²) e_1 has variance σ_ν².
y_1 = β1 + β2 x_1 + e_1
The transformed error ν_1 = √(1 − ρ²) e_1 has variance σ_ν². Multiply through by √(1 − ρ²) to get:

√(1 − ρ²) y_1 = √(1 − ρ²) β1 + √(1 − ρ²) β2 x_1 + √(1 − ρ²) e_1

This transformed first observation may now be added to the other T − 1 observations to obtain the fully restored set of T observations.
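A sketch of the full transformation with the first observation restored, assuming ρ is known and x, y are NumPy arrays (the two-step estimate of ρ comes next):

```python
import numpy as np

def ar1_gls_transform(y, x, rho):
    """Quasi-difference obs. 2..T and rescale obs. 1 by sqrt(1 - rho^2)."""
    T = len(y)
    c = np.sqrt(1.0 - rho**2)
    y_s = np.empty(T); x1_s = np.empty(T); x2_s = np.empty(T)
    y_s[0], x1_s[0], x2_s[0] = c * y[0], c, c * x[0]   # transformed 1st obs.
    y_s[1:] = y[1:] - rho * y[:-1]
    x1_s[1:] = 1.0 - rho            # intercept regressor, coefficient beta1
    x2_s[1:] = x[1:] - rho * x[:-1]
    return y_s, np.column_stack([x1_s, x2_s])

# Usage: run least squares on the transformed data.
# y_s, X_s = ar1_gls_transform(y, x, rho=0.8)
# b = np.linalg.lstsq(X_s, y_s, rcond=None)[0]
```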
Estimating the Unknown ρ Value
First, use least squares to estimate the model:
y_t = β1 + β2 x_t + e_t
The residuals from this estimation are:
ê_t = y_t − b1 − b2 x_t
If we had values for the e_t's, we could estimate: e_t = ρ e_{t−1} + ν_t.
ê_t = y_t − b1 − b2 x_t

Next, estimate the following by least squares:
ê_t = ρ ê_{t−1} + ν_t

The least squares solution is:

ρ̂ = Σ_{t=2}^{T} ê_t ê_{t−1} / Σ_{t=2}^{T} ê_{t−1}²
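The same two-step estimate of ρ in code, again assuming x and y are already loaded:

```python
import numpy as np

X = np.column_stack([np.ones_like(x), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]   # first, OLS on the original model
ehat = y - X @ b                           # residuals e-hat
rho_hat = (ehat[1:] @ ehat[:-1]) / (ehat[:-1] @ ehat[:-1])
print(rho_hat)
```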
Durbin-Watson Test

H_o: ρ = 0  vs.  H_1: ρ ≠ 0, ρ > 0, or ρ < 0

The Durbin-Watson test statistic, d, is:

d = Σ_{t=2}^{T} (ê_t − ê_{t−1})² / Σ_{t=1}^{T} ê_t²
Testing for Autocorrelation
The test statistic, d, is approximately related to ρ̂ as:
d ≈ 2(1 − ρ̂)
When ρ̂ = 0, the Durbin-Watson statistic is d ≈ 2.
When ρ̂ = 1, the Durbin-Watson statistic is d ≈ 0.
Tables of critical values for d are not always readily available, so it is easier to use the p-value that most computer programs provide for d. Reject H_o if p-value < α, the significance level.
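A two-line sketch of the statistic, reusing the residuals ehat from the estimate of ρ above:

```python
import numpy as np

d = np.sum(np.diff(ehat)**2) / np.sum(ehat**2)  # Durbin-Watson statistic
rho_approx = 1.0 - d / 2.0                      # from d ~= 2(1 - rho)
print(d, rho_approx)
```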
Prediction with AR(1) Errors
When errors are autocorrelated, the previous period's error may help us predict next period's error. The best predictor, ŷ_{T+1}, for next period is:

ŷ_{T+1} = β̃1 + β̃2 x_{T+1} + ρ̂ ẽ_T

where β̃1 and β̃2 are generalized least squares estimates and ẽ_T is given by:

ẽ_T = y_T − β̃1 − β̃2 x_T
For h periods ahead, the best predictor is:

ŷ_{T+h} = β̃1 + β̃2 x_{T+h} + ρ̂^h ẽ_T

Assuming |ρ̂| < 1, the influence of ρ̂^h ẽ_T diminishes the further we go into the future (the larger h becomes).
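A sketch of the h-step predictor, assuming GLS estimates b1, b2, rho_hat and a vector of future x values are available:

```python
import numpy as np

def ar1_forecast(b1, b2, rho_hat, y_T, x_T, x_future):
    """h-step-ahead predictions with AR(1) errors, h = 1..len(x_future)."""
    e_T = y_T - b1 - b2 * x_T                 # last-period residual
    h = np.arange(1, len(x_future) + 1)
    return b1 + b2 * np.asarray(x_future) + rho_hat**h * e_T
# As h grows, rho_hat**h -> 0 for |rho_hat| < 1, so the forecast decays
# toward the ordinary regression prediction.
```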
Pooling Time-Series and Cross-Sectional Data
Chapter 12
Pooling Time and Cross Sections

y_it = β_1it + β_2it x_2it + β_3it x_3it + e_it
for the i-th firm in the t-th time period.

If left unrestricted, this model requires different equations for each firm in each time period.
Seemingly Unrelated Regressions

y_it = β_1i + β_2i x_2it + β_3i x_3it + e_it

SUR models impose the restrictions:
β_1it = β_1i,  β_2it = β_2i,  β_3it = β_3i

Each firm gets its own coefficients, β_1i, β_2i and β_3i, but those coefficients are constant over time.
Two-Equation SUR Model
The investment expenditures (INV) of General Electric (G) and Westinghouse (W) may be related to their stock market value (V) and actual capital stock (K) as follows:

INV_Gt = β_1G + β_2G V_Gt + β_3G K_Gt + e_Gt
INV_Wt = β_1W + β_2W V_Wt + β_3W K_Wt + e_Wt

i = G, W;  t = 1, ..., 20
Estimating Separate Equations
We make the usual error term assumptions:

E(e_Gt) = 0,  E(e_Wt) = 0
var(e_Gt) = σ_G²,  var(e_Wt) = σ_W²
cov(e_Gt, e_Gs) = 0,  cov(e_Wt, e_Ws) = 0

For now, make the assumption of no correlation between the error terms across equations:

cov(e_Gt, e_Wt) = 0,  cov(e_Gt, e_Ws) = 0
The dummy variable model assumes homoskedasticity across equations: σ_G² = σ_W².

INV_t = β_1G + δ1 D_t + β_2G V_t + δ2 D_t V_t + β_3G K_t + δ3 D_t K_t + e_t

For Westinghouse observations D_t = 1; otherwise D_t = 0.

β_1W = β_1G + δ1
β_2W = β_2G + δ2
β_3W = β_3G + δ3
Problem with OLS on Each Equation
The first assumption of the Gauss-Markov theorem concerns the model specification. If the model is not fully and correctly specified, the Gauss-Markov properties might not hold. Any correlation of error terms across equations must be part of the model specification.
Correlated Error Terms
Any correlation between the dependent variables of two or more equations that is not due to their explanatory variables is, by default, due to correlated error terms.
Which of the following models would be likely to produce positively correlated errors and which would produce negatively correlated errors?
1. Sales of Pepsi vs. sales of Coke (uncontrolled factor: outdoor temperature).
2. Investments in bonds vs. investments in stocks (uncontrolled factor: computer/appliance sales).
3. Movie admissions vs. golf course admissions (uncontrolled factor: weather conditions).
4. Sales of butter vs. sales of bread (uncontrolled factor: bagels and cream cheese).
Joint Estimation of the Equations

INV_Gt = β_1G + β_2G V_Gt + β_3G K_Gt + e_Gt
INV_Wt = β_1W + β_2W V_Wt + β_3W K_Wt + e_Wt

cov(e_Gt, e_Wt) = σ_GW
Seemingly Unrelated Regressions
When the error terms of two or more equations are correlated, efficient estimation requires the use of a Seemingly Unrelated Regressions (SUR) type estimator to take the correlation into account. Be sure to use the SUR procedure in your regression software to estimate any equations that you believe might have correlated errors.
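The slides leave the estimator to software; as a hand-rolled sketch, a two-equation SUR (FGLS) can be written directly, assuming design matrices XG, XW (same number of rows T, one column per regressor) and dependent variables yG, yW are already built:

```python
import numpy as np

def sur_two_eq(XG, yG, XW, yW):
    T = len(yG)
    ols = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
    eG = yG - XG @ ols(XG, yG)                  # step 1: OLS residuals
    eW = yW - XW @ ols(XW, yW)
    S = np.cov(np.vstack([eG, eW]), bias=True)  # 2x2 residual covariance
    # step 2: GLS on the stacked system with Omega = S kron I_T
    X = np.block([[XG, np.zeros_like(XW)],
                  [np.zeros_like(XG), XW]])
    y = np.concatenate([yG, yW])
    Oinv = np.kron(np.linalg.inv(S), np.eye(T))
    return np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ y)
```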
Separate vs. Joint Estimation
SUR will give exactly the same results as estimating each equation separately with OLS if either or both of the following two conditions are true:
1. Every equation has exactly the same set of explanatory variables with exactly the same values.
2. There is no correlation between the error terms of any of the equations.
Test for Correlation
Test the null hypothesis of zero correlation, H_o: σ_GW = 0, using:

r_GW² = σ̂_GW² / (σ̂_G² σ̂_W²)
λ = T r_GW²,  with λ ~ χ²(1) asymptotically
Start with the residuals ê_Gt and ê_Wt from each equation estimated separately.

σ̂_GW = (1/T) Σ ê_Gt ê_Wt
σ̂_G² = (1/T) Σ ê_Gt²
σ̂_W² = (1/T) Σ ê_Wt²

r_GW² = σ̂_GW² / (σ̂_G² σ̂_W²)
λ = T r_GW²,  with λ ~ χ²(1) asymptotically
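A direct translation of this test, assuming separate-equation residual arrays eG and eW of length T:

```python
import numpy as np
from scipy import stats

T = len(eG)
s_GW = (eG @ eW) / T
s_G2 = (eG @ eG) / T
s_W2 = (eW @ eW) / T
r2 = s_GW**2 / (s_G2 * s_W2)
lam = T * r2                        # lambda ~ chi-square(1) asymptotically
print(lam, stats.chi2.sf(lam, df=1))
```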
Fixed Effects Model
For each i-th cross section in the t-th time period:

y_it = β_1it + β_2it x_2it + β_3it x_3it + e_it

Fixed effects models impose the restrictions:
β_1it = β_1i,  β_2it = β_2,  β_3it = β_3

so that:
y_it = β_1i + β_2 x_2it + β_3 x_3it + e_it

Each i-th cross-section has its own constant intercept β_1i.
The Fixed Effects Model is conveniently represented using dummy variables:

y_it = β_11 D_1i + β_12 D_2i + β_13 D_3i + β_14 D_4i + β_2 x_2it + β_3 x_3it + e_it

D_1i = 1 if North, 0 if not; D_2i = 1 if East, 0 if not; D_3i = 1 if South, 0 if not; D_4i = 1 if West, 0 if not.

y_it = millions of bushels of corn produced
x_2it = price of corn in dollars per bushel
x_3it = price of soybeans in dollars per bushel

Each cross-sectional unit gets its own intercept, but each cross-sectional intercept is constant over time.
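A sketch of the dummy-variable regression, assuming a hypothetical integer array region_id (values 0-3 for the N = 4 regions) and arrays x2, x3, y of equal length:

```python
import numpy as np

D = np.eye(4)[region_id]                # one dummy column per region
X = np.column_stack([D, x2, x3])        # no common intercept column
b = np.linalg.lstsq(X, y, rcond=None)[0]
# b[:4] are the four region intercepts beta_11..beta_14;
# b[4] and b[5] are beta_2 and beta_3.
```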
Test for Equality of Fixed Effects

H_o: β_11 = β_12 = β_13 = β_14  vs.  H_1: H_o not true

The joint null hypothesis may be tested with the F-statistic:

F = [(SSE_R − SSE_U)/J] / [SSE_U/(NT − K)]  ~  F[J, NT − K]

SSE_R is the restricted error sum of squares (one intercept)
SSE_U is the unrestricted error sum of squares (four intercepts)
N is the number of cross-sectional units (N = 4)
K is the number of parameters in the model (K = 6)
J is the number of restrictions being tested (J = N − 1 = 3)
T is the number of time periods
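Continuing the dummy-variable sketch above, the F statistic can be computed from the restricted and unrestricted sums of squares:

```python
import numpy as np

def sse(X, y):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

NT = len(y)
X_u = np.column_stack([np.eye(4)[region_id], x2, x3])  # four intercepts (K = 6)
X_r = np.column_stack([np.ones(NT), x2, x3])           # one intercept
J = 3                                                   # N - 1 restrictions
F = ((sse(X_r, y) - sse(X_u, y)) / J) / (sse(X_u, y) / (NT - 6))
print(F)   # compare with the F(J, NT - K) distribution
```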
Random Effects Model

y_it = β_1i + β_2 x_2it + β_3 x_3it + e_it
β_1i = β_1 + μ_i

β_1 is the population mean intercept. μ_i is an unobservable random error that accounts for the cross-sectional differences.
Random Intercept Term

β_1i = β_1 + μ_i,  where i = 1, ..., N

The μ_i are independent of one another and of e_it, with E(μ_i) = 0 and var(μ_i) = σ_μ².
Consequently, E(β_1i) = β_1 and var(β_1i) = σ_μ².
Random Effects Model

y_it = β_1i + β_2 x_2it + β_3 x_3it + e_it
y_it = (β_1 + μ_i) + β_2 x_2it + β_3 x_3it + e_it
y_it = β_1 + β_2 x_2it + β_3 x_3it + (μ_i + e_it)
y_it = β_1 + β_2 x_2it + β_3 x_3it + ν_it
ν_it = μ_i + e_it
y_it = β_1 + β_2 x_2it + β_3 x_3it + ν_it

ν_it has zero mean: E(ν_it) = 0
ν_it is homoskedastic: var(ν_it) = σ_μ² + σ_e²
The errors from the same firm in different time periods are correlated: cov(ν_it, ν_is) = σ_μ² for t ≠ s.
The errors from different firms are always uncorrelated: cov(ν_it, ν_js) = 0 for i ≠ j.
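A small simulation (not from the slides) that checks this error structure numerically, with illustrative parameter values:

```python
import numpy as np

# Simulate nu_it = mu_i + e_it for N firms over T periods.
rng = np.random.default_rng(3)
N, T = 500, 5
sigma_mu, sigma_e = 1.0, 2.0
mu = rng.normal(0, sigma_mu, (N, 1))    # one draw per firm, constant over t
e = rng.normal(0, sigma_e, (N, T))
nu = mu + e
# var(nu_it) should be sigma_mu^2 + sigma_e^2 = 5;
# cov(nu_it, nu_is), t != s, should be sigma_mu^2 = 1.
print(nu.var(), np.mean(nu[:, 0] * nu[:, 1]))
```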
Simultaneous Equations Models
Chapter 13
Keynesian Macro Model
Assumptions of the Simple Keynesian Model:
1. Consumption, c, is a function of income, y.
2. Total expenditures = consumption + investment.
3. Investment is assumed independent of income.
The Structural Equations
consumption is a function of income: c = β1 + β2 y
income is either consumed or invested: y = c + i
The Statistical Model
The consumption equation: c_t = β1 + β2 y_t + e_t
The income identity: y_t = c_t + i_t
The Simultaneous Nature of Simultaneous Equations

c_t = β1 + β2 y_t + e_t
y_t = c_t + i_t

Since y_t contains e_t, they are correlated: a shock to e_t changes c_t, the income identity passes that change into y_t, and y_t in turn appears on the right-hand side of the consumption equation.
The Failure of Least Squares
The least squares estimators of the parameters in a structural simultaneous equation are biased and inconsistent because of the correlation between the random error and the endogenous variables on the right-hand side of the equation.
Single vs. Simultaneous Equations

[Diagram: in the single-equation case, y_t and e_t each feed into c_t; in the simultaneous-equations case, e_t feeds into c_t, c_t and i_t feed into y_t, and y_t feeds back into c_t.]
Deriving the Reduced Form

c_t = β1 + β2 y_t + e_t
y_t = c_t + i_t

Substituting the identity into the consumption equation:
c_t = β1 + β2 (c_t + i_t) + e_t
(1 − β2) c_t = β1 + β2 i_t + e_t
Deriving the Reduced Form (continued)

(1 − β2) c_t = β1 + β2 i_t + e_t

c_t = β1/(1 − β2) + [β2/(1 − β2)] i_t + [1/(1 − β2)] e_t

The reduced form equation:
c_t = π_11 + π_21 i_t + ν_t
Reduced Form Equation

c_t = π_11 + π_21 i_t + ν_t

where π_11 = β1/(1 − β2), π_21 = β2/(1 − β2), and ν_t = e_t/(1 − β2).
y_t = c_t + i_t, where c_t = π_11 + π_21 i_t + ν_t, so that:

y_t = π_11 + (1 + π_21) i_t + ν_t

It is sometimes useful to give this equation its own reduced form parameters as follows:

y_t = π_12 + π_22 i_t + ν_t
c_t = π_11 + π_21 i_t + ν_t
y_t = π_12 + π_22 i_t + ν_t

Since c_t and y_t are related through the identity y_t = c_t + i_t, the error term ν_t of these two equations is the same, and it is easy to show that:

π_12 = π_11 = β1/(1 − β2)
π_22 = 1 + π_21 = 1/(1 − β2)
Identification
The structural parameters are β1 and β2. The reduced form parameters are π_11 and π_21. Once the reduced form parameters are estimated, the identification problem is to determine whether the original structural parameters can be expressed uniquely in terms of the reduced form parameters:

β2 = π_21 / (1 + π_21)
β1 = π_11 / (1 + π_21)
Identification
An equation is exactly identified if its structural (behavioral) parameters can be uniquely expressed in terms of the reduced form parameters.
An equation is over-identified if there is more than one solution for expressing its structural (behavioral) parameters in terms of the reduced form parameters.
An equation is under-identified if its structural (behavioral) parameters cannot be expressed in terms of the reduced form parameters.
The Identification Problem
A system of M equations containing M endogenous variables must exclude at least M − 1 variables from a given equation in order for the parameters of that equation to be identified and consistently estimable.
Two Stage Least Squares

y_t1 = β1 + β2 y_t2 + β3 x_t1 + e_t1
y_t2 = α1 + α2 y_t1 + α3 x_t2 + e_t2

Problem: the right-hand endogenous variables y_t2 and y_t1 are correlated with the error terms.
Problem: the right-hand endogenous variables y_t2 and y_t1 are correlated with the error terms.
Solution: First, derive the reduced form equations by solving the two equations for the two unknowns, y_t1 and y_t2:

y_t1 = β1 + β2 y_t2 + β3 x_t1 + e_t1
y_t2 = α1 + α2 y_t1 + α3 x_t2 + e_t2

becomes

y_t1 = π_11 + π_21 x_t1 + π_31 x_t2 + ν_t1
y_t2 = π_12 + π_22 x_t1 + π_32 x_t2 + ν_t2
2SLS: Stage I

y_t1 = π_11 + π_21 x_t1 + π_31 x_t2 + ν_t1
y_t2 = π_12 + π_22 x_t1 + π_32 x_t2 + ν_t2

Use least squares to get the fitted values:
ŷ_t1 = π̂_11 + π̂_21 x_t1 + π̂_31 x_t2, so that y_t1 = ŷ_t1 + ν̂_t1
ŷ_t2 = π̂_12 + π̂_22 x_t1 + π̂_32 x_t2, so that y_t2 = ŷ_t2 + ν̂_t2
2SLS: Stage II

y_t1 = ŷ_t1 + ν̂_t1 and y_t2 = ŷ_t2 + ν̂_t2

Substitute these in for y_t1 and y_t2 in the structural equations:
y_t1 = β1 + β2 (ŷ_t2 + ν̂_t2) + β3 x_t1 + e_t1
y_t2 = α1 + α2 (ŷ_t1 + ν̂_t1) + α3 x_t2 + e_t2
2SLS: Stage II (continued)

y_t1 = β1 + β2 ŷ_t2 + β3 x_t1 + u_t1, where u_t1 = β2 ν̂_t2 + e_t1
y_t2 = α1 + α2 ŷ_t1 + α3 x_t2 + u_t2, where u_t2 = α2 ν̂_t1 + e_t2

Run least squares on each of the above equations to get the 2SLS estimates β̃1, β̃2, β̃3, α̃1, α̃2 and α̃3.
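A sketch of both stages for the first equation, assuming arrays y1, y2, x1, x2:

```python
import numpy as np

Z = np.column_stack([np.ones_like(x1), x1, x2])    # all exogenous variables
# Stage I: regress the RHS endogenous variable y2 on Z; keep fitted values.
pi = np.linalg.lstsq(Z, y2, rcond=None)[0]
y2_hat = Z @ pi
# Stage II: replace y2 with y2_hat and run least squares.
X2 = np.column_stack([np.ones_like(x1), y2_hat, x1])
b = np.linalg.lstsq(X2, y1, rcond=None)[0]
print(b)   # 2SLS estimates of beta1, beta2, beta3
# Note: standard errors from this naive stage-II OLS are not the correct
# 2SLS standard errors; software computes them from structural residuals.
```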
Nonlinear Least Squares
Chapter 14
Review of the Least Squares Principle
A. "Regression" model with only an intercept term:

y_t = α + e_t, so e_t = y_t − α
SSE = Σ e_t² = Σ (y_t − α)²

Minimize the sum of squared errors:
∂SSE/∂α = −2 Σ (y_t − α) = 0
Σ y_t − Σ α = 0
Σ y_t − T α = 0
α̂ = (1/T) Σ y_t = ȳ

This yields an exact analytical solution.
Review of Least Squares
B. Regression model without an intercept term:

y_t = β x_t + e_t, so e_t = y_t − β x_t
SSE = Σ e_t² = Σ (y_t − β x_t)²

∂SSE/∂β = −2 Σ x_t (y_t − β x_t) = 0
Σ x_t y_t − β Σ x_t² = 0
β̂ = Σ x_t y_t / Σ x_t²

This yields an exact analytical solution.
Review of Least Squares
C. Regression model with both an intercept and a slope:

y_t = α + β x_t + e_t
SSE = Σ (y_t − α − β x_t)²

∂SSE/∂α = −2 Σ (y_t − α − β x_t) = 0
∂SSE/∂β = −2 Σ x_t (y_t − α − β x_t) = 0

From the first condition: ȳ − α − β x̄ = 0, so α̂ = ȳ − β̂ x̄.
From the second: Σ x_t y_t − α Σ x_t − β Σ x_t² = 0, which gives

β̂ = Σ (x_t − x̄)(y_t − ȳ) / Σ (x_t − x̄)²

This yields an exact analytical solution.
Nonlinear Least Squares
D. Nonlinear regression model:

y_t = x_t^β + e_t
SSE = Σ (y_t − x_t^β)²

∂SSE/∂β = −2 Σ x_t^β ln(x_t) (y_t − x_t^β) = 0
Σ [x_t^β ln(x_t) y_t] − Σ [x_t^{2β} ln(x_t)] = 0

PROBLEM: An exact analytical solution to this does not exist. We must use a numerical search algorithm to try to find the value of β that satisfies this.
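A sketch of the numerical search using SciPy, with data simulated only for illustration:

```python
import numpy as np
from scipy.optimize import least_squares

# Minimize SSE for y_t = x_t**beta + e_t numerically.
rng = np.random.default_rng(4)
x = rng.uniform(1.0, 4.0, 100)
y = x**1.7 + rng.normal(0.0, 0.2, 100)

res = least_squares(lambda b: y - x**b[0], x0=[1.0])  # numerical search
print(res.x)   # beta-hat, near 1.7 here
```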
Find the Minimum of the Nonlinear SSE

[Plot of SSE = Σ (y_t − x_t^β)² as a function of β, with the minimizing value β̂ at the bottom of the curve.]
Conclusion
The least squares principle is still appropriate when the model is nonlinear, but it is harder to find the solution.
Optional Appendix
Nonlinear least squares optimization methods: the Gauss-Newton method.
The Gauss-Newton Algorithm
1. Apply the Taylor series expansion to the nonlinear model around some initial b_o.
2. Run ordinary least squares (OLS) on the linear part of the Taylor series to get b_m.
3. Perform a Taylor series expansion around the new b_m to get b_{m+1}.
4. Relabel b_{m+1} as b_m and rerun steps 2-4.
5. Stop when b_{m+1} − b_m becomes very small.
The Gauss-Newton Method

y_t = f(X_t, b) + ε_t for t = 1, ..., n.

Do a Taylor series expansion around the vector b = b_o as follows:

f(X_t, b) = f(X_t, b_o) + f′(X_t, b_o)(b − b_o) + (b − b_o)ᵀ f″(X_t, b_o)(b − b_o) + R_t

y_t = f(X_t, b_o) + f′(X_t, b_o)(b − b_o) + ε_t*

where ε_t* ≡ (b − b_o)ᵀ f″(X_t, b_o)(b − b_o) + R_t + ε_t
y_t = f(X_t, b_o) + f′(X_t, b_o)(b − b_o) + ε_t*
y_t − f(X_t, b_o) = f′(X_t, b_o) b − f′(X_t, b_o) b_o + ε_t*
y_t − f(X_t, b_o) + f′(X_t, b_o) b_o = f′(X_t, b_o) b + ε_t*
y_t*ᵒ = f′(X_t, b_o) b + ε_t*
where y_t*ᵒ ≡ y_t − f(X_t, b_o) + f′(X_t, b_o) b_o

This is linear in b. Gauss-Newton just runs OLS on this transformed, truncated Taylor series.
y_t*ᵒ = f′(X_t, b_o) b + ε_t*

or, in matrix terms for t = 1, ..., n:

y*ᵒ = f′(X, b_o) b + ε*

Gauss-Newton just runs OLS on this transformed, truncated Taylor series:

b̂ = [f′(X, b_o)ᵀ f′(X, b_o)]⁻¹ f′(X, b_o)ᵀ y*ᵒ

This is analogous to linear OLS, where y = Xb + ε led to the solution b̂ = (XᵀX)⁻¹ Xᵀ y, except that X is replaced with the matrix of first partial derivatives f′(X, b_o) and y is replaced by y*ᵒ (i.e. "y" = y*ᵒ and "X" = f′(X, b_o)).
Recall that: y*ᵒ ≡ y − f(X, b_o) + f′(X, b_o) b_o
Now define: y**ᵒ ≡ y − f(X, b_o)
Therefore: y*ᵒ = y**ᵒ + f′(X, b_o) b_o

Now substitute in for y*ᵒ in the Gauss-Newton solution

b̂ = [f′(X, b_o)ᵀ f′(X, b_o)]⁻¹ f′(X, b_o)ᵀ y*ᵒ

to get:

b = b_o + [f′(X, b_o)ᵀ f′(X, b_o)]⁻¹ f′(X, b_o)ᵀ y**ᵒ
b = b_o + [f′(X, b_o)ᵀ f′(X, b_o)]⁻¹ f′(X, b_o)ᵀ y**ᵒ

Now call this b value b_1, as follows:

b_1 = b_o + [f′(X, b_o)ᵀ f′(X, b_o)]⁻¹ f′(X, b_o)ᵀ y**ᵒ

More generally, in going from iteration m to iteration m+1 we obtain the general expression:

b_{m+1} = b_m + [f′(X, b_m)ᵀ f′(X, b_m)]⁻¹ f′(X, b_m)ᵀ y**_m
Thus, the Gauss-Newton nonlinear OLS solution can be expressed in two alternative, but equivalent, forms:

1. replacement form:
b_{m+1} = [f′(X, b_m)ᵀ f′(X, b_m)]⁻¹ f′(X, b_m)ᵀ y*_m

2. updating form:
b_{m+1} = b_m + [f′(X, b_m)ᵀ f′(X, b_m)]⁻¹ f′(X, b_m)ᵀ y**_m
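A minimal sketch of the updating form for the model y_t = x_t^β + e_t from section D, where f′(X, b) is the column of derivatives d/db x_t^b = x_t^b ln(x_t); the data arrays x and y are assumed to be in hand (e.g., the simulated ones used earlier):

```python
import numpy as np

def gauss_newton(x, y, b0, tol=1e-8, max_iter=100):
    """Gauss-Newton iterations (updating form) for y_t = x_t**b + e_t."""
    b = b0
    for _ in range(max_iter):
        f = x**b
        F = (x**b * np.log(x)).reshape(-1, 1)   # f'(X, b_m), one column
        y_dd = y - f                            # y**_m = y - f(X, b_m)
        step = np.linalg.lstsq(F, y_dd, rcond=None)[0][0]
        b = b + step                            # b_{m+1} = b_m + step
        if abs(step) < tol:                     # stop when the step is tiny
            break
    return b

# print(gauss_newton(x, y, b0=1.0))
```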