Linear statistical modelling 3
The linear statistical model: consistency of OLS
We will show the properties of the OLS estimator in two scenarios: first under the
assumption that the regressors X are non-stochastic and then assuming they are
stochastic.
One of the assumptions in the linear statistical model
y=Xβ+u
is that X is a (n×k) non-stochastic matrix. The idea is that X remains fixed in repeated
samples, i.e., if we have two samples of size n each, the values of the matrix X will
remain constant across samples while the values of y will change.
For example, suppose we are regressing hourly wages (y) onto a constant, age, and
gender. Then if n=6 for example, we could have something like
y' = (12, 11, 9, 8, 12, 8)
\[
X = \begin{pmatrix}
1 & 25 & 1 \\
1 & 26 & 1 \\
1 & 27 & 1 \\
1 & 25 & 0 \\
1 & 26 & 0 \\
1 & 27 & 0
\end{pmatrix}
\]
in our first sample, and something like
y' = (11, 7, 12, 12, 9, 9)
\[
X = \begin{pmatrix}
1 & 25 & 1 \\
1 & 26 & 1 \\
1 & 27 & 1 \\
1 & 25 & 0 \\
1 & 26 & 0 \\
1 & 27 & 0
\end{pmatrix}
\]
in our second sample. You can see that the two samples feature a different y vector, but
the same X matrix.
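The idea of repeated samples with a fixed X can be sketched numerically. The snippet below is only an illustration: the coefficient vector `beta`, the standard deviation `sigma`, and the helper `draw_sample` are hypothetical choices that do not come from the text. It keeps the X matrix of the example unchanged and redraws only the disturbance u, so each call produces a new y.

```python
import numpy as np

# Design matrix from the example: constant, age, gender (held fixed across samples).
X = np.array([
    [1, 25, 1],
    [1, 26, 1],
    [1, 27, 1],
    [1, 25, 0],
    [1, 26, 0],
    [1, 27, 0],
], dtype=float)

# Hypothetical population values, chosen only for illustration.
beta = np.array([5.0, 0.2, 1.0])
sigma = 2.0

rng = np.random.default_rng(0)


def draw_sample(X, beta, sigma, rng):
    """Draw a new y for the same X: only the disturbance u is redrawn."""
    u = rng.normal(0.0, sigma, size=X.shape[0])
    return X @ beta + u


y_first = draw_sample(X, beta, sigma, rng)
y_second = draw_sample(X, beta, sigma, rng)

print("first sample y: ", np.round(y_first, 1))
print("second sample y:", np.round(y_second, 1))
```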
The assumption that X is non-stochastic may be appropriate when the values of X are
chosen by the analyst, not arbitrarily but as in laboratory experiments or clinical studies
in which one tries to keep the demographic composition of the samples unchanged.
In the example above, one could solicit applications for a clinical study from people aged
25 to 27 and select one person for each age/gender combination, with y being some
post-treatment outcome. The assumption is less appropriate for non-experimental data,
which form the majority of the data used by economists.
Along with the other assumptions of the linear statistical model (namely, E(u)=0;
E(uu')=σ²I; and X is full rank), the assumption that X is non-stochastic allows us to
prove the Gauss-Markov theorem, which says that the OLS
estimator β̂ = (X'X)⁻¹X'y is BLUE (best linear unbiased estimator), i.e., it has the
smallest variance in the class of all estimators of β that are unbiased and linear in y,
where β is the unknown parameter of the population. The model's remaining assumption,
that u is normally distributed, is not needed for this result.
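As a small worked illustration of the formula β̂ = (X'X)⁻¹X'y, the sketch below applies it to the first sample shown above. It solves the normal equations with NumPy instead of forming the inverse explicitly; this is a numerical convenience of the sketch, not something the text prescribes.

```python
import numpy as np

# X and y from the first sample of the example (constant, age, gender).
X = np.array([
    [1, 25, 1],
    [1, 26, 1],
    [1, 27, 1],
    [1, 25, 0],
    [1, 26, 0],
    [1, 27, 0],
], dtype=float)
y = np.array([12, 11, 9, 8, 12, 8], dtype=float)

# Solve the normal equations (X'X) b = X'y, which is equivalent to b = (X'X)^{-1} X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# OLS residuals are orthogonal to every column of X.
residuals = y - X @ beta_hat

print("beta_hat:", beta_hat)
print("X'e:", X.T @ residuals)
```

With only six observations the estimates are of course very noisy; the point is only to show the mechanics of the estimator.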
The normality assumption about u (which carries over to y and thus to β̂, since both
are linear transformations of u) allows us to construct hypothesis
tests based on the t distribution (if we are testing a single restriction) and the F
distribution (if we are testing multiple restrictions). Note that in the unlikely case in
which we know σ², tests are instead conducted using the N(0,1) and the χ² distributions,
respectively.
But what if u is not normal? What distribution(s) can we use to conduct inference about
β? In practice, we will still use the distributions above (t and F, for testing single and
multiple restrictions on β respectively, if σ² is unknown), but we will justify their use
by an asymptotic approximation, i.e., a large-sample or n→∞ approximation. That
is, the test statistics we use do not have exact t or F distributions in small samples,
but are well approximated by them in large samples, or asymptotically. The same kind of
approximation will be used when X is stochastic to establish the large-sample properties
of β̂, in particular its consistency.
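The large-sample justification can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption made for illustration: a single regressor plus a constant, hypothetical coefficients, and skewed (centred exponential) disturbances in place of normal ones. It holds X fixed across replications and records how often a true null hypothesis on the slope is rejected when |t| is compared with 1.96, the large-sample 5% critical value.

```python
import numpy as np

rng = np.random.default_rng(4)


def rejection_rate(n, reps=2000):
    """Share of replications in which |t| > 1.96 when the null is true."""
    beta = np.array([1.0, 0.5])           # hypothetical true coefficients
    x = rng.normal(size=n)                # regressor values, drawn once and then held fixed
    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    rejections = 0
    for _ in range(reps):
        u = rng.exponential(1.0, size=n) - 1.0   # skewed, mean 0, variance 1
        y = X @ beta + u
        b = XtX_inv @ X.T @ y
        resid = y - X @ b
        s2 = resid @ resid / (n - X.shape[1])    # unbiased estimate of sigma^2
        t_stat = (b[1] - beta[1]) / np.sqrt(s2 * XtX_inv[1, 1])
        rejections += abs(t_stat) > 1.96
    return rejections / reps


rate = rejection_rate(500)
print(f"rejection rate at n=500: {rate:.3f}")
```

If the approximation works as described, the printed rate should be close to the nominal 5% even though u is not normal.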
Consistency of OLS
Start from the non-stochastic X case.
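Assume
\[
\lim_{n\to\infty} \frac{X'X}{n} = \operatorname{plim}_{n\to\infty} \frac{X'X}{n} = Q.
\]
Notice also that:
\[
\frac{X'u}{n} = \frac{1}{n}
\begin{pmatrix}
1 & 1 & \cdots & 1 \\
x_{21} & x_{22} & \cdots & x_{2n} \\
\vdots & \vdots & & \vdots \\
x_{k1} & x_{k2} & \cdots & x_{kn}
\end{pmatrix}
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}
=
\begin{pmatrix}
\frac{\sum u_i}{n} \\
\frac{\sum x_{2i} u_i}{n} \\
\vdots \\
\frac{\sum x_{ki} u_i}{n}
\end{pmatrix}
\]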
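The expansion above says that each element of X'u/n is a sample average of products x_{ji}u_i. A minimal numerical check, using arbitrary made-up data (the dimensions and the random draws are assumptions of the sketch), might look like this:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 3

# Hypothetical data: a constant column plus two arbitrary regressors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
u = rng.normal(size=n)

# Matrix form: X'u / n.
matrix_form = X.T @ u / n

# Element form: entry j is the average over observations i of x_{ji} * u_i.
average_form = (X * u[:, None]).mean(axis=0)

print(matrix_form)
print(average_form)
```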
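\[
\begin{aligned}
\hat{\beta} &= (X'X)^{-1} X'y \\
&= (X'X)^{-1} X'(X\beta + u) \\
&= \beta + (X'X)^{-1} X'u
\end{aligned}
\]
Now
\[
\begin{aligned}
\operatorname{plim} \hat{\beta} &= \beta + \operatorname{plim}\, (X'X)^{-1} X'u \\
&= \beta + \operatorname{plim} \left[ \left( \frac{X'X}{n} \right)^{-1} \frac{X'u}{n} \right] \\
&= \beta + \left( \operatorname{plim} \frac{X'X}{n} \right)^{-1} \operatorname{plim} \frac{X'u}{n}
\end{aligned}
\]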
Non-stochastic X case.
We have the linear model
y = Xβ + u
and make the assumptions
X is full rank
E(u) = 0
var(u) = E(uu') = σ²I
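Assume
\[
\lim_{n\to\infty} \frac{X'X}{n} = Q
\]
(with X fixed, this ordinary limit is also the probability limit), where Q is taken to be
finite and nonsingular so that Q⁻¹ exists. Note that:
\[
\begin{aligned}
\hat{\beta} &= (X'X)^{-1} X'y \\
&= (X'X)^{-1} X'(X\beta + u) \\
&= \beta + (X'X)^{-1} X'u
\end{aligned}
\]
Now
\[
\begin{aligned}
\operatorname{plim} \hat{\beta} &= \beta + \operatorname{plim}\, (X'X)^{-1} X'u \\
&= \beta + \operatorname{plim} \left[ \left( \frac{X'X}{n} \right)^{-1} \frac{X'u}{n} \right] \\
&= \beta + \left( \operatorname{plim} \frac{X'X}{n} \right)^{-1} \operatorname{plim} \frac{X'u}{n} \\
&= \beta + Q^{-1} \operatorname{plim} \frac{X'u}{n}
\end{aligned}
\]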
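Now we check CQM (convergence in quadratic mean) for X'u/n: if its mean and its variance
both converge to 0, then it converges in probability to 0 (by Chebyshev's inequality).
First, note that
\[
E\left( \frac{X'u}{n} \right) = \frac{X'}{n} E(u) = 0
\]
by assumption and because X is fixed.
Of course,
\[
\lim_{n\to\infty} E\left( \frac{X'u}{n} \right) = 0.
\]
Moreover,
\[
\operatorname{var}\left( \frac{X'u}{n} \right) = \frac{1}{n^2} \operatorname{var}(X'u) = \frac{1}{n^2} X' \operatorname{var}(u) X = \frac{\sigma^2}{n} \frac{X'X}{n}
\]
and
\[
\lim_{n\to\infty} \operatorname{var}\left( \frac{X'u}{n} \right) = \lim_{n\to\infty} \frac{\sigma^2}{n} \cdot \lim_{n\to\infty} \frac{X'X}{n} = 0 \cdot Q = 0,
\]
and therefore
\[
\operatorname{plim}_{n\to\infty} \frac{X'u}{n} = 0,
\]
so that plim β̂ = β + Q⁻¹ · 0 = β and OLS is consistent.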
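The variance formula var(X'u/n) = (σ²/n)(X'X/n) for fixed X can be checked by simulation. The sketch below rests on several assumptions made only for illustration: it stacks copies of the six-row X from the wage example so that X'X/n equals the same matrix Q for every n, it uses a hypothetical σ = 2, and it draws normal disturbances. It compares the trace of the simulated covariance matrix of X'u/n with the trace of (σ²/n)Q.

```python
import numpy as np

X6 = np.array([
    [1, 25, 1],
    [1, 26, 1],
    [1, 27, 1],
    [1, 25, 0],
    [1, 26, 0],
    [1, 27, 0],
], dtype=float)
Q = X6.T @ X6 / 6      # X'X/n is the same for every number of stacked copies
sigma = 2.0            # hypothetical disturbance standard deviation
rng = np.random.default_rng(2)

results = {}
for copies in (1, 10, 100):
    X = np.tile(X6, (copies, 1))                  # fixed design, n = 6 * copies
    n = X.shape[0]
    U = rng.normal(0.0, sigma, size=(4000, n))    # 4000 independent draws of u
    Z = U @ X / n                                 # each row is one realisation of (X'u/n)'
    simulated = np.trace(np.cov(Z, rowvar=False))
    theoretical = np.trace(sigma**2 / n * Q)
    results[n] = (simulated, theoretical)
    print(n, round(simulated, 3), round(theoretical, 3))
```

Both columns should shrink roughly in proportion to 1/n, which is the behaviour the CQM argument relies on.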
Stochastic X case.
We still have the linear model
y = Xβ + u
but now make the assumptions
X is full rank
E(u|X) = 0
var(u|X) = E(uu'|X) = σ²I
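We now assume
\[
\operatorname{plim}_{n\to\infty} \frac{X'X}{n} = Q,
\]
given that X is no longer fixed.
As before:
\[
\begin{aligned}
\hat{\beta} &= (X'X)^{-1} X'y \\
&= (X'X)^{-1} X'(X\beta + u) \\
&= \beta + (X'X)^{-1} X'u
\end{aligned}
\]
Now
\[
\begin{aligned}
\operatorname{plim} \hat{\beta} &= \beta + \operatorname{plim}\, (X'X)^{-1} X'u \\
&= \beta + \operatorname{plim} \left[ \left( \frac{X'X}{n} \right)^{-1} \frac{X'u}{n} \right] \\
&= \beta + \left( \operatorname{plim} \frac{X'X}{n} \right)^{-1} \operatorname{plim} \frac{X'u}{n} \\
&= \beta + Q^{-1} \operatorname{plim} \frac{X'u}{n}
\end{aligned}
\]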
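Now we check CQM (convergence in quadratic mean) for X'u/n. First, note that, by the law
of iterated expectations and E(u|X) = 0,
\[
E\left( \frac{X'u}{n} \right) = \frac{1}{n} E(X'u) = \frac{1}{n} E\left[ X' E(u|X) \right] = 0.
\]
Of course,
\[
\lim_{n\to\infty} E\left( \frac{X'u}{n} \right) = 0.
\]
Furthermore, conditional on X,
\[
\operatorname{var}\left( \frac{X'u}{n} \,\Big|\, X \right) = \frac{1}{n^2} \operatorname{var}(X'u \,|\, X) = \frac{1}{n^2} X' \operatorname{var}(u|X) X = \frac{\sigma^2}{n} \frac{X'X}{n}
\]
and, since X'X/n is now random, the limit is taken in probability:
\[
\operatorname{plim}_{n\to\infty} \operatorname{var}\left( \frac{X'u}{n} \,\Big|\, X \right) = \operatorname{plim}_{n\to\infty} \frac{\sigma^2}{n} \frac{X'X}{n} = 0 \cdot Q = 0.
\]
It then follows that
\[
\operatorname{plim}_{n\to\infty} \frac{X'u}{n} = 0.
\]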
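And therefore as before
\[
\begin{aligned}
\operatorname{plim} \hat{\beta} &= \beta + \operatorname{plim}\, (X'X)^{-1} X'u \\
&= \beta + \left( \operatorname{plim} \frac{X'X}{n} \right)^{-1} \operatorname{plim} \frac{X'u}{n} \\
&= \beta + Q^{-1} \operatorname{plim} \frac{X'u}{n} = \beta
\end{aligned}
\]
so OLS is consistent also when X is stochastic.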
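Consistency with stochastic X can also be seen in a simulation. The sketch below is an illustration under assumptions that are not in the text: hypothetical coefficients, one normal and one uniform regressor drawn afresh for each sample, and non-normal (centred exponential) disturbances drawn independently of X so that E(u|X) = 0. It reports the largest absolute estimation error for increasing n.

```python
import numpy as np

rng = np.random.default_rng(3)
beta = np.array([1.0, 2.0, -0.5])   # hypothetical true coefficients


def ols_error(n):
    """Largest absolute deviation of the OLS estimate from beta in one sample of size n."""
    # Stochastic regressors: a constant plus two random columns.
    X = np.column_stack([np.ones(n), rng.normal(size=n), rng.uniform(size=n)])
    # Mean-zero, non-normal disturbance, independent of X.
    u = rng.exponential(1.0, size=n) - 1.0
    y = X @ beta + u
    b = np.linalg.solve(X.T @ X, X.T @ y)
    return np.max(np.abs(b - beta))


errors = {n: ols_error(n) for n in (100, 10_000, 1_000_000)}
for n, err in errors.items():
    print(f"n={n:>9,d}  max |b - beta| = {err:.4f}")
```

A single run per sample size is only suggestive, but the error should typically shrink as n grows, in line with plim β̂ = β.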