07350015%2E2012%2E694743

Journal of Business & Economic Statistics

ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20

Further Results on the Limiting Distribution of
GMM Sample Moment Conditions
Nikolay Gospodinov , Raymond Kan & Cesare Robotti
To cite this article: Nikolay Gospodinov , Raymond Kan & Cesare Robotti (2012) Further
Results on the Limiting Distribution of GMM Sample Moment Conditions, Journal of Business &
Economic Statistics, 30:4, 494-504, DOI: 10.1080/07350015.2012.694743
To link to this article: http://dx.doi.org/10.1080/07350015.2012.694743

Accepted author version posted online: 29
May 2012.

Submit your article to this journal

Article views: 479

View related articles


Full Terms & Conditions of access and use can be found at
http://www.tandfonline.com/action/journalInformation?journalCode=ubes20
Download by: [Universitas Maritim Raja Ali Haji]

Date: 11 January 2016, At: 22:49

Further Results on the Limiting Distribution
of GMM Sample Moment Conditions
Nikolay GOSPODINOV
Department of Economics, Concordia University, Montreal, Quebec, Canada H3G 1M8
(nikolay.gospodinov@concordia.ca)

Raymond KAN
Joseph L. Rotman School of Management, University of Toronto, Toronto, Ontario, Canada M5S 3E6
(kan@chass.utoronto.ca)

Cesare ROBOTTI
Downloaded by [Universitas Maritim Raja Ali Haji] at 22:49 11 January 2016

Research Department, Federal Reserve Bank of Atlanta, Atlanta, GA 30309 (cesare.robotti@atl.frb.org)

In this article, we examine the limiting behavior of generalized method of moments (GMM) sample
moment conditions and point out an important discontinuity that arises in their asymptotic distribution.
We show that the part of the scaled sample moment conditions that gives rise to degeneracy in the
asymptotic normal distribution is T-consistent and has a nonstandard limiting distribution. We derive the
appropriate asymptotic (weighted chi-squared) distribution when this degeneracy occurs and show how to
conduct asymptotically valid statistical inference. We also propose a new rank test that provides guidance
on which (standard or nonstandard) asymptotic framework should be used for inference. The finite-sample
properties of the proposed asymptotic approximation are demonstrated using simulated data from some
popular asset pricing models.
KEY WORDS: Asymptotic approximation; Generalized method of moments; Rank test; T-consistent
estimator; Weighted chi-square distribution.

1.

INTRODUCTION

Over the past 30 years, the generalized method of moments
(GMM) has established itself as arguably the most popular
method for estimating economic models defined by a set of
moment conditions. In his seminal article, Hansen (1982) developed the asymptotic distributions of the GMM estimator,

sample moment conditions, and test of over-identifying restrictions for possibly nonlinear models with sufficiently general
dependence structure. This large sample theory proved to cover
a large class of models and estimators that are of interest to
researchers in economics and finance.
There are cases, however, in which the root-T convergence
and asymptotic normality of the GMM sample moment conditions and estimators based on these moment conditions do not
accurately characterize their limiting behavior. In particular, it
is possible that some linear combinations of the GMM sample
moment conditions have a degenerate distribution and the standard limiting tools for inference are inappropriate. For example,
Gospodinov, Kan, and Robotti (2010) demonstrated that some
GMM estimators, which are functions of the sample moment
conditions, are proportional to the GMM objective function and,
hence, cannot be root-T consistent and asymptotically normally
distributed for correctly specified models. This situation is directly related to the results in lemma 4.1 and its subsequent
discussion in Hansen (1982) that draw attention to the singularity of the covariance matrix of the sample moment conditions.
However, to the best of our knowledge, the limiting behavior
of the GMM sample moment conditions in the degenerate case,
when the covariance matrix reduces to a matrix of zeros, has not
been formally investigated in the literature.


In this article, we study some linear combinations of the
sample moment conditions that give rise to degeneracy and analyze their asymptotic behavior. Interestingly, we show that in
this case, the scaled sample moment conditions evaluated at
the GMM estimator are characterized by a nonstandard limiting theory. More specifically, we demonstrate that the estimated
GMM moment conditions converge to zero (the value implied
by the population moment conditions) at rate T and have an
asymptotic weighted chi-squared distribution. Besides being of
academic interest, our results prove to be useful for inference
in asset pricing models. Furthermore, a similar degeneracy occurs in the asymptotic analysis of Lagrange multipliers in the
context of generalized empirical likelihood (GEL) estimation
that has gained increased popularity in econometrics in recent
years.
Nonstandard asymptotics (with an accelerated rate of convergence and nonnormal asymptotic distribution) often arises
when the parameter of interest is near or on the boundary of
the parameter space. Examples of this phenomenon include autoregressive roots near or on the unit circle (Dickey and Fuller
1979; Phillips 1987), (near) unit root in moving average models
(Davis and Dunsmuir 1996), local-to-zero signal-to-noise ratio in time-varying parameter models (Nyblom 1989), and zero
variance in nonnested model comparison tests (Vuong 1989).
In these cases, the rate of convergence of the estimated parameter or test statistic of interest increases from root-T to T and


494

© 2012 American Statistical Association
Journal of Business & Economic Statistics
October 2012, Vol. 30, No. 4
DOI: 10.1080/07350015.2012.694743

Downloaded by [Universitas Maritim Raja Ali Haji] at 22:49 11 January 2016

Gospodinov, Kan, and Robotti: Limiting Behavior of GMM Sample Moment Conditions

the asymptotic distribution changes from normal to a weighted
chi-squared type of distribution. Interestingly, our article shows
that similar nonstandard asymptotics can be encountered in
seemingly regular GMM problems. It should also be noted that
the degeneracy of the asymptotic distribution has been analyzed in other contexts, although the existing results differ considerably from ours. See, for example, Sargan’s (1959, 1983)
discussion of nonlinear instrumental variable models with possible singularity; Park and Phillips (1989) and Sims, Stock, and
Watson (1990) for the analysis of time series models with a
unit root; and Phillips (2007) for models with slowly varying
regression functions.

The rest of this article is organized as follows. Section 2 introduces the general setup and discusses an asset pricing example
that illustrates the discontinuity in the asymptotic approximation
of the sample moment conditions. This section also provides
the main theoretical results on the limiting behavior of linear
combinations of sample moment conditions and presents an
easy-to-implement rank test that determines which asymptotic
approximation should be used. Section 3 reports simulation results based on a problem in empirical asset pricing and Section 4
concludes.

2.

ASYMPTOTICS FOR GMM SAMPLE MOMENT
CONDITIONS

Let θ ∈  denote a p × 1 parameter vector of interest with
true value θ 0 that lies in the interior of the parameter space
 and gt (θ) be a known function {g : Rp → Rm , m > p} of
the data and θ that satisfies the set of population orthogonality
conditions
(1)


The GMM estimator of θ 0 is defined as
θˆ = arg minθ∈ g¯ T (θ )′ WT g¯ T (θ ),

(2)

is an m × m positive definite weight matrix and
where WT 
g¯ T (θ ) = T1 Tt=1 gt (θ). The matrix WT is allowed to be a fixed
matrix that does not depend on the data and θ (the identity
matrix, for example), a matrix that depends on the data but not
on θ , or a matrix that depends on the data and a preliminary
consistent estimator of θ 0 as in the two-step and iterated GMM
estimation. Given the first-order asymptotic equivalence of the
two-step, iterated, and continuously updated GMM estimators,
our subsequent results can be easily modified to accommodate
the continuously updated (one-step) GMM estimator.


and make the

Let DT (θ ) = T1 Tt=1 ∂g∂θt (θ′ ) , D (θ ) = E ∂g∂θt (θ)

following assumptions.
Assumption 1. Assume that
T
1 
d
gt (θ 0 ) → N(0m , V),

T t=1

Assumption 2. Assume that
(a) gt (θ) is continuous in θ almost surely, E[supθ∈
|gt (θ )|] < ∞, and the parameter space  is a compact
subset of Rp ,
(b) there exists a unique θ 0 ∈  such that E [gt (θ 0 )] = 0m
and E [gt (θ )] = 0m for all θ = θ 0 ,
p
(c) WT → W, where W is a nonstochastic symmetric positive definite matrix,
p

(d) DT (θ ) → D (θ ) uniformly in θ on some neighborhood
of θ 0 and D0 ≡ D (θ 0 ) is of rank p.
Assumption 1 is a high-level assumption that implicitly imposes restrictions on the data and the vector gt (θ). The validity of this assumption can either be verified in the particular
context or it can be replaced by a set of explicit primitive conditions (see, for instance, Stock and Wright 2000). Assumpp
tion 2 imposes sufficient conditions that ensure θˆ → θ 0 in the
interior of the compact parameter space . The uniform convergence and the full rank condition in Assumption 2 (d) are
required for establishing the asymptotic distributions of θˆ and
ˆ
g¯ T (θ).

ˆ is asymptotically norUnder Assumptions 1 and 2, T g¯ T (θ)
mally distributed (Hansen 1982, lemma 4.1) with mean zero and
a singular asymptotic covariance matrix
0 = [Im − D0 (D′0 WD0 )−1 D′0 W]

2.1 Notation and Analytical Framework

E[gt (θ 0 )] = 0m .

495


× V[Im − D0 (D′0 WD0 )−1 D′0 W]′ .

Furthermore, it√can be easily seen that the asymptotic covariance matrix of T D′0 W¯gT (θˆ ) reduces to a p × p
of ze√matrix

ˆ
ros that renders the asymptotic distribution of T D0 W¯gT (θ)
degenerate. Provided that WT is a consistent
√ ′ estimator of
T D0 hT (θˆ ), where
W, a similar degeneracy occurs for
ˆ
ˆ ≡ WT g¯ T (θ).
hT (θ)
It is interesting to note that this type of asymptotic degeneracy extends to other setups and arises, for example, in the
analysis of Lagrange multipliers in the GEL estimation of moment condition models. In the GEL framework, the estimator of
the Lagrange multipliers associated with the moment conditions
takes a similar form as hT (θˆ ) and has an asymptotic covariance
matrix given by V−1 − V−1 D0 (D′0 V−1 D0 )−1 D′0 V−1 (see, for instance, Smith 1997). It is easy to see that premultiplying by D′0

reduces this asymptotic variance to a zero matrix. Therefore, the
results in this article can be adapted to deal with the possible
asymptotic degeneracy of sample Lagrange multipliers in the
GEL framework.
For our analysis, it is more convenient to rewrite the asymptotic normality result√in terms of the
√ nonzero parts of the coˆ Let Q denote an
variance matrices of T g¯ T (θˆ ) and T hT (θ).
m × (m − p) orthonormal matrix whose columns are orthogo1
nal to W 2 D0 . Then,
1

(3)



where V = ∞
j =−∞ E[gt (θ 0 ) gt+j (θ 0 ) ] is a finite positive definite matrix.

(4)

1

QQ′ = Im − W 2 D0 (D′0 WD0 )−1 D′0 W 2 .

(5)

Lemma 1. Under Assumptions 1 and 2,

1
1
1
d
T Q′ W 2 g¯ T (θˆ ) → N(0m−p , Q′ W 2 VW 2 Q)

(6)

496

Journal of Business & Economic Statistics, October 2012

and

Downloaded by [Universitas Maritim Raja Ali Haji] at 22:49 11 January 2016


1
1
1
d
ˆ →
N(0m−p , Q′ W 2 VW 2 Q).
T Q′ W− 2 hT (θ)

(7)

The role of matrix Q in Lemma 1 is similar in spirit to the
decomposition of Sowell (1996) in which the m vector of nor1
malized population moment conditions W 2 E[gt (θ 0 )] is decomposed into p identifying restrictions used for the estimation of θ
that characterize the space of identifying restrictions and m − p
over-identifying restrictions that characterize the
of over√space
1

2g
¯ T (θˆ )
T
Q
W
identifying
restrictions.
Lemma
1
shows
that

1
′ −2
ˆ
and T Q W hT (θ) have a nondegenerate asymptotic normal
distribution. However, little is known about the limiting behavior
ˆ that do not have
of those linear combinations of g¯ T (θˆ ) or hT (θ)
an asymptotic normal distribution. The purpose of this article is
to establish the rate of convergence and asymptotic distributions
ˆ Before we present our main result,
of D′0 W¯gT (θˆ ) and D′0 hT (θ).
we provide an example to illustrate the discontinuous nature
of the asymptotic analysis for linear combinations of g¯ T (θˆ ) or
hT (θˆ ).
2.2 Motivation: An Asset Pricing Example
Let yt (θ ) be a candidate stochastic discount factor (SDF) at
time t, where θ is a p vector of parameters of the SDF. Suppose
we use m test assets to estimate the true SDF parameter vector
θ 0 as well as to test if the proposed SDF is correctly specified.
Denote by Rt the payoffs of the m test assets at time t and by q
the vector of the costs of the m test assets. Let
gt (θ) = Rt yt (θ) − q.

(8)

If the model is correctly specified, we have E[gt (θ 0 )] = 0m . A
popular method of estimating θ 0 is to choose θ to minimize
the sample squared Hansen–Jagannathan (HJ, 1997) distance,
defined as
δˆT2 = min g¯ T (θ)′ WT g¯ T (θ),
θ

(9)

−1
 
where WT = T1 Tt=1 Rt R′t
.
To determine whether the proposed SDF is correctly specified,
we can examine the sample pricing errors of the m test assets,
ˆ where θˆ is the vector of estimated parameters
that is, g¯ T (θ),
chosen to minimize the sample HJ-distance. Alternatively, let
λ denote an m × 1 vector of Lagrange multipliers associated
with the population moment conditions (pricing constraints)
and consider the estimated Lagrange multipliers
ˆ
λˆ = WT g¯ T (θ),

(10)

which are a transformation of the sample pricing errors. Hansen
and Jagannathan (1997) showed that if the proposed SDF does
not price the test assets correctly, then it is possible to correct
the mispricing of the SDF by subtracting λ′ Rt from yt (θ ). As
a result, researchers are often interested in testing H0 : λi = 0,
that is, in determining whether asset i is responsible for the
proposed SDF to deviate from the true SDF.
Gospodinov, Kan, and Robotti (2010) showed that for a linˆ is the squared
ˆ ′ WT g¯ T (θ)
ear SDF, q′ λˆ = −δˆT2 , where δˆT2 = g¯ T (θ)
sample HJ-distance. For the special case of q = [1, 0′m−1 ]′ (i.e.,
the payoff of the first test asset is a gross return and the rest

are excess returns), the estimate of the Lagrange multiplier associated with the first test asset, λˆ 1 , is T-consistent and shares
the weighted chi-squared distribution of δˆT2 under the assumption of a correctly specified model. This result is of practical
importance since applied researchers often rely on the statistical significance of individual Lagrange multipliers and pricing
errors to determine whether an asset pricing model is correctly
specified (Cochrane 1996; Hodrick and Zhang 2001). Similar
problems arise in other asset pricing contexts: for example, in
conducting inference on the pricing errors associated with traded
factors (see Pe˜naranda and Sentana 2010). More generally, as
we show below,
d

T D′0 λˆ → −(Ip ⊗ v′2 )v1 ,

(11)

where v1 and v2 are jointly normally distributed vectors of random variables. As a result, any linear combination of λˆ with
a vector of weights that is in the span of the column space of
D0 is also T-consistent with a nonstandard (product of normals)
asymptotic distribution.
It is interesting to note that a similar type of discontinuity in
the asymptotic approximation and accelerated rate of convergence have been established by Park and Phillips (1989) and
Sims, Stock, and Watson (1990) in an AR(p) model, p > 1,
with a unit root in the AR polynomial. In particular, these articles show that a linear combination of WT g¯ T (θ 0 ) with a vector
of weights [α1 , . . . , αp ]′ = [α,
¯ . . . , α]
¯ ′ , for some nonzero constant α,
¯ is root-T and asymptotically normally distributed, while
a linear combination of WT g¯ T (θ 0 ) with a vector of weights
¯ . . . , α]
¯ ′ yields a T-consistent and asymp[α1 , . . . , αp ]′ = [α,
totically nonnormally distributed estimator.
2.3 Main Results
We now turn to deriving the asymptotic distributions of
ˆ and D′ hT (θˆ ). Due to the similarities in their strucD′0 W¯gT (θ)
0
ture, we first present the results for D′0 hT (θˆ ) and discuss the
ˆ in the next subsection. The following addicase of D′0 W¯gT (θ)
ˆ
ˆ T = DT (θ)
tional assumption on the joint limiting behavior of D
ˆ is needed to establish the asymptotic distribution of
and hT (θ)
ˆ
D′ hT (θ).
0

Assumption 3. Assume that


1
ˆT) d




vec(Q′ W 2 D
→ N 0(m−p)(p+1) , 
T
1
ˆ
Q′ W− 2 hT (θ)

(12)

for some finite positive semidefinite matrix .

1
ˆ
The asymptotic normality of the m − p vector Q′ W− 2 hT (θ)
follows directly from Lemma 1. The main requirement is on the
ˆ T which is, however, rather
limiting behavior of the matrix D
weak and rules out only some trivial cases. It is important to
note that we do not need to impose any restriction on the rate
of convergence of WT apart from being a consistent estimator of W (Assumption 2 (c)). In contrast, as we argue later,
ˆ requires exderiving the asymptotic distribution of D′0 W¯gT (θ)
plicit assumptions on the rate of convergence of WT that can
differ for parametric and nonparametric heteroscedasticity and
autocorrelation consistent (HAC) estimators.

Gospodinov, Kan, and Robotti: Limiting Behavior of GMM Sample Moment Conditions

We now state our main result in the following theorem.

lemma shows that we can alternatively express it as a linear
combination of independent χ12 random variables.

Theorem 1. Under Assumptions 1–3,
d

T D′0 hT (θˆ ) → −(Ip ⊗ v′2 )v1 ,

(13)

where v1 and v2 are (m − p)p and (m − p) vectors, respectively,
and [v′1 , v′2 ]′ ∼ N(0(m−p)(p+1) , ).

Downloaded by [Universitas Maritim Raja Ali Haji] at 22:49 11 January 2016

Proof. See the Appendix.
To make the asymptotic approximation derived in Theorem 1 operational for conducting inference, we need an estimate of the covariance matrix . In the following, we
provide explicit expressions that can be used for consistent
estimation of
 the covariance matrix  in Theorem 1. Let
GT (θ) = T1 Tt=1 ∂vec(∂gt (θ )/∂θ ′ )/∂θ ′ , G(θ ) = ∂vec(D(θ ))/
∂θ ′ , and G0 ≡ G(θ 0 ).
p

Assumption 4. Assume that GT (θ ) → G (θ) uniformly in θ
on some neighborhood of θ 0 , where G (θ) exists, is finite, and
is continuous in θ ∈  almost surely.
In the following lemma, we provide the explicit form of the
matrix .
˜ = (Ip ⊗ Q′ W 12 )G0 . Under Assumptions 1,
Lemma 2. Let G
2, and 4, we have
=




E[dt d′t+j ],

(14)

j =−∞

where dt = [d′1,t , d′2,t ]′ and
˜ ′0 WD0 )−1 D′0 Wgt (θ 0 )
d1,t = −G(D


1 ∂gt (θ 0 )
,
+ vec Q′ W 2
∂θ ′
1

d2,t = Q′ W 2 gt (θ 0 ).

(15)
(16)

Proof. See the Appendix.
The consistent estimation of the long-run covariance matrix
 can proceed by using a HAC estimator (see Andrews 1991,
for example) based on the sample counterparts of d1,t and d2,t .
2.4 Discussion
The result in Theorem 1 has important implications for the
ˆ with a
asymptotic distribution of a linear combination of hT (θ)
weighting vector α which is in the span of the column space of
D0 . In particular, if α = D0 c˜ for a constant nonzero p vector c˜ ,
then we have
d
ˆ →
T α ′ hT (θ)
−˜v′1 v2 ,
(17)

1
ˆ T c˜ , [˜v′ , v′ ]′ ∼
where v˜ 1 is the limit
T Q′ W 2 D
1
2
∞of

˜
˜
˜
˜
N(02(m−p) , ) with  = j =−∞ E[dt dt+j ], d˜ t = [d˜ ′1,t , d′2,t ]′
1
1
and d˜ 1,t = (˜c′ ⊗ Q′ W 2 )G0 (D′ WD0 )−1 D′ Wgt (θ 0 ) + Q′ W 2
∂gt (θ 0 )
c˜ .
∂θ ′

0

497

0

Instead of expressing the asymptotic distribution as
the inner product of two normal random vectors, the following

Lemma 3. Suppose that z = [z′1 , z′2 ]′ , where z1 and z2 are
both n × 1 vectors, is multivariate normally distributed
z ∼ N(02n , ),

(18)

where  is a positive semidefinite matrix with rank l ≤ 2n. Let
 = SϒS′ , where ϒ is an l × l diagonal matrix of the nonzero
eigenvalues of  and S is a 2n × l matrix of the corresponding
eigenvectors. In addition, let


1
I
0
1
1
⎢ n×n 2 n ⎥
Ŵ = ϒ 2 S′ ⎣
(19)
⎦ Sϒ 2 .
1
In 0n×n
2
Then,
z′1 z2 ∼

k


γi xi ,

(20)

i=1

where the γi ’s are the k ≤ l nonzero eigenvalues of Ŵ and the
xi ’s are independent χ21 random variables.
Proof. See the Appendix.
This lemma shows that the inner product of two vectors of
normal random variables (with mean zero) can always be written
as a linear combination of independent chi-squared random variables. This result proves very useful since it allows us to adopt
numerical procedures for obtaining the p-value of a weighted
chi-squared test that are already available in the literature (Imhof
1961; Davies 1980; Lu and King 2002). Furthermore, this result
helps us to reconcile the form of the asymptotic approximation
proposed in Theorem 1 with the weighted chi-squared distribution that arises in some special cases as in the asset pricing
example.
Extending the result in Theorem 1 to cover the limiting behavior of A′0 g¯ T (θˆ ), where A0 = WD0 , requires stronger conditions.
ˆ T , we need to replace Assumption 3 by asˆ T = WT D
Defining A
suming


1
ˆT)
vec(Q′ W− 2 A




d
→ N 0(m−p)(p+1) ,

(21)
T
1
ˆ
Q′ W 2 g¯ T (θ)
for some finite positive definite matrix
. The conditions that
ˆ T − A0 ) can be best seen
(21) imposes on the mp vector vec(A
using the decomposition


ˆ T − WD0 )
ˆ T − A0 ) = T (WT D
T (A


ˆ T − D0 ) + T (WT − W)D0
= T W(D

ˆ T − D0 )
+ T (WT − W)(D


ˆ T − D0 ) + T (WT − W)D0
= T W(D
+ op (1).

(22)

ˆ T are easily satisfied
While the conditions for the matrix D
(Assumption 3), the requirement of root-T convergence for WT
rules out nonparametric HAC estimators (see Andrews 1991,
for example) but allows for some parametric HAC estimators
(West 1997). In general, this assumption requires that WT is

498

Journal of Business & Economic Statistics, October 2012

computed using a martingale difference sequence process or a
dependent process for which the form of serial correlation is
known. Then, under the assumption in (21), it can be shown,
using similar arguments as in the proof of Theorem 1, that
d
T A′0 g¯ T (θˆ ) → −(Ip ⊗ u′2 )u1 ,

(23)

where [u′1 , u′2 ]′ ∼ N(0(m−p)(p+1) ,
). We should note that, under some regularity conditions on the kernel function and bandwidth parameter as in Andrews (1991) and Hall and Inoue
(2003), a similar limiting representation as in (23) can be derived for nonparametric HAC estimators but with a slower rate
of convergence that depends on the smoothing parameter.

Downloaded by [Universitas Maritim Raja Ali Haji] at 22:49 11 January 2016

2.5 Rank Restriction Test
The speed of convergence of α ′ hT (θˆ ) and α ′ W¯gT (θˆ ) depends
crucially on whether α is in the column span of D0 . In some
cases, we know that α is in the column span of D0 (for instance, in our asset pricing example) and we should rely on the
nonstandard asymptotics developed earlier to conduct statistical
inference. In general, however, we do not know whether α is
in the column span of D0 and we need to resort to pretesting
to determine which asymptotic framework should be used for
the particular problem at hand. In the following, we propose a
computationally attractive pretest that determines whether α is
in the span of the column space of D0 .
Let Pα be an m × (m − 1) orthonormal matrix whose columns
are orthogonal to α such that
Pα P′α = Im − α(α ′ α)−1 α ′ .

(24)

Also, let = P′α D0 . It turns out that determining whether α is in
the span of the column space of D0 is equivalent to determining
whether is of reduced rank.
Under the null that is of (reduced) rank p − 1, H0 :
rank( ) = p − 1, there exists a nonzero p vector c˜ such that
D0 c˜ = α, or equivalently (by premultiplying by P′α and using
the properties of Pα ) ˜c = 0m−1 with the normalization c˜ ′ c˜ = 1.
As discussed in Cragg and Donald (1997), if has a reduced
column rank of p − 1, we can use an alternative normalization
and express one column of this matrix, say π j , as a linear combination of the other columns, assuming that c˜j = 0. Without
any loss of generality, we can order this column first and define
the rearranged partitioned matrix = [π 1 , 2 ] such that

−1
⎢c ⎥
⎢ 2 ⎥

[π 1 , 2 ] ⎢
⎢ .. ⎥ = 0m−1
⎣ . ⎦


2 c0 = π 1 ,

˜ t = −(Ip ⊗ P′α )G0 (D′0 WD0 )−1 D′0 Wgt (θ 0 )
m


′ ∂gt (θ 0 )
+ vec Pα
.
∂θ ′

(28)

In practice, M is replaced by a HAC estimator based on the
sample counterparts of G0 , D0 , W, and gt (θ 0 ). It should be
˜ t explicitly
stressed that the first term in the expression for m
accounts for the estimation uncertainty in θˆ when gt (θ 0 ) is a
nonlinear function of θ 0 . When gt (θ 0 ) is linear in θ 0 , the first
˜ t drops out (G0 = 0p×p ) and the
term in the expression for m
second term does not depend on θ 0 . As a result, the asymptotic
ˆ T depends only on the data and not on θ 0 .
variance of
ˆ 2,T c − πˆ 1,T . Define the test statistic
Let lT (c) =
ˆ T (c)−1 lT (c)],
LM = min T [lT (c)′
c

(29)

where
(c) = ([−1, c′ ] ⊗ Im−1 )M([−1, c′ ]′ ⊗ Im−1 ) and
ˆ T (c) denotes its consistent estimator obtained by substituting

a HAC estimator of M. The LM statistic tests the null
hypothesis that is of rank p − 1 against the alternative that
is of full rank p. The following lemma shows that the rank
test statistic LM is chi-squared distributed with m − p degrees
of freedom under the null.
1,

Lemma 4. Under Assumptions 1–4, and H0 : rank( ) = p −
d

2
.
LM → χm−p

(30)

Proof See the Appendix.
It is important to note that the rank test statistic in Equation
(29) is invariant to scaling of c. Furthermore, we would like to
emphasize that the minimization in (29) is with respect to only
a p − 1 vector c, and the complexity of the minimization problem does not increase with m. Although it can be shown (proof
is available upon request) that the LM statistic in (29) is numerically equivalent to the test statistic proposed by Cragg and
Donald (1997), it offers substantial computational advantages
over the highly dimensional optimization problem in Cragg and
Donald’s (1997) test. Finally, our simulation experiments show
that the test in (30) enjoys excellent size and power properties.

(25)

3. ILLUSTRATION: LINEAR ASSET PRICING MODEL

(26)

In this section, we specialize our theoretical results to the
linear specification of the asset pricing model in Section 2.2 and
assess the accuracy of the proposed asymptotic approximation
in this setup using a Monte Carlo simulation experiment. In
particular, we evaluate the size of the weighted chi-squared
test on the Lagrange multiplier associated with the first asset
when q = [1, 0′m−1 ]′ (i.e., the payoff of the first asset is a gross
return and the payoffs of the other assets are excess returns). We
consider two model specifications that are calibrated to monthly
data for the period January 1932 to December 2006. The first

cp

or

ˆ T . Using Assumption 3 and the proof of
ˆ T = P′α D
Let
Lemma 2, it can be shown that




d
ˆ T − ) →
T vec(
N 0(m−1)p , M ,
(27)
∞
˜ ′t+j ] and
˜ tm
where M = j =−∞ E[m

for some vector c0 = [c2 , . . . , cp ]′ . This is equivalent to imposing a normalization on c˜ such that its first element is −1.
With such a normalization, c0 is uniquely defined provided that
rank( ) = p − 1.

Downloaded by [Universitas Maritim Raja Ali Haji] at 22:49 11 January 2016

Gospodinov, Kan, and Robotti: Limiting Behavior of GMM Sample Moment Conditions

one is calibrated to the capital asset pricing model (CAPM)
with the value-weighted market excess return as a risk factor.
For the CAPM, the returns on the test assets are the gross return
on the risk-free asset and the excess returns on 10 size ranked
portfolios. The second specification is calibrated to the threefactor model (FF3) of Fama and French (1993) with risk factors
given by the value-weighted market excess return, the return
difference between portfolios of small and large stocks, and the
return difference between portfolios of high and low book-tomarket ratios. For FF3, the returns on the test assets are the
gross return on the risk-free asset and the excess returns on 25
size and book-to-market ranked portfolios. All data are obtained
from Kenneth French’s website. The SDFs of the CAPM and
FF3 include an intercept term.
For each model, the factors and the returns on the test assets
are drawn from a multivariate normal distribution. The covariance matrix of the factors and returns is chosen based on the
covariance matrix estimated from the data. The mean return vector is chosen such that the asset pricing model holds exactly for
the test assets. For each simulated set of returns and factors, the
unknown parameters θ 0 of the linear SDF yt (θ 0 ) = f˜t′ θ 0 , where
f˜t = [1, ft′ ]′ with ft denoting the vector of risk factors, are estimated by minimizing the sample HJ-distance, which yields
ˆ T )−1 (D
ˆ ′T WT D
ˆ ′T WT q),
(31)
θˆ = (D


−1


ˆ T = 1 T Rt f˜t′ , WT = 1 T Rt R′t
, and q =
where D
t=1
t=1
T
T


[1, 0m−1 ] . The estimated Lagrange multipliers are given by


T

1
ˆ = WT
ˆ −q ,
λˆ = WT g¯ T (θ)
Rt yt (θ)
(32)
T t=1
and we consider the first element λˆ 1 . The next lemma specializes the results in Theorem 1 and Lemma 2 to this setup. It
should be noted that while Lemma 5 presents the limiting distribution of T λˆ 1 for the HJ-distance case, a similar result holds
for any weighting matrix WT that converges in probability to a
nonstochastic positive definite matrix W.

˜′
Lemma 5. Let D0 = E[Rt f˜t′ ], V = ∞
j =−∞ E[(Rt ft θ 0 −



−1
q)(Rt+j f˜t+j θ 0 − q) ], W = (E[Rt Rt ]) , Q denote an or1

thonormal matrix whose columns are orthogonal to W 2 D0 , and
suppose that Assumptions 1–4 hold. Then,
m−p
d
T λˆ 1 = T q′ λˆ → −



γi xi ,

(33)

i=1

where the xi ’s are independent χ21 random variables and the γi ’s
1
1
are the eigenvalues of the matrix Q′ W 2 VW− 2 Q.
Proof. See the Appendix.
In the analysis of the empirical size of our asymptotic approximation, the computed p-values from the weighted chi-squared
distribution in (33) are compared to the 10%, 5%, and 1% theoretical sizes of the test. For a comparison, we also provide the
empirical size of a standard normal test of H0 : λ1 = 0 used,
for example, in Hodrick and Zhang (2001). The empirical rejection probabilities are computed based on 100,000 Monte Carlo
replications.

499

Table 1. Empirical sizes of the tests of H0 : λ1 = 0
Standard normal
Level of significance
T

10%

5%

1%

150
300
450
600
750
900

0.978
0.977
0.976
0.976
0.975
0.976

Panel A: CAPM
0.929
0.689
0.925
0.682
0.923
0.679
0.924
0.679
0.923
0.679
0.923
0.680

150
300
450
600
750
900

1.000
1.000
1.000
1.000
1.000
1.000

1.000
1.000
1.000
1.000
1.000
1.000

Panel B: FF3
1.000
1.000
0.999
0.999
0.999
0.999

Weighted χ2
Level of significance
10%

5%

1%

0.144
0.121
0.115
0.111
0.109
0.107

0.082
0.065
0.060
0.057
0.057
0.055

0.022
0.015
0.014
0.013
0.012
0.011

0.284
0.178
0.151
0.138
0.130
0.125

0.189
0.105
0.084
0.074
0.070
0.067

0.072
0.031
0.022
0.018
0.016
0.015

NOTE: The table presents the actual probabilities of rejection for the asymptotic tests of
H0 : λ1 = 0 with different levels of significance, assuming that the factors and returns are
generated from a multivariate normal distribution. We consider two model specifications
that are calibrated to monthly data for the period January 1932 to December 2006. The
model specification in Panel A is calibrated to the capital asset pricing model (CAPM). The
model specification in Panel B is calibrated to the three-factor model of Fama and French
(FF3, 1993). The results for different number of time series observations (T) are based on
100,000 simulations.

For different sample sizes T, we report the simulation results
for the two model specifications in Panels A and B of Table 1. In
Panel A, the weighted chi-squared distribution provides a very
accurate approximation to the finite-sample behavior of λˆ 1 . In
contrast, the standard normal test leads to severe size distortions
and rejects the true null hypothesis about 92% of the time at
the 5% significance level. In the case of 25 risky assets (Panel
B), our approximation tends to overreject for small sample sizes.
This overrejection is a well documented fact in empirical finance
and occurs when the number of test assets m is large relative
to the number of time series observations T (see, for instance,
Ahn and Gadarowski 2004). As T increases, the empirical size of
the weighted chi-squared approximation approaches its nominal
level. In contrast, the standard normal test always rejects the
true null hypothesis 100% of the time and does not improve as
T increases.
While the incorrect size of the normal test is expected from
our theoretical analysis, the severity of these size distortions is
somewhat surprising and deserves further attention. Note that
the conventional t-statistic for testing H0 : α ′ λ = 0, where α =
D0 c˜ and c˜ is a nonzero p vector, is defined as

tα′ λˆ

√ ′
T α λˆ
T α ′ λˆ
=
,
=
ˆ T α) 21
ˆ T α) 21
(α ′ WT W
(T α ′ WT W

(34)

ˆ
where
√ ′  denotes a consistent estimator of 0 in (4). Since
T α λˆ is asymptotically degenerate, it might be expected that
the t-test will be undersized, which stands in contrast to the
overrejections reported in Table 1. The following lemma derives the asymptotic distribution of the t-statistic
√ for testing
H0 : α ′ λ = 0 and shows that the numerator T α ′ λˆ and the

500

Journal of Business & Economic Statistics, October 2012

Table 2. Empirical size and power of the rank test

1

ˆ T α] 2 shrink to zero at the same rate,
denominator [α ′ WT W
rendering the limit of the ratio tα′ λˆ a bounded random variable.
Lemma 6. Suppose Rt and ft are iid multivariate elliptically
distributed with finite fourth moments and kurtosis parameter
κ = µ4 /(3σ 4 ) − 1, where σ 2 and µ4 are the second and fourth
central moments of the elliptical distribution. Let H = E[f˜t f˜t′ ] +
κvar[f˜t ]. Then,


d
tα′ λˆ → r u + 1 − r 2 w,
(35)

where r = −˜c′ Hθ 0 / (˜c′ H˜c)(θ ′0 Hθ 0 ), u ∼ χ2m−p , w ∼ N(0, 1),
and u and w are independent of each other.

Downloaded by [Universitas Maritim Raja Ali Haji] at 22:49 11 January 2016

Proof. See the Appendix.
The result in Lemma 6 shows that the asymptotic distribution
of the statistic tα′ λˆ is a mixture of two random variables. Also,
it is straightforward to show that the first and second moments
of the asymptotic distribution in Lemma 6 are given by
√ m−p+1


2
,
(36)
E[tα′ λˆ ] = r
Ŵ m−p
2
where Ŵ(·) is the gamma function, and
 
E tα2′ λˆ = 1 + r 2 (m − p − 1).

(37)

When α = q, we have that c˜ = θ 0 , r = −1, and the
 test statis-

tic of H0 : λ1 = 0 is asymptotically distributed as − χ2m−p . One
important implication of this result is that the correct asymptotic
distribution of the t-statistic of λˆ 1 is miscentered compared to the
standard normal approximation and the shift to the left increases
with the degree of overidentification. For example, the means
of this limiting distribution for the CAPM (with m − p = 9)
and FF3 (with m − p = 22) are −2.92 and −4.53, respectively.
These results clearly illustrate that the standard asymptotic inference can be grossly misleading.
However, Lemma 6 also suggests that there are situations in
which the standard normal distribution is still the appropriate
limiting distribution of tα′ λˆ . For example, this occurs when r = 0
or, equivalently, c˜ ′ Hθ 0 = 0. Since the mixing coefficient r is
consistently estimable, the asymptotic approximation in Lemma
6 conveniently bridges these two extremes (r 2 = 0 and r 2 = 1).
Next, we investigate the empirical size properties of the sequential test of H0 : α ′ λ = 0 that uses the LM rank test of
H0 : rank( ) = p − 1 from Section 2.5 as a pretest. Recall that
the rank test determines if the normal or the weighted χ 2 distribution theory should be used for testing the hypothesis of interest
H0 : α ′ λ = 0. Table 2 reports the empirical size and power of
the reduced rank test when α is in the span of the column space
of D0 (α = q = [1, 0′m−1 ]′ ) or not (α = 1m ). Overall, the rank
test exhibits excellent size and power properties.
Table 3 presents the results on the empirical size of the
sequential test of H0 : α ′ λ = 0 for α = q = [1, 0′m−1 ]′ and
α = 1m by setting the nominal levels of the rank pretest and
second-stage hypothesis test to be equal to each other. When
α = q = [1, 0′m−1 ]′ , has a reduced rank and the second-stage
test uses the weighted χ2 limiting distribution for inference. As
a result, the size of the sequential test (Panel A in Table 3) is very
similar to the size of the weighted χ2 approximation reported in
Table 1. On the other hand, when α is not in the column span

CAPM
Level of significance
T

10%

150
300
450
600
750
900

0.095
0.098
0.099
0.099
0.100
0.099

150
300
450
600
750
900

0.999
1.000
1.000
1.000
1.000
1.000

5%

1%

Panel A: α = q = [1,
0.044
0.007
0.048
0.009
0.050
0.009
0.049
0.010
0.050
0.010
0.050
0.010
Panel B: α = 1m
0.997
0.965
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000

FF3
Level of significance
10%

5%

1%

0.069
0.093
0.098
0.099
0.100
0.100

0.024
0.044
0.047
0.047
0.049
0.050

0.001
0.007
0.008
0.009
0.009
0.009

0.977
1.000
1.000
1.000
1.000
1.000

0.913
1.000
1.000
1.000
1.000
1.000

0.531
1.000
1.000
1.000
1.000
1.000

0′m−1 ]′

NOTE: The table presents the actual probabilities of rejection for the LM rank test of
H0 : rank( ) = p − 1 with different levels of significance, assuming that the factors and
returns are generated from a multivariate normal distribution. We consider two model
specifications (CAPM and FF3) that are calibrated to monthly data for the period January
1932 to December 2006. Panel A presents the empirical size of the rank test for α = q =
[1, 0′m−1 ]′ . Panel B reports the empirical power of the rank test for α = 1m . The results for
different number of time series observations (T) are based on 100,000 simulations.

of D0 , the appropriate distribution theory is based on the conventional asymptotic normal approximation. Panel B in Table 3
reveals that in this case, the rank test successfully identifies the
appropriate asymptotic framework and the empirical size of the
sequential test is very close to its nominal level.

Table 3. Empirical size of the sequential test
CAPM
Level of significance
T

10%

150
300
450
600
750
900

0.145
0.121
0.115
0.111
0.109
0.107

150
300
450
600
750
900

0.123
0.110
0.106
0.104
0.104
0.103

5%

1%

FF3
Level of significance
10%

Panel A: α = q = [1, 0′m−1 ]′
0.082
0.022
0.284
0.065
0.015
0.178
0.060
0.014
0.151
0.058
0.013
0.138
0.057
0.012
0.130
0.055
0.011
0.125
Panel B: α = 1m
0.067
0.022
0.057
0.012
0.054
0.011
0.053
0.011
0.052
0.011
0.051
0.011

0.177
0.132
0.121
0.116
0.112
0.110

5%

1%

0.189
0.105
0.085
0.074
0.070
0.067

0.072
0.031
0.022
0.018
0.016
0.015

0.124
0.072
0.065
0.061
0.059
0.057

0.130
0.018
0.014
0.013
0.013
0.012

NOTE: The table presents the actual probabilities of rejection for the sequential test (which
includes a reduced rank pretest) of H0 : λ1 = 0 with different levels of significance, assuming that the factors and returns are generated from a multivariate normal distribution. The
nominal levels of the rank pretest and the second-stage test of H0 : λ1 = 0 are set equal to
each other. We consider two model specifications (CAPM and FF3) that are calibrated to
monthly data for the period January 1932 to December 2006. Panel A presents the empirical
size of the sequential test for α = q = [1, 0′m−1 ]′ , that is, α is in the span of the column
space of D0 . Panel B reports the empirical size of the sequential test for α = 1m , that is, α
is not in the span of the column space of D0 . The results for different number of time series
observations (T) are based on 100,000 simulations.

Gospodinov, Kan, and Robotti: Limiting Behavior of GMM Sample Moment Conditions

4.

CONCLUSION

This article derives some new results on the asymptotic distribution of linear combinations of GMM sample moment conditions. These results complement lemma 4.1 of Hansen (1982)
with the cases that give rise to singularity of the asymptotic
covariance matrix and degeneracy of the asymptotic distribution. Interestingly, we establish that in these cases, the GMM
sample moment conditions converge at rate T to their population analogs and follow a nonstandard (weighted chi-squared)
limiting distribution. Finally, we propose an easy-to-implement
rank test to determine which asymptotic framework should be
adopted for the particular problem at hand.

Downloaded by [Universitas Maritim Raja Ali Haji] at 22:49 11 January 2016

APPENDIX: PROOFS OF THEOREM AND LEMMAS
Proof of Theorem 1. Using the first-order condition
ˆ as
ˆ ′ hT (θˆ ) = 0p , we express T D′ hT (θ)
D
T
0
ˆ = −T (D
ˆ
ˆ T − D0 )′ hT (θ)
T D′0 hT (θ)


ˆ T − D0 )′ W 12 QQ′ + W 21 D0 (D′0 WD0 )−1
= − T (D


1
1

× D′0 W 2 T W− 2 hT θˆ
√ ′ 1 √

1
ˆ T W2 Q
= − TD
T Q′ W− 2 hT (θˆ )
√ 
′
1√
1
+ T Q′ W 2 D0 Q′ W− 2 T hT (θˆ )

√
ˆ T −D0 ) ′ WD0 (D′0 WD0 )−1
− T (D
√

× T D′0 hT (θˆ ) .

(A.1)



1

Since
Q′ W 2 D0 = 0(m−p)×p ,
T D′0 hT (θˆ ) = op (1),

ˆ T − D0 ) = Op (1), it follows that
T (D

and

(A.2)


1
ˆ T ) converge to a vector
Using Assumption 3, let T vec(Q′ W 2 D
of normal
random
variables
v
.
Similarly,
using (7) in Lemma
1


− 21
ˆ
1, let T Q W hT (θ ) converge to a vector of normal random
variables v2 and write the joint distribution of [v′1 , v′2 ]′ as

Thus,

d
ˆ ′ D0 ) →
T D′0 hT (θˆ ) = vec(T hT (θ)
−(Ip ⊗ v′2 )v1 .

This completes the proof of Theorem 1.

(A.3)

ˆ T − D0 ) =
T vec(D

T
1 
gt (θ 0 ) + op (1),
= −G0 (D′0 WD0 )−1 D′0 W √
T t=1





ˆT −D
˜ T ) + T vec(D
˜ T − D0 ).
T vec(D
(A.5)

(A.6)

where the first equality follows from Assumption 4 and the
second equality is ensured by the conditions imposed in Assumption 2. For the second term, we have



T 


∂gt (θ 0 )
˜ T − D0 ) = √1
vec

vec(D
T vec(D
)
.
0
∂θ ′
T t=1
(A.7)
Using expressions (A.5), (A.6), and (A.7), we have




1
ˆ T − D0 ))
ˆ T = T vec(Q′ W 12 (D
T vec Q′ W 2 D




1
ˆ T − D0 )
= Ip ⊗ Q′ W 2 T vec(D
T
1 

−1 ′
˜
= −G(D0 WD0 ) D0 W √
gt (θ 0 )
T t=1



T
1 
1 ∂gt (θ 0 )
+ op (1)
+√
vec Q′ W 2
∂θ ′
T t=1

(A.8)

1

˜ = (Ip ⊗ Q′ W 2 )G0 . Stacking the expression for
using that G



ˆ W 12 Q) with Q′ W 21 gt (θ 0 ), we have  = ∞
T vec(D
j =−∞
T
E[dt d′t+j ], where dt = [d′1,t , d′2,t ]′ and


˜ ′0 WD0 )−1 D′0 Wgt (θ 0 ) + vec Q′ W 21 ∂gt (θ 0 ) ,
d1,t = −G(D
∂θ ′
(A.9)
1
2

d2,t = Q′ W gt (θ 0 ).

(A.10)


Proof of Lemma 3. Defining z˜ = S′ z ∼ N(0l , ϒ), we can
write




1
1
I
I
0
0
n
n
n×n
n×n

′ ′⎢
2 ⎥
2 ⎥
z′1 z2 = z′ ⎣ 1
⎦ z = z˜ S ⎣ 1
⎦ S˜z.
In 0n×n
In 0n×n
2
2
(A.11)
1
Let e = ϒ − 2 z˜ ∼ N(0l , Il ). Then, we can write


1
0
I
n
n×n
1
1


2 ⎥
z′1 z2 = e′ ϒ 2 S′ ⎣ 1
(A.12)
⎦ Sϒ 2 e = e Ŵe.
In 0n×n
2
Since e is standard normal, it follows that

(A.4)

Proof of Lemma 2. To obtain
the asymptotic distribution of
ˆ T − D0 ), define D
˜ T = 1 T ∂gt (θ 0 )/∂θ ′ and write
vec(D
t=1
T


For the first term, we use the mean-value theorem to obtain

ˆT −D
˜T)
T vec(D

= G0 T (θˆ − θ 0 ) + op (1)

This completes the proof of Lemma 2.


√ ′ 1 √
1
ˆ + op (1).
ˆ T W2 Q
T Q′ W− 2 hT (θ)
T D′0 hT (θˆ ) = − T D




[v′1 , v′2 ]′ ∼ N 0(m−p)(p+1) ,  .

501

z′1 z2 ∼

k


γi xi ,

(A.13)

i=1

where the γi ’s are the k ≤ l nonzero eigenvalues of Ŵ and the
xi ’s are independent χ12 random variables. This completes the
proof of Lemma 3.

ˆ 2,T c − πˆ 1,T =
Proof of Lemma 4. Combining lT (c) =
ˆ 2,T c − πˆ 1,T ) = ([−1, c′ ] ⊗ Im−1 )vec(
ˆ T ) and Equation
vec(

502

Journal of Business & Economic Statistics, October 2012

(27), we have


d

T lT (c0 ) → N(0m−1 ,
(c0 )),

(A.14)

where
(c0 ) = ([−1, c′0 ] ⊗ Im−1 )M([−1, c′0 ]′ ⊗ Im−1 ). Let
cˆ = arg min lT (c)′
−1
T (c)lT (c)

Downloaded by [Universitas Maritim Raja Ali Haji] at 22:49 11 January 2016

be the estimator of c0 . First, note that while the estimator cˆ
depends on a preliminary estimator θˆ when gt (θ) is a nonlinear
function of θ, the uncertainty associated with the estimation
of θ is already incorporated in M. Also, from the asymptotic
equivalence of the estimator in (A.15) with the two-step GMM
estimator, we have (Hansen 1982)

T (ˆc − c0 )

= −[ ′2
(c0 )−1 2 ]−1 ′2
(c0 )−1 T lT (c0 ) + op (1). (A.16)
Finally, using similar arguments as in Hansen (1982, lemma
4.2), it follows that LM is asymptotically distributed as a chisquared random variable with (m − 1) − (p − 1) = m − p degrees of freedom. This completes the proof of Lemma 4.

Proof of Lemma 5. In the case 
of asset pricing models
with a pricing constraint g¯ T (θ ) = T1 Tt=1 Rt yt (θ) − q, the expressions for d1,t and d2,t in the covariance matrix  =



j =−∞ E[dt dt+j ] in Lemma 2 specialize to
˜ ′0 WD0 )−1 D′0 W(Rt yt (θ 0 ) − q)
d1,t = −G(D
1

∂yt (θ 0 )
,
∂θ ′

(A.17)

1

d2,t = Q′ W 2 (Rt yt (θ 0 ) − q),
(A.18)


 ∂yt (θ 0 ) 
2
yt (θ 0 )
.
where D0 = E Rt ∂θ ′ and G0 = E (Rt ⊗ Ip ) ∂ ∂θ∂θ

For the special case of a linear SDF that prices the test assets
correctly, these expressions can be further simplified and have
the form
1

d1,t = Q′ W 2 Rt ⊗ f˜t ,

(A.19)

1
2

d2,t = Q′ W Rt f˜t′ θ 0
1
2

(A.20)
1
2

since G0 is a null matrix and Q′ W q = Q′ W D0 θ 0 = 0m−p
from the definition of Q.
ˆ where α = D0 c˜ and
For the linear combination T α ′ hT (θ),
ˆ
ˆ
hT (θ ) = λ, we have from the proof of Theorem 1 that
 √
√

1
ˆ + op (1).
ˆ T c˜ ′ T Q′ W− 12 hT (θ)
T c˜ ′ D′0 λˆ = − T Q′ W 2 D
(A.21)
It is straightforward to show using the results above that





1
d
ˆ T c˜ →
T Q′ W 2 D
E[d˜ 1,t d˜ ′1,t+j ]⎠ , (A.22)
N ⎝0m−p ,
j =−∞

1
ˆ
where d˜ 1,t = Q′ W 2 Rt f˜t′ c˜ . When c˜ = θ 0 , that is, c˜ ′ D′0 λˆ = q′ λ,
˜
we have d1t = d2,t and it follows that

′ˆ

d

Tqλ →

−v′2 v2 ,

Proof of Lemma 6. Using that α = D0 c˜ , the numerator of the
t-statistic tα′ λˆ can be expressed as

(A.15)

c

+ (Q′ W 2 Rt ⊗ Ip )

χ 2 distribution are given by the eigenvalues of the matrix
1
1
Q′ W 2 VW 2 Q. This completes the proof of Lemma 5.


(A.23)

which is a linear combination of m − p independent chisquared random variables with one degree of freedom. Since
1
1
v2 ∼ N(0m−p , Q′ W 2 VW 2 Q), the weights for the weighted

ˆ ′T )λˆ
T c˜ ′ D′0 λˆ = T (˜c′ D′0 − c˜ ′ D
ˆ ′T )W 21 QQ′ W− 21 λˆ + op (1)
= T (˜c′ D′0 − c˜ ′ D


ˆ
ˆ ′T W 21 Q][ T Q′ W− 12 λ]
= −[ T c˜ ′ D
d

+ op (1) → −z′1 z2 ,

(A.24)


1
2

ˆ T c˜ and z2 is
where z1 is the limiting distribution
of T Q′ W D


− 12 ˆ
the limiting distribution of T Q W λ.
Next, the consistent estimator of 0 in (4) is given by
ˆ = [Im − D
ˆ T )−1 D
ˆ ′T WT ]V
ˆ T (D
ˆ ′T WT D
ˆ


ˆ T )−1 D
ˆ ′T WT ]′
ˆ T (D
ˆ ′T WT D
× [Im − D

1
1
1
1
−1
ˆ T (D
ˆ ′T WT D
ˆ T )−1 D
ˆ ′T W 2 W 2 VW
ˆ T2
= WT 2 Im − WT2 D
T
T

1
1
1
ˆ T (D
ˆ ′T WT D
ˆ T )−1 D
ˆ ′T W 2 W− 2
× Im − WT2 D
T
T
1

1

1

1


ˆQ
ˆ ′ W− 2 ,
ˆQ
ˆ ′ W 2 VW
ˆ T2 Q
= WT 2 Q
T
T

(A.25)

ˆ is a consistent estimator of V and Q
ˆ is an m × (m − p)
where V
1
ˆ T . As
orthonormal matrix with its columns orthogonal to WT2 D
a result, we can rewrite the term inside the squared root of the
denominator of the t-statistic as
1



1

1



1

2 ˆ ˆ
2 ˆ
2
ˆ T α = T c˜ ′ D′0 W 2 Q
ˆ ˆ
˜.
T α ′ WT W
T Q WT VWT QQ WT D0 c
(A.26)
Since
1

ˆ ′ W 2 D0 c˜
TQ
T
1

ˆ ′ W 2 (D0 − D
ˆ T )˜c
= TQ
T

1
ˆ T )˜c + op (1)
= T Q′ W 2 (D0 − D

1
d
ˆ T c˜ + op (1) →
−z1 ,
(A.27)
= − T Q′ W 2 D

it follows that
d

1

1

ˆ T α → z′1 Q′ W 2 VW 2 Qz1 .
T α ′ WT W

(A.28)

Therefore, we have
z′1 z2

d

tα′ λˆ → −

1
2

1

1

[z′1 Q′ W VW 2 Qz1 ] 2

.

(A.29)

The jo