
Journal of Business & Economic Statistics

ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20

Likelihood-Based Estimation of Dynamic Panels
With Predetermined Regressors
Enrique Moral-Benito
To cite this article: Enrique Moral-Benito (2013) Likelihood-Based Estimation of Dynamic
Panels With Predetermined Regressors, Journal of Business & Economic Statistics, 31:4,
451-472, DOI: 10.1080/07350015.2013.818003
To link to this article: http://dx.doi.org/10.1080/07350015.2013.818003

Accepted author version posted online: 12
Jul 2013.

Download by: [Universitas Maritim Raja Ali Haji]

Date: 11 January 2016, At: 22:18

Likelihood-Based Estimation of Dynamic Panels
With Predetermined Regressors
Enrique MORAL-BENITO
Banco de España, Alcalá 48, 28014 Madrid, Spain ([email protected])


This article discusses the likelihood-based estimation of panel data models with individual-specific effects
and both lagged dependent variable regressors and additional predetermined explanatory variables. The
resulting new estimator, labeled as subsystem limited information maximum likelihood (ssLIML), is
asymptotically equivalent to standard panel generalized method of moments as N → ∞ for fixed T but
tends to present smaller biases in finite samples, as illustrated in simulation experiments. Simulation results
also indicate that the estimator is preferred to other alternatives available in the literature in terms of finite-sample performance. Finally, to provide an empirical illustration, I revisit the evidence on the relationship
between income and democracy in a panel of countries using the proposed estimator.
KEY WORDS: Dynamic panel data; Income and democracy; Maximum likelihood estimation; Monte
Carlo methods.

1. INTRODUCTION

In this article, I consider a linear panel data model with
individual-specific effects and both a lagged dependent variable
regressor and additional predetermined explanatory variables.
In particular, I develop the implications of the model (without
imposing any additional restriction) for the first- and second-order moments of the observed data to set up a likelihood function based on a multivariate regression with normal errors and
restrictions on the covariance matrix.
To avoid steady-state restrictions and the incidental parameter problem, the vector of initial observations and individual effects is assumed to be normally distributed with unrestricted mean and covariance matrix. Also, neither time-series nor conditional homoscedasticity is assumed. Therefore, the resulting subsystem limited information maximum likelihood (LIML) estimator,¹ henceforth ssLIML, remains consistent under the same assumptions as standard panel generalized method of moments (GMM) estimators (e.g., Arellano and Bond 1991).
More concretely, ssLIML is asymptotically equivalent to the standard GMM estimator discussed in Arellano and Bond (1991) augmented with moment conditions implied by the serial correlation properties of the errors as suggested by Ahn and Schmidt (1995).² Along these lines, ssLIML can be interpreted as the likelihood-based counterpart of method-of-moments estimators for dynamic panels with predetermined regressors.

This article is a substantially revised version of a previous draft circulated under the title “Dynamic Panels With Predetermined Regressors: Likelihood-based Estimation and Bayesian Averaging With an Application to Cross-Country Growth.” The STATA command xtmoralb that implements the estimator discussed in this article is available on my website at http://www.moralbenito.com.
¹ Note that the model consists of T structural equations for the dependent variable (y) completed with a set of reduced-form equations for the additional predetermined explanatory variables (x) and the initial observations. Therefore, the ssLIML labeling can be understood as an intermediate situation between LIML (a single structural equation and a set of reduced-form equations) and full information maximum likelihood, FIML (a set of structural equations without reduced-form equations; see Hendry 1995).

It is well known from the simultaneous equations literature (e.g., Anderson, Kunitomo, and Sawa 1982) that likelihood-based approaches are preferred to method-of-moments counterparts in terms of finite-sample performance, especially when the instruments are weak and/or many with respect to the sample size. Therefore, the aforementioned ssLIML estimator is expected to be preferred to its method-of-moments counterpart in terms of finite-sample performance.
A comprehensive simulation study serves to evaluate the
finite-sample behavior of the ssLIML estimator. In settings with
individual effects and both a lagged dependent variable regressor
and an additional predetermined explanatory variable, I find that
the finite-sample biases of the ssLIML estimator are in general
negligible. Moreover, the ssLIML biases tend to be smaller than
those of other estimators available in the literature, especially
when instruments are expected to be weaker (i.e., the series
are more persistent because either the autoregressive parameter
is more sizable or a larger variance of the individual effects
exists).
This work is related to the literature exploring alternative estimators that are asymptotically equivalent to standard GMM as
N → ∞ for fixed T, but potentially having different finite sample behavior (e.g., Alonso-Borrego and Arellano 1999). This
literature is motivated by concern with the finite-sample biases in standard GMM, especially when the instruments are
weak and the number of moment conditions is large relative
to the cross-section dimension (see, e.g., Blundell and Bond
1998).


² More concretely, Ahn and Schmidt (1995) suggested exploiting the moment conditions resulting from the lack of autocorrelation in the errors available under the assumption of conditional mean independence between the time-varying errors and the individual effects. Note that these moment conditions make the estimation problem nonlinear and, thus, have been generally ignored by applied researchers.

© 2013 American Statistical Association
Journal of Business & Economic Statistics
October 2013, Vol. 31, No. 4
DOI: 10.1080/07350015.2013.818003


Based exclusively on the same identifying assumption as standard GMM (i.e., predeterminedness), Alonso-Borrego and Arellano (1999) considered both LIML analog³ and symmetrically normalized GMM estimators, which tend to reduce finite-sample biases in standard GMM. Alternatively, Arellano and Bover (1995) and Blundell and Bond (1998) resorted to auxiliary mean-stationarity assumptions in addition to predeterminedness and exploited them within a method-of-moments perspective in the so-called system GMM (sGMM) estimator.
The present article is also related to the literature on likelihood-based estimation of panel data models under fixed-T and large-N settings. Bhargava and Sargan (1983), Alvarez and Arellano (2003), and Dhaene and Jochmans (2012), among others, considered likelihood functions for an autoregressive panel data model with strictly exogenous regressors. In contrast, the likelihood function developed by Hsiao, Pesaran, and Tahmiscioglu (2002) for data in differences can also accommodate predetermined regressors under the assumption of time-series homoscedasticity of the error variances⁴ [Alvarez and Arellano (2004) discussed the relationship of this approach with other maximum likelihood (ML) estimators for the data in levels].
As an empirical illustration, I use the proposed estimator to revisit the evidence on the effect of income on democracy in a panel of countries spanning from 1970 to 2000 with data at 5-year intervals. Based on ordinary least squares (OLS) estimates, Acemoglu et al. (2008) found that the positive correlation between income and democracy disappears when accounting for country-specific effects. Given the potential feedback from income to democracy, Acemoglu et al. (2008) also considered the Arellano and Bond (1991) GMM estimator in search of a causal interpretation for their OLS estimates. Based on this GMM estimator, Acemoglu et al. (2008) obtained negative (and often significant) estimates of the effect of income on democracy. In contrast, ssLIML estimates are always statistically insignificant.
The remainder of the article is organized as follows. Section 2
presents the likelihood function and the connection between the
resulting ssLIML estimator and its method-of-moments counterpart. Monte Carlo evidence on the finite-sample behavior
of the estimator is provided in Section 3. In Section 4, I revisit the evidence on the relationship between income and
democracy across countries using the ssLIML estimator. Finally, Section 5 concludes. Additional results are gathered in the
Appendix.

³ Sargan (1958) showed that the original LIML estimator (based on a proper likelihood function) developed by Anderson and Rubin (1949) is equivalent to an IV (instrumental variables)/GMM estimator that minimizes the maximum possible sample correlation between the errors and a linear combination of the instruments. Therefore, in contrast to the ssLIML estimator discussed in this article, “minimax” method-of-moments estimators such as those considered in Alonso-Borrego and Arellano (1999) and Akashi and Kunitomo (2010) are usually labeled as LIML analog estimators in spite of not corresponding to any meaningful ML estimator.
⁴ Binder, Hsiao, and Pesaran (2005) also considered a likelihood approach for dynamic panel data models with predetermined regressors under time-series homoscedasticity and mean-stationarity.


2. DYNAMIC PANEL DATA WITH FEEDBACK: LIKELIHOOD-BASED ESTIMATION

In addition to being potentially correlated with a time-invariant unobserved heterogeneity component, regressors in panel data models can be assumed to be uncorrelated with past, present, and future time-varying errors (i.e., strictly exogenous) or correlated with time-varying errors at all lags and leads (i.e., strictly endogenous). Besides these two configurations, it is possible to imagine intermediate settings in which the explanatory variables can be correlated with time-varying errors at certain periods but not at others (i.e., partially endogenous).
In this article, I consider a dynamic panel data model in which the time-varying errors are uncorrelated with current and lagged values of the regressors but not with their future values (i.e., regressors are predetermined with respect to the current time-varying error).⁵
Formally, I consider the model
$$y_{it} = \alpha y_{it-1} + x_{it}'\beta + w_i'\delta + \eta_i + \zeta_t + v_{it} \tag{1}$$
$$E\left(v_{it} \mid y_i^{t-1}, x_i^{t}, w_i, \eta_i\right) = 0 \quad (t = 1, \dots, T)\ (i = 1, \dots, N), \tag{2}$$
where $x_{it}$ and $w_i$ are vectors of variables of orders $k$ and $m$, respectively, and $x_i^{t}$ denotes a vector of the observations of $x$ accumulated up to $t$: $x_i^{t} = (x_{i1}', \dots, x_{it}')'$. I use, with a slight abuse of notation, the subscript $it-1$ as a shorthand for $i, t-1$.
Analogously to standard GMM, condition (2) is the only
assumption required for consistency and asymptotic normality
(under fixed T when N tends to infinity) of the likelihood-based
estimator developed in this article. Other ML estimators are
available in the literature for estimating the model in (1) and
(2). However, these estimators require additional assumptions
such as time-series homoscedasticity (i.e., Hsiao, Pesaran, and
Tahmiscioglu 2002) or mean-stationarity (i.e., Binder, Hsiao,
and Pesaran 2005) for fixed-T consistency.
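To make the predeterminedness condition in (2) concrete, the following is a minimal simulation sketch, not the design used in the article's Monte Carlo study. It assumes a scalar regressor (k = 1), no w variables, no time effects, and a hypothetical feedback rule with illustrative coefficients `rho`, `gamma`, and `c`; the function name `simulate_panel` is likewise introduced here for illustration only.

```python
import numpy as np

def simulate_panel(N, T, alpha, beta, gamma=0.3, rho=0.5, c=0.5, seed=0):
    """Simulate y_it = alpha*y_{i,t-1} + beta*x_it + eta_i + v_it.

    Assumed (hypothetical) feedback rule for the predetermined regressor:
        x_it = rho*x_{i,t-1} + gamma*y_{i,t-1} + c*eta_i + e_it
    so x_it depends on past shocks v_{i,t-1}, ... but not on v_it.
    """
    rng = np.random.default_rng(seed)
    eta = rng.normal(size=N)                # individual effects
    v = rng.normal(size=(N, T + 1))         # column t holds v_it (column 0 unused)
    e = rng.normal(size=(N, T + 1))         # shocks to the x equation
    y = np.zeros((N, T + 1))
    x = np.zeros((N, T + 1))                # column 0 unused (left at zero)
    y[:, 0] = eta + rng.normal(size=N)      # arbitrary start, no stationarity imposed
    for t in range(1, T + 1):
        x[:, t] = rho * x[:, t - 1] + gamma * y[:, t - 1] + c * eta + e[:, t]
        y[:, t] = alpha * y[:, t - 1] + beta * x[:, t] + eta + v[:, t]
    return y, x, eta, v

y, x, eta, v = simulate_panel(N=20000, T=4, alpha=0.5, beta=0.3)
# x_i2 is (nearly) uncorrelated with v_i2 but correlated with the past shock v_i1.
print(np.corrcoef(x[:, 2], v[:, 2])[0, 1], np.corrcoef(x[:, 2], v[:, 1])[0, 1])
```

In this sketch the second correlation is nonzero only because `gamma` is nonzero; setting `gamma=0` would make the regressor strictly exogenous with respect to v.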
In addition to the predetermined variables $y_{it-1}$ and $x_{it}$, the model also includes a vector of time-invariant strictly exogenous variables ($w_i$). Allowing for time-varying strictly exogenous regressors is straightforward in this context. However, in the spirit of Hausman and Taylor (1981), I stress here the possibility of identifying the effect of time-invariant observable variables in addition to the unobservable time-invariant fixed effect ($\eta_i$) by assuming lack of correlation between the w variables and the fixed effects. Finally, the term $\zeta_t$ in (1) captures unobserved common factors across units in the panel, and thus, this particular form of cross-sectional dependence is allowed.⁶
⁵ Since predetermined variables are potentially correlated with lagged values of the time-varying error but are uncorrelated with present and future values, they can be understood as a particular case of partial endogeneity. For instance, an alternative partial endogeneity setting may also allow for nonzero correlation between the regressor and the contemporaneous time-varying error so that the regressor is no longer predetermined although partially endogenous.
⁶ In practice, this is done by simply working with cross-sectional demeaned data. In the remainder of the exposition, all of the variables are assumed to be deviations from their cross-sectional mean.

The model in (1) and (2) assumes that both the lagged dependent variable and the x regressors are predetermined with respect to the time-varying error term ($v_{it}$). This configuration accommodates feedback from lagged values of the dependent variable y to the current value of the x regressors.⁷ It is worth emphasizing that assumption (2) also implies lack of autocorrelation in $v_{it}$ since lagged v's are linear combinations of the variables in the conditioning set.
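The step from assumption (2) to serially uncorrelated errors can be spelled out as follows. This is a short derivation under the model's own assumptions, with variables in deviations from cross-sectional means so that $\zeta_t$ drops out; the index $s$ is introduced here for illustration.

```latex
% For any s with 1 <= s < t, equation (1) gives the lagged error as a linear
% combination of variables contained in the conditioning set of (2):
v_{is} = y_{is} - \alpha y_{i,s-1} - x_{is}'\beta - w_i'\delta - \eta_i ,
\qquad 1 \le s < t .
% By the law of iterated expectations and assumption (2):
E(v_{it} v_{is})
  = E\left[ v_{is}\, E\left( v_{it} \mid y_i^{t-1}, x_i^{t}, w_i, \eta_i \right) \right]
  = 0 .
```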


2.1 Likelihood Function With an Unrestricted
Feedback Process



In contrast to a model with only strictly exogenous explanatory variables, the specification of the model in (1) and (2) with predetermined variables is incomplete in the sense that it does not lead to a likelihood in itself once we add an error distributional assumption. Therefore, in this section, I develop the implications of the model for the first and second moments of the observed variables. The parameters in Equation (1) are augmented with additional (implicit and unrestricted) reduced-form parameters required for specifying these moments. Once the moments are specified, I can set up a likelihood function based on a multivariate regression model with restrictions on the covariance matrix.
The unrestricted feedback process is specified in the form of a set of cross-sectional linear projections of the predetermined x variables on all available lags (for t = 2, . . . , T) having period-specific coefficients:
$$x_{it} = \kappa_t w_i + \gamma_{t0} y_{i0} + \cdots + \gamma_{t,t-1} y_{i,t-1} + \Lambda_{t1} x_{i1} + \cdots + \Lambda_{t,t-1} x_{i,t-1} + c_t \eta_i + \vartheta_{it}. \tag{3a}$$
Furthermore, the mean vector and covariance matrix of the joint distribution of the initial observations ($y_{i0}$, $x_{i1}$) and the individual effects $\eta_i$ are unrestricted. This is in sharp contrast with the mean-stationarity assumption required by the so-called sGMM estimator discussed in Arellano and Bover (1995) and Blundell and Bond (1998). As a result, I do not impose any stationarity assumption on the initial observations:⁸
$$y_{i0} = w_i'\delta_0 + c_0 \eta_i + v_{i0} \tag{3b}$$
$$x_{i1} = \kappa_1 w_i + \gamma_{10} y_{i0} + c_1 \eta_i + \vartheta_{i1}, \tag{3c}$$
where $\delta_0$ and $c_t$ are vectors of parameters of order m and k, respectively; $c_0$ is a scalar; and for h < t, $\gamma_{th}$ is the k × 1 vector $\gamma_{th} = (\gamma_{th}^{1}, \dots, \gamma_{th}^{k})'$ with h = 0, . . . , T − 1.⁹ Moreover, $\Lambda_{th}$ and $\kappa_t$ are matrices of parameters of orders k × k and k × m, respectively, and $\vartheta_{it}$ is a k × 1 vector of prediction errors.
All in all, the full model consists of the T equations in (1) plus the Tk + 1 equations in (3a)–(3c). The equations of interest (i.e., structural equations) are those corresponding to y. The remaining equations in the system are not of direct interest and are thus considered as reduced-form equations. Therefore, once I set up the corresponding likelihood function, I label the approach as subsystem LIML (ssLIML) as opposed to FIML (all equations in structural form) or LIML (a single structural equation plus a set of reduced-form equations).
To rewrite the system in matrix form, I also define the [T(k + 1) + 1] × 1 vector of observed data for individual i:
$$R_i = (y_{i0}, x_{i1}', y_{i1}, \dots, x_{iT}', y_{iT})'$$
and the T(k + 1) + 2 column vector of errors:
$$U_i = (\eta_i, v_{i0}, \vartheta_{i1}', v_{i1}, \dots, \vartheta_{iT}', v_{iT})'$$
with covariance matrix
$$\Sigma = \mathrm{var}(U_i) = \mathrm{diag}\left\{\sigma_\eta^2, \sigma_{v_0}^2, \Sigma_{\vartheta_1}, \sigma_{v_1}^2, \dots, \Sigma_{\vartheta_T}, \sigma_{v_T}^2\right\},$$
where $\Sigma_{\vartheta_t}$ is a k × k matrix. Note that the block-diagonal variance-covariance matrix allows for time-series heteroscedasticity.

⁷ A static (i.e., without lagged dependent variable) version of the model might also be of interest, for instance, in the estimation of production functions (e.g., Blundell and Bond 2000) or Euler equations for household consumption (e.g., Zeldes 1989). The likelihood function for such a static version is also discussed in Appendix A.2.
⁸ Note also that this specification implies that cov($\eta_i$, $w_i$) = 0 to ensure identification of δ in (1).
⁹ Note that strict exogeneity of a given $x^{j}$ regressor can be tested by considering the null $H_0: \gamma_{th}^{j} = 0$ for all h < t against the alternative $H_1: \gamma_{th}^{j} \neq 0$ for at least one h < t.
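I further define the [T(k + 1) + 1] × [T(k + 1) + 1] lower triangular matrix of coefficients B as
$$B = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-\gamma_{10} & I_k & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-\alpha & -\beta' & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-\gamma_{20} & -\Lambda_{21} & -\gamma_{21} & I_k & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & -\alpha & -\beta' & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots & 0 & 0 & 0 \\
-\gamma_{T0} & -\Lambda_{T1} & -\gamma_{T1} & -\Lambda_{T2} & -\gamma_{T2} & \cdots & -\gamma_{T,T-1} & I_k & 0 \\
0 & 0 & 0 & 0 & 0 & \cdots & -\alpha & -\beta' & 1
\end{pmatrix}$$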

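together with the coefficient matrices $C = (\delta_0\ \ \kappa_1'\ \ \delta\ \ \cdots\ \ \kappa_T'\ \ \delta)'$, which stacks the blocks $\delta_0', \kappa_1, \delta', \dots, \kappa_T, \delta'$, and
$$D = \begin{pmatrix} (c_0\ \ c_1'\ \ 1\ \ c_2'\ \ 1\ \ \cdots\ \ c_T'\ \ 1)' & I_{T(k+1)+1} \end{pmatrix}$$
of orders [T(k + 1) + 1] × m and [T(k + 1) + 1] × [T(k + 1) + 2], respectively.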
I am now able to write the model as
$$R_i = \Pi(\theta)' w_i + U_i^{*} \tag{4}$$
with
$$\Pi(\theta) = (B^{-1}C)', \qquad U_i^{*} = B^{-1} D U_i, \qquad \mathrm{var}(U_i^{*}) = \Omega(\theta) = B^{-1} D \Sigma D' B'^{-1} \tag{5}$$
and $\theta = (\alpha, \beta, \delta, \delta_0, c_0, \sigma_\eta^2, \{c_t, \gamma_{th}, \mathrm{vec}(\kappa_t), \mathrm{vec}(\Lambda_{th}), \sigma_{v_t}^2, \mathrm{vech}(\Sigma_{\vartheta_t})\}_{t=1}^{t=T})$ is the p × 1 vector of parameters to be estimated.¹⁰
¹⁰ More specifically, $p = 3 + 2k + T + (T-1)(2+k+m)k + \frac{(T-1)k[(T-1)k+1]}{2} + \sum_{r=1}^{T-1} rk$.

Note that θ augments the parameters of interest (α, β, δ) from Equation (1) with auxiliary parameters that are unrestricted except that the variances are positive (i.e., $\sigma_\eta^2$ and $\sigma_{v_t}^2$ are positive, and $\Sigma_{\vartheta_t}$ is positive definite ∀t) so that $\Omega(\theta)$ is positive definite. Also, the vech operator here stacks by rows the lower triangle of a square matrix so that we eliminate redundant elements of the covariance matrix due to symmetry (vec operates analogously but stacking by rows the full matrix).
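As a rough check on the size of the FCS parameterization, the sketch below tallies the elements of θ for the case without w variables (m = 0) and compares the total with the number of free elements in an unrestricted covariance matrix of $R_i$. The enumeration is an assumption of this sketch: it follows the listing of θ above and additionally counts $\sigma_{v_0}^2$ from the diagonal of the error covariance matrix. The function names are hypothetical. For T = 3 and k = 1 it reproduces the figures 23 and 28 quoted in Section 2.3.

```python
def fcs_parameter_count(T, k):
    """Tally FCS parameters when m = 0 (illustrative enumeration)."""
    structural = 1 + k                            # alpha and beta
    effects = 1 + 1 + T * k                       # c_0, var(eta), c_1..c_T
    feedback = k * T * (T + 1) // 2               # gamma_th, h < t, t = 1..T
    x_dynamics = k * k * T * (T - 1) // 2         # Lambda_th, h < t, t = 2..T
    error_vars = (T + 1) + T * k * (k + 1) // 2   # var(v_0..v_T), vech of x-error blocks
    return structural + effects + feedback + x_dynamics + error_vars

def unrestricted_covariance_count(T, k):
    """Free elements of a symmetric matrix of order T(k+1)+1."""
    n = T * (k + 1) + 1
    return n * (n + 1) // 2

restricted = fcs_parameter_count(T=3, k=1)
unrestricted = unrestricted_covariance_count(T=3, k=1)
print(restricted, unrestricted, unrestricted - restricted)
```

The difference between the two counts is the number of overidentifying restrictions the covariance structure imposes in this small case.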
I also define the matrices of data R = (Y0 X1 Y1 . . . XT YT ),
Xt = (Xt1 , . . . , Xtk ), and W = (w1 , w2 , . . . , wN )′ of orders N ×
T (k + 1) + 1, N × k, and N × m, respectively. Thus, the joint
distribution of R conditional on W is
$$R \mid W \sim N\left(W\Pi(\theta),\ I_N \otimes \Omega(\theta)\right) \tag{6}$$
with log-likelihood
$$L \propto -\frac{N}{2}\log\det\Omega(\theta) - \frac{1}{2}\mathrm{tr}\left\{\Omega(\theta)^{-1}\,[R - W\Pi(\theta)]'\,[R - W\Pi(\theta)]\right\}, \tag{7}$$

which depends on a fixed number of parameters and satisfies the usual regularity conditions. Moreover, if the distribution of R is not assumed to be normal, the maximizer of L can be regarded as a Gaussian pseudo ML estimator, which remains consistent and asymptotically normal when N tends to infinity for fixed T as shown in Section 2.3. As also discussed in Section 2.3, this estimator is asymptotically equivalent to the class of first-differenced GMM estimators discussed in Arellano and Bond (1991) augmented with moments resulting from lack of autocorrelation in the errors implied by assumption (2) (see Ahn and Schmidt 1995).
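The log-likelihood in (7) is straightforward to evaluate once the mean coefficient matrix and the covariance matrix implied by θ are available. The sketch below assumes they are passed in directly as arrays `Pi` (m × n) and `Omega` (n × n), with n the number of columns of R; building them from θ is not shown, and the function name `gaussian_loglik` is hypothetical. It is an illustration of the formula, not the implementation behind the article's STATA command.

```python
import numpy as np

def gaussian_loglik(R, W, Pi, Omega):
    """Evaluate (7) up to an additive constant.

    R: (N, n) data, W: (N, m) exogenous variables,
    Pi: (m, n) mean coefficients, Omega: (n, n) covariance matrix.
    """
    N = R.shape[0]
    resid = R - W @ Pi
    # Cholesky fails (LinAlgError) if Omega is not positive definite.
    chol = np.linalg.cholesky(Omega)
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    cross = resid.T @ resid
    return -0.5 * N * logdet - 0.5 * np.trace(np.linalg.solve(Omega, cross))

rng = np.random.default_rng(1)
N, n, m = 50, 3, 2
R, W = rng.normal(size=(N, n)), rng.normal(size=(N, m))
Pi = rng.normal(size=(m, n))
A = rng.normal(size=(n, n))
Omega = A @ A.T + n * np.eye(n)   # an arbitrary positive definite matrix
print(gaussian_loglik(R, W, Pi, Omega))
```

Using the trace of the cross-product matrix avoids looping over individuals, which matters when the function is called repeatedly inside a numerical optimizer.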
This parameterization of the complete model is labeled as the Full Covariance Structure (FCS) representation. In this parameterization, the coefficient matrix B includes $\gamma_{th}$ and $\Lambda_{th}$, which are the vectors and matrices gathering all the feedback from lagged y's to current x's and the dynamic relationships between the x variables, respectively. These parameters are not of direct interest in our setting, but, in principle, they also need to be estimated. This might represent a concern since the log-likelihood optimization problem becomes cumbersome given the high dimension of θ.
However, to satisfy the same number of restrictions imposed by (1) and (2) in the first- and second-order moments of the observed data, there is a one-to-one mapping between the parameters in B and $\Sigma$. For instance, the reduced-form parameters in $\gamma_{th}$ capture the feedback from $y_i$ in period h to $x_i$ in period t with h < t. In the spirit of the simultaneous equations literature and without imposing any additional restriction, this feedback can also be captured through additional nonzero elements in the variance-covariance matrix $\Sigma$, which would no longer be block-diagonal (see below). This mapping also applies to the parameters in $\Lambda_{th}$ capturing the dynamic effects between the x regressors.
Fully developing this feature, I present in the next section
another parameterization [labeled as simultaneous equations
model (SEM)] that captures the feedback process and the dynamic relationships between the x’s in the variance-covariance
matrix of the system. This SEM parameterization turns out to
be useful in practice because it produces closed-form solutions
for some reduced-form parameter estimates. This enables me
to concentrate out the parameters of the dynamic relationships
between the x’s, which are not of direct interest. Given the
resulting profile likelihood (described in Appendix A.1), the
estimation problem becomes computationally affordable.

2.2 Simultaneous Equations Model (SEM)
Representation
In this section, I present an SEM representation that enables
me to make the likelihood optimization computationally affordable. For that purpose, I specify some reduced-form parameters
in the variance-covariance matrix so that, given the resulting parameterization, estimates for these parameters have closed-form
solutions.
Importantly, consistency and asymptotic normality in the
SEM parameterization still rely exclusively on assumption (2).
Stationarity assumptions are avoided by considering an unrestricted mean vector and covariance matrix of the joint distribution of initial observations and individual effects. Also, neither time-series nor conditional homoscedasticity is assumed.
In the spirit of the simultaneous equations literature, the role
of the strictly exogenous regressors (i.e., the initial observations
yi0 and xi1 , and the wi regressors) is placed in the mean vector
of the endogenous variables. For this purpose, I first define the
following:

$$\eta_i = \phi_0 y_{i0} + x_{i1}'\phi_1 + \epsilon_i, \tag{8}$$
where the scalar $\phi_0$ and the k × 1 vector $\phi_1$ represent parameters to be estimated and $\epsilon_i$ can be interpreted as an individual-specific effect uncorrelated with the initial observations.¹¹
Along these lines, I now define the reduced-form equations
for the x variables as follows:
$$x_{it} = \pi_{t0} y_{i0} + \pi_{t1} x_{i1} + \pi_{tw} w_i + \xi_{it} \quad (t = 2, \dots, T), \tag{9a}$$
where $\xi_{it}$ and $\pi_{t0}$ are k × 1 vectors of prediction errors and coefficients, respectively; $\pi_{t1}$ is a k × k matrix and $\pi_{tw}$ a k × m matrix.
Moreover, by substituting (8) in (1),
$$y_{it} = \alpha y_{i,t-1} + x_{it}'\beta + \phi_0 y_{i0} + x_{i1}'\phi_1 + w_i'\delta + \epsilon_i + v_{it} \quad (t = 1, \dots, T). \tag{9b}$$

Note that the set of equations in (9a) only includes as right-hand side variables the initial observations [which can be considered as strictly exogenous according to assumption (2)] and the strictly exogenous regressors w. Therefore, reduced-form equations in (9a) no longer include either the feedback process or the dynamic relationships between the x’s. These relationships will be captured in the variance-covariance matrix of the model, as discussed below.
To write the system in a matrix form convenient for concentration of the reduced-form parameters, I define the following [T + (T − 1)k] × 1 vectors of data and errors for individual i:
$$\tilde R_i = (y_{i1}, y_{i2}, \dots, y_{iT}, x_{i2}', x_{i3}', \dots, x_{iT}')'$$
$$\tilde U_i = (\epsilon_i + v_{i1}, \dots, \epsilon_i + v_{iT}, \xi_{i2}', \dots, \xi_{iT}')'.$$
Therefore, the model can be written in matrix form as
$$\tilde B \tilde R_i = \tilde C z_i + \tilde U_i, \tag{10}$$
where $\tilde B$ and $\tilde C$ are matrices of coefficients defined below and $z_i$ is the (1 + k + m) × 1 vector of strictly exogenous variables: $z_i = (y_{i0}, x_{i1}', w_i')'$.

¹¹ Note that another possibility would be to set the parameters capturing the correlation between the effects and the initial observations ($\phi_0$, $\phi_1$) in the $\tilde\Sigma_{12}$ matrix below. Note also that in (8), I am implicitly assuming that cov($\eta_i$, $w_i$) = 0 to ensure identification of δ.


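Moreover, if we additionally define the following vectors
$$\tilde R_{i1} = (y_{i1}, y_{i2}, \dots, y_{iT})', \qquad \tilde R_{i2} = (x_{i2}', x_{i3}', \dots, x_{iT}')',$$
$$\tilde U_{i1} = (\epsilon_i + v_{i1}, \dots, \epsilon_i + v_{iT})', \qquad \tilde U_{i2} = (\xi_{i2}', \dots, \xi_{iT}')',$$
it is then possible to rewrite
$$\begin{pmatrix} \tilde B_{11} & \tilde B_{12} \\ 0 & I_{k(T-1)} \end{pmatrix} \begin{pmatrix} \tilde R_{i1} \\ \tilde R_{i2} \end{pmatrix} = \begin{pmatrix} \tilde C_1 \\ \tilde C_2 \end{pmatrix} z_i + \begin{pmatrix} \tilde U_{i1} \\ \tilde U_{i2} \end{pmatrix}, \tag{11}$$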

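where
$$\tilde B_{11} = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 & 0 \\
-\alpha & 1 & 0 & \cdots & 0 & 0 \\
0 & -\alpha & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -\alpha & 1
\end{pmatrix}_{T \times T}, \qquad
\tilde B_{12} = \begin{pmatrix}
0 & 0 & \cdots & 0 \\
-\beta' & 0 & \cdots & 0 \\
0 & -\beta' & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & -\beta'
\end{pmatrix}_{T \times k(T-1)},$$
$$\tilde C_1 = \begin{pmatrix}
\alpha + \phi_0 & \beta' + \phi_1' & \delta' \\
\phi_0 & \phi_1' & \delta' \\
\vdots & \vdots & \vdots \\
\phi_0 & \phi_1' & \delta'
\end{pmatrix}_{T \times (1+k+m)}, \qquad
\tilde C_2 = \begin{pmatrix}
\pi_{20} & \pi_{21} & \pi_{2w} \\
\vdots & \vdots & \vdots \\
\pi_{T0} & \pi_{T1} & \pi_{Tw}
\end{pmatrix}_{k(T-1) \times (1+k+m)}.$$
In contrast to the FCS representation, in the SEM parameterization, the number of nonzero coefficients in the matrix $\tilde B$ is only k + 1. This is so because the feedback process parameters together with the parameters capturing the dynamic relationships between the x regressors have been “translated” into the variance-covariance matrix of the model, which is no longer block-diagonal.¹²
In particular,
$$\tilde\Sigma = \mathrm{var}(\tilde U_i) = \mathrm{var}\begin{pmatrix} \tilde U_{i1} \\ \tilde U_{i2} \end{pmatrix} = \begin{pmatrix} \tilde\Sigma_{11} & \tilde\Sigma_{12} \\ \tilde\Sigma_{21} & \tilde\Sigma_{22} \end{pmatrix}, \tag{12}$$
where

1. $\tilde\Sigma_{11}$ has the classical error-component form but allowing for time-series heteroscedasticity:
$$\tilde\Sigma_{11} = \sigma_\epsilon^2 \iota\iota' + \mathrm{diag}\left\{\sigma_{v_1}^2, \dots, \sigma_{v_T}^2\right\},$$
where ι is a T × 1 vector of ones.
2. $\tilde\Sigma_{22}$ is the (T − 1)k × (T − 1)k covariance matrix that gathers all of the contemporaneous and dynamic relationships between the x variables:
$$\tilde\Sigma_{22} = \begin{pmatrix} \Sigma_{2,2} & \cdots & \Sigma_{2,T} \\ \vdots & \ddots & \vdots \\ \cdot & \cdots & \Sigma_{T,T} \end{pmatrix},$$
where $\Sigma_{t,h}$ is the k × k covariance matrix between $\xi_{it}$ and $\xi_{ih}$ for t = 2, . . . , T and h = 2, . . . , T. Note that the effects captured through parameters in $\tilde\Sigma_{22}$ and $\pi_{t1}$ were captured through $\Lambda_{th}$ (with h < t) and $\Sigma_{\vartheta_t}$ in the FCS parameterization in Section 2.1.
3. $\tilde\Sigma_{12}$ captures the feedback process. In particular, given the assumptions above for t = 2, . . . , T, we can write
$$\mathrm{cov}(\epsilon_i, \xi_{it}) = \phi_t \tag{13a}$$
$$\mathrm{cov}(v_{ih}, \xi_{it}) = \begin{cases} \psi_{th} & \text{if } h < t \\ 0 & \text{otherwise,} \end{cases} \tag{13b}$$
where $\phi_t$, $\psi_{th}$, and 0 are k × 1 vectors.¹³ Therefore
$$\tilde\Sigma_{12} = \begin{pmatrix}
\phi_2' + \psi_{21}' & \phi_3' + \psi_{31}' & \cdots & \phi_T' + \psi_{T1}' \\
\phi_2' & \phi_3' + \psi_{32}' & \cdots & \phi_T' + \psi_{T2}' \\
\phi_2' & \phi_3' & \cdots & \phi_T' + \psi_{T3}' \\
\vdots & \vdots & \ddots & \vdots \\
\phi_2' & \phi_3' & \cdots & \phi_T' + \psi_{T,T-1}' \\
\phi_2' & \phi_3' & \cdots & \phi_T'
\end{pmatrix}_{T \times (T-1)k}.$$

Note here that the $\psi_{th}$ (for h < t) parameters together with $\pi_{t0}$ capturing the feedback process in the SEM parameterization are equivalent to the $\gamma_{th}$ (for h < t) parameters in the FCS parameterization. Also, the $\phi_t$ vectors of parameters are equivalent to the $c_t$ vectors in the FCS parameterization for t = 0, . . . , T.
Rearranging (10), we have
$$\tilde R_i = \tilde\Pi(\tilde\theta)' z_i + \tilde U_i^{*} \tag{14}$$
with
$$\tilde\Pi(\tilde\theta) = (\tilde B^{-1}\tilde C)', \qquad \tilde U_i^{*} = \tilde B^{-1}\tilde U_i, \qquad \mathrm{var}(\tilde U_i^{*}) = \tilde\Omega(\tilde\theta) = \tilde B^{-1}\tilde\Sigma\tilde B'^{-1} \tag{15}$$
and
$$\tilde\theta = \left(\alpha, \beta, \delta, \sigma_\epsilon^2, \phi_0, \phi_1, \{\pi_{t0}, \mathrm{vec}(\pi_{t1}), \mathrm{vec}(\pi_{tw}), \mathrm{vech}(\Sigma_{t,h}), \phi_t, \psi_{th}\}_{t=2}^{t=T}, \{\sigma_{v_t}^2\}_{t=1}^{t=T}\right).$$

¹² Note also that the matrix $\pi_{tw}$ contains the parameters capturing the effect of $w_i$ on $x_{it}$, and it is therefore the equivalent to the $\kappa_t$ matrix in the FCS parameterization.
¹³ Other partial endogeneity configurations can also be accommodated in addition to the baseline predeterminedness assumption in (13a) and (13b). For example, allowing for nonzero correlations between $x_{it}$ and contemporaneous shocks ($v_{it}$) is straightforward by considering cov($v_{ih}$, $\xi_{it}$) = $\psi_{th}$ if h ≤ t instead of (13b).
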

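The joint distribution of $\tilde R$ conditional on Z can be written as
$$\tilde R \mid Z \sim N\left(Z\tilde\Pi(\tilde\theta),\ I_N \otimes \tilde\Omega(\tilde\theta)\right) \tag{16}$$
with the resulting log-likelihood¹⁴
$$\tilde L \propto -\frac{N}{2}\log\det\tilde\Omega(\tilde\theta) - \frac{1}{2}\mathrm{tr}\left\{\tilde\Omega(\tilde\theta)^{-1}\,[\tilde R - Z\tilde\Pi(\tilde\theta)]'\,[\tilde R - Z\tilde\Pi(\tilde\theta)]\right\}, \tag{17}$$
where $\tilde R$ is an N × [T + (T − 1)k] matrix that consists of the $\tilde R_i'$ row vectors of each of the N units in the panel, and Z stacks the $z_i'$ row vectors in the same way.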
As in the case of the FCS representation, the maximizer of L̃
is a consistent and asymptotically normal estimator regardless
of nonnormality. The resulting (pseudo) ML estimator is asymptotically equivalent to the GMM estimator discussed in Arellano
and Bond (1991) augmented with the moments discussed in Ahn
and Schmidt (1995) given by the lack of autocorrelation in the
errors.
Finally, given the SEM parameterization, some reduced-form coefficients in $\tilde\theta$ have closed-form solutions. Thus, the dimension of the problem can be drastically reduced if we consider the profile likelihood as described in Appendix A.1.
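The structure of the SEM coefficient and covariance blocks can be illustrated numerically. The sketch below builds the coefficient matrix of (11) and the error-component block of (12) for a scalar regressor (k = 1); the helper names `build_B_tilde` and `sigma_11` and all parameter values are hypothetical. Because the coefficient matrix is unit lower triangular in its first block and has an identity block below, its determinant equals one, which is the property used to simplify the likelihood.

```python
import numpy as np

def build_B_tilde(alpha, beta, T):
    """Coefficient matrix of the SEM system for k = 1 (illustrative)."""
    B11 = np.eye(T) - alpha * np.eye(T, k=-1)   # 1 on diagonal, -alpha below it
    B12 = np.zeros((T, T - 1))
    for row in range(1, T):                     # equation for y_{row+1}
        B12[row, row - 1] = -beta               # loads on x_{row+1}
    top = np.hstack([B11, B12])
    bottom = np.hstack([np.zeros((T - 1, T)), np.eye(T - 1)])
    return np.vstack([top, bottom])

def sigma_11(sigma_eps2, sigma_v2):
    """Error-component block: sigma_eps2 * iota iota' + diag(sigma_v2)."""
    sigma_v2 = np.asarray(sigma_v2, dtype=float)
    iota = np.ones((sigma_v2.size, 1))
    return sigma_eps2 * (iota @ iota.T) + np.diag(sigma_v2)

B = build_B_tilde(alpha=0.5, beta=0.3, T=3)
print(B.shape, np.linalg.det(B))
print(sigma_11(2.0, [1.0, 1.5, 2.0]))
```

The first row of the upper-right block is zero because the period-1 regressor is part of the exogenous vector of initial observations rather than of the endogenous x block.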
2.3 Asymptotic Normality and Relationship With
Method-of-Moments Estimators

E(Ri Ri′ ) = (θ ),

(18)
(19)

where 0 is a T (k + 1) + 1 × 1 vector of zeros and Ri and
(θ ) are defined above. Instead of a likelihood-based approach,
one can also consider the following moment conditions to
estimate θ :
vechE[Ri Ri′ − (θ )] = E[si − ω(θ )]

(20)

and proceed within a GMM framework provided that standard
identification conditions hold
θ̂GMM = arg min[s̄ − ω(c)]′ V̂ −1 [s̄ − ω(c)],

(21)

c

where s̄ is the sample mean of the q × 1 vector si and V̂ is some
consistent estimator of the covariance matrix V = var(si ).
In particular, the focus here is on GMM estimators based
on the optimal weighting matrix under normality V̂ −1 =
1 ′ ˆ −1
ˆ −1 )D with the selection matrix D given by D =
D ( ⊗ 
2

∂vec

ˆ = 1 N
and 
i=1 Ri Ri . Note that GMM estimators of
∂(vech)′
N
this type can be reformulated in terms of transformations of
the original moments in (20) (see Arellano 2003, p. 70); hence,
the GMM estimator in (21) belongs to the class of one-step
estimators discussed in Arellano and Bond (1991) augmented
14Since det(B̃) = 1 and given the properties of the trace of a product of matrices,
˜ − 1 tr(
˜ −1 Ũ ′ Ũ ).
the likelihood can also be expressed as L̃ ∝ − N2 log det( )
2
15Note

and the resulting estimator can be proved to be consistent and
asymptotically normal within the class of GMM estimators.16
In contrast, the pseudo ML estimator (ssLIML) discussed in
this article solves


N
1
−1 ′
log det (c) + tr((c) R R) .
θ̂ssLIML = arg min
2
2
c
(23)

Without considering strictly exogenous variables (w) for the
ease of exposition, the model in Equations (1) and (2) implies
the following first- and second-order moments of the observed
data as discussed in the previous sections:15
E(Ri ) = 0

with the Ahn and Schmidt (1995) quadratic moment restrictions
given by the lack of autocorrelation of the errors implied by assumption (2). Indeed, the number of overidentifying restrictions
(q − p) imposed by this GMM estimator coincides with the
number of restrictions imposed in the covariance matrix by the
pseudo ML estimator (ssLIML). For instance, if T = 3, k = 1, and m = 0, the aforementioned GMM estimator exploits seven moment conditions for estimating two parameters and thus imposes 7 − 2 = 5 overidentifying restrictions; for the ssLIML estimator in the same case, the unrestricted covariance matrix Ω includes (7 × 8)/2 = 28 parameters to be estimated, while the restricted version in (5) contains only 23 parameters, implying 28 − 23 = 5 overidentifying restrictions.
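The counting argument for this example can be checked with a few lines of Python. This is only a sketch of the arithmetic quoted in the text: the dimension 7 and the count of 23 restricted parameters are taken from the example above, and the function name is hypothetical.

```python
def n_cov_params(dim):
    """Number of distinct elements in an unrestricted dim x dim symmetric matrix."""
    return dim * (dim + 1) // 2

# Values quoted in the text for T = 3, k = 1, m = 0.
dim_r = 7                    # dimension implied by (7 x 8)/2 in the text
n_restricted = 23            # parameters in the restricted covariance matrix
n_moments, n_params = 7, 2   # GMM: seven moment conditions, two parameters

overid_ml = n_cov_params(dim_r) - n_restricted
overid_gmm = n_moments - n_params
print(overid_ml, overid_gmm)  # prints "5 5"
```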
The first-order conditions from the problem in (21) are


$$\left[\frac{\partial\omega(c)}{\partial c'}\right]'\hat{V}^{-1}\,[\bar{s} - \omega(c)] = 0 \qquad (22)$$

15 Note that the variables are expressed in deviations from their cross-sectional mean to consider time effects.

The first-order conditions resulting from this problem are
given by


$$\left[\frac{\partial\omega(c)}{\partial c'}\right]' D'\big(\Omega^{-1}(c) \otimes \Omega^{-1}(c)\big)D\,[\bar{s} - \omega(c)] = 0. \qquad (24)$$
The first-order conditions in (24) have the same form as those
of the GMM problem in Equation (22). The only difference
arises due to the (continuously updated) weighting matrix implicitly used by the optimization problem in (23). Therefore, the
asymptotic distribution of both estimators θ̂ssLIML and θ̂GMM is
the same with or without normality.
To formally establish this equivalence, together with the explicit formula for the asymptotic variance-covariance matrix, I restate the results in Section 4.4 of Chamberlain (1984). In particular, the following assumptions are imposed throughout:
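1. $\Theta \subset \mathbb{R}^p$ is an open set containing $\theta$.
2. $\omega$ is a continuous, one-to-one mapping of $\Theta$ into $\mathbb{R}^q$ with a continuous inverse.
3. $\omega$ has continuous second partial derivatives in $\Theta$.
4. $\mathrm{rank}\left[\partial\omega(c)/\partial c'\right] = p$ for $c \in \Theta$.
5. $\Omega(c)$ is nonsingular for $c \in \Theta$.
6. $\sqrt{N}(\bar{s} - \omega(\theta)) \xrightarrow{D} N(0, V)$, where $V = \mathrm{var}(R_i \otimes R_i)$, and $\bar{s} \xrightarrow{a.s.} \omega(\theta)$.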
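Theorem 1. Let Assumptions 1–6 hold. Then, for fixed T as N → ∞, $\sqrt{N}(\hat{\theta}_{ssLIML} - \theta)$ has the same limiting distribution as $\sqrt{N}(\hat{\theta}_{GMM} - \theta)$, which is given by $\sqrt{N}(\hat{\theta}_{ssLIML} - \theta) \xrightarrow{D} N(0, W)$, where $W = (G'\Psi G)^{-1}G'\Psi V\Psi G(G'\Psi G)^{-1}$, $G = \partial\omega(\theta)/\partial\theta'$, and $\Psi = (\Omega(\theta) \otimes \Omega(\theta))^{-1}$.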

16See Chamberlain (1982) and Abowd and Card (1989) for detailed analyses on
the estimation and testing of covariance structures from panel data.


Proof. See propositions 2 and 5 in Chamberlain (1984).17 

Theorem 1 states the asymptotic equivalence between $\hat{\theta}_{ssLIML}$ and $\hat{\theta}_{GMM}$ together with the common asymptotic variance-covariance matrix. If the distribution of $R_i$ is normal, W is the minimal variance-covariance matrix within this class of estimators (see Richard 1975). Under nonnormality both estimators remain consistent and asymptotically normal, but for large N they are inefficient relative to an alternative GMM estimator based on $\hat{V} = \frac{1}{N}\sum_{i=1}^{N} s_i s_i' - \bar{s}\bar{s}'$.
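The role of the weighting matrix in a sandwich variance of the form given in Theorem 1 can be illustrated with a small numerical sketch. The Python fragment below assumes a single parameter and two moments, so that G is a 2 × 1 vector; the numbers and the helper names are hypothetical and not taken from the article. It shows the standard result that the sandwich collapses to $(G'V^{-1}G)^{-1}$ when the weight equals $V^{-1}$, and that another weight (here the identity) gives a larger variance.

```python
def matvec(a, x):
    """Product of a 2 x 2 matrix (nested lists) and a 2-vector."""
    return [a[0][0] * x[0] + a[0][1] * x[1], a[1][0] * x[0] + a[1][1] * x[1]]

def dot(x, y):
    """Inner product of two 2-vectors."""
    return x[0] * y[0] + x[1] * y[1]

def inv2(a):
    """Inverse of a 2 x 2 matrix via the closed-form formula."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def sandwich_variance(g, weight, v):
    """(G'AG)^{-1} G'A V A G (G'AG)^{-1} for one parameter and a symmetric weight A."""
    ag = matvec(weight, g)
    bread = dot(g, ag)             # G'AG (scalar)
    meat = dot(ag, matvec(v, ag))  # G'A V A G (scalar)
    return meat / bread ** 2

g = [1.0, 2.0]                     # hypothetical Jacobian
v = [[2.0, 0.5], [0.5, 1.0]]       # hypothetical covariance of the moments
identity = [[1.0, 0.0], [0.0, 1.0]]

w_optimal = sandwich_variance(g, inv2(v), v)    # equals 1 / (G' V^{-1} G)
w_identity = sandwich_variance(g, identity, v)  # larger than w_optimal
```

This mirrors the efficiency statement above: a weight that matches the inverse covariance of the moments yields the smallest variance in the class, whereas a weight that is only optimal under normality need not do so when the data are nonnormal.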


3. MONTE CARLO EVIDENCE

In this section, I investigate the finite sample behavior of
the ssLIML estimator in relation to other available alternatives
for estimating panel data models with both lagged dependent
variable regressors and additional predetermined explanatory
variables correlated with the individual effects. For this purpose,
I closely follow the simulation setting in Bun and Kiviet (2006).
3.1 Data-Generating Process
I generate the simulated samples from a panel data model
with a lagged dependent variable, individual effects, and another (potentially predetermined) explanatory variable. More
specifically, data for the dependent variable y and the explanatory variable x are generated according to the following:
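$$y_{it} = \alpha y_{it-1} + \beta x_{it} + \eta_i + v_{it} \qquad (25)$$
$$x_{it} = \rho x_{it-1} + \phi y_{it-1} + \pi\eta_i + \xi_{it}, \qquad (26)$$

where $v_{it}$, $\xi_{it}$, and $\eta_i$ are generated as $v_{it} \sim iid(0, \sigma_v^2)$, $\xi_{it} \sim iid(0, \sigma_\xi^2)$, and $\eta_i \sim iid(0, \sigma_\eta^2)$, respectively. The parameter φ in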
(26) captures the feedback from the lagged dependent variable
to the regressor so that x is strictly exogenous if φ = 0.
This particular data-generating process (DGP) corresponds
to scheme 2 in Bun and Kiviet (2006).18 With respect to the
parameter values, I consider all of the feasible designs in Bun
and Kiviet (2006). In particular, the parameter values explored
include $\alpha = \{0.25, 0.75\}$, $\beta = 1 - \alpha$, $\rho = \{0.50, 0.95\}$, $\phi = [\phi_1(1-\alpha)(1-\rho)]/[1 + \beta\phi_1]$ with $\phi_1 = \{-1, 0, 1\}$, and $\pi = \pi_1(1 - \rho - \phi) - \phi/(1-\alpha)$ with $\pi_1 = \{-1, 0, 1\}$. The variance of the disturbance term $\sigma_v^2$ is normalized to 1, and the variance of the individual effects ($\sigma_\eta^2$) is fixed such that the impact of the two variance components $\eta_i$ and $v_{it}$ on the variance of $y_{it}$ has a ratio $\mu^2$ with $\mu = \{0, 1, 5\}$. Finally, for the regressor's disturbance ($\sigma_\xi^2$), I consider the signal-to-noise ratio ($\zeta$), which determines the usefulness of $x_{it}$ for explaining $y_{it}$ and is fixed to be either 3 or 9 in the simulations. Bun and Kiviet (2006)
provided more details about this choice of parameter values,
which covers a wide range of configurations.
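The mapping from the design parameters $(\alpha, \rho, \phi_1, \pi_1)$ to the structural parameters $(\beta, \phi, \pi)$ described above can be sketched in Python as follows. The function name is hypothetical, no feasibility filter is applied (so the grid below may contain designs that are excluded in the actual simulations), and the choice of $\sigma_\eta^2$ and $\sigma_\xi^2$ from $\mu$ and $\zeta$ is not covered because the corresponding formulas are given in Bun and Kiviet (2006) rather than here.

```python
from itertools import product

def design_parameters(alpha, rho, phi1, pi1):
    """Map (alpha, rho, phi1, pi1) to (beta, phi, pi) using the formulas in the text."""
    beta = 1.0 - alpha
    phi = phi1 * (1.0 - alpha) * (1.0 - rho) / (1.0 + beta * phi1)
    pi = pi1 * (1.0 - rho - phi) - phi / (1.0 - alpha)
    return beta, phi, pi

# Full Cartesian grid of the values listed in the text (before any feasibility check).
grid = list(product([0.25, 0.75], [0.50, 0.95], [-1, 0, 1], [-1, 0, 1]))
designs = [design_parameters(a, r, p1, q1) for a, r, p1, q1 in grid]

for (a, r, p1, q1), (beta, phi, pi) in list(zip(grid, designs))[:3]:
    print(a, r, p1, q1, round(beta, 4), round(phi, 4), round(pi, 4))
```

Setting $\phi_1 = 0$ gives $\phi = 0$, that is, the strictly exogenous case.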
17 Note that since $\bar{s}$ is a sample average, $\hat{\theta}_{GMM}$ is both a GMM and a minimum distance estimator in the terminology of Chamberlain (1984).

The resulting combinations of parameter values are such that the processes for y and x are both stable but not necessarily stationary. Bun and Kiviet (2006) only considered stationarity settings by additionally assuming that both processes started in the distant past. Equivalently, stationarity can also be achieved by imposing that the distribution of the initial observations coincides with the steady-state distribution. In particular, stationarity in mean requires that the mean of the initial observation conditional on the individual effects coincides with the steady-state mean. Since I generate the initial observations according to
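$$y_{i0} = \pi_{y0}\eta_i + v_{i0} \qquad (27)$$
$$x_{i0} = \pi_{x0}\eta_i + \xi_{i0}, \qquad (28)$$

mean-stationarity can be achieved by imposing

$$\pi_{y0} = \frac{\beta\pi + (1-\rho)}{(1-\rho)(1-\alpha) - \beta\phi} \qquad (29)$$
$$\pi_{x0} = \frac{\phi + \pi(1-\alpha)}{(1-\rho)(1-\alpha) - \beta\phi} \qquad (30)$$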

while the processes can be mean-nonstationary provided πy0
and πx0 are left unrestricted.
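The data-generating process in (25)–(28) together with the mean-stationarity restrictions (29)–(30) can be sketched as below. This is an illustration under stated assumptions: the disturbances are drawn from Gaussian distributions (the text only specifies iid with zero mean and given variance), the initial shocks $v_{i0}$ and $\xi_{i0}$ use the same standard deviations as later periods, the parameter values are illustrative only, and all function names are hypothetical.

```python
import random

def steady_state_loadings(alpha, beta, rho, phi, pi):
    """Loadings of y_i0 and x_i0 on eta_i that impose mean-stationarity, (29)-(30)."""
    denom = (1.0 - rho) * (1.0 - alpha) - beta * phi
    pi_y0 = (beta * pi + (1.0 - rho)) / denom
    pi_x0 = (phi + pi * (1.0 - alpha)) / denom
    return pi_y0, pi_x0

def step(y_prev, x_prev, eta, v, xi, alpha, beta, rho, phi, pi):
    """One period of (25)-(26): x_it is formed first and then enters y_it."""
    x = rho * x_prev + phi * y_prev + pi * eta + xi
    y = alpha * y_prev + beta * x + eta + v
    return y, x

def simulate_unit(T, alpha, beta, rho, phi, pi, pi_y0, pi_x0,
                  sd_eta, sd_v, sd_xi, rng):
    """Simulate (y_i0..y_iT, x_i0..x_iT) for one unit (Gaussian draws are an assumption)."""
    eta = rng.gauss(0.0, sd_eta)
    y = [pi_y0 * eta + rng.gauss(0.0, sd_v)]    # initial observation (27)
    x = [pi_x0 * eta + rng.gauss(0.0, sd_xi)]   # initial observation (28)
    for _ in range(T):
        y_t, x_t = step(y[-1], x[-1], eta, rng.gauss(0.0, sd_v),
                        rng.gauss(0.0, sd_xi), alpha, beta, rho, phi, pi)
        y.append(y_t)
        x.append(x_t)
    return y, x

rng = random.Random(0)
alpha, rho, phi, pi = 0.25, 0.5, 0.1, 0.2       # illustrative values only
beta = 1.0 - alpha
pi_y0, pi_x0 = steady_state_loadings(alpha, beta, rho, phi, pi)
panel = [simulate_unit(4, alpha, beta, rho, phi, pi, pi_y0, pi_x0,
                       1.0, 1.0, 1.0, rng) for _ in range(50)]
```

A mean-nonstationary design is obtained by passing values of `pi_y0` and `pi_x0` other than those returned by `steady_state_loadings`.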
Allowing for mean-nonstationarity may be desirable in many
empirical applications. For instance, when analyzing cross-country datasets starting at the end of a war (or micro panels for
young workers or new firms), initial conditions at the start of the
sample may not be representative of the steady-state behavior
of the different processes, and thus mean-stationarity may not
hold. Therefore, in contrast to Bun and Kiviet (2006), I also
consider the case of mean-nonstationarity in the simulations.
3.2 Estimators

In addition to the ssLIML estimator discussed in Section 2,
the performance of seven competing estimators is evaluated in
the simulations.
Note that Equation (25) can be rewritten as

$$y_{it} = h_{it}'b + \eta_i + v_{it} \qquad (31)$$

with $h_{it} = (y_{it-1}, x_{it})'$ and $b = (\alpha, \beta)'$.
The standard assumption considered throughout the article [see Equation (2)] is given by

$$E(v_{it} \mid h_i^t, \eta_i) = 0. \qquad (32)$$

Under assumption (32), the following set of linear moment conditions is available:19

$$E[h_i^{t-1}(y_{it}^* - h_{it}^{*\prime}b)] = 0. \qquad (33)$$

Based on these orthogonality conditions, I first consider the
well-known first-differenced GMM estimator (e.g., Arellano
and Bond 1991), which solves
$$\hat{b}_{GMM} = \arg\min_{b}\,(y^* - H^*b)'ZA_NZ'(y^* - H^*b), \qquad (34)$$

where $y_i^* = (y_{i1}^*, \ldots, y_{i,T-1}^*)'$, $h_i^* = (h_{i1}^{*\prime}, \ldots, h_{i,T-1}^{*\prime})'$, $y^* = (y_1^{*\prime}, \ldots, y_N^{*\prime})'$, $H^* = (h_1^{*\prime}, \ldots, h_N^{*\prime})'$, and $Z = (Z_1', \ldots, Z_N')'$. $Z_i$ is a block diagonal matrix whose $t$th block is $h_i^{t-1}$. In particular, I consider the one-step nonrobust weighting matrix $A_N \propto (Z'Z)^{-1}$.
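The starred variables entering (33) and (34) are forward orthogonal deviations. As a sketch, the usual definition (each observation minus the mean of its remaining future observations, rescaled to equalize variances) can be coded as follows; the formula is the standard one from Arellano and Bover (1995) and is not spelled out in this article, and the function name is hypothetical.

```python
import math

def forward_orthogonal_deviations(series):
    """Return the T-1 forward orthogonal deviations of a length-T series."""
    T = len(series)
    out = []
    for t in range(T - 1):
        future = series[t + 1:]               # remaining future observations
        n_future = len(future)
        scale = math.sqrt(n_future / (n_future + 1.0))
        out.append(scale * (series[t] - sum(future) / n_future))
    return out

# A time-invariant component is removed by the transformation.
constant_part = forward_orthogonal_deviations([3.0, 3.0, 3.0, 3.0])
example = forward_orthogonal_deviations([1.0, 2.0, 3.0])
```

Because any time-invariant component is wiped out, applying the transformation to (31) eliminates the individual effect $\eta_i$.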

18 I consider this scheme because, as acknowledged by Bun and Kiviet (2006), it is more realistic than their baseline scheme 1, considered for convenience in the evaluation of their analytical results.

19 Starred variables denote forward orthogonal deviations (see Arellano and Bover 1995).


Since b̂GMM is expected to have poor finite-sample properties
[see for instance Blundell and Bond (1998)], I also consider
two asymptotically equivalent alternatives that may have different finite-sample properties. First, the symmetrically normalized
GMM estimator (henceforth SNM) proposed in Alonso-Borrego
and Arellano (1999) exploits the vector of orthogonality conditions in (33) but normalized to have unit length:
$$\hat{b}_{SNM} = \arg\min_{b}\,\frac{(y^* - H^*b)'ZA_NZ'(y^* - H^*b)}{1 + b'b}. \qquad (35)$$

Second, LIML analog estimators (e.g., Alonso-Borrego and Arellano 1999; Akashi and Kunitomo 2010), or “continuously updated GMM” in the terminology of Hansen, Heaton, and Yaron (1996), minimize the maximum possible sample correlation between the errors and the instruments according to moment conditions in (33). In particular, we consider in our simulation study a nonrobust LIML analogue that minimizes a criterion of the form

$$\hat{b}_{LIML} = \arg\min_{b}\,(y^* - H^*b)'ZA_N(b)Z'(y^* - H^*b), \qquad (36)$$

where $A_N(b) = (Z'Z)^{-1}/[(y^* - H^*b)'(y^* - H^*b)]$.
The estimators $\hat{b}_{GMM}$, $\hat{b}_{SNM}$, and $\hat{b}_{LIML}$ are all based on the same moment conditions in (33) and have the same asymptotic distribution as N → ∞ for fixed T (see Alonso-Borrego and Arellano 1999 for more details). However, they are expected to have different sampling behavior mainly because of the alternative normalization rules adopted in the three cases (see Hillier 1990).
As suggested by Ahn and Schmidt (1995), henceforth AHNS, in addition to the orthogonality conditions in (33), we can also consider the following moment conditions given the lack of serial correlation in the errors implied by (32):

$$E[\Delta v_{i,t-1}(y_{it} - h_{it}'b)] = 0 \quad (t = 3, \ldots, T) \qquad (37)$$

and exploit them within the standard GMM framework above. The resulting GMM estimator, labeled here as $\hat{b}_{AHNS}$, minimizes a criterion such as the one in (34) but also using lagged first-differenced errors as instruments for the errors in levels. Note that the GMM estimation problem becomes nonlinear in this case.
The ssLIML estimator ($\hat{b}_{ssLIML}$) proposed in this article implicitly considers these additional moment conditions, and it can be shown to be asymptotically equivalent to $\hat{b}_{AHNS}$ as N → ∞ for fixed T (see Section 2.3).
In addition to the baseline assumption in (32), one can also assume that the processes for y and x are mean-stationary. Mean-stationarity implies that $E(y_{it} \mid \eta_i)$ and $E(x_{it} \mid \eta_i)$ are time-invariant so that changes in $y_{it}$ or $x_{it}$ are mean independent of the individual effect $\eta_i$:

$$E(y_{it} - y_{i,t-1} \mid \eta_i) = 0 \qquad (38)$$
$$E(x_{it} - x_{i,t-1} \mid \eta_i) = 0. \qquad (39)$$

If we are willing to consider this mean-stationarity assumption, we can also exploit the resulting moment conditions as discussed in Arellano and Bover (1995). In particular, under assumptions (38) and (39), first differences of the right-hand-side variables can be used as instruments for the errors in levels:

$$E[\Delta h_{it}(y_{it} - h_{it}'b)] = 0 \quad (t = 3, \ldots, T). \qquad (40)$$

As a result, the so-called sGMM estimator ($\hat{b}_{sGMM}$) solves the minimization problem in (34) using the combined set of moments in (40) and (33) and the matrices of data and instruments defined accordingly [see Arellano and Bover (1995) for more details].
The sGMM estimator is also included in the simulation given its popularity in applied research. However, it is worth emphasizing that the remaining estimators considered (i.e., GMM, SNM, LIML, AHNS, and ssLIML) are consistent under mean-nonstationarity while they remain consistent under mean-stationarity. In contrast, sGMM requires mean-stationarity (which might not hold in many empirical applications) for consistency.
Finally, the ML estimator in first differences suggested in Hsiao, Pesaran, and Tahmiscioglu (2002) is also included in the simulation exercise. In particular, this estimator (labeled as $\hat{b}_{HPT}$) solves

$$\hat{b}_{HPT} = \arg\min_{b}\left\{\frac{N}{2}\ln\det\Omega^{\S} + \frac{1}{2}\sum_{i=1}^{N}\Delta u_i^{\S\prime}(\Omega^{\S})^{-1}\Delta u_i^{\S}\right\}, \qquad (41)$$

where $\Omega^{\S}$ and $\Delta u_i^{\S}$ are given by equations (3.2) and (4.16) in Hsiao, Pesaran, and Tahmiscioglu (2002). Note that, in contrast to the remaining estimators considered here (i.e., GMM, SNM, LIML, AHNS, ssLIML, and sGMM), $\hat{b}_{HPT}$ will be inconsistent for fixed T as N → ∞ if the unconditional variances of the errors vary over time.
For the sake of completeness, we also consider the within-group or least-squares dummy-variable (henceforth LSDV) estimator which is given by the slope coefficients in an OLS regression of $y_{it}$ on $h_{it}$ and a full set of individual dummy variables, or equivalently by the OLS estimate in deviations from time means or orthogonal deviations.
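To make the different normalizations in criteria (34), (35), and (36) concrete, the following Python sketch evaluates the three objective functions at a given value of b in a deliberately simplified setting: one regressor (scalar b), two instruments, and $A_N = (Z'Z)^{-1}$. The data and the function name are hypothetical, and the block-diagonal instrument structure of the panel estimators is not reproduced. All three criteria share the same quadratic form and differ only in the denominator.

```python
def criteria(b, y, h, z1, z2):
    """Return the GMM (34), SNM (35), and LIML-type (36) criteria at a scalar b."""
    e = [yi - hi * b for yi, hi in zip(y, h)]       # residuals y* - H* b
    g1 = sum(zi * ei for zi, ei in zip(z1, e))      # first element of Z'e
    g2 = sum(zi * ei for zi, ei in zip(z2, e))      # second element of Z'e
    s11 = sum(zi * zi for zi in z1)
    s12 = sum(a * c for a, c in zip(z1, z2))
    s22 = sum(zi * zi for zi in z2)
    det = s11 * s22 - s12 ** 2
    # e'Z (Z'Z)^{-1} Z'e using the closed-form 2 x 2 inverse.
    q = (s22 * g1 * g1 - 2.0 * s12 * g1 * g2 + s11 * g2 * g2) / det
    gmm = q
    snm = q / (1.0 + b * b)                         # symmetric normalization
    liml = q / sum(ei * ei for ei in e)             # normalization by e'e
    return gmm, snm, liml

# Hypothetical transformed data and instruments.
y_star = [1.0, 2.0, 0.5, -1.0, 0.3]
h_star = [0.8, 1.5, 0.7, -0.9, 0.1]
z1 = [1.0, 1.2, 0.4, -1.1, 0.2]
z2 = [0.5, -0.3, 1.0, 0.2, -0.8]

for b in (0.0, 0.5, 1.0):
    print(b, criteria(b, y_star, h_star, z1, z2))
```

Since the quadratic form is a projection of the residuals on the instruments, the LIML-type criterion always lies between 0 and 1, which is one way to see that it measures the share of residual variation explained by the instruments.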
3.3 Results
For each possible combination of parameter values (i.e., design), I generate 5000 samples. Then I compute the median bias,
the 75th–25th interquartile range (iqr), and the median absolute
error (MAE) for all of the estimators evaluated, that is, GMM,
SNM, LIML, AHNS, ssLIML, sGMM, HPT, and LSDV (means
and standard deviations are not reported because SNM, LIML,
and ssLIML estimators are expected to have infinite moments).
The large number of designs considered precludes the discussion of all of the results in detail. Based on MAEs, Figure 1
depicts an overview of the relative performance of the estimators
for all of the DGPs considered with (N, T ) = (50, 4). Figure 1
presents MAEs for α (y-axis) and β (x-axis) when the DGPs are
mean-stationary.
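The three summary measures reported for each estimator and design can be computed from the simulated estimates as in the sketch below. The function name is hypothetical, and the quartile interpolation rule (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, since the article does not state which convention is used.

```python
import statistics

def summarize(estimates, true_value):
    """Return (median bias, interquartile range, median absolute error)."""
    median_bias = statistics.median(estimates) - true_value
    q1, _, q3 = statistics.quantiles(estimates, n=4, method="inclusive")
    mae = statistics.median(abs(e - true_value) for e in estimates)
    return median_bias, q3 - q1, mae

# Hypothetical Monte Carlo estimates of a parameter whose true value is 0.25.
median_bias, iqr, mae = summarize([0.1, 0.2, 0.3, 0.4, 0.5], 0.25)
```

Unlike means and standard deviations, these measures remain well defined when an estimator has no finite moments.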
MAEs corresponding to the LSDV estimator present the expected pattern (large MAEs for α, while MAEs for β are relatively large only when $\sigma_\xi^2$ is small regardless of the feedback; moreover, LSDV presents the lowest iqrs). In the case of GMM, MAEs are of similar magnitude for α and β and relatively large in both cases, especially for large values of $\sigma_\eta^2$ and α. SNM presents systematically lower biases for both α and β but also larger iqrs, and thus MAEs are relatively similar to those of GMM. The AHNS estimator presents both lower biases and iqrs resulting in a better performance in terms of MAEs, especially in the case of α. In contrast, LIML suffers from large biases and iqrs when $\sigma_\xi^2$ is large relative to $\sigma_\eta^2$ so that its overall performance is relatively worse than the other estimators. The sGMM estimator performs relatively well under mean-stationarity and, overall, it seems to be preferable to GMM, SNM, and LIML (note that in many designs, there are no individual effects and then sGMM is expected to perform well). ssLIML presents MAEs similar to AHNS (slightly lower for α and larger for β) caused by lower biases for both α and β together with larger iqrs for β. Also, the performance of the HPT estimator is similar to that of AHNS and ssLIML (a bit better for β and somewhat worse for α), given its lower iqrs and slightly larger biases. Overall, MAEs in Figure 1 seem to favor AHNS, HPT, and ssLIML over other estimators.

Figure 1. Median absolute errors (MAEs) under mean-stationarity. This figure presents the MAEs for all estimators of α and β corresponding to all the designs considered in our simulations under mean-stationarity. [Panels: LSDV, GMM, SNM, LIML, AHNS, ssLIML, sGMM, HPT; horizontal axis: β; vertical axis: α.]

Figure 2 plots MAEs in the case of mean-nonstationarity designs. Overall, one can conclude that the relative performance of the competing estimators is very similar to that of mean-stationary settings. This is so because, with the exception of sGMM, the considered estimators do not require mean-stationarity for fixed-T consistency. Therefore, only sGMM tends to present larger MAEs, thus performing considerably worse than in the case of mean-stationary DGPs. Moreover, for large values of $\sigma_\eta^2$ relative to $\sigma_v^2$, they all improve because the correlation between lagged levels and first differences gets larger as shown in Hayakawa (2009). Note also that many MAEs plotted in Figures 1 and 2 correspond to designs with $\sigma_\eta^2 = 0$, so the effect of mean-stationarity assumptions becomes negligible.
Finally, it is worth stressing that very different configurations

are “aggregated” in Figures 1 and 2, so the r