Journal of Business & Economic Statistics
ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20
Rethinking the Univariate Approach to Panel Unit
Root Testing: Using Covariates to Resolve the
Incidental Trend Problem
Joakim Westerlund
To cite this article: Joakim Westerlund (2015) Rethinking the Univariate Approach to Panel Unit
Root Testing: Using Covariates to Resolve the Incidental Trend Problem, Journal of Business &
Economic Statistics, 33:3, 430-443, DOI: 10.1080/07350015.2014.962697
To link to this article: http://dx.doi.org/10.1080/07350015.2014.962697
Accepted author version posted online: 25
Sep 2014.
Rethinking the Univariate Approach to Panel
Unit Root Testing: Using Covariates to Resolve
the Incidental Trend Problem
Joakim WESTERLUND
Department of Economics, Lund University, SE-22007 Lund, Sweden; Deakin University, 3125 Burwood, Australia
([email protected])
In an influential article, Hansen showed that covariate augmentation can lead to substantial power gains
when compared to univariate tests. In this article, we ask whether this result extends to the panel data
context. The answer turns out to be yes, which is perhaps not that surprising. What is surprising, however,
is the extent of the power gain, which is shown to more than outweigh the well-known power loss in the
presence of incidental trends. That is, the covariates have an order effect on the neighborhood around unity
for which local asymptotic power is negligible.
KEY WORDS: Covariates; Incidental trends; Local asymptotic power; Panel data; Unit root test.
1. INTRODUCTION
As is well known, univariate unit root tests, such as the
conventional augmented Dickey–Fuller (ADF) test, have low
power, and much effort has therefore gone into the development
of various modifications aimed to increase power (see Leybourne, Kim, and Newbold 2005, and the references provided
therein). In many cases, however, there is more information to be
had, and then power can be increased without the need for such
modifications. For example, in regression analysis, due to the
risk of obtaining spurious results, it is quite common to pretest
for unit roots, and then it seems quite natural to try to make
use also of the information contained in the other variables of
the model. After all, we typically do not use regressions unless
we believe that the included variables are correlated. This is the
idea of Hansen (1995), who developed a covariate augmented
ADF (CADF) test that is shown to be at least as powerful as
the ADF test. (The CADF considered here is not to be confused
with the cross-sectionally augmented Dickey–Fuller of Pesaran
(2007).)
But while the CADF approach has attracted some attention,
the single most common way by far in which researchers have
been trying to increase the power of the ADF test is through the
use of panel data. Thus, in this case the source of extraneous
information is not a set of correlated covariates but rather a
cross-section of similar units.
In light of these developments, the question naturally arises
whether there are any power gains to be made by considering a panel
CADF (PCADF) test that exploits both sources of information.
Intuitively, since the two types of information are individually
important for power, there should be some merit in combining
them. Of course, this article is not the first to recognize the
value of covariate augmentation in a panel data context (see,
e.g., Pesaran 2007; Chang and Song 2009; Pesaran, Smith, and
Yamagata 2013, who used covariates to address the problem
of cross-section dependence); however, it is the first to study
analytically the power implications of doing so. In other words,
while previously the rationale for the covariates (in panels) has
always been to improve upon size accuracy, no one has yet
considered their effect on power.
Our main finding is that the information contained in the
covariates is useful when testing for a unit root in panel data, and
that the power of the PCADF test can be substantially increased,
far beyond that achievable by existing tests that do not employ
any covariate information. The largest difference occurs in the
presence of incidental trends, which are problematic in the sense
that their estimation is known to lead to low power. In fact, as
Moon, Perron, and Phillips (2007, p. 445) concluded from their
analysis of the local power of univariate panel unit root tests
with trends:
An important empirical consequence of the present investigation is that increasing the complexity of the fixed effects in a
panel model inevitably reduces the potential power of unit root
tests. This reduction in power has a quantitative manifestation
in the radial order of the shrinking neighborhoods around unity
for which asymptotic power is nonnegligible. When there are
no fixed effects or constant fixed effects, tests have power in a
neighborhood of unity of order $1/(\sqrt{N}T)$ (where $N$ and $T$ denote
the size of the cross-section and time dimensions, respectively).
When incidental trends are fitted, the tests only have power in a
larger neighborhood of order $1/(N^{1/4}T)$.
Moon and Perron (1999) showed that the maximum likelihood estimator of the local-to-unity parameter in near unit
root panels is inconsistent. They called this phenomenon,
which arises because of the presence of an infinite number
of nuisance parameters, an “incidental trend problem,” because it is analogous to the well-known incidental parameter
problem in dynamic fixed-T panels. The above-mentioned reduction in the order of the shrinking neighborhoods around
unity for which power is nonnegligible is a manifestation of
this problem, and has in fact given rise to a separate literature (see, e.g., Moon and Perron 2004; Moon and Phillips
2004; Moon, Perron, and Phillips 2007; Phillips and Sul
2007). One of the main conclusions from this literature is
that the incidental trend problem is a general phenomenon that
applies to all panel unit root tests. Indeed, as Moon, Perron, and
Phillips (2007, p. 445) concluded, “the present article shows that
discriminatory power against a unit root is generally weakened
as more complex deterministic regressors are included.”
In this article, we show that this need not be the case, and that
the use of covariates can compensate for the loss of power caused
by the incidental trends. That is, the PCADF test has nonnegligible power within $1/(\sqrt{N}T)$-neighborhoods of the null even
if incidental trends are present. This property makes PCADF
unique, as there is presently no other test with incidental trends
that has power within such neighborhoods. Conversely, if the
rate of shrinking is given by $1/(N^{1/4}T)$, unlike existing tests, the
power of PCADF is actually increasing in $N$.
2. MODEL AND ASSUMPTIONS
Consider the panel variable $y_{i,t}$, observable for $t = 1, \ldots, T$
time series and $i = 1, \ldots, N$ cross-section units. The data-generating process (DGP) of this variable is given by
$$y_{i,t} = \theta_i' d_t + u_{i,t}, \tag{1}$$
where $u_{i,t}$ is the stochastic part of $y_{i,t}$, while $d_t = (1, t)'$ is the deterministic part, for which there are two models: (1) $\theta = (\theta_1, \theta_2)'$
with $\theta_1$ unrestricted and $\theta_2 = 0$ (unit-specific intercepts), and
(2) $\theta$ unrestricted (unit-specific intercepts and incidental time
trends). Thus, while our main focus lies with model 2, for completeness, we will also consider model 1. The stochastic part is
allowed to depend on an $m$-vector of covariates, $x_{i,t}$, which could
potentially be common across $i$ (thereby allowing for some form
of cross-section dependence). Specifically,
$$\phi_i(L)\Delta u_{i,t} = \rho_i u_{i,t-1} + v_{i,t}, \tag{2}$$
$$v_{i,t} = \lambda_i(L)'(x_{i,t} - \gamma_i) + \epsilon_{i,t}, \tag{3}$$
$$\Phi_i(L)(x_{i,t} - \gamma_i) = \varepsilon_{i,t}, \tag{4}$$
where $\gamma_i = E(x_{i,t})$, and $\Phi_i(L) = I_m - \sum_{j=1}^{p} \Phi_{ji} L^j$, $\phi_i(L) = 1 - \sum_{j=1}^{p} \phi_{ji} L^j$, and $\lambda_i(L) = \sum_{j=0}^{p} \lambda_{ji} L^j$ are polynomials in
the lag operator $L$. In the assumptions that follow, $k$ denotes
a generic constant, and $\mathrm{tr}(A)$ and $\|A\| = \sqrt{\mathrm{tr}(A'A)}$ denote the
trace and Frobenius (Euclidean) norm of the matrix $A$, respectively.
Assumption 1.
(i) $\eta_{i,t} = (\epsilon_{i,t}, \varepsilon_{i,t}')'$ is independent and identically distributed (iid) such that $E(\eta_{i,t}) = 0$, $E(\eta_{i,t}\eta_{i,t}') = \Sigma_{\eta i} = \mathrm{diag}(\sigma_{\epsilon i}^2, \Sigma_{\varepsilon i}) > 0$ and $E(\|\eta_{i,t}\|^k) < \infty$ for $k \geq 4$;
(ii) $E(\|(u_{i,-p}, x_{i,-p}')'\|^2), \ldots, E(\|(u_{i,0}, x_{i,0}')'\|^2) < \infty$;
(iii) $\Phi_i(L)$ and $\phi_i(L)$ have all roots outside the unit circle, and $\sum_{j=0}^{p} \|\lambda_{ji}\| < \infty$.
Similar to Hansen (1995), our asymptotic analysis supposes
that $\rho_i$ is local-to-zero as $N, T \to \infty$. However, since we are
using panel data, the rate of shrinking is different. In particular,
it is assumed that
$$\rho_i = \frac{\phi_i(1) c_i}{N^{\kappa} T}, \tag{6}$$
where $\kappa > 0$ is a constant and $c_i$ is a drift parameter that satisfies
Assumption 2.
Assumption 2.
(i) $c_i$ is iid with $\mu_{kc} = E(c_i^k) < \infty$ for $k \geq 3$ and $\mu_{0c} = 1$;
(ii) $c_i$ and $\eta_{i,t}$ are mutually independent.
If $c_i = \rho_i = 0$, then $y_{i,t}$ is unit root nonstationary, whereas if
$c_i \neq 0$, then $y_{i,t}$ is either locally stationary ($c_i < 0$) or locally explosive ($c_i > 0$). The null and alternative hypotheses considered
here are given by $H_0: c_1 = \cdots = c_N = 0$ and $H_1: c_i \neq 0$ for
some $i$, respectively, which can be formulated more compactly
in terms of the second moment of $c_i$ as $H_0: \mu_{2c} = 0$ and $H_1: \mu_{2c} > 0$,
respectively.
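The DGP in (1)–(4) with the local-to-zero parameterization of $\rho_i$ can be illustrated with a minimal simulation sketch. It assumes the simplest case, $\phi_i(L) = \Phi_i(L) = 1$, $\lambda_i(L) = \lambda_0$, $m = 1$, $\gamma_i = 0$, standard normal errors, and a zero initial value; the function name `simulate_unit` and its parameters are hypothetical and chosen only for illustration.

```python
import random


def simulate_unit(T, N, c_i, kappa=0.5, lam0=1.0, theta=(0.0, 0.0), rng=None):
    """Simulate (y, x) for one unit under the simplifying assumptions above.

    With phi_i(L) = 1 the drift equation reduces to rho_i = c_i / (N**kappa * T).
    """
    rng = rng or random.Random()
    rho_i = c_i / (N ** kappa * T)   # local-to-zero coefficient
    u_prev = 0.0                     # zero initial value (a simplification)
    y, x = [], []
    for t in range(1, T + 1):
        x_t = rng.gauss(0.0, 1.0)    # covariate, equation (4) with Phi_i(L) = 1
        eps_t = rng.gauss(0.0, 1.0)  # idiosyncratic error
        v_t = lam0 * x_t + eps_t     # equation (3) with lambda_i(L) = lam0
        u_t = (1.0 + rho_i) * u_prev + v_t  # equation (2): Delta u = rho * u_{t-1} + v
        y.append(theta[0] + theta[1] * t + u_t)  # equation (1) with d_t = (1, t)'
        x.append(x_t)
        u_prev = u_t
    return y, x
```

Setting `c_i = 0` gives a unit root series, while `c_i < 0` gives a locally stationary one; passing `theta=(a, b)` with `b != 0` corresponds to model 2.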
As Hansen (1995) showed, with serially correlated errors, the power of the CADF test depends not only on $c_i$,
but also on the long-run correlation coefficient between $v_{i,t}$
and $\epsilon_{i,t}$, as given by $\rho_{v\epsilon i} = \sigma_{\epsilon i}/\sigma_{vi} \in (0, 1]$, where $\sigma_{vi}^2 = \lambda_i(1)'\Phi_i(1)^{-1}\Sigma_{\varepsilon i}\Phi_i(1)^{-1}\lambda_i(1)' + \sigma_{\epsilon i}^2$ (see the Appendix). Thus,
if $\rho_{v\epsilon i} \to 1$, then $x_{i,t}$ does not make any contribution to the variation in $y_{i,t}$, whereas if $\rho_{v\epsilon i} \to 0$, then $x_{i,t}$ explains all the
variation in $y_{i,t}$.
Assumption 3. $\sum_{i=1}^{N} \rho_{v\epsilon i}^k / N \to \bar{\rho}_{kv\epsilon} \in (0, \infty)$ as $N \to \infty$
for $k \in (-\infty, \infty)$.
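As a small numerical illustration of the long-run correlation and its cross-sectional averages, the sketch below assumes a scalar covariate with $\Phi_i(L) = 1$ and $\lambda_i(L) = \lambda_0$, so that $\sigma_{vi}^2 = \lambda_0^2\sigma_{\varepsilon i}^2 + \sigma_{\epsilon i}^2$. The function names are hypothetical, and `rho_bar` is only the finite-$N$ analogue of the limit in Assumption 3.

```python
import math


def long_run_correlation(sigma_eps, lam0, sigma_x=1.0):
    """rho = sigma_eps / sigma_v, with sigma_v^2 = lam0^2 * sigma_x^2 + sigma_eps^2."""
    sigma_v = math.sqrt(lam0 ** 2 * sigma_x ** 2 + sigma_eps ** 2)
    return sigma_eps / sigma_v


def rho_bar(rhos, k):
    """Cross-sectional average of rho_i**k over the units in `rhos`."""
    return sum(r ** k for r in rhos) / len(rhos)
```

With `lam0 = 0` the covariate is irrelevant and the correlation equals one; larger values of `lam0` push it toward zero, which widens the gap between the averages for `k = -1` and `k = 1`.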
Remark 1. The assumption that ηi,t is cross-section independent is restrictive, but can be relaxed by requiring that some of
the elements of xi,t are constant in i, in which case (3) becomes a
common factor model, with the common elements of xi,t taking
the role of common factors. In the article, we assume that xi,t is
known, in which case the presence of common covariates does
not affect the results. If there are common covariates that are
unobserved, then one possibility is to follow Bai and Ng (2004),
and use estimated principal component factors in their stead.
In Section 4, we elaborate on this point. For the time being,
however, we maintain the assumption that xi,t is known.
Remark 2. Assumption 1 (ii) ensures that the initial values
of ui,t and xi,t are Op (1), which is relevant if the initialization
took place somewhere in the recent past. While admittedly the
simplest way to relax the otherwise so common zero initial
value assumption (see, e.g., Moon, Perron, and Phillips 2007),
the results reported herein hold also when the initialization is in
the distant past, such that the initial values of $u_{i,t}$ and $x_{i,t}$ are
$O_p(\sqrt{T})$ (see Westerlund 2014a). This is different from the time
series case where the size of the initial value strongly influences
the performance of unit root tests, up to the point of reversing
the ranking of different tests (see, e.g., Müller and Elliott 2003).
Remark 3. The requirement that ci has at least three moments
is only needed when analyzing the local power, and is not necessary for deriving the asymptotic null distribution of the PCADF
test, in which case ci and all its moments are zero. In fact, the
current moment condition is less restrictive than the otherwise
so common bounded support assumption, which implies that
all moments are bounded (see, e.g., Moon, Perron, and Phillips
2007; Moon and Perron 2008).
Remark 4. Unlike most other studies where the rate of shrinking of the local alternative, $\kappa$, is prespecified, in the
present study the “appropriate” value of $\kappa$ will be considered a
part of the analysis (see Section 3.2 for a detailed discussion).
Remark 5. The requirement that $\lambda_i(L)$, $\phi_i(L)$, and $\Phi_i(L)$ are
all of the same order $p$ is not a restriction. If the orders are
different, then we simply set $p$ equal to the maximum order of
$\phi_i(L)$ and $\lambda_i(L)$. $\Phi_i(L)$ does not have to be estimated and can
therefore be of any order (even infinite, although that would
require changing Assumption 1 (iii)). The lag order also does
not have to be the same for all $i$, but could be allowed to differ
without affecting the results.
Remark 6. As with $p$ (see Remark 5), the assumption that the
number of regressors contained in $x_{i,t}$, $m$, is the same for all $i$ is
not a restriction. Hence, in practice there is nothing that prevents
the number of covariates from differing from unit to unit, which is of
course a great advantage, especially in applications where data
on some units are scarce (see Section 4 for a discussion). In fact,
$T$ could also be allowed to differ across units.
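3. MAIN RESULTS
In this section, we begin by introducing the PCADF test
statistic and its asymptotic distribution. Then we discuss, in
turn, the implications for power and implementation.
3.1 The PCADF Statistic and its Asymptotic Distribution
Let
$$R_d y_{i,t-1} = y_{i,t-1} - \sum_{t=p+2}^{T} y_{i,t-1} d_t' \left( \sum_{t=p+2}^{T} d_t d_t' \right)^{-1} d_t$$
be the detrended version of $y_{i,t-1}$, where $R_d$ is the ordinary
least-square (OLS) residual operator. Equations (1) and (2) can
be rewritten as
$$R_d \Delta y_{i,t} = \rho_i R_d y_{i,t-1} + \Pi_i' R_d z_{i,t} + R_d \epsilon_{i,t}, \tag{7}$$
where $z_{i,t} = (\Delta y_{i,t-1}, \ldots, \Delta y_{i,t-p}, x_{i,t}', \ldots, x_{i,t-p}')'$ with $\Pi_i = (\phi_{1i}, \ldots, \phi_{pi}, \lambda_{0i}', \ldots, \lambda_{pi}')'$ being the associated vector of coefficients. Define
$$A_{i,T} = \frac{1}{\hat{\sigma}_{yi}\hat{\sigma}_{\epsilon i}} \frac{1}{T} \sum_{t=p+2}^{T} R_z R_d y_{i,t-1} R_z R_d \Delta y_{i,t},$$
$$B_{i,T} = \frac{1}{\hat{\sigma}_{yi}^2} \frac{1}{T^2} \sum_{t=p+2}^{T} (R_z R_d y_{i,t-1})^2,$$
where $\hat{\sigma}_{yi}^2 = \hat{\sigma}_{vi}^2/(1 - \sum_{j=1}^{p} \hat{\phi}_{ji})^2$ is an estimator of $\sigma_{yi}^2 = \sigma_{vi}^2/\phi_i(1)^2$, $\hat{\sigma}_{vi}^2 = \sum_{t=p+2}^{T} \hat{v}_{i,t}^2/T$, $\hat{\sigma}_{\epsilon i}^2 = \sum_{t=p+2}^{T} \hat{\epsilon}_{i,t}^2/T$ with
$\hat{v}_{i,t} = R_d \Delta y_{i,t} - \hat{\rho}_i R_d y_{i,t-1} - \sum_{j=1}^{p} \hat{\phi}_{ji} R_d \Delta y_{i,t-j}$, and $\hat{\rho}_i$, $\hat{\Pi}_i$,
$\hat{\phi}_{ji}$ and $\hat{\epsilon}_{i,t}$ coming from the OLS fit of (6), and $R_z$ is $R_d$ with
$z_{i,t}$ in place of $d_t$. Letting $\bar{A}_T = \sum_{i=1}^{N} A_{i,T}/N$ with a similar
definition of $\bar{B}_T$, the PCADF statistic considered in this article
is given by
$$t_{\mathrm{PCADF}} = \frac{\sqrt{N}\,\bar{A}_T}{\sqrt{\bar{B}_T}}.$$
The asymptotic distribution of this test statistic is provided in
Theorem 1.
Theorem 1. Under Assumptions 1–3, as $N, T \to \infty$,
$$t_{\mathrm{PCADF}} - \sqrt{N}\mu \sim \sum_{j=1}^{2} N^{1/2-j\kappa}\left((\bar{\rho}_{-1v\epsilon} - \bar{\rho}_{1v\epsilon}) r_{2j} - \bar{\rho}_{1v\epsilon} r_{1j}\right) + N^{1/2-2\kappa}\bar{\rho}_{1v\epsilon}\frac{r_{22}}{2} + N(0, \sigma^2) + O_p\!\left(\frac{\mu_{1c}}{N^{\kappa}}\right) + O_p\!\left(\frac{\mu_{3c}}{N^{3\kappa-1/2}}\right) + O_p\!\left(\frac{\sqrt{N}}{\sqrt{T}}\right),$$
where $\sim$ signifies asymptotic equivalence,
$$r_{11} = \frac{\mu_{1c}\alpha_0\beta_1}{2\beta_0^{3/2}},$$
$$r_{12} = \frac{\mu_{2c}\alpha_0\beta_2}{2\beta_0^{3/2}} - \frac{3\mu_{1c}^2\alpha_0\beta_1^2}{8\beta_0^{5/2}},$$
$$r_{2j} = \frac{\mu_{jc}\beta_{j-1}}{\sqrt{\beta_0}},$$
$$\mu = \frac{\alpha_0\bar{\rho}_{1v\epsilon}}{\sqrt{\beta_0}},$$
$$\sigma^2 = 1 - \bar{\rho}_{2v\epsilon} + \frac{\alpha_1\bar{\rho}_{2v\epsilon}}{\beta_0} + \frac{\bar{\rho}_{1v\epsilon}^2\alpha_0^2\alpha_2}{4\beta_0^3},$$
and numerical values of $\alpha_0$, $\alpha_1$, $\alpha_2$, $\beta_0$, $\beta_1$, and $\beta_2$ are given in
Table 1.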
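The residual operator $R_d$ that enters the PCADF statistic can be sketched for a single series as follows. This is a minimal illustration, assuming $d_t = 1$ for model 1 and $d_t = (1, t)'$ for model 2, and applying the projection to whatever stretch of the series is passed in; the function name `detrend` is hypothetical.

```python
def detrend(series, model=2):
    """OLS residuals from regressing `series` on d_t = 1 (model 1) or d_t = (1, t)' (model 2)."""
    n = len(series)
    mean_y = sum(series) / n
    if model == 1:
        return [y - mean_y for y in series]
    t = range(1, n + 1)
    mean_t = (n + 1) / 2
    s_tt = sum((ti - mean_t) ** 2 for ti in t)
    s_ty = sum((ti - mean_t) * (y - mean_y) for ti, y in zip(t, series))
    slope = s_ty / s_tt  # OLS trend coefficient
    return [y - mean_y - slope * (ti - mean_t) for ti, y in zip(t, series)]
```

The operator $R_z$ works in the same way, with the lagged differences and covariates in $z_{i,t}$ taking the place of the deterministic terms.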
Remark 7. The last three terms in the asymptotic distribution
of $t_{\mathrm{PCADF}} - \sqrt{N}\mu$ are remainders. The first two of these are only
relevant under the alternative that $c_i \neq 0$, and are negligible for
all $\kappa > 1/6$ (provided that $c_i$ has at least three moments). The
third remainder does not depend on $c_i$ and is therefore there also
under the unit root null. It follows that for this term to go away
we need $N/T \to 0$ as $N, T \to \infty$ (which in practice means
that $N$ [...] $1/2$, then
$-N^{1/2-\kappa} r_{11} = o(1)$, and therefore power is negligible, whereas
if $\kappa < 1/2$, then $-N^{1/2-\kappa} r_{11}$ diverges, and therefore power goes
to one as $N \to \infty$. Only in the intermediate case when $\kappa = 1/2$, such that $-(N^{1/2-\kappa} r_{11} + N^{1/2-2\kappa}(r_{12} - r_{22}/2)) = -(r_{11} + N^{-1/2}(r_{12} - r_{22}/2)) = -r_{11} + O(1/\sqrt{N})$, is power nonnegligible in the usual nonincreasing sense. This is in agreement with
the results reported by, for example, Moon and Perron (2004)
for their $t^{+}$ panel unit root test that does not employ any covariate information. As usual in the literature, Moon and Perron
(2004) only considered the first-order term, $r_{11}$, which only depends on $\mu_{1c}$. Their results are therefore silent when it comes
to the effect of higher order moments. Theorem 1 includes an
Table 1. Coefficients of the asymptotic distribution of the PCADF statistic

Coefficient    Model 1    Model 2
α0             −1/2       −1/2
α1             1/12       1/60
α2             1/45       11/6300
β0             1/6        1/15
β1             1/12       0
β2             1/20       −1/420

NOTE: Models 1 and 2 refer to the cases with a heterogeneous intercept but no trend, and
heterogeneous intercepts and trends, respectively.
additional second-order term, $-N^{-1/2}(r_{12} - r_{22}/2)$, which depends on $\mu_{2c}$ and is therefore more general in this regard. Note
in particular that if $\mu_{1c} = 0$ and $\mu_{2c} > 0$ (positive and negative
values of $c_i$ cancel out), such that $r_{11} = 0$ and $(r_{12} - r_{22}/2) \neq 0$,
then $-(N^{1/2-\kappa} r_{11} + N^{1/2-2\kappa}(r_{12} - r_{22}/2)) = -N^{1/2-2\kappa}(r_{12} - r_{22}/2)$. This means that while negligible for $\kappa = 1/2$, power
is nonnegligible for $\kappa = 1/4$. Thus, in this case the results of
Moon and Perron (2004) would lead us to believe that there is
no power, when in fact there is, but just not within $1/(\sqrt{N}T)$-neighborhoods of the null.
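The correction factors $\mu$ and $\sigma^2$ of Theorem 1 can be evaluated directly from the coefficients in Table 1. The sketch below does this for given values of the averages $\bar{\rho}_{1v\epsilon}$ and $\bar{\rho}_{2v\epsilon}$, which in practice would have to be estimated; the names `TABLE1`, `correction_factors`, and `standardized` are hypothetical.

```python
import math
from fractions import Fraction as F

# Coefficients as listed in Table 1, keyed by model number.
TABLE1 = {
    1: dict(a0=F(-1, 2), a1=F(1, 12), a2=F(1, 45), b0=F(1, 6), b1=F(1, 12), b2=F(1, 20)),
    2: dict(a0=F(-1, 2), a1=F(1, 60), a2=F(11, 6300), b0=F(1, 15), b1=F(0), b2=F(-1, 420)),
}


def correction_factors(model, rho1, rho2):
    """Return (mu, sigma2) from Theorem 1 given rho_bar_1 and rho_bar_2."""
    c = TABLE1[model]
    a0, a1, a2, b0 = (float(c[k]) for k in ("a0", "a1", "a2", "b0"))
    mu = a0 * rho1 / math.sqrt(b0)
    sigma2 = 1.0 - rho2 + a1 * rho2 / b0 + rho1 ** 2 * a0 ** 2 * a2 / (4.0 * b0 ** 3)
    return mu, sigma2


def standardized(t_pcadf, N, model, rho1, rho2):
    """Mean- and variance-corrected statistic, (t - sqrt(N) * mu) / sigma."""
    mu, sigma2 = correction_factors(model, rho1, rho2)
    return (t_pcadf - math.sqrt(N) * mu) / math.sqrt(sigma2)
```

For example, without covariate information ($\bar{\rho}_{1v\epsilon} = \bar{\rho}_{2v\epsilon} = 1$), model 1 gives $\mu = -\sqrt{6}/2$ and $\sigma^2 = 0.8$.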
At the other end of the scale, if $\max_{i \in [1,N]} \rho_{v\epsilon i} \to 0$, then
$\bar{\rho}_{1v\epsilon} \to 0$ and $(\bar{\rho}_{-1v\epsilon} - \bar{\rho}_{1v\epsilon}) = \bar{\rho}_{-1v\epsilon} + o(1)$, where $\bar{\rho}_{-1v\epsilon}$
is divergent. Moon, Perron, and Phillips (2007) derived
the power envelope for the model without covariates and
showed that it is defined for $\kappa = 1/2$. The PCADF test also
has power in such neighborhoods. However, since in this
case (7) reduces to $\sum_{j=1}^{2} N^{1/2-j\kappa}(\bar{\rho}_{-1v\epsilon} - \bar{\rho}_{1v\epsilon}) r_{2j} = (\bar{\rho}_{-1v\epsilon} - \bar{\rho}_{1v\epsilon}) r_{21} + O(1/\sqrt{N}) = \bar{\rho}_{-1v\epsilon} r_{21} + o(1)$, the power of this test
approaches one as $\max_{i \in [1,N]} \rho_{v\epsilon i} \to 0$ for any $\mu_{1c} \neq 0$, such
that $r_{21} = \mu_{1c}\sqrt{\beta_0} \neq 0$. It is therefore more powerful than the
existing tests that ignore the covariates.
Let us now consider model 2 with both unit-specific intercepts
and trends, which is our main focus. If $\min_{i \in [1,N]} \rho_{v\epsilon i} \to 1$,
then we again have that $(\bar{\rho}_{-1v\epsilon} - \bar{\rho}_{1v\epsilon}) = o(1)$, and therefore
power is determined by $-(N^{1/2-\kappa} r_{11} + N^{1/2-2\kappa}(r_{12} - r_{22}/2))$.
However, since $\beta_1 = 0$ in this case (see Table 1), $r_{11} = r_{22} = 0$, which means that $-(N^{1/2-\kappa} r_{11} + N^{1/2-2\kappa}(r_{12} - r_{22}/2)) = -N^{1/2-2\kappa} r_{12}$. This shows that power is negligible for $\kappa = 1/2$,
which is a reflection of the incidental trend problem. However,
while negligible for $\kappa = 1/2$, since $(r_{12} - r_{22}/2) \neq 0$, power
is still nonnegligible for $\kappa = 1/4$, which is also the value of
$\kappa$ that defines the power envelope for model 2 without covariates (Moon, Perron, and Phillips 2007). The fact that the
PCADF test “only” has power within $1/(N^{1/4}T)$-neighborhoods
of the null when $\min_{i \in [1,N]} \rho_{v\epsilon i} \to 1$ is therefore not totally
unexpected.
The situation is, however, very different when
$\min_{i \in [1,N]} \rho_{v\epsilon i} \to k \in (0, 1)$ (at least some covariate information). Indeed, since $r_{21} \neq 0$, this means that (7) can
be written as $N^{1/2-\kappa}(\bar{\rho}_{-1v\epsilon} - \bar{\rho}_{1v\epsilon}) r_{21} + O(N^{1/2-2\kappa})$, where
$(\bar{\rho}_{-1v\epsilon} - \bar{\rho}_{1v\epsilon}) r_{21} \neq 0$, suggesting that power is no longer
negligible for $\kappa = 1/2$. Thus, as in model 1, the use of the
covariates has implications for power. The main difference
is that the effect is now much stronger than before, with the
covariates even having an effect on the value of $\kappa$ for which
power is nonnegligible. Since the envelope without covariates in
this case is defined for $\kappa = 1/4$, PCADF is again more powerful
than existing tests. However, unlike the situation in model 1,
this superiority does not require $\max_{i \in [1,N]} \rho_{v\epsilon i} \to 0$. In fact, all
that is needed for power within $1/(\sqrt{N}T)$-neighborhoods of the
null is that the fraction of cross-section units for which $\rho_{v\epsilon i} < 1$
is nonnegligible, such that $(\bar{\rho}_{-1v\epsilon} - \bar{\rho}_{1v\epsilon}) > 0$. Moreover, since
power approaches one as $\max_{i \in [1,N]} \rho_{v\epsilon i} \to 0$ (see the above
discussion for model 1), the power of PCADF with trends
can be made arbitrarily close to the power of the same test
without trends, meaning that the covariates should be able to
compensate fully for the loss of power caused by the incidental
trends.
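The role of $\kappa$ and of the covariates in the discussion above can be made concrete by evaluating the leading terms of Theorem 1 numerically. The following is a rough sketch rather than an exact power calculation: it assumes homogeneous $\rho_{v\epsilon i} = \rho$ (so that $\bar{\rho}_{kv\epsilon} = \rho^k$), ignores the remainder terms, and approximates the power of a left-tailed test from the normal limit. The names `COEF`, `drift`, and `approx_power` are hypothetical.

```python
import math
from statistics import NormalDist

# (alpha0, alpha1, alpha2, beta0, beta1, beta2) as listed in Table 1.
COEF = {
    1: (-1 / 2, 1 / 12, 1 / 45, 1 / 6, 1 / 12, 1 / 20),
    2: (-1 / 2, 1 / 60, 11 / 6300, 1 / 15, 0.0, -1 / 420),
}


def drift(model, N, kappa, rho, mu1c, mu2c):
    """Leading drift terms of Theorem 1 under homogeneous rho (assumption)."""
    a0, _, _, b0, b1, b2 = COEF[model]
    r11 = mu1c * a0 * b1 / (2 * b0 ** 1.5)
    r12 = mu2c * a0 * b2 / (2 * b0 ** 1.5) - 3 * mu1c ** 2 * a0 * b1 ** 2 / (8 * b0 ** 2.5)
    r21 = mu1c * b0 / math.sqrt(b0)
    r22 = mu2c * b1 / math.sqrt(b0)
    r1, r2 = (r11, r12), (r21, r22)
    gap = 1 / rho - rho  # rho_bar_{-1} - rho_bar_{1}
    total = sum(N ** (0.5 - j * kappa) * (gap * r2[j - 1] - rho * r1[j - 1]) for j in (1, 2))
    return total + N ** (0.5 - 2 * kappa) * rho * r22 / 2


def approx_power(model, N, kappa, rho, mu1c, mu2c, level=0.05):
    """Approximate left-tailed power implied by the normal limit (remainders ignored)."""
    a0, a1, a2, b0, _, _ = COEF[model]
    sigma2 = 1 - rho ** 2 + a1 * rho ** 2 / b0 + rho ** 2 * a0 ** 2 * a2 / (4 * b0 ** 3)
    z = NormalDist().inv_cdf(level)
    return NormalDist().cdf(z - drift(model, N, kappa, rho, mu1c, mu2c) / math.sqrt(sigma2))
```

In model 2 with `kappa = 0.5`, calling `drift` with `rho = 1.0` returns a value that shrinks toward zero as `N` grows, whereas with `rho = 0.5` it settles near $1.5\,\mu_{1c}\sqrt{\beta_0}$, which mirrors the order effect described in the text.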
Remark 11. The intuition for the increased power in the
presence of covariates can be appreciated by looking at (1)
and (2), which with $\theta_i = 0$ can be rewritten as $\phi_i(L)\Delta y_{i,t} = \rho_i y_{i,t-1} + v_{i,t}$. The corresponding model conditional on $x_{i,t}$ and
assuming for simplicity that $\gamma_i = 0$ is given by $\phi_i(L)\Delta y_{i,t} = \rho_i y_{i,t-1} + \lambda_i(L)' x_{i,t} + \epsilon_{i,t}$. The variance of $\epsilon_{i,t}$ is given by
$\sigma_{\epsilon i}^2 = \sigma_{vi}^2 - \lambda_i(1)'\Phi_i(1)^{-1}\Sigma_{\varepsilon i}\Phi_i(1)^{-1}\lambda_i(1)' \leq \sigma_{vi}^2$ (see Section
2), suggesting that the OLS estimator of the parameters of the
conditional model will be more precise, leading to a more powerful test statistic. Of course, as a referee of this journal correctly points out, while indicative of higher power, this does
not imply in any way the above-mentioned effect on $\kappa$ when
$\min_{i \in [1,N]} \rho_{v\epsilon i} \to k \in (0, 1)$.
4. ISSUES OF IMPLEMENTATION
4.1 Mean and Variance Correction Factors
As pointed out in Remark 8, for standard normal inference
(under the null), we use $(t_{\mathrm{PCADF}} - \sqrt{N}\mu)/\sigma$. However, this test
statistic is not really feasible, as $\mu$ and $\sigma^2$ depend on $\bar{\rho}_{kv\epsilon}$. In
applications, this quantity therefore has to be replaced by an
estimator. A natural consistent candidate is given by $\hat{\bar{\rho}}_{kv\epsilon} = \sum_{i=1}^{N} \hat{\rho}_{v\epsilon i}^k/N$, where $\hat{\rho}_{v\epsilon i} = \hat{\sigma}_{\epsilon i}/\hat{\sigma}_{vi}$. (The sample correlation
coefficient between $\hat{v}_{i,t}$ and $\hat{\epsilon}_{i,t}$ can also be used to estimate
$\rho_{v\epsilon i}$.)
Remark 12. The PCADF test can be applied regardless of
whether there are any covariates available. Without covariates,
the test is similar in spirit to the one of Levin, Lin, and Chu
(2002). The main difference lies in the definition of $\mu$. In this
article, $\mu$ is asymptotic, whereas in Levin, Lin, and Chu (2002)
it is estimated using kernel methods, which not only complicates the computation of the test statistic, but can also lead
to poor small-sample performance (Westerlund and Breitung
2013). The PCADF test is therefore expected to be more robust
in this regard.
4.2 Critical Region
Whether the test should be one- or two-sided depends on what
one is willing to assume regarding the DGP. If $\mu_{1c}$ is driving
power (see Sections 3.1 and 3.2) and the null is tested against
the one-sided (locally stationary) alternative that $c_i < 0$, then
the left-tail standard normal critical values are enough. As already mentioned, most research only considers $\mu_{1c}$. It is therefore
standard to focus on left-tailed tests. The problem is if $\mu_{2c}$ is
driving power (as would be the case if positive and negative
values of $c_i$ cancel out; see Section 3.2), which calls for the
use of a two-sided test. Thus, if the researcher has little or no
feeling for the integration properties of his/her data, it is probably safest to use a two-sided test (although this is expected
to lead to a loss of power when compared with the case when
the alternative is known to be one-sided). (Needless to say, the
choice of alternative matters for interpretation of the test outcome. If the alternative is formulated as $c_i < 0$, then a rejection
should be taken as evidence in favor of stationarity, whereas if
the alternative is formulated as $c_i \neq 0$, then a rejection should
be interpreted more broadly as providing evidence against the
unit root null.)
4.3 Cross-Section Dependence
As mentioned in Remark 1, one way to accommodate cross-section dependence in the current DGP is to assume that some
of the elements in $x_{i,t}$ are common across $i$, which, if known,
will not affect the results presented so far. If there are common
covariates that are unobserved, one possibility is to follow Bai
and Ng (2004, 2010), and use estimated principal component
factors in their stead. To formalize the ideas, suppose that the
DGP is again given by (2)–(4) but that $y_{i,t}$ in (1) has the following
factor structure:
$$y_{i,t} = \theta_i' d_t + \Lambda_i' F_t + u_{i,t}, \tag{9}$$
where $F_t$ is an $r$-dimensional vector of common factors (or
unobserved covariates) with $\Lambda_i$ being the associated vector of
factor loadings, and $d_t$ and $u_{i,t}$ are as before.
Assumption 4.
(i) $\Lambda_i$ is nonrandom such that $\|\Lambda_i\| < \infty$ and $\sum_{i=1}^{N} \Lambda_i\Lambda_i'/N \to \Sigma_{\Lambda} > 0$ as $N \to \infty$;
(ii) $\Delta F_t = \Psi(L) g_t$, where $g_t$ is iid with $E(g_t) = 0$, $E(g_t g_t') = \Sigma_g > 0$, $E(\|g_t\|^4) < \infty$, $\Psi(L) = \sum_{n=0}^{\infty} \Psi_n L^n$, $E[\Delta F_t (\Delta F_t)'] = \sum_{n=0}^{\infty} \Psi_n \Sigma_g \Psi_n' > 0$, $\sum_{n=0}^{\infty} n\|\Psi_n\| < \infty$, and $\Psi(1)$ has rank $r^* \in [0, r]$;
(iii) $u_{i,t}$ and $g_t$ are mutually independent.
Assumption 4 is the same as in Bai and Ng (2004, 2010),
and we therefore refer to these articles for a discussion. An
important feature of the above DGP is that $F_t$ and $u_{i,t}$ can have
different orders of integration (as $r^*$, the rank of the long-run
covariance matrix of $\Delta F_t$, is not required to be full, but can take
on any value in $[0, r]$). In this section, however, we focus on
testing $u_{i,t}$; see Bai and Ng (2004) for a detailed treatment of
the testing of $F_t$. The basic idea is exactly the same as in Bai and
Ng (2004, 2010), that is, we begin by estimating and subtracting
from $y_{i,t}$ an estimate of $\Lambda_i' F_t$. Since the resulting “defactored”
$y_{i,t}$ is consistent for $u_{i,t}$, it can be subjected to any existing panel
unit root test. While Bai and Ng (2004) considered one of the
combination of $p$-value type statistics of Choi (2001), Bai and
Ng (2010) considered versions of the pooled $t$-tests of Moon
and Perron (2004). In this section, we apply $t_{\mathrm{PCADF}}$.
Consider model 1. Under the above conditions,
$$\Delta y_{i,t} = \Lambda_i' \Delta F_t + \Delta u_{i,t}, \tag{10}$$
which is just a static common factor model for $\Delta y_{i,t}$. However,
unlike the common factor model for $y_{i,t}$, in the above model
both the common and idiosyncratic components are stationary.
Applying the principal components method to this model yields
estimates $\hat{\Lambda}_i$ and $\Delta\hat{F}_t$ of (the space spanned by) $\Lambda_i$ and $\Delta F_t$,
respectively. The defactored version of $y_{i,t}$ is simply
the accumulated sum of $\Delta\hat{u}_{i,n} = \Delta y_{i,n} - \hat{\Lambda}_i' \Delta\hat{F}_n$; $\hat{u}_{i,t} = \sum_{n=2}^{t} \Delta\hat{u}_{i,n}$
for $t = 2, \ldots, T$ and $\hat{u}_{i,1} = 0$. The resulting PCADF test statistic, denoted $t^*_{\mathrm{PCADF}}$, is just as before but with $y_{i,t}$ replaced by
$\hat{u}_{i,t}$. In model 2, $\Delta y_{i,t}$ has a nonzero mean (given by $\theta_2$). In this
case, we therefore demean $\Delta y_{i,t}$ prior to application of principal
components.
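The defactoring steps described above can be sketched as follows for model 1. This is a simplified illustration, not the exact procedure of Bai and Ng: it assumes a single factor ($r = 1$), extracts it by power iteration rather than a full eigendecomposition, and normalizes the estimated factor to unit length (only the product of loading and factor matters for the residual). The function name `defactor` and its arguments are hypothetical.

```python
def defactor(y, n_iter=200):
    """Defactor an N x T panel y[i][t] with one factor; returns u_hat with u_hat[i][0] = 0."""
    N, T = len(y), len(y[0])
    # First differences, which are stationary under the model in (10).
    dy = [[y[i][t] - y[i][t - 1] for t in range(1, T)] for i in range(N)]
    # Power iteration for the dominant eigenvector of dy' dy (the first principal component).
    f = [1.0 + 0.01 * t for t in range(T - 1)]
    for _ in range(n_iter):
        lam = [sum(dy[i][t] * f[t] for t in range(T - 1)) for i in range(N)]
        f = [sum(lam[i] * dy[i][t] for i in range(N)) for t in range(T - 1)]
        norm = sum(v * v for v in f) ** 0.5
        if norm == 0.0:
            break
        f = [v / norm for v in f]
    # Loadings by least squares on the unit-length factor estimate.
    lam = [sum(dy[i][t] * f[t] for t in range(T - 1)) for i in range(N)]
    u_hat = []
    for i in range(N):
        level, path = 0.0, [0.0]
        for t in range(T - 1):
            level += dy[i][t] - lam[i] * f[t]  # accumulate the defactored differences
            path.append(level)
        u_hat.append(path)
    return u_hat
```

In model 2 the differenced series would first be demeaned unit by unit before the factor is extracted, as noted in the text.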
Proposition 1. Under Assumptions 1–4, as $N, T \to \infty$ with
$N/T \to 0$ and $\kappa > 1/6$,
$$t^*_{\mathrm{PCADF}} - \sqrt{N}\mu \sim \sum_{j=1}^{2} N^{1/2-j\kappa}\left((\bar{\rho}_{-1v\epsilon} - \bar{\rho}_{1v\epsilon}) r_{2j} - \bar{\rho}_{1v\epsilon} r_{1j}\right) + N^{1/2-2\kappa}\bar{\rho}_{1v\epsilon}\frac{r_{22}}{2} + N(0, \sigma^2).$$
According to Proposition 1, the PCADF statistic based on
the defactored data has the same asymptotic distribution as the
original test statistic in the case without common factors. In
other words, the defactoring has no effect on the local power of
the test for a unit root in $u_{i,t}$.
4.4 Selecting the Covariates
Of course, one may argue that in applications the above results
are somewhat “idealized,” in the sense that the covariates in xi,t
might be difficult to find. However, we argue that this criticism
need not be too much of a problem. There are a number of
reasons for this.
• Most variables in economics and finance are correlated, a
finding with ample theoretical support. Indeed, as Pesaran,
Smith, and Yamagata (2013) argued, in these fields it is
actually difficult to find variables that are uncorrelated. For
example, in testing for unit roots in a panel of real outputs,
one would expect the shocks to output to also manifest
themselves in employment, consumption, and investment.
In the case of testing for unit roots in inflation, one would
expect the shocks to inflation to also affect short-term and
long-term interest rates. Hence, given the availability of
panel data, candidate covariates should be relatively easy
to find. Also, as pointed out in Section 1, typically the unit
root testing is a part of the analysis of multiple variables,
in which case relevant covariate candidates are particularly
easy to find.
• Pretesting for covariate relevance is very simple. Indeed,
because of the differing orders of magnitude of the associated variables, the OLS estimators of $\rho_i$ and $\Pi_i$ in (6) are
asymptotically uncorrelated, suggesting that we do not lose
generality by considering a separate hypothesis test for $\Pi_i$.
The OLS estimator $\hat{\Pi}_i$ of $\Pi_i$ is asymptotically normal (a
formal proof is available upon request), suggesting that the
testing can be carried out in the usual manner using, for
example, a Wald test.
• As already pointed out (see Remark 6), the number of
included covariates for each unit can differ. Hence, since
the only thing that matters for power is the information
content of the average covariate, as measured by $\bar{\rho}_{-1v\epsilon}$ and
$\bar{\rho}_{1v\epsilon}$ (see Theorem 1), there can even be units where $x_{i,t} = \{\emptyset\}$. Needless to say, this flexibility is a great advantage in
practice, as it allows one to selectively pick those covariates
for each unit that are most relevant/readily available.
• As long as the lag augmentation order is larger than $p$
(the true order), there is no need to pinpoint $p$. Indeed, if
$\lambda_i(L) = 0$, such that the covariates are absent, the asymptotic distribution of the (unrestricted) PCADF statistic is
still the same as in Theorem 1. This means that the asymptotic “price” of including redundant lags is zero, a result
that is verified by our simulations (see Section 5). Similarly, the price of including redundant covariates (contemporaneously and/or in lagged form) is also zero. Erroneous
omission of covariates is, on the other hand, more problematic, as in this case Theorem 1 need not hold. However,
even in situations such as this there is still “hope,” in that
$\lambda_i(L)$ does not have to be zero; if $\lambda_i(L) = \lambda_{0i}$, such that $x_{i,t}$
only enters the equation for $v_{i,t}$ contemporaneously, then
Theorem 1 continues to hold even if $x_{i,t}$ is omitted (a proof
is available upon request). As a rule, though, all covariates
that are significant should be included in the testing. Then
there is also the fact that if one would like to entertain the
possibility of (omitted) covariates, one is likely to be better
off using the covariates available rather than no covariates at all (as when using a conventional univariate panel
data test).
Remark 13. In practice, the elements in $x_{i,t}$ need not be stationary. In such cases, we recommend first-differencing all unit
root covariates. (Most of the assumptions placed on the covariates can be relaxed as long as the number of units for which the
assumptions fail remains fixed as $N \to \infty$. We can, for example, permit unit root covariates, provided that the number of
unit root units is fixed. The intuition is simple; if the number
is fixed, then the fraction of units with a violation goes to zero
as $N \to \infty$, and therefore their impact on the test is going to
be negligible. In practice this means that the number of units
with a violation should be “small” relative to $N$.) The results
reported in Theorem 1 are unaffected by this. Thus, just as in
cointegration analysis, unless the order of integration of $x_{i,t}$ is
known, it should be pretested for unit roots. The problem is if
the order of integration of $x_{i,t}$ is misspecified. Hansen (1995)
showed that while erroneous inclusion of unit root covariates
invalidates the test, over-differencing only results in mild power
losses. He therefore recommended taking first differences not
only of all unit root covariates but also of all near unit root covariates, and so do we. (Some simulation results for the case of
under/over-differenced covariates are available upon request.)
5. SIMULATIONS
In this section, we investigate the small-sample properties
of the (nondefactored) PCADF test through a small simulation
study using (1)–(5) as DGP. For simplicity, we assume that
$m = 1$, $\theta_i = 0$, $\gamma_i = 0$, $(\epsilon_{i,t}, \varepsilon_{i,t})' \sim N(0, I_2)$, $u_{i,0} = x_{i,0} = 0$,
$\kappa = 1/2$, and $c_i \sim U(a, b)$. Note in particular how the mean
and variance of $c_i$ can be written in terms of $a$ and $b$ as
$\mu_{1c} = (a + b)/2$ and $\mu_{2c} - \mu_{1c}^2 = (b - a)^2/12$, respectively.
For our theory to provide an accurate description of actual test
behavior, the values of $a$ and $b$ cannot be “too large,” as this will
Table 2. Size when $\rho_{v\epsilon} = 1$

  N     T      t+    V̂NT     VNT   tPCADF
Model 1
 10   100     6.4     2.0     4.5     8.6
 10   200     5.9     1.7     4.9     7.1
 10   400     5.4     2.3     4.6     6.2
 20   100     5.5     2.6     5.1     7.2
 20   200     5.4     3.5     5.7     6.4
 20   400     6.5     3.2     5.2     6.9
 40   100     5.7     4.3     5.0     9.1
 40   200     5.6     3.8     5.0     6.7
 40   400     5.4     3.2     5.2     6.3
Model 2
 10   100     5.2     2.5     4.5     9.5
 10   200     5.1     2.7     4.9     7.7
 10   400     5.3     3.5     4.6     6.9
 20   100     4.2     2.9     5.1    10.6
 20   200     5.3     4.3     5.7     8.2
 20   400     5.7     4.8     5.2     7.1
 40   100     4.1     2.7     5.0    13.1
 40   200     4.7     4.7     5.0     8.1
 40   400     4.7     4.7     5.2     6.5

NOTES: t+ and V̂NT refer to the tests of Moon and Perron (2008), and Moon, Perron,
and Phillips (2007), respectively, VNT refers to the power envelope for the model without
covariates, and $\rho_{v\epsilon}$ refers to the (homogenous) correlation between $v_{i,t}$ and $\epsilon_{i,t}$. See Table
1 for an explanation of models 1 and 2.
tend to activate the $O_p(\mu_{1c}/\sqrt{N})$ and $O_p(\mu_{3c}/N)$ remainders
in the asymptotic distribution reported in Theorem 1. Similarly,
in order not to activate the $O_p(\sqrt{N/T})$ remainder, in the simulations we set $T \gg N$. The presence of serial correlation
did not have any major effects on the results, and we therefore also set $\phi_i(L) = 1$, $\lambda_i(L) = \lambda_0$, and the lag polynomial of the covariates in (4) equal to one. (For example,
with $\phi_i(L) = 1 - \phi_1 L$ and $\phi_1 = 0.5$ the results for the PCADF
test were basically indistinguishable from the ones already in
the article. Another advantage of focusing on the results for the
case without serial correlation is that it enables comparison with
the power envelope and point optimal test of Moon, Perron, and
Phillips (2007).) Thus, in this DGP, $\sigma_{\epsilon i}^2 = 1$ and $\sigma_{vi}^2 = \lambda_0^2 + 1$,
and hence $\rho_{v\epsilon i}^2 = \rho_{v\epsilon}^2 = 1/(\lambda_0^2 + 1)$.
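The simplified DGP described above can be sketched in a few lines of Python. This is only an illustration under the stated simulation settings ($m = 1$, $\theta_i = 0$, $\gamma_i = 0$, no serial correlation, zero initial values, $\rho_i = c_i/(N^{\kappa}T)$); the function names `simulate_panel` and `lam0_for_rho` are hypothetical and NumPy is assumed to be available.

```python
import numpy as np

def lam0_for_rho(rho):
    """Value of lambda_0 implied by rho^2 = 1 / (lambda_0^2 + 1)."""
    return np.sqrt(1.0 / rho**2 - 1.0)

def simulate_panel(N, T, a, b, lam0, kappa=0.5, seed=0):
    """Draw one panel (y, x), each of shape (N, T + 1).

    Assumed setup (illustrative): c_i ~ U(a, b), rho_i = c_i / (N**kappa * T),
    v_{i,t} = lam0 * x_{i,t} + eps_{i,t}, and Delta u_{i,t} = rho_i u_{i,t-1} + v_{i,t}.
    """
    rng = np.random.default_rng(seed)
    c = rng.uniform(a, b, size=N)            # drift parameters c_i
    rho = c / (N**kappa * T)                 # local-to-zero coefficients
    eps = rng.standard_normal((N, T + 1))    # error of the v equation
    x = rng.standard_normal((N, T + 1))      # covariate equals its innovation
    x[:, 0] = 0.0                            # x_{i,0} = 0
    v = lam0 * x + eps
    u = np.zeros((N, T + 1))                 # u_{i,0} = 0
    for t in range(1, T + 1):
        u[:, t] = (1.0 + rho) * u[:, t - 1] + v[:, t]
    return u, x                              # theta_i = 0, so y = u
```

For example, `simulate_panel(20, 200, -4.0, 0.0, lam0_for_rho(0.7))` gives a panel with $\rho_{v\epsilon} = 0.7$, while `lam0 = 0` corresponds to $\rho_{v\epsilon} = 1$ (irrelevant covariates).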
The PCADF test is constructed as left-tailed with $\hat\rho_{kv\epsilon}$ in place
of $\bar\rho_{kv\epsilon}$ when computing $\mu$ and $\sigma^2$ (see Section 4). The power
envelope for the model without covariates, denoted $V_{NT}$, the
$\hat V_{NT}$ common point-optimal test of Moon, Perron, and Phillips
(2007), and the $t^+$ test of Moon and Perron (2008) are also
simulated. (VNT is based on setting ci in the test to −0.5. Moon,
Perron, and Phillips (2007) also considered (in our notation)
ci = −1 and ci = −2; however, in our simulations ci = −5
generally led to the best performance.) All tests are carried out
at the 5% level, and the number of replications is set to 3000.
All powers are adjusted for size.
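One common way to compute size-adjusted power for a left-tailed test is to replace the asymptotic critical value by the empirical 5% quantile of the statistic simulated under the null. The sketch below assumes this convention (the article does not spell out its exact procedure), and the function name `size_adjusted_power` is hypothetical.

```python
import numpy as np

def size_adjusted_power(stats_null, stats_alt, level=0.05):
    """Rejection frequency under the alternative for a left-tailed test,
    using the empirical `level` quantile of the null draws as critical value."""
    crit = np.quantile(np.asarray(stats_null, dtype=float), level)
    return float(np.mean(np.asarray(stats_alt, dtype=float) < crit))
```

By construction the rejection frequency of the null draws themselves is (approximately) the nominal level, so any excess under the alternative reflects power rather than size distortion.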
The size results for the case when $\rho_{v\epsilon} = 1$ are reported in
Table 2. (As alluded to in Remark 8, the asymptotic null distribution
of the PCADF test is invariant with respect to
$\rho_{v\epsilon}$, a result that is supported by our (unreported) simulation
results. Hence, since under the null the value of $\rho_{v\epsilon}$ is irrelevant,
in Table 2 we focus on the case when $\rho_{v\epsilon} = 1$.) If asymptotic
theory is a reliable guide to the small-sample behavior of the
tests, all sizes should be close to 5%. In agreement with this,
we see that while the PCADF test is generally oversized, as expected, its distortions tend to diminish with increases in T.
Conversely, the distortions increase with decreases in T; therefore, the distortions for T < 100 are generally larger than those
reported in Table 2. Another observation is that, while $t^+$ and
PCADF are oversized, $\hat V_{NT}$ is undersized. However, the distortions are generally small enough to be attributed to
simulation uncertainty. Indeed, with 3000 replications the 95%
confidence interval for the size of the 5% level tests studied here
(in %) is [4.2, 5.8].
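The interval quoted above follows from the usual normal approximation to a binomial proportion, $0.05 \pm 1.96\sqrt{0.05 \times 0.95/3000}$. A short check (the helper name `size_ci` is ours):

```python
import math

def size_ci(level=0.05, reps=3000, z=1.96):
    """Normal-approximation interval (in %) for a simulated rejection frequency."""
    half = z * math.sqrt(level * (1.0 - level) / reps)
    return 100.0 * (level - half), 100.0 * (level + half)

lo, hi = size_ci()
print(f"[{lo:.1f}, {hi:.1f}]")  # prints [4.2, 5.8]
```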
The power results reported in Tables 3 and 4 can be summarized as follows:
• Power is usually above what is predicted by asymptotic
theory, as obtained by simulating the asymptotic distribution given in Theorem 1 with all parameters set to their
values in the DGP, especially for ρvǫ close to one. However, the discrepancy diminishes with increases in N and
T.
• The asymptotic distributions of $\hat V_{NT}$, $t^+$, and $t_{PCADF}$ in
model 1 when $a = b$ and $\rho_{v\epsilon} = 1$ are given by $\mu_{1c}/\sqrt{2} + N(0, 1)$,
$3\sqrt{5}\mu_{1c}/3\sqrt{51} + N(0, 1)$, and $\sqrt{30}\mu_{1c}/16 + N(0, 1)$,
respectively. (The asymptotic distributions of $t^+$
and $\hat V_{NT}$ are given in Moon, Perron, and Phillips (2007,
sec. 4.1).) Consistent with this we see that $\hat V_{NT}$ is generally most powerful, at least among the larger values of
N, followed by t + and then tPCADF . However, we also see
that there is a large range of empirically relevant values for
N and T where the difference in power is not that large.
The PCADF test therefore performs well even when the
covariates are irrelevant.
• As expected, the power of the PCADF test for the case
when ρvǫ = 1 is mainly driven by µ1c . However, there
is also a second-order effect working through the variance of
ci . In particular, both the empirical and theoretical power
seem to be decreasing in |a − b|. This is illustrated in
Table 3, which reports power for a = −4 and b = 0, and
a = b = −2. Thus, while µ1c is the same in the two cases,
in the former the variance is larger (4/3 as compared to
zero).
• Since κ = 1/2 in the simulations, when ρvǫ = 1 in model
2 none of the tests considered, including PCADF, should
have any power beyond size, and this is also what we see
in Table 3.
• As ρvǫ is reduced, the relative power of the PCADF test
increases. This is seen in Table 4. Take, for example, the
case when ρvǫ = 0.3 in model 1, in which the power of
the PCADF test is almost two times as large as the power
envelope for the model without covariates, and it is almost
four times as large as the power of t + and VˆNT .
• As expected, the difference in power when ρvǫ < 1 is larger
in model 2 than in model 1, with PCADF being the only test
with power beyond size. In fact, according to the results
reported in Table 4, in this case the power of the PCADF
test is no less than 10 times as large as that of t + and
VˆNT . Of course, since the power of the two latter tests
is negligible, while the power of the former is not, the
Table 3. Size-adjusted power when $\rho_{v\epsilon} = 1$

                        a = −4, b = 0                            a = b = −2
  N     T      t+    V̂NT     VNT  tPCADF  Theory       t+    V̂NT     VNT  tPCADF  Theory
Model 1
 10   100     9.7    14.7    52.8     7.6     8.6      9.9    15.6    43.8     7.6    10.2
 10   200    11.2    17.7    50.7     9.0     7.0     12.1    19.9    41.7     9.3     8.2
 10   400    13.0    19.3    51.1     8.9     7.9     13.2    22.1    41.7     9.0     9.5
 20   100    12.2    18.8    48.3    10.5     8.5     13.0    21.0    39.8    10.2     9.5
 20   200    13.3    19.7    48.1    10.7     7.9     14.9    23.0    39.7    11.3     9.1
 20   400    13.9    26.2    47.6    10.6     8.9     15.0    28.2    39.2    11.7    10.1
 40   100    13.7    21.8    49.2    11.1     9.5     14.8    23.9    40.9    12.0     9.9
 40   200    15.9    26.1    50.6    11.4     9.6     16.9    28.8    42.1    12.1    10.3
 40   400    15.8    28.6    49.9    11.0    10.0     16.8    31.3    41.3    11.7    11.1
Model 2
 10   100     5.5     5.9     5.0     5.5     5.6      5.3     5.6     5.0     5.2     5.5
 10   200     5.9     6.2     5.0     5.3     5.3      5.8     5.7     5.0     5.2     5.2
 10   400     5.5     5.8     5.0     5.5     5.6      5.5     5.5     5.0     5.3     5.4
 20   100     5.4     5.6     5.0     4.8     5.4      5.4     5.5     5.0     5.0     5.3
 20   200     5.2     5.7     5.0     5.5     5.4      5.3     5.6     5.0     5.5     5.3
 20   400     5.1     5.6     5.0     5.5     5.3      5.1     5.5     5.0     5.4     5.2
 40   100     5.3     5.4     5.0     5.3     5.2      5.2     5.1     5.0     5.4     5.2
 40   200     5.3     5.5     5.0     5.4     5.3      5.1     5.3     5.0     5.3     5.2
 40   400     5.6     5.9     5.0     5.6     5.3      5.5     5.9     5.0     5.6     5.3

NOTES: a and b are such that $\rho_i = c_i/\sqrt{N}T$, where $c_i \sim U(a, b)$. “Theory” refers to the theoretical power of the PCADF test (see Theorem 1). See Tables 1 and 2 for an explanation of
the rest.
difference in power will increase as a and b get further
away from zero.
The results for the PCADF test based on defactored data are
not reported but we briefly describe them. First, as expected, the
size results are very close to those reported in Table 2. This is
true regardless of how the factor loadings and $F_t$ are generated. Second, power
is very close to the theoretical prediction obtained by simulating
the asymptotic distribution given in Proposition 1. Hence, power
is unaffected by the defactoring.
Table 4. Size-adjusted power when a = b = −2

                         ρvǫ = 0.7                                ρvǫ = 0.3
  N     T      t+    V̂NT     VNT  tPCADF  Theory       t+    V̂NT     VNT  tPCADF  Theory
Model 1
 10   100    12.1    16.6    43.8    19.1    16.7     11.5    14.6    43.8    74.9    58.3
 10   200    11.8    18.4    41.7    19.6    13.9     12.0    17.8    41.7    74.5    55.5
 10   400    12.9    21.9    41.7    17.9    15.3     13.4    22.7    41.7    74.8    56.2
 20   100    12.1    19.5    39.8    19.4    15.9     12.2    19.5    39.8    78.0    62.5
 20   200    14.9    24.4    39.7    21.3    15.7     13.9    23.2    39.7    77.8    63.2
 20   400    16.0    26.5    39.2    22.4    16.3     16.7    24.5    39.2    78.4    62.3
 40   100    14.2    23.1    40.9    19.2    17.6     15.6    22.5    40.9    79.2    69.8
 40   200    15.1    25.8    42.1    24.1    17.7     15.9    29.9    42.1    79.9    70.6
 40   400    19.0    35.5    41.3    22.7    18.6     17.2    29.5    41.3    81.0    69.8
Model 2
 10   100     5.4     6.1     5.0    11.7    11.1      5.9     5.8     5.0    45.3    48.8
 10   200     5.4     5.5     5.0    11.0     8.9      5.5     6.0     5.0    45.5    46.5
 10   400     5.6     5.6     5.0    10.5    10.2      5.8     5.0     5.0    47.6    46.1
 20   100     5.2     5.5     5.0    11.0     9.4      5.4     5.5     5.0    45.7    44.0
 20   200     5.7     5.3     5.0    11.3     9.0      5.4     5.2     5.0    48.8    44.1
 20   400     5.5     5.4     5.0    10.9     9.9      5.3     5.7     5.0    47.0    43.7
 40   100     5.5     5.5     5.0    10.5     9.3      5.1     5.2     5.0    44.1    45.1
 40   200     4.9     5.7     5.0    10.2     9.6      5.2     5.2     5.0    46.8    46.6
 40   400     5.5     5.4     5.0    12.2     9.9      5.3     5.6     5.0    48.8    45.0

NOTE: See Tables 1–3 for an explanation.
Overall, the simulation results suggest that our asymptotic
theory provides a useful guide to the small-sample performance
of the PCADF test. They also suggest that the PCADF test can
lead to substantial power gains when compared to existing tests,
especially in model 2. In fact, since the inclusion of irrelevant
covariates seems to cause only minor reductions in power, the
use of PCADF seems to come at little or no cost.
6. CONCLUDING REMARKS
The power increasing potential of covariate augmentation in
the panel setting is interesting not only by itself but also because
of the implications it has for theoretical and applied work. For
example, the fact that in the presence of incidental trends the
use of covariates has an order effect on the shrinking neighborhoods around unity for which asymptotic power is nonnegligible
is expected to have implications for the rate of consistency for
estimation of autoregressive roots near unity (see Moon and
Phillips 2004, and the references provided therein). This possibility is currently being explored in a separate work. From an
applied point of view, the minor additional complication of having to estimate ρvǫi has a major benefit in that the precision of
the test is expected to be drastically improved. This is especially
true in the case of incidental trends, in which some authors have
even gone as far as to recommend not using some of their univariate panel data tests (see, e.g., Moon and Perron 2004, 2008).
Thus, given the availability of data on potential covariates, and
the fact that their information content does not even have to be
particularly high, the PCADF test developed here should be a
valuable addition to the already existing menu of panel unit root
tests.
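$$= \rho_i u_{i,t-1} + (\lambda_i(1)'(x_{i,t} - \gamma_i) + \epsilon_{i,t}) + \lambda_i^*(L)\ldots$$

But we also have $\Phi_i(L) = \Phi_i(1) + \Phi_i^*(L)(1 - L)$ (with $\Phi_i^*(L)$ defined
similarly to $\lambda_i^*(L)$), leading to

$$x_{i,t} - \gamma_i = -\Phi_i(1)^{-1}\Phi_i^*(L)\Delta x_{i,t} + \Phi_i(1)^{-1}\varepsilon_{i,t},$$

$$\ldots + \frac{c_i}{N^{\kappa}T}\phi_i^*(L)u_{i,t-1}, \qquad \text{(A.2)}$$

where $r_{i,t} = \lambda_i(1)'\Phi_i(1)^{-1}\varepsilon_{i,t} + \epsilon_{i,t}$. Backward substitution now yields

$$\phi_i(L)u_{i,t} = \sum_{s=p+2}^{t}\left(1 + \frac{c_i}{N^{\kappa}T}\right)^{t-s} r_{i,s} + O_p(1).$$

Let $U_{pi,t} = R_{pi,t}/\phi_i(1)$, where $R_{pi,t} = \sum_{s=p+2}^{t}(t - s)^p r_{i,s}$. By using
this and Taylor expansion of the type $(1 + x)^p = 1 + px + p(p - 1)x^2/2 + O((px)^3)$, we can further show that

$$\phi_i(L)u_{i,t} = R_{0i,t} + \frac{c_i}{N^{\kappa}T}R_{1i,t} + \frac{c_i^2}{2(N^{\kappa}T)^2}R_{2i,t} + O_p\left(\frac{\sqrt{T}c_i^3}{N^{3\kappa}}\right) + O_p\left(\frac{c_i^2}{N^{2\kappa}\sqrt{T}}\right) = R_{0i,t} + \ldots$$

$$\frac{1}{T^2}\sum_{t=p+2}^{T}(R_z R_d y_{i,t-1})^2 = \frac{1}{T^2}\sum_{t=p+2}^{T}(R_d y_{i,t-1})^2 + O_p\left(\frac{1}{T}\right).$$

Similarly, from the definitions of $\hat\sigma_{yi}^2$ and $\hat\sigma_{\epsilon i}^2$, and the consistency
of $\hat\rho_i$, $\hat\Gamma_i$, and $\hat\phi_{ji}$, we get $\hat\sigma_{yi}^2 - \sigma_{yi}^2 = O_p(1/\sqrt{T})$ and $\hat\sigma_{\epsilon i}^2 - \sigma_{\epsilon i}^2 = O_p(1/\sqrt{T})$. (The details of these calculations are available upon request.) By using this, Taylor expansion of the inverse of $\hat\sigma_{yi}^2$, and then
$R_d y_{i,t} = R_d u_{i,t}$,

$$B_{i,T} = \frac{1}{\hat\sigma_{yi}^2}\frac{1}{T^2}\sum_{t=p+2}^{T}(R_z R_d y_{i,t-1})^2$$
$$= \frac{1}{\sigma_{yi}^2}\frac{1}{T^2}\sum_{t=p+2}^{T}(R_d u_{i,t-1})^2 + O_p\left(\frac{1}{\sqrt{T}}\right)$$
$$= \frac{1}{\sigma_{yi}^2}\frac{1}{T^2}\sum_{t=p+2}^{T}\left(R_d U_{0i,t-1} + \frac{c_i}{N^{\kappa}T}R_d U_{1i,t-1} + \frac{c_i^2}{2(N^{\kappa}T)^2}R_d U_{2i,t-1}\right)^2 + O_p\left(\frac{c_i^3}{N^{3\kappa}}\right) + O_p\left(\frac{1}{\sqrt{T}}\right)$$
$$= \sum_{j=0}^{2}N^{-j\kappa}c_i^j B_{ji,T} + O_p\left(\frac{c_i^3}{N^{3\kappa}}\right) + O_p\left(\frac{1}{\sqrt{T}}\right),$$

where

$$B_{0i,T} = \frac{1}{\sigma_{yi}^2}\frac{1}{T^2}\sum_{t=p+2}^{T}(R_d U_{0i,t-1})^2,$$
$$B_{1i,T} = \frac{2}{\sigma_{yi}^2}\frac{1}{T^3}\sum_{t=p+2}^{T}R_d U_{0i,t-1}R_d U_{1i,t-1},$$
$$B_{2i,T} = \frac{1}{\sigma_{yi}^2}\frac{1}{T^4}\sum_{t=p+2}^{T}(R_d U_{1i,t-1})^2 \ldots$$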
1. INTRODUCTION
As is well known, univariate unit root tests, such as the
conventional augmented Dickey–Fuller (ADF) test, have low
power, and much effort has therefore gone into the development
of various modifications aimed to increase power (see Leybourne, Kim, and Newbold 2005, and the references provided
therein). In many cases, however, there is more information to be
had, and then power can be increased without the need for such
modifications. For example, in regression analysis, due to the
risk of obtaining spurious results, it is quite common to pretest
for unit roots, and then it seems quite natural to try to make
use also of the information contained in the other variables of
the model. After all, we typically do not use regressions unless
we believe that the included variables are correlated. This is the
idea of Hansen (1995), who developed a covariate augmented
ADF (CADF) test that is shown to be at least as powerful as
the ADF test. (The CADF considered here is not to be confused
with the cross-sectionally augmented Dickey–Fuller of Pesaran
(2007).)
But while the CADF approach has attracted some attention,
the single most common way by far in which researchers have
been trying to increase the power of the ADF test is through the
use of panel data. Thus, in this case the source of extraneous
information is not a set of correlated covariates but rather a
cross-section of similar units.
In light of these developments, the question naturally arises
whether there are any power gains to be made by considering a panel
CADF (PCADF) test that exploits both sources of information.
Intuitively, since the two types of information are individually
important for power, there should be some merit in combining
them. Of course, this article is not the first to recognize the
value of covariate augmentation in a panel data context (see,
e.g., Pesaran 2007; Chang and Song 2009; Pesaran, Smith, and
Yamagata 2013, who used covariates to address the problem
of cross-section dependence); however, it is the first to study
analytically the power implications of doing so. In other words,
while previously the rationale for the covariates (in panels) has
always been to improve upon size accuracy, no one has yet
considered their effect on power.
Our main finding is that the information contained in the
covariates is useful when testing for a unit root in panel data, and
that the power of the PCADF test can be substantially increased,
far beyond that achievable by existing tests that do not employ
any covariate information. The largest difference occurs in the
presence of incidental trends, which are problematic in the sense
that their estimation is known to lead to low power. In fact, as
Moon, Perron, and Phillips (2007, p. 445) concluded from their
analysis of the local power of univariate panel unit root tests
with trends:
An important empirical consequence of the present investigation is that increasing the complexity of the fixed effects in a
panel model inevitably reduces the potential power of unit root
tests. This reduction in power has a quantitative manifestation
in the radial order of the shrinking neighborhoods around unity
for which asymptotic power is nonnegligible. When there are
no fixed effects or constant fixed effects, tests have power in a
neighborhood of unity of order $1/\sqrt{N}T$ (where N and T denote
the size of the cross-section and time dimensions, respectively).
When incidental trends are fitted, the tests only have power in a
larger neighborhood of order $1/N^{1/4}T$.
Moon and Perron (1999) showed that the maximum likelihood estimator of the local-to-unity parameter in near unit
root panels is inconsistent. They called this phenomenon,
which arises because of the presence of an infinite number
of nuisance parameters, an “incidental trend problem,” because it is analogous to the well-known incidental parameter
problem in dynamic fixed-T panels. The above-mentioned reduction in the order of the shrinking neighborhoods around
unity for which power is nonnegligible is a manifestation of
this problem, and has in fact given rise to a separate literature (see, e.g., Moon and Perron 2004; Moon and Phillips
© 2015 American Statistical Association
Journal of Business & Economic Statistics
July 2015, Vol. 33, No. 3
DOI: 10.1080/07350015.2014.962697
2004; Moon, Perron, and Phillips 2007; Phillips and Sul
2007). One of the main conclusions from this literature is
that the incidental trend problem is a general phenomenon that
applies to all panel unit root tests. Indeed, as Moon, Perron, and
Phillips (2007, p. 445) concluded, “the present article shows that
discriminatory power against a unit root is generally weakened
as more complex deterministic regressors are included.”
In this article, we show that this need not be the case, and that
the use of covariates can compensate for the loss of power caused
by the incidental trends. That is, the PCADF test has nonnegligible power within $1/\sqrt{N}T$-neighborhoods of the null even
if incidental trends are present. This property makes PCADF
unique, as there is presently no other test with incidental trends
that has power within such neighborhoods. Conversely, if the
rate of shrinking is given by $1/N^{1/4}T$, unlike existing tests, the
power of PCADF is actually increasing in N.
2. MODEL AND ASSUMPTIONS
Consider the panel variable $y_{i,t}$, observable for $t = 1, \ldots, T$
time series and $i = 1, \ldots, N$ cross-section units. The data-generating process (DGP) of this variable is given by
$$y_{i,t} = \theta_i' d_t + u_{i,t}, \qquad (1)$$
where ui,t is the stochastic part of yi,t , while dt = (1, t)′ is the deterministic part, for which there are two models; (1) θ = (θ1 , θ2 )′
with θ1 unrestricted and θ2 = 0 (unit-specific intercepts), and
(2) θ unrestricted (unit-specific intercepts and incidental time
trends). Thus, while our main focus lies with model 2, for completeness, we will also consider model 1. The stochastic part is
allowed to depend on an m-vector of covariates, xi,t , which could
potentially be common across i (thereby allowing for some form
of cross-section dependence). Specifically,
$$\phi_i(L)\Delta u_{i,t} = \rho_i u_{i,t-1} + v_{i,t}, \qquad (2)$$
$$v_{i,t} = \lambda_i(L)'(x_{i,t} - \gamma_i) + \epsilon_{i,t}, \qquad (3)$$
$$\Phi_i(L)(x_{i,t} - \gamma_i) = \varepsilon_{i,t}, \qquad (4)$$
where $\gamma_i = E(x_{i,t})$, and $\Phi_i(L) = I_m - \sum_{j=1}^{p}\Phi_{ji}L^j$, $\phi_i(L) = 1 - \sum_{j=1}^{p}\phi_{ji}L^j$, and $\lambda_i(L) = \sum_{j=0}^{p}\lambda_{ji}L^j$ are polynomials in
the lag operator L. In the assumptions that follow k denotes
a generic constant, and $\mathrm{tr}(A)$ and $\|A\| = \sqrt{\mathrm{tr}(A'A)}$ denote the
trace and Frobenius (Euclidean) norm of the matrix A, respectively.
Assumption 1.
(i) $\eta_{i,t} = (\epsilon_{i,t}, \varepsilon_{i,t}')'$ is independent and identically distributed (iid) such that $E(\eta_{i,t}) = 0$, $E(\eta_{i,t}\eta_{i,t}') = \Sigma_{\eta i} = \mathrm{diag}(\sigma_{\epsilon i}^2, \Sigma_{\varepsilon i}) > 0$ and $E(\|\eta_{i,t}\|^k) < \infty$ for $k \geq 4$;
(ii) $E(\|(u_{i,-p}, x_{i,-p}')'\|^2), \ldots, E(\|(u_{i,0}, x_{i,0}')'\|^2) < \infty$;
(iii) $\Phi_i(L)$ and $\phi_i(L)$ have all roots outside the unit circle, and $\sum_{j=0}^{p}\|\lambda_{ji}\| < \infty$.
Similar to Hansen (1995) our asymptotic analysis supposes
that $\rho_i$ is local-to-zero as $N, T \to \infty$. However, since we are
using panel data, the rate of shrinking is different. In particular,
it is assumed that
$$\rho_i = \frac{\phi_i(1)c_i}{N^{\kappa}T}, \qquad (5)$$
where $\kappa > 0$ is a constant and $c_i$ is a drift parameter that satisfies
Assumption 2.
Assumption 2.
(i) $c_i$ is iid with $\mu_{kc} = E(c_i^k) < \infty$ for $k \geq 3$ and $\mu_{0c} = 1$;
(ii) $c_i$ and $\eta_{i,t}$ are mutually independent.
If $c_i = \rho_i = 0$, then $y_{i,t}$ is unit root nonstationary, whereas if
$c_i \neq 0$, then $y_{i,t}$ is either locally stationary ($c_i < 0$) or locally explosive ($c_i > 0$). The null and alternative hypotheses considered
here are given by $H_0: c_1 = \cdots = c_N = 0$ and $H_1: c_i \neq 0$ for
some i, respectively, which can be formulated more compactly
in terms of the moment of $c_i$ as $H_0: \mu_{2c} = 0$ and $H_1: \mu_{2c} > 0$,
respectively.
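As Hansen (1995) showed, with serially correlated errors, the power of the CADF test depends not only on $c_i$,
but also on the long-run correlation coefficient between $v_{i,t}$
and $\epsilon_{i,t}$, as given by $\rho_{v\epsilon i} = \sigma_{\epsilon i}/\sigma_{vi} \in (0, 1]$, where $\sigma_{vi}^2 = \lambda_i(1)'\Phi_i(1)^{-1}\Sigma_{\varepsilon i}(\Phi_i(1)^{-1})'\lambda_i(1) + \sigma_{\epsilon i}^2$, with $\Phi_i(1)$ denoting the lag polynomial in (4) evaluated at $L = 1$ (see the Appendix). Thus,
if $\rho_{v\epsilon i} \to 1$, then $x_{i,t}$ does not make any contribution to the variation in $y_{i,t}$, whereas if $\rho_{v\epsilon i} \to 0$, then $x_{i,t}$ explains all the
variation in $y_{i,t}$.
Assumption 3. $\sum_{i=1}^{N}\rho_{v\epsilon i}^k/N \to \bar\rho_{kv\epsilon} \in (0, \infty)$ as $N \to \infty$
for $k \in (-\infty, \infty)$.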
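As a small numerical illustration of the long-run correlation just defined, the following sketch computes $\rho_{v\epsilon i} = \sigma_{\epsilon i}/\sigma_{vi}$ from the long-run quantities of one unit. The function name and argument names are hypothetical, and the inputs are assumed to be the values of the lag polynomials at $L = 1$.

```python
import numpy as np

def long_run_correlation(lam1, Phi1, Sigma_eps, sigma_eps2):
    """rho_{v eps} = sigma_eps / sigma_v for one unit.

    lam1       : lambda_i(1), length-m vector (assumed input)
    Phi1       : lag polynomial of the covariates at L = 1, m x m matrix
    Sigma_eps  : covariance matrix of the covariate innovations, m x m
    sigma_eps2 : variance of the error in the v equation
    """
    lam1 = np.atleast_1d(np.asarray(lam1, dtype=float))
    Phi1 = np.atleast_2d(np.asarray(Phi1, dtype=float))
    Sigma = np.atleast_2d(np.asarray(Sigma_eps, dtype=float))
    w = np.linalg.solve(Phi1.T, lam1)        # (Phi(1)^{-1})' lambda(1)
    sigma_v2 = float(w @ Sigma @ w) + sigma_eps2
    return float(np.sqrt(sigma_eps2 / sigma_v2))
```

With a single white-noise covariate and unit variances, `long_run_correlation(lam0, 1.0, 1.0, 1.0)` reduces to $1/\sqrt{\lambda_0^2 + 1}$, so `lam0 = 0` gives a correlation of one (the covariate is irrelevant).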
Remark 1. The assumption that ηi,t is cross-section independent is restrictive, but can be relaxed by requiring that some of
the elements of xi,t are constant in i, in which case (3) becomes a
common factor model, with the common elements of xi,t taking
the role of common factors. In the article, we assume that xi,t is
known, in which case the presence of common covariates does
not affect the results. If there are common covariates that are
unobserved, then one possibility is to follow Bai and Ng (2004),
and use estimated principal component factors in their stead.
In Section 4, we elaborate on this point. For the time being,
however, we maintain the assumption that xi,t is known.
Remark 2. Assumption 1 (ii) ensures that the initial values
of ui,t and xi,t are Op (1), which is relevant if the initialization
took place somewhere in the recent past. While admittedly the
simplest way to relax the otherwise so common zero initial
value assumption (see, e.g., Moon, Perron, and Phillips 2007),
the results reported herein hold also when the initialization is in
the distant past, such that the initial values of $u_{i,t}$ and $x_{i,t}$ are
$O_p(\sqrt{T})$ (see Westerlund 2014a). This is different from the time
series case where the size of the initial value strongly influences
the performance of unit root tests, up to the point of reversing
the ranking of different tests (see, e.g., Müller and Elliott 2003).
Remark 3. The requirement that ci has at least three moments
is only needed when analyzing the local power, and is not necessary for deriving the asymptotic null distribution of the PCADF
test, in which case ci and all its moments are zero. In fact, the
current moment condition is less restrictive than the otherwise
so common bounded support assumption, which implies that
all moments are bounded (see, e.g., Moon, Perron, and Phillips
2007; Moon and Perron 2008).
Remark 4. Unlike most other studies where the rate of shrinking of the local alternative, κ, is prespecified a priori, in the
present study the “appropriate” value of κ will be considered a
part of the analysis (see Section 3.2 for a detailed discussion).
Remark 5. The requirement that $\lambda_i(L)$, $\phi_i(L)$, and the covariate lag polynomial in (4) are
all of the same order p is not a restriction. If the orders are
different, then we simply set p equal to the maximum order of
$\phi_i(L)$ and $\lambda_i(L)$. The covariate lag polynomial does not have to be estimated and can
therefore be of any order (even infinite, although that would
require changing Assumption 1 (iii)). The lag order also does
not have to be the same for all i, but could be allowed to differ
without affecting the results.
Remark 6. As with p (see Remark 5), the assumption that the
number of regressors contained in xi,t , m, is the same for all i is
not a restriction. Hence, in practice there is nothing that prevents
the number of covariates from differing from unit to unit, which is of
course a great advantage, especially in applications where data
on some units are scarce (see Section 4 for a discussion). In fact,
T could also be allowed to differ across units.
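3. MAIN RESULTS
In this section, we begin by introducing the PCADF test
statistic and its asymptotic distribution. Then we discuss, in
turn, the implications for power and implementation.
3.1 The PCADF Statistic and its Asymptotic Distribution
Let
$$R_d y_{i,t-1} = y_{i,t-1} - \sum_{t=p+2}^{T}y_{i,t-1}d_t'\left(\sum_{t=p+2}^{T}d_t d_t'\right)^{-1}d_t$$
be the detrended version of $y_{i,t-1}$, where $R_d$ is the ordinary
least-square (OLS) residual operator. Equations (1) and (2) can
be rewritten as
$$R_d\Delta y_{i,t} = \rho_i R_d y_{i,t-1} + \Gamma_i' R_d z_{i,t} + R_d\epsilon_{i,t}, \qquad (6)$$
where $z_{i,t} = (\Delta y_{i,t-1}, \ldots, \Delta y_{i,t-p}, x_{i,t}', \ldots, x_{i,t-p}')'$ with $\Gamma_i = (\phi_{1i}, \ldots, \phi_{pi}, \lambda_{0i}', \ldots, \lambda_{pi}')'$ being the associated vector of coefficients. Define
$$A_{i,T} = \frac{1}{\hat\sigma_{yi}\hat\sigma_{\epsilon i}}\frac{1}{T}\sum_{t=p+2}^{T}R_z R_d y_{i,t-1}R_z R_d\Delta y_{i,t},$$
$$B_{i,T} = \frac{1}{\hat\sigma_{yi}^2}\frac{1}{T^2}\sum_{t=p+2}^{T}(R_z R_d y_{i,t-1})^2,$$
where $\hat\sigma_{yi}^2 = \hat\sigma_{vi}^2/(1 - \sum_{j=1}^{p}\hat\phi_{ji})^2$ is an estimator of $\sigma_{yi}^2 = \sigma_{vi}^2/\phi_i(1)^2$, $\hat\sigma_{vi}^2 = \sum_{t=p+2}^{T}\hat v_{i,t}^2/T$, $\hat\sigma_{\epsilon i}^2 = \sum_{t=p+2}^{T}\hat\epsilon_{i,t}^2/T$ with
$\hat v_{i,t} = R_d\Delta y_{i,t} - \hat\rho_i R_d y_{i,t-1} - \sum_{j=1}^{p}\hat\phi_{ji}R_d\Delta y_{i,t-j}$, and $\hat\rho_i$, $\hat\Gamma_i$,
$\hat\phi_{ji}$ and $\hat\epsilon_{i,t}$ coming from the OLS fit of (6), and $R_z$ is $R_d$ with
$z_{i,t}$ in place of $d_t$. Letting $\bar A_T = \sum_{i=1}^{N}A_{i,T}/N$ with a similar
definition of $\bar B_T$, the PCADF statistic considered in this article
is given by
$$t_{PCADF} = \frac{\sqrt{N}\bar A_T}{\sqrt{\bar B_T}}.$$
The asymptotic distribution of this test statistic is provided in
Theorem 1.
Theorem 1. Under Assumptions 1–3, as $N, T \to \infty$,
$$t_{PCADF} - \sqrt{N}\mu \sim \sum_{j=1}^{2}N^{1/2-j\kappa}\left((\bar\rho_{-1v\epsilon} - \bar\rho_{1v\epsilon})r_{2j} - \bar\rho_{1v\epsilon}r_{1j}\right) + N^{1/2-2\kappa}\bar\rho_{1v\epsilon}\frac{r_{22}}{2} + N(0, \sigma^2) + O_p\left(\frac{\mu_{1c}}{N^{\kappa}}\right) + O_p\left(\frac{\mu_{3c}}{N^{3\kappa-1/2}}\right) + O_p\left(\frac{\sqrt{N}}{\sqrt{T}}\right), \qquad (7)$$
where $\sim$ signifies asymptotic equivalence,
$$r_{11} = \frac{\mu_{1c}\alpha_0\beta_1}{2\beta_0^{3/2}},$$
$$r_{12} = \frac{\mu_{2c}\alpha_0\beta_2}{2\beta_0^{3/2}} - \frac{3\mu_{1c}^2\alpha_0\beta_1^2}{8\beta_0^{5/2}},$$
$$r_{2j} = \frac{\mu_{jc}\beta_{j-1}}{\sqrt{\beta_0}},$$
$$\mu = \frac{\alpha_0\bar\rho_{1v\epsilon}}{\sqrt{\beta_0}},$$
$$\sigma^2 = 1 - \bar\rho_{2v\epsilon} + \frac{\alpha_1\bar\rho_{2v\epsilon}}{\beta_0} + \frac{\bar\rho_{1v\epsilon}^2\alpha_0^2\alpha_2}{4\beta_0^3},$$
and numerical values of $\alpha_0$, $\alpha_1$, $\alpha_2$, $\beta_0$, $\beta_1$, and $\beta_2$ are given in
Table 1.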
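To fix ideas, the construction of the statistic can be sketched in Python for the simplest case of no lag augmentation ($p = 0$) and a single covariate per unit. This is an illustrative sketch rather than the implementation used in the article: the function names are hypothetical, the sums run over all available observations, and $\hat\rho_{kv\epsilon}$ is taken to be the cross-section average of $(\hat\sigma_{\epsilon i}/\hat\sigma_{vi})^k$, which is an assumption on our part. The mean and variance corrections use the $\alpha_0$, $\alpha_1$, $\alpha_2$, and $\beta_0$ values of Table 1.

```python
import numpy as np

# (alpha0, alpha1, alpha2, beta0) from Table 1 for models 1 and 2.
COEF = {1: (-1/2, 1/12, 1/45, 1/6), 2: (-1/2, 1/60, 11/6300, 1/15)}

def ols_resid(y, X):
    """Residuals from an OLS regression of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def pcadf(y, x, model=2):
    """Standardized statistic (t_PCADF - sqrt(N) mu) / sigma for p = 0.

    y, x : arrays of shape (N, T + 1); model 1 = intercepts, 2 = trends.
    """
    N = y.shape[0]
    A, B, rho_ve = [], [], []
    for yi, xi in zip(y, x):
        dy, ylag, xt = np.diff(yi), yi[:-1], xi[1:]
        n = dy.size
        d = np.ones((n, 1))
        if model == 2:
            d = np.column_stack([d, np.arange(1.0, n + 1)])
        rdy, rylag, rx = (ols_resid(s, d) for s in (dy, ylag, xt))
        # Covariate-augmented regression: rho_hat and the residual eps_hat.
        W = np.column_stack([rylag, rx])
        coef, *_ = np.linalg.lstsq(W, rdy, rcond=None)
        e_hat = rdy - W @ coef
        v_hat = rdy - coef[0] * rylag
        s_e = np.sqrt(np.mean(e_hat**2))
        s_v = np.sqrt(np.mean(v_hat**2))      # sigma_y = sigma_v when p = 0
        Z = rx[:, None]
        ry, rd = ols_resid(rylag, Z), ols_resid(rdy, Z)
        A.append(np.sum(ry * rd) / (s_v * s_e * n))
        B.append(np.sum(ry**2) / (s_v**2 * n**2))
        rho_ve.append(s_e / s_v)
    t_stat = np.sqrt(N) * np.mean(A) / np.sqrt(np.mean(B))
    a0, a1, a2, b0 = COEF[model]
    r1, r2 = np.mean(rho_ve), np.mean(np.square(rho_ve))
    mu = a0 * r1 / np.sqrt(b0)
    sigma2 = 1 - r2 + a1 * r2 / b0 + r1**2 * a0**2 * a2 / (4 * b0**3)
    return (t_stat - np.sqrt(N) * mu) / np.sqrt(sigma2)
```

Under the null the returned value should be roughly standard normal for large N and T with T much larger than N, so a left-tailed 5% test would reject for values below about −1.645.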
Remark 7. The last three terms in the asymptotic distribution
of $t_{PCADF} - \sqrt{N}\mu$ are remainders. The first two of these are only
relevant under the alternative that $c_i \neq 0$, and are negligible for
all $\kappa > 1/6$ (provided that $c_i$ has at least three moments). The
third remainder does not depend on $c_i$ and is therefore there also
we need N/T → 0 as N, T → ∞ (which in practice means
that N 1/2, then
$-N^{1/2-\kappa}r_{11} = o(1)$, and therefore power is negligible, whereas
if $\kappa < 1/2$, then $-N^{1/2-\kappa}r_{11}$ diverges, and therefore power goes
to one as $N \to \infty$. Only in the intermediate case when $\kappa = 1/2$, such that $-(N^{1/2-\kappa}r_{11} + N^{1/2-2\kappa}(r_{12} - r_{22}/2)) = -(r_{11} + N^{-1/2}(r_{12} - r_{22}/2)) = -r_{11} + O(1/\sqrt{N})$ is power nonnegligible in the usual nonincreasing sense. This is in agreement with
the results reported by, for example, Moon and Perron (2004)
for their $t^+$ panel unit root test that does not employ any covariate information. As usual in the literature, Moon and Perron
(2004) only considered the first-order term, $r_{11}$, which only depends on $\mu_{1c}$. Their results are therefore silent when it comes
to the effect of higher order moments. Theorem 1 includes an
Table 1. Coefficients of the asymptotic distribution of the PCADF statistic

Coefficient    Model 1    Model 2
α0             −1/2       −1/2
α1             1/12       1/60
α2             1/45       11/6300
β0             1/6        1/15
β1             1/12       0
β2             1/20       −1/420

NOTE: Models 1 and 2 refer to the cases with a heterogenous intercept but no trend, and
heterogenous intercepts and trends, respectively.
additional second-order term, $-N^{-1/2}(r_{12} - r_{22}/2)$, which depends on $\mu_{2c}$ and is therefore more general in this regard. Note
in particular that if $\mu_{1c} = 0$ and $\mu_{2c} > 0$ (positive and negative
values of $c_i$ cancel out), such that $r_{11} = 0$ and $(r_{12} - r_{22}/2) \neq 0$,
then $-(N^{1/2-\kappa}r_{11} + N^{1/2-2\kappa}(r_{12} - r_{22}/2)) = -N^{1/2-2\kappa}(r_{12} - r_{22}/2)$. This means that while negligible for $\kappa = 1/2$, power
is nonnegligible for $\kappa = 1/4$. Thus, in this case the results of
Moon and Perron (2004) would lead us to believe that there is
no power, when in fact there is, but just not within $1/\sqrt{N}T$-neighborhoods of the null.
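As a numerical check on the first-order term for model 1 when the covariates are irrelevant ($\rho_{v\epsilon} = 1$, $\kappa = 1/2$), the sketch below evaluates $-r_{11}/\sigma$ per unit of $\mu_{1c}$ from the model 1 coefficients in Table 1. The helper name is hypothetical; the computation simply plugs the tabulated values into the expressions for $r_{11}$ and $\sigma^2$ in Theorem 1.

```python
import math

# Table 1, model 1 (unit-specific intercepts, no trend).
ALPHA0, ALPHA1, ALPHA2 = -1/2, 1/12, 1/45
BETA0, BETA1 = 1/6, 1/12

def standardized_drift_coefficient():
    """Coefficient on mu_1c in the standardized first-order drift -r11 / sigma
    when rho_ve = 1 and kappa = 1/2."""
    r11_per_mu = ALPHA0 * BETA1 / (2 * BETA0**1.5)
    sigma2 = ALPHA1 / BETA0 + ALPHA0**2 * ALPHA2 / (4 * BETA0**3)
    return -r11_per_mu / math.sqrt(sigma2)

print(round(standardized_drift_coefficient(), 4))  # 0.3423
```

The value coincides with $\sqrt{30}/16 \approx 0.3423$, so a negative $\mu_{1c}$ shifts the standardized statistic to the left by about $0.34|\mu_{1c}|$ in this case.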
At the other end of the scale, if $\max_{i\in[1,N]}\rho_{v\epsilon i} \to 0$, then
$\bar\rho_{1v\epsilon} \to 0$ and $(\bar\rho_{-1v\epsilon} - \bar\rho_{1v\epsilon}) = \bar\rho_{-1v\epsilon} + o(1)$, where $\bar\rho_{-1v\epsilon}$
is divergent. Moon, Perron, and Phillips (2007) derived
the power envelope for the model without covariates and
showed that it is defined for $\kappa = 1/2$. The PCADF test also
has power in such neighborhoods. However, since in this
case (7) reduces to $\sum_{j=1}^{2}N^{1/2-j\kappa}(\bar\rho_{-1v\epsilon} - \bar\rho_{1v\epsilon})r_{2j} = (\bar\rho_{-1v\epsilon} - \bar\rho_{1v\epsilon})r_{21} + O(1/\sqrt{N}) = \bar\rho_{-1v\epsilon}r_{21} + o(1)$, the power of this test
approaches one as $\max_{i\in[1,N]}\rho_{v\epsilon i} \to 0$ for any $\mu_{1c} \neq 0$, such
that $r_{21} = \mu_{1c}\sqrt{\beta_0} \neq 0$. It is therefore more powerful than the
existing tests that ignore the covariates.
Let us now consider model 2 with both unit-specific intercepts
and trends, which is our main focus. If $\min_{i\in[1,N]}\rho_{v\epsilon i} \to 1$,
then we again have that $(\bar\rho_{-1v\epsilon} - \bar\rho_{1v\epsilon}) = o(1)$, and therefore
power is determined by $-(N^{1/2-\kappa}r_{11} + N^{1/2-2\kappa}(r_{12} - r_{22}/2))$.
However, since $\beta_1 = 0$ in this case (see Table 1), $r_{11} = r_{22} = 0$, which means that $-(N^{1/2-\kappa}r_{11} + N^{1/2-2\kappa}(r_{12} - r_{22}/2)) = -N^{1/2-2\kappa}r_{12}$. This shows that power is negligible for $\kappa = 1/2$,
which is a reflection of the incidental trend problem. However,
while negligible for $\kappa = 1/2$, since $(r_{12} - r_{22}/2) \neq 0$, power
is still nonnegligible for $\kappa = 1/4$, which is also the value of
$\kappa$ that defines the power envelope for model 2 without covariates (Moon, Perron, and Phillips 2007). The fact that the
PCADF test “only” has power within $1/N^{1/4}T$-neighborhoods
of the null when $\min_{i\in[1,N]}\rho_{v\epsilon i} \to 1$ is therefore not totally
unexpected.
The situation is, however, very different when
mini∈[1,N] ρvǫi → k ∈ (0, 1) (at least some covariate information). Indeed, since r21 = 0, this means that (7) can
be written as N 1/2−κ (ρ −1vǫ − ρ 1vǫ )r21 + O(N 1/2−2κ ), where
(ρ −1vǫ − ρ 1vǫ )r21 = 0, suggesting that power is no longer
negligible for κ = 1/2. Thus, as in model 1, the use of the
covariates has implications for power. The main difference
is that the effect is now much stronger than before, with the
covariates even having an effect on the value of κ for which
power is nonnegligible. Since the envelope without covariates in this case is defined for κ = 1/4, PCADF is again more powerful than existing tests. However, unlike the situation in model 1, this superiority does not require $\max_{i\in[1,N]} \rho_{v\epsilon i} \to 0$. In fact, all that is needed for power within $1/\sqrt{N}T$-neighborhoods of the null is that the fraction of cross-section units for which $\rho_{v\epsilon i} < 1$ is nonnegligible, such that $(\bar\rho_{-1v\epsilon} - \bar\rho_{1v\epsilon}) > 0$. Moreover, since power approaches one as $\max_{i\in[1,N]} \rho_{v\epsilon i} \to 0$ (see the above discussion for model 1), the power of PCADF with trends can be made arbitrarily close to the power of the same test without trends, meaning that the covariates should be able to compensate fully for the loss of power caused by the incidental trends.
Remark 11. The intuition for the increased power in the presence of covariates can be appreciated by looking at (1) and (2), which with $\theta_i = 0$ can be rewritten as $\phi_i(L)\Delta y_{i,t} = \rho_i y_{i,t-1} + v_{i,t}$. The corresponding model conditional on $x_{i,t}$ and assuming for simplicity that $\gamma_i = 0$ is given by $\phi_i(L)\Delta y_{i,t} = \rho_i y_{i,t-1} + \lambda_i(L)'x_{i,t} + \epsilon_{i,t}$. The variance of $\epsilon_{i,t}$ is given by $\sigma^2_{\epsilon i} = \sigma^2_{vi} - \lambda_i(1)'\Phi_i(1)^{-1}\Sigma_{\varepsilon i}\Phi_i(1)^{-1\prime}\lambda_i(1) \le \sigma^2_{vi}$ (see Section 2), suggesting that the OLS estimator of the parameters of the conditional model will be more precise, leading to a more powerful test statistic. Of course, as a referee of this journal correctly points out, while indicative of higher power, this does not imply in any way the above-mentioned effect on κ when $\min_{i\in[1,N]} \rho_{v\epsilon i} \to k \in (0, 1)$.
4. ISSUES OF IMPLEMENTATION
4.1 Mean and Variance Correction Factors
As pointed out in Remark 8, for standard normal inference (under the null), we use $(t_{PCADF} - \sqrt{N}\mu)/\sigma$. However, this test statistic is not really feasible, as µ and σ² depend on $\bar\rho_{kv\epsilon}$. In applications, this quantity therefore has to be replaced by an estimator. A natural consistent candidate is given by $\hat{\bar\rho}_{kv\epsilon} = \sum_{i=1}^{N} \hat\rho^{k}_{v\epsilon i}/N$, where $\hat\rho_{v\epsilon i} = \hat\sigma_{\epsilon i}/\hat\sigma_{vi}$. (The sample correlation coefficient between $\hat v_{i,t}$ and $\hat\epsilon_{i,t}$ can also be used to estimate $\rho_{v\epsilon i}$.)
Remark 12. The PCADF test can be applied regardless of whether there are any covariates available. Without covariates, the test is similar in spirit to the one of Levin, Lin, and Chu (2002). The main difference lies in the definition of µ. In this article, µ is asymptotic, whereas in Levin, Lin, and Chu (2002) it is estimated using kernel methods, which not only complicates the computation of the test statistic, but can also lead to poor small-sample performance (Westerlund and Breitung 2013). The PCADF test is therefore expected to be more robust in this regard.
4.2 Critical Region
Whether the test should be one- or two-sided depends on what one is willing to assume regarding the DGP. If $\mu_{1c}$ is driving power (see Sections 3.1 and 3.2) and the null is tested against the one-sided (locally stationary) alternative that $c_i < 0$, then the left-tail standard normal critical values are enough. As already mentioned, most research only considers $\mu_{1c}$. It is therefore standard to focus on left-tailed tests. The problem is if $\mu_{2c}$ is driving power (as would be the case if positive and negative values of $c_i$ cancel out; see Section 3.2), which calls for the use of a two-sided test. Thus, if the researcher has little or no feeling for the integration properties of his/her data, it is probably safest to use a two-sided test (although this is expected to lead to a loss of power when compared with the case when the alternative is known to be one-sided). (Needless to say, the choice of alternative matters for interpretation of the test outcome. If the alternative is formulated as $c_i < 0$, then a rejection should be taken as evidence in favor of stationarity, whereas if the alternative is formulated as $c_i \neq 0$, then a rejection should be interpreted more broadly as providing evidence against the unit root null.)
4.3 Cross-Section Dependence
As mentioned in Remark 1, one way to accommodate cross-section dependence in the current DGP is to assume that some of the elements in $x_{i,t}$ are common across i, which, if known, will not affect the results presented so far. If there are common covariates that are unobserved, one possibility is to follow Bai and Ng (2004, 2010), and use estimated principal component factors in their stead. To formalize the ideas, suppose that the DGP is again given by (2)–(4) but that $y_{i,t}$ in (1) has the following factor structure:
$y_{i,t} = \theta_i' d_t + \Lambda_i' F_t + u_{i,t}$, (9)
where $F_t$ is an r-dimensional vector of common factors (or unobserved covariates) with $\Lambda_i$ being the associated vector of factor loadings, and $d_t$ and $u_{i,t}$ are as before.
Assumption 4.
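(i) $\Lambda_i$ is nonrandom such that $\|\Lambda_i\| < \infty$ and $\sum_{i=1}^{N} \Lambda_i\Lambda_i'/N \to \Sigma_\Lambda > 0$ as $N \to \infty$;
(ii) $\Delta F_t = C(L)g_t$, where $g_t$ is iid with $E(g_t) = 0$, $E(g_t g_t') = \Sigma_g > 0$, $E(\|g_t\|^4) < \infty$, $C(L) = \sum_{n=0}^{\infty} C_n L^n$, $E[\Delta F_t(\Delta F_t)'] = \sum_{n=0}^{\infty} C_n\Sigma_g C_n' > 0$, $\sum_{n=0}^{\infty} n\|C_n\| < \infty$, and $C(1)$ has rank $r^* \in [0, r]$;
(iii) $u_{i,t}$ and $g_t$ are mutually independent.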
Assumption 4 is the same as in Bai and Ng (2004, 2010), and we therefore refer to these articles for a discussion. An important feature of the above DGP is that $F_t$ and $u_{i,t}$ can have different orders of integration (as $r^*$, the rank of the long-run covariance matrix of $\Delta F_t$, is not required to be full, but can take on any value in [0, r]). In this section, however, we focus on testing $u_{i,t}$; see Bai and Ng (2004) for a detailed treatment of the testing of $F_t$. The basic idea is exactly the same as in Bai and Ng (2004, 2010), that is, we begin by estimating and subtracting from $y_{i,t}$ an estimate of $\Lambda_i' F_t$. Since the resulting "defactored" $y_{i,t}$ is consistent for $u_{i,t}$, it can be subjected to any existing panel unit root test. While Bai and Ng (2004) considered one of the p-value combination statistics of Choi (2001), Bai and Ng (2010) considered versions of the pooled t-tests of Moon and Perron (2004). In this section, we apply $t_{PCADF}$.
Consider model 1. Under the above conditions,
$\Delta y_{i,t} = \Lambda_i' \Delta F_t + \Delta u_{i,t}$, (10)
which is just a static common factor model for $\Delta y_{i,t}$. However, unlike the common factor model for $y_{i,t}$, in the above model both the common and idiosyncratic components are stationary. Applying the principal components method to this model yields estimates $\hat\Lambda_i$ and $\Delta\hat F_t$ of (the space spanned by) $\Lambda_i$ and $\Delta F_t$, respectively. The defactored version of $y_{i,t}$ is simply the accumulated sum of $\Delta\hat u_{i,n} = \Delta y_{i,n} - \hat\Lambda_i' \Delta\hat F_n$; $\hat u_{i,t} = \sum_{n=2}^{t} \Delta\hat u_{i,n}$ for t = 2, . . . , T and $\hat u_{i,1} = 0$. The resulting PCADF test statistic, denoted $t^*_{PCADF}$, is just as before but with $y_{i,t}$ replaced by $\hat u_{i,t}$. In model 2, $\Delta y_{i,t}$ has a nonzero mean (given by $\theta_2$). In this case, we therefore demean $\Delta y_{i,t}$ prior to application of principal components.
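The defactoring steps just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's implementation: the function name `defactor` and the simulated panel are hypothetical, the number of factors r is taken as known, and the principal components are computed through a singular value decomposition (a different normalization of factors and loadings than in Bai and Ng, but the same estimated common component).

```python
import numpy as np


def defactor(y, r, demean=False):
    """Defactor a T x N panel y assuming r common factors.

    Steps: first-difference, extract r principal components,
    subtract the estimated common component, re-cumulate.
    Returns a T x N array u_hat whose first row is zero.
    """
    dy = np.diff(y, axis=0)                      # (T-1) x N first differences
    if demean:                                   # model 2: remove the mean of dy
        dy = dy - dy.mean(axis=0, keepdims=True)
    # Principal components via SVD: dy is approximated by dF @ Lam.T
    U, s, Vt = np.linalg.svd(dy, full_matrices=False)
    dF = U[:, :r] * s[:r]                        # estimated differenced factors
    Lam = Vt[:r].T                               # estimated loadings, N x r
    du = dy - dF @ Lam.T                         # differenced idiosyncratic part
    first_row = np.zeros((1, y.shape[1]))        # u_hat at the first date is 0
    return np.vstack([first_row, np.cumsum(du, axis=0)])


# Illustrative data (assumed, not from the article): one I(1) factor
rng = np.random.default_rng(0)
T, N, r = 200, 30, 1
F = np.cumsum(rng.standard_normal((T, r)), axis=0)
Lam_true = rng.standard_normal((N, r))
u = np.cumsum(rng.standard_normal((T, N)), axis=0)
y = F @ Lam_true.T + u
u_hat = defactor(y, r)
```

The returned `u_hat` would then take the place of the raw series when computing the test statistic; setting `demean=True` corresponds to the model 2 case.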
Proposition 1. Under Assumptions 1–4, as N, T → ∞ with N/T → 0 and κ > 1/6,

$$t^*_{PCADF} - \sqrt{N}\mu \sim \sum_{j=1}^{2} N^{1/2-j\kappa}\big((\bar\rho_{-1v\epsilon} - \bar\rho_{1v\epsilon})r_{2j} - \bar\rho_{1v\epsilon}r_{1j}\big) + N^{1/2-2\kappa}\bar\rho_{1v\epsilon}\frac{r_{22}}{2} + N(0, \sigma^2).$$

According to Proposition 1, the PCADF statistic based on the defactored data has the same asymptotic distribution as the original test statistic in the case without common factors. In other words, the defactoring has no effect on the local power of the test for a unit root in $u_{i,t}$.
4.4 Selecting the Covariates
Of course, one may argue that in applications the above results
are somewhat “idealized,” in the sense that the covariates in xi,t
might be difficult to find. However, we argue that this criticism
need not be too much of a problem. There are a number of
reasons for this.
• Most variables in economics and finance are correlated, a
finding with ample theoretical support. Indeed, as Pesaran,
Smith, and Yamagata (2013) argued, in these fields it is
actually difficult to find variables that are uncorrelated. For
example, in testing for unit roots in a panel of real outputs,
one would expect the shocks to output to also manifest
themselves in employment, consumption, and investment.
In the case of testing for unit roots in inflation, one would
expect the shocks to inflation to also affect short-term and
long-term interest rates. Hence, given the availability of
panel data, candidate covariates should be relatively easy
to find. Also, as pointed out in Section 1, typically the unit
root testing is a part of the analysis of multiple variables,
in which case relevant covariate candidates are particularly
easy to find.
• Pretesting for covariate relevance is very simple. Indeed, because of the differing orders of magnitude of the associated variables, the OLS estimators of $\rho_i$ and of the covariate coefficients in (6) are asymptotically uncorrelated, suggesting that we do not lose generality by considering a separate hypothesis test for the latter. The OLS estimator of these coefficients is asymptotically normal (a formal proof is available upon request), suggesting that the testing can be carried out in the usual manner using, for example, a Wald test.
• As already pointed out (see Remark 6), the number of included covariates for each unit can differ. Hence, since the only thing that matters for power is the information content of the average covariate, as measured by $\bar\rho_{-1v\epsilon}$ and $\bar\rho_{1v\epsilon}$ (see Theorem 1), there can even be units where $x_{i,t} = \{\emptyset\}$. Needless to say, this flexibility is a great advantage in practice, as it allows one to selectively pick those covariates for each unit that are most relevant/readily available.
• As long as the lag augmentation order is larger than p
(the true order), there is no need to pinpoint p. Indeed, if
λi (L) = 0, such that the covariates are absent, the asymptotic distribution of the (unrestricted) PCADF statistic is
still the same as in Theorem 1. This means that the asymptotic “price” of including redundant lags is zero, a result
that is verified by our simulations (see Section 5). Similarly, the price of including redundant covariates (contemporaneously and/or in lagged form) is also zero. Erroneous
omission of covariates is, on the other hand, more problematic, as in this case Theorem 1 need not hold. However,
even in situations such as this there is still “hope,” in that
λi (L) does not have to be zero; if λi (L) = λ0i , such that xi,t
only enters the equation for vi,t contemporaneously, then
Theorem 1 continues to hold even if xi,t is omitted (a proof
is available upon request). As a rule, though, all covariates
that are significant should be included in the testing. Then
there is also the fact that if one would like to entertain the
possibility of (omitted) covariates, one is likely to be better
off using the covariates available rather than no covariates at all (as when using a conventional univariate panel
data test).
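The covariate pretest mentioned in the list above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the helper `wald_covariate_test` is hypothetical, it handles a single unit with one covariate, and it omits lag augmentation and deterministic terms, so it is a stripped-down stand-in for the regression in (6) rather than that regression itself.

```python
import numpy as np


def wald_covariate_test(y, x):
    """Wald statistic for H0: the coefficient on x_t is zero in
    dy_t = rho * y_{t-1} + lam * x_t + e_t (one unit, no lags)."""
    dy = np.diff(y)
    Z = np.column_stack([y[:-1], x[1:]])         # regressors: y_{t-1} and x_t
    coef, *_ = np.linalg.lstsq(Z, dy, rcond=None)
    resid = dy - Z @ coef
    s2 = resid @ resid / (len(dy) - Z.shape[1])  # residual variance
    cov = s2 * np.linalg.inv(Z.T @ Z)            # OLS covariance matrix
    return coef[1] ** 2 / cov[1, 1]              # compare with chi-square(1)


# Illustrative unit-root series whose error loads on the covariate
rng = np.random.default_rng(1)
T = 300
x = rng.standard_normal(T)
eps = rng.standard_normal(T)
y_rel = np.cumsum(1.0 * x + eps)
wald_rel = wald_covariate_test(y_rel, x)
CHI2_1_5PCT = 3.841                              # 5% critical value, 1 d.o.f.
print(wald_rel > CHI2_1_5PCT)
```

With several covariates the same idea applies, with the scalar ratio replaced by the usual quadratic form and the critical value taken from a chi-square distribution with as many degrees of freedom as tested coefficients.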
Remark 13. In practice, the elements in xi,t need not be stationary. In such cases, we recommend first-differencing all unit
root covariates. (Most of the assumptions placed on the covariates can be relaxed as long as the number of units for which the
assumptions fail remains fixed as N → ∞. We can, for example, allow for unit root covariates, provided that the number of
unit root units is fixed. The intuition is simple; if the number
is fixed, then the fraction of units with a violation goes to zero
as N → ∞, and therefore their impact on the test is going to
be negligible. In practice this means that the number of units
with a violation should be “small” relative to N.) The results
reported in Theorem 1 are unaffected by this. Thus, just as in
cointegration analysis, unless the order of integration of xi,t is
known, it should be pretested for unit roots. The problem is if
the order of integration of xi,t is misspecified. Hansen (1995)
showed that while erroneous inclusion of unit root covariates
invalidates the test, over-differencing only results in mild power
losses. He therefore recommended taking first differences not
only of all unit root covariates but also of all near unit root covariates, and so do we. (Some simulation results for the case of
under/over-differenced covariates are available upon request.)
5. SIMULATIONS
In this section, we investigate the small-sample properties
of the (nondefactored) PCADF test through a small simulation
study using (1)–(5) as DGP. For simplicity, we assume that
$m = 1$, $\theta_i = 0$, $\gamma_i = 0$, $(\epsilon_{i,t}, \varepsilon_{i,t}) \sim N(0, I_2)$, $u_{i,0} = x_{i,0} = 0$, $\kappa = 1/2$, and $c_i \sim U(a, b)$. Note in particular how the mean and variance of $c_i$ can be written in terms of a and b as $\mu_{1c} = (a + b)/2$ and $\mu_{2c} - \mu_{1c}^2 = (b - a)^2/12$, respectively.
For our theory to provide an accurate description of actual test
behavior, the values of a and b cannot be “too large,” as this will
tend to activate the $O_p(\mu_{1c}/\sqrt{N})$ and $O_p(\mu_{3c}/N)$ remainders in the asymptotic distribution reported in Theorem 1. Similarly, in order not to activate the $O_p(\sqrt{N}/T)$ remainder, in the simulations we set $T \gg N$. The presence of serial correlation did not have any major effects on the results, and we therefore also set $\phi_i(L) = \Phi_i(L) = 1$ and $\lambda_i(L) = \lambda_0$. (For example, with $\phi_i(L) = 1 - \phi_1 L$ and $\phi_1 = 0.5$ the results for the PCADF test were basically indistinguishable from the ones already in the article. Another advantage of focusing on the results for the case without serial correlation is that it enables comparison with the power envelope and point optimal test of Moon, Perron, and Phillips (2007).) Thus, in this DGP, $\sigma^2_{\epsilon i} = 1$ and $\sigma^2_{vi} = \lambda_0^2 + 1$, and hence $\rho^2_{v\epsilon i} = \rho^2_{v\epsilon} = 1/(\lambda_0^2 + 1)$.

Table 2. Size when ρvǫ = 1

  N    T     t+   V̂NT   VNT  tPCADF
Model 1
 10  100    6.4   2.0   4.5     8.6
 10  200    5.9   1.7   4.9     7.1
 10  400    5.4   2.3   4.6     6.2
 20  100    5.5   2.6   5.1     7.2
 20  200    5.4   3.5   5.7     6.4
 20  400    6.5   3.2   5.2     6.9
 40  100    5.7   4.3   5.0     9.1
 40  200    5.6   3.8   5.0     6.7
 40  400    5.4   3.2   5.2     6.3
Model 2
 10  100    5.2   2.5   4.5     9.5
 10  200    5.1   2.7   4.9     7.7
 10  400    5.3   3.5   4.6     6.9
 20  100    4.2   2.9   5.1    10.6
 20  200    5.3   4.3   5.7     8.2
 20  400    5.7   4.8   5.2     7.1
 40  100    4.1   2.7   5.0    13.1
 40  200    4.7   4.7   5.0     8.1
 40  400    4.7   4.7   5.2     6.5

NOTES: t+ and V̂NT refer to the tests of Moon and Perron (2008), and Moon, Perron, and Phillips (2007), respectively, VNT refers to the power envelope for the model without covariates, and ρvǫ refers to the (homogenous) correlation between vi,t and ǫi,t. See Table 1 for an explanation of models 1 and 2.

The PCADF test is constructed as left-tailed with $\hat{\bar\rho}_{kv\epsilon}$ in place of $\bar\rho_{kv\epsilon}$ when computing µ and σ² (see Section 4). The power envelope for the model without covariates, denoted $V_{NT}$, the $\hat V_{NT}$ common point-optimal test of Moon, Perron, and Phillips (2007), and the $t^+$ test of Moon and Perron (2008) are also simulated. ($V_{NT}$ is based on setting $c_i$ in the test to −0.5. Moon, Perron, and Phillips (2007) also considered (in our notation) $c_i = -1$ and $c_i = -2$; however, in our simulations $c_i = -5$ generally led to the best performance.) All tests are carried out
at the 5% level, and the number of replications is set to 3000.
All powers are adjusted for size.
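To make the simulation design concrete, the following sketch generates one panel from the simplified DGP described above and recovers the correlation between the two errors. It is an illustration only: the function names `simulate_panel` and `rho_bar_hat` are hypothetical, the values of N, T, a, b, and the seed are arbitrary choices, and the true errors are used where an application would use regression residuals.

```python
import numpy as np


def simulate_panel(N, T, a, b, rho_ve, kappa=0.5, seed=0):
    """Simplified DGP: no deterministic terms, no serial correlation.

    Returns y (T x N) together with the errors v and eps (both T x N).
    """
    rng = np.random.default_rng(seed)
    lam0 = np.sqrt(1.0 / rho_ve**2 - 1.0)    # solves rho_ve^2 = 1 / (lam0^2 + 1)
    c = rng.uniform(a, b, size=N)            # local-to-unity parameters c_i
    rho = c / (N**kappa * T)                 # rho_i = c_i / (N^kappa * T)
    x = rng.standard_normal((T, N))          # covariate, white noise here
    eps = rng.standard_normal((T, N))
    v = lam0 * x + eps                       # unconditional error
    y = np.zeros((T, N))
    prev = np.zeros(N)                       # zero initial value
    for t in range(T):
        prev = (1.0 + rho) * prev + v[t]     # Delta u_t = rho * u_{t-1} + v_t
        y[t] = prev
    return y, v, eps


def rho_bar_hat(v, eps, k):
    """Average over units of the k-th power of the sample corr(v_i, eps_i)."""
    n_units = v.shape[1]
    r = np.array([np.corrcoef(v[:, i], eps[:, i])[0, 1] for i in range(n_units)])
    return np.mean(r**k)


a, b = -4.0, 0.0
y, v, eps = simulate_panel(N=20, T=400, a=a, b=b, rho_ve=0.7)
mu_1c = (a + b) / 2                          # mean of c_i
var_c = (b - a) ** 2 / 12                    # variance of c_i
rho_hat_1 = rho_bar_hat(v, eps, 1)
print(mu_1c, var_c, rho_hat_1)
```

For a = −4 and b = 0 the mean of the local-to-unity parameters is −2 and their variance is 4/3, and the averaged sample correlation should be close to the value of 0.7 used to calibrate the covariate loading.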
The size results for the case when $\rho_{v\epsilon} = 1$ are reported in Table 2. (As alluded to in Remark 8, the asymptotic distribution of the PCADF test under the null is invariant with respect to $\rho_{v\epsilon}$, a result that is supported by our (unreported) simulation results. Hence, since under the null the value of $\rho_{v\epsilon}$ is irrelevant, in Table 2 we focus on the case when $\rho_{v\epsilon} = 1$.) If asymptotic theory is a reliable guide to the small-sample behavior of the tests, all sizes should be close to 5%. In agreement with this, we see that while generally oversized, as expected, the distortions of the PCADF test tend to diminish with increases in T. Conversely, the distortions increase with decreases in T; therefore, the distortions for T < 100 are generally larger than those reported in Table 2. Another observation is that, while $t^+$ and PCADF are oversized, $\hat V_{NT}$ is undersized. However, the distortions are generally small enough to be attributed to simulation uncertainty. Indeed, with 3000 replications the 95% confidence interval for the size of the 5% level tests studied here (in %) is [4.2, 5.8].
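The interval quoted above is what the usual normal approximation to a binomial proportion gives. A quick check, assuming that approximation is what is intended (the helper name `size_interval` is hypothetical):

```python
import math


def size_interval(level, reps, z=1.96):
    """Normal-approximation interval (in %) for a simulated rejection rate."""
    se = math.sqrt(level * (1.0 - level) / reps)   # binomial standard error
    return 100 * (level - z * se), 100 * (level + z * se)


lo, hi = size_interval(0.05, 3000)
print(f"[{lo:.1f}, {hi:.1f}]")   # prints "[4.2, 5.8]"
```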
The power results reported in Tables 3 and 4 can be summarized as follows:
• Power is usually above what is predicted by asymptotic
theory, as obtained by simulating the asymptotic distribution given in Theorem 1 with all parameters set to their
values in the DGP, especially for ρvǫ close to one. However, the discrepancy diminishes with increases in N and
T.
• The asymptotic distributions of $\hat V_{NT}$, $t^+$, and $t_{PCADF}$ in model 1 when a = b and $\rho_{v\epsilon} = 1$ are given by $\mu_{1c}/\sqrt{2} + N(0, 1)$, $3\sqrt{5}\mu_{1c}/3\sqrt{51} + N(0, 1)$, and $\sqrt{30}\mu_{1c}/16 + N(0, 1)$, respectively. (The asymptotic distributions of $t^+$ and $\hat V_{NT}$ are given in Moon, Perron, and Phillips (2007, sec. 4.1).) Consistent with this we see that $\hat V_{NT}$ is generally most powerful, at least among the larger values of N, followed by $t^+$ and then $t_{PCADF}$. However, we also see that there is a large range of empirically relevant values for N and T where the difference in power is not that large. The PCADF test therefore performs well even when the covariates are irrelevant.
• As expected, the power of the PCADF test for the case when $\rho_{v\epsilon} = 1$ is mainly driven by $\mu_{1c}$. However, there is also a second-order effect working through the variance of $c_i$. In particular, both the empirical and theoretical power seem to be decreasing in |a − b|. This is illustrated in Table 3, which reports power for a = −4 and b = 0, and a = b = −2. Thus, while $\mu_{1c}$ is the same in the two cases, in the former the variance is larger (4/3 as compared to zero).
• Since κ = 1/2 in the simulations, when ρvǫ = 1 in model
2 none of the tests considered, including PCADF, should
have any power beyond size, and this is also what we see
in Table 3.
• As ρvǫ is reduced, the relative power of the PCADF test
increases. This is seen in Table 4. Take, for example, the
case when ρvǫ = 0.3 in model 1, in which the power of
the PCADF test is almost two times as large as the power
envelope for the model without covariates, and it is almost
four times as large as the power of t + and VˆNT .
• As expected, the difference in power when ρvǫ < 1 is larger
in model 2 than in model 1, with PCADF being the only test
with power beyond size. In fact, according to the results
reported in Table 4, in this case the power of the PCADF
test is no less than 10 times as large as that of t + and
VˆNT . Of course, since the power of the two latter tests
is negligible, while the power of the former is not, the
difference in power will increase as a and b get further away from zero.

Table 3. Size-adjusted power when ρvǫ = 1

                    a = −4, b = 0                            a = b = −2
  N    T     t+   V̂NT   VNT  tPCADF  Theory      t+   V̂NT   VNT  tPCADF  Theory
Model 1
 10  100    9.7  14.7  52.8     7.6     8.6     9.9  15.6  43.8     7.6    10.2
 10  200   11.2  17.7  50.7     9.0     7.0    12.1  19.9  41.7     9.3     8.2
 10  400   13.0  19.3  51.1     8.9     7.9    13.2  22.1  41.7     9.0     9.5
 20  100   12.2  18.8  48.3    10.5     8.5    13.0  21.0  39.8    10.2     9.5
 20  200   13.3  19.7  48.1    10.7     7.9    14.9  23.0  39.7    11.3     9.1
 20  400   13.9  26.2  47.6    10.6     8.9    15.0  28.2  39.2    11.7    10.1
 40  100   13.7  21.8  49.2    11.1     9.5    14.8  23.9  40.9    12.0     9.9
 40  200   15.9  26.1  50.6    11.4     9.6    16.9  28.8  42.1    12.1    10.3
 40  400   15.8  28.6  49.9    11.0    10.0    16.8  31.3  41.3    11.7    11.1
Model 2
 10  100    5.5   5.9   5.0     5.5     5.6     5.3   5.6   5.0     5.2     5.5
 10  200    5.9   6.2   5.0     5.3     5.3     5.8   5.7   5.0     5.2     5.2
 10  400    5.5   5.8   5.0     5.5     5.6     5.5   5.5   5.0     5.3     5.4
 20  100    5.4   5.6   5.0     4.8     5.4     5.4   5.5   5.0     5.0     5.3
 20  200    5.2   5.7   5.0     5.5     5.4     5.3   5.6   5.0     5.5     5.3
 20  400    5.1   5.6   5.0     5.5     5.3     5.1   5.5   5.0     5.4     5.2
 40  100    5.3   5.4   5.0     5.3     5.2     5.2   5.1   5.0     5.4     5.2
 40  200    5.3   5.5   5.0     5.4     5.3     5.1   5.3   5.0     5.3     5.2
 40  400    5.6   5.9   5.0     5.6     5.3     5.5   5.9   5.0     5.6     5.3

NOTES: a and b are such that $\rho_i = c_i/\sqrt{N}T$, where $c_i \sim U(a, b)$. "Theory" refers to the theoretical power of the PCADF test (see Theorem 1). See Tables 1 and 2 for an explanation of the rest.

The results for the PCADF test based on defactored data are
not reported but we briefly describe them. First, as expected, the
size results are very close to those reported in Table 2. This is
true regardless of how $\Lambda_i$ and $F_t$ are generated. Second, power
is very close to the theoretical prediction obtained by simulating
the asymptotic distribution given in Proposition 1. Hence, power
is unaffected by the defactoring.
Table 4. Size-adjusted power when a = b = −2

                      ρvǫ = 0.7                               ρvǫ = 0.3
  N    T     t+   V̂NT   VNT  tPCADF  Theory      t+   V̂NT   VNT  tPCADF  Theory
Model 1
 10  100   12.1  16.6  43.8    19.1    16.7    11.5  14.6  43.8    74.9    58.3
 10  200   11.8  18.4  41.7    19.6    13.9    12.0  17.8  41.7    74.5    55.5
 10  400   12.9  21.9  41.7    17.9    15.3    13.4  22.7  41.7    74.8    56.2
 20  100   12.1  19.5  39.8    19.4    15.9    12.2  19.5  39.8    78.0    62.5
 20  200   14.9  24.4  39.7    21.3    15.7    13.9  23.2  39.7    77.8    63.2
 20  400   16.0  26.5  39.2    22.4    16.3    16.7  24.5  39.2    78.4    62.3
 40  100   14.2  23.1  40.9    19.2    17.6    15.6  22.5  40.9    79.2    69.8
 40  200   15.1  25.8  42.1    24.1    17.7    15.9  29.9  42.1    79.9    70.6
 40  400   19.0  35.5  41.3    22.7    18.6    17.2  29.5  41.3    81.0    69.8
Model 2
 10  100    5.4   6.1   5.0    11.7    11.1     5.9   5.8   5.0    45.3    48.8
 10  200    5.4   5.5   5.0    11.0     8.9     5.5   6.0   5.0    45.5    46.5
 10  400    5.6   5.6   5.0    10.5    10.2     5.8   5.0   5.0    47.6    46.1
 20  100    5.2   5.5   5.0    11.0     9.4     5.4   5.5   5.0    45.7    44.0
 20  200    5.7   5.3   5.0    11.3     9.0     5.4   5.2   5.0    48.8    44.1
 20  400    5.5   5.4   5.0    10.9     9.9     5.3   5.7   5.0    47.0    43.7
 40  100    5.5   5.5   5.0    10.5     9.3     5.1   5.2   5.0    44.1    45.1
 40  200    4.9   5.7   5.0    10.2     9.6     5.2   5.2   5.0    46.8    46.6
 40  400    5.5   5.4   5.0    12.2     9.9     5.3   5.6   5.0    48.8    45.0

NOTE: See Tables 1–3 for an explanation.

Overall, the simulation results suggest that our asymptotic
theory provides a useful guide to the small-sample performance
of the PCADF test. They also suggest that the PCADF test can
lead to substantial power gains when compared to existing tests,
especially in model 2. In fact, since the inclusion of irrelevant
covariates seems to cause only minor reductions in power, the
use of PCADF seems to come at little or no cost.
6. CONCLUDING REMARKS
The power increasing potential of covariate augmentation in
the panel setting is interesting not only by itself but also because
of the implications it has for theoretical and applied work. For
example, the fact that in the presence of incidental trends the
use of covariates has an order effect on the shrinking neighborhoods around unity for which asymptotic power is nonnegligible
is expected to have implications for the rate of consistency for
estimation of autoregressive roots near unity (see Moon and
Phillips 2004, and the references provided therein). This possibility is currently being explored in a separate work. From an
applied point of view, the minor additional complication of having to estimate ρvǫi has a major benefit in that the precision of
the test is expected to be drastically improved. This is especially
true in the case of incidental trends, in which some authors have
even gone as far as to recommend not using some of their univariate panel data tests (see, e.g., Moon and Perron 2004, 2008).
Thus, given the availability of data on potential covariates, and
the fact that their information content does not even have to be
particularly high, the PCADF test developed here should be a
valuable addition to the already existing menu of panel unit root
tests.
1)x 2 /2 + O((px)3 ), we can further show that
ci
ci2
R1i,t +
R2i,t
κ
N T
2(N κ T )2
√ 3
ci2
T ci
+ Op
+ Op
√
N 3κ
N 2κ T
φi (L)ui,t = R0i,t +
= R0i,t
T
T
1
1
1
2
2
.
(R
R
y
)
=
(R
y
)
+
O
z d i,t−1
d i,t−1
p
T 2 t=p + 2
T 2 t=p + 2
T
Similarly, from the definitions of σˆ yi2 and σˆ ǫi2 , and the consistency
√
ˆ , and φˆ j i , we get σˆ yi2 − σyi2 = Op (1/ T ) and σˆ ǫi2 − σǫi2 =
of ρˆi ,
√i
Op (1/ T ). (The details of these calculations are available upon request.) By using this, Taylor expansion of the inverse of σˆ yi2 , and then
Rd yi,t = Rd ui,t ,
Bi,T =
xi,t − γi = − i (1)−1 ∗i (L)xi,t + i (1)−1 εi,t ,
κ
ci ∗
φ (L)ui,t−1 ,
NκT i
=
T
1
1 1
2
(R
u
)
+
O
√
d i,t−1
p
σyi2 T 2 t=p + 2
T
=
T
ci
1 1
Rd U0i,t−1 + κ Rd U1i,t−1
2
2
N T
σyi T t=p + 2
2
ci2
R
U
d
2i,t−1
2(N κ T )2
3
1
ci
+ Op
+ Op √
N 3κ
T
3
2
ci
1
j
+ Op
,
=
N −j κ ci Bj i,T + Op √
3κ
N
T
j =0
But we also have i (L) = i (1) + ∗i (L)(1 − L) (with ∗i (L) defined
similarly to λ∗i (L)), leading to
−
T
1 1
(Rz Rd yi,t−1 )2
σˆ yi2 T 2 t=p + 2
where
B0i,T =
T
1 1
(Rd U0i,t−1 )2 ,
σyi2 T 2 t=p + 2
B1i,T =
T
2 1
Rd U0i,t−1 Rd U1i,t−1 ,
σyi2 T 3 t=p + 2
B2i,T =
T
1 1
(Rd U1i,t−1 )2
σyi2 T 4 t=p + 2
where ri,t = λi (1)′ i (1)−1 εi,t + ǫi,t . Backward substitution now yields
φi (L)ui,t
t
ci t − s
1+ κ
ri,s + Op (1).
=
N T
s=p + 2
Let Upi,t = Rpi,t /φi (1), where Rpi,t = ts=p + 2 (t − s)p ri,s . By using
this and Taylor expansion of the type (1 + x)p = 1 + px + p(p −
(A.2)
+
= ρi ui,t−1 + (λi (1)′ (xi,t − γi )+ ǫi,t )+ λ∗i (L)�