
Journal of Business & Economic Statistics

ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20

Uniform Inference in Predictive Regression Models
Willa W. Chen, Rohit S. Deo & Yanping Yi

To cite this article: Willa W. Chen, Rohit S. Deo & Yanping Yi (2013) Uniform Inference in Predictive Regression Models, Journal of Business & Economic Statistics, 31:4, 525-533, DOI: 10.1080/07350015.2013.818008

To link to this article: http://dx.doi.org/10.1080/07350015.2013.818008

Published online: 23 Oct 2013.


Uniform Inference in Predictive Regression
Models
Willa W. CHEN
Department of Statistics, Texas A&M University, College Station, Texas 77843 (wchen@stat.tamu.edu)

Rohit S. DEO
New York University, 44 W 4th St., New York, NY 10012 (rdeo@stern.nyu.edu)

Yanping YI


School of Statistics and Management, Shanghai University of Finance and Economics, 200433 Shanghai,
People’s Republic of China (yi.yanping@mail.shufe.edu.cn)
The restricted likelihood has been found to provide a well-behaved likelihood ratio test in the predictive regression model even when the regressor variable exhibits almost unit root behavior. Using the weighted least squares approximation to the restricted likelihood obtained in Chen and Deo (2010), we provide a quasi-restricted likelihood ratio test (QRLRT), obtain its asymptotic distribution as the nuisance persistence parameter varies, and show that this distribution varies very slightly. Consequently, the resulting sup-bound QRLRT is shown to maintain size uniformly over the parameter space without sacrificing power. In simulations, the QRLRT is found to deliver uniformly higher power than competing procedures, with power gains that are substantial.
KEY WORDS: Efron curvature; REML; Sup bound.

1. INTRODUCTION
The basic predictive regression model is given by

$$Y_t = \eta + \beta X_{t-1} + u_t, \qquad X_t = \mu + \alpha X_{t-1} + v_t, \qquad \begin{pmatrix} u_t \\ v_t \end{pmatrix} \sim \text{iid } N(0, \Sigma), \tag{1.1}$$
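As a concrete illustration, the data-generating process in (1.1) can be simulated directly. The following is a minimal sketch, with the function name, default parameter values, and unit-variance innovation covariance chosen by us for illustration:

```python
import numpy as np

def simulate_predictive_regression(n, beta, alpha, rho, eta=0.0, mu=0.0, seed=0):
    """Simulate (Y_t, X_t) from model (1.1) with X_0 = 0 and
    (u_t, v_t)' iid N(0, Sigma), here Sigma = [[1, rho], [rho, 1]]."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    uv = rng.multivariate_normal(np.zeros(2), cov, size=n)
    X = np.zeros(n + 1)                 # X[0] = X_0 = 0
    Y = np.zeros(n)                     # Y[t-1] holds Y_t
    for t in range(n):
        Y[t] = eta + beta * X[t] + uv[t, 0]       # Y_{t+1} uses X_t
        X[t + 1] = mu + alpha * X[t] + uv[t, 1]
    return Y, X

# nearly unit root regressor, as in the empirically relevant case
Y, X = simulate_predictive_regression(n=200, beta=0.05, alpha=1 - 5/200, rho=-0.98)
```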

where X0 = 0 and α ∈ (0, 1]. In this model, it is well known
that carrying out inference on β using standard procedures, such
as t-statistics or the usual likelihood ratio test, is problematic,
particularly when the nuisance autoregressive parameter α in
the regressor Xt is close to unity, as is often the case in empirical applications. Chen and Deo (2009a) showed that the reason
why the standard inference procedures fail is primarily due to
the presence of the nuisance intercept parameter μ and not as
much due to the autoregressive parameter α, as was generally
thought. Hence, they proposed using the restricted likelihood (RL), which is the exact likelihood of any linear transformation of the data (Y_t, X_t) that eliminates the intercepts η and μ (the linear transformations are not unique, though the RL is unique up to a multiple). Chen and Deo (2009a) showed that the RL possesses low Efron curvature, due to which it yields a well-behaved restricted likelihood ratio test (RLRT). When the autoregressive parameter α is in the stationary region, they show that the error in the χ² distribution approximation to that of the RLRT is very small and, more importantly, does not depend on α; that is, the distribution of the RLRT is second-order pivotal with respect to α. As a consequence, the deviation of the RLRT distribution from χ² would not be expected to be substantial as α approached unity, an expectation that was supported in their simulations. This finding suggests that a sup-bound approach to inference for β based on the distribution of the RLRT across different configurations of α will result in a testing procedure that, while maintaining correct size, will not result in a substantial power loss. However, Chen and Deo (2009a) did not obtain the limiting distribution of the RLRT when α is close to the unit root. As a consequence, the current theory does not allow for uniform inference over the entire parameter space of α.
In this article, we consider the weighted least squares approximate restricted likelihood (WLSRL) of vector autoregressive
(VAR) processes (Chen and Deo 2010) which approximates the
exact RL and has the virtue of yielding an explicit weighted
least squares estimator of the AR coefficient matrix. Chen and
Deo (2010) showed that this WLSRL estimator is asymptotically equivalent to the exact RL maximum likelihood estimator
and shares its superior bias properties. We obtain the limiting
distribution of the WLSRL estimators of the predictive regression model (1.1) under three scenarios for the AR coefficient
α, viz. stationarity, moderate deviations from unity and local to

unity. We then obtain the asymptotic distribution of the resulting quasi-RLRT (QRLRT), based on the WLSRL, under these
scenarios. Since the WLSRL approximates the RL, the discussion above suggests that this limiting distribution also should
not vary too much as α varies. Indeed, we find in simulations
that the resulting sup-bound test based on the QRLRT maintains
size over the entire parameter space without significant power
loss and has uniformly higher power than the Jansson and Moreira (2006) test, with power gains that can be substantial. The
asymptotic distribution of the QRLRT is also obtained under
local alternatives allowing for varying degrees of persistence in
the autoregressive parameter.
In the next section, we introduce the WLSRL for the predictive regression model and obtain the limiting distributions of

the resulting WLSRL estimators, comparing them to the limiting distributions of the usual ordinary least squares (OLS)
estimators. In Section 3, we obtain the limiting distribution of
the QRLRT under the three scenarios mentioned above for α and
define the sup-bound test procedure for inference on β, which
yields uniform inference over the entire nuisance parameter
space. We also obtain the limiting distribution of the QRLRT
under various specifications of local alternatives for β. Finally,
in Section 4 we evaluate the finite sample performance of the sup-bound QRLRT test through a simulation study and compare its
performance to the procedure of Jansson and Moreira (2006).
All proofs are relegated to the appendix at the end.


2. WEIGHTED LEAST SQUARES APPROXIMATED LIKELIHOOD


Though the RL of AR processes is known to have good properties (see Chen and Deo 2009a, 2009b, 2012), it is difficult to use for vector AR series, since optimizing the RL is not easy due to its nonlinear nature. Hence, Chen and Deo (2010) obtained
a weighted least squares approximation to the RL of vector AR
processes, which yields weighted least squares estimators that
are easy to compute and which share the good properties of
the exact RL estimators. Letting Zt = (Yt , Xt )′ , the predictive
regression model (1.1) implies that Zt is a bivariate AR process,
allowing us to use Chen and Deo (2010) to obtain the WLSRL
of Zt , as follows.
We start by defining some notation. For t = 2, ..., n and any time series ξ_t, we will denote the corresponding sample mean corrected series and its lag series as

$$\xi_{t,\hat{\mu}} = \xi_t - \frac{1}{n-1}\sum_{u=2}^{n} \xi_u, \qquad \xi_{t-1,\hat{\mu}} = \xi_{t-1} - \frac{1}{n-1}\sum_{u=2}^{n} \xi_{u-1}.$$

Furthermore, denote the series with the first observation subtracted as

$$\xi_{s,d} = \xi_s - \xi_1, \qquad s = 1, \ldots, n.$$

For t = 2, ..., n, let

$$\epsilon_{t,\hat{\mu}}(\beta, \alpha) = \begin{pmatrix} u_{t,\hat{\mu}} \\ v_{t,\hat{\mu}} \end{pmatrix} = \begin{pmatrix} Y_{t,\hat{\mu}} - \beta X_{t-1,\hat{\mu}} \\ X_{t,\hat{\mu}} - \alpha X_{t-1,\hat{\mu}} \end{pmatrix}, \qquad r_t(\beta, \alpha) = \begin{pmatrix} Y_{t,d} - \beta X_{t-1,d} \\ X_{t,d} - \alpha X_{t-1,d} \end{pmatrix}.$$

Let $(\hat{\Sigma}, \hat{\beta}, \hat{\alpha})$ be any consistent estimators of $(\Sigma, \beta, \alpha)$. For example, such consistent estimators can be obtained from OLS estimation of model (1.1). From Lemma 1 of Chen and Deo (2010), the WLSRL of model (1.1) based on the observations (Y_t, X_t)', t = 1, ..., n, is given by

$$Q_n(\beta, \alpha) = Q_{\mathrm{OLS}}(\beta, \alpha) + Q_W(\beta, \alpha), \tag{2.1}$$

where

$$Q_{\mathrm{OLS}}(\beta, \alpha) = \sum_{t=2}^{n} \epsilon_{t,\hat{\mu}}'(\beta, \alpha)\, \hat{\Sigma}^{-1} \epsilon_{t,\hat{\mu}}(\beta, \alpha), \qquad Q_W(\beta, \alpha) = \frac{1}{n-1} \left( \sum_{s=2}^{n} r_s(\beta, \alpha) \right)' \widehat{W}_n \left( \sum_{t=2}^{n} r_t(\beta, \alpha) \right),$$

and $\widehat{W}_n = W_n(\hat{\Sigma}, \hat{\beta}, \hat{\alpha})$ with

$$W_n(\Sigma, \beta, \alpha) = \{\Sigma + (n-1)(I - H)\Sigma(I - H)'\}^{-1}, \qquad H = \begin{pmatrix} 0 & \beta \\ 0 & \alpha \end{pmatrix}. \tag{2.2}$$

The objective function Q_OLS is the usual least squares function, which uses the sample mean corrected data. The second term, Q_W, in the WLSRL function Q_n(β, α) serves as a correction, with the magnitude of the correction depending on its weight function W_n, which in turn depends on α. Lemma 2 in the Appendix shows how this weight function W_n changes as α = 1 − c/k_n varies. When k_n ≪ √n, the sample mean X̄ is a consistent estimator of μ (Giraitis and Phillips 2012), and so estimators based on sample mean corrected data will behave well. In this situation, as Lemma 2 shows, the weight function W_n is asymptotically negligible and the WLSRL behaves approximately like the OLS function. However, when k_n ≫ √n, the sample mean is not a consistent estimator of μ (Giraitis and Phillips 2012), but the weight function W_n is nonnegligible. As a matter of fact, it results in a correction term Q_W which is such that the WLSRL estimator of α is asymptotically the same as the OLS estimator computed with X_{t,d} = X_t − X_1, and so achieves location invariance without depending on the sample mean.

Letting Z_t = (Y_t, X_t)', the WLSRL estimator of (β, α)' is the minimizer of Q_n(β, α) and is given by

$$(\hat{\beta}_{\mathrm{WLS}}, \hat{\alpha}_{\mathrm{WLS}})' = \left[ \sum_{t=2}^{n} X_{t-1,\hat{\mu}}^2\, \hat{\Sigma}^{-1} + \frac{1}{n-1} \left( \sum_{t=2}^{n} X_{t-1,d} \right)^2 \widehat{W}_n \right]^{-1} \left[ \hat{\Sigma}^{-1} \sum_{t=2}^{n} X_{t-1,\hat{\mu}} Z_{t,\hat{\mu}} + \frac{1}{n-1}\, \widehat{W}_n \sum_{s=2}^{n} X_{s-1,d} \sum_{t=2}^{n} Z_{t,d} \right]. \tag{2.3}$$

The next theorem gives the limiting distribution of the WLSRL estimator. The parameter ρ = Corr(u_t, v_t) will play an important role in all of our theoretical results.

Theorem 1. Under the model in (1.1), the WLSRL estimator given in (2.3) has the following limiting distributions:

(i) If α ∈ (−1, 1) and α is fixed,

$$\sqrt{n} \begin{pmatrix} \hat{\beta}_{\mathrm{WLS}} - \beta \\ \hat{\alpha}_{\mathrm{WLS}} - \alpha \end{pmatrix} \xrightarrow{D} N\left(0, (1 - \alpha^2)\,\Sigma\right).$$

(ii) If α = 1 − c/k_n, where k_n = n^λ, λ ∈ (0, 1), then

$$\sqrt{n k_n} \begin{pmatrix} \hat{\beta}_{\mathrm{WLS}} - \beta \\ \hat{\alpha}_{\mathrm{WLS}} - \alpha \end{pmatrix} \xrightarrow{D} N(0, 2c\,\Sigma).$$

(iii) If α = 1 − c/n,

$$n \begin{pmatrix} \hat{\beta}_{\mathrm{WLS}} - \beta \\ \hat{\alpha}_{\mathrm{WLS}} - \alpha \end{pmatrix} \xrightarrow{D} \begin{pmatrix} \sqrt{1 - \rho^2} & \rho \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \left( \int J_{c,\mu}^2(r)\, dr \right)^{-1/2} Z \\ \left( \int J_c^2(r)\, dr \right)^{-1} \int J_c(r)\, dW(r) \end{pmatrix},$$

where

$$J_c(r) = \int_0^r e^{-(r-u)c}\, dW(u), \qquad J_{c,\mu}(r) = J_c(r) - \int_0^1 J_c(s)\, ds,$$

and W(r) is a Brownian motion that is independent of Z ∼ N(0, 1).
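The estimator in (2.3) is explicit, so it can be computed in a few lines. The sketch below follows the displayed formulas under the assumption of a bivariate system with pilot estimates supplied by the user; the function name and interface are ours, not from the paper:

```python
import numpy as np

def wlsrl_estimator(Y, X, Sigma_hat, beta0, alpha0):
    """Sketch of the WLSRL estimator (2.3).  Y holds Y_1..Y_n, X holds
    X_0..X_n; Sigma_hat, beta0, alpha0 are consistent pilot estimates
    (e.g. from OLS of model (1.1))."""
    n = len(X) - 1
    Z = np.column_stack([Y[1:], X[2:]])      # Z_t = (Y_t, X_t)', t = 2..n
    Xlag = X[1:-1]                           # X_{t-1}, t = 2..n
    Zmu = Z - Z.mean(axis=0)                 # sample mean corrected series
    Xmu = Xlag - Xlag.mean()
    Zd = np.column_stack([Y[1:] - Y[0], X[2:] - X[1]])   # first obs subtracted
    Xd = Xlag - X[1]
    Sinv = np.linalg.inv(Sigma_hat)
    H = np.array([[0.0, beta0], [0.0, alpha0]])
    M = np.eye(2) - H
    Wn = np.linalg.inv(Sigma_hat + (n - 1) * M @ Sigma_hat @ M.T)   # Eq. (2.2)
    A = np.sum(Xmu**2) * Sinv + (Xd.sum()**2 / (n - 1)) * Wn        # Eq. (2.3)
    rhs = Sinv @ (Xmu @ Zmu) + (Xd.sum() / (n - 1)) * (Wn @ Zd.sum(axis=0))
    beta_hat, alpha_hat = np.linalg.solve(A, rhs)
    return beta_hat, alpha_hat
```

The iterated version mentioned later in the simulation section would simply feed the resulting estimates back into the weight function W_n.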


The result above shows that regardless of the value of α, the
limiting distribution of the WLSRL estimator of α is identical
to that of the OLS estimator of α in a model without intercept.
It is well known that when α is close to unity, this distribution is
markedly different from that of the OLS estimator in the model
with an intercept, with significantly smaller bias and variance. In
the next section, we study the QRLRT for testing the hypothesis
that β = 0.
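The bias contrast just described is easy to see numerically. The following Monte Carlo sketch (all settings are illustrative choices of ours) compares the OLS estimator of α computed from sample mean corrected data with the no-intercept OLS estimator computed from X_t − X_1:

```python
import numpy as np

def ar1_bias_comparison(alpha=0.95, n=50, reps=2000, seed=1):
    """Monte Carlo sketch: bias of OLS for alpha with sample-mean
    correction vs. no-intercept OLS applied to X_t - X_1."""
    rng = np.random.default_rng(seed)
    bias_mu = 0.0
    bias_d = 0.0
    for _ in range(reps):
        X = np.zeros(n + 1)
        for t in range(1, n + 1):
            X[t] = alpha * X[t - 1] + rng.standard_normal()
        xs = X[1:]                        # observed sample X_1..X_n
        prev, cur = xs[:-1], xs[1:]
        pm = prev - prev.mean()
        bias_mu += (pm @ cur) / (pm @ pm) - alpha        # demeaned OLS
        pd_, cd = prev - xs[0], cur - xs[0]              # subtract X_1
        bias_d += (pd_ @ cd) / (pd_ @ pd_) - alpha       # no-intercept OLS
    return bias_mu / reps, bias_d / reps
```

For α near unity, the demeaned estimator typically shows the familiar large negative bias, while the first-observation-subtracted estimator is noticeably less biased.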
3. THE QUASI-RESTRICTED LIKELIHOOD RATIO TEST

As pointed out in the Introduction, the WLSRL approximates
the exact RL of a vector AR process, and hence can be used
to construct a quasi-RL ratio test to test any hypothesis on the
parameters in the coefficient matrix. In the predictive regression
context, one is interested in testing the null hypothesis of no
predictability, that is, in testing H0 : β = 0. To compute the
corresponding QRLRT for this hypothesis, one needs to first
obtain the constrained estimate of the nuisance parameter α
under the hypothesis β = 0. This constrained estimator is the
minimizer of Qn (0, α) and is given by

$$\hat{\alpha}_{\mathrm{WLS}}^0 = \left\{ \hat{\sigma}^{22} \sum_{t=2}^{n} X_{t-1,\hat{\mu}}^2 + \frac{\hat{w}^{22}}{n-1} \left( \sum_{t=2}^{n} X_{t-1,d} \right)^2 \right\}^{-1} \left[ \sum_{t=2}^{n} X_{t-1,\hat{\mu}} \left( \hat{\sigma}^{21} Y_{t,\hat{\mu}} + \hat{\sigma}^{22} X_{t,\hat{\mu}} \right) + \frac{1}{n-1} \sum_{s=2}^{n} X_{s-1,d} \sum_{t=2}^{n} \left( \hat{w}^{21} Y_{t,d} + \hat{w}^{22} X_{t,d} \right) \right], \tag{3.1}$$

where $\hat{\sigma}^{ij}$ and $\hat{w}^{ij}$ are the (i, j)th entries of $\hat{\Sigma}^{-1}$ and $\widehat{W}_n$, respectively. The following lemma provides the asymptotic distribution of this constrained estimator when α lies in different parts of the parameter space.

Lemma 1. When β = 0, we get

(i) If α ∈ (−1, 1) and α is fixed,

$$\sqrt{n}\left(\hat{\alpha}_{\mathrm{WLS}}^0 - \alpha\right) \xrightarrow{D} N\left(0, (1 - \alpha^2)(1 - \rho^2)\right).$$

(ii) If α = 1 − c/k_n, where k_n = n^λ, λ ∈ (0, 1),

$$\sqrt{n k_n}\left(\hat{\alpha}_{\mathrm{WLS}}^0 - \alpha\right) \xrightarrow{D} N\left(0, 2c(1 - \rho^2)\right).$$

(iii) If α = 1 − c/n,

$$n\left(\hat{\alpha}_{\mathrm{WLS}}^0 - \alpha\right) \xrightarrow{D} \frac{(1 - \rho^2 g_{c,\rho}) \int J_c(\lambda)\, dW(\lambda)}{\int J_c^2(\lambda)\, d\lambda} + \frac{\rho\, g_{c,\rho}^{1/2} \left(1 - \rho^2 g_{c,\rho}\right)^{1/2} Z}{\left( \int J_c^2(\lambda)\, d\lambda \right)^{1/2}},$$

where J_c(r), W(r), and Z are defined as in Theorem 1 and

$$g_{c,\rho} = \frac{1 - \left( \int J_c(\lambda)\, d\lambda \right)^2 \left( \int J_c^2(\lambda)\, d\lambda \right)^{-1}}{1 - \rho^2 \left( \int J_c(\lambda)\, d\lambda \right)^2 \left( \int J_c^2(\lambda)\, d\lambda \right)^{-1}}.$$

A QRLRT statistic for testing H_0 : β = 0 based on the WLSRL may now be defined as

$$\Lambda_n = Q_n\left(0, \hat{\alpha}_{\mathrm{WLS}}^0\right) - Q_n\left(\hat{\beta}_{\mathrm{WLS}}, \hat{\alpha}_{\mathrm{WLS}}\right), \tag{3.2}$$

and the next theorem provides its asymptotic distribution.

Theorem 2. For the QRLRT defined above, we have, under H_0 : β = 0,

(i) If α ∈ (−1, 1) and α is fixed, $\Lambda_n \xrightarrow{D} \chi_1^2$.

(ii) If α = 1 − c/k_n, where k_n = n^λ, λ ∈ (0, 1), $\Lambda_n \xrightarrow{D} \chi_1^2$.

(iii) If α = 1 − c/n, $\Lambda_n \xrightarrow{D} \Lambda_{c,\rho}$, where

$$\Lambda_{c,\rho} := \left( \rho\, g_{c,\rho}^{1/2}\, \tau_c + \left(1 - \rho^2 g_{c,\rho}\right)^{1/2} Z \right)^2, \qquad \tau_c = \frac{\int J_c(\lambda)\, dW(\lambda)}{\left( \int J_c^2(\lambda)\, d\lambda \right)^{1/2}},$$

and Z ∼ N(0, 1) is independent of τ_c, and g_{c,ρ} is defined as in Lemma 1.

Remark 1. By its definition, 0 < g_{c,ρ} ≤ 1, and the equality holds only if |ρ| = 1.

Remark 2. The random variable τ_c is the limiting distribution of the t-statistic for testing α in a local-to-unity AR(1) process with zero mean. As c → ∞, τ_c →_D N(0, 1) (Bobkoski 1983; Phillips 1987) and, from Phillips (1987), g_{c,ρ} →_p 1. Thus Λ_{c,ρ} →_D χ²₁ as c → ∞.

Remark 3. Larsson (1995) showed that |P(τ_c² > x) − P(χ²₁ > x)| = o(1) as x → ∞. Thus the right tail of Λ_{c,ρ} should be very close to that of a χ²₁.

Remark 4. Note that the limiting distribution of the QRLRT in Theorem 2 above is obtained under the assumption of the null hypothesis H_0 : β = 0, which is the commonly considered hypothesis of no predictability. If one wished to test an arbitrary value of β, say H_0 : β = b for some b ≠ 0, then one would simply replace the Y_t values by Y_t − bX_{t−1} and then test H_0 : β = 0. Hence, there is no loss of generality in focusing on the null value of 0 for β.

The theorem above shows that the QRLRT distribution changes depending on the value of the nuisance parameter α. Hence, to have a test that controls size over the entire nuisance parameter space, one would have to work with the sup-bound critical value, with the supremum taken over all possible values of α; see, for example, Cavanagh, Elliott, and Stock (1995). More specifically, for a given ρ and level of significance δ, define

$$\bar{q}_{\rho,\delta} = \sup_{c \ge 0} \left\{ q_{\rho,\delta}(c) : P\left(\Lambda_{c,\rho} > q_{\rho,\delta}(c)\right) = \delta \right\}.$$
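The limit law Λ_{c,ρ} can be sampled by discretizing the stochastic integrals that define τ_c and g_{c,ρ}, in the spirit of the construction described in the note to Table 1. The sketch below uses deliberately small step and replication counts (illustrative, far below the 2^15 steps used for the table):

```python
import numpy as np

def lambda_c_rho(c, rho, reps=4000, n_steps=512, seed=2):
    """Monte Carlo draws from Lambda_{c,rho} of Theorem 2(iii), obtained
    by an Euler discretization of the OU process dJ = -c J dt + dW on [0,1]."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    x = np.zeros(reps)
    iJ2 = np.zeros(reps)      # approximates int J_c^2 d(lambda)
    iJ = np.zeros(reps)       # approximates int J_c d(lambda)
    iJdW = np.zeros(reps)     # approximates int J_c dW
    for _ in range(n_steps):
        dW = rng.standard_normal(reps) * np.sqrt(dt)
        iJ2 += x * x * dt
        iJ += x * dt
        iJdW += x * dW
        x += -c * x * dt + dW                 # Euler step
    tau = iJdW / np.sqrt(iJ2)
    ratio = iJ**2 / iJ2                       # <= 1 by Cauchy-Schwarz
    g = (1.0 - ratio) / (1.0 - rho**2 * ratio)
    Z = rng.standard_normal(reps)
    return (rho * np.sqrt(g) * tau + np.sqrt(1.0 - rho**2 * g) * Z) ** 2
```

Comparing the empirical right tail of these draws with the χ²₁ tail gives a quick numerical check of Remarks 2 and 3.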


Table 1. P(Λ_{c,ρ} > q̄_{ρ,0.05}) in percentage, δ = 5%

ρ \ c      0       1        3        5        7        9        χ²₁
0.98       5%      4.66%    4.31%    4.21%    4.16%    4.16%    4.13%
0.75       5%      4.57%    4.16%    4.01%    3.94%    3.95%    3.91%
0.50       5%      4.78%    4.55%    4.47%    4.43%    4.42%    4.40%

NOTE: Based on 1,000,000 replications simulated from the distribution of Λ_{c,ρ}. Each replication of Λ_{c,ρ} is generated by approximating the relevant stochastic integrals in it using 32,768 (2^15) observations from a Gaussian white noise.

Since, as pointed out in Remark 2, the local-to-unity limiting distribution of the QRLRT transitions to the limiting chi-square distribution as c → ∞, the results of Theorem 2 imply that

$$\lim_{n \to \infty} \sup_{\alpha \in (-1, 1]} P_{\beta=0}\left( \Lambda_n > \bar{q}_{\rho,\delta} \right) \le \delta.$$

Though ρ is not known, it is trivial to obtain a consistent estimator of it and hence of $\bar{q}_{\rho,\delta}$, which we denote by $\bar{q}_{\hat{\rho},\delta}$. We thus get the following theorem:

Theorem 3. For testing H_0 : β = 0,

$$\lim_{n \to \infty} \sup_{\alpha \in (-1, 1]} P_{\beta=0}\left( \Lambda_n > \bar{q}_{\hat{\rho},\delta} \right) \le \delta.$$

Theorem 3 defines our sup-bound test, which maintains size over the nuisance parameter space, with $\bar{q}_{\hat{\rho},\delta}$ as the required sup-bound critical value. This sup-bound critical value can be easily found by simulation and, as a matter of fact, we found in our extensive simulation study that for a given δ and ρ, q_{ρ,δ}(c) is monotone decreasing in c, that is,

$$\bar{q}_{\rho,\delta} = \sup_{c \ge 0} \left\{ q_{\rho,\delta}(c) : P\left(\Lambda_{c,\rho} > q_{\rho,\delta}(c)\right) = \delta \right\} = q_{\rho,\delta}(0).$$

This phenomenon of the sup-bound occurring at the unit root was also noticed in the sup-bound tests considered in Cavanagh, Elliott, and Stock (1995).

In general, the drawback of a sup-bound test is that the test can be too conservative if the null distribution changes a great deal over the nuisance parameter space (see Cavanagh, Elliott, and Stock 1995), resulting in significant power losses. However, Remark 3 above shows that the distributions of the QRLRT for different values of α look quite similar in the right tail. As a consequence, the sup-bound test for the QRLRT should not be expected to be very conservative. Table 1 demonstrates this by computing the tail probability of Λ_{c,ρ} beyond the sup-bound critical value q̄_{ρ,0.05} for different values of c and ρ at δ = 5%. It is seen that though the sup-bound test can become conservative, it is not very undersized, suggesting that any power losses will not be significant. This is indeed found to be the case in our simulation study presented in Section 4.

In the next theorem, we establish the limiting distributions of the QRLRT statistic under the local alternatives H_a : β = b/s_n, where s_n = √n, √(nk_n), and n for α that is fixed, deviates moderately from unity, and is local to unity, respectively.

Theorem 4. For the QRLRT Λ_n, we have

(i) If α is fixed and β = b/√n, then Λ_n →_D χ²₁{b²/(1 − α²)}, a noncentral χ²₁ with noncentrality parameter b²/(1 − α²).

(ii) If α = 1 − c/k_n, where k_n = n^λ, λ ∈ (0, 1), and β = b/√(nk_n), then Λ_n →_D χ²₁{b²/(2c)}, a noncentral χ²₁ with noncentrality parameter b²/(2c).

(iii) If α = 1 − c/n and β = b/n, then Λ_n →_D Λ_{c,ρ}(b), where

$$\Lambda_{c,\rho}(b) = \left( \rho\, g_{c,\rho}^{1/2}\, \tau_c + \left(1 - \rho^2 g_{c,\rho}\right)^{1/2} Z + b\, g_{c,\rho}^{1/2} \left( \int J_c^2(r)\, dr \right)^{1/2} \right)^2,$$

where Z, τ_c, and g_{c,ρ} are defined as in Theorem 2.
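Since the supremum was found to occur at c = 0, where J_0 reduces to a standard Brownian motion, the sup-bound critical value q_{ρ̂,δ}(0) can be approximated by simulation. The sketch below is ours (function name and simulation sizes are illustrative, much smaller than those used in the paper):

```python
import numpy as np

def sup_bound_critical_value(rho_hat, delta=0.05, reps=8000, n_steps=512, seed=3):
    """Approximate q_{rho,delta}(0): simulate Lambda_{c,rho} at c = 0,
    where J_0 is standard Brownian motion, and take the (1 - delta) quantile."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    x = np.zeros(reps)
    iJ2 = np.zeros(reps)
    iJ = np.zeros(reps)
    iJdW = np.zeros(reps)
    for _ in range(n_steps):
        dW = rng.standard_normal(reps) * np.sqrt(dt)
        iJ2 += x * x * dt
        iJ += x * dt
        iJdW += x * dW
        x += dW                            # c = 0: pure Brownian increments
    tau = iJdW / np.sqrt(iJ2)
    ratio = iJ**2 / iJ2
    g = (1.0 - ratio) / (1.0 - rho_hat**2 * ratio)
    Z = rng.standard_normal(reps)
    lam = (rho_hat * np.sqrt(g) * tau + np.sqrt(1.0 - rho_hat**2 * g) * Z) ** 2
    return np.quantile(lam, 1 - delta)
```

The test then rejects H_0 : β = 0 whenever the observed Λ_n exceeds this critical value, with ρ̂ obtained as a consistent pilot estimate.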
3.1 The AR(p) Regressor Case

The above results in Theorem 2 for the QRLRT also extend easily to the more general case where the regressor series follows an AR(p) process. More specifically, let

$$Y_t = \eta + \beta X_{t-1} + u_t, \qquad X_t = \mu + \alpha_1 X_{t-1} + \cdots + \alpha_p X_{t-p} + v_t. \tag{3.3}$$

Letting $\alpha = \sum_{i=1}^{p} \alpha_i$ and $\gamma = (\gamma_1, \ldots, \gamma_{p-1})'$, where $\gamma_s = -\sum_{j=s+1}^{p} \alpha_j$, we can rewrite X_t in the Dickey-Fuller form,

$$X_t = \mu + \alpha X_{t-1} + \gamma_1 \nabla X_{t-1} + \cdots + \gamma_{p-1} \nabla X_{t-p+1} + v_t.$$

The expressions for the WLSRL Q_n(β, α, γ), the WLSRL estimator $(\hat{\beta}_{\mathrm{WLS}}, \hat{\alpha}_{\mathrm{WLS}}, \hat{\gamma}_{\mathrm{WLS}}')'$, and the constrained estimator under H_0 : β = 0, $(\hat{\alpha}^0_{\mathrm{WLS}}, \hat{\gamma}^{0\prime}_{\mathrm{WLS}})'$, are given by Equations (A.1), (A.2), and (A.3) in Appendix A. The proof of the following theorem is similar to that of Theorem 2, except that it is much lengthier; we thus omit it.
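The Dickey-Fuller reparameterization above is a purely algebraic identity, which can be checked numerically. The sketch below assumes the mapping γ_s = −Σ_{j=s+1}^p α_j as stated in the text; the function name is ours:

```python
import numpy as np

def dickey_fuller_form(alphas):
    """Map AR(p) coefficients (alpha_1..alpha_p) of model (3.3) to the
    Dickey-Fuller parameters (alpha, gamma_1..gamma_{p-1})."""
    a = np.asarray(alphas, dtype=float)
    p = len(a)
    alpha = a.sum()                                  # alpha = sum of alpha_i
    gamma = np.array([-a[s + 1:].sum() for s in range(p - 1)])
    return alpha, gamma
```

For any history, α X_{t−1} + Σ_s γ_s ∇X_{t−s} reproduces Σ_i α_i X_{t−i} exactly, which is what makes the reparameterized regression equivalent to the original AR(p).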
Theorem 5. Let Q_n(β, α, γ) be the WLSRL objective function under model (3.3), let $(\hat{\beta}_{\mathrm{WLS}}, \hat{\alpha}_{\mathrm{WLS}}, \hat{\gamma}_{\mathrm{WLS}}')'$ be the WLSRL estimator, and let $(\hat{\alpha}^0_{\mathrm{WLS}}, \hat{\gamma}^{0\prime}_{\mathrm{WLS}})'$ be the WLSRL estimator under the constraint β = 0. Then the QRLRT

$$\Lambda_n = Q_n\left(0, \hat{\alpha}^0_{\mathrm{WLS}}, \hat{\gamma}^0_{\mathrm{WLS}}\right) - Q_n\left(\hat{\beta}_{\mathrm{WLS}}, \hat{\alpha}_{\mathrm{WLS}}, \hat{\gamma}_{\mathrm{WLS}}\right)$$

has the same limiting distributions as stated in Theorem 2.

We now turn to studying the finite sample size and power of our procedure through simulations.
4. SIMULATION STUDY

In this section, we carry out Monte Carlo studies to compare
the size and power of our sup-bound-based QRLRT test given

Table 2. Rejection rates for testing H_0 : β = 0. True β = b√(1 − ρ²)/n, ρ = −0.98, α = 1 − c/n, nominal size = 5%, 5000 replications

                  b = 0                b = 25               b = 50
 c      n     QRLRT      JM       QRLRT      JM       QRLRT      JM
 0    100    0.0526    0.0462    0.9514    0.7718    1         0.9852
 0    200    0.0534    0.0544    0.9536    0.7846    1         0.9940
 0    400    0.0522    0.0448    0.9552    0.7906    1         0.9992
 1    100    0.0472    0.0506    0.8832    0.4796    0.9998    0.9990
 1    200    0.0460    0.0522    0.8910    0.4688    1         0.9990
 1    400    0.0440    0.0500    0.8908    0.4680    1         0.9996
 5    100    0.0400    0.0492    0.2556    0.0782    0.9872    0.5454
 5    200    0.0400    0.0500    0.2658    0.0828    0.9922    0.5352
 5    400    0.0394    0.0568    0.2620    0.0886    0.9908    0.5124
10    100    0.0416    0.0560    0.1598    0.0724    0.6682    0.1110
10    200    0.0378    0.0500    0.1574    0.0650    0.6752    0.1016
10    400    0.0396    0.0570    0.1512    0.0728    0.6684    0.1044
20    100    0.0460    0.0610    0.1182    0.0646    0.3428    0.0784
20    200    0.0368    0.0570    0.1052    0.0634    0.3198    0.0770
20    400    0.0380    0.0622    0.0994    0.0684    0.3138    0.0804

in Theorem 3 with the test proposed in Jansson and Moreira (2006) (henceforth, JM). The data were generated from

$$Y_t = \beta X_{t-1} + u_t, \qquad X_t = \alpha X_{t-1} + v_t, \qquad \alpha = 1 - \frac{c}{n}, \qquad \begin{pmatrix} u_t \\ v_t \end{pmatrix} \sim \text{iid } N\left(0, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right),$$
where the intercept in each equation was set to zero without loss
of generality, since our test is location invariant. As mentioned
earlier in Section 3, initial simulation results showed that for
a given pair (ρ, δ), the sup-bound critical value was qρ,δ (0),
that is, the supremum, over c, of the critical value was attained
at c = 0. Hence, for each replication, we first estimated ρ and
then used qρ̂,δ (0) as the critical value for our test. Furthermore,
since Chen and Deo (2010) reported that the iterated WLSRL
estimates, obtained by using the first step WLSRL estimates in
the weight function Wn , have better finite sample behavior, we
computed the QRLRT using these iterated WLSRL estimates.
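One replication of this design, including the pilot estimation of ρ from OLS residuals that precedes the choice of critical value, can be sketched as follows (function name, seed, and default settings are ours):

```python
import numpy as np

def table2_replication(n=200, c=5.0, rho=-0.98, b=25.0, seed=4):
    """One replication of the Table 2 design: alpha = 1 - c/n,
    beta = b*sqrt(1 - rho^2)/n, zero intercepts.  Returns the data and
    a pilot estimate of rho computed from OLS residuals."""
    rng = np.random.default_rng(seed)
    alpha = 1.0 - c / n
    beta = b * np.sqrt(1.0 - rho**2) / n
    cov = np.array([[1.0, rho], [rho, 1.0]])
    uv = rng.multivariate_normal(np.zeros(2), cov, size=n)
    X = np.zeros(n + 1)
    Y = np.zeros(n)
    for t in range(n):
        Y[t] = beta * X[t] + uv[t, 0]          # Y_{t+1} uses X_t
        X[t + 1] = alpha * X[t] + uv[t, 1]
    # pilot estimate of rho: correlation of OLS residuals of both equations
    x = X[:-1]
    def resid(y):
        A = np.column_stack([np.ones(n), x])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        return y - A @ coef
    rho_hat = np.corrcoef(resid(Y), resid(X[1:]))[0, 1]
    return Y, X, rho_hat
```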
In Table 2, we compare our test procedure with JM’s conditional test for ρ = −0.98. This large negative value of ρ was
chosen to reflect the kind of values observed empirically (see
Chen and Deo 2006). Simulation studies not reported here for other values of ρ yielded results similar in nature to those reported here for ρ = −0.98. The sample size n was set to be 100, 200, and 400, with c = 0, 1, 5, 10, and 20. Following JM's set-up, the parameter β was set to be b√(1 − ρ²)/n, where b = 0, 25, and 50. The rejection rates when b = 0 correspond to the size of the tests and, when b = 25 and 50, to the power against local alternatives. Not surprisingly, it is seen from the table that the size of our test is slightly below the nominal level when c takes large values, since the QRLRT is asymptotically conservative. However, despite the conservative size, the power of our test is uniformly higher than that of the JM test, with the power gains being very substantial at times.

In Table 3, we provide, for comparison, simulation results for the configuration presented in Table II on page 702 of JM. Following the settings in Table II of JM, the correlation ρ is set to −0.5 and 0.5, while the sample size is set at n = 1000 with 500 replications. Once again, it is seen that the QRLRT maintains size while delivering power that is either identical to or higher than that of the JM test, with several power gains being very substantial.

Table 3. Rejection rates for testing H_0 : β = 0. True β = b√(1 − ρ²)/n, α = 1 − c/n, nominal size = 5%, n = 1000, 500 replications

                    c = 0              c = 5              c = 10             c = 15
  ρ       b     QRLRT     JM      QRLRT     JM      QRLRT     JM      QRLRT     JM
 −0.5     0     0.040    0.054    0.042    0.058    0.056    0.042    0.046    0.066
 −0.5     5     0.488    0.424    0.186    0.102    0.138    0.076    0.112    0.080
 −0.5    10     0.892    0.826    0.620    0.338    0.438    0.180    0.316    0.154
 −0.5    15     0.990    0.946    0.904    0.702    0.772    0.392    0.646    0.230
  0.5     0     0.062    0.046    0.052    0.036    0.064    0.018    0.058    0.022
  0.5     5     0.398    0.412    0.196    0.132    0.150    0.054    0.114    0.040
  0.5    10     0.728    0.602    0.504    0.220    0.372    0.122    0.300    0.088
  0.5    15     0.908    0.720    0.746    0.274    0.622    0.196    0.508    0.082


APPENDIX A: MODEL WITH AN AR(p) REGRESSOR

We start by defining some notation. For t = p + 1, ..., n and any time series ζ_t, we will denote the corresponding sample mean corrected series and its lag series as

$$\zeta_{t,\hat{\mu}} = \zeta_t - \frac{1}{n-p} \sum_{u=p+1}^{n} \zeta_u, \qquad \zeta_{t-1,\hat{\mu}} = \zeta_{t-1} - \frac{1}{n-p} \sum_{u=p+1}^{n} \zeta_{u-1}, \qquad \nabla\zeta_{t-k,\hat{\mu}} = \nabla\zeta_{t-k} - \frac{1}{n-p} \sum_{u=p+1}^{n} \nabla\zeta_{u-k}.$$

For t = p + 1, ..., n, let

$$\epsilon_{t,\hat{\mu}}(\beta, \alpha, \gamma) = \begin{pmatrix} u_{t,\hat{\mu}} \\ v_{t,\hat{\mu}} \end{pmatrix} = \begin{pmatrix} Y_{t,\hat{\mu}} - \beta X_{t-1,\hat{\mu}} \\ X_{t,\hat{\mu}} - \alpha X_{t-1,\hat{\mu}} - \sum_{k=1}^{p-1} \gamma_k \nabla X_{t-k,\hat{\mu}} \end{pmatrix},$$

and $\widehat{W}_n = W_n(\hat{\Sigma}, \hat{\beta}, \hat{\alpha}, \hat{\gamma})$ with $W_n(\Sigma, \beta, \alpha, \gamma) = \{I_p \otimes \Sigma + D\Sigma D'\}^{-1}$, where

$$D = \left( (I - H_1 - H_2)', \ldots, (I - H_1 - H_p)', \sqrt{n-p}\,(I - H_1)' \right)'$$

and

$$H_1 = \begin{pmatrix} 0 & \beta \\ 0 & \alpha \end{pmatrix}, \qquad H_s = \begin{pmatrix} 0 & 0 \\ 0 & \gamma_{s-1} \end{pmatrix}, \qquad s = 2, \ldots, p.$$

Furthermore, denote the series with the first observation subtracted as ζ_{s,d} = ζ_s − ζ_1, s = 1, ..., n, as before. Define R(β, α, γ) = (R_1, ..., R_p), where

$$R_1 = \begin{pmatrix} Y_{2,d} - \beta X_{1,d} \\ X_{2,d} - \alpha X_{1,d} \end{pmatrix}, \qquad R_s = \begin{pmatrix} Y_{s+1,d} - \beta X_{s,d} \\ X_{s+1,d} - \alpha X_{s,d} - \sum_{k=1}^{s-1} \gamma_k \nabla X_{s-k} \end{pmatrix}, \quad s = 2, \ldots, p - 1,$$

and

$$R_p = \frac{1}{\sqrt{n-p}} \sum_{t=p+1}^{n} r_t, \qquad \text{where} \qquad r_t = \begin{pmatrix} Y_{t,d} - \beta X_{t-1,d} \\ X_{t,d} - \alpha X_{t-1,d} - \sum_{k=1}^{p-1} \gamma_k \nabla X_{t-k} \end{pmatrix},$$

and define the (p − 1) × p matrix S = (S_1, ..., S_p), where

$$S_k' = (\nabla X_k, \ldots, \nabla X_1, 0, \ldots, 0), \quad k = 1, \ldots, p - 1, \qquad S_p' = \frac{1}{\sqrt{n-p}} \sum_{t=p+1}^{n} (\nabla X_{t-1}, \ldots, \nabla X_{t-p+1}).$$

For the WLSRL estimator of (β, α, γ), define

$$V_Z = \left( Z_{2,d}, \ldots, Z_{p,d}, \frac{1}{\sqrt{n-p}} \sum_{t=p+1}^{n} Z_{t,d} \right), \qquad U = \left( X_{1,d}, \ldots, X_{p-1,d}, \frac{1}{\sqrt{n-p}} \sum_{t=p+1}^{n} X_{t-1,d} \right).$$

Further define the (p − 1) × 1 vector $L_t = (\nabla X_{t-1,\hat{\mu}}, \ldots, \nabla X_{t-p+1,\hat{\mu}})'$. Let A_{r2}, A_{c2}, and A_{22} denote the second row, the second column, and the (2, 2) entry of a 2 × 2 matrix A, respectively.
Let $(\hat{\Sigma}, \hat{\beta}, \hat{\alpha}, \hat{\gamma})$ be any consistent estimators of $(\Sigma, \beta, \alpha, \gamma)$. From Lemma 1 of Chen and Deo (2010), the WLSRL of model (3.3) based on the observations (Y_t, X_t)', t = 1, ..., n, is given by

$$Q_n(\beta, \alpha, \gamma) = Q_{\mathrm{OLS}}(\beta, \alpha, \gamma) + Q_W(\beta, \alpha, \gamma), \tag{A.1}$$

where

$$Q_{\mathrm{OLS}}(\beta, \alpha, \gamma) = \sum_{t=p+1}^{n} \epsilon_{t,\hat{\mu}}'(\beta, \alpha, \gamma)\, \hat{\Sigma}^{-1} \epsilon_{t,\hat{\mu}}(\beta, \alpha, \gamma), \qquad Q_W(\beta, \alpha, \gamma) = \mathrm{vec}\{R(\beta, \alpha, \gamma)\}'\, \widehat{W}_n\, \mathrm{vec}\{R(\beta, \alpha, \gamma)\}.$$

The WLSRL estimator is

$$(\hat{\beta}_{\mathrm{WLS}}, \hat{\alpha}_{\mathrm{WLS}}, \hat{\gamma}_{\mathrm{WLS}}')' = \left\{ 2 \sum_{t=p+1}^{n} \begin{pmatrix} X_{t-1,\hat{\mu}}^2\, \hat{\Sigma}^{-1} & X_{t-1,\hat{\mu}} L_t' \otimes \hat{\Sigma}^{-1}_{c2} \\ X_{t-1,\hat{\mu}} L_t \otimes \hat{\Sigma}^{-1}_{r2} & L_t L_t'\, \hat{\Sigma}^{-1}_{22} \end{pmatrix} + \sum_{i,j=1}^{p} \begin{pmatrix} 2 U_i U_j \widehat{W}^{ij} & \left( U_i S_j' + U_j S_i' \right) \otimes \widehat{W}^{ij}_{c2} \\ \left( U_i S_j + U_j S_i \right) \otimes \widehat{W}^{ij}_{r2} & \left( S_i S_j' + S_j S_i' \right) \widehat{W}^{ij}_{22} \end{pmatrix} \right\}^{-1} \left\{ 2 \sum_{t=p+1}^{n} \begin{pmatrix} \hat{\Sigma}^{-1} Z_{t,\hat{\mu}} X_{t-1,\hat{\mu}} \\ \hat{\Sigma}^{-1}_{r2} Z_{t,\hat{\mu}} L_t \end{pmatrix} + \sum_{i,j=1}^{p} \begin{pmatrix} \widehat{W}^{ij} \left( V_{Z,i} U_j + V_{Z,j} U_i \right) \\ \left( \widehat{W}^{ij}_{r2} \left( V_{Z,i} S_j' + V_{Z,j} S_i' \right) \right)' \end{pmatrix} \right\}, \tag{A.2}$$

where $\widehat{W}^{ij}$ is the (i, j) block, a 2 × 2 matrix, of $\widehat{W}_n$, and V_{Z,i} is the ith column of V_Z. Let

$$U_X = \begin{pmatrix} U \\ S \end{pmatrix} = (U_{X,1}, \ldots, U_{X,p}) \qquad \text{and} \qquad L_{X,t} = \begin{pmatrix} X_{t-1,\hat{\mu}} \\ L_t \end{pmatrix},$$


Proof of Lemma 1. The restricted estimate of α in (3.1) is

the WLSRL estimator under H0 : β0 is given by

 ⎧
p
n
⎨ 
0


α̂WLS
′ 


+
L
L
UX,i UX,j
2
=
22
X,t
X,t
0

γ̂WLS
t=p+1

 ij

+ UX,j UX,i
Ŵ22

⎫−1



0
= α0 + (1 − ρ 2 )Tn0 + ρ̂Sn0 + Rn0 ,
α̂WLS

where

i,j =1

×





n


2

t=p+1

−1
r2

Zt,μ̂ LX,t


p


( ij 

)



+
+ VZ,j UX,i
Ŵr2 VZ,i UX,j
.


(A.3)

Downloaded by [Universitas Maritim Raja Ali Haji] at 22:19 11 January 2016

Throughout
! this appendix, we will use the summation nota!
tion t for nt=2 unless stated otherwise.

Proof of Theorem 1. Without loss of generality, we will
prove for β = 0. Following from Phillips (1987), Phillips and
Magdalinos (2007), and Giraitis and Phillips (2012),



2
Xt−1,
Xt−1,d = Op (kn n + n). (B.1)
μ̂ = Op (nkn ),
t

Using the matrix inversion formulas and letting
2
!
 
n )
W
kn
det(
s Xs−1,d
,
ηn =
! 2
= Op
n−1
n2
t Xt−1,μ̂

n =

t

2
Xt−1,
μ̂



1+

1
n−1

!

!

t

s

Xs−1,d

2
Xt−1,
μ̂


× (ŵ11 + ŵ22 + 2ρ̂ ŵ12 + ηn )

2

!

s Xs−1,d
! 2
(1−ρ̂ 2 )ŵ22
t Xt−1,μ̂ +
n−1

!

t Yt,d
!
2 .
s Xs−1,d

Let Sn and Tn be the same as in (B.2), and define
! 2
t Xt−1,μ̂
gn := !

2
(1−ρ̂ 2 )ŵ22 !
2
t Xt−1,μ̂ +
s Xs−1,d
n−1

1 + Op (kn /n),
λ ∈ [0, 1)
,
=
gρ,c + Op (n−3/2 ), λ = 1

(B.3)

by (B.1), Lemma 2 and Corollary 2. We obtain
(B.4)

The remainder
term Rn0 = Op (1/n) if λ ∈ [0, 1/2] and Rn0 =

Op (1/kn n) if λ ∈ [1/2, 1]. The limiting distribution of
0

(α̂WLS
− α0 follows from from (B.4) and Lemma 3.
Proof of Theorem 2. We will prove only for Q , the proof
of L will follow by the same proof in combination with Lemma
2 and Corollary 2. For algebraic convenience, we use the following notations:
̺n =


t

2
Xt−1,
μ̂ ,

and ςn = √

1
n−1



Xs−1,d .

s

The second partial derivatives of Qn (β, α) are
∂ 2 Qn (β, α)
2̺n
2̺n
=
+ 2ŵ11 ςn2 =
+ Op (1),
2
2
∂β
1 − ρ̂
1 − ρ̂ 2

.

With detailed algebraic calculation and writing ut = ρvt + et ,
(β̂WLS , α̂WLS − α0 )′ = (Sn + ρ̂Tn , Tn )′ + Rn′ ,

=

ŵ12
n−1

0
α̂WLS
− α0 = (1 − ρ̂ 2 gn )Tn + ρ̂gn Sn + Rn0 .

by (B.1), Lemma 2, and Corollary 2, the WLS estimate in (2.3)
becomes



β̂WLS
1 
ˆ −1
n−1 
=
I + ηn W
n
α̂WLS



n  
W

×
Xs−1,d Zt,d ,
Xt−1,μ̂ Zt,μ̂ +
n−1 s t
t


t

Rn0

APPENDIX B: PROOF

where

!

ŵ22 !
Xt−1,μ̂ vt,v + n−1
s,t Xs−1,d (Xt,d − α0 Xt−1,d )
=
,

2
! 2
(1−ρ̂ 2 )ŵ22 !
s Xs−1,d
t Xt−1,μ̂ +
n−1
!
t Xt−1,μ̂ et,μ̂
0
Sn = !

2
(1−ρ̂ 2 )ŵ22 !
2
s Xs−1,d
t Xt−1,μ̂ +
n−1

Tn0

and

i,j =1

t

531

(B.2)

where

!
t Xt−1,μ̂ et,μ̂
Sn = !
,
2
t Xt−1,μ̂
!
ŵ22 !
t Xt−1,μ̂ vt,v + n−1
s,t Xs−1,d (Xt,d − α0 Xt−1,d )
Tn =
,

2
! 2
ŵ22 !
s Xs−1,d
t Xt−1,μ̂ + n−1

and the remainder
√ term Rn = Op (1/n) if λ ∈ [0, 1/2], and
Rn = Op (1/kn n) if λ ∈ [1/2, 1] by (B.1), Lemmas 2 and
Corollary 2. The theorem follows from Giraitis and Phillips
(2012), the theorem follows from Phillips (1987) and Phillips
and Magdalinos (2007).


∂ 2 Qn (β, α)
2ρ̺̂n
2ρ̺̂n
= −
+ 2ŵ12 ςn2 = −
+ Op (kn ),
2
∂α∂β
1 − ρ̂
1 − ρ̂ 2

∂ 2 Qn (β, α)
2̺n
=
+ 2ŵ22 ςn2 ,
∂α 2
1 − ρ̂ 2

by (B.1), Lemma 2 and Corollary 2. With a Taylor expansion of
Qn ,


0
n = Qn 0, α̂WLS
− Qn (β̂WLS , α̂WLS )
 2
1 ∂ Qn (β̂WLS , α̂WLS )
=
(β0 − β̂WLS )2
2
∂β 2
2∂ 2 Qn (β̂WLS , α̂WLS )
0
(β0 − β̂WLS )(α̂WLS
− α̂WLS )
∂α∂β

∂ 2 Qn (β̂WLS , α̂WLS ) 0
2
(B.5)
+
(α̂WLS − α̂WLS ) .
∂α 2
+

532

Journal of Business & Economic Statistics, October 2013

0
0
Writing α̂WLS
− α̂WLS = (α̂WLS
− α0 ) − (α̂WLS − α0 ) and using
(B.2) and (B.4), Equation (B.5) becomes

̺n
2ρ̺̂n
{Sn + ρ̂Tn }2 −
{Sn + ρ̂Tn } (−ρ̂ 2 gn Tn + ρ̂gn Sn )
1 − ρ̂ 2
1 − ρ̂ 2
+
=



Downloaded by [Universitas Maritim Raja Ali Haji] at 22:19 11 January 2016

and furthermore,


(1 − ρ̂ 2 gn )̺n 2
Sn
̺n + ŵ22 ςn2 ρ̂ 2 gn Tn2 +
1 − ρ̂ 2

+
=

$$
\Lambda_n=\left\{\hat\rho\,g_n^{1/2}\bigl(\varrho_n+\hat w_{22}\varsigma_n^2\bigr)^{1/2}T_n+\left(\frac{(1-\hat\rho^2 g_n)\varrho_n}{1-\hat\rho^2}\right)^{1/2}S_n\right\}^2,
$$
whose cross-product term is $2(1-\hat\rho^2 g_n)\hat\rho\,\varrho_n S_nT_n/(1-\hat\rho^2)$. The limiting distribution of $\Lambda_n$ follows from Lemmas 4 and 5.

Lemma 2. Let $W_n(\rho,\beta,\alpha)$ be defined as in (2.2) and let $\alpha=1-c/n^{\lambda}$, $\lambda\in[0,1]$. If $\beta=0$, then
$$
\det(W_n)=\begin{cases}O(n^{-2+2\lambda}), & \lambda\in[0,1/2]\\ (n-1)^{-1}+O(n^{-2\lambda}), & \lambda\in(1/2,1]\end{cases}
$$
and
$$
W_n(\rho,0,\alpha)=\begin{cases}n^{-1+2\lambda}\{c^2(1-\rho^2)\}^{-1}\,\Sigma+O(n^{-1+\lambda}), & \lambda\in[0,1/2)\\ \{1+c^2(1-\rho^2)\}^{-1}\,\Sigma+O(n^{-1/2}), & \lambda=1/2\\ \Sigma+O(n^{-\lambda}), & \lambda\in(1/2,1],\end{cases}
$$
where $\Sigma=\begin{pmatrix}0&0\\0&1\end{pmatrix}$.

Proof. With a few algebraic steps,
$$
W_n(\rho,\beta,\alpha)=\frac{1}{\det W_n^{-1}(\rho,\beta,\alpha)}\begin{pmatrix}1+(n-1)(1-\alpha)^2 & -\rho-(n-1)(1-\alpha)(\rho-\beta)\\ -\rho-(n-1)(1-\alpha)(\rho-\beta) & 1+(n-1)(1-2\beta\rho+\beta^2)\end{pmatrix}
$$
and the determinant of $W_n^{-1}(\rho,\beta,\alpha)$ is
$$
\det W_n^{-1}(\rho,\beta,\alpha)=(n-1)\bigl\{(1-2\rho\beta+\beta^2)-2\rho(\rho-\beta)(1-\alpha)+(1-\alpha)^2\bigr\}+(1-\rho^2)\bigl\{1+(n-1)^2(1-\alpha)^2\bigr\}.
$$
We have, when $\beta=0$,
$$
\det W_n^{-1}(\rho,0,\alpha)=(n-1)\bigl\{1-2\rho^2(1-\alpha)+(1-\alpha)^2\bigr\}+(1-\rho^2)\bigl\{1+(n-1)^2(1-\alpha)^2\bigr\}
$$
$$
=\begin{cases}(n-1)^{2-2\lambda}c^2(1-\rho^2)+O(n), & \alpha=1-c/n^{\lambda},\ \lambda\in[0,1/2)\\ (n-1)\{1+c^2(1-\rho^2)\}+O(n^{1/2}), & \alpha=1-c/\sqrt{n}\\ (n-1)\bigl(1+O(n^{1-2\lambda})\bigr), & \alpha=1-c/n^{\lambda},\ \lambda\in(1/2,1],\end{cases}
$$
and, when $\alpha=1-c/n$, up to smaller order terms,
$$
W_n(\rho,0,\alpha)=\frac{1}{\det W_n^{-1}(\rho,0,\alpha)}\begin{pmatrix}1+c^2/n & -\rho(1+c)\\ -\rho(1+c) & n\end{pmatrix}.
$$
The lemma follows immediately.

Corollary 1. Let $W_n(\rho,\beta,\alpha)$ be defined as in (2.2). Then the expressions for $W_n$ in Lemma 2 hold for any value of $\beta$.

Corollary 2. Let $\hat\rho$ be a consistent estimate of $\rho$ and $\hat\beta$ a consistent estimate of $\beta$. Furthermore, let $\hat\alpha$ be a consistent estimate of $\alpha$ such that $n^{(1+\lambda)/2}|\hat\alpha-\alpha|=O_p(1)$, where $\alpha=1-c/n^{\lambda}$, $\lambda\in[0,1]$. Then the expressions for $W_n$ in Lemma 2 hold for $W_n(\hat\rho,\hat\beta,\hat\alpha)$, with the order of the remainder terms $O(\cdot)$ replaced by $O_p(\cdot)$.

Proof of Theorem 4. Let $\beta_n=b/\sqrt{nk_n}$ and let $\hat\alpha_\beta$ be the WLSRL estimator of $\alpha$ for fixed $\beta$. Now,
$$
0=\frac{\partial Q_n(\beta_0,\hat\alpha_{\beta_0})}{\partial\alpha}=\frac{\partial Q_n(\beta_n,\hat\alpha_{\beta_n})}{\partial\alpha}+\frac{\partial^2 Q_n}{\partial\alpha\,\partial\beta}(\beta_0-\beta_n)+\frac{\partial^2 Q_n}{\partial\alpha^2}(\hat\alpha_{\beta_0}-\hat\alpha_{\beta_n})+o_p(\sqrt{nk_n}).
$$
We get, from the above equation,
$$
\hat\alpha_{\beta_0}-\hat\alpha_{\beta_n}=-\frac{b}{\sqrt{nk_n}}\,\frac{\partial^2 Q_n}{\partial\alpha\,\partial\beta}\left(\frac{\partial^2 Q_n}{\partial\alpha^2}\right)^{-1}=-\frac{b}{\sqrt{nk_n}}\,\hat\rho g_n+o_p(1/\sqrt{nk_n}).
$$
Under $H_A:\beta=\beta_n$, the expressions in (B.2) and (B.4) remain valid with $\beta_0$ replaced by $\beta_n$; thus, together with the above equation,
$$
\hat\alpha^0_{\mathrm{WLS}}-\hat\alpha_{\mathrm{WLS}}=\hat\alpha_{\beta_0}-\hat\alpha_{\beta_n}+(\hat\alpha_{\beta_n}-\alpha_0)-(\hat\alpha_{\mathrm{WLS}}-\alpha_0)=-\hat\rho^2 g_nT_n+\hat\rho g_nS_n-\frac{\hat\rho g_n b}{\sqrt{nk_n}}+o_p(1/\sqrt{nk_n})
$$
and
$$
\beta_0-\hat\beta_{\mathrm{WLS}}=\beta_0-\beta_n-(\hat\beta_{\mathrm{WLS}}-\beta_n)=-\frac{b}{\sqrt{nk_n}}-(S_n+\hat\rho T_n).
$$
Plugging the above equations into (B.5) and ignoring the smaller order terms,
$$
\Lambda_n=\left\{\hat\rho g_n^{1/2}\bigl(\varrho_n+w_{22}\varsigma_n^2\bigr)^{1/2}T_n+\left(\frac{(1-\hat\rho^2 g_n)\varrho_n}{1-\hat\rho^2}\right)^{1/2}S_n+\bigl(\varrho_n+w_{22}\varsigma_n^2\bigr)^{1/2}\frac{b}{\sqrt{nk_n}}\right\}^2,
$$
because
$$
g_n^{1/2}\bigl(\varrho_n+w_{22}\varsigma_n^2\bigr)^{1/2}=\left(\frac{(1-\hat\rho^2 g_n)\varrho_n}{1-\hat\rho^2}\right)^{1/2}.
$$
The theorem follows from Lemmas 4 and 5.
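The order claims in Lemma 2 are easy to sanity-check numerically. The sketch below is an illustration, not part of the paper: the function evaluates the closed form for $\det W_n^{-1}(\rho,0,\alpha)$ derived in the proof of Lemma 2 and compares it against the stated leading terms for $\lambda=1/4$ and $\lambda=1/2$; the parameter values $\rho=0.5$, $c=1$ are arbitrary choices.

```python
def det_Wn_inv(n, rho, alpha):
    """Closed form for det W_n^{-1}(rho, 0, alpha) from the proof of Lemma 2."""
    d = 1.0 - alpha
    return (n - 1) * (1 - 2 * rho**2 * d + d**2) + (1 - rho**2) * (1 + (n - 1)**2 * d**2)

rho, c = 0.5, 1.0
for n in (10**5, 10**7):
    lam = 0.25  # lambda in [0, 1/2): leading term (n-1)^{2-2*lam} c^2 (1-rho^2)
    alpha = 1 - c / n**lam
    print(det_Wn_inv(n, rho, alpha) / ((n - 1) ** (2 - 2 * lam) * c**2 * (1 - rho**2)))
    alpha = 1 - c / n**0.5  # lambda = 1/2: leading term (n-1){1 + c^2 (1-rho^2)}
    print(det_Wn_inv(n, rho, alpha) / ((n - 1) * (1 + c**2 * (1 - rho**2))))
# both ratios approach 1 as n grows
```

Both ratios move toward 1 as $n$ increases, consistent with the remainder orders $O(n)$ and $O(n^{1/2})$ being of smaller order than the leading terms.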

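The Gaussian limit of $S_n$ in Lemma 4 below can likewise be probed by simulation. The following is a minimal Monte Carlo sketch under assumed i.i.d. standard normal innovations with correlation $\rho$ (so that $e_t = u_t - \rho v_t$ has variance $1-\rho^2$ and is independent of $v_t$), in the moderately integrated case $\alpha = 1 - c/k_n$, $k_n = n^{1/2}$; the sample sizes and tolerances are illustrative choices, not the paper's design.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam, c, rho = 1000, 0.5, 1.0, 0.5
kn = n**lam
alpha = 1 - c / kn
reps = 1000

# Simulate the AR(1) regressor X_t = alpha*X_{t-1} + v_t for all replications at once.
v = rng.standard_normal((n, reps))
X = np.zeros((n, reps))
for t in range(1, n):
    X[t] = alpha * X[t - 1] + v[t]

# e_t = u_t - rho*v_t is independent of v (hence of X) with variance 1 - rho^2.
e = np.sqrt(1 - rho**2) * rng.standard_normal((n, reps))

Xlag = X[:-1] - X[:-1].mean(axis=0)   # demeaned X_{t-1}
edm = e[1:] - e[1:].mean(axis=0)      # demeaned e_t
Sn = (Xlag * edm).sum(axis=0) / (Xlag**2).sum(axis=0)

var_hat = n * kn * Sn.var()           # sample variance of sqrt(n*k_n) * S_n
print(round(var_hat, 2), 2 * c * (1 - rho**2))  # should be near 2c(1 - rho^2) = 1.5
```

In repeated runs the estimated variance sits near the theoretical value $2c(1-\rho^2)$, with a modest upward finite-sample deviation from demeaning and the zero initial condition.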

Lemma 3. Let $w_{22}$ be the $(2,2)$th entry of $W_n(\alpha,\beta,\rho)$ in (2.2). Then
$$
\sum_t X_{t-1,\hat\mu}v_{t,\hat\mu}+\frac{w_{22}}{n-1}\Bigl\{\sum_{s,t}X_{s-1,d}X_{t,d}-\alpha\Bigl(\sum_s X_{s-1,d}\Bigr)^2\Bigr\}
=\begin{cases}\sum_t X_{t-1,\hat\mu}v_{t,\hat\mu}+O_p(n^{\lambda}), & \lambda\in[0,1/2]\\ \sum_t X_{t-1}v_t+O_p(\sqrt{n}), & \lambda\in[1/2,1]\end{cases}
$$
and
$$
\sum_t X_{t-1,\hat\mu}^2+\frac{w_{22}}{n-1}\Bigl(\sum_s X_{s-1,d}\Bigr)^2
=\begin{cases}\sum_t X_{t-1,\hat\mu}^2+O_p(n^{2\lambda}), & \lambda\in[0,1/2]\\ \sum_t X_{t-1}^2+O_p(n^{1/2+\lambda}), & \lambda\in[1/2,1].\end{cases}
$$

Proof. Since $X_{s,d}=X_s-X_1$, we have $\sum_s X_{s-1,d}=\sum_s X_{s-1}-(n-1)X_1$, $X_{t,d}-\alpha X_{t-1,d}=v_t-(1-\alpha)X_1$, and
$$
\Bigl(\sum_s X_{s-1,d}\Bigr)^2=\Bigl(\sum_s X_{s-1}\Bigr)^2-2(n-1)X_1\sum_s X_{s-1}+(n-1)^2X_1^2.
$$
The lemma then follows from Lemma 2 and the orders of magnitude of these sums given in Phillips (1987), Phillips and Magdalinos (2007), and Giraitis and Phillips (2012).

Lemma 4. Let $e_t=u_t-\rho v_t$ and
$$
S_n=\frac{\sum_t X_{t-1,\hat\mu}e_{t,\hat\mu}}{\sum_t X_{t-1,\hat\mu}^2},
$$
then
$$
\sqrt{nk_n}\,S_n \xrightarrow{D}\begin{cases}N\bigl(0,(1-\rho^2)(1-\alpha^2)\bigr), & \alpha\ \text{is fixed},\ k_n=1\\ N\bigl(0,2c(1-\rho^2)\bigr), & \alpha=1-c/k_n,\ k_n=n^{\lambda},\ \lambda\in(0,1)\\ \bigl(\int J_{c,\mu}(r)^2\,dr\bigr)^{-1/2}\sqrt{1-\rho^2}\,Z, & \alpha=1-c/n.\end{cases}
$$

Proof. Since $e_{t,\hat\mu}$ is independent of $X_{t-1,\hat\mu}$,
$$
\frac{\sum_t X_{t-1,\hat\mu}e_{t,\hat\mu}}{\bigl\{(1-\rho^2)\sum_t X_{t-1,\hat\mu}^2\bigr\}^{1/2}}\;\Bigg|\;\bigl\{X_{t-1,\hat\mu}\bigr\}\;\Rightarrow\;Z.
$$
The lemma follows from Lemma 3, Phillips (1987), and Phillips and Magdalinos (2007).

Lemma 5. Let $w_{22}$ be the $(2,2)$th entry of $W_n$, and
$$
T_n=\frac{\sum_t X_{t-1,\hat\mu}v_{t,\hat\mu}+\dfrac{w_{22}}{n-1}\sum_{s,t}X_{s-1,d}\bigl(X_{t,d}-\alpha_0X_{t-1,d}\bigr)}{\sum_t X_{t-1,\hat\mu}^2+\dfrac{w_{22}}{n-1}\Bigl(\sum_s X_{s-1,d}\Bigr)^2},
$$
then
$$
\sqrt{nk_n}\,T_n\xrightarrow{D}\begin{cases}N\bigl(0,1-\alpha^2\bigr), & \alpha\ \text{is fixed},\ k_n=1\\ N(0,2c), & \alpha=1-c/k_n,\ k_n=n^{\lambda},\ \lambda\in(0,1)\\ \bigl(\int J_{c,\mu}(r)^2\,dr\bigr)^{-1}\tau_c, & \alpha=1-c/n.\end{cases}
$$

Proof. The lemma follows from Lemma 2, by which $w_{22}=O(n^{-1+2\lambda})$ when $\lambda\in[0,1/2]$ and $w_{22}=1+o(1)$ when $\lambda\in(1/2,1]$, from Lemma 3, and from the following facts. When $\alpha$ is fixed, $n^{-1}\sum_t X_{t-1,\hat\mu}^2\to_p(1-\alpha^2)^{-1}$. When $\alpha=1-c/n^{\lambda}$ with $\lambda\in(0,1)$, $(nk_n)^{-1}\sum_t X_{t-1}^2\to_p 1/(2c)$ from Phillips and Magdalinos (2007), and $\bigl\{(n-1)^{-1}\sum_t X_{t-1}\bigr\}^2=O_p(k_n^2/n)$ from Giraitis and Phillips (2012); thus
$$
\frac{1}{nk_n}\sum_t X_{t-1,\hat\mu}^2=\frac{1}{nk_n}\sum_t X_{t-1}^2\,\bigl(1+o_p(1)\bigr)\to_p\frac{1}{2c}.
$$
Furthermore, when $\alpha=1-c/n$, $n^{-2}\sum_t X_{t-1,\hat\mu}^2\Rightarrow\int J_{c,\mu}(r)^2\,dr$.

ACKNOWLEDGMENTS

Chen's research was supported by NSF grant DMS-1007652.

[Received January 2013. Revised June 2013.]

REFERENCES

Bobkoski, M. J. (1983), "Hypothesis Testing in Nonstationary Time Series," Ph.D. Thesis, The University of Wisconsin. [527]
Cavanagh, C. L., Elliott, G., and Stock, J. H. (1995), "Inference in Models With Nearly Integrated Regressors," Econometric Theory, 11, 1131–1147. [527,528]
Chen, W., and Deo, R. (2009a), "Bias Reduction and Likelihood Based Almost Exactly Sized Hypothesis Testing in Predictive Regressions Using the Restricted Likelihood," Econometric Theory, 25, 1143–1179. [525,526]
——— (2009b), "The Restricted Likelihood Ratio Test at the Boundary in Autoregressive Series," Journal of Time Series Analysis, 30, 618–630. [526]
——— (2010), "Weighted Least Squares Approximate Restricted Likelihood Estimation for Vector Autoregressive Processes," Biometrika, 97, 231–237. [525,526,529,530]
——— (2012), "The Restricted Likelihood Ratio Test for Autoregressive Processes," Journal of Time Series Analysis, 33, 325–339. [526]
Giraitis, L., and Phillips, P. C. B. (2012), "Mean and Autocovariance Function Estimation Near the Boundary of Stationarity," Journal of Econometrics, 169, 166–178. [526,531,533]
Jansson, M., and Moreira, M. (2006), "Optimal Inference in Regression Models With Nearly Integrated Regressors," Econometrica, 74, 681–714. [525,529]
Larsson, R. (1995), "The Asymptotic Distributions of Some Test Statistics in Near-Integrated AR Processes," Econometric Theory, 11, 306–330. [527]
Phillips, P. C. B. (1987), "Towards a Unified Asymptotic Theory for Autoregression," Biometrika, 74, 535–547. [527,531,533]
Phillips, P. C. B., and Magdalinos, T. (2007), "Limit Theory for Moderate Deviations From a Unit Root," Journal of Econometrics, 136, 115–130. [531,533]