Journal of Business & Economic Statistics
ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20
Long-Horizon Return Regressions With Historical
Volatility and Other Long-Memory Variables
Natalia Sizova
To cite this article: Natalia Sizova (2013) Long-Horizon Return Regressions With Historical
Volatility and Other Long-Memory Variables, Journal of Business & Economic Statistics, 31:4,
546-559, DOI: 10.1080/07350015.2013.827985
To link to this article: http://dx.doi.org/10.1080/07350015.2013.827985
Accepted author version posted online: 08
Aug 2013.
Long-Horizon Return Regressions
With Historical Volatility and Other
Long-Memory Variables
Natalia SIZOVA
Department of Economics, Rice University, Houston, TX 77251 ([email protected])
The predictability of long-term asset returns increases with the time horizon as estimated in regressions
of aggregated-forward returns on aggregated-backward predictive variables. This previously established
evidence is consistent with the presence of common slow-moving components that are extracted upon
aggregation from returns and predictive variables. Long memory is an appropriate econometric framework
for modeling this phenomenon. We apply this framework to explain the results from regressions of returns
on risk measures. We introduce suitable econometric methods for construction of confidence intervals and
apply them to test the predictability of NYSE/AMEX returns.
KEY WORDS: Long-range dependence; Return predictability; Spurious regression.
1. INTRODUCTION
Short-term asset returns (e.g., monthly) appear to be largely
unpredictable. At the same time, a number of studies have
demonstrated more significant predictability in long-term returns (e.g., annual). This increase in predictability for the aggregated returns occurs naturally if the predictive variables are
persistent. It has become standard practice to model persistent
predictive variables as stationary autoregressive processes with
high first autocorrelations (see Stambaugh 1999; Boudoukh,
Richardson, and Whitelaw 2008). More formally, such processes are modeled as nearly integrated (see Phillips 1988;
Valkanov 2003).
In this article, we compare the implications of this accepted
model of persistence in predictive variables to the implications
from an alternative, long-memory, model that has received less
attention in the long-run predictability literature. Note that if the
predictive variables are modeled as nearly integrated, then this
assumption leads to exponentially decaying autocorrelations.
Therefore, this model may be inadequate in at least two cases.
The first case occurs when the model is used to predict the effects
of small shocks to returns that decay at a rate that is appreciably slower than exponential. The second case occurs when the
predictive variable is only modestly persistent but may contain
several slowly moving components that become manifest only
upon aggregation. These two cases, on the other hand, fit naturally within the framework with fractionally integrated (i.e.,
long-memory) predictive variables (see Baillie 1996). As we
will show, fractional integration has implications for the return
predictability at different forecasting horizons as well as for the
properties of the sample statistics in long-horizon regressions
(i.e., regressions with long-term returns).
In our measurement procedure, we focus on regressions with
two-way aggregation of the regressor and regressand, as proposed by Bandi and Perron (2008). One of the motivations for
this choice is the extraction of the long-run signals from both
the returns and predictive variables. We show that the population $R^2$ in these long-horizon regressions converges to zero as
the horizon increases unless the predictive variables are fractionally integrated. Therefore, although an increase in return
predictability occurs for highly persistent (nearly integrated)
short-memory processes, extreme persistence of the variables
would be required for the longest horizons. Such persistence is
not observed in, for example, financial volatility, term spreads,
or unemployment rates. However, the behavior of these variables may be consistent with the presence of long memory. We
focus on the volatility of a broad stock market index, which
is a particularly relevant example because, in principle, it depends on the same variances of the fundamental shocks that
constitute the equity premium (e.g., Merton 1973; Campbell
and Cochrane 1999; Bansal and Yaron 2004). The return predictability by other measures of risk should also be analyzed
within the same long-memory framework.
We present three results in this article. First, we revisit the
model with nearly integrated predictive variables and adapt it
to the case with volatility as a regressor. In particular, we explicitly account for the heteroscedasticity in the returns and the
modest size of the observed predictability over short horizons.
We show that this model cannot fully account for the empirical evidence in the data, such as the magnitude of the return
predictability over the longest horizons. Second, we consider
the predictability in the long-memory framework. We find that
the increasing patterns of $R^2$ as a function of the horizon are
inherent in this framework. However, we find that the return
predictability (when present) is underestimated in small samples. We derive asymptotic distributions for both models, and
the resulting confidence intervals are then applied in our empirical study. For the volatility, we find that the long-memory
model produces wider confidence intervals compared with the
model with nearly integrated regressors. We, however, confirm
the predictability evidence in Bandi and Perron (2008) for the
longest horizons of 9 and 10 years.
© 2013 American Statistical Association
Journal of Business & Economic Statistics
October 2013, Vol. 31, No. 4
DOI: 10.1080/07350015.2013.827985
The article is organized as follows. Section 2 documents the
empirical facts against which we check the validity of our econometric frameworks. Section 3 outlines a short-memory framework with a nearly integrated volatility and compares the implications of this model with the empirical facts. Section 4 outlines
a new framework in which the predictive variable is assumed to
follow a long-memory process. Finally, we test the predictability of NYSE/AMEX returns using our asymptotic results in
Section 5.
2. STYLIZED EMPIRICAL FACTS
There are two pronounced empirical facts that are hard to
replicate using existing models. First, the data suggest that returns are highly predictable over long horizons. Second, the predictability of the predictive variables themselves is quite low. For
illustration, we reproduce the results of the article by Bandi and
Perron (2008), which analyzes the predictability of long-term
NYSE/AMEX returns using volatility, and extend the original
results to include 2011 data.
The data are constructed as follows. The variable to be predicted is the monthly excess return, $R^e_{t,t+1} = \sum_{j=1}^{M_t} r_{t,j} - r^f_{t,t+1}$, where $r_{t,j}$ is the jth continuously compounded daily return (including dividends) during the tth month for the NYSE/AMEX index. These data were provided by the Center for Research in Security Prices (CRSP)/Wharton Research Data Services (WRDS) and cover the period from January 1952 to December 2011. The risk-free rate $r^f_{t,t+1}$ is from the CRSP “Fama Risk-Free Rates” data file. This dataset covers the period from January 1952 to December 2011, and is based on the prices of 1 month T-bills. $M_t$ is the number of observations in month t.
The goal is to forecast future long-term returns over H periods, $R^e_{t,t+H} = \sum_{i=1}^{H} R^e_{t+i-1,t+i}$, using the data on past realized variance (volatility) in the market:
$$RV_{t-H,t} = \sum_{i=1}^{H} RV_{t-H+i-1,\,t-H+i}, \tag{1}$$
where
$$RV_{t,t+1} = \sum_{j=1}^{M_t} r_{t,j}^2. \tag{2}$$
Note that we use the same horizon H for the regressor $RV_{t-H,t}$ and for the return $R^e_{t,t+H}$. This is the diagonal of the matrix reported by Bandi and Perron (2008); note that their study also provides results for different horizons of returns and volatilities. We focus only on one subset of these results because Bandi and Perron (2008) demonstrated that predictability is generally at its highest for this diagonal.
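The construction in (1) and (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the article: the function names `monthly_series` and `rolling_sum`, the list-of-lists input layout, and the toy numbers are all hypothetical.

```python
import numpy as np

def monthly_series(daily_returns_by_month, risk_free_by_month):
    """Monthly excess returns and realized variances from daily log returns.

    daily_returns_by_month: one sequence of daily returns per month (assumed layout).
    risk_free_by_month: one risk-free rate per month.
    """
    excess = np.array([np.sum(r) for r in daily_returns_by_month], dtype=float)
    excess -= np.asarray(risk_free_by_month, dtype=float)
    # Equation (2): realized variance is the sum of squared daily returns.
    rv = np.array([np.sum(np.square(r)) for r in daily_returns_by_month], dtype=float)
    return excess, rv

def rolling_sum(x, H):
    """Sums of H consecutive entries: out[k] = x[k] + ... + x[k+H-1]."""
    c = np.concatenate(([0.0], np.cumsum(np.asarray(x, dtype=float))))
    return c[H:] - c[:-H]

# Toy input: three "months" with 2, 1, and 3 daily observations (made-up numbers).
daily = [[0.01, -0.02], [0.03], [0.00, 0.01, 0.02]]
rf = [0.001, 0.001, 0.001]
excess, rv = monthly_series(daily, rf)
rv_2m = rolling_sum(rv, 2)      # equation (1) with H = 2
ret_2m = rolling_sum(excess, 2)  # H-period excess returns
```

The same `rolling_sum` serves for both the backward-aggregated regressor and the forward-aggregated return; only the alignment of the two series differs.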
To evaluate the predictability of returns at different horizons, we run two types of regressions:
Regression A
$$R^e_{t,t+H} = a_r + b_r RV_{t-H,t} + u^r_{t+H|t}, \tag{3}$$
Regression B
$$RV_{t,t+H} = a_\sigma + b_\sigma RV_{t-H,t} + u^\sigma_{t+H|t}, \tag{4}$$
and record the corresponding correlations:
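$$\hat\rho\left(R^e_{t,t+H}\right) = \left(\frac{1}{T-2H}\sum_{t=H}^{T-H}\left(R^e_{t,t+H} - \overline{R}^e_{t,t+H}\right)\left(RV_{t-H,t} - \overline{RV}_{t-H,t}\right)\right) \Bigg/ \left(\sqrt{\frac{1}{T-2H}\sum_{t=H}^{T-H}\left(R^e_{t,t+H} - \overline{R}^e_{t,t+H}\right)^2} \times \sqrt{\frac{1}{T-2H}\sum_{t=H}^{T-H}\left(RV_{t-H,t} - \overline{RV}_{t-H,t}\right)^2}\right), \tag{5}$$
where $\overline{R}^e_{t,t+H}$ and $\overline{RV}_{t-H,t}$ are the averages of $R^e_{t,t+H}$ and $RV_{t-H,t}$ over the sample that starts with observation H and ends at T − H. We use the hat symbol for the correlation ρ in (5) to indicate that this coefficient is calculated using sample data, in contrast to
$$\rho\left(R^e_{t,t+H}\right) = \frac{\operatorname{cov}\left(R^e_{t,t+H}, RV_{t-H,t}\right)}{\sqrt{\operatorname{var}\left(R^e_{t,t+H}\right)\operatorname{var}\left(RV_{t-H,t}\right)}},$$
which, if it exists, is a nonrandom number.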
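Analogously, for regression B we have
$$\hat\rho\left(RV_{t,t+H}\right) = \left(\frac{1}{T-2H}\sum_{t=H}^{T-H}\left(RV_{t,t+H} - \overline{RV}_{t,t+H}\right)\left(RV_{t-H,t} - \overline{RV}_{t-H,t}\right)\right) \Bigg/ \left(\sqrt{\frac{1}{T-2H}\sum_{t=H}^{T-H}\left(RV_{t,t+H} - \overline{RV}_{t,t+H}\right)^2} \times \sqrt{\frac{1}{T-2H}\sum_{t=H}^{T-H}\left(RV_{t-H,t} - \overline{RV}_{t-H,t}\right)^2}\right). \tag{6}$$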
We work with correlations to preserve the information about the sign of the relationship. The usual regression coefficients of determination ($R^2$) can be obtained simply as $(\hat\rho(RV_{t,t+H}))^2 \times 100\%$ and $(\hat\rho(R^e_{t,t+H}))^2 \times 100\%$. The coefficients of determination for regressions A and B are reported in panel I(a) of Figure 1, and the corresponding correlations are reported in panel II(a). Whereas $\hat\rho$ for regression A increases monotonically from approximately zero at the 1 month horizon to 0.81 at the 10 year horizon, $\hat\rho$ for regression B decreases. The percentage of the explained variation in $RV_{t,t+H}$ changes from 25% to nearly zero. The return predictability therefore appears to increase with the time horizon, whereas the predictability of the predictive variable itself seems to disappear.
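As a rough illustration of how the sample correlations in (5) and (6) can be computed, the sketch below aligns forward H-period sums of one monthly series with backward H-period sums of another. The function names and the indexing convention (entry i of a series covers the period from i to i + 1) are assumptions made for this example; they are not taken from the article.

```python
import numpy as np

def rolling_sum(x, H):
    """Sums of H consecutive entries: out[k] = x[k] + ... + x[k+H-1]."""
    c = np.concatenate(([0.0], np.cumsum(np.asarray(x, dtype=float))))
    return c[H:] - c[:-H]

def long_horizon_corr(y, x, H):
    """Correlation between forward H-sums of y and backward H-sums of x.

    For each t = H, ..., T-H the forward sum is y[t] + ... + y[t+H-1] and
    the backward sum is x[t-H] + ... + x[t-1], mirroring (5) and (6).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    T = len(y)
    if len(x) != T or T < 2 * H + 1:
        raise ValueError("need two series of equal length T >= 2H + 1")
    fwd = rolling_sum(y, H)[H:]               # t = H, ..., T-H
    bwd = rolling_sum(x, H)[: T - 2 * H + 1]  # same t, looking backward
    # The 1/(T-2H) factors in (5) cancel between numerator and denominator.
    return np.corrcoef(fwd, bwd)[0, 1]

# Sanity check on synthetic data: if y is x delayed by H periods, the
# forward sums of y coincide with the backward sums of x.
rng = np.random.default_rng(0)
H = 12
x = rng.normal(size=200)
y = np.concatenate((np.zeros(H), x[:-H]))
rho_a = long_horizon_corr(y, x, H)   # analog of regression A
rho_b = long_horizon_corr(x, x, H)   # analog of regression B
r2_a = 100.0 * rho_a ** 2            # coefficient of determination in percent
```

Passing the excess-return series as `y` and the realized-variance series as `x` gives the analog of regression A; passing the realized variance for both gives regression B.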
A similar pattern is observed when we replace the realized volatility with another standard predictor of the long-term returns, the dividend yield, as shown in panels I(b) and II(b) of Figure 1. We continue with two-way aggregation of the regressor and returns, as in the volatility case. However, a difference in interpretation from the previous case should be noted. While the aggregated variance over H periods, $RV_{t-H,t}$, is also a measure of the H-period variance, there is no economic motivation for the aggregation of the dividend yields. However, the aggregation can be interpreted statistically as a signal extraction procedure. This method is suitable under the assumption that the predictive variable contains uninformative short-run noise, which is in effect removed through aggregation.
[Figure 1: four panels, each plotted against the horizon in months (0 to 120). Panel I(a): return predictability and volatility predictability, $R^2$. Panel I(b): return predictability and dividend-yield predictability, $R^2$. Panel II(a): return predictability and volatility predictability, ρ. Panel II(b): return predictability and dividend-yield predictability, ρ.]
Figure 1. Sample $R^2$ and correlations for 1952–2011, NYSE/AMEX returns. Panel I(a) shows the sample $R^2$ in the regressions of excess returns on past volatility and future volatility on past volatility. Panel I(b) shows the corresponding values when the dividend yield is used as the predictor. Panels II(a,b) display the corresponding sample correlations. The forecasting horizon is indicated on the horizontal axis.
Figure 2 provides further details of the mechanism behind
the high return predictability in long-horizon regressions by
plotting future aggregated returns against the past aggregated
realized variance. The figure shows how by increasing the horizon H, we reveal a linear relation between returns and variance
at H = 10 years from the data, which, at monthly horizons,
does not contain any apparent information regarding the risk
relation. This exercise was performed by Bandi and Perron
(2008) using a dataset that did not include the stock market
crash of 2008 and led to the same findings. The fact that inclusion of this new data did not seem to change the conclusions
speaks to the robustness of the relation between the variance and
returns.
3. ECONOMETRIC FRAMEWORK: NEARLY INTEGRATED PREDICTOR
It has been demonstrated that the analysis of long-horizon
regressions such as (3) and (4) requires special methods that
account for small-sample effects, which are exacerbated by the
persistence of the regressors (see Ventosa-Santaulària 2009).
To account for the persistence, Valkanov (2003) modeled predictive variables as nearly integrated processes. He developed
asymptotic results specifically for long-horizon regressions of
asset returns.
[Figure 2: four scatter plots for H = 1 month, H = 12 months, H = 60 months, and H = 120 months. Horizontal axis: return variance from t − H to t. Vertical axis: excess return from t to t + H.]
Figure 2. Effect of aggregation on return predictability. The excess returns, $R^e_{t,t+H}$, are plotted versus $RV_{t-H,t}$ for the NYSE/AMEX index from 1952 to 2011.
Our first goal is to check whether this type of model can capture the long-run predictability pattern shown in Figure 1. We extend the model developed by Valkanov (2003) to volatility regressors to capture various well-known facts regarding the return–variance relationship, including leverage, heteroscedasticity, and positive variance:
$$\begin{aligned}
R^e_{t,t+1} &= \beta\sigma_t^2 + \sigma_t\varepsilon_{t+1,1},\\
\left(1 - \left(1 + \frac{c}{T}\right)L\right) b(L)(\sigma_t - \mu_\sigma) &= \varepsilon_{t,2},\\
\operatorname{corr}(\varepsilon_{t,1}, \varepsilon_{t,2}) &= r < 0.
\end{aligned} \tag{7}$$
The vector $(\varepsilon_{t,1}, \varepsilon_{t,2})$ is a martingale difference sequence. The variances of $\varepsilon_{t,1}$ and $\varepsilon_{t,2}$ are normalized to one. The process $v_t = b(L)^{-1}\varepsilon_{t,2}$ satisfies a mixing condition from Herrndorf (1984): $v_t$ is a zero-mean strong mixing sequence with mixing coefficients $\alpha_m$, such that $\sum_{m=1}^{\infty}\alpha_m^{1-2/b} < \infty$, $\limsup_{t>1} E|v_t|^b < \infty$ for $b > 2$, and the limit $\lim_{T\to\infty} E\left(\frac{1}{T}\left(\sum_{t=1}^{T} v_t\right)^2\right)$ exists and is positive. The initial value $\sigma_0$ is a random variable whose distribution is independent of T.
For this model, the estimated correlations $\hat\rho$ in (5) and (6) have nonstandard asymptotic distributions because $\sigma_t$ behaves similarly to a unit-root process for $T \to \infty$. To derive these distributions, we replace $RV_{t-H,t}$ in the definitions of the sample correlations $\hat\rho$ by its measurement-error-free analog, the integrated variance, which is the sum of past variances, $\sum_{\tau=t-H}^{t-1}\sigma_\tau^2$, here denoted as $RV^\sigma_{t-H,t}$. The results remain the same for $RV_{t-H,t}$, since under our assumptions $RV_{t,t+H} \sim O_p(T^2)$ and $RV^\sigma_{t,t+H} - RV_{t,t+H} \sim o_p(T^2)$.
The standard assumption in the literature on overlapping
observations is that H is a nontrivial portion of the sample.
Formally, it is captured by the condition limT →∞ H /T = λ,
0 < λ < 1/2. Under this assumption, as shown in Theorem 1, ρˆ
in regression A converges to a nondegenerate random variable.
This result closely matches similar findings in Bandi and Perron
(2008, Proposition 1).
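Theorem 1. For dynamics (7), the sample correlation $\hat\rho$ in (5) for the regression of $R^e_{t,t+H}$ on the past integrated variance $\sum_{\tau=t-H}^{t-1}\sigma_\tau^2$ converges weakly to the functional $\hat\rho(R^e_{t,t+H}) \Rightarrow F_\rho(A, B)$, where processes $A(\tau)$ and $B(\tau)$ are defined on the interval $[\lambda, 1-\lambda]$ as follows:
$$A(\tau) = \int_{s=\tau}^{\tau+\lambda} J_{2,s}^2\,ds - \frac{1}{1-2\lambda}\int_{\lambda}^{1-\lambda}\int_{s=\zeta}^{\zeta+\lambda} J_{2,s}^2\,ds\,d\zeta, \tag{8}$$
$$B(\tau) = \int_{s=\tau-\lambda}^{\tau} J_{2,s}^2\,ds - \frac{1}{1-2\lambda}\int_{\lambda}^{1-\lambda}\int_{s=\zeta-\lambda}^{\zeta} J_{2,s}^2\,ds\,d\zeta, \tag{9}$$
where $J_{2,s}$ is an Ornstein–Uhlenbeck (OU) process driven by a standard Brownian motion $W_{2,s}$, $dJ_{2,s} = cJ_{2,s}\,ds + dW_{2,s}$, and $F_\rho$ is a functional of two processes that are defined and almost surely (a.s.) continuous on the interval $[\lambda, 1-\lambda]$,
$$F_\rho(Y, X) \equiv \frac{\int_{\lambda}^{1-\lambda} Y(\tau)X(\tau)\,d\tau}{\sqrt{\int_{\lambda}^{1-\lambda} X^2(\tau)\,d\tau \int_{\lambda}^{1-\lambda} Y^2(\tau)\,d\tau}}.$$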
The process $W_{2,s}$ is a limit of the partial sums $T^{-1/2}\sum_{t=1}^{[Ts]}\varepsilon_{t,2}$ when they are appropriately normalized. Therefore, the limit of $\hat\rho(R^e_{t,t+H})$ depends only on the characteristics of $\varepsilon_{t,2}$ and not on $\varepsilon_{t,1}$. For example, this limit remains the same if the shocks to returns are absent and excess returns are perfectly collinear with variance, which implies that $\hat\rho(RV^\sigma_{t,t+H})$ should converge to the same limit, that is,
$$\hat\rho\left(RV^\sigma_{t,t+H}\right) \Rightarrow F_\rho(A, B). \tag{10}$$
The intuition behind this result is similar to that of cointegration. That is, at their limits, the long-term return $R^e_{t,t+H}$ and variance $RV^\sigma_{t,t+H}$ can be said to move in unison. Therefore, their long-run predictabilities measured by $\hat\rho$ are the same. Thus, the difference in sample correlations $\hat\rho$ across these two regressions converges to zero, which clearly is not in line with the empirical observations presented in Section 2. The next modification of the original model resolves this qualitative mismatch between the model and the data.
Local-to-Zero Predictability
The modification we suggest concerns the predictability of returns, namely, the parameter β. The predictability at short horizons is so small that it can be modeled as a local-to-zero predictability, that is, $\beta = \beta_0/T$. In this section, we show that in this case, we can qualitatively match the $\hat\rho$ pattern we observe in the data. That is, we can obtain high realizations of $\hat\rho(R^e_{t,t+H})$ and a substantial difference between $\hat\rho(R^e_{t,t+H})$ and $\hat\rho(RV^\sigma_{t,t+H})$. The assumption that $\beta = \beta_0/T$ allows the effect of the shock in returns to be significant, even as $T \to \infty$. Thus, the leverage effect, which is a negative correlation between $\varepsilon_{t,1}$ and $\varepsilon_{t,2}$, influences the estimated return predictability, in accord with prior studies (Stambaugh 1999). The following result is proved using the same arguments as in Theorem 1.
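Theorem 2. For dynamics (7), if $\beta = \beta_0/T$, the sample correlation $\hat\rho$ in the regression of $R^e_{t,t+H}$ on the past integrated variance $\sum_{\tau=t-H}^{t-1}\sigma_\tau^2$ converges weakly to
$$\hat\rho\left(R^e_{t,t+H}\right) \Rightarrow F_\rho(D, B), \tag{11}$$
where the functional $F_\rho$ and the process $B(\tau)$ are defined as in Theorem 1.
Process $D(\tau) = \beta_0\omega A(\tau) + C(\tau)$ is defined on $[\lambda, 1-\lambda]$, where $A(\tau)$ is also given in Theorem 1. The constant $\omega = b(1)^{-1}$, and $C(\tau)$ is a process driven by a standard Brownian motion, $W_{1,s}$, given by
$$C(\tau) = \int_{s=\tau}^{\tau+\lambda} J_{2,s}\,dW_{1,s} - \frac{1}{1-2\lambda}\int_{\lambda}^{1-\lambda}\int_{s=\zeta}^{\zeta+\lambda} J_{2,s}\,dW_{1,s}\,d\zeta. \tag{12}$$
Processes $W_{1,s}$ and $W_{2,s}$ are correlated and $d[W_{1,s}, W_{2,s}] = r\,ds$. The asymptotic distributions of the OLS slope $b_r$ and the OLS t-statistic for the slope $t_{b_r}$ are $\omega T b_r \Rightarrow F_\beta(D, B)$ and $t_{b_r}/\sqrt{T} \Rightarrow F_t(D, B)$, where the new functionals are defined as follows:
$$F_\beta(Y, X) = \frac{\int_{\lambda}^{1-\lambda} Y(\tau)X(\tau)\,d\tau}{\int_{\lambda}^{1-\lambda} X^2(\tau)\,d\tau},$$
$$F_{\sigma_e^2}(Y, X) = \frac{1}{1-2\lambda}\int_{\lambda}^{1-\lambda}\left(Y(\tau) - F_\beta(Y, X)X(\tau)\right)^2 d\tau,$$
$$F_t(Y, X) = \frac{F_\beta(Y, X)\sqrt{\int_{\lambda}^{1-\lambda} X^2(\tau)\,d\tau}}{\sqrt{F_{\sigma_e^2}(Y, X)}}.$$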
The new component $C(\tau)$ in the above theorem depends on the characteristics of the process $\varepsilon_{t,1}$. Therefore, under local-to-zero predictability, the limiting distribution of $\hat\rho(R^e_{t,t+H})$ also depends on the characteristics of the shock $\varepsilon_{t,1}$, in particular, on its correlation with $\varepsilon_{t,2}$. The correlation for the variance regression, $\hat\rho(RV^\sigma_{t,t+H})$, still converges to the same limit as in the case with a constant β:
$$\hat\rho\left(RV^\sigma_{t,t+H}\right) \Rightarrow F_\rho(A, B). \tag{13}$$
The result of the above modifications to the original Valkanov (2003) framework is that we can now study the implications of a short-memory model that takes into account overlapping observations, persistence in the predictive variable, heteroscedasticity in the returns, and the effect of a negative correlation between shocks to returns and shocks to variances. To determine if this general model can match the observed data, we consider reasonable values of c, $\beta_0$, r, and ω and examine the asymptotic distribution for sample correlations.
The coefficients are chosen as follows. The persistence parameter, $c = (0.7 - 1)T$, corresponds to the first autocorrelation of 0.7 for the monthly variances. The value of the slope is $\beta = \overline{R}^e_{t,t+1}/\operatorname{var}(R^e_{t,t+1}) \approx 1.88$. The parameter ω defines the ratio $\operatorname{var}\sigma^2/E^2\sigma_t^2$: the choices ω = 0.0318 and ω = 0.0551 correspond to $\operatorname{var}\sigma^2/E^2\sigma_t^2 = 1$ and $\operatorname{var}\sigma^2/E^2\sigma_t^2 = 9$. The leverage coefficient, namely, the correlation, r, is fixed at −0.76 (Aït-Sahalia and Kimmel 2007). For comparison, we also report the case with no leverage, r = 0.
We construct the distribution of the correlation, $\hat\rho$, in regressions A and B using formula (11). The results are reported in Table 1 for the horizon H = 120 and T = 708 observations, as in Section 2. Table 1 shows that even asymptotically, the sample correlations can assume a wide range of values and therefore provide a poor assessment of the strengths of relationships. For example, if ω = 0.0551 and r = −0.76, then the 90% probability interval for $\hat\rho(R^e_{t,t+H})$ extends from −0.533 to 0.673. However, note that despite the high dispersion of the estimates, a value of $\hat\rho$ above 0.81 is rarely observed for regression A. Based on the data in Table 1, the probability of this event is less than 1% for all of the cases.
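One way to approximate draws from the limits $F_\rho(D, B)$ and $F_\rho(A, B)$ is to discretize the OU process and the integrals on a grid. The sketch below is a minimal illustration, not the procedure behind Table 1: it assumes an Euler scheme, a zero starting value for the OU process, a simple grid average in place of the outer integral over ζ, and it treats the product $\beta_0\omega$ as a single input. The function names are hypothetical.

```python
import numpy as np

def f_rho(Y, X):
    """Discretized F_rho(Y, X); the grid spacing cancels in the ratio."""
    return np.sum(Y * X) / np.sqrt(np.sum(X ** 2) * np.sum(Y ** 2))

def simulate_limits(c, lam, beta0_omega, r, steps=1000, rng=None):
    """One draw of (F_rho(D, B), F_rho(A, B)) on a grid of `steps` points."""
    rng = np.random.default_rng() if rng is None else rng
    dt = 1.0 / steps
    # Brownian increments with instantaneous correlation r.
    dW2 = rng.normal(0.0, np.sqrt(dt), steps)
    dW1 = r * dW2 + np.sqrt(1.0 - r ** 2) * rng.normal(0.0, np.sqrt(dt), steps)
    # Euler scheme for dJ = c J ds + dW2, started at J_0 = 0 (assumption).
    J = np.zeros(steps + 1)
    for k in range(steps):
        J[k + 1] = J[k] + c * J[k] * dt + dW2[k]
    # Running integrals of J^2 ds and of J dW1 (left-endpoint, Ito-type sums).
    cum_j2 = np.concatenate(([0.0], np.cumsum(J[:-1] ** 2 * dt)))
    cum_jdw = np.concatenate(([0.0], np.cumsum(J[:-1] * dW1)))
    h = int(round(lam * steps))
    idx = np.arange(h, steps - h + 1)   # grid points tau in [lam, 1 - lam]
    fwd = cum_j2[idx + h] - cum_j2[idx]     # integral over [tau, tau + lam]
    bwd = cum_j2[idx] - cum_j2[idx - h]     # integral over [tau - lam, tau]
    sto = cum_jdw[idx + h] - cum_jdw[idx]   # stochastic integral over [tau, tau + lam]
    A = fwd - fwd.mean()
    B = bwd - bwd.mean()
    C = sto - sto.mean()
    D = beta0_omega * A + C
    return f_rho(D, B), f_rho(A, B)

# Example call with c = (0.7 - 1) * 708 and lambda = 120 / 708 from the text;
# the value of beta0_omega here is purely illustrative.
draws = [simulate_limits(-212.4, 120 / 708, 50.0, -0.76,
                         rng=np.random.default_rng(seed)) for seed in range(20)]
```

Repeating the draw many times and taking percentiles gives an approximate distribution. The notes to Table 1 mention 100,000 simulations with 1000 steps per unit interval; a faithful replication would also need the article's exact scaling of $\beta_0$ and ω, which this sketch does not attempt.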
Another interesting observation that follows from this table is that positive and negative relations between the past volatility and future returns are nearly equally likely, in contrast to the positive relation prescribed by the sign of $\beta_0$. Indeed, the median value of $\hat\rho(R^e_{t,t+H})$ is nearly zero in all of the cases. The distribution of $\hat\rho(R^e_{t,t+H})$ is therefore centered at approximately zero, as for $\beta_0 = 0$, implying that 10 year horizon regressions are uninformative in testing for predictability. Note, however, that this conclusion is an outcome of the current framework with the nearly integrated predictor. In the next section, we demonstrate that this finding is not accidental: for the model considered here, the unobserved true value of $\rho(R^e_{t,t+H})$ is in fact nearly zero for long horizons.

Table 1. Correlations $\hat\rho$: short-memory framework with nearly integrated regressors

                                                    Percentiles
                                                    1.0%     5.0%     10.0%    Median   90%      95%      99%
Case 1: ω = 0.0551, r = −0.76
  $\hat\rho(R^e_{t,t+H})$                           −0.707   −0.533   −0.417    0.125    0.585    0.673    0.791
  $\hat\rho(RV_{t,t+H})$                            −0.820   −0.716   −0.643   −0.308    0.150    0.301    0.545
  $\hat\rho(R^e_{t,t+H}) - \hat\rho(RV_{t,t+H})$    −0.997   −0.590   −0.360    0.424    1.051    1.192    1.413
Case 2: ω = 0.0551, r = 0.0
  $\hat\rho(R^e_{t,t+H})$                           −0.793   −0.670   −0.577   −0.098    0.443    0.559    0.724
  $\hat\rho(RV_{t,t+H})$                            −0.820   −0.716   −0.643   −0.308    0.150    0.301    0.545
  $\hat\rho(R^e_{t,t+H}) - \hat\rho(RV_{t,t+H})$    −0.847   −0.523   −0.357    0.198    0.744    0.886    1.119
Case 3: ω = 0.0318, r = −0.76
  $\hat\rho(R^e_{t,t+H})$                           −0.680   −0.496   −0.369    0.167    0.604    0.688    0.801
  $\hat\rho(RV_{t,t+H})$                            −0.820   −0.716   −0.643   −0.308    0.150    0.301    0.545
  $\hat\rho(R^e_{t,t+H}) - \hat\rho(RV_{t,t+H})$    −1.016   −0.595   −0.352    0.466    1.103    1.243    1.462

NOTES: The table reports the percentiles for the correlations $\hat\rho$ in regressions (3) and (4). The percentiles are calculated based on formulas (11) and (13), using 100,000 simulations. The integrals are calculated using 1000 steps per unit interval. It is assumed that the sample consists of 708 monthly observations, and the forecasting horizon is 120 months.
4. ECONOMETRIC FRAMEWORK: LONG MEMORY
How can we explain why the previous framework fails to deliver significant $\hat\rho(R^e_{t,t+H})$ values? One explanation is that although a nearly integrated process is almost nonstationary, it is still a short-memory process in small samples. This implies that although the autocorrelations of such a process are initially high, they decay at a rapid (exponential) rate. In contrast to this autocorrelation structure, long-memory processes do not necessarily exhibit high first autocorrelations, but the effect of shocks tends to persist over longer periods of time. We address this argument in this section by allowing for long-range dependence in volatility.
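The contrast between exponential and hyperbolic decay can be made concrete with a small calculation. The sketch below compares the autocorrelations of an AR(1) process with those of fractional noise, ARFIMA(0, d, 0). The ARFIMA example is chosen here only for illustration and is not a model estimated in the article; the AR coefficient 0.7 and d = 0.43 are values that appear elsewhere in the text.

```python
def ar1_autocorr(phi, k):
    """Lag-k autocorrelation of a stationary AR(1) with coefficient phi."""
    return phi ** k

def arfima_autocorr(d, k):
    """Lag-k autocorrelation of ARFIMA(0, d, 0), 0 <= d < 1/2.

    Uses the recursion rho(j) = rho(j-1) * (j - 1 + d) / (j - d), rho(0) = 1.
    """
    rho = 1.0
    for j in range(1, k + 1):
        rho *= (j - 1 + d) / (j - d)
    return rho

# Monthly lags up to the 10 year horizon.
for lag in (1, 12, 60, 120):
    print(lag, ar1_autocorr(0.7, lag), arfima_autocorr(0.43, lag))
```

Both processes start with a first autocorrelation near 0.7, but the AR(1) autocorrelation is numerically negligible after a few years, whereas the fractional-noise autocorrelation remains sizable at the longest lags.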
Before we turn to the details of the long-memory framework,
one controversy remains to be addressed. Our assumption about
the long-range dependence in variance implies a long-range
dependence in the equity premium and therefore a long-range
dependence in the returns. This may seem to be at odds with
Rogers (1997), who showed how long memory in prices can
cause a violation of the no-arbitrage condition. However, Rogers
(1997, p. 104) also stated that under certain assumptions, stock
prices may exhibit long-range dependence and still satisfy the
no-arbitrage condition. These assumptions hold by default if
the return dynamics are obtained by solving a structural asset
pricing model, and naturally, asset pricing models with risk-averse investors will produce a long-memory equity premium if
the volatility of the dividend stream is a long-memory process.
For example, Bollerslev, Sizova, and Tauchen (2012) solved for
asset prices in a long-run risk model with a long-range dependent volatility, and the condition described by Rogers (1997) is
satisfied in their model.
4.1 Long-Memory Framework: Fixed Forecasting Horizon
Fama and French (1988, p. 4) explained long-term return predictability as a process by which “the variance of expected returns grows faster than the variance of unexpected returns.” In this section, we demonstrate the accuracy of this explanation when the variance exhibits long-range dependence. On the contrary, for short-memory processes, high $\hat\rho(R^e_{t,t+H})$ for 10 year horizons cannot be explained by this accumulation of predictability.
Herein, we define a long-memory process $X_t(d)$ based on the behavior of its spectral density around zero.
Assumption 1. The spectral density of $X_t(d)$, $f_x(\omega)$, is defined, and there exists a positive constant C such that $\lim_{\omega\to 0} f_x(\omega)|1 - e^{-i\omega}|^{2d} = C$ for some $0 \le d < 1/2$.
The above assumption defines $X_t(d)$ as a long-memory process if d > 0. The same assumption defines a short-memory process when d = 0. Suppose the predictive variable (i.e., realized variance) satisfies Assumption 1 with the parameter d ≥ 0, that is,
$$RV_{t-1,t} = X_t(d), \quad d \ge 0. \tag{14}$$
Also suppose that the return can be represented as the sum of a predictable component $\beta X_t(d)$ and the shock $\varepsilon_{t+1}$:
$$R^e_{t,t+1} = \beta X_t(d) + \varepsilon_{t+1}, \tag{15}$$
where $\varepsilon_{t+1}$ also satisfies Assumption 1 with parameter d = 0, that is, the limit of its spectral density at zero is finite. The results of this section do not change if $RV_{t-1,t}$ is a sum of $X_t(d)$ and noise, as long as the noise process has an integration order $d' \ge 0$ that is less than d. For example, this accommodates the case when $RV_{t-1,t}$ is just a proxy for $RV^\sigma_{t-1,t}$ in the model for return, and thus, the difference $RV_{t-1,t} - RV^\sigma_{t-1,t}$ is the noise. Also, the results extend to the case when the returns are predicted by several variables, and $X_t(d)$ is the one with the highest integration order. Thus, the results of this section hold for the models with several risk factors, such as those seen in, for example, different versions of the long-run risk models (e.g., Bollerslev, Tauchen, and Zhou 2009; Drechsler and Yaron 2011).
The following result follows from Lemma 2 in Appendix B.
Theorem 3. For return dynamics (14) and (15), where $X_t(d)$ is a long-memory process satisfying Assumption 1, the population correlation $\rho(R^e_{t,t+H})$ defined as
$$\rho\left(R^e_{t,t+H}\right) = \frac{\operatorname{cov}\left(R^e_{t,t+H}, RV_{t-H,t}\right)}{\sqrt{\operatorname{var}\left(R^e_{t,t+H}\right)\operatorname{var}\left(RV_{t-H,t}\right)}}$$
converges to $(2^{2d} - 1) \times \operatorname{sign}(\beta)$ as $H \to \infty$.
There is a connection between $\rho(R^e_{t,t+H})$ as defined above and $\hat\rho(R^e_{t,t+H})$, defined in (5). Correlation $\rho(R^e_{t,t+H})$ is the limit of $\hat\rho(R^e_{t,t+H})$ as $T \to \infty$ when H is fixed. Thus, the above expression is a sequential asymptotic result.
It follows from Theorem 3 that for all short-memory models, $\rho(R^e_{t,t+H})$ converges to zero. We therefore expect there to be no predictability in long-term returns and conclude that high $\hat\rho(R^e_{t,t+H})$ can arise only due to the high dispersion of the correlation. In a numerical exercise with the model parameters from Table 1, we found that $|\rho(R^e_{t,t+H})|$ increases up to the medium horizon of 1 year but declines to zero for longer horizons. These results can be made available upon request.
However, the logic of Fama and French’s (1988) analysis still applies to the case of long-memory processes. Due to the accumulation of predictability, $|\rho(R^e_{t,t+H})|$ converges to a positive constant. For example, the limit of $\rho(R^e_{t,t+H})$ is ±0.815 if d = 0.43, which is a commonly found value of d for the realized variance in empirical work.
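The limit in Theorem 3 can be checked numerically in a simplified setting. The sketch below assumes that $X_t(d)$ is fractional noise, ARFIMA(0, d, 0), with unit innovation variance, and that the return shock is absent (β = 1, $\varepsilon_{t+1} = 0$), so that the population correlation reduces to the correlation between two adjacent H-period sums of $X_t(d)$. These simplifications and the function names are assumptions of this illustration only.

```python
import math

def arfima_autocov(d, n):
    """Autocovariances gamma(0..n-1) of ARFIMA(0, d, 0), unit innovation variance."""
    g = [math.gamma(1 - 2 * d) / math.gamma(1 - d) ** 2]
    for k in range(1, n):
        g.append(g[-1] * (k - 1 + d) / (k - d))
    return g

def var_block_sum(gamma, H):
    """Variance of the sum of H consecutive observations."""
    return H * gamma[0] + 2 * sum((H - k) * gamma[k] for k in range(1, H))

def adjacent_block_corr(d, H):
    """Correlation between the sums over [t-H, t) and [t, t+H)."""
    gamma = arfima_autocov(d, 2 * H)
    v1 = var_block_sum(gamma, H)
    v2 = var_block_sum(gamma, 2 * H)
    # var(S_2H) = 2 var(S_H) + 2 cov(past block, future block)
    return (v2 - 2 * v1) / (2 * v1)

def theorem3_limit(d):
    """Limit 2^(2d) - 1 from Theorem 3 (for positive beta)."""
    return 2 ** (2 * d) - 1

for H in (1, 12, 120, 500):
    print(H, adjacent_block_corr(0.43, H), theorem3_limit(0.43))
```

The key step is the identity $\operatorname{var}(S_{2H}) = 2\operatorname{var}(S_H) + 2\operatorname{cov}$; since $\operatorname{var}(S_H)$ grows like $H^{1+2d}$, the ratio tends to $2^{2d} - 1$. With d = 0 the block correlation is exactly zero, in line with the short-memory case discussed above.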
Long-range dependence in the predictive variables, therefore,
leads to long-run predictability. Nevertheless, we are cautious
in interpreting this finding because the correlation ρ is herein
defined as the limit as the available data span tends to infinity,
while H is held constant. For long-horizon regressions, however,
H becomes a large portion of the total sample. We, therefore,
study the behavior of the estimated ρˆ under the assumption that
H /T converges to λ > 0 in Section 4.2.
4.2 Long-Memory Framework: Increasing Forecasting Horizon
In this section, we calculate the asymptotic distributions of correlations $\hat\rho$ as normally estimated using sample covariances and variances of $R^e_{t,t+H}$ and $RV_{t,t+H}$; see (5) and (6). Similar to Valkanov (2003), we assume that observations overlap; however, in contrast to Valkanov (2003), we do not make the assumption of the local-to-unity root. We instead assume that $RV_{t,t+1}$ exhibits long-range dependence, that is, long memory. Following the literature on long-horizon regressions, the limiting distributions of $\hat\rho(R^e_{t,t+H})$ and $\hat\rho(RV_{t,t+H})$ are derived under the assumption that H is large, that is, $\lim_{T\to\infty} H/T = \lambda > 0$. Suppose that we have the same model as in the previous section, that is, the dynamics of returns and variances are described by the system of Equations (14) and (15). Again, the results do not change if the realized variance is the sum of two components, $X_t(d)$ and a less persistent noise, and if the returns are predicted by several factors, as long as $X_t(d)$ has the highest integration order among them. To derive the asymptotic result, we rely on a more restrictive definition of the stationary long-memory process based on the limiting distribution of its partial sums.
Assumption 2. For a fixed $\tau \in [0, 1]$ and $0 < d < 1/2$,
$$\frac{\sum_{i=1}^{[\tau T]}\left(X_i(d) - \mu_x\right)}{T^{1/2+d}} \Rightarrow \sigma_d W_{(d),\tau},$$
where $\sigma_d^2 = \lim_{T\to\infty}\operatorname{var}\left(\sum_{i=1}^{T} X_i(d)\right)/T^{1+2d}$, $\mu_x = E X_t(d)$, and $W_{(d),\tau}$ is a Type I fractional Brownian motion.
A fractional Brownian motion of Type I is defined in Mandelbrot and Van Ness (1968), Tsay and Chung (2000), and Marinucci and Robinson (1999) as follows:
$$W_{(d),\tau} = \sqrt{\frac{(1+2d)\Gamma(1-d)}{\Gamma(1+d)\Gamma(1-2d)}}\left(\int_0^{\tau}(\tau - s)^d\,dW_{2,s} + \int_{-\infty}^{0}\left[(\tau - s)^d - (-s)^d\right]dW_{2,s}\right), \tag{16}$$
where dW2,s are increments of a standard Brownian motion. Assumption 1 is more general than Assumption 2; for an overview
of different definitions and properties of long-range dependent
processes, see Baillie (1996). We now consider which models
satisfy Assumption 2. Naturally, this assumption is satisfied for
short-memory processes if d = 0 under the general conditions
of the functional central limit theorem. For long-memory processes, a general class that satisfies Assumption 2 is that of
moving-average processes.
Example 1.
$$X_t(d) = \mu_x + \sum_{i=0}^{\infty} \theta_i \varepsilon^x_{t-i},$$
where $\varepsilon^x_t$ are iid zero-mean shocks with a finite variance, and the sequence $\{\theta_i\}_{i=0}^{\infty}$ decays hyperbolically (Marinucci and Robinson 2000). That is, coefficients $\theta_i$ decay as $l(i)\, i^{d-1}$, $0 \le d < 1/2$, for $i \to \infty$, where $l(\cdot)$ is a Lebesgue-measurable slowly varying function, bounded on compact subsets and positive on $[a, +\infty)$ for some $a > 0$.
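A moving-average process of this kind can be illustrated numerically. The Python sketch below is only one possible instance and makes assumptions that do not come from the article: it uses the ARFIMA(0, d, 0) weights $\theta_i = \Gamma(i+d)/(\Gamma(i+1)\Gamma(d))$, which decay as $i^{d-1}/\Gamma(d)$ (so the slowly varying function is a constant), Gaussian shocks, and a truncation of the infinite sum at a finite number of lags. The function names `hyperbolic_weights` and `simulate_ma` are hypothetical.

```python
import math
import numpy as np

def hyperbolic_weights(d, n_lags):
    """ARFIMA(0, d, 0) weights: theta_0 = 1, theta_i = theta_{i-1} * (i - 1 + d) / i.

    For d > 0 these equal Gamma(i + d) / (Gamma(i + 1) * Gamma(d)) and decay
    like i^(d - 1) / Gamma(d). This specific choice is an assumption.
    """
    theta = np.empty(n_lags)
    theta[0] = 1.0
    for i in range(1, n_lags):
        theta[i] = theta[i - 1] * (i - 1 + d) / i
    return theta

def simulate_ma(d, n_obs, n_lags=5000, mu_x=0.0, seed=0):
    """Simulate X_t = mu_x + sum_{i < n_lags} theta_i * eps_{t-i} (truncated sum)."""
    rng = np.random.default_rng(seed)
    theta = hyperbolic_weights(d, n_lags)
    eps = rng.standard_normal(n_obs + n_lags - 1)
    # The 'valid' convolution keeps only dates with a full window of past shocks.
    return mu_x + np.convolve(eps, theta, mode="valid")

x = simulate_ma(d=0.43, n_obs=708)
print(x.shape)
```

The truncation lag controls how much of the slow decay is retained; a short truncation would turn the series into a short-memory process, so the lag count should be large relative to the sample length.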
Furthermore, if instead of $X_t(d)$, only $\sqrt{X_t(d)}$ can be represented as such a stationary moving-average process, then $X_t(d)$ may still satisfy Assumption 2.
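Example 2.
$$X_t(d) = \left( \tilde\mu_x + \sum_{i=0}^{\infty} \theta_i \varepsilon^x_{t-i} \right)^2,$$
where $\varepsilon^x_t$ are iid Gaussian zero-mean shocks, the sequence $\{\theta_i\}_{i=0}^{\infty}$ satisfies the assumption from Example 1, and $\tilde\mu_x \neq 0$.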
The fractional stochastic volatility model suggested by
Comte and Renault (1998), which is one of the few available
continuous-time long-memory models for financial volatility,
satisfies Assumption 2.
Example 3.
$$X_t(d) = \int_{t-1}^{t} \sigma^2(u)\,du, \qquad d\ln\sigma(t) = -\kappa \ln\sigma(t)\,dt + \sigma\,dw_{(d)}(t),$$
where w(d) (t) is a truncated fractional Brownian motion and κ >
0. This example and the previous one both satisfy the condition
of nonnegativity of the variance and can be proven to satisfy
Assumption 2 using the arguments of Taqqu (1975). Cases that
violate Assumption 2 can also be found in Taqqu (1975).
Theorem 4. For the dynamics (14) and (15), under Assumption 2, if $\lim_{T\to\infty} H/T = \lambda$, then
$$\hat\rho(R^e_{t,t+H}) \Rightarrow F_\rho(A^d, B^d), \tag{17}$$
where $A^d(\tau)$ and $B^d(\tau)$ are stochastic processes defined on [0, 1]:
$$A^d(\tau) = W_{(d),\tau+\lambda} - W_{(d),\tau} - \frac{1}{1-2\lambda} \int_{\lambda}^{1-\lambda} \left( W_{(d),s+\lambda} - W_{(d),s} \right) ds,$$
$$B^d(\tau) = W_{(d),\tau} - W_{(d),\tau-\lambda} - \frac{1}{1-2\lambda} \int_{\lambda}^{1-\lambda} \left( W_{(d),s} - W_{(d),s-\lambda} \right) ds, \tag{18}$$
and the process $W_{(d),\tau}$ is a Type I fractional Brownian motion (16).
The above result readily follows from the continuous mapping theorem (CMT). Note that $\hat\rho(RV_{t,t+H})$ converges to the same limit, as the error term does not matter for the asymptotic distribution of $\hat\rho(R^e_{t,t+H})$. Table 2 lists the asymptotic distribution of $\hat\rho(R^e_{t,t+H})$ for different levels of data aggregation, $\lambda = H/T$. For example, if H = 120 months and T = 708 months, then λ is approximately 0.17.
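The limit in (17) has no closed form, but it can be approximated by simulation. The Python sketch below shows one way this might be done; it is not the article's procedure. It assumes that the fractional Brownian motion in (16) is a standard fBm with Hurst index d + 1/2 and unit variance at τ = 1, draws paths on an equally spaced grid through a Cholesky factorization of the fBm covariance, and replaces the integrals in (18) and in $F_\rho$ by grid averages. The function names, grid size, and number of paths are illustrative choices.

```python
import numpy as np

def fbm_paths(d, n_steps, n_paths, rng):
    """Draw fBm paths with Hurst index d + 1/2 on the grid k / n_steps, k = 0..n_steps."""
    hurst = d + 0.5
    t = np.arange(1, n_steps + 1) / n_steps
    # Standard fBm covariance: 0.5 * (t^2H + s^2H - |t - s|^2H); d = 0 gives min(t, s).
    cov = 0.5 * (t[:, None] ** (2 * hurst) + t[None, :] ** (2 * hurst)
                 - np.abs(t[:, None] - t[None, :]) ** (2 * hurst))
    # A tiny diagonal jitter guards against round-off in the factorization.
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(n_steps))
    w = rng.standard_normal((n_paths, n_steps)) @ chol.T
    return np.hstack([np.zeros((n_paths, 1)), w])  # prepend W(0) = 0

def f_rho_limit(w, lam):
    """Grid approximation of F_rho(A^d, B^d) for each simulated path in w."""
    n = w.shape[1] - 1
    h = int(round(lam * n))
    idx = np.arange(h, n - h + 1)          # grid points tau in [lam, 1 - lam]
    a = w[:, idx + h] - w[:, idx]          # forward increment W(tau + lam) - W(tau)
    b = w[:, idx] - w[:, idx - h]          # backward increment W(tau) - W(tau - lam)
    a = a - a.mean(axis=1, keepdims=True)  # grid average stands in for the integral term
    b = b - b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))

rng = np.random.default_rng(0)
rho = f_rho_limit(fbm_paths(d=0.43, n_steps=200, n_paths=500, rng=rng), lam=0.05)
print(np.percentile(rho, [5, 50, 95]))
```

With a coarse grid and a few hundred paths, the percentiles are only rough; a finer grid and many more paths would be needed before comparing them with tabulated values.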
The data reported in Table 2 include both long-memory (d = 0.43) and short-memory (d = 0) cases. The second column of the table lists the genuine $\rho(R^e_{t,t+H})$ for large horizons H, that is, its limit as $H \to \infty$. If d = 0.43, then based on Theorem 3, $\lim_{H\to\infty} \rho(R^e_{t,t+H}) = 0.815$. For the short-memory case, the genuine long-run predictability is zero, that is, $\lim_{H\to\infty} \rho(R^e_{t,t+H}) = 0$. The next three columns list the median values and the 5th and 95th percentiles for the asymptotic limits of the sample $\hat\rho(R^e_{t,t+H})$ for large horizons H as $T \to \infty$ and $H/T \to \lambda$.
A few conclusions follow from Table 2. First, there is a negative bias in $\hat\rho(R^e_{t,t+H})$ that increases with the aggregation parameter, λ. Indeed, all the median values of $\hat\rho(R^e_{t,t+H})$ are well below the value of $\rho(R^e_{t,t+H})$. In the long-memory case, for time horizons that constitute 1/20 of the sample (approximately 3 years in our empirical example), the median value of $\hat\rho(R^e_{t,t+H})$ is 0.402, which is well below the population value of 0.815. $\hat\rho(R^e_{t,t+H})$ decreases even further for 1/10 of the sample, corresponding to approximately a 6 year horizon, and drops below zero for 10 year horizons. The implication is that the predictability of returns and variances in small samples is always underestimated.
Second, the accuracy of the sample correlation $\hat\rho(R^e_{t,t+H})$ decreases with an increase in the aggregation level λ. For example, if d = 0.43 and H is 1/20 of the total sample, $\hat\rho(R^e_{t,t+H})$ has a 90% probability of taking a value between −0.018 and 0.697. If H is 0.17 of the total sample, then the 90% probability interval is [−0.675, 0.652].
Finally, despite the sharp decline in accuracy due to aggregating observations, the long-memory model results in a distribution of $\hat\rho(R^e_{t,t+H})$ that is more often positive than negative for λ ≤ 0.10. Simultaneously, the distribution of $\hat\rho(R^e_{t,t+H})$ for short-memory models is centered on zero. Thus, in the short-memory models, we are equally likely to observe positive and negative correlations between future returns and variances for long horizons.

Table 2. Correlation $\hat\rho(R^e_{t,t+H})$: long-memory framework

                                      ρ̂: H/T is fixed
                           ρ          Median    5%        95%
Long-memory (d = 0.43)
  λ = 0.05                 0.815       0.402    −0.018     0.697
  λ = 0.10                 0.815       0.216    −0.363     0.667
  λ = 0.17                 0.815      −0.081    −0.675     0.652
Short-memory (d = 0)
  λ = 0.05                 0.000      −0.077    −0.362     0.226
  λ = 0.10                 0.000      −0.148    −0.551     0.263
  λ = 0.17                 0.000      −0.306    −0.697     0.270

NOTES: The table reports the percentiles for $\hat\rho$ in regression (3) under the assumption that $\lim_{T\to\infty} H/T = \lambda$. The percentiles are calculated based on the long-memory framework presented in Section 4.2, formula (17), using 100,000 simulations. The integrals are calculated using 1000 steps per unit interval. The first column shows $\lim_{H\to\infty} \rho(H) = (2^{2d} - 1)$, which is the population equivalent of $\hat\rho$ for long horizons.
Now consider the difference between the regression for returns (3) and the regression for volatilities (4). Up until now, the unpredictable part of the returns did not affect the result in (17) because the predictable part dominated the distribution as $T \to \infty$. As in the short-memory local-to-unity framework, to ensure that the unpredictable part of the return affects the limiting distribution of $\hat\rho(R^e_{t,t+H})$, we introduce the local-to-zero predictability into the model as follows:
$$R^e_{t,t+1} = \frac{\beta_0}{T^d} X_t(d) + \varepsilon_{t+1}. \tag{19}$$
Here, $X_t(d)$ is the long-run component of the realized variance, $RV_{t-1,t}$, that is, the part of the realized variance with the largest long-memory parameter, d. $X_t(d)$ again satisfies Assumption 2 with d > 0. $\varepsilon_{t+1}$ is a process satisfying the assumptions of the functional central limit theorem, that is, Assumption 2 with d = 0, so that $\frac{1}{\sqrt{T}} \sum_{t=1}^{[\tau T]} \varepsilon_{t+1} \Rightarrow \sigma_\varepsilon W_{1,\tau}$, forming a multivariate fractional Brownian motion (mfBm) jointly with the limit of the partial sums of $RV_{t-1,t}$. The use of mfBm as a limiting process and efficient methods for the simulation of mfBm are discussed in Amblard et al. (2011).
The new result that applies to the dynamics (19) is given by Theorem 5 and, again, can be obtained by using the CMT.
Theorem 5. For the dynamics (14) and (19), the sample $\hat\rho(R^e_{t,t+H})$ in regression (3) converges weakly to
$$\hat\rho(R^e_{t,t+H}) \Rightarrow F_\rho(D^d, B^d), \tag{20}$$
if $\lim_{T\to\infty} H/T = \lambda > 0$. $D^d(\tau)$ is a stochastic process defined on [0, 1],
$$D^d(\tau) = \beta_0 \frac{\sigma_d}{\sigma_\varepsilon} A^d(\tau) + W_{1,\tau+\lambda} - W_{1,\tau} - \frac{1}{1-2\lambda} \int_{\lambda}^{1-\lambda} \left( W_{1,s+\lambda} - W_{1,s} \right) ds,$$
where $W_{1,\tau}$ is a standard Brownian motion, and $A^d(\tau)$ and $B^d(\tau)$ are defined as in Theorem 4. The parameter $\beta_0$ defines the predictability of returns in (19), and $\sigma_d$ is an asymptotic standard deviation for the normalized sums of $X_t(d)$, that is, $\lim_{T\to\infty} \mathrm{var}\left( \left( \sum_{t=1}^{T} X_t(d) \right) / T^{1/2+d} \right) = \sigma_d^2$, and $\sigma_\varepsilon$ is an asymptotic standard deviation for normalized sums of $\varepsilon_{t+1}$, that is, $\lim_{T\to\infty} \mathrm{var}\left( \left( \sum_{t=1}^{T} \varepsilon_{t+1} \right) / T^{1/2} \right) = \sigma_\varepsilon^2$. It also holds that $\hat\rho(RV_{t,t+H}) \Rightarrow F_\rho(A^d, B^d)$, the OLS slope $b_r T^d \Rightarrow \frac{\sigma_\varepsilon}{\sigma_d} F_\beta(D^d, B^d)$, and the OLS t-statistic for the slope $t_{b_r}/\sqrt{T} \Rightarrow F_t(D^d, B^d)$.

Table 3. Correlation $\hat\rho$: long-memory framework with local-to-zero predictability

                 ρ̂(RV_{t−H,H}) percentiles                   ρ̂(R^e_{t,t+H}) percentiles
                 Median    5%        95%       Sim (95%)    Median    5%        95%       Sim (95%)
Long-memory (d = 0.43, r = −0.76, β = 3)
  λ = 0.05        0.402    −0.018     0.702     0.723        0.425     0.130     0.645     0.613
  λ = 0.10        0.216    −0.363     0.667     0.687        0.470    −0.024     0.773     0.744
  λ = 0.17       −0.081    −0.675     0.652     0.686        0.421    −0.334     0.842     0.823
Long-memory (d = 0.43, r = 0, β = 3)
  λ = 0.05        0.402    −0.018     0.702     0.723        0.191    −0.201     0.528     0.524
  λ = 0.10        0.216    −0.363     0.667     0.687        0.132    −0.410     0.614     0.610
  λ = 0.17       −0.081    −0.674     0.652     0.686        0.008    −0.690     0.695     0.686
Short-memory (d = 0, r = −0.76, β = √0.15)
  λ = 0.05       −0.080    −0.362     0.230                  0.035    −0.280     0.339
  λ = 0.10       −0.153    −0.554     0.263                  0.090    −0.412     0.535
  λ = 0.17       −0.306    −0.697     0.270                  0.155    −0.499     0.691
Short-memory (d = 0, r = 0.0, β = √0.15)
  λ = 0.05       −0.080    −0.362     0.230                 −0.036    −0.343     0.277
  λ = 0.10       −0.153    −0.554     0.263                 −0.072    −0.492     0.434
  λ = 0.17       −0.306    −0.697     0.270                 −0.111    −0.695     0.541

NOTES: The table reports the percentiles for $\hat\rho$ in regressions (3) and (4) under the assumption that $\lim_{T\to\infty} H/T = \lambda$ and for local-to-zero predictability. The percentiles are calculated based on the long-memory framework presented in Section 4.2, formula (20), using 100,000 simulations. The integrals are calculated using 1000 steps per unit interval. The last column in each block, for $\hat\rho(RV_{t-H,H})$ and $\hat\rho(R^e_{t,t+H})$, provides the simulated 95th percentiles in the long-memory model of Comte and Renault (1998).
Note that the leverage coefficient enters the above formulas only implicitly, through the correlation between the processes $W_{1,\tau}$ and $W_{2,\tau}$, where the latter drives $W_{(d),\tau}$ in (16). Let $d[W_{1,\tau}, W_{2,\tau}] = r\,d\tau$. The new parameters that enter the above formulas, therefore, include the ratio $(\beta_0\sigma_d)/\sigma_\varepsilon$ and r. The first parameter defines the variance ratio between the predictable and unpredictable parts of the return process,
$$\lim_{H\to\infty} \mathrm{var}\left( \sum_{t=1}^{H} \frac{\beta_0}{T^d} X_t(d) \right) \times \mathrm{var}\left( \sum_{t=1}^{H} \varepsilon_{t+1} \right)^{-1} = \left( (\beta_0\sigma_d)/\sigma_\varepsilon \right)^2 \lambda^{2d}.$$
To select the value of $(\beta_0\sigma_d)/\sigma_\varepsilon$, we match the order of predictability for monthly and annual horizons. For example, if $(\beta_0\sigma_d)/\sigma_\varepsilon = 3$ and $2d \approx 1$, then the return predictability is 15% for annual returns (λ = 1/60) and 1.25% for monthly returns (λ = 1/60/12). These figures are similar to those reported by Drechsler and Yaron (2011) and Bollerslev, Tauchen, and Zhou (2009). For the short-memory model, the asymptotic variance ratio remains the same for all horizons. To calibrate $(\beta_0\sigma_d)/\sigma_\varepsilon$ for the short-memory case, we match the same predictability at annual horizons so that $(\beta_0\sigma_d)/\sigma_\varepsilon$ is fixed at $\sqrt{0.15}$.
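The calibration arithmetic can be checked directly from the variance-ratio expression $((\beta_0\sigma_d)/\sigma_\varepsilon)^2 \lambda^{2d}$. The short Python sketch below evaluates it for the values quoted in the text; the function name `variance_ratio` is a hypothetical helper, and 2d is set to exactly 1 (d = 0.5) as the approximation used in the example.

```python
def variance_ratio(scaled_beta, lam, d):
    """Asymptotic ratio ((beta_0 * sigma_d) / sigma_eps)^2 * lambda^(2d)."""
    return scaled_beta ** 2 * lam ** (2.0 * d)

# Long-memory calibration with (beta_0 * sigma_d) / sigma_eps = 3 and 2d = 1.
annual = variance_ratio(3.0, 1.0 / 60.0, 0.5)
monthly = variance_ratio(3.0, 1.0 / 60.0 / 12.0, 0.5)

# Short-memory calibration: with d = 0 the ratio does not depend on lambda.
short_memory = variance_ratio(0.15 ** 0.5, 1.0 / 60.0, 0.0)

print(annual, monthly, short_memory)
```

The annual and monthly values correspond to the 15% and 1.25% figures in the text, and the short-memory value stays at 15% for every λ because $\lambda^{2d} = 1$ when d = 0.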
The second parameter, namely, the correlation r, is most
accurately defined as the asymptotic long-run leverage effect.
Although this r is not necessarily equal to the corresponding
parameter calculated based on high-frequency observations, we
use the estimate of the latter as a proxy for r and fix its value
at −0.76 (Aït-Sahalia and Kimmel 2007). For comparison, we
also report the results for the zero-leverage case (r = 0).
Table 3 reports the percentiles for the asymptotic distribution of $\hat\rho(R^e_{t,t+H})$ and $\hat\rho(RV_{t,t+H})$ as functions of the aggregation level $\lambda = H/T$. Table 3 is formatted such that it is convenient to compare the short-memory and long-memory cases and the leverage (r = −0.76) and zero-leverage cases (r = 0). The first three columns list the percentiles for the distribution of $\hat\rho(RV_{t,t+H})$. Again, we observe that for all cases, the estimated predictability in volatility is biased downward.
The next three columns report the percentiles for $\hat\rho(R^e_{t,t+H})$. The very first observation is that the median values of $\hat\rho(R^e_{t,t+H})$ are always larger than the median values of $\hat\rho(RV_{t,t+H})$. This observation agrees with the empirical findings in Section 2. For example, for the long-memory case with r = −0.76, the median $\hat\rho(RV_{t,t+H})$ and the median $\hat\rho(R^e_{t,t+H})$ are −0.081 and 0.421, respectively, for $\lambda = H/T = 0.17$. We observe this positive difference for all the cases; however, the difference $\hat\rho(R^e_{t,t+H}) - \hat\rho(RV_{t,t+H})$ is sufficiently large only if we allow for the leverage effect. The effect of the leverage on the bias in regressions has been studied by Stambaugh (1999) for autoregressive AR(1) predictive variables. In our framework, aggregation of the data (with H > 1) and long-range dependence alter the relation derived by Stambaugh (1999). However, similar results still hold. In addition to the small-sample bias of Stambaugh (1999), the leverage also produces a large-sample effect by inducing a negative correlation between the predictable and unpredictable parts of the return.
The second observation from Table 3 is that the long-memory case can produce nonnegligible predictability in returns for long horizons. This follows from the last column of the table, which reports the 95th percentile for the distribution of $\hat\rho(R^e_{t,t+H})$. For example, if λ = 0.17, which approximates the level of the overlap for the 10 year horizon, and r = −0.76, then the probability of attaining $\hat\rho(R^e_{t,t+H}) = 0.81$ is above 5%. This stands in contrast to the short-memory case, for which, just as in Table 1, the same probability is less than 1%. To summarize, long memory in volatility combined with the leverage effect generates high predictability in returns together with low predictability in volatility itself.
Since our conclusions are based on the magnitudes of the 95th percentiles, we next verify that the right tails of the asymptotic distributions are representative of the right tails of the small-sample distributions. The last column in each block of Table 3 reports the 95th percentiles for correlations simulated using a model in which the regressor is a fractional long-memory process, see Comte and Renault (1998) (for simulations, T = 708 months and $\sigma_t^2 = \sigma_0^2 + \int_{-\infty}^{t} \frac{(t-s)^{d-1}}{\Gamma(d)} v_s\,ds$, where $v_t$ is a zero-mean affine process), with the same parameters as described above. As follows from the table, the asymptotic and the simulated percentiles are within 1%–7% of each other for all the cases, and within 1%–3% for λ = 0.17.
4.3 Generalization to the Multivariate Case
This section briefly discusses how the asymptotic results derived in Sections 3 and 4 can be generalized to the multivariate
setting, in which predictive variables can be of either persistence
type (nearly integrated or fractionally integrated). The assumptions of Sections 3 and 4 can be combined into the following
general condition.
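Assumption 3. Suppose that
$$R^e_{t,t+1} = \beta' f_t + \varepsilon_{t+1},$$
where $\beta \in \mathbb{R}^p$ and $(f_t, \varepsilon_{t+1})$ is a random (p + 1)-vector, such that for any fixed $\tau \in [0, 1]$, we have
$$\mathrm{diag}\{T^{-\alpha_1}, \ldots, T^{-\alpha_{p+1}}\} \sum_{i=1}^{[\tau T]} \left( \begin{pmatrix} f_i \\ \varepsilon_i \end{pmatrix} - \theta \right) \Rightarrow \begin{pmatrix} \upsilon(\tau) \\ u(\tau) \end{pmatrix}, \tag{21}$$
where $\alpha = (\alpha_1, \ldots, \alpha_{p+1})$ and θ are constant vectors, and $(\upsilon(\tau), u(\tau))'$ is a random a.s. continuous nondegenerate vector-process with p elements in $\upsilon(\tau)$ and one element in $u(\tau)$. Define
$$\upsilon^\mu(\tau) = \upsilon(\tau) - \upsilon(\tau - \lambda) - \frac{1}{1-2\lambda} \int_{\lambda}^{1-\lambda} (\upsilon(s) - \upsilon(s - \lambda))\,ds,$$
$$u^\mu(\tau) = u(\tau + \lambda) - u(\tau) - \frac{1}{1-2\lambda} \int_{\lambda}^{1-\lambda} (u(s + \lambda) - u(s))\,ds,$$
and let $\Upsilon_{\upsilon,\upsilon} = \int_{\lambda}^{1-\lambda} \upsilon^\mu(\tau)(\upsilon^\mu(\tau))'\,d\tau$ be a.s. positive definite.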
For example, αj = 2 for the square of a nearly integrated predictor under the assumptions in Section 3, αj = 1/2 + d for a
fractionally integrated predictor under the assumptions in Section 4, and αj = 3/2 for all of the nearly integrated regressors
considered by Valkanov (2003). For the last element, αp+1 = 1,
as in Section 3, if the heteroscedasticity in the returns is driven by
a nearly integrated process, and αp+1 = 1/2, as in Section 4, if
the heteroscedasticity in the returns is driven by a long-memory
process.
Let $b_{r,j}$, $1 \le j \le p$, be the jth OLS slope in the regression (with intercept) of $R^e_{t,t+H}$ on all of the elements in $\sum_{i=1}^{H} f_{t-i}$.
Let tb,j , 1 ≤ j ≤ p to be the co
The predictability of long-term asset returns increases with the time horizon as estimated in regressions
of aggregated-forward returns on aggregated-backward predictive variables. This previously established
evidence is consistent with the presence of common slow-moving components that are extracted upon
aggregation from returns and predictive variables. Long memory is an appropriate econometric framework
for modeling this phenomenon. We apply this framework to explain the results from regressions of returns
on risk measures. We introduce suitable econometric methods for construction of confidence intervals and
apply them to test the predictability of NYSE/AMEX returns.
KEY WORDS: Long-range dependence; Return predictability; Spurious regression.
1. INTRODUCTION
Short-term asset returns (e.g., monthly) appear to be largely
unpredictable. At the same time, a number of studies have
demonstrated more significant predictability in long-term returns (e.g., annual). This increase in predictability for the aggregated returns occurs naturally if the predictive variables are
persistent. It has become standard practice to model persistent
predictive variables as stationary autoregressive processes with
high first-order autocorrelations (see Stambaugh 1999; Boudoukh,
Richardson, and Whitelaw 2008). More formally, such processes are modeled as nearly integrated (see Phillips 1988;
Valkanov 2003).
In this article, we compare the implications of this accepted
model of persistence in predictive variables to the implications
from an alternative, long-memory, model that has received less
attention in the long-run predictability literature. Note that if the
predictive variables are modeled as nearly integrated, then this
assumption leads to exponentially decaying autocorrelations.
Therefore, this model may be inadequate in at least two cases.
The first case occurs when the model is used to predict the effects
of small shocks to returns that decay at a rate that is appreciably slower than exponential. The second case occurs when the
predictive variable is only modestly persistent but may contain
several slowly moving components that become manifest only
upon aggregation. These two cases, on the other hand, fit naturally within the framework with fractionally integrated (i.e.,
long-memory) predictive variables (see Baillie 1996). As we
will show, fractional integration has implications for the return
predictability at different forecasting horizons as well as for the
properties of the sample statistics in long-horizon regressions
(i.e., regressions with long-term returns).
In our measurement procedure, we focus on regressions with
two-way aggregation of the regressor and regressand, as proposed by Bandi and Perron (2008). One of the motivations for
this choice is the extraction of the long-run signals from both
the returns and predictive variables. We show that the population $R^2$ in these long-horizon regressions converges to zero as
the horizon increases unless the predictive variables are fractionally integrated. Therefore, although an increase in return
predictability occurs for highly persistent (nearly integrated)
short-memory processes, extreme persistence of the variables
would be required for the longest horizons. Such persistence is
not observed in, for example, financial volatility, term spreads,
or unemployment rates. However, the behavior of these variables may be consistent with the presence of long memory. We
focus on the volatility of a broad stock market index, which
is a particularly relevant example because, in principle, it depends on the same variances of the fundamental shocks that
constitute the equity premium (e.g., Merton 1973; Campbell
and Cochrane 1999; Bansal and Yaron 2004). The return predictability by other measures of risk should also be analyzed
within the same long-memory framework.
We present three results in this article. First, we revisit the
model with nearly integrated predictive variables and adapt it
to the case with volatility as a regressor. In particular, we explicitly account for the heteroscedasticity in the returns and the
modest size of the observed predictability over short horizons.
We show that this model cannot fully account for the empirical evidence in the data, such as the magnitude of the return
predictability over the longest horizons. Second, we consider
the predictability in the long-memory framework. We find that
the increasing patterns of $R^2$ as a function of the horizon are
inherent in this framework. However, we find that the return
predictability (when present) is underestimated in small samples. We derive asymptotic distributions for both models, and
the resulting confidence intervals are then applied in our empirical study. For the volatility, we find that the long-memory
model produces wider confidence intervals compared with the
model with nearly integrated regressors. We, however, confirm
the predictability evidence in Bandi and Perron (2008) for the
longest horizons of 9 and 10 years.
© 2013 American Statistical Association
Journal of Business & Economic Statistics
October 2013, Vol. 31, No. 4
DOI: 10.1080/07350015.2013.827985
The article is organized as follows. Section 2 documents the
empirical facts against which we check the validity of our econometric frameworks. Section 3 outlines a short-memory framework with a nearly integrated volatility and compares the implications of this model with the empirical facts. Section 4 outlines
a new framework in which the predictive variable is assumed to
follow a long-memory process. Finally, we test the predictability of NYSE/AMEX returns using our asymptotic results in
Section 5.
2. STYLIZED EMPIRICAL FACTS
There are two pronounced empirical facts that are hard to
replicate using existing models. First, the data suggest that returns are highly predictable over long horizons. Second, the predictability of the predictive variables themselves is quite low. For
illustration, we reproduce the results of the article by Bandi and
Perron (2008), which analyzes the predictability of long-term
NYSE/AMEX returns using volatility, and extend the original
results to include 2011 data.
The data are constructed as follows. The variable to be predicted is the monthly excess return, $R^e_{t,t+1} = \sum_{j=1}^{M_t} r_{t,j} - r^{rf}_{t,t+1}$, where $r_{t,j}$ is the jth continuously compounded daily return (including dividends) during the tth month for the NYSE/AMEX index. These data were provided by the Center for Research in Security Prices (CRSP)/Wharton Research Data Services (WRDS) and cover the period from January 1952 to December 2011. The risk-free rate $r^{rf}_{t,t+1}$ is from the CRSP “Fama Risk-Free Rates” data file. This dataset covers the period from January 1952 to December 2011, and is based on the prices of 1 month T-bills. $M_t$ is the number of observations in month t.
The goal is to forecast future long-term returns over H periods, $R^e_{t,t+H} = \sum_{i=1}^{H} R^e_{t+i-1,t+i}$, using the data on past realized variance (volatility) in the market:
$$RV_{t-H,t} = \sum_{i=1}^{H} RV_{t-H+i-1,t-H+i}, \tag{1}$$
where
$$RV_{t,t+1} = \sum_{j=1}^{M_t} r_{t,j}^2. \tag{2}$$
Note that we use the same horizon H for the regressor $RV_{t-H,t}$ and for the return $R^e_{t,t+H}$. This is the diagonal of the matrix
reported by Bandi and Perron (2008); note that their study also
provides results for different horizons of returns and volatilities.
We focus only on one subset of these results because Bandi and
Perron (2008) demonstrated that predictability is generally at its
highest for this diagonal.
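The construction of the monthly series and the H-period aggregation in (1) and (2) can be sketched in a few lines of Python. The sketch is illustrative only: the input layout (an array of daily returns, an integer month label for each day, and one risk-free return per month) and the function names `monthly_series` and `rolling_sum` are assumptions, not a description of the CRSP files.

```python
import numpy as np

def monthly_series(daily_returns, month_index, monthly_risk_free):
    """Return (excess_return, realized_variance), one entry per month.

    daily_returns: continuously compounded daily returns
    month_index: integer month label in 0..n_months-1 for each daily return
    monthly_risk_free: one risk-free return per month
    """
    daily_returns = np.asarray(daily_returns, dtype=float)
    month_index = np.asarray(month_index)
    n_months = len(monthly_risk_free)
    # Sum of daily returns within each month, minus the risk-free return.
    total = np.bincount(month_index, weights=daily_returns, minlength=n_months)
    # Realized variance as in (2): sum of squared daily returns within the month.
    rv = np.bincount(month_index, weights=daily_returns ** 2, minlength=n_months)
    return total - np.asarray(monthly_risk_free, dtype=float), rv

def rolling_sum(x, horizon):
    """Overlapping H-period sums as in (1): entry k is x[k] + ... + x[k + H - 1]."""
    c = np.concatenate([[0.0], np.cumsum(x)])
    return c[horizon:] - c[:-horizon]

excess, rv = monthly_series([0.01, -0.02, 0.03, 0.01], [0, 0, 1, 1], [0.001, 0.002])
print(excess, rv, rolling_sum(rv, 2))
```

Using cumulative sums keeps the overlapping aggregation linear in the sample length, which matters when the same series is aggregated for many horizons.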
To evaluate the predictability of returns at different horizons, we run two types of regressions:
Regression A
$$R^e_{t,t+H} = a_r + b_r RV_{t-H,t} + u^r_{t+H|t}, \tag{3}$$
Regression B
$$RV_{t,t+H} = a_\sigma + b_\sigma RV_{t-H,t} + u^\sigma_{t+H|t}, \tag{4}$$
and record the corresponding correlations:
$$\hat\rho(R^e_{t,t+H}) = \frac{\frac{1}{T-2H} \sum_{t=H}^{T-H} \left( R^e_{t,t+H} - \overline{R^e_{t,t+H}} \right) \left( RV_{t-H,t} - \overline{RV_{t-H,t}} \right)}{\sqrt{\frac{1}{T-2H} \sum_{t=H}^{T-H} \left( R^e_{t,t+H} - \overline{R^e_{t,t+H}} \right)^2 \times \frac{1}{T-2H} \sum_{t=H}^{T-H} \left( RV_{t-H,t} - \overline{RV_{t-H,t}} \right)^2}}, \tag{5}$$
where $\overline{R^e_{t,t+H}}$ and $\overline{RV_{t-H,t}}$ are the averages of $R^e_{t,t+H}$ and $RV_{t-H,t}$ over the sample that starts with observation H and ends at T − H. We use the hat symbol for the correlation ρ in (5) to indicate that this coefficient is calculated using sample data, in contrast to
$$\rho(R^e_{t,t+H}) = \frac{\mathrm{cov}\left( R^e_{t,t+H}, RV_{t-H,t} \right)}{\sqrt{\mathrm{var}\left( R^e_{t,t+H} \right) \mathrm{var}\left( RV_{t-H,t} \right)}},$$
which, if it exists, is a nonrandom number.
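Analogously, for regression B we have
$$\hat\rho(RV_{t,t+H}) = \frac{\frac{1}{T-2H} \sum_{t=H}^{T-H} \left( RV_{t,t+H} - \overline{RV_{t,t+H}} \right) \left( RV_{t-H,t} - \overline{RV_{t-H,t}} \right)}{\sqrt{\frac{1}{T-2H} \sum_{t=H}^{T-H} \left( RV_{t,t+H} - \overline{RV_{t,t+H}} \right)^2 \times \frac{1}{T-2H} \sum_{t=H}^{T-H} \left( RV_{t-H,t} - \overline{RV_{t-H,t}} \right)^2}}. \tag{6}$$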
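As an illustration of how the correlations in (5) and (6) might be computed from monthly data, the Python sketch below forms the overlapping forward and backward sums and evaluates both sample correlations for one horizon. It assumes zero-based arrays in which entry k holds the one-month quantity for the period from k to k + 1; the function name `long_horizon_correlations` is hypothetical. The common factor 1/(T − 2H) cancels between numerator and denominator, so it is omitted.

```python
import numpy as np

def long_horizon_correlations(excess_returns, realized_variance, horizon):
    """Sample correlations for regressions A and B at a single horizon H."""
    r = np.asarray(excess_returns, dtype=float)
    v = np.asarray(realized_variance, dtype=float)
    n = len(r)
    cr = np.concatenate([[0.0], np.cumsum(r)])
    cv = np.concatenate([[0.0], np.cumsum(v)])
    t = np.arange(horizon, n - horizon + 1)   # t = H, ..., T - H
    fwd_r = cr[t + horizon] - cr[t]           # R^e over (t, t + H)
    fwd_v = cv[t + horizon] - cv[t]           # RV over (t, t + H)
    bwd_v = cv[t] - cv[t - horizon]           # RV over (t - H, t)

    def corr(y, x):
        y = y - y.mean()
        x = x - x.mean()
        return (y * x).sum() / np.sqrt((y ** 2).sum() * (x ** 2).sum())

    return corr(fwd_r, bwd_v), corr(fwd_v, bwd_v)

rng = np.random.default_rng(0)
rho_a, rho_b = long_horizon_correlations(rng.standard_normal(708),
                                         rng.standard_normal(708) ** 2, horizon=120)
print(rho_a, rho_b)
```

The example call uses artificial independent series, so any nonzero correlation it prints reflects only the sampling variability that overlapping long-horizon sums create.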
We work with correlations to preserve the information about the sign of the relationship. The usual regression coefficients of determination ($R^2$) can be obtained simply as $(\hat\rho(RV_{t,t+H}))^2 \times 100\%$ and $(\hat\rho(R^e_{t,t+H}))^2 \times 100\%$. The coefficients of determination for regressions A and B are reported in panel I(a) of Figure 1, and the corresponding correlations are reported in panel II(a). Whereas $\hat\rho$ for regression A increases monotonically from approximately zero at the 1 month horizon to 0.81 at the 10 year horizon, $\hat\rho$ for regression B decreases. The percentage of the explained variation in $RV_{t,t+H}$ changes from 25% to nearly zero. The return predictability therefore appears to increase with the time horizon, whereas the predictability of the predictive variable itself seems to disappear.
A similar pattern is observed when we replace the realized volatility with another standard predictor of the long-term returns, the dividend yield, as shown in panels I(b) and II(b) of Figure 1. We continue with two-way aggregation of the regressor and returns, as in the volatility case. However, a difference in interpretation from the previous case should be noted. While the aggregated variance over H periods, $RV_{t-H,t}$, is also a measure of the H-period variance, there is no economic motivation for the aggregation of the dividend yields. However, the aggregation can be interpreted statistically as a signal extraction procedure. This method is suitable under the assumption that the predictive variable contains uninformative short-run noise, which is in effect removed through aggregation.

[Figure 1. Four panels plotted against the horizon in months (0 to 120). Panel I(a): return predictability and volatility predictability, $R^2$ (0% to 100%). Panel I(b): return predictability and dividend-yield predictability, $R^2$ (0% to 100%). Panel II(a): return predictability and volatility predictability, ρ. Panel II(b): return predictability and dividend-yield predictability, ρ.]
Figure 1. Sample $R^2$ and correlations for 1952–2011, NYSE/AMEX returns. Panel I(a) shows the sample $R^2$ in the regressions of excess returns on past volatility and future volatility on past volatility. Panel I(b) shows the corresponding values when the dividend yield is used as the predictor. Panels II(a,b) display the corresponding sample correlations. The forecasting horizon is indicated on the OX axis.
Figure 2 provides further details of the mechanism behind
the high return predictability in long-horizon regressions by
plotting future aggregated returns against the past aggregated
realized variance. The figure shows how by increasing the horizon H, we reveal a linear relation between returns and variance
at H = 10 years from the data, which, at monthly horizons,
does not contain any apparent information regarding the risk
relation. This exercise was performed by Bandi and Perron
(2008) using a dataset that did not include the stock market
crash of 2008 and led to the same findings. The fact that inclusion of this new data did not seem to change the conclusions
speaks to the robustness of the relation between the variance and
returns.
3. ECONOMETRIC FRAMEWORK: NEARLY INTEGRATED PREDICTOR
It has been demonstrated that the analysis of long-horizon
regressions such as (3) and (4) requires special methods that
account for small-sample effects, which are exacerbated by the
persistence of the regressors (see Ventosa-Santaulària 2009).
To account for the persistence, Valkanov (2003) modeled predictive variables as nearly integrated processes. He developed
asymptotic results specifically for long-horizon regressions of
asset returns.
[Figure 2. Four scatter panels: H = 1 month, H = 12 months, H = 60 months, and H = 120 months. Horizontal axis: return variance from t − H to t. Vertical axis: excess return from t to t + H.]
Figure 2. Effect of aggregation on return predictability. The excess returns, $R^e_{t,t+H}$, are plotted versus $RV_{t-H,H}$ for the NYSE/AMEX index from 1952 to 2011.

Our first goal is to check whether this type of model can capture the long-run predictability pattern shown in Figure 1. We extend the model developed by Valkanov (2003) to volatility regressors to capture various well-known facts regarding the return–variance relationship, including leverage, heteroscedasticity, and positive variance:
$$R^e_{t,t+1} = \beta\sigma_t^2 + \sigma_t \varepsilon_{t+1,1},$$
$$\left( 1 - \left( 1 + \frac{c}{T} \right) L \right) b(L)(\sigma_t - \mu_\sigma) = \varepsilon_{t,2},$$
$$\mathrm{corr}(\varepsilon_{t,1}, \varepsilon_{t,2}) = r < 0. \tag{7}$$
The vector $(\varepsilon_{t,1}, \varepsilon_{t,2})$ is a martingale difference sequence. The variances of $\varepsilon_{t,1}$ and $\varepsilon_{t,2}$ are normalized to one. The process $v_t = b(L)^{-1}\varepsilon_{t,2}$ satisfies a mixing condition from Herrndorf (1984): $v_t$ is a zero-mean strong mixing sequence with mixing coefficients $\alpha_m$, such that $\sum_{m=1}^{\infty} \alpha_m^{1-2/b} < \infty$, $\limsup_{t>1} E|v_t|^b < \infty$ for b > 2, and the limit $\lim_{T\to\infty} E\left( \frac{1}{T} \left( \sum_{t=1}^{T} v_t \right)^2 \right)$ exists and is positive. The initial value of $\sigma_0$ is a random variable, whose distribution is independent of T.
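To make the dynamics in (7) concrete, the following Python sketch simulates one sample path. It is a simplified illustration under assumptions that go beyond the text: the lag polynomial is set to b(L) = 1, the shocks are Gaussian, and all parameter values in the example call are arbitrary placeholders rather than estimates. The function name `simulate_nearly_integrated` is hypothetical.

```python
import numpy as np

def simulate_nearly_integrated(n_obs, beta, c, r, mu_sigma, sigma0, seed=0):
    """Simulate returns and sigma_t from (7) with b(L) = 1 (an assumption)."""
    rng = np.random.default_rng(seed)
    # Shocks with unit variances and contemporaneous correlation r.
    z = rng.standard_normal((n_obs + 1, 2))
    e1 = z[:, 0]
    e2 = r * z[:, 0] + np.sqrt(1.0 - r ** 2) * z[:, 1]
    phi = 1.0 + c / n_obs                     # local-to-unity autoregressive root
    sigma = np.empty(n_obs)
    sigma[0] = sigma0
    for t in range(1, n_obs):
        sigma[t] = mu_sigma + phi * (sigma[t - 1] - mu_sigma) + e2[t]
    # R^e_{t,t+1} = beta * sigma_t^2 + sigma_t * eps_{t+1,1}
    returns = beta * sigma ** 2 + sigma * e1[1:]
    return returns, sigma

# Placeholder parameter values, chosen only to exercise the function.
returns, sigma = simulate_nearly_integrated(n_obs=708, beta=0.02, c=-5.0, r=-0.76,
                                            mu_sigma=4.0, sigma0=4.0)
print(returns.shape, sigma.shape)
```

Because the autoregressive root 1 + c/T depends on the sample length, the persistence of the simulated volatility rises with the sample size, which is what produces the nonstandard limits discussed next.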
For this model, estimated correlations $\hat\rho$ in (5) and (6) have nonstandard asymptotic distributions because $\sigma_t$ behaves similarly to a unit-root process for $T \to \infty$. To derive these distributions, we replace $RV_{t-H,t}$ in the definitions of the sample correlations $\hat\rho$ by its measurement-error-free analog, integrated variance, which is the sum of past variances, $\sum_{\tau=t-H}^{t-1} \sigma_\tau^2$, here denoted as $RV^\sigma_{t-H,t}$. The results remain the same for $RV_{t-H,t}$, since under our assumptions $RV_{t,t+H} \sim O_p(T^2)$ and $RV^\sigma_{t,t+H} - RV_{t,t+H} \sim o_p(T^2)$.
The standard assumption in the literature on overlapping
observations is that H is a nontrivial portion of the sample.
Formally, it is captured by the condition limT →∞ H /T = λ,
0 < λ < 1/2. Under this assumption, as shown in Theorem 1, ρˆ
in regression A converges to a nondegenerate random variable.
This result closely matches similar findings in Bandi and Perron
(2008, Proposition 1).
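To make the overlapping-observations setup concrete, the sketch below computes the sample correlation between forward-aggregated returns and backward-aggregated variances for a given horizon H. The function name and the indexing convention (element k of each array covers the period from k to k + 1) are assumptions made for this illustration; the paper's definitions (5) and (6) are not reproduced in this section.

```python
import numpy as np

def long_horizon_corr(r, rv, H):
    """Sample correlation between sum_{j} r over (t, t+H] and sum_{j} rv over (t-H, t]."""
    r = np.asarray(r, dtype=float)
    rv = np.asarray(rv, dtype=float)
    T = len(r)
    if len(rv) != T or H < 1 or 2 * H >= T:
        raise ValueError("need equal lengths and 1 <= H < T/2")
    cr = np.concatenate(([0.0], np.cumsum(r)))   # cr[k] = r[0] + ... + r[k-1]
    cv = np.concatenate(([0.0], np.cumsum(rv)))
    t = np.arange(H, T - H + 1)                  # dates with a full backward and forward window
    fwd = cr[t + H] - cr[t]                      # forward-aggregated returns
    bwd = cv[t] - cv[t - H]                      # backward-aggregated variances
    return np.corrcoef(fwd, bwd)[0, 1]
```

With T = 708 and H = 120, only T − 2H + 1 = 469 heavily overlapping pairs enter the correlation, which is why H/T is treated as a nontrivial fraction λ in the asymptotics.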
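Theorem 1. For dynamics (7), the sample correlation $\hat\rho$ in (5) for the regression of $R^e_{t,t+H}$ on the past integrated variance $\sum_{\tau=t-H}^{t-1}\sigma_\tau^2$ converges weakly to a functional, $\hat\rho(R^e_{t,t+H}) \Rightarrow F_\rho(A, B)$, where processes $A(\tau)$ and $B(\tau)$ are defined on the interval $[\lambda, 1-\lambda]$ as follows:
$$
A(\tau) = \int_{s=\tau}^{\tau+\lambda} J_{2,s}^2\,ds - \frac{1}{1-2\lambda}\int_{\lambda}^{1-\lambda}\int_{s=\zeta}^{\zeta+\lambda} J_{2,s}^2\,ds\,d\zeta, \tag{8}
$$
$$
B(\tau) = \int_{s=\tau-\lambda}^{\tau} J_{2,s}^2\,ds - \frac{1}{1-2\lambda}\int_{\lambda}^{1-\lambda}\int_{s=\zeta-\lambda}^{\zeta} J_{2,s}^2\,ds\,d\zeta, \tag{9}
$$
where $J_{2,s}$ is an Ornstein–Uhlenbeck (OU) process driven by a standard Brownian motion $W_{2,s}$, $dJ_{2,s} = cJ_{2,s}\,ds + dW_{2,s}$, and $F_\rho$ is a functional of two processes that are defined and almost surely (a.s.) continuous on the interval $[\lambda, 1-\lambda]$,
$$
F_\rho(Y, X) \equiv \frac{\int_{\lambda}^{1-\lambda} Y(\tau)X(\tau)\,d\tau}{\sqrt{\int_{\lambda}^{1-\lambda} X^2(\tau)\,d\tau \int_{\lambda}^{1-\lambda} Y^2(\tau)\,d\tau}}.
$$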
The process $W_{2,s}$ is the limit of the appropriately normalized partial sums $T^{-1/2}\sum_{t=1}^{[Ts]}\varepsilon_{t,2}$. Therefore, the limit of $\hat\rho(R^e_{t,t+H})$ depends only on the characteristics of $\varepsilon_{t,2}$ and not on $\varepsilon_{t,1}$. For example, this limit remains the same if the shocks to returns are absent and excess returns are perfectly collinear with variance, which implies that $\hat\rho(RV^\sigma_{t,t+H})$ should converge to the same limit, that is,
$$
\hat\rho(RV^\sigma_{t,t+H}) \Rightarrow F_\rho(A, B). \tag{10}
$$
The intuition behind this result is similar to that of cointegration. That is, at their limits, the long-term return $R^e_{t,t+H}$ and variance $RV^\sigma_{t,t+H}$ can be said to move in unison. Therefore, their long-run predictabilities measured by $\hat\rho$ are the same. Thus, the difference in sample correlations $\hat\rho$ across these two regressions converges to zero, which clearly is not in line with the empirical observations presented in Section 2. The next modification of the original model resolves this qualitative mismatch between the model and the data.
Local-to-Zero Predictability
The modification we suggest concerns the predictability of returns, namely, the parameter $\beta$. The predictability at short horizons is so small that it can be modeled as a local-to-zero predictability, that is, $\beta = \beta_0/T$. In this section, we show that in this case, we can qualitatively match the $\hat\rho$ pattern we observe in the data. That is, we can obtain high realizations of $\hat\rho(R^e_{t,t+H})$ and a substantial difference between $\hat\rho(R^e_{t,t+H})$ and $\hat\rho(RV^\sigma_{t,t+H})$. The assumption that $\beta = \beta_0/T$ allows the effect of the shock in returns to be significant, even as $T \to \infty$. Thus, the leverage effect, which is a negative correlation between $\varepsilon_{t,1}$ and $\varepsilon_{t,2}$, influences the estimated return predictability, in accord with prior studies (Stambaugh 1999). The following result is proved using the same arguments as in Theorem 1.
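Theorem 2. For dynamics (7), if $\beta = \beta_0/T$, the sample correlation $\hat\rho$ in the regression of $R^e_{t,t+H}$ on the past integrated variance $\sum_{\tau=t-H}^{t-1}\sigma_\tau^2$ converges weakly to
$$
\hat\rho(R^e_{t,t+H}) \Rightarrow F_\rho(D, B), \tag{11}
$$
where the functional $F_\rho$ and the process $B(\tau)$ are defined as in Theorem 1. Process $D(\tau) = \beta_0\omega A(\tau) + C(\tau)$ is defined on $[\lambda, 1-\lambda]$, where $A(\tau)$ is also given in Theorem 1. The constant $\omega = b(1)^{-1}$, and $C(\tau)$ is a process driven by a standard Brownian motion, $W_{1,s}$, given by
$$
C(\tau) = \int_{s=\tau}^{\tau+\lambda} J_{2,s}\,dW_{1,s} - \frac{1}{1-2\lambda}\int_{\lambda}^{1-\lambda}\int_{s=\zeta}^{\zeta+\lambda} J_{2,s}\,dW_{1,s}\,d\zeta. \tag{12}
$$
Processes $W_{1,s}$ and $W_{2,s}$ are correlated and $d[W_{1,s}, W_{2,s}] = r\,ds$. The asymptotic distributions of the OLS slope $b_r$ and the OLS t-statistic for the slope $t_{b_r}$ are $\omega T b_r \Rightarrow F_\beta(D, B)$ and $t_{b_r}/\sqrt{T} \Rightarrow F_t(D, B)$, where the new functionals are defined as follows:
$$
F_\beta(Y, X) = \frac{\int_{\lambda}^{1-\lambda} Y(\tau)X(\tau)\,d\tau}{\int_{\lambda}^{1-\lambda} X^2(\tau)\,d\tau},
$$
$$
F_{\sigma_e^2}(Y, X) = \frac{1}{1-2\lambda}\int_{\lambda}^{1-\lambda}\bigl(Y(\tau) - F_\beta(Y, X)X(\tau)\bigr)^2\,d\tau,
$$
$$
F_t(Y, X) = \frac{F_\beta(Y, X)\sqrt{\int_{\lambda}^{1-\lambda} X^2(\tau)\,d\tau}}{\sqrt{F_{\sigma_e^2}(Y, X)}}.
$$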
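The functionals in Theorems 1 and 2 are ratios of integrals over $[\lambda, 1-\lambda]$, so on a grid they reduce to ordinary regression statistics. The sketch below is a hypothetical helper (not from the paper) that approximates the integrals by Riemann sums over n equally spaced points; it assumes the square-root form of $F_\rho$ and $F_t$, that is, the usual correlation and t-statistic formulas.

```python
import math

def limit_functionals(y, x, lam):
    """Riemann-sum approximations of F_rho, F_beta, F_sigma_e^2 and F_t on [lam, 1 - lam]."""
    n = len(x)
    if n == 0 or len(y) != n or not 0 < lam < 0.5:
        raise ValueError("need equal-length nonempty inputs and 0 < lam < 1/2")
    dtau = (1 - 2 * lam) / n                      # grid spacing
    sxy = sum(a * b for a, b in zip(y, x)) * dtau
    sxx = sum(b * b for b in x) * dtau
    syy = sum(a * a for a in y) * dtau
    f_rho = sxy / math.sqrt(sxx * syy)
    f_beta = sxy / sxx
    # Residual variance: average squared residual over the interval.
    f_sig2 = sum((a - f_beta * b) ** 2 for a, b in zip(y, x)) * dtau / (1 - 2 * lam)
    f_t = f_beta * math.sqrt(sxx) / math.sqrt(f_sig2)
    return f_rho, f_beta, f_sig2, f_t

# Toy inputs: y = 2x + noise, with noise orthogonal to x on the grid.
print(limit_functionals([3.0, -1.0, 1.0, -3.0], [1.0, -1.0, 1.0, -1.0], 0.25))
```

In an actual simulation of the limit distribution, `y` and `x` would be discretized paths of the demeaned processes D and B.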
The new component $C(\tau)$ in the above theorem depends on the characteristics of the process $\varepsilon_{t,1}$. Therefore, under local-to-zero predictability, the limiting distribution of $\hat\rho(R^e_{t,t+H})$ also depends on the characteristics of the shock $\varepsilon_{t,1}$, in particular, on its correlation with $\varepsilon_{t,2}$. The correlation for the variance regression, $\hat\rho(RV^\sigma_{t,t+H})$, still converges to the same limit as in the case with a constant $\beta$:
$$
\hat\rho(RV^\sigma_{t,t+H}) \Rightarrow F_\rho(A, B). \tag{13}
$$
The result of the above modifications to the original Valkanov (2003) framework is that we can now study the implications of a short-memory model that takes into account overlapping observations, persistence in the predictive variable, heteroscedasticity in the returns, and the effect of a negative correlation between shocks to returns and shocks to variances. To determine if this general model can match the observed data, we consider reasonable values of $c$, $\beta_0$, $r$, and $\omega$ and examine the asymptotic distribution for sample correlations.
The coefficients are chosen as follows. The persistence parameter, $c = (0.7 - 1)T$, corresponds to the first autocorrelation of 0.7 for the monthly variances. The value of the slope is $\beta = R^e_{t,t+1}/\operatorname{var}(R^e_{t,t+1}) \approx 1.88$. The parameter $\omega$ defines the ratio $\operatorname{var}\sigma^2/E^2\sigma_t^2$: the choices $\omega = 0.0318$ and $\omega = 0.0551$ correspond to $\operatorname{var}\sigma^2/E^2\sigma_t^2 = 1$ and $\operatorname{var}\sigma^2/E^2\sigma_t^2 = 9$. The leverage coefficient, namely, the correlation, $r$, is fixed at −0.76 (Aït-Sahalia and Kimmel 2007). For comparison, we also report the case with no leverage, $r = 0$.
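The mapping between the local-to-unity parameter and the first autocorrelation can be checked with a few lines of arithmetic. The sketch below is only an illustration with a hypothetical helper name; it uses the sample size T = 708 mentioned in the text and the relation "autoregressive root = 1 + c/T" from dynamics (7).

```python
def local_to_unity_c(first_autocorr, T):
    """Return c such that the autoregressive root 1 + c/T equals the target autocorrelation."""
    return (first_autocorr - 1.0) * T

T = 708
c = local_to_unity_c(0.7, T)   # about -212.4
root = 1.0 + c / T             # recovers the monthly autocorrelation of about 0.7
print(c, root)
```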
We construct the distribution of the correlation, $\hat\rho$, in regressions A and B using formula (11). The results are reported in Table 1 for the horizon H = 120 and T = 708 observations, as in Section 2. Table 1 shows that even asymptotically, the sample correlations can assume a wide range of values and therefore provide a poor assessment of the strengths of relationships. For example, if $\omega = 0.0551$ and $r = -0.76$, then the 90% probability interval for $\hat\rho(R^e_{t,t+H})$ extends from −0.533 to 0.673. However, note that despite the high dispersion of the estimates, a value of $\hat\rho$ above 0.81 is rarely observed for regression A. Based on the data in Table 1, the probability of this event is less than 1% for all of the cases.
Another interesting observation that follows from this table is that positive and negative relations between the past volatility and future returns are nearly equally likely, in contrast to the positive relation prescribed by the sign of $\beta_0$. Indeed, the median value of $\hat\rho(R^e_{t,t+H})$ is nearly zero in all of the cases. The distribution of $\hat\rho(R^e_{t,t+H})$ is therefore centered at approximately zero, as for $\beta_0 = 0$, implying that 10 year horizon regressions are uninformative in testing for predictability. Note, however, that this conclusion is an outcome of the current framework with the
Table 1. Correlations $\hat\rho$: short-memory framework with nearly integrated regressors

                                                              Percentiles
                                                    1.0%     5.0%     10.0%    Median   90%      95%      99%
Case 1: ω = 0.0551, r = −0.76
  $\hat\rho(R^e_{t,t+H})$                           −0.707   −0.533   −0.417    0.125    0.585    0.673    0.791
  $\hat\rho(RV_{t,t+H})$                            −0.820   −0.716   −0.643   −0.308    0.150    0.301    0.545
  $\hat\rho(R^e_{t,t+H}) - \hat\rho(RV_{t,t+H})$    −0.997   −0.590   −0.360    0.424    1.051    1.192    1.413
Case 2: ω = 0.0551, r = 0.0
  $\hat\rho(R^e_{t,t+H})$                           −0.793   −0.670   −0.577   −0.098    0.443    0.559    0.724
  $\hat\rho(RV_{t,t+H})$                            −0.820   −0.716   −0.643   −0.308    0.150    0.301    0.545
  $\hat\rho(R^e_{t,t+H}) - \hat\rho(RV_{t,t+H})$    −0.847   −0.523   −0.357    0.198    0.744    0.886    1.119
Case 3: ω = 0.0318, r = −0.76
  $\hat\rho(R^e_{t,t+H})$                           −0.680   −0.496   −0.369    0.167    0.604    0.688    0.801
  $\hat\rho(RV_{t,t+H})$                            −0.820   −0.716   −0.643   −0.308    0.150    0.301    0.545
  $\hat\rho(R^e_{t,t+H}) - \hat\rho(RV_{t,t+H})$    −1.016   −0.595   −0.352    0.466    1.103    1.243    1.462

NOTES: The table reports the percentiles for the correlations $\hat\rho$ in regressions (3) and (4). The percentiles are calculated based on formulas (11) and (13), using 100,000 simulations. The integrals are calculated using 1000 steps per unit interval. It is assumed that the sample consists of 708 monthly observations, and the forecasting horizon is 120 months.
nearly integrated predictor. In the next section, we demonstrate that this finding is not accidental: for the model considered here, the unobserved true value of $\rho(R^e_{t,t+H})$ is in fact nearly zero for long horizons.
4. ECONOMETRIC FRAMEWORK: LONG MEMORY
How can we explain why the previous framework fails to deliver significant $\hat\rho(R^e_{t,t+H})$ values? One explanation is that although a nearly integrated process is almost nonstationary, it is still a short-memory process in small samples. This implies that although the autocorrelations of such a process are initially high, they decay at a rapid (exponential) rate. In contrast to this autocorrelation structure, long-memory processes do not necessarily exhibit high first autocorrelations, but the effects of shocks tend to persist over longer periods of time. We address this argument in this section by allowing for long-range dependence in volatility.
Before we turn to the details of the long-memory framework, one controversy remains to be addressed. Our assumption about the long-range dependence in variance implies a long-range dependence in the equity premium and therefore a long-range dependence in the returns. This may seem to be at odds with Rogers (1997), who showed how long memory in prices can cause a violation of the no-arbitrage condition. However, Rogers (1997, p. 104) also stated that under certain assumptions, stock prices may exhibit long-range dependence and still satisfy the no-arbitrage condition. These assumptions hold by default if the return dynamics are obtained by solving a structural asset pricing model, and naturally, asset pricing models with risk-averse investors will produce a long-memory equity premium if the volatility of the dividend stream is a long-memory process. For example, Bollerslev, Sizova, and Tauchen (2012) solved for asset prices in a long-run risk model with a long-range dependent volatility, and the condition described by Rogers (1997) is satisfied in their model.
4.1 Long-Memory Framework: Fixed Forecasting Horizon

Fama and French (1988, p. 4) explained long-term return predictability as a process by which “the variance of expected returns grows faster than the variance of unexpected returns.” In this section, we demonstrate the accuracy of this explanation when the variance exhibits long-range dependence. On the contrary, for short-memory processes, high $\hat\rho(R^e_{t,t+H})$ for 10 year horizons cannot be explained by this accumulation of predictability.

Herein, we define a long-memory process $X_t(d)$ based on the behavior of its spectral density around zero.

Assumption 1. The spectral density of $X_t(d)$, $f_x(\omega)$, is defined, and there exists a positive constant $C$ such that $\lim_{\omega\to 0} f_x(\omega)|1 - e^{-i\omega}|^{2d} = C$ for some $0 \le d < 1/2$.

The above assumption defines $X_t(d)$ as a long-memory process if $d > 0$. The same assumption defines a short-memory process when $d = 0$. Suppose the predictive variable (i.e., realized variance) satisfies Assumption 1 with the parameter $d \ge 0$, that is,
$$
RV_{t-1,t} = X_t(d), \quad d \ge 0. \tag{14}
$$
Also suppose that the return can be represented as the sum of a predictable component $\beta X_t(d)$ and the shock $\varepsilon_{t+1}$:
$$
R^e_{t,t+1} = \beta X_t(d) + \varepsilon_{t+1}, \tag{15}
$$
where $\varepsilon_{t+1}$ also satisfies Assumption 1 with parameter $d = 0$, that is, the limit of its spectral density at zero is finite. The results of this section do not change if $RV_{t-1,t}$ is a sum of $X_t(d)$ and noise, as long as the noise process has the integration order $d' \ge 0$ less than $d$. For example, this accommodates the case when $RV_{t-1,t}$ is just a proxy for $RV^\sigma_{t-1,t}$ in the model for return, and thus, the difference $RV_{t-1,t} - RV^\sigma_{t-1,t}$ is the noise. Also, the results extend to the case when the returns are predicted by several variables, and $X_t(d)$ is the one with the highest integration order. Thus, the results of this section hold for the models with several risk factors, such as those seen in, for example, different versions of the long-run risk models (e.g., Bollerslev, Tauchen, and Zhou 2009; Drechsler and Yaron 2011).

The following result follows from Lemma 2 in Appendix B.

Theorem 3. For return dynamics (14) and (15), where $X_t(d)$ is a long-memory process satisfying Assumption 1, the population correlation $\rho(R^e_{t,t+H})$ defined as
$$
\rho(R^e_{t,t+H}) = \frac{\operatorname{cov}(R^e_{t,t+H}, RV_{t-H,H})}{\sqrt{\operatorname{var}(R^e_{t,t+H})\operatorname{var}(RV_{t-H,H})}}
$$
converges to $(2^{2d} - 1) \times \operatorname{sign}(\beta)$ as $H \to \infty$.
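The form of this limit can be motivated by a short heuristic calculation. The sketch below is not the paper's proof (which relies on Lemma 2 in Appendix B). It assumes that the variance of a sum of $H$ consecutive values of $X_t(d)$ grows like $\kappa H^{1+2d}$ for some constant $\kappa > 0$, treats the backward and forward windows as two adjacent blocks of $H$ observations, and ignores the short-memory shock, whose contribution is of lower order when $d > 0$. The symbols $S_H^-$, $S_H^+$, and $\kappa$ are introduced here only for illustration.

```latex
% S_H^- : sum of X over the H periods before t;  S_H^+ : sum over the H periods after t.
% By stationarity, var(S_H^-) = var(S_H^+) = var(S_H), and S_H^- + S_H^+ is a sum of 2H terms.
\begin{align*}
\operatorname{var}(S_{2H}) &= 2\operatorname{var}(S_H) + 2\operatorname{cov}(S_H^-, S_H^+),\\
\operatorname{cov}(S_H^-, S_H^+) &= \tfrac{1}{2}\operatorname{var}(S_{2H}) - \operatorname{var}(S_H)
  \approx \tfrac{1}{2}\kappa (2H)^{1+2d} - \kappa H^{1+2d}
  = \kappa H^{1+2d}\,(2^{2d} - 1),\\
\operatorname{corr}(S_H^-, S_H^+) &\approx \frac{\kappa H^{1+2d}\,(2^{2d} - 1)}{\kappa H^{1+2d}} = 2^{2d} - 1.
\end{align*}
```

Under these assumptions the forward return sum is approximately $\beta S_H^+$, which accounts for the factor $\operatorname{sign}(\beta)$, and setting $d = 0$ gives a limit of zero.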
There is a connection between $\rho(R^e_{t,t+H})$ as defined above and $\hat\rho(R^e_{t,t+H})$, defined in (5). Correlation $\rho(R^e_{t,t+H})$ is the limit of $\hat\rho(R^e_{t,t+H})$ as $T \to \infty$ when $H$ is fixed. Thus, the above expression is a sequential asymptotic result.
It follows from Theorem 3 that for all short-memory models, $\rho(R^e_{t,t+H})$ converges to zero. We therefore expect there to be no predictability in long-term returns and conclude that high $\hat\rho(R^e_{t,t+H})$ can arise only due to the high dispersion of the correlation. In a numerical exercise with the model parameters from Table 1, we found that $|\rho(R^e_{t,t+H})|$ increases up to the medium horizon of 1 year but declines to zero for longer horizons. These results can be made available upon request.
However, the logic of Fama and French’s (1988) analysis still applies to the case of long-memory processes. Due to the accumulation of predictability, $|\rho(R^e_{t,t+H})|$ converges to a positive constant. For example, the limit of $\rho(R^e_{t,t+H})$ is ±0.815 if $d = 0.43$, which is a commonly found value of $d$ for the realized variance in empirical work.
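The limit from Theorem 3 is easy to evaluate for different memory parameters. The helper below is a hypothetical convenience function, not part of the paper; it simply evaluates $(2^{2d} - 1) \times \operatorname{sign}(\beta)$.

```python
def long_horizon_limit(d, beta_sign=1):
    """Limit of the population correlation in Theorem 3: (2**(2d) - 1) * sign(beta)."""
    if not 0 <= d < 0.5:
        raise ValueError("d must lie in [0, 1/2)")
    return (2 ** (2 * d) - 1) * beta_sign

for d in (0.0, 0.2, 0.43):
    print(d, round(long_horizon_limit(d), 3))
```

For $d = 0.43$ this returns approximately 0.815, and for $d = 0$ it returns zero, in line with the short-memory discussion above.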
Long-range dependence in the predictive variables, therefore,
leads to long-run predictability. Nevertheless, we are cautious
in interpreting this finding because the correlation ρ is herein
defined as the limit as the available data span tends to infinity,
while H is held constant. For long-horizon regressions, however,
H becomes a large portion of the total sample. We, therefore,
study the behavior of the estimated ρˆ under the assumption that
H /T converges to λ > 0 in Section 4.2.
4.2 Long-Memory Framework: Increasing Forecasting Horizon
In this section, we calculate the asymptotic distributions of correlations $\hat\rho$ as normally estimated using sample covariances and variances of $R^e_{t,t+H}$ and $RV_{t,t+H}$; see (5) and (6). Similar to Valkanov (2003), we assume that observations overlap; however, in contrast to Valkanov (2003), we do not make the assumption of the local-to-unity root. We instead assume that $RV_{t,t+1}$ exhibits long-range dependence, that is, long memory. Following the literature on long-horizon regressions, the limiting distributions of $\hat\rho(R^e_{t,t+H})$ and $\hat\rho(RV_{t,t+H})$ are derived under the assumption that $H$ is large, that is, $\lim_{T\to\infty} H/T = \lambda > 0$.
Suppose that we have the same model as in the previous section,
that is, the dynamics of returns and variances are described by
the system of Equations (14) and (15). Again, the results do not
change if the realized variance is the sum of two components,
Xt (d) and a less persistent noise, and if the returns are predicted
by several factors, as long as Xt (d) has the highest integration
order among them. To derive the asymptotic result, we rely
on a more restrictive definition of the stationary long-memory
process based on the limiting distribution of its partial sums.
Assumption 2. For a fixed $\tau \in [0, 1]$ and $0 < d < 1/2$,
$$
\frac{\sum_{i=1}^{[\tau T]}(X_i(d) - \mu_x)}{T^{1/2+d}} \Rightarrow \sigma_d W_{(d),\tau},
$$
where $\sigma_d^2 = \lim_{T\to\infty}\operatorname{var}\bigl(\sum_{i=1}^{T} X_i(d)\bigr)/T^{1+2d}$, $\mu_x = EX_t(d)$, and $W_{(d),\tau}$ is a Type I fractional Brownian motion.

A fractional Brownian motion of Type I is defined in Mandelbrot and Van Ness (1968), Tsay and Chung (2000), and Marinucci and Robinson (1999) as follows:
$$
W_{(d),\tau} = \sqrt{\frac{(1+2d)\Gamma(1-d)}{\Gamma(1+d)\Gamma(1-2d)}}\left[\int_{0}^{\tau}(\tau - s)^d\,dW_{2,s} + \int_{-\infty}^{0}\bigl[(\tau - s)^d - (-s)^d\bigr]\,dW_{2,s}\right], \tag{16}
$$
where dW2,s are increments of a standard Brownian motion. Assumption 1 is more general than Assumption 2; for an overview
of different definitions and properties of long-range dependent
processes, see Baillie (1996). We now consider which models
satisfy Assumption 2. Naturally, this assumption is satisfied for
short-memory processes if d = 0 under the general conditions
of the functional central limit theorem. For long-memory processes, a general class that satisfies Assumption 2 is that of
moving-average processes.
Example 1.
$$
X_t(d) = \mu_x + \sum_{i=0}^{\infty}\theta_i\varepsilon^x_{t-i},
$$
where $\varepsilon^x_t$ are iid zero-mean shocks with a finite variance, and the sequence $\{\theta_i\}_{i=0}^{\infty}$ decays hyperbolically (Marinucci and Robinson 2000). That is, coefficients $\theta_i$ decay as $l(i)\,i^{d-1}$, $0 \le d < 1/2$, for $i \to \infty$, where $l(\cdot)$ is a Lebesgue-measurable slowly varying function, bounded on compact subsets and positive on $[a, +\infty)$ for some $a > 0$.

Furthermore, if instead of $X_t(d)$, only $\sqrt{X_t(d)}$ can be represented as such a stationary moving-average process, then $X_t(d)$ may still satisfy Assumption 2.

Example 2.
$$
X_t(d) = \left(\tilde\mu_x + \sum_{i=0}^{\infty}\theta_i\varepsilon^x_{t-i}\right)^2,
$$
where $\varepsilon^x_t$ are iid Gaussian zero-mean shocks, the sequence $\{\theta_i\}_{i=0}^{\infty}$ satisfies the assumption from Example 1, and $\tilde\mu_x \ne 0$.
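One familiar process with the hyperbolic decay required in Example 1 is fractionally integrated noise, ARFIMA(0, d, 0). This choice is an assumption made for illustration and is not named in the text. Its moving-average weights are $\theta_i = \Gamma(i+d)/(\Gamma(d)\Gamma(i+1))$, which behave like $i^{d-1}/\Gamma(d)$ for large $i$, so the slowly varying function is asymptotically constant. The sketch computes the weights by recursion and compares them with the hyperbolic rate.

```python
import math

def fractional_ma_weights(d, n):
    """First n moving-average weights of ARFIMA(0, d, 0): theta_0 = 1, theta_i = theta_{i-1} * (i - 1 + d) / i."""
    theta = [1.0]
    for i in range(1, n):
        theta.append(theta[-1] * (i - 1 + d) / i)
    return theta

d = 0.43
theta = fractional_ma_weights(d, 5001)
i = 5000
# Ratio of the exact weight to the hyperbolic approximation i**(d-1) / Gamma(d); close to one for large i.
ratio = theta[i] / (i ** (d - 1) / math.gamma(d))
print(ratio)
```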
The fractional stochastic volatility model suggested by
Comte and Renault (1998), which is one of the few available
continuous-time long-memory models for financial volatility,
satisfies Assumption 2.
Example 3.
$$
X_t(d) = \int_{t-1}^{t}\sigma^2(u)\,du,
$$
$$
d\ln\sigma(t) = -\kappa\ln\sigma(t)\,dt + \sigma\,dw_{(d)}(t),
$$
where w(d) (t) is a truncated fractional Brownian motion and κ >
0. This example and the previous one both satisfy the condition
of nonnegativity of the variance and can be proven to satisfy
Assumption 2 using the arguments of Taqqu (1975). Cases that
violate Assumption 2 can also be found in Taqqu (1975).
Theorem 4. For the dynamics (14) and (15), under Assumption 2, if $\lim_{T\to\infty} H/T = \lambda$, then
$$
\hat\rho(R^e_{t,t+H}) \Rightarrow F_\rho(A^d, B^d), \tag{17}
$$
where $A^d(\tau)$ and $B^d(\tau)$ are stochastic processes defined on $[0, 1]$:
$$
\begin{aligned}
A^d(\tau) &= W_{(d),\tau+\lambda} - W_{(d),\tau} - \frac{1}{1-2\lambda}\int_{\lambda}^{1-\lambda}\bigl(W_{(d),s+\lambda} - W_{(d),s}\bigr)\,ds,\\
B^d(\tau) &= W_{(d),\tau} - W_{(d),\tau-\lambda} - \frac{1}{1-2\lambda}\int_{\lambda}^{1-\lambda}\bigl(W_{(d),s} - W_{(d),s-\lambda}\bigr)\,ds,
\end{aligned}
\tag{18}
$$
and the process $W_{(d),\tau}$ is a Type I fractional Brownian motion (16).
The above result readily follows from the continuous mapping theorem (CMT). Note that $\hat\rho(RV_{t,t+H})$ converges to the same limit, as the error term does not matter for the asymptotic distribution of $\hat\rho(R^e_{t,t+H})$. Table 2 lists the asymptotic distribution of $\hat\rho(R^e_{t,t+H})$ for different levels of data aggregation, $\lambda = H/T$. For example, if H = 120 months and T = 708 months, then $\lambda$ is approximately 0.17.
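The limit in (17) can be approximated by Monte Carlo. The sketch below is a rough illustration, not the procedure used for Table 2 (which uses 100,000 simulations and 1000 steps per unit interval). It assumes a fractional Brownian motion with Hurst index d + 1/2 drawn on a coarse grid via a Cholesky factor of its covariance matrix, and it replaces the integrals in (18) by grid averages over $[\lambda, 1-\lambda]$. All function names and default settings are hypothetical.

```python
import numpy as np

def fbm_paths(d, n, reps, rng):
    """Draw `reps` fractional Brownian motion paths on the grid k/n, k = 0..n (Hurst index d + 1/2)."""
    hurst = d + 0.5
    t = np.arange(1, n + 1) / n
    cov = 0.5 * (t[:, None] ** (2 * hurst) + t[None, :] ** (2 * hurst)
                 - np.abs(t[:, None] - t[None, :]) ** (2 * hurst))
    chol = np.linalg.cholesky(cov + 1e-12 * np.eye(n))  # tiny jitter for numerical stability
    w = rng.standard_normal((reps, n)) @ chol.T
    return np.hstack([np.zeros((reps, 1)), w])           # prepend W(0) = 0

def rho_limit_draws(d, lam, n=200, reps=2000, seed=0):
    """Approximate draws from F_rho(A^d, B^d) in (17)-(18)."""
    rng = np.random.default_rng(seed)
    w = fbm_paths(d, n, reps, rng)
    h = int(round(lam * n))
    if h < 1 or 2 * h >= n:
        raise ValueError("lam * n must give a window with 1 <= h < n/2")
    idx = np.arange(h, n - h + 1)                        # grid points tau in [lam, 1 - lam]
    fwd = w[:, idx + h] - w[:, idx]                      # W(tau + lam) - W(tau)
    bwd = w[:, idx] - w[:, idx - h]                      # W(tau) - W(tau - lam)
    a = fwd - fwd.mean(axis=1, keepdims=True)            # demeaning mimics the integral term in (18)
    b = bwd - bwd.mean(axis=1, keepdims=True)
    return np.sum(a * b, axis=1) / np.sqrt(np.sum(a**2, axis=1) * np.sum(b**2, axis=1))

lam = 120 / 708                                          # about 0.17
draws = rho_limit_draws(0.43, lam)
print(np.percentile(draws, [5, 50, 95]))
```

Because the grid and the number of replications are much coarser than in the paper, the printed percentiles should only be expected to resemble the corresponding row of Table 2, not to reproduce it.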
The data reported in Table 2 include both long-memory ($d = 0.43$) and short-memory ($d = 0$) cases. The second column of the table lists the genuine $\rho(R^e_{t,t+H})$ for large horizons $H$, that is, its limit as $H \to \infty$. If $d = 0.43$, then based on Theorem 3, $\lim_{H\to\infty}\rho(R^e_{t,t+H}) = 0.815$. For the short-memory case, the genuine long-run predictability is zero, that is, $\lim_{H\to\infty}\rho(R^e_{t,t+H}) = 0$. The next three columns list the median values and the 5th and 95th percentiles for the asymptotic limits of the sample $\hat\rho(R^e_{t,t+H})$ for large horizons $H$ as $T \to \infty$ and $H/T \to \lambda$.
A few conclusions follow from Table 2. First, there is a negative bias in $\hat\rho(R^e_{t,t+H})$ that increases with the aggregation parameter, $\lambda$. Indeed, all the median values of $\hat\rho(R^e_{t,t+H})$ are well below the value of $\rho(R^e_{t,t+H})$. In the long-memory case, for time horizons that constitute 1/20 of the sample (approximately 3 years in our empirical example), the median value of $\hat\rho(R^e_{t,t+H})$ is 0.402,
Table 2. Correlation $\hat\rho(R^e_{t,t+H})$: long-memory framework

                                          $\hat\rho$: H/T is fixed
                            ρ         Median    5%        95%
Long-memory (d = 0.43)
  λ = 0.05                  0.815      0.402    −0.018    0.697
  λ = 0.10                  0.815      0.216    −0.363    0.667
  λ = 0.17                  0.815     −0.081    −0.675    0.652
Short-memory (d = 0)
  λ = 0.05                  0.000     −0.077    −0.362    0.226
  λ = 0.10                  0.000     −0.148    −0.551    0.263
  λ = 0.17                  0.000     −0.306    −0.697    0.270

NOTES: The table reports the percentiles for $\hat\rho$ in regression (3) under the assumption that $\lim_{T\to\infty} H/T = \lambda$. The percentiles are calculated based on the long-memory framework presented in Section 4.2, formula (17), using 100,000 simulations. The integrals are calculated using 1000 steps per unit interval. The first column shows $\lim_{H\to\infty}\rho(H) = (2^{2d} - 1)$, which is the population equivalent of $\hat\rho$ for long horizons.
which is well below the population value of 0.815. $\hat\rho(R^e_{t,t+H})$ decreases even further for 1/10 of the sample, corresponding to approximately a 6 year horizon, and drops below zero for 10 year horizons. The implication is that the predictability of returns and variances in small samples is always underestimated.
Second, the accuracy of the sample correlation $\hat\rho(R^e_{t,t+H})$ decreases with an increase in the aggregation level $\lambda$. For example, if $d = 0.43$ and $H$ is 1/20 of the total sample, $\hat\rho(R^e_{t,t+H})$ has a 90% probability of taking a value between −0.018 and 0.697. If $H$ is 0.17 of the total sample, then the 90% probability interval is [−0.675, 0.652].
Finally, despite the sharp decline in accuracy due to aggregating observations, the long-memory model results in a distribution of $\hat\rho(R^e_{t,t+H})$ that is more often positive than negative for $\lambda \le 0.10$. Simultaneously, the distribution of $\hat\rho(R^e_{t,t+H})$ for short-memory models is centered on zero. Thus, in the short-memory models, we are equally likely to observe positive and negative correlations between future returns and variances for long horizons.
Now consider the difference between the regression for returns (3) and the regression for volatilities (4). Up until now, the unpredictable part of the returns did not affect the result in (17) because the predictable part dominated the distribution as $T \to \infty$. As in the short-memory local-to-unity framework, to ensure that the unpredictable part of the return affects the limiting distribution of $\hat\rho(R^e_{t,t+H})$, we introduce the local-to-zero predictability into the model as follows:
$$
R^e_{t,t+1} = \frac{\beta_0}{T^d}X_t(d) + \varepsilon_{t+1}. \tag{19}
$$
Here, $X_t(d)$ is the long-run component of the realized variance, $RV_{t-1,t}$, that is, the part of the realized variance with the largest long-memory parameter, $d$. $X_t(d)$ again satisfies Assumption 2 with $d > 0$. $\varepsilon_{t+1}$ is a process satisfying the assumptions of the functional central limit theorem, that is, Assumption 2 with $d = 0$, so that $\frac{1}{\sqrt{T}}\sum_{t=1}^{[\tau T]}\varepsilon_{t+1} \Rightarrow \sigma_\varepsilon W_{1,\tau}$, forming a multivariate fractional Brownian motion (mfBm) jointly with the limit of the partial sums of $RV_{t-1,t}$. The use of mfBm as a limiting process and efficient methods for the simulation of mfBm are discussed in Amblard et al. (2011).
The new result that applies to the dynamics (19) is given by Theorem 5 and, again, can be obtained by using the CMT.
Theorem 5. For the dynamics (14) and (19), the sample $\hat\rho(R^e_{t,t+H})$ in regression (3) converges weakly to
$$
\hat\rho(R^e_{t,t+H}) \Rightarrow F_\rho(D^d, B^d), \tag{20}
$$
if $\lim_{T\to\infty} H/T = \lambda > 0$. $D^d(\tau)$ is a stochastic process defined on $[0, 1]$,
$$
D^d(\tau) = \beta_0\frac{\sigma_d}{\sigma_\varepsilon}A^d(\tau) + W_{1,\tau+\lambda} - W_{1,\tau} - \frac{1}{1-2\lambda}\int_{\lambda}^{1-\lambda}(W_{1,s+\lambda} - W_{1,s})\,ds,
$$
where $W_{1,\tau}$ is a standard Brownian motion, and $A^d(\tau)$ and $B^d(\tau)$ are defined as in Theorem 4. The parameter $\beta_0$ defines the predictability of returns in (19), and $\sigma_d$ is an asymptotic standard deviation for the normalized sums of $X_t(d)$, that is, $\lim_{T\to\infty}\operatorname{var}\bigl((\sum_{t=1}^{T} X_t(d))/T^{1/2+d}\bigr) = \sigma_d^2$, and
Table 3. Correlation $\hat\rho$: long-memory framework with local-to-zero predictability

Percentiles for $\hat\rho(RV_{t-H,H})$
                                             Median    5%        95%      Sim (95%)
Long-memory (d = 0.43, r = −0.76, β = 3)
  λ = 0.05                                    0.402    −0.018    0.702    0.723
  λ = 0.10                                    0.216    −0.363    0.667    0.687
  λ = 0.17                                   −0.081    −0.675    0.652    0.686
Long-memory (d = 0.43, r = 0, β = 3)
  λ = 0.05                                    0.402    −0.018    0.702    0.723
  λ = 0.10                                    0.216    −0.363    0.667    0.687
  λ = 0.17                                   −0.081    −0.674    0.652    0.686
Short-memory (d = 0, r = −0.76, β = √0.15)
  λ = 0.05                                   −0.080    −0.362    0.230
  λ = 0.10                                   −0.153    −0.554    0.263
  λ = 0.17                                   −0.306    −0.697    0.270
Short-memory (d = 0, r = 0.0, β = √0.15)
  λ = 0.05                                   −0.080    −0.362    0.230
  λ = 0.10                                   −0.153    −0.554    0.263
  λ = 0.17                                   −0.306    −0.697    0.270

Percentiles for $\hat\rho(R^e_{t,t+H})$
                                             Median    5%        95%      Sim (95%)
Long-memory (d = 0.43, r = −0.76, β = 3)
  λ = 0.05                                    0.425     0.130    0.645    0.613
  λ = 0.10                                    0.470    −0.024    0.773    0.744
  λ = 0.17                                    0.421    −0.334    0.842    0.823
Long-memory (d = 0.43, r = 0, β = 3)
  λ = 0.05                                    0.191    −0.201    0.528    0.524
  λ = 0.10                                    0.132    −0.410    0.614    0.610
  λ = 0.17                                    0.008    −0.690    0.695    0.686
Short-memory (d = 0, r = −0.76, β = √0.15)
  λ = 0.05                                    0.035    −0.280    0.339
  λ = 0.10                                    0.090    −0.412    0.535
  λ = 0.17                                    0.155    −0.499    0.691
Short-memory (d = 0, r = 0.0, β = √0.15)
  λ = 0.05                                   −0.036    −0.343    0.277
  λ = 0.10                                   −0.072    −0.492    0.434
  λ = 0.17                                   −0.111    −0.695    0.541

NOTES: The table reports the percentiles for $\hat\rho$ in regressions (3) and (4) under the assumption that $\lim_{T\to\infty} H/T = \lambda$ and for local-to-zero predictability. The percentiles are calculated based on the long-memory framework presented in Section 4.2, formula (20), using 100,000 simulations. The integrals are calculated using 1000 steps per unit interval. The last column in each block, for $\hat\rho(RV_{t-H,H})$ and $\hat\rho(R^e_{t,t+H})$, provides the simulated 95th percentiles in the long-memory model of Comte and Renault (1998).
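$\sigma_\varepsilon$ is an asymptotic standard deviation for normalized sums of $\varepsilon_{t+1}$, that is, $\lim_{T\to\infty}\operatorname{var}\bigl((\sum_{t=1}^{T}\varepsilon_{t+1})/T^{1/2}\bigr) = \sigma_\varepsilon^2$. It also holds that $\hat\rho(RV_{t,t+H}) \Rightarrow F_\rho(A^d, B^d)$, the OLS slope $b_r T^d \Rightarrow \frac{\sigma_\varepsilon}{\sigma_d}F_\beta(D^d, B^d)$, and the OLS t-statistic for the slope $t_{b_r}/\sqrt{T} \Rightarrow F_t(D^d, B^d)$.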
Note that the leverage coefficient enters the above formulas only implicitly, through the correlation between the processes $W_{1,\tau}$ and $W_{2,\tau}$, where the latter drives $W_{(d),\tau}$ in (16). Let $d[W_{1,\tau}, W_{2,\tau}] = r\,d\tau$. The new parameters that enter the above formulas, therefore, include the ratio $(\beta_0\sigma_d)/\sigma_\varepsilon$ and $r$. The first parameter defines the variance ratio between the predictable and unpredictable parts of the return process,
$$
\lim_{H\to\infty}\operatorname{var}\Bigl(\sum_{t=1}^{H}\frac{\beta_0}{T^d}X_t(d)\Bigr)\times\operatorname{var}\Bigl(\sum_{t=1}^{H}\varepsilon_{t+1}\Bigr)^{-1} = \bigl((\beta_0\sigma_d)/\sigma_\varepsilon\bigr)^2\lambda^{2d}.
$$
To select the value of $(\beta_0\sigma_d)/\sigma_\varepsilon$, we match the order of predictability for monthly and annual horizons. For example, if $(\beta_0\sigma_d)/\sigma_\varepsilon = 3$ and $2d \approx 1$, then the return predictability is 15% for annual returns ($\lambda = 1/60$) and 1.25% for monthly returns ($\lambda = 1/60/12$). These figures are similar to those reported by Drechsler and Yaron (2011) and Bollerslev, Tauchen, and Zhou (2009). For the short-memory model, the asymptotic variance ratio remains the same for all horizons. To calibrate $(\beta_0\sigma_d)/\sigma_\varepsilon$ for the short-memory case, we match the same predictability at annual horizons so that $(\beta_0\sigma_d)/\sigma_\varepsilon$ is fixed at $\sqrt{0.15}$.
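The calibration arithmetic can be reproduced directly from the variance-ratio expression $((\beta_0\sigma_d)/\sigma_\varepsilon)^2\lambda^{2d}$. In the sketch below, the function name is hypothetical, and `scale` stands for the ratio $(\beta_0\sigma_d)/\sigma_\varepsilon$.

```python
def variance_ratio(scale, lam, d):
    """Variance ratio of predictable to unpredictable return components: scale**2 * lam**(2d)."""
    return scale ** 2 * lam ** (2 * d)

# Long-memory calibration with 2d = 1 (d = 0.5 used as the approximation stated in the text).
annual = variance_ratio(3.0, 1 / 60, 0.5)        # 9 / 60  = 0.15
monthly = variance_ratio(3.0, 1 / 60 / 12, 0.5)  # 9 / 720 = 0.0125

# Short-memory calibration: with d = 0 the ratio does not depend on the horizon.
short = variance_ratio(0.15 ** 0.5, 1 / 60, 0.0)
print(annual, monthly, short)
```

The short-memory line shows why the ratio is fixed at $\sqrt{0.15}$: with $d = 0$ the factor $\lambda^{2d}$ equals one, so matching 15% at the annual horizon pins down the ratio for all horizons.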
The second parameter, namely, the correlation $r$, is most accurately defined as the asymptotic long-run leverage effect. Although this $r$ is not necessarily equal to the corresponding parameter calculated based on high-frequency observations, we use the estimate of the latter as a proxy for $r$ and fix its value at −0.76 (Aït-Sahalia and Kimmel 2007). For comparison, we also report the results for the zero-leverage case ($r = 0$).
Table 3 reports the percentiles for the asymptotic distribution of $\hat\rho(R^e_{t,t+H})$ and $\hat\rho(RV_{t,t+H})$ as functions of the aggregation level $\lambda = H/T$. Table 3 is formatted such that it is convenient to compare the short-memory and long-memory cases and the leverage ($r = -0.76$) and zero-leverage cases ($r = 0$). The first three columns list the percentiles for the distribution of $\hat\rho(RV_{t,t+H})$. Again, we observe that for all cases, the estimated predictability in volatility is biased downward.
The next three columns report the percentiles for $\hat\rho(R^e_{t,t+H})$. The very first observation is that the median values of $\hat\rho(R^e_{t,t+H})$ are always larger than the median values of $\hat\rho(RV_{t,t+H})$. This observation agrees with the empirical findings in Section 2. For example, for the long-memory case with $r = -0.76$, the median $\hat\rho(RV_{t,t+H})$ and the median $\hat\rho(R^e_{t,t+H})$ are −0.081 and 0.421, respectively, for $\lambda = H/T = 0.17$. We observe this positive difference for all the cases; however, the difference $\hat\rho(R^e_{t,t+H}) - \hat\rho(RV_{t,t+H})$ is sufficiently large only if we allow for the leverage effect. The effect of the leverage on the bias in regressions has been studied by Stambaugh (1999) for autoregressive AR(1) predictive variables. In our framework, aggregation of the data (with $H > 1$) and long-range dependence alter the relation derived by Stambaugh (1999). However, similar results still hold. In addition to the small-sample bias of Stambaugh (1999), the leverage also produces a large-sample effect by inducing a negative correlation between the predictable and unpredictable parts of the return.
The second observation from Table 3 is that the long-memory case can produce nonnegligible predictability in returns for long horizons. This follows from the last column of the table, which reports the 95th percentile for the distribution of $\hat\rho(R^e_{t,t+H})$. For example, if $\lambda = 0.17$, which approximates the level of the overlap for the 10 year horizon, and $r = -0.76$, then the probability of attaining $\hat\rho(R^e_{t,t+H}) = 0.81$ is above 5%. This stands in contrast to the short-memory case, for which, just as in Table 1, the same probability is less than 1%. To summarize, long memory in volatility combined with the leverage effect generates high predictability in returns together with low predictability in volatility itself.
Since our conclusions are based on the magnitudes of the 95th percentiles, we next verify that the right tails of the asymptotic distributions are representative of the right tails of the small-sample distributions. The last column in each block of Table 3 reports the 95th percentiles for correlations simulated using a model in which the regressor is a fractional long-memory process, see Comte and Renault (1998) (for simulations, T = 708 months and $\sigma_t^2 = \sigma_0^2 + \int_{-\infty}^{t}\frac{(t-s)^{d-1}}{\Gamma(d)}v_s\,ds$, where $v_t$ is a zero-mean affine process), with the same parameters as described above. As follows from the table, the asymptotic and the simulated percentiles are within 1%–7% of each other for all the cases, and within 1%–3% for $\lambda = 0.17$.
4.3 Generalization to the Multivariate Case
This section briefly discusses how the asymptotic results derived in Sections 3 and 4 can be generalized to the multivariate
setting, in which predictive variables can be of either persistence
type (nearly integrated or fractionally integrated). The assumptions of Sections 3 and 4 can be combined into the following
general condition.
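Assumption 3. Suppose that
$$
R^e_{t,t+1} = \beta' f_t + \varepsilon_{t+1},
$$
where $\beta \in \mathbb{R}^p$ and $(f_t, \varepsilon_{t+1})$ is a random $(p+1)$-vector, such that for any fixed $\tau \in [0, 1]$, we have
$$
\operatorname{diag}\{T^{-\alpha_1}, \ldots, T^{-\alpha_{p+1}}\}\sum_{i=1}^{[\tau T]}\left(\begin{pmatrix} f_i \\ \varepsilon_i \end{pmatrix} - \theta\right) \Rightarrow \begin{pmatrix} \upsilon(\tau) \\ u(\tau) \end{pmatrix}, \tag{21}
$$
where $\alpha = (\alpha_1, \ldots, \alpha_{p+1})$ and $\theta$ are constant vectors, and $(\upsilon(\tau), u(\tau))'$ is a random a.s. continuous nondegenerate vector-process with $p$ elements in $\upsilon(\tau)$ and one element in $u(\tau)$. Define $\upsilon^\mu(\tau) = \upsilon(\tau) - \upsilon(\tau - \lambda) - \frac{1}{1-2\lambda}\int_{\lambda}^{1-\lambda}(\upsilon(s) - \upsilon(s - \lambda))\,ds$, $u^\mu(\tau) = u(\tau + \lambda) - u(\tau) - \frac{1}{1-2\lambda}\int_{\lambda}^{1-\lambda}(u(s + \lambda) - u(s))\,ds$, and let $\Upsilon_{\upsilon,\upsilon} = \int_{\lambda}^{1-\lambda}\upsilon^\mu(\tau)(\upsilon^\mu(\tau))'\,d\tau$ be a.s. positive definite.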
λ
For example, αj = 2 for the square of a nearly integrated predictor under the assumptions in Section 3, αj = 1/2 + d for a
fractionally integrated predictor under the assumptions in Section 4, and αj = 3/2 for all of the nearly integrated regressors
considered by Valkanov (2003). For the last element, αp+1 = 1,
as in Section 3, if the heteroscedasticity in the returns is driven by
a nearly integrated process, and αp+1 = 1/2, as in Section 4, if
the heteroscedasticity in the returns is driven by a long-memory
process.
Let $b_{r,j}$, $1 \le j \le p$, be the jth OLS slope in the regression (with intercept) of $R^e_{t,t+H}$ on all of the elements in $\sum_{i=1}^{H} f_{t-i}$.
Let tb,j , 1 ≤ j ≤ p to be the co