value theory and stochastic volatility, which do not, a-priori, rely on the existence Ž
. Ž
. of second moments. See also McCulloch 1997a and Rachev and Mittnik 2000
for further discussion. This paper makes two contributions. We extend previously proposed stable-
GARCH models to incorporate more general GARCH structures driven by stable Paretian innovations and use these in conjunction with SuS tests performed on
estimated model residuals to appropriately test for the stable Paretian hypothesis. By focusing on the properties of the innoÕations driving GARCH processes,
namely heavy-tailedness and summability, our work differs from related work
Ž .
Ž .
such as de Vries 1991 , Davis and Mikosch 1998 , and Mikosch and Starica Ž
. 2000 , who investigate properties of the observed output variable of GARCH-type
processes driven by finite-variance innovations. The remainder of this paper is organized as follows. Section 2 details the
stable-GARCH model including two new extensions and provides simulation results on the small-sample behavior of the maximum likelihood estimator for the
parameters. Section 3 examines the GARCH SuS issue raised by GK in more detail and demonstrates analogous findings for stable-GARCH models. By apply-
ing a formal test for stable Paretian variates, Section 4 provides empirical evidence that the estimated GARCH-filtered innovations driving East Asian currency-return
series comply with the stable law. Section 5 contains the conclusion.
2. Stable-GARCH processes
We first introduce the GARCH process with stable Paretian innovations and briefly discuss the required modification of the sufficient condition for stationarity
Ž .
detailed in Mittnik et al. 1999c . The model is then generalized to support more general GARCH structures, which have been recently shown to provide a signifi-
cant advantage in terms of model fit and forecasting. Simulation results are then shown, demonstrating the acceptable small sample properties of the estimated
parameters of stable-GARCH models.
2.1. Standard stable-GARCH process Ž
. We refer to sequence y as an asymmetric stable Paretian power-GARCH
t d
Ž .
process or, in short, an S GARCH r, s process, if
a , b
y sm qe , 1
Ž .
t t
t
iid
Ž .
where e ss z , z ;S 0,1 and:
t t
t t
a , b
r s
d d
d
s sc q c e
q d s
, 2
Ž .
Ý Ý
t i
tyi j
tyj
is1 js1
Ž .
with S 0,1 denoting the standard asymmetric stable Paretian distribution with
a , b
stable index 0 - a F2, skewness parameter y1FbF1, zero location parameter, and unit scale parameter. Recall that for a - 2, z does not possess moments of
t
order a or higher, and that for a s2, z is normally distributed. By letting
t
Ž . location parameter m in Eq. 1 be time-varying, a range of mean equations,
t
including regression andror ARMA structures, is possible. Ž
.
d
Ž .
Mittnik et al. 1999c show that the S GARCH r, s process with 1 - a F2
a , b
and 0 - d - a has a unique strictly stationary solution if c 0, c G0, is1,
i
. . . , r, r G1, d G0, js1, . . . , s, sG0, and
j
r s
d
V sE z c q
d F1 3
Ž .
Ý Ý
t i
j
is1 js1
d
holds. For 1 - a F2 and 0-d-a, quantity E z can be expressed in closed
t
form as
d
1 d
d
2 a d
2
E z s
G 1 y 1 qt
cos arctant
, 4
Ž .
Ž .
t a , b
a , b
ž ž
c a
a
d
Ž .
where t [ b tan ap
r2 and
a , b
pd
°
G 1 yd cos ,
if d 1,
Ž .
~
c s 5
Ž .
2
d
¢
pr2, if d s1.
d
As d approaches a , E z increases without bound and, in the limit, leads to
t
explosive behavior uncommon in practice.
d
Ž .
d
This implies that for integrated S GARCH r, s processes, denoted as S
a , b a , b
Ž .
IGARCH r, s , the usual Asum-to-oneB condition, i.e., does not hold. In fact, this is true for GARCH processes driven by any non-normal innovations process for
d 2
which E z 1.
t d
Ž .
Inspection of generated S GARCH r, s processes, i.e., with d sa, as
a , b
Ž .
assumed in Liu and Brorsen 1995 , suggests that they are non-stationary: they exhibit extreme volatility clustering and eventually increase without bound. This
behavior is highly dependent on the closeness to the IGARCH border implied by Ž .
Eq. 3 , with simulated IGARCH processes AexplodingB for very small T while models with volatility persistence parameter V F0.9 often appear stationary for
d
ˆ ˆ
Ž .
T 5000. Thus, estimated S GARCH r, s models with V f1 and dfa
ˆ
a , b
2
Ž .
For example, an IGARCH 1,1 process driven by Student’s t innovations with n degrees of Ž
. Ž
. freedom and d s2 would set d s1y c n r n y2 for 0- c - n y2 rn.
1 1
1
could be the sign of model misspecification. Modeling returns on several foreign Ž
. exchange rates, Mittnik et al. 1999b show that in all cases considered, the
ˆ
resulting models were stationary and d was significantly less than a.
ˆ
By juxtaposing graphs of iid stable Paretian data with those generated from a stationary normal-GARCH model, GK made the point that particularly when
c qd is far from the IGARCH border, the two processes exhibit similar
1 1
characteristics and cannot be easily distinguished from one another. In Fig. 1, we also show simulations from such processes, but accompany them with the sample
Ž .
autocorrelation function SACF of the absolute values. For neither process is the small-sample behavior of the SACF well known,
3
although it is evident that between the two processes, the SACFs behave quite differently.
d
Ž .
Fig. 2 plots realizations from six stationary S GARCH 1, 1 processes for
a , b
different a and b combinations, using the same seed value. Their SACF plots Ž
. not shown were relatively similar and show unequivocally that the absolute data
possess strong correlation and are, thus, far from being iid. As a decreases, the Ž
. relative severity of the large shocks increase, while with asymmetric left-skewed
shocks, that of the negative innovations increase, the magnitude of which is Ž
. dependent on a as a approaches 2, b loses its effect .
2.2. Extending the GARCH-stable process As mentioned in the Introduction, the use of the summability property to
effectively test the stable Paretian hypothesis in finite samples requires the data to be independent. Certainly, the residuals from an ARMA-GARCH filter applied to
financial return data will be much closer to iid than their unfiltered counterparts. Nevertheless, few would insist that observed data series are really generated by an
ARMA-GARCH process, let alone from the particular handful of models chosen to be fitted by the researcher. Thus, SuS tests may be jeopardized by the extent to
which the filtered residuals deviate from iid-ness. Although attaining genuine iid residuals is an ideal which, at least for financial data with their inherently complex
data generating mechanisms, will hardly be reached, it is essential that one uses models that filter the data as effectively as possible.
To this end, we entertain some of the numerous extensions of the standard GARCH equation which have succeeded in adding only a small number of
parameters to the model, but which are highly significant and easily favored by even the harshest of penalty measures. Two such models, both of which nest a
variety of previously proposed extensions, have proven very successful with regard to in-sample and out-of-sample fit. These are the Asymmetric Power
Ž .
ARCH, or A-PARCH model, proposed by Ding et al. 1993 and the Generalized
3
Ž .
Ž .
For correlated stable data, see, however, Brockwell and Davis 1991 , Adler et al. 1998 and Ž
. Ž
. Ž
. Davis and Mikosch 1998 ; while, for normal-GARCH data, Diebold 1986 and Bera et al. 1992 .
S. Mittnik
et al.
r Journal
of Empirical
Finance 7
2000 389
– 416
395 Fig. 1. Left upper panel is iid S
data; right upper is GARCH with c s0.2, d s0.6 and iid normal innovations. Bottom panels are their respective SACFs
1.8,0 1
1
of the squared data.
S. Mittnik
et al.
r Journal
of Empirical
Finance 7
2000 389
– 416
396
1
Ž .
Ž .
Fig. 2. S GARCH 1,1 processes with c s0.01, c s0.2 and d chosen such that V s0.95. From left to right, a s2.0, 1.8, 1.4; top bottom is for b s0
a , b 1
1
Ž .
b sy0.9 .
Ž .
Quadratic ARCH, or Q-GARCH model, introduced by Sentana 1991, 1995 . Both models allow current conditional volatility to react asymmetrically to the size of
the previous periods’ innovations, which is necessary to capture the well-known Ž
. Ž leverage effect as originally noted by Black 1976
see also Christie, 1982; .
French et al., 1987; Pagan and Schwert, 1990 . Whether or not the leverage effect is present, the additional parameters required to incorporate these asymmetries are
Ž quite often found to be highly significant for a variety of data sets see Ding et al.,
.
4
1993; Franses and van Dijk, 1996; Lane et al., 1996; Paolella, 1997 . Ž
. The A-PARCH r, s model is given by
r s
d d
d
s sc q c
e yg e
q d s
, 6
Ž .
Ž .
Ý Ý
t i
tyi i
tyi i
tyi
is1 is1
for g - 1 with the special case d s2, gs0 corresponding to the standard
i
GARCH model. Negative values of g result in relatively higher values of s
i t
when z is negative. By assuming the existence of a stationary solution to Eq.
tyi
Ž .
d
Ž .
6 , the unconditional expectation of s is c s 1yV , where
g
r s
V s k c q
d 7
Ž .
Ý Ý
g i
i j
is1 js1
Ž . is a generalization of V specified in Eq. 3 and
d d
k [ E z yg z
sE z yg z 8
Ž .
Ž .
Ž .
i tyi
i tyi
i
depends on the distributional specification of z and exists only for values of
t
Ž
d
. d 0 for which E z
- `. For example, with Student’s t innovations with n degrees of freedom, elementary integration yields
d
1 d q1
n n yd
d d
y1 2
k sn 1 qg
q 1yg G
G G
,
Ž .
Ž .
i i
i
ž
ž ž
2 2
2 2 p
Ž .
which exists when d - n . For S 0, 1
innovations for which 1 - a - 2,
a , b
constraint d - a is required for k to exist. For this latter model, say S
d i
a , b
Ž .
A-PARCH r, s no closed-form expression for k is known, leaving numerical
i
integration the only alternative. This is easier said than done.
4
Ž .
It is worth mentioning that neither model nests the well-known Exponential GARCH EGARCH Ž
. Ž
. from Nelson 1991 , as also considered in GK, nor its generalizations proposed by Hentschel 1995 .
Although GK simulated from this model, they did not estimate it, which is known to be numerically Ž
. w
x demanding. In particular, Franses and van Dijk 1996 noted that A . . . only with starting-values with a
Ž .
precision of eight digits could we obtain convergenceB, while Kaiser 1996 observed A . . . severe estimation problemsB associated with the EGARCH model. Similar findings have been reported by
Ž .
Frachot 1995 as well. For this reason, we do not entertain the EGARCH model further.
In particular, recalling the thickness of the stable Paretian tails, which are then Ž
.
d
compounded with the factor z
yg z , the integration needs to be per-
tyi i
tyi
formed over a very large range, with accurate values of the pdf far into the tails. Ž
. We accomplish this by using the pdf of Doganoglu and Mittnik 1998 , which
couples a polynomial approximation to the mode-centered asymmetric stable w
x density over y10, 10 with a series-approximation for the tails. They show that
for a 1.2, the approximation delivers a maximum relative inaccuracy of 10
y6
Ž .
over the whole real line. This is accurate and fast
enough to be used in conjunction with numerical integration routines to approximate the k . The
i
restriction that a 1.2 is of little importance in this context, since the estimated value of tail index a for conditional GARCH residuals rarely dips below 1.5,
Ž even for highly volatile data Liu and Brorsen, 1995; Mittnik et al., 1998b,c,
. 1999b .
Ž .
The Q-GARCH r, s model is given by
r r
r r
s
2 2
2
s sc q c e
q c e
q2 c e
e q
d s .
9
Ž .
Ý Ý
Ý Ý Ý
t i
tyi i i
tyi i j
tyi tyj j
tyj
is1 is1
is1 jsiq1 js1
With s s0, it has the natural interpretation as being a second order approximation to the unknown conditional volatility function and can be shown to be the most
general quadratic version of ARCH-type models. An obvious generalization is to Ž .
relax the exponent in Eq. 9 , i.e.,
r r
r r
s
d d
d
s sc q c e
q c e
q2 c e
e q
d s .
Ý Ý
Ý Ý Ý
t i
tyi i i
tyi i j
tyi tyj j
tyj
is1 is1
is1 jsiq1 js1
10
Ž .
d
Ž .
This model, denoted S Q-GARCH r, s will be useful in cases where the
a , b
second moment of e does not exist, e.g., with non-normal stable Paretian
t
Ž .
innovations. Taking iterated expectations shows that E e e
s0 for i j, so
tyi tyj
Ž
d
. Ž
. that E s
sc r 1yV , where
Q
r s
d
V s c E z
q d .
11
Ž .
Ý Ý
Q i i
t j
is1 js1
d
Ž .
Ž . Note that E z
in Eq. 11 is just the k defined in Eq. 8 with g s0, for which
t i
iid
Ž . the closed-form expression Eq. 4 can be used when z ;S
, 0 - d - a and
t a , b
1 - a F2. 2.3. Small sample behaÕior of point estimates
Ž .
The estimation and even simulation
of joint GARCH-stable models is, however, not straightforward and has been only very recently considered. In the
Ž unconditional case, where only the four parameters tail-thickness, a ; skewness,
. b ; and the location and scale parameters need to be estimated, fast and reliable
methods have been in use for quite some time, the most successful of these being Ž
. the quantile-based estimator of McCulloch 1986 . In the context of more general
conditional models, however, this estimator is inapplicable and one would rather Ž
. Ž
rely on the maximum likelihood ML estimator see, e.g., Nolan, 2000, and the .
references therein . The greatest hindrance for ML estimation involves the com- Ž
. plex calculation of the stable Paretian density function pdf , which is not available
Ž .
in closed form. For symmetric pdfs, McCulloch 1998 has provided an accurate Ž
. polynomial approximation; while Doganoglu and Mittnik 1998 and Mittnik et al.
Ž .
1999a pursue fast and reliable methods for the general, asymmetric case. To gain some understanding of the small-sample performance of the model, we
1
Ž .
simulate 100 replications of an S A-PARCH 1, 1 process for sample size
a ,0
T s5000 with parameter values c s0.01, c s0.2, g sy0.5 and d s0.95 y
1 1
1
k c , i.e., that value d such that V s0.95 and k depends on the chosen value
1 1 1
g 1
of a.
5
Fig. 3 and the upper plots in Fig. 4 show boxplots of the estimated parameters using actual tail-index values a s1.5, 1.6, . . . , 2.0.
6
With one exception, the estimates appear virtually unbiased with a symmetric distribution not dominated
by outliers. This does not hold for the estimate of a when a s2, which, owing to the necessary constraint that 1 - a F2, is highly skewed. In this case, however, it
ˆ
is interesting to note how close the estimates are to the correct value of 2.0. All 100 values were greater than 1.99, with 90 being greater than 1.996. With respect
ˆ
to the different a-values, each of the small-sample distributions of m, c , d and
ˆ ˆ
1 1
g are remarkably similar, while, as a moves towards two, that of c exhibits a
ˆ ˆ
1
ˆ
Ž very slight increase in variance; that of V tends to exhibit fatter tails more mass
g
ˆ
. in the center and more outliers ; and the variance of b increases greatly. For the
latter, this is to be expected, recalling that the effect of b diminishes as a Ž
. increases. Finally, regarding a , for smaller values of the actual a 1.5 and 1.6 ,
ˆ
the resulting small-sample distribution is thin-tailed; but as the true value in-
5
Ž . Ž
. We restrict d s1, since, as E z s0 for a 1, it follows that E z yg z s E z , which is
i
available in closed form and, thus, avoids the otherwise necessary numerical integration. In addition, we use the scaled innovations S
r 2 , which has the benefit that the a s2 case coincides with
a ,0
Ž .
Ž .
N 0,1 instead of N 0,2 . It is simple to show that the corresponding parameter values for c , is0, . . . ,
i
r, assuming standard S innovations are c r 2 .
a , b i
6
The quasi-Newton BFGS algorithm as implemented in Matlab is used for estimation, with starting values arbitrarily chosen as values typical for financial time-series. On a 200 MHz Pentium PC, most
models are estimated in less than 30 s. Further simulation results as well as estimation with actual data Ž
. from a variety of sources Mittnik et al., 1999b demonstrate that in the vast majority of cases, use of
Ž .
different and oftentimes quite poor starting values leads, without difficulty, to the same log-likelihood maximum.
1
Ž .
Fig. 3. Estimated parameters from 100 simulated S -A-PARCH 1, 1 processes for a s1.5, 1.6, . . . ,
a ,0
Ž .
2.0 x-axis with S r 2 innovations, c s0.01, c s0.2 and d chosen such that V s0.95. Boxes
a ,0 1
1 g
have lines at the 25, 50 and 75 quantiles.
creases, it becomes very fat-tailed and, as a approaches two, very skewed. Somewhat oddly, the behavior at a s1.7 and 1.8 is not quite consistent with the
general pattern. It should be, nevertheless, emphasized that for all replications and all a-levels considered, a was 65 and 98.5 of the time within 0.01 and 0.05
ˆ
of the true value, respectively. Finally, the lower half of Fig. 4 shows the small-sample performance of two
useful goodness-of-fit criteria based on the empirical distribution function of the
Ž .
Fig. 4. Continuation of Fig. 3 along with the goodness-of-fit measures in Eq. 12 .
Ž .
Ž .
residuals; the Kolmogorov–Smirnov KS and Anderson–Darling AD statistics, defined by
ˆ
F x yF x
Ž . Ž .
s
ˆ
KS s100= sup F x yF x and
AD s sup ,
Ž . Ž .
s
ˆ ˆ
xgR xgR
F x 1 yF x
Ž . Ž .
Ž .
12
Ž .
ˆ
Ž . Ž .
respectively, where F x and F x are the cdf of the estimated parametric density
s
and the empirical sample distribution
T
r ym
ˆ
t y1
F x sT I
I ,
Ž .
Ý
x s
Ž y`, x
ž
s
ˆ
t
ts1
Ž . respectively, with I
I P denoting the usual indicator function. The AD statistic differs from KS in that it is given by the deviation between empirical and fitted
Ž . cdf standardized by the standard deviation of F x .
s
In the empirical application in Section 4, we will apply the AD statistic as a goodness-of-fit measure for the estimated models. Since the AD statistic weights
discrepancies more appropriately across the whole support of the distribution, most
notably so in the tails, it is particularly suitable for assessing the conditional probability of large investment losses, where one focuses on accurate inference
from the left tail of the conditional return distribution. From Fig. 4, we see that the statistics’ performance is virtually independent of
the true a , with the median KS value falling almost precisely at one, while that of AD at 0.03. Although it is comforting to know that these measures are not
Ž .
dependent on a over the range considered and for the given model , it should be kept in mind that this is of only limited practical interest, since one is usually
concerned not with a test of model adequacy based on the observed KS or AD Ž
. value per se
for which small sample results are lacking , but rather their comparison across models. Nevertheless, by detailing the finite-sample perfor-
mance of these measures, such a simulation exercise can help decide if two or more competing values of KS or AD are significantly different from one another.
2.4. Small sample behaÕior of asymptotic Õariance estimates Although there is no formal proof, we conjecture that—if the parameter vector
belongs to the interior of the admissible parameter space, and other suitable regularity conditions hold—the ML estimator is consistent and asymptotically
7
Ž .
normal. DuMouchel 1973, p. 952ff shows that this holds for the iid case with a Ge0, e arbitrarily small. The boxplots previously discussed show that except
ˆ
for a and, as a approaches two, b, the small-sample distributions of the
ˆ ˆ
parameters, including V, are thin-tailed and essentially symmetric, so that normal- ity is not an unrealistic assumption. For this reason, we concentrate on the quality
of the approximate standard errors obtained by inverting the Hessian matrix obtained from the quasi-Newton numerical optimization, as compared with the
empirical variance observed from the Monte Carlo simulations.
Inference on the persistence parameter V is greatly facilitated if the asymptotic
ˆ
distribution of V is known and can be reliably calculated. Provided that the ML
ˆ ˆ
ˆ ˆ
ˆ
X
Ž .
estimator u s m, a, b, c , g , . . . , c , g , d , . . . , d , d is asymptotically
ˆ ˆ
ˆ ˆ
ˆ ˆ
r r
1 s
normally distributed with mean u and covariance matrix given by the inverse of
asymp
X X
y1
ˆ
Ž .
Ž .
the information matrix, I , then V ; N V ,t I t , where t s EV rEu . As
T g
g T
g
closed-form expressions for k do not exist when d 1 and g 0, this method
i i
cannot be used for the general A-PARCH model.
7
Clearly, asymptotic normality will fail as we approach the boundary of the parameter space where the distribution becomes degenerate. In the likelihood maximization, we appropriately constrain the
ˆ
parameter space to ensure numerical stability. Note that V slightly exceeded unity in three out of eight Ž
. cases see Table 1 below , which is the lower bound for stationarity implied by the sufficient—but not
necessarily necessary—condition stated in Section 2. To what extent this is of consequence for theoretical or applied work needs to be investigated further.
Ek a , b Ek a , b
Ž .
Ž .
Ž .
Ž .
Fig. 5. left and
right . Ea
Eb
Ž .
When d s1, the g drop out, i.e., EVrEg s0, while EVrEc sk a, b sE Z ,
i i
i
i s1, . . . , r. Clearly, EVrEd s1, js1, . . . , s, EVrEmsEVrEc s0 and
j r
Ž .
EV rEbsÝ
c Ek a , b rEb, where, with msapr2,
is1 i 1
y1
Ek a , b
Ž .
2 a Ž
. y1ra
1r 2 a y1
2 2
2
sm y1
b y1 cos m yb cos m
Ž .
Ž . Ž
.
Ž .
Eb =
y1 y1
2
G 1 ya b cos a
atanf cos
m y1
Ž .
Ž .
Ž .
Ž .
y1
qcos m sin m sin a atanf ,
Ž . Ž .
Ž .
Ž .
which is real for 1 - a F2 and b and, with fsb tan apr2 , EkrEb is equal
r
Ž .
to zero for b s0. Similarly, EVrEasÝ c Ek a , b rEa but the latter deriva-
is1 i
tive does not admit an amenable closed-form expression. As such, we directly approximate the derivative numerically, which appears quite reliable. Fig. 5 plots
Ž .
Ž .
Ek a , b rEa and Ek a, b rEb over a portion of the parameter space.
d
ˆ
Ž .
Ž .
For the S Q-GARCH r, s model, Var V
can also be readily approximated
a , b Q
y1 X
Ž .
X
Ž .
Ž .
by t I t , t s EV rEu , where values Ek a, b, d rEa, Ek a, b, d rEb and
T Q
Ž .
Ž .
Ek a , b, d rEd need to be numerically approximated; here, k a, b, d is given
Ž . Ž
. in Eq. 8 , in which g s0 and z denotes a S
0,1 random variable.
i a , b
Fig. 6 is similar to Fig. 3 but shows boxplots of the 100 approximate standard errors, which accompany the estimated model parameters returned from the ML
ˆ ˆ
8
Ž .
estimation procedure, along with the calculated value of SE V . For each
g
parameter and value of a , the inscribed circle indicates the empirical standard
8
To prevent the boxplots from being too small, we removed several of the largest occurring values. The numbers removed from each boxplot are given in the square brackets of the upper caption of each
plot. On average, they amount to about 4 out of the 100 observations.
1
Ž .
Fig. 6. Estimated standard errors for the parameters from the 100 simulated S -A-PARCH 1, 1
a ,0
processes. The circle indicates the empirical standard deviation of the 100 estimates. The numbers in Ž
brackets on top of each plot indicate the number of removed values to enhance readability see footnote .
8 .
deviation computed from the 100 estimates, and serves as the AtrueB standard deviation appropriate for this model and sample size. The boxplots show that the
approximate standard errors are quite accurate for a - 2, while, for a approaching 2, they tend to underestimate the true variability. All variance estimates are,
however, themselves quite fat-tailed and, due to the occasional occurrence of values larger than the depicted range shows, are skewed upwards, i.e., except for
the a s2 case, have a slight tendency to overestimate the true variability.
3. Empirical examination of stability under summability