Journal of Business & Economic Statistics, October 2003, Vol. 21, No. 4, 532–546
DOI 10.1198/073500103288619232
© 2003 American Statistical Association

Maximum Likelihood Estimation and Inference in
Multivariate Conditionally Heteroscedastic
Dynamic Regression Models With
Student t Innovations
Gabriele FIORENTINI
Università di Firenze, Viale Morgagni 59, I-50134 Firenze, Italy (fiorentini@ds.unifi.it)

Enrique SENTANA
CEMFI, Casado del Alisal 5, E-28014 Madrid, Spain (sentana@cemfi.es)

Giorgio CALZOLARI
Università di Firenze, Viale Morgagni 59, I-50134 Firenze, Italy (calzolar@ds.unifi.it)
We provide numerically reliable analytical expressions for the score, Hessian, and information matrix of conditionally heteroscedastic dynamic regression models when the conditional distribution is multivariate t. We also derive one-sided and two-sided Lagrange multiplier tests for multivariate normality versus multivariate t based on the first two moments of the squared norm of the standardized innovations evaluated at the Gaussian pseudo-maximum likelihood estimators of the conditional mean and variance parameters. Finally, we illustrate our techniques through both Monte Carlo simulations and an empirical application to 26 U.K. sectorial stock returns that confirms that their conditional distribution has fat tails.
KEY WORDS: Financial returns; Inequality constraints; Kurtosis; Normality test; Value at risk; Volatility.

1. INTRODUCTION

Many empirical studies with financial time series data indicate that the distribution of asset returns is usually rather leptokurtic, even after controlling for volatility clustering effects (see, e.g., Bollerslev, Chou, and Kroner 1992 for a survey). This has long been recognized, and two main alternative inference approaches have been proposed. The first one uses a "robust" estimation method, such as the Gaussian pseudo-maximum likelihood (PML) procedure advocated by Bollerslev and Wooldridge (1992), which remains consistent for the parameters of the conditional mean and variance functions even if the assumption of conditional normality is violated. The second one, in contrast, specifies a parametric leptokurtic distribution for the standardized innovations, such as the Student t distribution used by Bollerslev (1987). Whereas the second procedure will often yield more efficient estimators than the first if the assumed conditional distribution is correct, it has the disadvantage that it may end up sacrificing consistency when it is not (Newey and Steigerwald 1997). Nevertheless, a non-Gaussian distribution may be indispensable when we are interested in features of the distribution of asset returns, such as its quantiles, which go beyond its conditional mean and variance. For instance, empirical researchers and financial market practitioners are often interested in the so-called "value at risk" of an asset, which is the positive threshold value V such that the probability of the asset suffering a reduction in wealth larger than V equals some prespecified level κ < 1/2. Similarly, in the context of multiple financial assets, one may be interested in the probability of the joint occurrence of several extreme events, which is regularly underestimated by the multivariate normal distribution, especially in larger dimensions.


Notwithstanding such considerations, a significant advantage of the PML approach of Bollerslev and Wooldridge (1992) is that they derived convenient closed-form expressions for the Gaussian log-likelihood score and the conditional information matrix, which can be used to obtain numerically accurate extrema of the objective function, as well as reliable standard errors. In contrast, estimation under an alternative distribution typically relies on numerical approximations to the derivatives, which are often poor. One of the objectives of this article is to partly close the gap between the two approaches by providing numerically reliable analytical expressions for the score vector, the Hessian matrix, and its expected value for the multivariate conditionally heteroscedastic dynamic regression model considered by Bollerslev and Wooldridge (1992) when the distribution of the innovations is assumed to be proportional to a multivariate t. As is well known, the multivariate t distribution nests the normal as a limiting case, but the marginal distributions of its components have generally fatter tails, and it also allows for cross-sectional "tail dependence." In this respect, our results generalize the expressions in appendix B of Lange, Little, and Taylor (1989), who analyzed only an independent and identically distributed (iid) setup in which there is separation between unconditional mean and variance parameters.

We also use our analytical expressions to develop a test for multivariate normality when a dynamic model for the conditional mean and variance is fully specified, but the model is estimated under the Gaussianity null.


We compare our proposed test with the kurtosis component of Mardia's (1970) test for multivariate normality, which reduces to the well-known Jarque and Bera (1980) test in univariate contexts. Importantly, we take into account the one-sided nature of the alternative hypothesis to derive the more powerful Kuhn–Tucker multiplier test, which is asymptotically equivalent to the likelihood ratio and Wald tests.

The rest of the article is organized as follows. First, we obtain closed-form expressions for the log-likelihood score vector, the Hessian matrix, and its conditional expected value in Section 2. Then in Section 3 we introduce our proposed test for multivariate normality and relate it to the existing literature. We provide a Monte Carlo evaluation of different parameter and standard error estimation procedures in Section 4. Finally, we include an illustrative empirical application to 26 U.K. sectorial stock returns in Section 5, followed by our conclusions. Proofs and auxiliary results are gathered in the Appendixes.

2. MAXIMUM LIKELIHOOD ESTIMATION

2.1 The Model

In a multivariate dynamic regression model with time-varying variances and covariances, the vector of N dependent variables, y_t, is typically assumed to be generated by the equations
$$y_t = \mu_t(\theta_0) + \Sigma_t^{1/2}(\theta_0)\,\varepsilon_t^*,$$
$$\mu_t(\theta) = \mu(z_t, I_{t-1}; \theta),$$
and
$$\Sigma_t(\theta) = \Sigma(z_t, I_{t-1}; \theta),$$
where μ(·) and vec[Σ(·)] are N- and N(N+1)/2-dimensional vectors of functions known up to the p × 1 vector of true parameter values θ_0; z_t are k contemporaneous conditioning variables; I_{t−1} denotes the information set available at t − 1, which contains past values of y_t and z_t; Σ_t^{1/2}(θ) is an N × N "square root" matrix such that Σ_t^{1/2}(θ)Σ_t^{1/2′}(θ) = Σ_t(θ); and ε*_t is a vector martingale difference sequence satisfying E(ε*_t|z_t, I_{t−1}; θ_0) = 0 and V(ε*_t|z_t, I_{t−1}; θ_0) = I_N. As a consequence,
$$E(y_t|z_t, I_{t-1}; \theta_0) = \mu_t(\theta_0)$$
and
$$V(y_t|z_t, I_{t-1}; \theta_0) = \Sigma_t(\theta_0).$$
Following Bollerslev (1987) in a univariate context and Harvey, Ruiz, and Sentana (1992) in a multivariate one, as well as many others, our approach is based on the t distribution. In particular, we assume hereinafter that conditional on z_t and I_{t−1}, ε*_t is iid as a standardized multivariate t with ν_0 degrees of freedom, or iid t(0, I_N, ν_0) for short. That is,
$$\varepsilon_t^* = \sqrt{\frac{(\nu_0 - 2)\,\zeta_t}{\xi_t}}\; u_t,$$
where u_t is uniformly distributed on the unit sphere surface in R^N, ζ_t is a chi-squared random variable with N degrees of freedom, ξ_t is a gamma variate with mean ν_0 > 2 and variance 2ν_0, and u_t, ζ_t, and ξ_t are mutually independent (see App. A). As is well known, the multivariate Student t approaches the multivariate normal as ν_0 → ∞, but has generally fatter tails. For that reason, it is often more convenient to use the reciprocal of the degrees-of-freedom parameter, η_0 = 1/ν_0, as a measure of tail thickness, which will always remain in the finite range 0 ≤ η_0 < 1/2 under our assumptions.
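For readers who want to replicate this construction, the following minimal NumPy sketch (ours, not part of the original article) draws standardized multivariate t innovations exactly as in the decomposition above; the function name and the use of a chi-squared draw for the gamma variate ξ_t (mean ν, variance 2ν) are our own choices.

```python
import numpy as np

def simulate_std_mvt(T, N, nu, rng=None):
    """Draw T standardized multivariate t(0, I_N, nu) innovations (requires nu > 2)."""
    rng = np.random.default_rng(rng)
    z = rng.standard_normal((T, N))
    # direction and squared radius of a Gaussian vector are independent
    u = z / np.linalg.norm(z, axis=1, keepdims=True)   # u_t: uniform on the unit sphere
    zeta = rng.chisquare(N, size=T)                    # zeta_t ~ chi-squared with N dof
    xi = rng.chisquare(nu, size=T)                     # xi_t: gamma with mean nu, variance 2*nu
    return np.sqrt((nu - 2.0) * zeta / xi)[:, None] * u

eps = simulate_std_mvt(100_000, N=5, nu=10)
print(np.cov(eps, rowvar=False).round(2))              # approximately the identity matrix
```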
2.2 The Log-Likelihood Function

Let φ = (θ′, η)′ denote the p + 1 parameters of interest, which we assume are variation free for simplicity [cf. expression (53) in Harvey et al. 1992]. The log-likelihood function of a sample of size T (ignoring initial conditions) takes the form L_T(φ) = Σ_{t=1}^T l_t(φ), with l_t(φ) = c(η) + d_t(θ) + g[ς_t(θ), η]:
$$c(\eta) = \ln\Gamma\!\left(\frac{N\eta + 1}{2\eta}\right) - \ln\Gamma\!\left(\frac{1}{2\eta}\right) - \frac{N}{2}\ln\frac{1 - 2\eta}{\eta} - \frac{N}{2}\ln\pi,$$
$$d_t(\theta) = -\frac{1}{2}\ln|\Sigma_t(\theta)|,$$
and
$$g[\varsigma_t(\theta), \eta] = -\frac{N\eta + 1}{2\eta}\,\ln\!\left[1 + \frac{\eta}{1 - 2\eta}\,\varsigma_t(\theta)\right],$$
where Γ(·) is Euler's gamma (or generalized factorial) function, ς_t(θ) = ε*′_t(θ)ε*_t(θ), ε*_t(θ) = Σ_t^{−1/2}(θ)ε_t(θ), and ε_t(θ) = y_t − μ_t(θ). Nevertheless, it is important to stress that because both μ_t(θ) and Σ_t(θ) are often recursively defined (as in autoregressive moving average or generalized autoregressive conditional heteroscedasticity (GARCH) models), it may be necessary to choose some initial values to start up the recursions. As pointed out by Fiorentini, Calzolari, and Panattoni (1996), this fact should be taken into account in computing the analytic score, to make the results exactly comparable with those obtained by using numerical derivatives. Not surprisingly, it can be readily verified that L_T(θ, 0) collapses to a conditionally Gaussian log-likelihood.
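As a concrete illustration of these formulas, here is a short Python sketch (ours, under the definitions above; not the authors' code) of the period-t log-likelihood contribution for given conditional mean and covariance:

```python
import numpy as np
from scipy.special import gammaln

def loglik_t_contrib(y_t, mu_t, Sigma_t, eta):
    """l_t(phi) = c(eta) + d_t(theta) + g[varsigma_t(theta), eta], with eta = 1/nu."""
    N = y_t.shape[0]
    e = y_t - mu_t                                    # eps_t(theta)
    varsigma = e @ np.linalg.solve(Sigma_t, e)        # squared norm of the standardized innovation
    d_t = -0.5 * np.linalg.slogdet(Sigma_t)[1]
    if eta > 0.0:
        c = (gammaln((N * eta + 1) / (2 * eta)) - gammaln(1 / (2 * eta))
             - 0.5 * N * np.log((1 - 2 * eta) / eta) - 0.5 * N * np.log(np.pi))
        g = -((N * eta + 1) / (2 * eta)) * np.log1p(eta * varsigma / (1 - 2 * eta))
    else:                                             # Gaussian limit as eta -> 0+
        c = -0.5 * N * np.log(2 * np.pi)
        g = -0.5 * varsigma
    return c + d_t + g
```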
Given the nonlinear nature of the model, a numerical optimization procedure is usually required to obtain maximum likelihood (ML) estimates of φ, φ̂_T say. Assuming that all of the elements of μ_t(θ) and Σ_t(θ) are twice continuously differentiable functions of θ, we can use a standard gradient method in which the first derivatives are numerically approximated by reevaluating L_T(φ), with each parameter in turn shifted by a small amount, with an analogous procedure for the second derivatives. Unfortunately, such numerical derivatives are sometimes unstable, and, moreover, their values may be rather sensitive to the size of the finite increments used. This is particularly true in our case, because even if the sample size T is large, the Student t-based log-likelihood function is often rather flat for small values of η. As we show in the following sections, though, in this case it is also possible to obtain simple analytical expressions for the score vector and Hessian matrix. The use of analytical derivatives in the estimation routine, as opposed to their numerical counterparts, should improve the accuracy of the resulting estimates considerably (McCullough and Vinod 1999).

Moreover, a fast and numerically reliable procedure for computing the score for any value of η is of paramount importance in the implementation of the score-based indirect inference procedures introduced by Gallant and Tauchen (1996). (See Calzolari, Fiorentini, and Sentana 2003 for an application to a discrete-time, stochastic volatility model.)

The analytic derivatives that we obtain could also be used even if the coefficients of the model were reparameterized as φ = f(ϕ), with ϕ unconstrained, to maximize the unrestricted log-likelihood function ℓ_T(ϕ) = L_T[f(ϕ)]. In particular, by virtue of the chain rule for Jacobian matrices (Magnus and Neudecker 1988, thm. 5.8), we would have that
$$D[\ell_T(\varphi)] = D[L_T(\phi)] \cdot D[f(\varphi)],$$
where D[·] denotes the corresponding Jacobian matrix. Similarly, we can use the chain rule for Hessian matrices (Magnus and Neudecker 1988, thm. 6.9) to write
$$H[\ell_T(\varphi)] = D[f(\varphi)]'\,H[L_T(\phi)]\,D[f(\varphi)] + \bigl(D[L_T(\phi)] \otimes I_{p+1}\bigr)H[f(\varphi)],$$
where H[·] denotes the corresponding Hessian matrix.
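As a toy illustration of this chain rule (our own example, not from the article), consider a scalar mapping that keeps η inside (0, 1/2):

```python
import numpy as np
from scipy.special import expit

def eta_of_x(x):                       # eta = f(x) = 0.5 * logistic(x), in (0, 1/2)
    return 0.5 * expit(x)

def deta_dx(x):                        # Jacobian D[f](x)
    return 0.5 * expit(x) * (1.0 - expit(x))

# If score_eta is dL_T/deta evaluated at eta_of_x(x), the chain rule gives the
# score of the reparameterized log-likelihood with respect to x:
x, score_eta = -2.0, 3.1               # hypothetical values at the current iterate
print(score_eta * deta_dx(x))
```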
2.3 The Score Vector

Let s_t(φ) denote the score function ∂l_t(φ)/∂φ, and partition it into two blocks, s_θt(φ) and s_ηt(φ), whose dimensions conform to those of θ and η. Given that
$$\frac{\partial d_t(\theta)}{\partial\theta'} = -\frac{1}{2}\,\mathrm{vec}'\bigl[\Sigma_t^{-1}(\theta)\bigr]\frac{\partial\,\mathrm{vec}[\Sigma_t(\theta)]}{\partial\theta'}$$
and
$$\frac{\partial g[\varsigma_t(\theta),\eta]}{\partial\theta'} = \frac{\partial g[\varsigma_t(\theta),\eta]}{\partial\varsigma}\,\frac{\partial\varsigma_t(\theta)}{\partial\theta'} = -\frac{N\eta+1}{2[1-2\eta+\eta\varsigma_t(\theta)]}\,\frac{\partial\varsigma_t(\theta)}{\partial\theta'},$$
where
$$\frac{\partial\varsigma_t(\theta)}{\partial\theta'} = -2\varepsilon_t'(\theta)\Sigma_t^{-1}(\theta)\frac{\partial\mu_t(\theta)}{\partial\theta'} - \mathrm{vec}'\bigl[\Sigma_t^{-1}(\theta)\varepsilon_t(\theta)\varepsilon_t'(\theta)\Sigma_t^{-1}(\theta)\bigr]\frac{\partial\,\mathrm{vec}[\Sigma_t(\theta)]}{\partial\theta'},$$
we can immediately show that
$$s_{\theta t}(\phi) = \frac{\partial d_t(\theta)}{\partial\theta} + \frac{\partial g[\varsigma_t(\theta),\eta]}{\partial\theta}
= \frac{\partial\mu_t'(\theta)}{\partial\theta}\Sigma_t^{-1}(\theta)\,\frac{N\eta+1}{1-2\eta+\eta\varsigma_t(\theta)}\,\varepsilon_t(\theta)
+ \frac{1}{2}\frac{\partial\,\mathrm{vec}'[\Sigma_t(\theta)]}{\partial\theta}\bigl[\Sigma_t^{-1}(\theta)\otimes\Sigma_t^{-1}(\theta)\bigr]\,
\mathrm{vec}\!\left[\frac{N\eta+1}{1-2\eta+\eta\varsigma_t(\theta)}\,\varepsilon_t(\theta)\varepsilon_t'(\theta) - \Sigma_t(\theta)\right], \qquad (1)$$
where the Jacobian matrices ∂μ_t(θ)/∂θ′ and ∂vec[Σ_t(θ)]/∂θ′ depend on the particular specification adopted.
Similarly, it is straightforward to see that
$$s_{\eta t}(\phi) = \frac{\partial c(\eta)}{\partial\eta} + \frac{\partial g[\varsigma_t(\theta),\eta]}{\partial\eta},$$
which for η > 0 are given by
$$\frac{\partial c(\eta)}{\partial\eta} = \frac{N}{2\eta(1-2\eta)} - \frac{1}{2\eta^2}\left[\psi\!\left(\frac{N\eta+1}{2\eta}\right) - \psi\!\left(\frac{1}{2\eta}\right)\right] \qquad (2)$$
and
$$\frac{\partial g[\varsigma_t(\theta),\eta]}{\partial\eta} = -\frac{N\eta+1}{2\eta(1-2\eta)}\,\frac{\varsigma_t(\theta)}{1-2\eta+\eta\varsigma_t(\theta)} + \frac{1}{2\eta^2}\ln\!\left[1 + \frac{\eta}{1-2\eta}\,\varsigma_t(\theta)\right], \qquad (3)$$
where ψ(x) = ∂ln Γ(x)/∂x is the so-called digamma function, or Gauss' psi function (Abramowitz and Stegun 1964), which can be computed using standard routines.
If we take limits as η → 0 from above, then we can once more show that s_θt(θ, 0) does indeed reduce to the multivariate normal expression of Bollerslev and Wooldridge (1992). Unfortunately, ∂g[ς_t(θ), η]/∂η and especially ∂c(η)/∂η are numerically unstable for small η. When N = 1, for instance, Figure 1 shows that the numerical accuracy in the computation of (2) is poor for small η, and eventually breaks down. In those cases, we recommend the evaluation of (2) and (3), which in the limit should be understood as right derivatives, by means of the (directional) Taylor expansions around η = 0 in Appendix B.
In practice, the log-likelihood score is often used not only as the input to a steepest ascent, Berndt, Hall, Hall, and Hausman (1974) (BHHH), or quasi-Newton numerical optimization routine, but also to estimate the asymptotic covariance matrix of the ML parameter estimators. Nevertheless, both of these uses could be problematic. First, the results of Fiorentini et al. (1996) and many others suggest that alternative gradient methods, such as scoring or Newton–Raphson, usually show much better convergence properties, particularly when the parameter values reach the neighborhood of the optimum. Similarly, it is well known that the outer-product-of-the-score standard errors and test statistics can be very badly behaved in finite samples, especially in dynamic models (Davidson and MacKinnon 1993). For both of these reasons, in the next two sections we follow Bollerslev and Wooldridge (1992) and derive analytic expressions for the Hessian matrix and its expected value conditional on z_t, I_{t−1}.
Figure 1. Derivative of c(η) With Respect to η and Taylor Expansion.
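The loss of accuracy documented in Figure 1 is easy to reproduce. The sketch below (ours) evaluates (2) directly with SciPy's digamma function and compares it with its analytical limit N(N + 2)/4 as η → 0+; for very small η the digamma difference suffers catastrophic cancellation, which is why the Taylor expansions of Appendix B are preferable there.

```python
import numpy as np
from scipy.special import digamma

def dc_deta(N, eta):
    """Direct evaluation of equation (2)."""
    return (N / (2 * eta * (1 - 2 * eta))
            - (digamma((N * eta + 1) / (2 * eta)) - digamma(1 / (2 * eta))) / (2 * eta ** 2))

N = 1
for eta in (1e-2, 1e-4, 1e-6, 1e-8):
    print(f"eta = {eta:.0e}:  direct = {dc_deta(N, eta):.6f},  limit = {N * (N + 2) / 4:.6f}")
```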


2.4 The Hessian Matrix

Let h_t(φ) denote the Hessian function ∂²l_t(φ)/∂φ∂φ′, and partition it into four blocks, h_θθt(φ), h_θηt(φ) (= h′_ηθt(φ)), and h_ηηt(φ), whose row and column dimensions conform to those of θ and η.
Start with the first block, given by
$$h_{\theta\theta t}(\phi) = \frac{\partial^2 d_t(\theta)}{\partial\theta\,\partial\theta'} + \frac{\partial^2 g[\varsigma_t(\theta),\eta]}{\partial\theta\,\partial\theta'}.$$
It is then straightforward to see that
$$\frac{\partial^2 d_t(\theta)}{\partial\theta\,\partial\theta'} = \frac{1}{2}\frac{\partial\,\mathrm{vec}'[\Sigma_t(\theta)]}{\partial\theta}\bigl[\Sigma_t^{-1}(\theta)\otimes\Sigma_t^{-1}(\theta)\bigr]\frac{\partial\,\mathrm{vec}[\Sigma_t(\theta)]}{\partial\theta'}
- \frac{1}{2}\bigl\{\mathrm{vec}'\bigl[\Sigma_t^{-1}(\theta)\bigr]\otimes I_p\bigr\}\frac{\partial\,\mathrm{vec}\{\partial\,\mathrm{vec}'[\Sigma_t(\theta)]/\partial\theta\}}{\partial\theta'}.$$
In addition, again using the chain rule for Hessian matrices, we obtain
$$\frac{\partial^2 g[\varsigma_t(\theta),\eta]}{\partial\theta\,\partial\theta'} = \frac{\partial^2 g[\varsigma_t(\theta),\eta]}{\partial\varsigma^2}\,\frac{\partial\varsigma_t(\theta)}{\partial\theta}\frac{\partial\varsigma_t(\theta)}{\partial\theta'} + \frac{\partial g[\varsigma_t(\theta),\eta]}{\partial\varsigma}\,\frac{\partial^2\varsigma_t(\theta)}{\partial\theta\,\partial\theta'},$$
where
$$\frac{\partial^2 g[\varsigma_t(\theta),\eta]}{\partial\varsigma^2} = \frac{(N\eta+1)\,\eta}{2[1-2\eta+\eta\varsigma_t(\theta)]^2}.$$
We can then show that
$$\frac{\partial^2\varsigma_t(\theta)}{\partial\theta\,\partial\theta'} = 2\frac{\partial\mu_t'(\theta)}{\partial\theta}\Sigma_t^{-1}(\theta)\frac{\partial\mu_t(\theta)}{\partial\theta'}
+ 2\frac{\partial\,\mathrm{vec}'[\Sigma_t(\theta)]}{\partial\theta}\bigl[\Sigma_t^{-1}(\theta)\otimes\Sigma_t^{-1}(\theta)\varepsilon_t(\theta)\varepsilon_t'(\theta)\Sigma_t^{-1}(\theta)\bigr]\frac{\partial\,\mathrm{vec}[\Sigma_t(\theta)]}{\partial\theta'}
+ 2\frac{\partial\mu_t'(\theta)}{\partial\theta}\bigl[\varepsilon_t'(\theta)\Sigma_t^{-1}(\theta)\otimes\Sigma_t^{-1}(\theta)\bigr]\frac{\partial\,\mathrm{vec}[\Sigma_t(\theta)]}{\partial\theta'}
+ 2\frac{\partial\,\mathrm{vec}'[\Sigma_t(\theta)]}{\partial\theta}\bigl[\Sigma_t^{-1}(\theta)\varepsilon_t(\theta)\otimes\Sigma_t^{-1}(\theta)\bigr]\frac{\partial\mu_t(\theta)}{\partial\theta'}
- 2\bigl[\varepsilon_t'(\theta)\Sigma_t^{-1}(\theta)\otimes I_p\bigr]\frac{\partial\,\mathrm{vec}\{\partial\mu_t'(\theta)/\partial\theta\}}{\partial\theta'}
- \bigl\{\mathrm{vec}'\bigl[\Sigma_t^{-1}(\theta)\varepsilon_t(\theta)\varepsilon_t'(\theta)\Sigma_t^{-1}(\theta)\bigr]\otimes I_p\bigr\}\frac{\partial\,\mathrm{vec}\{\partial\,\mathrm{vec}'[\Sigma_t(\theta)]/\partial\theta\}}{\partial\theta'}.$$
In addition,
$$h_{\theta\eta t}(\phi) = \frac{\partial^2 g(\phi)}{\partial\theta\,\partial\eta} = \frac{N+2-\varsigma_t(\theta)}{[1-2\eta+\eta\varsigma_t(\theta)]^2}\left\{\frac{\partial\mu_t'(\theta)}{\partial\theta}\Sigma_t^{-1}(\theta)\varepsilon_t(\theta) + \frac{1}{2}\frac{\partial\,\mathrm{vec}'[\Sigma_t(\theta)]}{\partial\theta}\bigl[\Sigma_t^{-1}(\theta)\otimes\Sigma_t^{-1}(\theta)\bigr]\mathrm{vec}\bigl[\varepsilon_t(\theta)\varepsilon_t'(\theta)\bigr]\right\}.$$
Similarly, for η > 0,
$$\frac{\partial^2 c(\eta)}{\partial\eta^2} = \frac{N}{2}\,\frac{4\eta-1}{\eta^2(1-2\eta)^2} + \frac{1}{\eta^3}\left[\psi\!\left(\frac{N\eta+1}{2\eta}\right) - \psi\!\left(\frac{1}{2\eta}\right)\right] + \frac{1}{4\eta^4}\left[\psi'\!\left(\frac{N\eta+1}{2\eta}\right) - \psi'\!\left(\frac{1}{2\eta}\right)\right], \qquad (4)$$
where ψ′(x) = ∂²ln Γ(x)/∂x² is the so-called "trigamma function" (Abramowitz and Stegun 1964). Finally, we have that when η > 0,
$$\frac{\partial^2 g[\varsigma_t(\theta),\eta]}{\partial\eta^2} = -\frac{1}{\eta^3}\ln\!\left[1 + \frac{\eta}{1-2\eta}\,\varsigma_t(\theta)\right] + \frac{\varsigma_t(\theta)}{\eta^2[1-2\eta+\eta\varsigma_t(\theta)](1-2\eta)}
- \frac{N\eta+1}{2\eta}\,\frac{4(1-2\eta)\varsigma_t(\theta) + (4\eta-1)\varsigma_t^2(\theta)}{(1-2\eta)^2[1-2\eta+\eta\varsigma_t(\theta)]^2}. \qquad (5)$$
Again, both (4) and (5) can be numerically unstable when η is small. Hence we also recommend evaluating them in those cases by means of the (directional) Taylor expansions in Appendix B.

2.5 The Conditional Information Matrix

Given correct specification, the results of Crowder (1976) imply that the score vector s_t(φ) evaluated at the true parameter values has the martingale difference property. His results also imply that under suitable regularity conditions, the asymptotic distribution of the ML estimator will be given by the expression
$$\sqrt{T}\,(\hat\phi_T - \phi_0) \rightarrow N\bigl[0, I^{-1}(\phi_0)\bigr], \qquad (6)$$
where
$$I(\phi_0) = \operatorname*{plim}_{T\to\infty}\frac{1}{T}\sum_{t=1}^T I_t(\phi_0)$$
and
$$I_t(\phi_0) = V[s_t(\phi_0)|z_t, I_{t-1}; \phi_0] = -E[h_t(\phi_0)|z_t, I_{t-1}; \phi_0].$$
In this respect, we show the following result in Appendix A.

Proposition 1.
$$I_t(\phi) = \begin{pmatrix} I_{\theta\theta t}(\phi) & I_{\theta\eta t}(\phi) \\ I_{\theta\eta t}'(\phi) & I_{\eta\eta t}(\phi) \end{pmatrix},$$
$$I_{\theta\theta t}(\phi) = \frac{\nu(N+\nu)}{(\nu-2)(N+\nu+2)}\frac{\partial\mu_t'(\theta)}{\partial\theta}\Sigma_t^{-1}(\theta)\frac{\partial\mu_t(\theta)}{\partial\theta'}
+ \frac{N+\nu}{2(N+\nu+2)}\frac{\partial\,\mathrm{vec}'[\Sigma_t(\theta)]}{\partial\theta}\bigl[\Sigma_t^{-1}(\theta)\otimes\Sigma_t^{-1}(\theta)\bigr]\frac{\partial\,\mathrm{vec}[\Sigma_t(\theta)]}{\partial\theta'}
- \frac{1}{2(N+\nu+2)}\frac{\partial\,\mathrm{vec}'[\Sigma_t(\theta)]}{\partial\theta}\,\mathrm{vec}\bigl[\Sigma_t^{-1}(\theta)\bigr]\,\mathrm{vec}'\bigl[\Sigma_t^{-1}(\theta)\bigr]\frac{\partial\,\mathrm{vec}[\Sigma_t(\theta)]}{\partial\theta'},$$


$$I_{\theta\eta t}(\phi) = -\frac{(N+2)\nu^2}{(\nu-2)(N+\nu)(N+\nu+2)}\,\frac{\partial\,\mathrm{vec}'[\Sigma_t(\theta)]}{\partial\theta}\,\mathrm{vec}\bigl[\Sigma_t^{-1}(\theta)\bigr],$$
and
$$I_{\eta\eta t}(\phi) = \frac{\nu^4}{4}\left[\psi'\!\left(\frac{\nu}{2}\right) - \psi'\!\left(\frac{N+\nu}{2}\right)\right] - \frac{N\nu^4[\nu^2 + N(\nu-4) - 8]}{2(\nu-2)^2(N+\nu)(N+\nu+2)}.$$

It is important to note that unlike the Hessian, the foregoing expressions require only first derivatives of the conditional mean and variance functions. (See Fiorentini et al. 1996 for the required derivatives in univariate linear regression models with GARCH(p, q) innovations, and Sentana (in press) for analogous expressions in multivariate conditionally heteroscedastic in mean factor models.) It also shares with the outer-product-of-the-score formula the convenient property of being positive semidefinite, usually with full rank.
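For completeness, the following sketch (ours, not the authors' code) evaluates the (θ, θ) block of Proposition 1 for user-supplied Jacobians of μ_t(θ) and vec[Σ_t(θ)]; the argument names are our own.

```python
import numpy as np

def info_theta_theta(Sigma, dmu, dvecSigma, nu):
    """I_{theta theta, t}(phi): dmu is N x p, dvecSigma is N^2 x p, and nu = 1/eta > 2."""
    N = Sigma.shape[0]
    Sigma_inv = np.linalg.inv(Sigma)
    a = nu * (N + nu) / ((nu - 2) * (N + nu + 2))
    b = (N + nu) / (2 * (N + nu + 2))
    c = 1.0 / (2 * (N + nu + 2))
    v = Sigma_inv.reshape(-1, 1, order="F")                      # vec(Sigma^{-1})
    term1 = a * dmu.T @ Sigma_inv @ dmu
    term2 = b * dvecSigma.T @ np.kron(Sigma_inv, Sigma_inv) @ dvecSigma
    term3 = c * (dvecSigma.T @ v) @ (v.T @ dvecSigma)
    return term1 + term2 - term3
```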
3. A LAGRANGE MULTIPLIER TEST FOR MULTIVARIATE NORMALITY

3.1 Implementation Details

We can easily compute an LM (or efficient score) test for multivariate normality versus multivariate t-distributed innovations on the basis of the value of the score of the log-likelihood function evaluated at the restricted parameter estimates φ̃_T = (θ̃′_T, 0)′. To do so, it is necessary to find the value of s_ηt(θ, 0). In this sense, we can use the results in Appendix B to prove that
$$s_{\eta t}(\theta, 0) = \frac{N(N+2)}{4} - \frac{N+2}{2}\,\varsigma_t(\theta) + \frac{1}{4}\,\varsigma_t^2(\theta),$$
where s_ηt(θ, 0) should be understood as a directional derivative. In addition, it turns out that the information matrix is block-diagonal between θ and η when η_0 = 0, which means that as far as θ is concerned, there is no asymptotic efficiency loss in estimating η in that case. This finding can be stated more formally.

Proposition 2. If η_0 = 0, then
$$V[s_{\phi t}(\theta_0, 0)|\theta_0, 0] = \begin{pmatrix} V[s_{\theta t}(\theta_0, 0)|\theta_0, 0] & 0 \\ 0' & N(N+2)/2 \end{pmatrix},$$
where V[s_θt(θ_0, 0)|θ_0, 0] = −E[h_θθt(θ_0, 0)|θ_0, 0].

Therefore, we can compute the information matrix version of the LM test, LM^I_{2T}(θ̃_T) say, as the square of
$$\tau_T^I(\tilde\theta_T) = \frac{T^{-1/2}\sum_t s_{\eta t}(\tilde\theta_T, 0)}{\sqrt{N(N+2)/2}}, \qquad (7)$$
which, importantly, depends only on the first two sample moments of ς_t(θ̃_T). Note also that the block-diagonality of the information matrix implies that a joint LM test of multivariate normality and any other restrictions on the conditional mean and variance parameters θ can be decomposed into two additive components, the first of which would be precisely our proposed test (Bera and McKenzie 1987). If H_0: η = 0 is true, then LM^I_{2T}(θ̃_T) will have an asymptotic chi-squared distribution with 1 degree of freedom. The limiting distribution can be obtained directly from (7) by combining the block-diagonality of the information matrix under the null with the following result.

Proposition 3. If ε*_t|z_t, I_{t−1} ∼ iid t(0, I_N, ν_0) with ν_0 > 8, then
$$\frac{1}{\sqrt{T}}\sum_t \frac{s_{\eta t}(\theta_0, 0) - E[s_{\eta t}(\theta_0, 0)|\theta_0, \nu_0]}{V^{1/2}[s_{\eta t}(\theta_0, 0)|\theta_0, \nu_0]} \;\xrightarrow{d}\; N(0, 1),$$
where
$$E[s_{\eta t}(\theta_0, 0)|\theta_0, \nu_0] = \frac{N(N+2)}{4}\left(\frac{\nu_0 - 2}{\nu_0 - 4} - 1\right)$$
and
$$E\bigl[s_{\eta t}^2(\theta_0, 0)|\theta_0, \nu_0\bigr] = -\frac{3N^2(N+2)^2}{16} + \frac{N(N+2)^2(3N+4)}{8}\,\frac{\nu_0-2}{\nu_0-4}
- \frac{N(N+2)^2(N+4)}{4}\,\frac{(\nu_0-2)^2}{(\nu_0-4)(\nu_0-6)} + \frac{N(N+2)(N+4)(N+6)}{16}\,\frac{(\nu_0-2)^3}{(\nu_0-4)(\nu_0-6)(\nu_0-8)}.$$

Two asymptotically equivalent tests, under both the null and local alternatives, are given by (a) the usual outer product version of the LM test, LM^O_{2T}(θ̃_T), which can be computed as T times the uncentered R² from the regression of 1 on s_ηt(θ̃_T, 0), and (b) its Hessian version,
$$LM_{2T}^H(\tilde\theta_T) = \frac{\bigl\{T^{-1/2}\sum_t s_{\eta t}(\tilde\theta_T, 0)\bigr\}^2}{-T^{-1}\sum_t h_{\eta\eta t}(\tilde\theta_T, 0)}, \qquad (8)$$
with
$$h_{\eta\eta t}(\theta, 0) = -\frac{N(N+2)(N-5)}{6} - (4+2N)\,\varsigma_t(\theta) + \frac{N+4}{2}\,\varsigma_t^2(\theta) - \frac{1}{3}\,\varsigma_t^3(\theta).$$

Given that the numerators of those three LM tests coincide, whereas the denominators of LM^O_{2T}(θ̃_T) and LM^H_{2T}(θ̃_T) converge in probability to the denominator of LM^I_{2T}(θ̃_T), which contains no stochastic terms, we would expect a priori that LM^I_{2T}(θ̃_T) would be the version of the test with the smallest size distortions, followed by LM^H_{2T}(θ̃_T), whose denominator involves the first three sample moments of ς_t(θ), and finally LM^O_{2T}(θ̃_T), whose calculation also requires its fourth sample moment (Davidson and MacKinnon 1983).
It is important to mention that the fact that η = 0 lies at the boundary of the admissible parameter space invalidates the usual χ²₁ distribution of the likelihood ratio (LR) and Wald (W) tests, which under the null will be more concentrated toward the origin (see Andrews 2001 and references therein, as well as the simulation evidence in Bollerslev 1987). The intuition can perhaps be more easily obtained in terms of the W test.
Given that η̂_T cannot be negative, √T η̂_T will have a half-normal asymptotic distribution under the null (Andrews 1999). As a result, the W test will be an equally weighted mixture of a chi-squared distribution with 0 degrees of freedom (by convention, χ²₀ is a degenerate random variable that equals 0 with probability 1) and a chi-squared distribution with 1 degree of freedom. In practice, obviously, we simply need to compare the t statistic √(TN(N+2)/2)·η̂_T with the appropriate one-sided critical value from the normal tables. For analogous reasons, the asymptotic distribution of the LR test will also be degenerate half of the time, and a chi-squared distribution with 1 degree of freedom the other half.
Although the foregoing argument does not invalidate the distribution of the LM test statistic, intuition suggests that the one-sided nature of the alternative hypothesis should be taken into account to obtain a more powerful test. For that reason, we also propose a simple one-sided version of the LM test for multivariate normality. In particular, because E[s_ηt(θ_0, 0)|φ_0] > 0 when η_0 > 0, in view of Proposition 3, we suggest using
$$LM_{1T}^I(\tilde\theta_T) = \bigl\{\max\bigl[\tau_T^I(\tilde\theta_T), 0\bigr]\bigr\}^2$$
as our one-sided LM test, and comparing it with the same 50 : 50 mixture of chi-squared distributions with 0 and 1 degrees of freedom. In this context, we would reject H_0 at the 100κ% significance level if the average score with respect to η evaluated at the Gaussian PML estimators φ̃_T = (θ̃′_T, 0)′ is positive and LM^I_{1T}(θ̃_T) exceeds the 100(1 − 2κ) percentile of a χ²₁ distribution. Because the Kuhn–Tucker (KT) multiplier associated with the inequality restriction η ≥ 0 is equal to max[−T^{-1}Σ_t s_ηt(θ̃_T, 0), 0], our proposed one-sided LM test is equivalent to the KT multiplier test introduced by Gourieroux, Holly, and Monfort (1980), which in turn is equivalent in large samples to the LR and W tests. As we argued earlier, the reason is that those tests are implicitly one-sided in our context. In this respect, it is important to mention that when there is a single restriction, such as in our case, those one-sided tests are asymptotically locally more powerful (Andrews 2001).
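The two-sided and one-sided versions of the test are straightforward to compute from the squared norms ς_t of the standardized residuals. The sketch below (ours, not the authors' code) implements the information matrix, Hessian, and Kuhn–Tucker versions discussed above:

```python
import numpy as np
from scipy.stats import chi2

def lm_normality_tests(varsigma, N):
    """LM tests of eta = 0 against eta > 0 from varsigma_t = ||eps*_t||^2."""
    varsigma = np.asarray(varsigma, dtype=float)
    T = varsigma.size
    s_eta = N * (N + 2) / 4 - (N + 2) / 2 * varsigma + varsigma ** 2 / 4   # s_{eta t}(theta, 0)
    tau = s_eta.sum() / np.sqrt(T) / np.sqrt(N * (N + 2) / 2)              # eq. (7)
    lm2_info = tau ** 2                                                    # two-sided, chi2(1) null
    lm1_info = max(tau, 0.0) ** 2                                          # one-sided (Kuhn-Tucker)
    h_eta = (-N * (N + 2) * (N - 5) / 6 - (4 + 2 * N) * varsigma
             + (N + 4) / 2 * varsigma ** 2 - varsigma ** 3 / 3)            # h_{eta eta t}(theta, 0)
    lm2_hess = (s_eta.sum() / np.sqrt(T)) ** 2 / (-h_eta.mean())           # eq. (8)
    p_two = chi2.sf(lm2_info, df=1)
    p_one = 0.5 * chi2.sf(lm1_info, df=1) if tau > 0 else 1.0              # 50:50 chi2(0)/chi2(1) mixture
    return {"LM2_I": lm2_info, "LM2_H": lm2_hess, "LM1_I": lm1_info,
            "p_two_sided": p_two, "p_one_sided": p_one}
```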
Nevertheless, it is still interesting to compare the power properties of the one-sided and two-sided LM statistics. But given that the block-diagonality of the information matrix is generally lost under the alternative of η_0 > 0, and its exact form is unknown, we can obtain closed-form expressions only for the case in which the standardized innovations, ε*_t, are directly observed. In more realistic cases, though, the results are likely to be qualitatively similar. On the basis of Proposition 3, we can obtain the asymptotic power of the one-sided and two-sided variants of the information matrix version of the LM test for any possible significance level, κ. The results at the usual 5% level are plotted in Figure 2 for η_0 in the range 0 ≤ η_0 ≤ .04, that is, ν_0 ≥ 25. Not surprisingly, the power of both tests uniformly increases with the sample size T for a fixed alternative, and as we depart from the null for a given sample size. Importantly, their power also increases with the number of series, N. As expected, the one-sided test is more powerful than the usual two-sided one. The difference is particularly noticeable for small departures from the null, which is precisely when power is generally low. For instance, when ν_0 = 100, T = 500, and N = 10, the power of the one-sided test is almost 60%, whereas the power of its two-sided counterpart is less than 50% [see Fig. 2(c)].

Figure 2. Power of the LM Test at α = 5% and (a) T = 100, (b) T = 250, and (c) T = 500 (solid line, one-sided; dotted line, two-sided).

Similarly, the one-sided tests for N = 1 and
N = 5 are initially more powerful than the two-sided tests for N = 2 and N = 10. However, as η_0 approaches 1/8 from below, the one-sided test loses power for fixed N and T, and eventually the two-sided test becomes more powerful. This is due to the fact that the variance of the score goes to infinity as ν_0 → 8 from Proposition 3.
Although in view of Lemma A.1 in Appendix A, our proposed LM test can be regarded as a test of whether ς_t(θ_0) is χ²_N against the alternative that it is proportional to an F_{N,ν_0}, it is also possible to reinterpret (7) as a specification test of the restriction on the first two moments of ς_t(θ_0) implicit in E[s_ηt(θ_0, 0)|φ_0] = 0. More specifically,
$$E\left[\frac{N(N+2)}{4} - \frac{N+2}{2}\,\varsigma_t(\theta) + \frac{1}{4}\,\varsigma_t^2(\theta)\,\Big|\,\phi_0\right] = 0. \qquad (9)$$
Hence the two-sided version has nontrivial power against any spherically symmetric distribution for which s_ηt(θ_0, 0) has expected value different from 0 (see Theorem A.1 in App. A for a characterization of spherically symmetric distributions). For instance, if we consider the extreme case in which the true standardized disturbances were in fact uniformly distributed on the unit sphere surface in R^N, so that ς_t(θ_0) = N for all t, then s_ηt(θ_0, 0) = −N/2 < 0, which means that we would reject the null hypothesis with probability approaching 1 as T goes to infinity. On the other hand, the one-sided LM test has power only for the leptokurtic subclass of spherically symmetric distributions. Nevertheless, as we discuss in Section 5, standardized residuals are frequently leptokurtic and rarely platykurtic in practice.
3.2 Relationship With Existing Kurtosis Tests

Following Mardia (1970), we can define the population coefficient of multivariate excess kurtosis as
$$\kappa = \frac{E[\varsigma_t^2(\theta_0)]}{N(N+2)} - 1, \qquad (10)$$
which equals 2/(ν_0 − 4) for the multivariate t distribution, as well as its sample counterpart,
$$\bar\kappa_T(\theta) = \frac{T^{-1}\sum_{t=1}^T \varsigma_t^2(\theta)}{N(N+2)} - 1. \qquad (11)$$
On this basis, we can write τ_T^I(θ̃_T) in (7) as
$$\sqrt{\frac{TN(N+2)}{8}}\left\{\bar\kappa_T(\tilde\theta_T) - \frac{2}{NT}\sum_{t=1}^T\bigl[\varsigma_t(\tilde\theta_T) - N\bigr]\right\}.$$
If we ignored the term
$$\frac{1}{\sqrt{T}}\sum_{t=1}^T\bigl[\varsigma_t(\tilde\theta_T) - N\bigr], \qquad (12)$$
then (7) would coincide with the kurtosis component of Mardia's (1970) test for multivariate normality, which in turn reduces to the popular Jarque and Bera (1980) test in the univariate case.

Hence, given that if $T^{-1}\sum_{t=1}^T \varepsilon_t^*(\tilde\theta_T)\varepsilon_t^{*\prime}(\tilde\theta_T) = I_N$, then (12) is identically 0, it follows from (1) that their tests are numerically identical to ours in nonlinear regression models with conditionally homoscedastic disturbances estimated by Gaussian PML, in which the covariance matrix of the innovations, Σ, is unrestricted and does not affect the conditional mean, and the conditional mean parameters, say δ, and the elements of vec(Σ) are variation free. However, ignoring (12) in more general contexts may lead to size distortions, because it is precisely the inclusion of such a term that makes s_ηt(θ_0, 0) orthogonal to the other elements of the score. The same point was forcefully made by Davidson and MacKinnon (1993) in a univariate context in section 16.7 of their textbook, and, not surprisingly, their suggested test for excess kurtosis turns out to be equal to the outer product version of our LM test. Similarly, the term (12) also appears explicitly in the Kiefer and Salmon (1983) LM test for univariate excess kurtosis based on a Hermite polynomial expansion of the density, which coincides in their context with the information matrix version of our test (7). (See Hall 1990 for an extension to models in which the higher-order moments depend on the information set.)
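Numerically, the relation between (7) and Mardia's kurtosis component amounts to the single adjustment term (12); the following sketch (ours) makes the correspondence explicit:

```python
import numpy as np

def kurtosis_stats(varsigma, N):
    """Return Mardia's kurtosis component and the LM statistic tau in eq. (7)."""
    varsigma = np.asarray(varsigma, dtype=float)
    T = varsigma.size
    kappa_bar = (varsigma ** 2).mean() / (N * (N + 2)) - 1.0            # eq. (11)
    mardia = np.sqrt(T * N * (N + 2) / 8.0) * kappa_bar                 # Mardia's kurtosis statistic
    adjustment = (varsigma - N).sum() / np.sqrt(T)                      # term (12)
    tau = mardia - np.sqrt((N + 2) / (2.0 * N)) * adjustment            # equals eq. (7)
    return mardia, tau
```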
Nevertheless, exclusion of the additional term (12) does not necessarily lead to asymptotic size distortions. In particular, there will be no size distortions if (12) is o_p(1). A necessary and sufficient condition for this to happen is that ς_t(θ_0) − N can be written as an exact, time-invariant, linear combination of s_θt(θ_0, 0) (Fiorentini, Sentana, and Calzolari 2003). Given that such a condition involves a rather complicated system of nonlinear differential equations, it is not possible to explicitly characterize which models for μ_t(θ) and Σ_t(θ) will satisfy it, so we have to proceed on a model-by-model basis. In this respect, Fiorentini et al. (2003) established that the condition is indeed satisfied for the family of GARCH-M models analyzed by Hentschel (1995). Nevertheless, it is possible to find examples of other ARCH models in which the aforementioned condition is not satisfied (e.g., the variant of the EGARCH model proposed in Barndorff-Nielsen and Shephard 2001, chap. 13). Therefore, the conclusion to draw from the foregoing analysis is that even though the asymptotic size of the tests commonly used by practitioners is often correct, it is safer to use the LM test (7), because its limiting null distribution never depends on the particular parameterization used, and the additional computational cost is negligible.
Finally, several authors have recently suggested alternative multivariate generalizations of the Jarque–Bera test, which, as far as kurtosis is concerned, consist of adding up the univariate kurtosis tests for each element of ε*_t(θ̃_T) (see Lütkepohl 1993; Doornik and Hansen 1994; Kilian and Demiroglu 2000). But apart from the issue discussed in the previous paragraphs, another potential shortcoming of those tests is that they are not invariant to the way in which the residuals ε_t(θ̃_T) are orthogonalized to obtain ε*_t(θ̃_T). For instance, whereas Doornik and Hansen (1994) obtained Σ_t^{1/2}(θ̃_T) from the spectral decomposition of Σ_t(θ̃_T), the other authors used a Cholesky decomposition. In this respect, note that by implicitly assuming that the excess kurtosis is the same for all possible linear combinations of the true standardized innovations ε*_t, we obtain a test statistic that is numerically invariant to orthogonal rotations of Σ_t^{1/2}(θ̃_T) (see also Mardia 1970). If ε*_t were directly observed, then the relative power of the two testing procedures would depend on the exact nature of the alternative hypothesis.

Given that the ε*_it's are independent across i = 1, …, N under the null, the situation is completely analogous to the comparison between the one-sided tests for ARCH(q) of Lee and King (1993) and Demos and Sentana (1998). In particular, if we define κ_i = E(ε*⁴_it/3) − 1 for i = 1, …, N, then our test would be more powerful against alternatives close to κ_i = κ for all i, whereas the additive test would have more power when the κ_i's were rather dispersed.

4. A MONTE CARLO COMPARISON OF ALTERNATIVE ESTIMATION PROCEDURES AND STANDARD ERROR ESTIMATORS
In this section we assess the performance of two alternative ways of obtaining ML estimates of φ, and three common ways of estimating the corresponding standard errors. The first estimation procedure uses the following mixed approach. Initially, we use a scoring algorithm with a fairly large tolerance criterion; then, after "convergence" is achieved, we switch to a Newton–Raphson algorithm to refine the solution. Both stages are implemented by means of the NAG Fortran 77 Mark 19 library E04LBF routine (see Numerical Algorithms Group 2001 for details), with the analytic expressions for s_t(φ), I_t(φ), and h_t(φ) derived in Section 2. The second procedure, in contrast, uses a quasi-Newton algorithm that computes the score on the basis of finite difference procedures, as implemented by the NAG E04JYF routine. Importantly, both routines allow for fixed upper and lower bounds on the elements of φ. In this respect, we should mention that when η is close to 0, the quasi-Newton algorithm that uses only function values sometimes fails to converge. For that reason, and in accordance with standard practice, we set η̂_T to 0 whenever its estimate falls below a minimum threshold, η_min. In particular, we follow Microfit 4.0 in choosing η_min = .04 (Pesaran and Pesaran 1997, p. 457). Then we maximize a Gaussian pseudo-log-likelihood function using the combined scoring plus Newton–Raphson algorithm explained earlier.
As for the estimators of the asymptotic covariance matrix of the ML parameter estimators, we consider the three standard approaches: outer product of the gradient (OPG), Hessian (H), and conditional information (CI) matrix. To replicate what an empirical researcher would do in practice, however, we do not use numerical expressions when analytic expressions are used in the optimization algorithm, and vice versa. Therefore, we end up with five different combinations of estimators and standard errors.
We assess the performance of these combinations through an extensive Monte Carlo analysis, with an experimental design borrowed from Bollerslev and Wooldridge (1992). Specifically, the model that we simulate and estimate is given by the equations
$$y_t = \mu_t(\delta_0) + \sigma_t(\delta_0, \gamma_0)\,\varepsilon_t^*,$$
$$\mu_t(\delta) = \upsilon + \rho y_{t-1},$$
$$\sigma_t^2(\delta, \gamma) = \vartheta + \alpha[y_{t-1} - \mu_{t-1}(\delta)]^2 + \beta\sigma_{t-1}^2(\delta, \gamma),$$
and
$$\varepsilon_t^*|I_{t-1} \sim \text{iid } t(0, 1, \nu_0),$$
where δ′ = (υ, ρ), γ′ = (ϑ, α, β), υ_0 = 1, ρ_0 = .5, ϑ_0 = .05, α_0 = .15, and β_0 = .8. As for η_0, we consider three different values: 0, .04, and .1, which correspond to the Gaussian limit and two Student t's with 25 and 10 degrees of freedom.
Given the large number of parameters involved, we summarize the performance of the estimates of the asymptotic covariance matrix of the estimators by computing the experimental distribution of a very simple W test statistic. In particular, the null hypothesis that we test is that all six parameters are equal to their true values. When η_0 > 0, the asymptotic distribution of such a test will be χ²₆. In contrast, when η_0 = 0, it follows from the discussion in Section 3.1 that its asymptotic distribution will be a 50 : 50 mixture of χ²₅ and χ²₆. Our results, which are based on 10,000 samples of 1,000 observations each, are summarized in Figure 3 using Davidson and MacKinnon's (1998) p value discrepancy plots, which show the difference between actual and nominal test sizes for every possible nominal size. As expected, the CI standard errors seem to be the most reliable, followed by the H-based ones, and finally the OPG versions, which tend to show the largest size distortions. In addition, there is a marked difference between numeric and analytic expressions. In Figure 3(a), for instance, the performance of the numerical H standard errors is as distorted as the performance of the analytic OPG ones. But the most striking difference arises when η_0 = .04 [see Fig. 3(b)]. In this case, the two numerical approaches lead to much larger size distortions. This is due to two different reasons. First, the loss of accuracy in the computation of first and second derivatives by relative numerical increments of the log-likelihood function can be substantial when η is small, as illustrated in Figure 4 for a randomly selected replication. But our setting η to 0 whenever η ≤ η_min has an even stronger impact. As Figure 5 illustrates, if we reduce η_min from .04 to .00001, then the behavior of numerical and analytical methods is more in line. However, the problem with using such a small value of η_min is that convergence failures occur much more frequently. On the basis of these results, our practical recommendation would be to use the mixed optimization algorithm described earlier with analytical derivatives, and compute standard errors with the formulas for the conditional information matrix in Proposition 1.
5. AN EMPIRICAL APPLICATION TO U.K. STOCK RETURNS

In this section we investigate the practical performance of the procedures discussed earlier. To do so, we substantially extend the analysis of Sentana (1991), who considered both Gaussian and t distributions in his empirical characterization of multivariate leverage effects by means of a conditionally heteroscedastic latent factor model for the monthly excess returns on 26 U.K. sectorial indices for the period 1971:2–1990:10 (237 observations), with a GQARCH(1, 1) parameterization for the common factor and a constant diagonal covariance matrix for the idiosyncratic terms. To concentrate on the modeling of the second- and higher-order moments of the conditional distribution of returns, all the data were demeaned before estimation. Nevertheless, a more explicit modeling of the mean has little impact on the remaining parameters (Sentana 1995).

Figure 4. Second Derivative of Log-Likelihood With Respect to η; T = 1,000 (solid line, numerical; dashed line, analytical).

Specifically, the model that Sentana initially estimated by Gaussian PML is
$$y_t = c f_t + w_t,$$
$$\begin{pmatrix} f_t \\ w_t \end{pmatrix}\bigg|\, y_{t-1}, y_{t-2}, \ldots \sim N\!\left[\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \lambda_t & 0' \\ 0 & \Gamma \end{pmatrix}\right],$$
$$\lambda_t = \vartheta + \alpha\bigl[(f_{t-1|t-1} - \upsilon)^2 + \omega_{t-1|t-1}\bigr] + \beta\lambda_{t-1},$$
where y_t is the vector of returns, c is the vector of factor loadings, Γ is the diagonal matrix of idiosyncratic variances, f_{t|t} = ω_{t|t} c′Γ^{−1}y_t is the Kalman filter–based estimate of the latent factor, and ω_{t|t} = [λ_t^{−1} + (c′Γ^{−1}c)]^{−1} is the corresponding conditional mean squared error. Note that λ_t differs from a standard GQARCH(1, 1) specification in that the unobserved factors are replaced by their expected value f_{t−1|t−1}, and the term ω_{t−1|t−1} is included to reflect the uncertainty in the factor estimates (Harvey et al. 1992).
Figure 3. P-Value Discrepancy Plot for Wald Test φ = φ_0; T = 1,000; Rep. = 10,000 (long dashes, numerical OPG; short dashes, numerical Hessian; dash-dot, analytical OPG; solid line, analytical Hessian; circles, information matrix).

Figure 5. P-Value Discrepancy Plot for Wald Test φ = φ_0; T = 1,000; Rep. = 10,000 (long dashes, numerical OPG; short dashes, numerical Hessian; dash-dot, analytical OPG; solid line, analytical Hessian; circles, information matrix).


Table 1. Estimates of a Conditionally Heteroscedastic Single-Factor Model for 26 U.K. Sectorial Indices. Monthly Excess Returns 1971:2–1990:10 (237 observations). Estimates of Dynamic Variance Parameters and Degrees of Freedom

λ_t = (1 − α − β)(1 − ϱ²) + α[(f_{t−1|t−1} − √((1 − α − β)/α)·ϱ)² + ω_{t−1|t−1}] + βλ_{t−1},  0 ≤ β ≤ 1 − α ≤ 1,  −1 ≤ ϱ ≤ 1

                   Gaussian              Student t
Parameter          Estimate    SE        Estimate    SE
α                  .111        .075      .053        .026
β                  .670        .258      .675        .120
ϱ                  .951        .629      1.0         —
η                  0           —         .103        .012

Log-likelihood     −4,471.216            −4,221.162

NOTE: f_{t|t} denotes the Kalman filter estimate of the latent factor, and ω_{t|t} denotes the associated conditional mean squared error (Harvey et al. 1992). Standard errors (SE) are computed using analytical derivatives based on the expressions of Bollerslev and Wooldridge (1992) in the Gaussian case, and Proposition 1 in the case of the t.

We solved the usual scale indeterminacy of the factor by fixing E(λ_t) = 1. To do so, we set ϑ = (1 − α − β) − αυ² and υ = √((1 − α − β)/α)·ϱ, and estimated the model subject to the inequality constraints 0 ≤ β ≤ 1 − α ≤ 1 and −1 ≤ ϱ ≤ 1, which also ensure that λ_t ≥ 0 for all t. PML estimates for α, β, ϱ, and η are given in Table 1, together with Bollerslev and Wooldridge (1992) robust standard errors calculated with analytical derivatives. On the basis of those estimates, we generated the time series of squared Euclidean norms of the standardized innovations, ς_t(θ̃_T), and computed the information matrix version of the LM tests for multivariate normality described in Section 3. Because τ_T^I(θ̃_T) equals 54.43, we can easily reject the null hypothesis regardless of whether we use a one-sided or a two-sided critical value, which suggests that it is worth estimating the same model with the Student t. Given that θ̃_T is a consistent estimator, we used it as initial values for θ. As for η, we used .106, which is the value of a consistent, two-stage method-of-moments estimator obtained from the sample coefficient of excess kurtosis of the standardized residuals, κ̄_T(θ̃_T), by exploiting the theoretical relationship η = κ/(4κ + 2). ML parameter estimates, together with standard errors based on Proposition 1, are also reported in Table 1. Apart from the marked improvement in fit, as measured by the increase in the likelihood function and the decrease in standard errors, and the fact that the estimated ϱ now lies at the boundary of the admissible parameter space, the most noticeable difference is the drastic reduction in the parameter α, which measures the immediate effect of shocks to the level of the conditional variance, and the slight increase in the parameter β, which measures the rate at which the impact of those shocks decays over time.
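The two-stage starting value for η mentioned above is easy to reproduce; the following sketch (ours) inverts the theoretical relationship η = κ/(4κ + 2) at the sample excess-kurtosis coefficient:

```python
import numpy as np

def eta_from_kurtosis(varsigma, N):
    """Method-of-moments starting value for eta from varsigma_t = ||eps*_t||^2."""
    kappa_bar = np.mean(np.asarray(varsigma, dtype=float) ** 2) / (N * (N + 2)) - 1.0
    kappa_bar = max(kappa_bar, 0.0)                 # only leptokurtosis maps into eta >= 0
    return kappa_bar / (4.0 * kappa_bar + 2.0)
```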
To compare the two models from a graphical perspective, we have estimated the conditional standard deviations that they generate for an equally weighted portfolio. In this respect, note that the conditional variance of ι′y_t/N implied by our single-factor model is (c′ι/N)²λ_t + ι′Γι/N², where ι is an N × 1 vector of 1s. Although the correlation between both series is high (97.6%), the results depicted in Figure 6 indicate that the t distribution tends to produce less extreme values for λ_t. This is particularly true around the two most significant episodes in the sample: the October 1987 crash (a 23.6% drop in stock prices) and the January 1975 bounce-back (a 51.5% surge).

Figure 6. Estimated Conditional Standard Deviation of Equally Weighted Portfolio (solid line, Gaussian; dashed line, Student t).

As mentioned in Section 1, one of the reasons for using the t distribution is to compute the quantiles of the one-period-ahead predictive distributions of portfolio returns required in value-at-risk calculations. To determine to what extent the t is more useful than the normal in this respect across all conceivable quantiles, we have computed the empirical cumulative distribution function of the probability integral transforms of the equally weighted portfolio returns generated by the two fitted distributions (see Diebold, Gunther, and Tay 1998).
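In the same spirit, the one-period-ahead value-at-risk quantiles implied by the two fitted models can be obtained directly from the location-scale structure of the predictive distribution. A minimal sketch (ours; the function and argument names are hypothetical):

```python
import numpy as np
from scipy.stats import norm, t as student_t

def predictive_quantile(m_t, s2_t, prob=0.05, eta=0.0):
    """prob-quantile of m_t + sqrt(s2_t) * eps*, with eps* Gaussian or standardized t."""
    if eta > 0:
        nu = 1.0 / eta
        q = student_t.ppf(prob, df=nu) * np.sqrt((nu - 2.0) / nu)   # standardized t quantile
    else:
        q = norm.ppf(prob)
    return m_t + np.sqrt(s2_t) * q    # the value at risk is minus this return threshold
```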
Figure 7 shows the difference between those two cumulative distributions and the 45-degree line. Under correct specification, those differences should tend to 0 asymptotically. Unfortunately, a size-corrected version of the usual Kolmogorov-type test that takes into account the sample uncertainty in the estimates of the underlying parameters is rather difficult to obtain in this case. Nevertheless, the graph clearly suggests that the multivariate t distribution does indeed provide a better fit than the normal, especially in the tails.

Figure 7. Discrepancy Plot of the Empirical Cumulative Distribution Function of the Probability Integral Transform of Returns on Equally Weighted Portfolio (solid line, Gaussian; dashed line, Student t).


In this respect, it is important to emphasize that the estimating criterion is multivariate, and not targeted to this particular portfolio. The observed differences are due partly to the fact that the t distribution has both fatter tails than the normal and more density around its mean. However, this cannot be the only reason, for a standardized univariate t distribution with 9.71 degrees of freedom and a standard normal share not only the median, but also the 3.6 and 96.4 percentiles. The other reason for the differing results is the differences in estimated volatilities plotted in Figure 6.

6. CONCLUSIONS

In the context of the general multivariate dynamic regression
model with time-varying variances and covariances considered
by Bollerslev and Wooldridge (1992), our main contributions
are as follows:
1. We provide numerically reliable analytical expressions
for the score vector, the Hessian matrix, and its conditional expected value when the distribution of the innovations is assumed to be proportional to a multivariate t.
2. We conduct a detailed Monte Carlo experiment in which
we demonstrate that a mixed scoring Newton–Raphson
algorithm with analytical derivatives constitutes the best
way to maximize the log-likelihood function. In addition,
we show that our analytic expressions for the conditional
information matrix provide the most reliable standard errors.
3. We derive an LM test for multivariate normal versus multivariate t innovations, and relate it to the kurtosis component of the traditional tests proposed by Mardia (1970)
and Jarque and Bera (1980). Because the limiting null distribution of our proposed LM test is correct regardless of
the model