Journal of Business & Economic Statistics

ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20

Evaluating the Calibration of Multi-Step-Ahead
Density Forecasts Using Raw Moments
Malte Knüppel
To cite this article: Malte Knüppel (2015) Evaluating the Calibration of Multi-Step-Ahead
Density Forecasts Using Raw Moments, Journal of Business & Economic Statistics, 33:2,
270-281, DOI: 10.1080/07350015.2014.948175
To link to this article: http://dx.doi.org/10.1080/07350015.2014.948175

Accepted author version posted online: 31 Jul 2014.


Evaluating the Calibration of Multi-Step-Ahead
Density Forecasts Using Raw Moments
Malte KNÜPPEL

Deutsche Bundesbank, Wilhelm-Epstein-Str. 14, D-60431 Frankfurt am Main, Germany
(malte.knueppel@bundesbank.de)
The evaluation of multi-step-ahead density forecasts is complicated by the serial correlation of the corresponding probability integral transforms. In the literature, three testing approaches can be found that
take this problem into account. However, these approaches rely on data-dependent critical values, ignore
important information and therefore lack power, or suffer from size distortions even asymptotically. This
article proposes a new testing approach based on raw moments. It is extremely easy to implement, uses
standard critical values, can include all moments regarded as important, and has correct asymptotic size.
It is found to have good size and power properties in finite samples if it is based on the (standardized)
probability integral transforms.
KEY WORDS: Density forecast evaluation; Moment test; Normality test; Probability integral transformation.

1. INTRODUCTION

Today, predictions are often made in the form of density forecasts. Tay and Wallis (2000) gave a survey of the use of density
forecasts in macroeconomics and finance. Like point forecasts,
density forecasts should be evaluated to investigate whether
they are specified correctly. Point forecasts, for example, can
be tested for bias. Density forecasts, in general, are tested for
calibration. Correct calibration means that the density forecast
coincides with the true density of the predicted variable.
This work is concerned with the question of how an evaluation
of density forecasts can be conducted if the probability integral
transforms (henceforth PITs) are serially correlated. The PIT
is the probability of observing a value smaller than or equal
to the actual outcome according to the forecast density. Serial
correlation of the PITs is a typical feature of multi-step-ahead
forecasts.
If the density forecasts are calibrated correctly, the PITs are
uniformly distributed over the interval (0, 1), as noted by Dawid
(1984), Diebold, Gunther, and Tay (1998), and Diebold, Tay,
and Wallis (1999). The original idea for this evaluation approach
dates back to Rosenblatt (1952). If the PITs are independent,
they can be used directly for testing the calibration of density
forecasts, employing, for example, the Kolmogorov–Smirnov
test. Applying an inverse normal transformation to the PITs
yields, in the case of correctly calibrated density forecasts, a
variable with standard normal distribution (henceforth the INTs,
i.e., the inverse normal transforms). This second transformation
was proposed by Smith (1985) and Berkowitz (2001).
For one-step-ahead forecasts, the PITs (and the INTs), in addition to uniformity (to standard normality), should display independence. In the words of Mitchell and Wallis (2011), if both
conditions are fulfilled, the density forecasts are completely calibrated. The likelihood ratio test proposed by Berkowitz (2001)
can be applied to the INTs to test simultaneously for zero mean,
unit variance, and zero autocorrelation based on a first-order autoregressive model (henceforth AR(1)-model) for the INTs. For
multi-step-ahead mean forecasts, even optimal forecasts produce serially correlated forecast errors, and the same holds for
completely calibrated density forecasts, which produce serially
correlated PITs and INTs. The evaluation of multi-step-ahead
forecasts found in the literature therefore mostly focuses on
correct calibration only. Basically, three approaches can be distinguished.
One approach, proposed by Corradi and Swanson (2006a)
and Rossi and Sekhposyan (2014), uses Kolmogorov-type or
Cramér-von-Mises-type tests that account for the serial correlation of the data. However, for these tests, critical values
are data dependent. Another approach rests on normality tests
for the INTs which are valid in the presence of serial correlation. Mitchell and Wallis (2011) mentioned the skewness- and
kurtosis-based normality tests proposed by Bai and Ng (2005).
Corradi and Swanson (2006b) also suggested, inter alia, the tests
proposed by Bai and Ng (2005), and related GMM-type tests
introduced by Bontemps and Meddahi (2005, 2012). The tests
of Bai and Ng (2005) were employed by D’Agostino, Gambetti,
and Giannone (2013) for the evaluation of their density forecasts. Finally, in several applications like those by Clements
(2004), Mitchell and Hall (2005), Jore, Mitchell, and Vahey
(2010), Bache et al. (2011), and Aastveit et al. (2011), one finds
a variant of the test by Berkowitz (2001) adapted to the case of
serially correlated INTs. Instead of testing for zero mean, unit
variance and zero autocorrelation, only the first two hypotheses
enter the test. Thus, no restriction is placed on the autoregressive
coefficient of the AR(1)-model.
Unfortunately, each of the approaches mentioned has certain disadvantages. As stated above, the tests by Corradi and
Swanson (2006a) and Rossi and Sekhposyan (2014) rely on
data-dependent critical values, which might be a serious impediment for their use by practitioners. Concerning the normality
tests proposed above, none of them was originally derived to

© 2015 American Statistical Association
Journal of Business & Economic Statistics
April 2015, Vol. 33, No. 2
DOI: 10.1080/07350015.2014.948175
Color versions of one or more of the figures in the article can be
found online at www.tandfonline.com/r/jbes.

evaluate density forecasts. Therefore, these tests are based on
skewness and kurtosis, but ignore the information contained
in first and second moments. Since the INTs have a standard
normal distribution under the null hypothesis of correct calibration, large power gains could be achieved by considering those
moments. Finally, the test by Berkowitz (2001) is based on the
assumption of an AR(1)-process. If this assumption is incorrect,
the standard critical values are not valid, so that the test does
not have the correct asymptotic size. Moreover, for this test,
information from higher-order moments is not employed. As
in the case of the normality tests, the evaluation of multi-step-ahead forecasts is not the intended use of the test by Berkowitz
(2001). Apparently, the tests mentioned have been applied due
to the lack of simple tests specifically designed for this task. The
raw-moments tests proposed in this work are intended to help
close this gap. They do not suffer from any of the disadvantages
mentioned, as they use standard critical values, can employ all
moments regarded as important, and have correct asymptotic
size.

The effects of estimation uncertainty for the parameters of
the forecasting model on the evaluation of density forecasts are
not addressed in this work. Put differently, the tests presented
here are designed for density forecasts which take the parameter uncertainty of the underlying model properly into account,
or for density forecasts from models with negligible parameter uncertainty. Moreover, the results of Rossi and Sekhposyan
(2014) imply that density calibration tests, if they are based on
the PITs or INTs, are valid for the evaluation of density forecasts
at the estimated parameter values of the forecasting model, if
the model is estimated under a rolling or fixed scheme. If the
densities are to be evaluated at the pseudotrue parameters of
the forecasting model, moment-based calibration tests can be
modified accordingly as shown in Chen (2011).
The tests proposed in this work can also be used to test for
correct calibration of one-step-ahead forecasts. In this case the
tests are robust to serial correlation of the PITs, whereas the
commonly used tests would suffer from size distortions.
2. CALIBRATION TESTS BASED ON RAW MOMENTS


Let the continuous random variable of interest be denoted
by $x_t$ and the forecast density for this variable in period $t$ by
$\hat{f}(x_t)$, where the forecast was made in period $t - h$, and $h$ is
a positive integer. Many of the methods used for producing
density forecasts can be found in the references mentioned in
Section 1. The PIT proposed by Rosenblatt (1952) is given by

\[ u_t = \hat{F}(x_t) = \int_{-\infty}^{x_t} \hat{f}(q)\, dq, \]

where $\hat{F}(x_t)$ denotes the forecast distribution function associated with $\hat{f}(x_t)$. If the forecast density $\hat{f}(x_t)$ is equal to the true
density $g(x_t)$, then $u_t$ is uniformly distributed over the interval
(0, 1) (henceforth referred to as $U(0, 1)$ distributed). The INT
proposed by Smith (1985) and Berkowitz (2001) is given by

\[ z_t = \Phi^{-1}(u_t) = \Phi^{-1}(\hat{F}(x_t)), \]

where $\Phi^{-1}(\cdot)$ is the inverse of the standard normal distribution function. Under the null of correct calibration, $z_t$ has a
standard normal distribution. I will proceed under the common
assumption that $z_t$ follows a Gaussian process under the null.
However, it should be noted that there are special nonlinear processes where the marginal distribution of $z_t$ with $t = 1, 2, \ldots$
is standard normal, although the joint distribution is nonnormal. Tsyplakov (2011) described a strategy for generating such
sequences of $z_t$.
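
The two transformations can be illustrated with a minimal sketch. It assumes, purely for illustration, a Gaussian forecast density with made-up parameters and made-up outcomes; the helper names `pit` and `inverse_normal_transform` are hypothetical and not taken from the article. For a Gaussian forecast the INT reduces to the standardized outcome, which gives a simple check.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)

def pit(x, forecast):
    """PIT u_t = F_hat(x_t): forecast probability of a value <= the outcome x_t."""
    return forecast.cdf(x)

def inverse_normal_transform(u):
    """INT z_t = Phi^{-1}(u_t), the standard normal quantile of the PIT."""
    return STD_NORMAL.inv_cdf(u)

# Hypothetical outcomes and a hypothetical N(0.5, 1.2^2) forecast density.
outcomes = [-1.3, 0.2, 0.9, 2.1]
forecast = NormalDist(mu=0.5, sigma=1.2)

pits = [pit(x, forecast) for x in outcomes]
ints = [inverse_normal_transform(u) for u in pits]

for x, u, z in zip(outcomes, pits, ints):
    # With a Gaussian forecast, z equals (x - mu) / sigma up to rounding.
    print(f"x={x:5.2f}  PIT={u:.4f}  INT={z:.4f}")
```

For non-Gaussian forecast densities only the `forecast.cdf` call would change; the inverse normal step stays the same.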
To test for correct calibration when the PITs are serially correlated, practitioners have often used a variant of the test proposed by Berkowitz (2001), which was apparently first applied
by Clements (2004). It is a likelihood-ratio test for the zero-mean and unit-variance property of zt , where zt is assumed to
follow an AR(1)-process. This test will be referred to as the β̂12
test. The other existing test that will be employed in this work is
the µ̂34 test by Bai and Ng (2005) which is based on the skewness and kurtosis of zt , using an estimated long-run covariance
matrix.
The major complications when testing skewness and kurtosis
arise from the fact that the expectation and the variance are
unknown and, thus, have to be estimated. Therefore, a four-dimensional covariance matrix is needed for the µ̂34 test, which
is a joint test of only two moments. When testing for standard
normality, however, the expectation and the variance are also
known under the null. Therefore, one does not need to consider
standardized moments like skewness and kurtosis. It is not even
necessary to employ central moments like the variance. Instead,
nonstandardized, noncentral moments, that is, the raw moments,
can be employed, so that tests can be constructed very easily.
Moreover, raw moments can be estimated unbiasedly in small
samples.
Actually, the raw-moments tests do not have to be based on
the standard normal distribution, but any suitable transformation
of the PITs can be used. Denote the transformed variables by

\[ y_t = H(u_t), \]

where $H(u_t)$ is a real-valued function, and $H(u_t) = \Phi^{-1}(u_t)$
yields standard normally distributed variables $y_t = z_t$ under the
null. Assuming that $E[|y_t^r|] < \infty$, let the $r$th raw moment of $y_t$
be denoted by

\[ m_r = E[y_t^r] \]

with $r \in \mathbb{N}^+$. Define the vector of the $N$ empirical raw
moments of interest as $(\hat{m}_{r_1}, \hat{m}_{r_2}, \ldots, \hat{m}_{r_N})'$ with $r_i < r_{i+1}$
and $i = 1, 2, \ldots, N - 1$, and the vector of the corresponding expected raw moments of $y_t = H(u_t)$ under the null as
$(m_{r_1}, m_{r_2}, \ldots, m_{r_N})'$. Then the vector $\hat{D}_{r_1 r_2 \ldots r_N}$, denoting the difference between both vectors mentioned, is given by

\[ \hat{D}_{r_1 r_2 \ldots r_N} = \begin{bmatrix} \hat{m}_{r_1} - m_{r_1} \\ \hat{m}_{r_2} - m_{r_2} \\ \vdots \\ \hat{m}_{r_N} - m_{r_N} \end{bmatrix}, \]

where $\hat{m}_{r_i}$ equals the sample mean, $\hat{m}_{r_i} = \frac{1}{T} \sum_{t=1}^{T} y_t^{r_i}$ for $i = 1, 2, \ldots, N$, with $T$ denoting the sample size. Denoting the
long-run variance of $y_t^{r_i} - m_{r_i}$ by $\sigma^2_{r_i}$,

\[ \sqrt{T}\,(\hat{m}_{r_i} - m_{r_i}) \xrightarrow{d} N(0, \sigma^2_{r_i}) \]

under the null, if the conditions $E[m_{2r_i}] < \infty$ and
$\sum_{i=0}^{\infty} |E(z_t z_{t-i})| < \infty$, where the latter refers to the INTs, are

fulfilled, as shown by Sun (1965) and Breuer and Major (1983).
Thus, if the latter condition and the condition $E[m_{2r_N}] < \infty$ are
fulfilled, every element of $\sqrt{T}\,\hat{D}_{r_1 r_2 \ldots r_N}$ is asymptotically normally distributed, because $E[m_{2r_N}] < \infty$ implies that all moments of lower order are finite as well (see, e.g., Billingsley
1995, p. 274). From the Cramér-Wold device, it then follows
that $\sqrt{T}\,\hat{D}_{r_1 r_2 \ldots r_N}$ converges to a multivariate normal distribution, that is,

\[ \sqrt{T}\,\hat{D}_{r_1 r_2 \ldots r_N} \xrightarrow{d} N(0, \Omega_{r_1 r_2 \ldots r_N}), \]

where $\Omega_{r_1 r_2 \ldots r_N}$ is the long-run covariance matrix of the vector
series

\[ d_t = \begin{bmatrix} y_t^{r_1} - m_{r_1} \\ y_t^{r_2} - m_{r_2} \\ \vdots \\ y_t^{r_N} - m_{r_N} \end{bmatrix}. \]

Thus, a test of the distributional assumption for $y_t$ can be based
on the statistic

\[ \hat\alpha_{r_1 r_2 \ldots r_N} = T\, \hat{D}'_{r_1 r_2 \ldots r_N}\, \hat{\Omega}^{-1}_{r_1 r_2 \ldots r_N}\, \hat{D}_{r_1 r_2 \ldots r_N}, \tag{1} \]

where $\hat{\Omega}_{r_1 r_2 \ldots r_N}$ is a consistent estimator of $\Omega_{r_1 r_2 \ldots r_N}$. For the test
statistic,

\[ \hat\alpha_{r_1 r_2 \ldots r_N} \xrightarrow{d} \chi^2(N) \]

under the null.
If the transformed variables $y_t = H(u_t)$ have a density that
is symmetric around 0, and if at least one odd and one even raw
moment are considered, there is an alternative approach that,
asymptotically, leads to the same results as the tests described
above, but behaves differently in small samples. This approach
is based on the fact that the long-run covariance of $y_t^{r_i} - m_{r_i}$ and
$y_t^{r_j} - m_{r_j}$ equals 0 if $y_t$ is symmetrically distributed around 0 and
if $r_i + r_j$ is odd. A proof of this property is given in Appendix
A. Obviously, since $\hat{m}_{r_i}$ and $\hat{m}_{r_j}$ are asymptotically normal, they
are asymptotically independent if they are uncorrelated.
Based on this property, one can construct an alternative test
statistic $\hat\alpha^0_{r_1 r_2 \ldots r_N}$ as the sum of two test statistics

\[ \hat\alpha^0_{r_1 r_2 \ldots r_N} = \hat\alpha^{\mathrm{odd}}_{r_1 r_2 \ldots r_N} + \hat\alpha^{\mathrm{even}}_{r_1 r_2 \ldots r_N}, \tag{2} \]

where $\hat\alpha^{\mathrm{odd}}_{r_1 r_2 \ldots r_N}$ uses all $N_1$ odd raw moments and $\hat\alpha^{\mathrm{even}}_{r_1 r_2 \ldots r_N}$ all
$N_2$ even raw moments from the set of sample moments $\{\hat{m}_{r_i}\}$
which are considered for the test, so that $N = N_1 + N_2$. $\hat\alpha^{\mathrm{odd}}_{r_1 r_2 \ldots r_N}$
and $\hat\alpha^{\mathrm{even}}_{r_1 r_2 \ldots r_N}$ are calculated in the same way as the test statistic $\hat\alpha_{r_1 r_2 \ldots r_N}$ in (1), but only using the odd and even moments,
respectively. Under the null,

\[ \hat\alpha^0_{r_1 r_2 \ldots r_N} \xrightarrow{d} \chi^2(N) \]

because $\hat\alpha^{\mathrm{odd}}_{r_1 r_2 \ldots r_N} \xrightarrow{d} \chi^2(N_1)$, $\hat\alpha^{\mathrm{even}}_{r_1 r_2 \ldots r_N} \xrightarrow{d} \chi^2(N_2)$, and
$\hat\alpha^{\mathrm{odd}}_{r_1 r_2 \ldots r_N}$ and $\hat\alpha^{\mathrm{even}}_{r_1 r_2 \ldots r_N}$ are asymptotically independent.
Concerning the choice of the transformation $H(u_t)$, natural
candidates are given by the INTs and (a standardized version of)
the PITs. Tests based on the INTs, however, are found to suffer
from large size distortions in small samples, especially if raw
moments of order four or higher are considered. This is because
the fourth raw moment, like the sample kurtosis, is strongly
positively skewed in small samples. Moreover, sample skewness
and sample kurtosis of normal variables are uncorrelated, but
strongly dependent in small samples, and the same holds for
the third and the fourth raw moments. For more details see, for
example, Doornik and Hansen (2008) and the references therein.
Therefore, this work focuses on the standardized PITs (henceforth S-PITs). They are obtained as

\[ y_t = \sqrt{12} \left( u_t - \frac{1}{2} \right). \]

Under the null, the S-PIT is $U(-\sqrt{3}, \sqrt{3})$ distributed. Thus,
it is a standard uniformly distributed random variable, that is,
a uniformly distributed variable with an expectation of 0 and
a variance of 1. Its skewness and kurtosis equal 0 and 1.8,
respectively. The density of $y_t$ is given by

\[ f(y_t) = \begin{cases} \frac{1}{\sqrt{12}} & -\sqrt{3} \le y_t \le \sqrt{3} \\ 0 & \text{else} \end{cases} \]

under the null. Otherwise, $f(y_t)$ will differ from this functional
form, but positive values of the density will continue to be
restricted to the interval $-\sqrt{3} \le y_t \le \sqrt{3}$.

3. MONTE CARLO SIMULATION SETUP

3.1 The Densities

To assess the size and power properties of the tests presented,
Monte Carlo simulations are used, where it is assumed that the
density of the variable

\[ x_t \sim N(0, 1) \tag{3} \]

is to be predicted. The $x_t$'s are identically, but not necessarily
independently, distributed. The density forecasts used will be
identical for each period $t$, so that the PITs will be serially
dependent if the $x_t$'s are serially dependent.
For the density forecasts, normal, two-piece-normal, Student’s t and normal mixture distributions are considered. The
normal distribution is employed to create correctly calibrated
density forecasts, or forecasts whose expectation or variance differ from the true values of 0 and 1, respectively. The two-piece
normal distribution is employed to construct density forecasts
with correct expectation and variance, but with incorrect skewness and kurtosis. To construct density forecasts with correct
expectation, variance, and skewness, but incorrect kurtosis, the
standardized Student’s t distribution is used. Finally, the normal
mixture distribution is set up such that its first four moments
are identical to those of a standard normal distribution while
the shapes of both densities differ markedly. In Appendix B, the
densities are described in detail.
Assuming normality of xt and nonnormality of the forecast
densities instead of the opposite (nonnormal xt and normal forecast densities) has the convenient implication that the unconditional distribution of the data, that is, of xt , is always normal
and does not depend on the serial correlation. However, the
applicability of the tests presented does not rely on any distributional assumption with respect to xt or the forecast density.
Actually, as follows from Wallis (2008), the subsequent simulation results would be identical if the simulated INTs were used
as realizations, and the forecast density was the standard normal

Table 1. Moments of misspecified forecast densities used in Monte Carlo simulations

Density           $\mu$    $\mu_2$  $s$      $k$      $m_1$    $m_2$    $m_3$    $m_4$
Normal            −0.50     1        0        3       −0.50     1.25    −1.63     4.56
Normal             0        1.50     0        3        0        2.25     0       15.19
Two-piece normal   0        1        0.73     3.41     0        1        0.73     3.41
Student’s t        0        1        0        9        0        1        0        9
Normal mixture     0        1        0        3        0        1        0        3

NOTE: µ denotes the expectation, µ2 the variance, s the skewness, k the kurtosis, mi the ith raw moment.

forecast density. To be more precise, the realizations $\tilde{x}_t$ would
be generated as

\[ \tilde{x}_t = \Phi^{-1}(F(x_t)) \]

with $x_t$ as defined in (3), and with $F(\cdot)$ being the distribution
function of the normal, two-piece-normal, Student’s t or normal
mixture distributions mentioned above. The forecast density
would be given by $\hat{f}(\tilde{x}_t) = \phi(\tilde{x}_t)$, where $\phi(\cdot)$ denotes the standard normal density. This approach would lead to results which
would be identical to those described in what follows.
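
The equivalence can be checked numerically with a short sketch. It assumes that the distribution function F is that of the misspecified normal density with expectation −0.5 and unit variance; the outcomes and the helper name `transformed_realization` are hypothetical. The PIT of an outcome under F coincides with the PIT of the transformed realization under the standard normal forecast, because the standard normal distribution function undoes the inverse normal transformation.

```python
from statistics import NormalDist

std_normal = NormalDist(0.0, 1.0)
# Assumption: F is the distribution function of the N(-0.5, 1) forecast density.
forecast = NormalDist(mu=-0.5, sigma=1.0)

def transformed_realization(x):
    """x_tilde = Phi^{-1}(F(x)), to be evaluated against a standard normal forecast."""
    return std_normal.inv_cdf(forecast.cdf(x))

xs = [-1.2, -0.1, 0.4, 1.7]  # hypothetical outcomes

pit_original = [forecast.cdf(x) for x in xs]
pit_transformed = [std_normal.cdf(transformed_realization(x)) for x in xs]

for a, b in zip(pit_original, pit_transformed):
    print(f"{a:.6f}  {b:.6f}")
```

Since the PITs agree, every statistic computed from them (S-PITs, INTs, raw moments) agrees as well.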

3.2 The Simulation Environment

An MA(1)-process is used to generate dependent standard
normal variables $x_t$, so that $x_t$ evolves according to

\[ x_t = \varepsilon_t + \rho\, \varepsilon_{t-1} \]

with $\varepsilon_t \sim \text{iid } N(0, (1 + \rho^2)^{-1})$ for $t = 1, 2, \ldots, T$. If the forecast density is standard normal, this process leads to $y_t$'s which
correspond to those of two-step-ahead density forecasts which
are, in the words of Mitchell and Wallis (2011), completely calibrated. That is, in addition to the fact that the density forecasts
are correctly calibrated, $y_t$ is independent of $y_{t-2}, y_{t-3}, \ldots$.

Figure 1. Misspecified forecast densities, the true standard normal densities, and the densities of the corresponding INTs (left column) and
S-PITs (right column).

Table 2. Actual sizes of tests

T     ρ    $\hat\beta_{12}$  $\hat\mu_{34}$  $\hat\alpha_1$  $\hat\alpha^0_{12}$  $\hat\alpha_{12}$  $\hat\alpha^0_{123}$  $\hat\alpha_{123}$  $\hat\alpha^0_{1234}$  $\hat\alpha_{1234}$

MA(1)-process
50    0.0  0.051             0.023           0.040           0.036                0.039              0.030                 0.041               0.034                  0.048
50    0.5  0.035             0.015           0.034           0.033                0.034              0.021                 0.023               0.030                  0.024
50    0.9  0.024             0.013           0.027           0.029                0.022              0.017                 0.011               0.026                  0.010
100   0.0  0.051             0.060           0.045           0.043                0.046              0.040                 0.048               0.044                  0.054
100   0.5  0.034             0.039           0.043           0.044                0.046              0.034                 0.043               0.041                  0.049
100   0.9  0.024             0.033           0.040           0.040                0.040              0.030                 0.035               0.038                  0.040
200   0.0  0.050             0.090           0.048           0.046                0.048              0.045                 0.049               0.046                  0.052
200   0.5  0.032             0.071           0.047           0.048                0.048              0.043                 0.048               0.046                  0.052
200   0.9  0.023             0.064           0.046           0.047                0.046              0.040                 0.045               0.044                  0.049
500   0.0  0.050             0.095           0.049           0.048                0.049              0.048                 0.050               0.049                  0.051
500   0.5  0.032             0.088           0.050           0.050                0.050              0.048                 0.050               0.049                  0.051
500   0.9  0.024             0.087           0.049           0.050                0.048              0.047                 0.048               0.048                  0.050
1000  0.0  0.050             0.084           0.049           0.049                0.049              0.049                 0.049               0.048                  0.050
1000  0.5  0.031             0.085           0.049           0.050                0.050              0.049                 0.049               0.049                  0.050
1000  0.9  0.023             0.085           0.050           0.051                0.050              0.048                 0.049               0.050                  0.051

AR(1)-process
50    0.0  0.050             0.023           0.040           0.036                0.039              0.029                 0.041               0.034                  0.048
50    0.5  0.057             0.012           0.039           0.040                0.046              0.024                 0.026               0.034                  0.025
50    0.9  0.094             0.002           0.001           0.018                0.000              0.006                 0.000               0.004                  0.000
100   0.0  0.051             0.059           0.045           0.043                0.046              0.040                 0.048               0.043                  0.054
100   0.5  0.054             0.029           0.052           0.052                0.063              0.038                 0.057               0.047                  0.065
100   0.9  0.075             0.002           0.007           0.045                0.014              0.026                 0.002               0.044                  0.000
200   0.0  0.051             0.091           0.048           0.047                0.048              0.045                 0.049               0.047                  0.053
200   0.5  0.052             0.057           0.056           0.056                0.061              0.047                 0.061               0.051                  0.068
200   0.9  0.063             0.006           0.033           0.055                0.053              0.037                 0.031               0.073                  0.032
500   0.0  0.050             0.095           0.049           0.049                0.050              0.048                 0.050               0.049                  0.051
500   0.5  0.051             0.083           0.056           0.057                0.058              0.052                 0.058               0.053                  0.062
500   0.9  0.056             0.018           0.057           0.064                0.081              0.047                 0.090               0.068                  0.116
1000  0.0  0.051             0.084           0.050           0.050                0.050              0.049                 0.050               0.050                  0.051
1000  0.5  0.050             0.082           0.055           0.056                0.057              0.054                 0.056               0.054                  0.058
1000  0.9  0.052             0.041           0.059           0.065                0.073              0.054                 0.084               0.065                  0.104

NOTE: Actual sizes when the nominal size equals 0.05. Raw-moments tests are based on S-PITs.

Moreover, an AR(1)-process is considered. In this case, $x_t$ is
determined by

\[ x_t = \rho\, x_{t-1} + \varepsilon_t \]

with $\varepsilon_t \sim \text{iid } N(0, 1 - \rho^2)$. The sample sizes $T$ considered are
50, 100, 200, 500, and 1000. The autoregressive and moving-average parameters $\rho$ take on the values 0, 0.5, and 0.9.
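
Both data-generating processes can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: the function names, the seed, and the sample length are arbitrary, and the AR(1) recursion is started from a draw of the stationary N(0, 1) distribution, which the text does not specify. The innovation variances are those given above, so that the unconditional variance of the simulated variable equals 1 in both cases.

```python
import math
import random

def simulate_ma1(T, rho, rng):
    """x_t = e_t + rho * e_{t-1} with Var(e_t) = 1 / (1 + rho^2), so Var(x_t) = 1."""
    sd = math.sqrt(1.0 / (1.0 + rho ** 2))
    eps = [rng.gauss(0.0, sd) for _ in range(T + 1)]
    return [eps[t] + rho * eps[t - 1] for t in range(1, T + 1)]

def simulate_ar1(T, rho, rng):
    """x_t = rho * x_{t-1} + e_t with Var(e_t) = 1 - rho^2, so Var(x_t) = 1."""
    sd = math.sqrt(1.0 - rho ** 2)
    x = [rng.gauss(0.0, 1.0)]  # assumption: start from the stationary N(0, 1) law
    for _ in range(T - 1):
        x.append(rho * x[-1] + rng.gauss(0.0, sd))
    return x

def second_moment(x):
    """Mean of x_t^2; equals the variance because the true mean is 0."""
    return sum(v * v for v in x) / len(x)

def lag1_autocorr(x):
    """First-order autocorrelation around the known mean of 0."""
    return sum(a * b for a, b in zip(x[1:], x[:-1])) / sum(v * v for v in x)

rng = random.Random(2015)  # arbitrary seed
x_ma = simulate_ma1(20_000, 0.5, rng)
x_ar = simulate_ar1(20_000, 0.5, rng)
print(second_moment(x_ma), lag1_autocorr(x_ma))  # theory: 1 and rho/(1+rho^2) = 0.4
print(second_moment(x_ar), lag1_autocorr(x_ar))  # theory: 1 and rho = 0.5
```

The MA(1) series is uncorrelated beyond the first lag, which is why it mimics two-step-ahead forecasts, whereas the AR(1) series remains correlated at all lags.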
The first misspecified normal forecast density considered has
an expectation of $\mu = -0.5$ and unit variance. The second misspecified normal forecast density has an expectation of 0, but
its standard deviation $\sqrt{\mu_2} = \sigma$ equals 3/2. The mean-mode
difference $\gamma$ of the following standardized two-piece normal
forecast density is equal to 0.8. The standardized density of the
t-distribution has 5 degrees of freedom. Finally, the standardized normal mixture density uses the parameter value $\sigma = 0.4$.
The moments of these forecast densities are given in Table 1.
The forecast densities, the corresponding densities of the INTs
and the S-PITs, and standard normal densities are displayed in
Figure 1. In the case of correctly calibrated density forecasts,
the density of the S-PITs would be flat and attain a value of
$1/\sqrt{12} \approx 0.3$.

The tests considered are the two standard tests employed in the
literature, that is, the $\hat\beta_{12}$ test and the $\hat\mu_{34}$ test, and various raw-moments tests based on $\hat\alpha_{r_1 r_2 \ldots r_N}$ and $\hat\alpha^0_{r_1 r_2 \ldots r_N}$. The parameters for
the $\hat\beta_{12}$ test are estimated by maximum likelihood. For the $\hat\mu_{34}$
and the raw-moments tests, the long-run covariance matrices are
estimated under the null. That is, the covariances are determined
without subtracting the estimated means of the vector series,
which have an expectation of 0 under the null. With this approach
we follow Bai and Ng (2005). Subtracting the empirical mean
would tend to increase the size distortions of the tests, but also
improve their power.
Concerning the raw-moments tests, the most parsimonious
test is only based on the first moment. Tests with power against
more types of density misspecification are obtained by consecutively adding higher moments. Wherever it is possible, both
test statistics, $\hat\alpha_{r_1 r_2 \ldots r_N}$ and $\hat\alpha^0_{r_1 r_2 \ldots r_N}$, are employed. The largest
moment order considered is 4. This yields the seven test statistics
$\hat\alpha_1$, $\hat\alpha^0_{12}$, $\hat\alpha_{12}$, $\hat\alpha^0_{123}$, $\hat\alpha_{123}$, $\hat\alpha^0_{1234}$, and $\hat\alpha_{1234}$. As suggested by
Andrews (1991), the quadratic spectral kernel is used for the
estimation of the long-run covariance matrix. The truncation
lag is also chosen according to Andrews (1991). Employing the
Bartlett kernel as in Newey and West (1987) only leads to minor
changes of the results.
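
A minimal sketch of the raw-moments statistics for S-PITs is given below. It is not the implementation used for the simulations: for brevity it swaps in the Bartlett kernel with a fixed, user-chosen truncation lag instead of the quadratic spectral kernel with the data-dependent lag of Andrews (1991), and all function names are hypothetical. As described above, the long-run covariance is estimated under the null, that is, without subtracting sample means.

```python
import math

def s_pit(u):
    """Standardized PIT: y_t = sqrt(12) * (u_t - 1/2)."""
    return math.sqrt(12.0) * (u - 0.5)

def uniform_raw_moment(r):
    """r-th raw moment of U(-sqrt(3), sqrt(3)): 0 for odd r, 3^(r/2)/(r+1) for even r."""
    return 0.0 if r % 2 else 3.0 ** (r / 2) / (r + 1)

def long_run_cov(d, lag):
    """Bartlett-kernel long-run covariance of the rows of d, without demeaning."""
    T, N = len(d), len(d[0])
    omega = [[0.0] * N for _ in range(N)]
    for k in range(-lag, lag + 1):
        w = 1.0 - abs(k) / (lag + 1.0)
        for t in range(max(0, k), min(T, T + k)):
            for i in range(N):
                for j in range(N):
                    omega[i][j] += w * d[t][i] * d[t - k][j] / T
    return omega

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def alpha_stat(y, orders, lag):
    """Statistic (1): T * D' Omega^{-1} D for the raw moments listed in `orders`."""
    T = len(y)
    d = [[yt ** r - uniform_raw_moment(r) for r in orders] for yt in y]
    D = [sum(row[i] for row in d) / T for i in range(len(orders))]
    return T * sum(Di * xi for Di, xi in zip(D, solve(long_run_cov(d, lag), D)))

def alpha0_stat(y, orders, lag):
    """Statistic (2): sum of the statistics for the odd and for the even moments."""
    odd = [r for r in orders if r % 2 == 1]
    even = [r for r in orders if r % 2 == 0]
    return alpha_stat(y, odd, lag) + alpha_stat(y, even, lag)

# Hypothetical PITs of a short forecast record.
pits = [0.62, 0.71, 0.55, 0.31, 0.18, 0.44, 0.80, 0.93, 0.67, 0.52, 0.29, 0.36]
y = [s_pit(u) for u in pits]
print(alpha_stat(y, [1, 2], lag=2), alpha0_stat(y, [1, 2], lag=2))
```

Each statistic would be compared with the quantile of the chi-squared distribution whose degrees of freedom equal the number of moments used, for example 5.99 at the 5% level for two moments.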

Table 3. Raw sample moments of S-PITs and sample moments of INTs for all forecast densities

           S-PITs                                          INTs
T     ρ    $\hat m_1$  $\hat m_2$  $\hat m_3$  $\hat m_4$  $\hat m_1$  $\hat\mu_2$  $\hat s$  $\hat k$

Standard normal forecast density
            0.00        1.00        0.00        1.80        0.00        1.00         0.00      3.00

Normal forecast density, µ = −0.5
50    0.0   0.48        1.13        0.96        2.19        0.50        1.00         0.00      2.88
50    0.9   0.48        1.14        0.95        2.20        0.49        0.71         0.00      2.52
1000  0.0   0.48        1.13        0.96        2.19        0.50        1.00         0.00      3.00
1000  0.9   0.48        1.13        0.96        2.19        0.50        0.98         0.00      2.95

Normal forecast density, σ = 3/2
50    0.0   0.00        0.60        0.00        0.76        0.00        0.44         0.00      2.88
50    0.9  −0.01        0.60        0.00        0.76        0.00        0.31         0.00      2.53
1000  0.0   0.00        0.60        0.00        0.76        0.00        0.44         0.00      3.00
1000  0.9   0.00        0.60        0.00        0.76        0.00        0.44         0.00      2.95

Two-piece normal forecast density, γ = 0.8
50    0.0   0.06        1.01       −0.09        1.90       −0.03        1.30        −0.93      4.33
50    0.9   0.06        1.02       −0.10        1.91       −0.02        0.93        −0.52      3.07
1000  0.0   0.07        1.02       −0.09        1.91       −0.03        1.30        −1.09      5.12
1000  0.9   0.06        1.02       −0.09        1.91       −0.03        1.28        −1.03      4.82

Standardized t-distributed forecast density, 5 degrees of freedom
50    0.0   0.00        1.14        0.00        2.13        0.00        1.11         0.00      2.29
50    0.9  −0.01        1.14       −0.01        2.13        0.02        0.78         0.00      2.34
1000  0.0   0.00        1.14        0.00        2.13        0.00        1.11         0.00      2.29
1000  0.9   0.00        1.14        0.00        2.13        0.00        1.09         0.00      2.29

Normal mixture forecast density, σ = 0.4
50    0.0   0.00        1.10        0.00        1.79        0.00        1.09         0.00      3.22
50    0.9   0.01        1.10        0.01        1.79        0.00        0.78        −0.01      2.63
1000  0.0   0.00        1.10        0.00        1.80        0.00        1.09         0.00      3.89
1000  0.9   0.00        1.10        0.00        1.80        0.00        1.08         0.00      3.66

NOTE: m̂i denotes mean of estimated ith raw moment in 10,000 simulations. µ̂2 , ŝ, and k̂ denote corresponding values for variance, skewness, and kurtosis, respectively. ρ denotes the
autoregressive coefficient.

To facilitate comparisons between the test statistics, the size-adjusted power of the tests will be reported. This requires a reasonably precise estimation of their actual sizes. Using 200,000
Monte Carlo simulations yields an accuracy that appears satisfactory for the given purpose, leading to a 95% confidence
interval for the actual size with a width of at most 0.002. The
critical value of the test statistics which is used for the power
computations is determined by the 95% quantile of the 200,000
test statistics computed under the null. For the power computations, the number of Monte Carlo simulations is set to 10,000,
corresponding to a width of at most about 0.01 for the 95%
confidence interval of the size-adjusted power.
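
The order of magnitude of these widths can be approximated with the usual normal approximation for a binomial proportion. The sketch below is an assumption about how such numbers may be obtained, not a description of the author's computation; the function name `ci_width` is hypothetical, and the code reports the full width of the interval (upper minus lower bound).

```python
import math

def ci_width(p, n, z=1.96):
    """Full width of the normal-approximation 95% interval for a rejection frequency p
    estimated from n independent Monte Carlo replications."""
    return 2.0 * z * math.sqrt(p * (1.0 - p) / n)

size_width = ci_width(0.05, 200_000)  # actual size close to the nominal 5%
power_width = ci_width(0.50, 10_000)  # p = 0.5 maximizes p * (1 - p)

print(f"size:  full width {size_width:.4f}, half-width {size_width / 2:.4f}")
print(f"power: full width {power_width:.4f}, half-width {power_width / 2:.4f}")
```

Under this approximation the full width is about 0.002 for a size of 5% with 200,000 replications, while for 10,000 replications and a power of 0.5 the half-width is about 0.01 and the full width about 0.02.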
4. SIMULATION RESULTS

4.1 Size
Given a nominal size of 5%, the actual sizes of the $\hat\beta_{12}$ test, the
$\hat\mu_{34}$ test, and the $\hat\alpha_{r_1 r_2 \ldots r_N}$ as well as the $\hat\alpha^0_{r_1 r_2 \ldots r_N}$ tests based on
the S-PITs are displayed in Table 2. The following statements
concerning the size distortions refer to the absolute differences
between the nominal and the actual size, unless otherwise mentioned.

The size distortions of the raw-moments tests based on the S-PITs are fairly contained. Often, they are considerably smaller
if the $\hat\alpha^0_{r_1 r_2 \ldots r_N}$ tests are used instead of the $\hat\alpha_{r_1 r_2 \ldots r_N}$ tests. In this
case, the largest negative size distortions are observed for the
case of 50 observations and strong persistence (i.e., in the case
of an AR(1)-process with ρ = 0.9) with actual sizes often being
below 1%. The largest positive size distortion of the $\hat\alpha^0_{r_1 r_2 \ldots r_N}$
tests is recorded for 200 observations and strong persistence,
where the $\hat\alpha^0_{1234}$ test has an actual size of 7.3%. In the case of an
MA(1)-process, the $\hat\alpha^0_{r_1 r_2 \ldots r_N}$ tests always perform well.
If the forecast variable follows an AR(1)-process with no or
only moderate persistence, in general, the $\hat\beta_{12}$ test yields the
smallest size distortions. In the smallest sample and with strong
persistence, however, even this test has an actual size of more
than 9%. Given an MA(1)-process, the $\hat\beta_{12}$ test suffers from
size distortions which do not vanish asymptotically. The $\hat\mu_{34}$
test suffers from notable size distortions in many situations. In
general, the smallest size distortions of the raw-moments tests
are obtained with the $\hat\alpha^0_{12}$ test. While the size distortions of the
$\hat\alpha_1$ test are often marginally smaller than those of the $\hat\alpha^0_{12}$ test, in
small samples with strong persistence it underrejects so strongly
that the $\hat\alpha^0_{12}$ test appears to be preferable. Since, in addition, the
$\hat\alpha_1$ test can be expected to have rather low power because it

Table 4. Size-adjusted power, normal forecast densities with µ = −0.5, σ = 1 and with µ = 0, σ = 3/2

           Normal density with µ = −0.5, σ = 1                                                                 Normal density with µ = 0, σ = 3/2
T     ρ    $\hat\beta_{12}$  $\hat\mu_{34}$  $\hat\alpha^0_{12}$  $\hat\alpha^0_{123}$  $\hat\alpha^0_{1234}$  $\hat\beta_{12}$  $\hat\mu_{34}$  $\hat\alpha^0_{12}$  $\hat\alpha^0_{123}$  $\hat\alpha^0_{1234}$

MA(1)-process
50    0.0  0.87              0.05            0.81                 0.71                  0.58                   0.93              0.05            0.88                 0.77                  0.51
50    0.5  0.57              0.05            0.42                 0.30                  0.17                   0.73              0.05            0.73                 0.65                  0.34
50    0.9  0.51              0.05            0.34                 0.23                  0.13                   0.65              0.04            0.64                 0.58                  0.27
100   0.0  0.99              0.05            0.99                 0.98                  0.97                   1.00              0.05            1.00                 1.00                  0.98
100   0.5  0.89              0.05            0.85                 0.77                  0.64                   0.99              0.05            0.99                 0.98                  0.92
100   0.9  0.85              0.05            0.79                 0.68                  0.52                   0.97              0.04            0.98                 0.96                  0.85
200   0.0  1.00              0.05            1.00                 1.00                  1.00                   1.00              0.05            1.00                 1.00                  1.00
200   0.5  1.00              0.05            1.00                 0.99                  0.98                   1.00              0.05            1.00                 1.00                  1.00
200   0.9  0.99              0.06            0.99                 0.98                  0.96                   1.00              0.04            1.00                 1.00                  1.00
500   0.0  1.00              0.05            1.00                 1.00                  1.00                   1.00              0.05            1.00                 1.00                  1.00
500   0.5  1.00              0.05            1.00                 1.00                  1.00                   1.00              0.05            1.00                 1.00                  1.00
500   0.9  1.00              0.05            1.00                 1.00                  1.00                   1.00              0.05            1.00                 1.00                  1.00
1000  0.0  1.00              0.05            1.00                 1.00                  1.00                   1.00              0.05            1.00                 1.00                  1.00
1000  0.5  1.00              0.05            1.00                 1.00                  1.00                   1.00              0.05            1.00                 1.00                  1.00
1000  0.9  1.00              0.05            1.00                 1.00                  1.00                   1.00              0.05            1.00                 1.00                  1.00

AR(1)-process
50    0.0  0.86              0.05            0.81                 0.71                  0.59                   0.94              0.05            0.88                 0.79                  0.52
50    0.5  0.41              0.05            0.24                 0.17                  0.09                   0.51              0.05            0.56                 0.53                  0.24
50    0.9  0.11              0.05            0.04                 0.04                  0.04                   0.09              0.03            0.03                 0.01                  0.00
100   0.0  1.00              0.05            0.99                 0.99                  0.97                   1.00              0.05            1.00                 1.00                  0.98
100   0.5  0.72              0.05            0.61                 0.50                  0.35                   0.89              0.05            0.94                 0.92                  0.79
100   0.9  0.15              0.05            0.03                 0.03                  0.04                   0.16              0.03            0.10                 0.09                  0.01
200   0.0  1.00              0.05            1.00                 1.00                  1.00                   1.00              0.05            1.00                 1.00                  1.00
200   0.5  0.96              0.05            0.94                 0.90                  0.85                   1.00              0.04            1.00                 1.00                  1.00
200   0.9  0.28              0.05            0.08                 0.06                  0.03                   0.33              0.04            0.40                 0.38                  0.07
500   0.0  1.00              0.05            1.00                 1.00                  1.00                   1.00              0.05            1.00                 1.00                  1.00
500   0.5  1.00              0.05            1.00                 1.00                  1.00                   1.00              0.04            1.00                 1.00                  1.00
500   0.9  0.63              0.06            0.48                 0.41                  0.19                   0.81              0.04            0.90                 0.88                  0.79
1000  0.0  1.00              0.05            1.00                 1.00                  1.00                   1.00              0.05            1.00                 1.00                  1.00
1000  0.5  1.00              0.05            1.00                 1.00                  1.00                   1.00              0.05            1.00                 1.00                  1.00
1000  0.9  0.91              0.05            0.87                 0.82                  0.71                   0.99              0.04            1.00                 1.00                  0.99

NOTE: Raw-moments tests are based on S-PITs.

can only detect misspecifications that affect the mean of the
S-PITs, it will not be considered in what follows.
Summing up, no test can guarantee small size distortions in
all circumstances. However, the $\hat\alpha^0_{r_1 r_2 \ldots r_N}$ tests based on the S-PITs always perform well in the case of MA(1)-processes. In the
case of AR(1)-processes, they are undersized in small samples
with strong persistence, whereas the $\hat\beta_{12}$ test rejects too often in
these cases. The use of the $\hat\mu_{34}$ test and the $\hat\alpha_{r_1 r_2 \ldots r_N}$ tests cannot
be recommended. Therefore, in what follows, the $\hat\alpha_{r_1 r_2 \ldots r_N}$ tests
are not considered.
4.2 Size-Adjusted Power
The size-adjusted power (henceforth simply referred to as
power) of the tests depends crucially on the sample moments
of the S-PITs and INTs. Therefore, these moments are displayed in Table 3 for small and large samples (T = 50 and
T = 1000) and the case of no (ρ = 0) and strong (ρ = 0.9,
AR(1)-process) persistence. Obviously, the expected sample
raw moments do not depend on the sample size or persistence.
Differences between the sample raw moments displayed for a
specific forecast density are only caused by the Monte Carlo error.
In contrast to the sample raw moments, the sample moment
estimators for central and standardized moments can be severely
biased.
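The bias of a central-moment estimator under persistence can be illustrated with a small simulation. The sketch below is not taken from the article: it assumes a stationary Gaussian AR(1)-process with unit variance as a stand-in for the INTs, and the function names are hypothetical. It compares the Monte Carlo average of the second sample raw moment with that of the sample variance.

```python
import random


def ar1_path(T, rho, rng):
    """Stationary Gaussian AR(1) path with unit marginal variance (assumed setup)."""
    z = rng.gauss(0.0, 1.0)
    path = [z]
    scale = (1.0 - rho ** 2) ** 0.5
    for _ in range(T - 1):
        z = rho * z + scale * rng.gauss(0.0, 1.0)
        path.append(z)
    return path


def average_second_moments(T=50, rho=0.9, reps=2000, seed=0):
    """Monte Carlo averages of the raw second moment and of the sample variance."""
    rng = random.Random(seed)
    raw_sum = 0.0
    central_sum = 0.0
    for _ in range(reps):
        y = ar1_path(T, rho, rng)
        mean = sum(y) / T
        raw = sum(v * v for v in y) / T
        raw_sum += raw
        central_sum += raw - mean * mean  # variance around the sample mean
    return raw_sum / reps, central_sum / reps


if __name__ == "__main__":
    raw, central = average_second_moments()
    print("average raw second moment:", round(raw, 3))
    print("average sample variance:  ", round(central, 3))
```

With T = 50 and ρ = 0.9, the average raw moment stays close to its population value of 1, whereas the average sample variance falls clearly below 1, because the squared sample mean is subtracted in every replication.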
Turning to the power of the tests, in the case of the misspecified
normal forecast densities, the results in Table 4 suggest
that, in general, the most powerful test is the β̂12 test. It is
superior to the other tests especially in small samples with strong
persistence. Otherwise, the α̂^0_{12} test, which is the raw-moment
test corresponding most closely to the β̂12 test, often has similar
power. The inclusion of higher-order raw moments leads to
power losses. Not surprisingly, the µ̂34 test has power essentially
equal to size.
The misspecifications implied by the two-piece normal forecast
density are, commonly, most successfully discovered by
the µ̂34 test and the α̂^0_{123} test, as shown in Table 5. The β̂12 test
attains a similar power only if T = 50. The power of the α̂^0_{1234}
test is comparable to that of the α̂^0_{123} test. The α̂^0_{12} test has rather
low power, which does not seem surprising, because the mean
of the S-PITs is close to 0, and the second raw moment is close
to 1 as shown in Table 3.


Table 5. Size-adjusted power, two-piece normal forecast density with γ = 0.8 and standardized t-distributed forecast density with
5 degrees of freedom

Two-piece normal density with γ = 0.8

     T     ρ     β̂12     µ̂34   α̂^0_{12}   α̂^0_{123}   α̂^0_{1234}
MA(1)-process
    50   0.0    0.27    0.24       0.07        0.26         0.23
    50   0.5    0.25    0.24       0.06        0.19         0.14
    50   0.9    0.26    0.23       0.06        0.16         0.12
   100   0.0    0.41    0.59       0.08        0.56         0.52
   100   0.5    0.37    0.56       0.06        0.45         0.40
   100   0.9    0.36    0.50       0.07        0.40         0.34
   200   0.0    0.58    0.94       0.12        0.89         0.88
   200   0.5    0.55    0.91       0.09        0.82         0.81
   200   0.9    0.52    0.87       0.08        0.79         0.77
   500   0.0    0.89    1.00       0.24        1.00         1.00
   500   0.5    0.84    1.00       0.15        1.00         1.00
   500   0.9    0.81    1.00       0.14        1.00         1.00
  1000   0.0    0.99    1.00       0.46        1.00         1.00
  1000   0.5    0.98    1.00       0.28        1.00         1.00
  1000   0.9    0.97    1.00       0.24        1.00         1.00
AR(1)-process
    50   0.0    0.27    0.24       0.07        0.27         0.24
    50   0.5    0.22    0.24       0.05        0.13         0.10
    50   0.9    0.14    0.13       0.05        0.06         0.05
   100   0.0    0.39    0.58       0.09        0.55         0.52
   100   0.5    0.32    0.54       0.05        0.37         0.31
   100   0.9    0.17    0.20       0.05        0.07         0.06
   200   0.0    0.58    0.94       0.12        0.89         0.88
   200   0.5    0.46    0.88       0.07        0.77         0.75
   200   0.9    0.22    0.37       0.05        0.11         0.07
   500   0.0    0.89    1.00       0.24        1.00         1.00
   500   0.5    0.76    1.00       0.11        1.00         1.00
   500   0.9    0.30    0.73       0.06        0.43         0.30
  1000   0.0    0.99    1.00       0.45        1.00         1.00
  1000   0.5    0.95    1.00       0.18        1.00         1.00
  1000   0.9    0.45    0.95       0.06        0.87         0.85

t-distributed density with 5 d.f.

     T     ρ     β̂12     µ̂34   α̂^0_{12}   α̂^0_{123}   α̂^0_{1234}
MA(1)-process
    50   0.0    0.04    0.22       0.13        0.11         0.10
    50   0.5    0.04    0.17       0.10        0.09         0.08
    50   0.9    0.04    0.15       0.09        0.10         0.08
   100   0.0    0.06    0.47       0.23        0.20         0.18
   100   0.5    0.05    0.39       0.18        0.17         0.16
   100   0.9    0.05    0.35       0.17        0.15         0.15
   200   0.0    0.10    0.86       0.48        0.41         0.40
   200   0.5    0.09    0.80       0.39        0.34         0.34
   200   0.9    0.08    0.75       0.35        0.30         0.32
   500   0.0    0.26    1.00       0.89        0.85         0.85
   500   0.5    0.21    1.00       0.81        0.76         0.79
   500   0.9    0.20    1.00       0.76        0.70         0.75
  1000   0.0    0.54    1.00       1.00        0.99         0.99
  1000   0.5    0.47    1.00       0.99        0.97         0.99
  1000   0.9    0.42    1.00       0.98        0.96         0.98
AR(1)-process
    50   0.0    0.04    0.22       0.12        0.11         0.10
    50   0.5    0.04    0.16       0.08        0.08         0.07
    50   0.9    0.05    0.06       0.03        0.03         0.06
   100   0.0    0.06    0.48       0.23        0.20         0.19
   100   0.5    0.04    0.36       0.14        0.13         0.13
   100   0.9    0.04    0.08       0.03        0.03         0.05
   200   0.0    0.09    0.85       0.46        0.40         0.39
   200   0.5    0.07    0.75       0.31        0.27         0.28
   200   0.9    0.04    0.13       0.04        0.04         0.05
   500   0.0    0.25    1.00       0.89        0.84         0.84
   500   0.5    0.15    1.00       0.72        0.65         0.72
   500   0.9    0.04    0.44       0.12        0.12         0.16
  1000   0.0    0.53    1.00       1.00        0.99         0.99
  1000   0.5    0.34    1.00       0.97        0.94         0.97
  1000   0.9    0.06    0.84       0.28        0.25         0.42

NOTE: Raw-moments tests are based on S-PITs.

As can also be seen from Table 5, if the forecast density has
a standardized t-distribution with 5 degrees of freedom, the µ̂34
test delivers the best results. Note that this result is related to
the fact that the INTs have negative excess kurtosis. For random
variables with positive excess kurtosis, the µ̂34 test has very low
power, as found by Bai and Ng (2005). All raw-moments tests
attain similar power, which here clearly exceeds the power of
the β̂12 test whenever power exceeds size.
In the case of the normal mixture forecast density, the behavior of the µ̂34 test reported in Table 6 seems counterintuitive
at first sight, because its power appears to decrease with the
sample size. However, this can be explained by its asymmetric
power properties with respect to excess kurtosis, the bias of the
sample kurtosis estimator, and the fact that the sample kurtosis estimator yields values around 3 in most settings. Broadly
speaking, in small persistent samples, the estimated kurtosis is
often smaller than 3, and the test has relatively high power in
these cases. With even larger sample sizes than considered here,
the power of the µ̂34 test would eventually start to increase. The
β̂12 test has relatively low power in almost all cases. The highest
power, in general, is clearly attained by the α̂^0_{1234} test. The
high power of the α̂^0_{1234} test compared to all other raw-moments
tests is surprising insofar as, according to Table 3, the fourth
raw sample moment is virtually equal to 1.8, its value under the
null. Additional simulations show that, interestingly, the high
power of the α̂^0_{1234} test stems from the joint consideration of the
second, third, and fourth raw moment. If one of these moments
does not enter the test, the power decreases considerably.
Apparently, the joint distribution of these three sample moments is
such that, usually, at least one of the moments is likely to signal
departures from the standard uniform distribution.

4.3 Summary

From the Monte Carlo simulations conducted above, it follows
that the α̂^0_{r1 r2 ... rN} tests are preferable to the α̂_{r1 r2 ... rN} tests.
Among the α̂^0_{r1 r2 ... rN} tests, the α̂^0_{12} test tends to give the
smallest size distortions. However, the α̂^0_{1234} test has power against
more types of misspecification, while its size distortions are still
fairly small. Concerning the choice among the β̂12 test, the µ̂34
test, and the α̂^0_{r1 r2 ... rN} tests, the µ̂34 test often has the largest size
distortions, it cannot detect misspecifications which affect first
and second moments of the INTs only, and its power can depend
in complex ways on sample size and persistence. Therefore,
this test does not appear to be well-suited for the evaluation of
density forecasts. The β̂12 test has good size properties if the
underlying AR(1)-process assumption is correct, but otherwise
suffers from size distortions which do not vanish asymptotically.
It appears to be the best choice if the sample size is small and
the data are very persistent. If persistence is only moderate, as
one would expect in the case of h being not too large, or if the
sample is not too small, the α̂^0_{1234} test has satisfactory power
against many types of misspecification. Therefore, in general,
the α̂^0_{1234} test appears to be the most recommendable test for the
calibration of multi-step-ahead density forecasts.

Table 6. Size-adjusted power, normal mixture forecast density with σ = 0.4

     T     ρ     β̂12     µ̂34   α̂^0_{12}   α̂^0_{123}   α̂^0_{1234}
MA(1)-process
    50   0.0    0.10    0.27       0.09        0.08         0.48
    50   0.5    0.10    0.25       0.08        0.07         0.46
    50   0.9    0.10    0.24       0.07        0.07         0.44
   100   0.0    0.12    0.21       0.16        0.14         0.82
   100   0.5    0.12    0.21       0.13        0.12         0.80
   100   0.9    0.12    0.22       0.12        0.11         0.77
   200   0.0    0.16    0.12       0.32        0.27         0.99
   200   0.5    0.15    0.12       0.25        0.22         0.99
   200   0.9    0.15    0.14       0.23        0.20         0.98
   500   0.0    0.26    0.04       0.71        0.64         1.00
   500   0.5    0.24    0.04       0.61        0.55         1.00
   500   0.9    0.24    0.05       0.54        0.48         1.00
  1000   0.0    0.43    0.03       0.96        0.93         1.00
  1000   0.5    0.38    0.03       0.91        0.87         1.00
  1000   0.9    0.35    0.03       0.87        0.82         1.00
AR(1)-process
    50   0.0    0.10    0.25       0.09        0.08         0.47
    50   0.5    0.09    0.26       0.06        0.06         0.41
    50   0.9    0.10    0.15       0.05        0.04         0.10
   100   0.0    0.12    0.21       0.16        0.14         0.82
   100   0.5    0.11    0.22       0.10        0.09         0.75
   100   0.9    0.08    0.22       0.03        0.03         0.16
   200   0.0    0.16    0.11       0.32        0.27         0.99
   200   0.5    0.14    0.14       0.20        0.18         0.98
   200   0.9    0.10    0.27       0.04        0.03         0.32
   500   0.0    0.26    0.04       0.71        0.64         1.00
   500   0.5    0.20    0.05       0.50        0.44         1.00
   500   0.9    0.12    0.21       0.08        0.07         0.89
  1000   0.0    0.42    0.03       0.96        0.93         1.00
  1000   0.5    0.31    0.03       0.84        0.78         1.00
  1000   0.9    0.15    0.13       0.17        0.15         1.00

NOTE: Raw-moments tests are based on S-PITs.
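For illustration, a rough sketch of how a statistic of the α̂^0_{1234} type could be computed is given below. It is a simplified reading rather than the article's exact definition: it assumes a Wald-type quadratic form in the deviations of the first four sample raw moments of the S-PITs from their values under the null, a Bartlett-kernel long-run covariance estimator with a user-chosen truncation lag, and zero long-run covariances between odd and even moments. The function names and the lag choice are hypothetical.

```python
import math


def s_pits(pits):
    """Standardize PITs so that they have mean 0 and variance 1 under the null."""
    return [math.sqrt(12.0) * (u - 0.5) for u in pits]


def null_moment(r):
    """r-th raw moment of a uniform variable on [-sqrt(3), sqrt(3)]."""
    return 0.0 if r % 2 else 3.0 ** (r / 2) / (r + 1)


def long_run_cov(a, b, lags):
    """Bartlett-kernel long-run covariance of two demeaned series (assumed estimator)."""
    T = len(a)

    def gamma(x, y, j):
        return sum(x[t] * y[t - j] for t in range(j, T)) / T

    total = gamma(a, b, 0)
    for j in range(1, lags + 1):
        weight = 1.0 - j / (lags + 1.0)
        total += weight * (gamma(a, b, j) + gamma(b, a, j))
    return total


def alpha0_1234(pits, lags):
    """Sketch of a raw-moments statistic using moments 1 to 4 of the S-PITs."""
    y = s_pits(pits)
    T = len(y)
    stat = 0.0
    # Odd and even moments are treated as uncorrelated, so the long-run
    # covariance matrix is block diagonal with blocks (1, 3) and (2, 4).
    for block in ((1, 3), (2, 4)):
        powers = [[v ** r for v in y] for r in block]
        means = [sum(p) / T for p in powers]
        d = [m - null_moment(r) for m, r in zip(means, block)]
        e = [[v - m for v in p] for p, m in zip(powers, means)]
        w11 = long_run_cov(e[0], e[0], lags)
        w22 = long_run_cov(e[1], e[1], lags)
        w12 = long_run_cov(e[0], e[1], lags)
        det = w11 * w22 - w12 * w12
        # d' W^{-1} d for a 2x2 matrix W, written out explicitly
        stat += T * (w22 * d[0] ** 2 - 2.0 * w12 * d[0] * d[1] + w11 * d[1] ** 2) / det
    return stat
```

Under these assumptions the statistic would be compared with a χ² critical value with 4 degrees of freedom (9.49 at the 5% level).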

5. EMPIRICAL APPLICATION

In what follows, the calibration of density forecasts for
the logarithm of the daily euro/pound sterling (henceforth
EUR/GBP) exchange rate is investigated. The data cover the
period from January 4, 2008, to February 28, 2014, and are
displayed in Figure 2. I consider h-step-ahead forecasts with h
equal to 2 and to 3 days.

Figure 2. 100 times the logarithm of the daily EUR/GBP exchange
rate.

Figure 3. Autocorrelations of the INTs of the density forecasts for the daily EUR/GBP exchange rate for forecast horizons h = 2 and h = 3.
Dashed lines indicate 95% confidence bounds, calculated as ±2/√T.

Table 7. Moments of S-PITs and INTs and test results for calibration of density forecasts for the daily EUR/GBP exchange rate

                          Moments                                    p-values
                    S-PITs                   INTs
           m̂1     m̂2     m̂3     m̂4      m̂1     µ̂2    α̂^0_{1234}   α̂^0_{12}    β̂12
h = 2    0.01   0.87   0.02   1.50    0.00   0.85       0.01∗∗    0.00∗∗∗   0.12
h = 3    0.01   0.86   0.00   1.41    0.00   0.81       0.04∗∗     0.02∗∗   0.11

NOTE: Raw-moments tests are based on S-PITs. Sample sizes equal T = 555. m̂i denotes the ith raw moment, µ̂2 the variance. ∗∗∗, ∗∗, ∗ denote rejection at the 1%, 5%, 10% significance
level.

Denoting the log of the exchange rate at time t by x_t, I
assume that x_t follows a random walk, and that the changes in
x_t can be described by a conditionally heteroscedastic Gaussian
time series model as in Bollerslev (1986) given by

\[ x_t = x_{t-1} + q_t, \qquad q_t = \sigma_t \varepsilon_t, \]
\[ \sigma_t^2 = b_0 + b_1 q_{t-1}^2 + b_2 \sigma_{t-1}^2, \]

with ε_t ∼ iid N(0, 1). A rolling estimation window with 1000
observations is used, and T = 555 density forecasts for x_t are
evaluated.
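The variance recursion of this model can be sketched in a few lines. The code below is only an illustration with hypothetical function names and made-up parameter values; the article does not spell out here how the h-step-ahead forecast densities are obtained from the model. The sketch filters the conditional variances and sums the expected conditional variances, which gives the variance of the h-step-ahead change x_{t+h} − x_t but says nothing about the shape of the forecast density.

```python
def garch_variances(q, b0, b1, b2, sigma2_init):
    """Apply sigma_t^2 = b0 + b1 * q_{t-1}^2 + b2 * sigma_{t-1}^2 along the sample.

    q holds the observed changes; sigma2_init is an assumed starting variance.
    The last element is the one-step-ahead variance forecast.
    """
    sigma2 = [sigma2_init]
    for q_prev in q:
        sigma2.append(b0 + b1 * q_prev ** 2 + b2 * sigma2[-1])
    return sigma2


def cumulative_variance(sigma2_next, b0, b1, b2, h):
    """Variance of x_{t+h} - x_t as the sum of expected conditional variances.

    Uses E_t[sigma_{t+k+1}^2] = b0 + (b1 + b2) * E_t[sigma_{t+k}^2] and the fact
    that the changes q_t are serially uncorrelated.
    """
    total = 0.0
    expected_sigma2 = sigma2_next
    for _ in range(h):
        total += expected_sigma2
        expected_sigma2 = b0 + (b1 + b2) * expected_sigma2
    return total


if __name__ == "__main__":
    # Illustrative parameter values only, not estimates from the article.
    changes = [0.3, -0.1, 0.5, -0.4]
    variances = garch_variances(changes, b0=0.01, b1=0.05, b2=0.9, sigma2_init=0.2)
    for h in (2, 3):
        print(h, cumulative_variance(variances[-1], 0.01, 0.05, 0.9, h))
```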
The autocorrelations of the resulting INTs are displayed in
Figure 3. The dynamics of the INTs associated with
the h-step-ahead density forecasts appear to be fairly well
described by MA(h − 1)-processes. The autocorrelations of the
PITs are very similar to those of the INTs, so that the same
statement applies.
To check for correct calibration, the α̂^0_{1234} test, the α̂^0_{12} test,
and the β̂12 test are employed. In Table 7, in addition to the test
results, the first four sample raw moments of the S-PITs as well
as the sample mean and variance of the INTs are shown.
The α̂^0_{1234} test rejects the null hypothesis of correct calibration
for both forecast horizons at the 5% significance level. At the
latter level, the α̂^0_{12} test also rejects for h = 3, and for h = 2 it
rejects at the 1% level. In contrast to that, no rejections occur
with the β̂12 test. When looking at the moments, it appears likely
that the major misspecification of the forecast densities is their
excessive dispersion. In such a situation, according to Table 4,
the α̂^0_{12} test and the β̂12 test have similar size-adjusted power.
Yet, the β̂12 test is undersized in the presence of an MA(1)-process
with positive MA coefficient, and this property could
also hold for MA(2)-processes. This could be a reason why the
β̂12 test does not reject here.
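The moments reported in Table 7 are simple functions of the PITs. A minimal sketch of their computation follows, assuming that the S-PITs are obtained as √12 (PIT − 0.5) and the INTs by applying the inverse standard normal distribution function to the PITs; the function names are hypothetical, and the PITs are assumed to lie strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def s_pit(u):
    """Standardized PIT: mean 0 and variance 1 if u is standard uniform."""
    return math.sqrt(12.0) * (u - 0.5)


def int_transform(u):
    """Inverse normal transform; requires 0 < u < 1."""
    return NormalDist().inv_cdf(u)


def raw_moment(values, r):
    """r-th sample raw moment."""
    return sum(v ** r for v in values) / len(values)


def summarize(pits):
    """First four raw moments of the S-PITs, mean and variance of the INTs."""
    y = [s_pit(u) for u in pits]
    z = [int_transform(u) for u in pits]
    z_mean = raw_moment(z, 1)
    return {
        "spit_raw_moments": [raw_moment(y, r) for r in (1, 2, 3, 4)],
        "int_mean": z_mean,
        "int_variance": raw_moment(z, 2) - z_mean ** 2,
    }
```

For well-calibrated forecasts the four S-PIT raw moments should be close to 0, 1, 0, and 1.8, and the INTs should have mean close to 0 and variance close to 1. Values of the second moment clearly below 1, as in Table 7, point to forecast densities that are too dispersed.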

6. CONCLUSION

Raw-moments tests for the calibration of multi-step-ahead
density forecasts are proposed and compared to two commonly
used tests, the β̂12 test of Berkowitz (2001), and the µ̂34 test of
Bai and Ng (2005). These tests employ the inverse normal
transforms (INTs) of the probability integral transforms (PITs). The
raw-moments tests are based on the standardized PITs (S-PITs).
Despite the autocorrelation of the PITs, the raw-moments tests
rely on standard critical values.
It turns out that the µ̂34 test cannot be recommended for
the evaluation of density forecasts due to potentially large size
distortions, complicated power properties, and its disregard of
information from lower-order moments. The β̂12 test can be very
useful because of its relatively large power, especially in small
samples with strong persistence. Yet, if the INTs do not follow
an AR(1)-process, size distortions occur which do not vanish
asymptotically. Moreover, the test does not use information
from higher-order moments.
Tests based on the S-PITs do not suffer from these shortcomings.
For this reason, and because of their simplicity, they can
be a very helpful tool for the evaluation of density forecasts.
The tests which use the fact that, under the null, odd and even
sample moments are uncorrelated perform better in terms of
size and power than their counterparts which do not employ
the zero-correlation property. Among the former tests, the α̂^0_{1234}
test, which uses the first four raw moments of the S-PITs, has
good size and power properties in most settings investigated
in this study. Therefore, in general, it appears to be the most
recommendable test.
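The zero-correlation property between odd and even moments, which is proved in Appendix A, can be illustrated numerically. The sketch below is not part of the article: it assumes INTs following a Gaussian AR(1)-process with ρ = 0.9, maps them to S-PITs, and estimates covariances between powers of the S-PITs at lag v. The function names are hypothetical.

```python
import math
import random


def spit_ar1(T, rho, seed):
    """S-PITs whose INTs follow a stationary Gaussian AR(1)-process (assumed setup)."""
    rng = random.Random(seed)
    z = rng.gauss(0.0, 1.0)
    scale = math.sqrt(1.0 - rho ** 2)
    out = []
    for _ in range(T):
        u = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # PIT = Phi(z)
        out.append(math.sqrt(12.0) * (u - 0.5))         # S-PIT, an odd function of z
        z = rho * z + scale * rng.gauss(0.0, 1.0)
    return out


def cross_cov(y, ri, rj, v):
    """Sample covariance between y_t^ri and y_{t-v}^rj."""
    a = [val ** ri for val in y[v:]]
    b = [val ** rj for val in y[:len(y) - v]]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / len(a)


if __name__ == "__main__":
    y = spit_ar1(100000, 0.9, seed=0)
    print("odd/even  (1, 2):", cross_cov(y, 1, 2, 1))
    print("odd/even  (3, 2):", cross_cov(y, 3, 2, 1))
    print("odd/odd   (1, 3):", cross_cov(y, 1, 3, 1))
```

The covariances for pairs with r_i + r_j odd should be close to 0 despite the strong persistence, whereas the covariance between two odd powers is clearly positive.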


APPENDIX A: PROOF
The following proof shows that the long-run covariance of y_t^{r_i} − m_{r_i}
and y_t^{r_j} − m_{r_j} equals 0 if y_t is symmetrically distributed around 0 and
if r_i + r_j is odd. Consider the standard normal variable z_t, and denote
the symmetry-preserving transformation by y_t = S(z_t) where S(z_t) is
an odd function. The symmetric density of y_t will be denoted by f(y_t).
Suppose that r_i is odd and r_j is even. Then, for the contemporaneous
covariance of y_t^{r_i} and y_t^{r_j}, we have that

\[ E\left[ y_t^{r_i} \left( y_t^{r_j} - m_{y r_j} \right) \right] = E\left[ y_t^{r_i + r_j} \right] - E\left[ y_t^{r_i} \right] m_{y r_j} \]

where m_{y r_j} denotes the expectation E[y_t^{r_j}]. If r is odd, the expectation
E[y_t^r] equals 0, because f(y_t) is an even function and y_t^r is an odd
function, implying that y_t^r f(y_t) is an odd function. Thus,
E[y_t^{r_i} (y_t^{r_j} − m_{y r_j})] equals 0.
For the noncontemporaneous covariance of y_t^{r_i} and y_{t−v}^{r_j} with v ∈ Z
one obtains

\[ E\left[ y_t^{r_i} \left( y_{t-v}^{r_j} - m_{y r_j} \right) \right] = E\left[ y_t^{r_i} y_{t-v}^{r_j} \right] - E\left[ y_t^{r_i} \right] m_{y r_j} = E\left[ y_t^{r_i} y_{t-v}^{r_j} \right]. \]

Starting with z_t, the latter expectation can be rewritten as

\[
\begin{aligned}
E\left[ z_t^{r_i} z_{t-v}^{r_j} \right]
 &= \int_0^{\infty} \int_0^{\infty} z_t^{r_i} z_{t-v}^{r_j} \phi(z_t, z_{t-v}) \, dz_t \, dz_{t-v} \\
 &\quad + \int_0^{\infty} \int_{-\infty}^{0} z_t^{r_i} z_{t-v}^{r_j} \phi(z_t, z_{t-v}) \, dz_t \, dz_{t-v} \\
 &\quad + \int_{-\infty}^{0} \int_0^{\infty} z_t^{r_i} z_{t-v}^{r_j} \phi(z_t, z_{t-v}) \, dz_t \, dz_{t-v} \\
 &\quad + \int_{-\infty}^{0} \int_{-\infty}^{0} z_t^{r_i} z_{t-v}^{r_j} \phi(z_t, z_{t-v}) \, dz_t \, dz_{t-v},
\end{aligned}
\]

where φ(z_t, z_{t−v}) denotes the joint normal density of z_t and z_{t−v}.
Since z_{t−v}^{r_j} is an even function, z_t^{r_i} is an odd function and
φ(z_t, z_{t−v}) = φ(−z_t, −z_{t−v}) and φ(z_t, −z_{t−v}) = φ(−z_t, z_{t−v}) hold, the sum of the
first and the fourth term and the sum of the second and the third term
on the right-hand side are both equal to 0, so that the entire expression
equals 0.
Considering y_t^{r_i} and y_t^{r_j} instead of the odd function z_t^{r_i} and the even
function z_t^{r_j} leads to the same result, because, first, y_t^{r_i} also is an odd
function and y_t^{r_j} also is an even function, and second, f(y_t, y_{t−v}) =
f(−y_t, −y_{t−v}) and f(y_t, −y_{t−v}) = f(−y_t, y_{t−v}) must hold because
y_t = S(z_t) is a symmetry-preserving transformation. Therefore,

\[ E\left[ y_t^{r_i} y_{t-v}^{r_j} \right] = 0 \]

holds for all v ∈ Z.

APPENDIX B: THE DENSITIES
Here the densities used for the Monte Carlo simulations are described.
Unless otherwise mentioned, their skewness equals 0 and their kurtosis
equals 3.
Denoting the standard normal density by φ(·), the normal forecast
density is

\[ \hat{f}(x_t) = \frac{1}{\sigma} \phi\left( \frac{x_t - \mu}{\sigma} \right), \]

where µ is the mean and σ the standard deviation of x_t.

with m being the mode and with the moments

\[ E[x_t] = \mu = m + \sqrt{\frac{2}{\pi}} \, (\sigma_2 - \sigma_1) \]
\[ E\left[ (x_t - \mu)^2 \right] = \mu_2 = \left( 1 - \frac{2}{\pi} \right) (\sigma_2 - \sigma_1)^2 + \sigma_1 \sigma_2. \]

Thus, setting

with γ = µ − m makes the mean equal to 0 and the variance equal to 1.
The parameter γ represents the mean-mode difference. A positive value
of γ corresponds to a positively-skewed random variable x_t. Skewness
and kurtosis of the standardized two-piece normal distribution are given
by

\[ s = E\left[ x_t^3 \right] = \left( (3 - \pi) \gamma^2 + 1 \right) \gamma \]
\[ k = E\left[ x_t^4 \right] = \left( (22 - 3\pi) \pi - 40 \right) \frac{\gamma^4}{4} + (3\pi - 8) \frac{\gamma^2}{2} + 3. \]
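The skewness and kurtosis expressions for the standardized two-piece normal distribution given in this appendix can be checked against moments computed directly from the two half-normal pieces. In the sketch below, the function names are hypothetical, and the expressions for σ1 and σ2 in `two_piece_sigmas` are derived here from the two conditions of mean-mode difference γ and unit variance; they are an assumption of this sketch, not a formula quoted from the article.

```python
import math


def skewness(gamma):
    """Skewness of the standardized two-piece normal, as stated in the appendix."""
    return ((3.0 - math.pi) * gamma ** 2 + 1.0) * gamma


def kurtosis(gamma):
    """Kurtosis of the standardized two-piece normal, as stated in the appendix."""
    return (((22.0 - 3.0 * math.pi) * math.pi - 40.0) * gamma ** 4 / 4.0
            + (3.0 * math.pi - 8.0) * gamma ** 2 / 2.0 + 3.0)


def two_piece_sigmas(gamma):
    """sigma1, sigma2 giving mean-mode difference gamma and variance 1 (derived here)."""
    half_sum = math.sqrt(1.0 - (3.0 * math.pi / 8.0 - 1.0) * gamma ** 2)
    half_diff = gamma * math.sqrt(math.pi / 8.0)
    return half_sum - half_diff, half_sum + half_diff


def moment_about_mode(n, sigma1, sigma2):
    """E[(x - m)^n] from the closed-form half-normal integrals."""
    def half(s):
        return s ** (n + 1) * 2.0 ** ((n - 1) / 2.0) * math.gamma((n + 1) / 2.0)
    norm = 2.0 / (math.sqrt(2.0 * math.pi) * (sigma1 + sigma2))
    return norm * (half(sigma2) + (-1) ** n * half(sigma1))


def standardized_raw_moment(k, gamma):
    """E[x^k] when the mean is 0, so that the mode equals -gamma."""
    s1, s2 = two_piece_sigmas(gamma)
    mode = -gamma
    return sum(math.comb(k, j) * moment_about_mode(j, s1, s2) * mode ** (k - j)
               for j in range(k + 1))


if __name__ == "__main__":
    g = 0.8  # value used for the two-piece normal forecast density in Table 5
    print(standardized_raw_moment(3, g), skewness(g))
    print(standardized_raw_moment(4, g), kurtosis(g))
```

For γ = 0 the distribution reduces to the standard normal, with skewness 0 and kurtosis 3.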
The two-piece normal distribution, as described