Journal of Empirical Finance, Vol. 7, Issues 3-4, 2000

from EVT will give better estimates of the tails of the residuals. During the revision of this paper we also learned that the central idea of our approach, the application of EVT to model residuals, has been independently proposed by Diebold et al. (1999). We test our approach on various return series. Backtesting shows that it yields better estimates of VaR and expected shortfall than unconditional EVT or GARCH-modelling with normally distributed error terms. In particular, our analysis contradicts Danielsson and de Vries (1997c), who state that "an unconditional approach is better suited for VaR estimation than conditional volatility forecasts" (page 3 of their paper). On the other hand, we see that models with a normally distributed conditional return distribution yield very bad estimates of the expected shortfall, so that there is a real need for working with leptokurtic error distributions. We also study quantile estimation over longer time-horizons using simulation. This is of interest if we want to obtain an estimate of the 10-day VaR (as required by the BIS-rule) from a model fitted to daily data.

2. Methods

Let $(X_t, t \in \mathbb{Z})$ be a strictly stationary time series representing daily observations of the negative log return on a financial asset price. [3] We assume that the dynamics of $X$ are given by

$$X_t = \mu_t + \sigma_t Z_t, \tag{1}$$

where the innovations $Z_t$ are a strict white noise process (i.e. independent, identically distributed) with zero mean, unit variance and marginal distribution function $F_Z(z)$. We assume that $\mu_t$ and $\sigma_t$ are measurable with respect to $\mathcal{G}_{t-1}$, the information about the return process available up to time $t-1$.

[3] In the present paper we test our approach on return series generated by single assets only. However, the method obviously also applies to the time series of profits and losses generated by portfolios of financial instruments and can therefore be used for the estimation of market risk measures in a portfolio context.

Let $F_X(x)$ denote the marginal distribution of $(X_t)$ and, for a horizon $h \in \mathbb{N}$, let $F_{X_{t+1}+\dots+X_{t+h} \mid \mathcal{G}_t}(x)$ denote the predictive distribution of the return over the next $h$ days, given knowledge of returns up to and including day $t$. We are interested in estimating quantiles in the tails of these distributions. For $0 < q < 1$, an unconditional quantile is a quantile of the marginal distribution denoted by

$$x_q = \inf\{x \in \mathbb{R} : F_X(x) \ge q\},$$

and a conditional quantile is a quantile of the predictive distribution for the return over the next $h$ days denoted by

$$x^t_q(h) = \inf\{x \in \mathbb{R} : F_{X_{t+1}+\dots+X_{t+h} \mid \mathcal{G}_t}(x) \ge q\}.$$

We also consider an alternative measure of risk for the tail of a distribution known as the expected shortfall. The unconditional expected shortfall is defined to be

$$S_q = E[X \mid X > x_q],$$

and the conditional expected shortfall to be

$$S^t_q(h) = E\left[\sum_{j=1}^{h} X_{t+j} \,\middle|\, \sum_{j=1}^{h} X_{t+j} > x^t_q(h),\ \mathcal{G}_t\right].$$

We are principally interested in quantiles and expected shortfalls for the 1-step predictive distribution, which we denote respectively by $x^t_q$ and $S^t_q$. Since

$$F_{X_{t+1} \mid \mathcal{G}_t}(x) = P\{\sigma_{t+1} Z_{t+1} + \mu_{t+1} \le x \mid \mathcal{G}_t\} = F_Z\big((x - \mu_{t+1})/\sigma_{t+1}\big),$$

these measures simplify to

$$x^t_q = \mu_{t+1} + \sigma_{t+1} z_q, \tag{2}$$

$$S^t_q = \mu_{t+1} + \sigma_{t+1} E[Z \mid Z > z_q], \tag{3}$$

where $z_q$ is the upper $q$th quantile of the marginal distribution of $Z_t$, which by assumption does not depend on $t$.

To implement an estimation procedure for these measures we must choose a specific process in the class (1), i.e. a particular model for the dynamics of the conditional mean and volatility. Many different models for volatility dynamics have been proposed in the econometric literature including models from the ARCH/GARCH family (Bollerslev et al., 1992), HARCH processes (Müller et al., 1997) and stochastic volatility models (Shephard, 1996). In this paper, we use the parsimonious but effective GARCH(1,1) process for the volatility and an AR(1) model for the dynamics of the conditional mean; the approach we propose extends easily to more complex models.

In estimating $x^t_q$ with GARCH-type models it is commonly assumed that the innovation distribution is standard normal so that a quantile of the innovation distribution is simply $z_q = \Phi^{-1}(q)$, where $\Phi(z)$ is the standard normal d.f. A GARCH-type model with normal innovations can be fitted by maximum likelihood (ML) and $\mu_{t+1}$ and $\sigma_{t+1}$ can be estimated using standard 1-step forecasts, so that an estimate of $x^t_q$ is easily constructed using Eq. (2). This is close in spirit to the approach advocated in RiskMetrics (RiskMetrics, 1995), but our empirical finding, which we will later show, is that this approach often underestimates the conditional quantile for $q > 0.95$; the distribution of the innovations seems generally to be heavier-tailed or more leptokurtic than the normal.

Another standard approach is to assume that the innovations have a leptokurtic distribution such as Student's t-distribution (scaled to have variance 1). Suppose $Z = \sqrt{(\nu - 2)/\nu}\, T$ where $T$ has a t-distribution on $\nu > 2$ degrees of freedom with d.f. $F_T(t)$. Then $z_q = \sqrt{(\nu - 2)/\nu}\, F_T^{-1}(q)$. GARCH-type models with t-innovations can also be fitted with maximum likelihood and the additional parameter $\nu$ can be estimated. We will see in Section 2.2 that this method can be viewed as a special case of our approach; it yields quite satisfactory results as long as the positive and the negative tail of the return distribution are roughly equal.

The method proposed in this paper makes minimal assumptions about the underlying innovation distribution and concentrates on modelling its tail using EVT. We use a two-stage approach which can be summarised as follows.

1. Fit a GARCH-type model to the return data making no assumption about $F_Z(z)$ and using a pseudo-maximum-likelihood approach (PML). Estimate $\mu_{t+1}$ and $\sigma_{t+1}$ using the fitted model and calculate the implied model residuals.
2. Consider the residuals to be a realisation of a strict white noise process and use extreme value theory (EVT) to model the tail of $F_Z(z)$. Use this EVT model to estimate $z_q$ for $q > 0.95$.

We go into these stages in more detail in the next sections and illustrate them by means of an example using daily negative log returns on the Standard and Poors index.

2.1. Estimating $\sigma_{t+1}$ and $\mu_{t+1}$ using PML

For predictive purposes we fix a constant memory $n$ so that at the end of day $t$ our data consist of the last $n$ negative log returns $(x_{t-n+1}, \dots, x_{t-1}, x_t)$. We consider these to be a realisation from an AR(1)–GARCH(1,1) process. Hence, the conditional variance of the mean-adjusted series $\epsilon_t = X_t - \mu_t$ is given by

$$\sigma_t^2 = \alpha_0 + \alpha_1 \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2, \tag{4}$$

where $\alpha_0 > 0$, $\alpha_1 > 0$ and $\beta > 0$. The conditional mean is given by

$$\mu_t = \phi X_{t-1}. \tag{5}$$

This model is a special case of the general first order stochastic volatility process considered by Duan (1997), who uses a result by Brandt (1986) to give conditions for strict stationarity.
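The filtering part of stage 1 can be illustrated with a minimal sketch. It assumes the parameters $(\phi, \alpha_0, \alpha_1, \beta)$ are already known (the PML fit itself is not shown), uses standard normal innovations purely to generate test data, and starts the recursion from $X_0 = 0$, $\epsilon_0 = 0$ and $\sigma_0^2 = \alpha_0/(1 - \alpha_1 - \beta)$. The function names, parameter values and starting values are illustrative assumptions, not taken from the paper.

```python
import math
import random

def simulate_ar1_garch11(n, phi, alpha0, alpha1, beta, seed=0):
    """Simulate X_t = mu_t + sigma_t * Z_t using Eqs. (1), (4), (5), N(0,1) noise."""
    rng = random.Random(seed)
    x_prev, eps_prev = 0.0, 0.0                 # assumed starting values
    var_prev = alpha0 / (1.0 - alpha1 - beta)   # assumed starting variance
    xs, zs = [], []
    for _ in range(n):
        mu = phi * x_prev                                        # Eq. (5)
        var = alpha0 + alpha1 * eps_prev ** 2 + beta * var_prev  # Eq. (4)
        z = rng.gauss(0.0, 1.0)
        x = mu + math.sqrt(var) * z                              # Eq. (1)
        xs.append(x)
        zs.append(z)
        x_prev, eps_prev, var_prev = x, x - mu, var
    return xs, zs

def filter_ar1_garch11(xs, phi, alpha0, alpha1, beta):
    """Run Eqs. (4)-(5) over the data; return residuals and 1-step forecasts."""
    x_prev, eps_prev = 0.0, 0.0                 # same assumed starting values
    var_prev = alpha0 / (1.0 - alpha1 - beta)
    residuals = []
    for x in xs:
        mu = phi * x_prev
        var = alpha0 + alpha1 * eps_prev ** 2 + beta * var_prev
        residuals.append((x - mu) / math.sqrt(var))   # standardised residual
        x_prev, eps_prev, var_prev = x, x - mu, var
    mu_next = phi * x_prev                                         # forecast of mu_{t+1}
    var_next = alpha0 + alpha1 * eps_prev ** 2 + beta * var_prev   # forecast of sigma_{t+1}^2
    return residuals, mu_next, math.sqrt(var_next)

# Hypothetical parameter values, chosen only so that alpha1 + beta < 1.
params = dict(phi=0.05, alpha0=1e-6, alpha1=0.1, beta=0.85)
xs, zs = simulate_ar1_garch11(1000, **params)
residuals, mu_next, sigma_next = filter_ar1_garch11(xs, **params)
max_err = max(abs(r - z) for r, z in zip(residuals, zs))
```

With the true parameters and matching starting values the filter recovers the simulated innovations up to rounding, which is a convenient sanity check. With estimated parameters the residuals are only approximately iid.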
The mean-adjusted series $(\epsilon_t)$ is strictly stationary if

$$E\left[\log\left(\beta + \alpha_1 Z_{t-1}^2\right)\right] < 0. \tag{6}$$

By using Jensen's inequality and the convexity of $-\log(x)$ it is seen that a sufficient condition for Eq. (6) is that $\beta + \alpha_1 < 1$, which moreover ensures that the marginal distribution $F_X(x)$ has a finite second moment.

This model is fitted using the PML method. This means that the likelihood for a GARCH(1,1) model with normal innovations is maximized to obtain parameter estimates $\hat\theta = (\hat\phi, \hat\alpha_0, \hat\alpha_1, \hat\beta)^T$. While this amounts to fitting a model using a distributional assumption we do not necessarily believe, the PML method delivers reasonable parameter estimates. In fact, it can be shown that the PML method yields a consistent and asymptotically normal estimator; see for instance Chapter 4 of Gouriéroux (1997).

Estimates of the conditional mean and standard deviation series $(\hat\mu_{t-n+1}, \dots, \hat\mu_t)$ and $(\hat\sigma_{t-n+1}, \dots, \hat\sigma_t)$ can be calculated recursively from Eqs. (4) and (5) after substitution of sensible starting values. In Fig. 1 we show an arbitrary thousand day excerpt from our dataset containing the stock market crash of October 1987; the estimated conditional standard deviation derived from the GARCH fit is shown below the series.

Fig. 1. 1000 day excerpt from series of negative log returns on Standard and Poors index containing crash of 1987; lower plot shows estimate of the conditional standard deviation derived from PML fitting of AR(1)–GARCH(1,1) model.

Residuals are calculated both to check the adequacy of the GARCH modelling and to use in Stage 2 of the method. They are calculated as

$$(z_{t-n+1}, \dots, z_t) = \left(\frac{x_{t-n+1} - \hat\mu_{t-n+1}}{\hat\sigma_{t-n+1}}, \dots, \frac{x_t - \hat\mu_t}{\hat\sigma_t}\right),$$

and should be iid if the fitted model is tenable. In Fig. 2 we plot correlograms for the raw data and their absolute values as well as for the residuals and absolute residuals. While the raw data are clearly not iid, this assumption may be tenable for the residuals. [4]

Fig. 2. Correlograms for the raw data and their absolute values as well as for the residuals and absolute residuals. While the raw data are clearly not iid, this assumption may be tenable for the residuals.

[4] We also ran some Ljung–Box tests in selected time periods and found no evidence against the iid-hypothesis for the residuals.

If we are satisfied with the fitted model, we end stage 1 by calculating estimates of the conditional mean and variance for day $t+1$, which are the obvious 1-step forecasts

$$\hat\mu_{t+1} = \hat\phi x_t,$$

$$\hat\sigma_{t+1}^2 = \hat\alpha_0 + \hat\alpha_1 \hat\epsilon_t^2 + \hat\beta \hat\sigma_t^2,$$

where $\hat\epsilon_t = x_t - \hat\mu_t$.

2.2. Estimating $z_q$ using EVT

We begin stage 2 by forming a QQ-plot of the residuals against the normal distribution to confirm that an assumption of conditional normality is unrealistic, and that the innovation process has fat tails or is leptokurtic (see Fig. 3).

Fig. 3. Quantile–quantile plot of residuals against the normal distribution shows residuals to be leptokurtic.

We then fix a high threshold $u$ and we assume that excess residuals over this threshold have a generalized Pareto distribution (GPD) with d.f.

$$G_{\xi,\beta}(y) = \begin{cases} 1 - (1 + \xi y/\beta)^{-1/\xi} & \text{if } \xi \ne 0, \\ 1 - \exp(-y/\beta) & \text{if } \xi = 0, \end{cases}$$

where $\beta > 0$, and the support is $y \ge 0$ when $\xi \ge 0$ and $0 \le y \le -\beta/\xi$ when $\xi < 0$.

This particular distributional choice is motivated by a limit result in EVT. Consider a general d.f. $F$ and the corresponding excess distribution above the threshold $u$ given by

$$F_u(y) = P\{X - u \le y \mid X > u\} = \frac{F(y + u) - F(u)}{1 - F(u)},$$

for $0 \le y < x_0 - u$, where $x_0$ is the finite or infinite right endpoint of $F$. Balkema and de Haan (1974) and Pickands (1975) showed for a large class of distributions $F$ that it is possible to find a positive measurable function $\beta(u)$ such that

$$\lim_{u \to x_0}\ \sup_{0 \le y < x_0 - u} \left| F_u(y) - G_{\xi,\beta(u)}(y) \right| = 0. \tag{7}$$

For more details consult Theorem 3.4.13 on page 165 of Embrechts et al. (1997).
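The GPD d.f. and the excess distribution can be checked numerically in a case where the approximation in Eq. (7) is exact. The sketch below uses a Pareto d.f. $F(x) = 1 - x^{-\alpha}$, $x \ge 1$, as an illustrative heavy-tailed example (an assumption chosen for convenience, not a model used in the paper); for this $F$ the excess distribution over $u$ equals the GPD with $\xi = 1/\alpha$ and $\beta(u) = u/\alpha$. Function names are hypothetical.

```python
import math

def gpd_cdf(y, xi, beta):
    """GPD d.f. G_{xi,beta}(y); support y >= 0, and y <= -beta/xi when xi < 0."""
    if y < 0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-y / beta)
    base = 1.0 + xi * y / beta
    if base <= 0.0:            # beyond the finite right endpoint (xi < 0)
        return 1.0
    return 1.0 - base ** (-1.0 / xi)

def pareto_cdf(x, alpha):
    """Illustrative heavy-tailed d.f. F(x) = 1 - x^(-alpha) for x >= 1."""
    return 1.0 - x ** (-alpha) if x >= 1.0 else 0.0

def excess_cdf(y, u, cdf):
    """Excess distribution F_u(y) = (F(y + u) - F(u)) / (1 - F(u))."""
    return (cdf(y + u) - cdf(u)) / (1.0 - cdf(u))

alpha, u = 4.0, 3.0
xi, beta = 1.0 / alpha, u / alpha    # GPD parameters implied by the Pareto tail
max_gap = max(
    abs(excess_cdf(y, u, lambda x: pareto_cdf(x, alpha)) - gpd_cdf(y, xi, beta))
    for y in (0.0, 0.5, 1.0, 5.0, 20.0)
)
```

For distributions such as the t-distribution the agreement is only approximate and improves as $u$ increases, which is the content of the limit result.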
In the class of distributions for which this result holds are essentially all the common continuous distributions of statistics [5], and these may be further subdivided into three groups according to the value of the parameter $\xi$ in the limiting GPD approximation to the excess distribution. The case $\xi > 0$ corresponds to heavy-tailed distributions whose tails decay like power functions, such as the Pareto, Student's t, Cauchy, Burr, loggamma and Fréchet distributions. The case $\xi = 0$ corresponds to distributions like the normal, exponential, gamma and lognormal, whose tails essentially decay exponentially. The final group of distributions are short-tailed distributions ($\xi < 0$) with a finite right endpoint, such as the uniform and beta distributions.

We assume the tail of the underlying distribution begins at the threshold $u$. From our sample of $n$ points a random number $N = N_u > 0$ will exceed this threshold. If we assume that the $N$ excesses over the threshold are iid with exact GPD distribution, Smith (1987) has shown that maximum likelihood estimates $\hat\xi = \hat\xi_N$ and $\hat\beta = \hat\beta_N$ of the GPD parameters $\xi$ and $\beta$ are consistent and asymptotically normal as $N \to \infty$, provided $\xi > -1/2$. Under the weaker assumption that the excesses are iid from $F_u(y)$, which is only approximately GPD, he also obtains asymptotic normality results for $\hat\xi$ and $\hat\beta$. By letting $u = u_n \to x_0$ and $N = N_u \to \infty$ as $n \to \infty$ he shows essentially that the procedure is asymptotically unbiased provided that $u \to x_0$ sufficiently fast. The necessary speed depends on the rate of convergence in Eq. (7). In practical terms, this means that our best GPD estimator of the excess distribution is obtained by trading bias off against variance. We choose $u$ high to reduce the chance of bias while keeping $N$ large (i.e. $u$ low) to control the variance of the parameter estimates.
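For reference, the log-likelihood that is maximised in the ML fit can be written out from the GPD d.f. The expression below is the standard form obtained by differentiating $G_{\xi,\beta}$; the notation $y_1, \dots, y_N$ for the observed excesses is introduced here for illustration.

```latex
% GPD density for xi != 0: g(y) = (1/beta) (1 + xi*y/beta)^(-1/xi - 1)
\ell(\xi, \beta;\, y_1, \dots, y_N)
  = -N \log \beta
    - \left(1 + \frac{1}{\xi}\right) \sum_{i=1}^{N} \log\left(1 + \frac{\xi\, y_i}{\beta}\right),
  \qquad \xi \ne 0,
% subject to beta > 0 and 1 + xi*y_i/beta > 0 for all i.
% Exponential limit xi = 0:
\ell(0, \beta;\, y_1, \dots, y_N) = -N \log \beta - \frac{1}{\beta} \sum_{i=1}^{N} y_i .
```

The maximisation over $(\xi, \beta)$ has no closed form when $\xi \ne 0$ and is normally done numerically.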
The choice of $u$ or $N$ is the most important implementation issue in EVT and we discuss this issue in the context of finite samples from typical return distributions in Section 2.3.

[5] More precisely, the class comprises all distributions in the maximum domain of attraction of an extreme value distribution.

Consider now the following equality for points $x > u$ in the tail of $F$:

$$1 - F(x) = (1 - F(u))\,(1 - F_u(x - u)). \tag{8}$$

If we estimate the first term on the right hand side of Eq. (8) using the random proportion of the data in the tail $N/n$, and if we estimate the second term by approximating the excess distribution with a generalized Pareto distribution fitted by maximum likelihood, we get the tail estimator

$$\hat F(x) = 1 - \frac{N}{n}\left(1 + \hat\xi\, \frac{x - u}{\hat\beta}\right)^{-1/\hat\xi},$$

for $x > u$. Smith (1987) also investigates the asymptotic relative error of this estimator and gets a result of the form

$$N^{1/2}\left(\frac{1 - \hat F(x)}{1 - F(x)} - 1\right) \xrightarrow{d} N(0, v^2),$$

as $u = u_n \to x_0$ and $N = N_u \to \infty$, where the asymptotic unbiasedness again requires that $u \to x_0$ sufficiently fast.

In practice we will actually modify the procedure slightly and fix the number of data in the tail to be $N = k$ where $k \ll n$. This effectively gives us a random threshold at the $(k+1)$th order statistic. Let $z_{(1)} \ge z_{(2)} \ge \dots \ge z_{(n)}$ represent the ordered residuals. The generalized Pareto distribution with parameters $\xi$ and $\beta$ is fitted to the data $(z_{(1)} - z_{(k+1)}, \dots, z_{(k)} - z_{(k+1)})$, the excess amounts over the threshold for all residuals exceeding the threshold. The form of the tail estimator for $F_Z(z)$ is then

$$\hat F_Z(z) = 1 - \frac{k}{n}\left(1 + \hat\xi_k\, \frac{z - z_{(k+1)}}{\hat\beta_k}\right)^{-1/\hat\xi_k}. \tag{9}$$

For $q > 1 - k/n$ we can invert this tail formula to get

$$\hat z_q = \hat z_{q,k} = z_{(k+1)} + \frac{\hat\beta_k}{\hat\xi_k}\left(\left(\frac{1 - q}{k/n}\right)^{-\hat\xi_k} - 1\right); \tag{10}$$

we use the $\hat z_{q,k}$ notation when we want to emphasize the dependence of the estimator on the choice of $k$ and the simpler $\hat z_q$ notation otherwise.
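The inversion that leads from the tail estimator (9) to the quantile estimator (10) takes only a few lines. The steps below are a worked sketch assuming $\hat\xi_k \ne 0$.

```latex
% Set \hat F_Z(z) = q in Eq. (9) and solve for z.
1 - q = \frac{k}{n}\left(1 + \hat\xi_k\,\frac{z - z_{(k+1)}}{\hat\beta_k}\right)^{-1/\hat\xi_k}
\;\Longrightarrow\;
\left(\frac{1 - q}{k/n}\right)^{-\hat\xi_k} = 1 + \hat\xi_k\,\frac{z - z_{(k+1)}}{\hat\beta_k}
\;\Longrightarrow\;
z = z_{(k+1)} + \frac{\hat\beta_k}{\hat\xi_k}\left(\left(\frac{1 - q}{k/n}\right)^{-\hat\xi_k} - 1\right).
```

The condition $q > 1 - k/n$ makes $(1-q)/(k/n) < 1$, so the bracketed term has the same sign as $\hat\xi_k$ and the solution lies above the threshold $z_{(k+1)}$, where Eq. (9) is valid.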
In Table 1 we give threshold values and GPD parameter estimates for both tails of the innovation distribution of the test data in the case that $n = 1000$ and $k = 100$; we discuss this choice of $k$ in Section 2.3. In Fig. 4 we show the corresponding tail estimators (Eq. (9)).

Table 1
Threshold values and maximum likelihood GPD parameter estimates used in the construction of tail estimators for both tails of the innovation distribution of the test data. Note that $k = 100$ in both cases. Standard errors (s.e.s) are calculated using a standard likelihood approach based on the observed Fisher information matrix.

Tail      z_(k+1)    xi-hat    s.e.       beta-hat   s.e.
Losses    1.215       0.224    (0.122)    0.568      (0.089)
Gains     1.120      -0.096    (0.090)    0.589      (0.079)

Fig. 4. GPD tail estimates for both tails of the innovations distribution. The points show the empirical distribution of the residuals and the solid lines represent the tail estimates. Also shown are the d.f. of the standard normal distribution (dashed) and the d.f. of the t-distribution (dotted) with degrees of freedom as estimated in an AR(1)–GARCH(1,1) model with t-innovations.

We are principally interested in the left picture marked Losses, which corresponds to large positive residuals. The solid lines in both pictures correspond to the GPD tail estimates and can be seen to model the residuals well. Also shown is a dashed line which corresponds to the standard normal distribution and a dotted line which corresponds to the estimated conditional t-distribution (scaled to have variance 1) in a GARCH model with t-innovations. The normal distribution clearly underestimates the extent of large losses and also of the largest gains, which we would already expect from the QQ-plot. The t-distribution, on the other hand, underestimates the losses and overestimates the gains. This illustrates the drawbacks of using a symmetric distribution with data which are asymmetric in the tails.
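As a small numerical illustration, the Losses row of Table 1 can be plugged into the tail estimator (9) and the quantile estimator (10). The function and variable names below are hypothetical; only the numbers are taken from Table 1.

```python
def gpd_tail_cdf(z, threshold, xi, beta, k, n):
    """Tail estimator of F_Z(z) for z above the threshold, Eq. (9); xi != 0."""
    return 1.0 - (k / n) * (1.0 + xi * (z - threshold) / beta) ** (-1.0 / xi)

def gpd_quantile(q, threshold, xi, beta, k, n):
    """Quantile estimator, Eq. (10); valid for q > 1 - k/n and xi != 0."""
    if q <= 1.0 - k / n:
        raise ValueError("q must exceed 1 - k/n so the quantile lies above the threshold")
    return threshold + (beta / xi) * (((1.0 - q) / (k / n)) ** (-xi) - 1.0)

# Losses row of Table 1: threshold z_(k+1), xi-hat, beta-hat, with k = 100, n = 1000.
losses = dict(threshold=1.215, xi=0.224, beta=0.568, k=100, n=1000)

z99 = gpd_quantile(0.99, **losses)      # estimated 0.99 quantile of the innovations
roundtrip = gpd_tail_cdf(z99, **losses) # should reproduce q = 0.99
```

The resulting estimate of $z_{0.99}$ lies well above the standard normal value of about 2.33, in line with the observation that the normal distribution underestimates large losses.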
With more symmetric data the conditional t-distribution often works quite well and it can, in fact, be viewed as a special case of our method. As already mentioned, it is an example of a heavy-tailed distribution, i.e. a distribution whose limiting excess distribution is GPD with $\xi > 0$. Gnedenko (1943) characterized all such distributions as having tails of the form

$$1 - F(x) = x^{-1/\xi} L(x), \tag{11}$$

where $L(x)$ is a slowly varying function and $\xi$ is the positive parameter of the limiting GPD. $1/\xi$ is often referred to as the tail index of $F$. For the t-distribution with $\nu$ degrees of freedom the tail can be shown to satisfy

$$1 - F(x) \sim \frac{\nu^{(\nu - 2)/2}}{B(1/2, \nu/2)}\, x^{-\nu}, \tag{12}$$

where $B(a, b)$ denotes the beta function, so that this provides a very simple example of a symmetric distribution in this class, and the value of $\xi$ in the limiting GPD is the reciprocal of the degrees of freedom (see McNeil and Saladin (1997)).

Fitting a GARCH model with t-innovations can be thought of as estimating the $\xi$ in our GPD tail estimator by simpler means. Inspection of the form of the likelihood of the t-distribution shows that the estimate of $\nu$ will be sensitive mainly to large observations so that it is not surprising that the method gives a reasonable fit in the tails although all data are used in the estimation. Our method has, however, the advantage that we have an explicit model for each tail. We estimate two parameters in each case, which gives a better fit in general.

We also use the GPD tail estimator (Eq. (9)) to estimate the right tail of the negative return distribution $F_X(x)$ by applying it directly to the raw return data $x_{t-n+1}, \dots, x_t$; in this way we calculate an unconditional quantile estimate $\hat x_q$ using unconditional EVT. We investigate whether this approach also provides reasonable estimates of $x^t_q$. It should however be noted that the assumption of independent excesses over threshold is much less satisfactory for the raw return data.
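The tail approximation (12) can be checked numerically for $\nu = 4$, where the t-distribution has a closed-form d.f. The sketch below relies on that standard closed form (not given in the paper) and on hypothetical function names; it compares the exact tail probability with the right-hand side of Eq. (12).

```python
import math

def t4_cdf(x):
    """Closed-form d.f. of Student's t-distribution with 4 degrees of freedom."""
    a = x / math.sqrt(1.0 + x * x / 4.0)
    return 0.5 + 0.375 * a * (1.0 - a * a / 12.0)

def t_tail_approx(x, nu):
    """Right-hand side of Eq. (12): nu^((nu-2)/2) / B(1/2, nu/2) * x^(-nu)."""
    beta_fn = math.gamma(0.5) * math.gamma(nu / 2.0) / math.gamma((nu + 1.0) / 2.0)
    return nu ** ((nu - 2.0) / 2.0) / beta_fn * x ** (-nu)

nu = 4
# Ratio of exact tail probability to the power-law approximation at several points.
ratios = {x: (1.0 - t4_cdf(x)) / t_tail_approx(x, nu) for x in (5.0, 20.0, 80.0)}
xi = 1.0 / nu   # GPD shape parameter implied by the tail index nu
```

The ratio approaches 1 only slowly as $x$ grows, which gives a feel for why a threshold chosen too low introduces bias into tail-index estimates.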
The asymptotics of the GPD-based tail estimator are therefore much more poorly understood if applied directly to the raw return data. Even if the procedure can be shown to be asymptotically justified, in practice it is likely to give much more unstable results when applied to non-iid, finite sample data. Embrechts et al. (1997) provide a related example (see Fig. 5.5.4 on page 270); they construct a first order autoregressive (AR(1)) process driven by a symmetric, heavy-tailed, iid noise, so that both noise distribution and marginal distribution of the process have the same tail index. They apply the Hill estimator (an alternative EVT procedure described in Section 2.3) to simulated data from the process and also to residuals obtained after fitting an AR(1) model to the raw data and find estimates of the tail index to be much more accurate and stable for the residuals, although the Hill estimator is theoretically consistent in both cases. This example supports the idea that pre-whitening of data through fitting of a dynamic model may be a sensible prelude to EVT analysis in practice.

2.3. Simulation study of threshold choice

To investigate the issue of threshold choice (i.e. choice of $k$) we perform a small simulation study. We also use this study to compare the GPD approach to tail estimation with the approach based on the Hill estimator and the approach based on the empirical distribution function (historical simulation).

The Hill estimator (Hill, 1975) is designed for data from heavy-tailed distributions admitting the representation (Eq. (11)) with $\xi > 0$. The estimator for $\xi$, based on the $k$ exceedances of the $(k+1)$th order statistic, is

$$\hat\xi^{(H)} = \hat\xi^{(H)}_k = k^{-1} \sum_{j=1}^{k} \log z_{(j)} - \log z_{(k+1)},$$

and an associated quantile estimator is

$$\hat z^{(H)}_q = \hat z^{(H)}_{q,k} = z_{(k+1)} \left(\frac{1 - q}{k/n}\right)^{-\hat\xi^{(H)}_k}; \tag{13}$$

see Danielsson and de Vries (1997b) for details.
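The Hill estimator and the quantile estimator (13) are simple enough to sketch directly. The example below applies them to simulated data from an exact Pareto distribution with $\alpha = 2$ (so $\xi = 0.5$), an illustrative assumption chosen because the true values are known in closed form; the sample size, $k$ and seed are arbitrary and the function names are hypothetical.

```python
import math
import random

def hill_estimator(data, k):
    """Hill estimate of xi from the k exceedances of the (k+1)th order statistic."""
    z = sorted(data, reverse=True)       # z[0] >= z[1] >= ...; z[k] is z_(k+1)
    return sum(math.log(z[j]) for j in range(k)) / k - math.log(z[k])

def hill_quantile(data, k, q):
    """Hill-based quantile estimator, Eq. (13); intended for q > 1 - k/n."""
    n = len(data)
    z = sorted(data, reverse=True)
    xi_hat = hill_estimator(data, k)
    return z[k] * ((1.0 - q) / (k / n)) ** (-xi_hat)

rng = random.Random(1)
alpha = 2.0
# Inverse-transform sampling from F(x) = 1 - x^(-alpha), x >= 1.
sample = [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(5000)]

xi_hat = hill_estimator(sample, k=500)
z99_hat = hill_quantile(sample, k=500, q=0.99)
z99_true = (1.0 - 0.99) ** (-1.0 / alpha)   # exact 0.99 quantile of the Pareto d.f.
```

Both functions require the $k+1$ largest observations to be positive, since logarithms of the order statistics are taken.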
The properties of these estimators have been extensively investigated in the EVT literature; in particular, a number of recent papers show consistency of the Hill estimator for dependent data (Resnick and Stărică, 1995, 1996) and develop bootstrap methods for optimal choice of the threshold $z_{(k+1)}$ (Danielsson and de Vries, 1997a).

In the simulation study we generate samples of size $n = 1000$ from Student's t-distribution which, as we have observed, provides a rough approximation to the observed distribution of model residuals. The size of sample corresponds to the window length we use in applications of the two-step method. From Eq. (12) we know the tail index of the t-distribution and quantiles are easily calculated. We calculate $\hat\xi_k$ and $\hat z_{q,k}$ (the maximum-likelihood and GPD-based estimators of $\xi$ and $z_q$ based on $k$ threshold exceedances) as well as $\hat\xi^{(H)}_k$ and $\hat z^{(H)}_{q,k}$ for various values of $k$; for the quantile estimates we restrict our attention to values of $k$ such that $k > 1000(1 - q)$, so that the target quantile is beyond the threshold. Of interest are the mean squared errors (MSEs) and biases of these estimators, and the dependence of these errors on the choice of $k$. For each estimator we estimate MSE and bias using Monte Carlo estimates based on 1000 independent samples. For example, we estimate $\mathrm{MSE}(\hat z_{q,k})$ by

$$\widehat{\mathrm{MSE}}(\hat z_{q,k}) = \frac{1}{1000} \sum_{j=1}^{1000} \left(\hat z^{(j)}_{q,k} - z_q\right)^2,$$

where $\hat z^{(j)}_{q,k}$ represents the quantile estimate obtained from the $j$th sample.

Although the Hill estimator is generally the most efficient estimator of $\xi$ (it gives the lowest MSE for sensibly chosen $k$), it does not provide the most efficient nor the most stable quantile estimator. Our simulations suggest that the GPD method should be preferred for estimating high quantiles. An example is given in Fig. 5. We plot the bias and MSE of estimators of the 99th percentile against $k$, in the case that the degrees of freedom of the t-distribution is $\nu = 4$.
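A reduced version of this Monte Carlo experiment can be sketched with the standard library alone. The sketch covers only the Hill quantile estimator and the empirical (historical simulation) estimator $z_{(11)}$; the GPD estimator is omitted because it needs a numerical ML fit. The number of replications (200 rather than 1000), the grid of $k$ values, the seed and all function names are assumptions made to keep the example short.

```python
import math
import random

def t4_cdf(x):
    """Closed-form d.f. of Student's t-distribution with 4 degrees of freedom."""
    a = x / math.sqrt(1.0 + x * x / 4.0)
    return 0.5 + 0.375 * a * (1.0 - a * a / 12.0)

def t4_quantile(q, lo=0.0, hi=100.0):
    """Invert t4_cdf by bisection (for q > 0.5)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if t4_cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def draw_t(rng, nu):
    """Student t variate as N(0,1) / sqrt(chi2_nu / nu)."""
    chi2 = rng.gammavariate(nu / 2.0, 2.0)   # chi-square with nu degrees of freedom
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / nu)

def hill_quantile(z_desc, k, q):
    """Eq. (13) applied to data already sorted in decreasing order."""
    n = len(z_desc)
    xi_hat = sum(math.log(z_desc[j]) for j in range(k)) / k - math.log(z_desc[k])
    return z_desc[k] * ((1.0 - q) / (k / n)) ** (-xi_hat)

def monte_carlo(q=0.99, n=1000, ks=(20, 50, 100, 200), reps=200, seed=0):
    """Monte Carlo bias and MSE of Hill and empirical estimators of z_q."""
    rng = random.Random(seed)
    z_true = t4_quantile(q)
    errors = {k: [] for k in ks}
    errors["HS"] = []
    for _ in range(reps):
        z = sorted((draw_t(rng, 4) for _ in range(n)), reverse=True)
        for k in ks:
            errors[k].append(hill_quantile(z, k, q) - z_true)
        errors["HS"].append(z[10] - z_true)   # z_(11); index fixed for n=1000, q=0.99
    summary = {}
    for key, e in errors.items():
        bias = sum(e) / len(e)
        mse = sum(v * v for v in e) / len(e)  # average squared error over replications
        summary[key] = (bias, mse)
    return summary

summary = monte_carlo()
```

Each entry of `summary` holds the estimated bias and MSE for one estimator, so plotting the Hill entries against $k$ gives a rough counterpart of the Hill curves discussed in the text.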
The Hill estimator is marked with a solid line, the GPD estimator is marked with a dashed line and the empirical HS-estimate $z_{(11)}$ of the quantile is marked by a dotted line.

Fig. 5. Estimated bias and MSE (mean squared error) against $k$ for various estimators of the 0.99 quantile of a t-distribution with $\nu = 4$ degrees of freedom based on an iid sample of 1000 points. Solid line is Hill estimator; dashed line is estimator based on GPD approach; dotted line is the empirical quantile estimator (i.e. the historical simulation approach). The alternative x-axis labels above the graphs give the threshold corresponding to $k$ expressed as a sample percentile.

The Hill method has a negative bias for low values of $k$ that becomes positive and then grows rapidly with $k$; the GPD estimator has a positive bias that grows much more slowly; the empirical estimate has a negative bias. The MSE reveals more about the relative merits of the methods: the GPD estimator attains its lowest value corresponding to a $k$ value of about 100 but, more importantly, the MSE is very robust to the choice of $k$ because of the slow growth of the bias. The Hill method performs well for $k \le 70$ but then deteriorates rapidly. The HS method is obviously less efficient than the two EVT methods, which shows that EVT does indeed give more precise estimates of the 99th percentile based on samples of size 1000 from the t-distribution.

For the 99th percentile both the GPD and Hill estimators are clearly useful, if used correctly. In the case of GPD we must ensure that the variance of the estimator is kept low by setting $k$ sufficiently high, but as long as $k$ is greater than about 50 the method is robust; the issue of choosing an optimal threshold does not seem so critical for the GPD method. For the Hill method it is more important because the efficient range for $k$ is smaller; it is important that the bias be kept under control by choosing a low $k$.
In this paper we only show results for the t-distribution with four degrees of freedom, but further simulations suggest that the same qualitative conclusions hold for other values of $\nu$ and other heavy-tailed distributions. For estimating more distant quantiles we observe that the GPD method appears to be more efficient than the Hill method and maintains its relative stability with respect to choice of $k$. The greater complexity of the GPD quantile estimator, which involves a second estimated scale parameter $\hat\beta$ as well as the tail index estimator $\hat\xi^{-1}$, seems to lead to better finite sample performance.

2.4. Summary: advantages of the GPD approach

We favour the GPD approach to tail estimation in this paper for a variety of reasons that we list below.

• In finite samples of the order of 1000 points from typical return distributions EVT quantile estimators (whether maximum-likelihood and GPD-based or Hill-based) are more efficient than the historical simulation method.
• The GPD-based quantile estimator is more stable (in terms of mean squared error) with respect to choice of $k$ than the Hill quantile estimator. In the present application a $k$ value of 100 seems reasonable, but we could equally choose to use $k$ values of 80 or 150.
• For high quantiles with $q \ge 0.99$ the GPD method is at least as efficient as the Hill method.
• The GPD method allows effective estimates of expected shortfall to be constructed as will be described in Section 4.
• The GPD method is applicable to light-tailed data ($\xi = 0$) or even short-tailed data ($\xi < 0$), whereas the Hill method is designed specifically for the heavy-tailed case ($\xi > 0$). There are periods when the conditional distribution of financial returns appears light-tailed rather than heavy-tailed.

3. Backtesting