Can Realized Volatility improve the Accuracy of Value-at-Risk Forecasts?

Robinson Kruse
Leibniz University of Hannover, Department of Economics
Königsworther Platz 1, D-30167 Hannover

April 2006

1 Introduction

In the past years, academic and industrial interest in portfolio risk forecasting has grown rapidly. The importance of this issue is due to the fact that the regulation of banks is based on a certain risk measure, namely Value-at-Risk (VaR). After many financial market crises occurred all over the world in the last two decades, the regulation of banks was intensified. The Bank for International Settlements (BIS) acts as the bank supervision authority. Banks have to forecast their own portfolio VaR and report it to the BIS every trading day. Under the so-called internal model approach, risk managers of banks may use their own forecasting model, since none is prescribed by law, see Basel Committee on Banking Supervision (1995). Therefore, the risk managers have to choose an accurate forecasting procedure from a large set of available models and methods.
Independently of banking supervision, banks use one-step ahead VaR forecasts to control their risks, see the annual report of Deutsche Bank (2004) for example. The fact that volatility is one of the most important risk components and a key input for VaR calculations leads to the issue of volatility forecasting. Meanwhile, a huge number of research articles, working papers and textbooks on volatility forecasting is available. Nevertheless, it is rather unclear which forecasting model or method is the most appropriate. Furthermore, VaR forecasting is not limited to the volatility topic. Another important component is the way the distribution of portfolio returns is modeled. This field is also quite large and heavily discussed in the literature. However, a standard tool in practice is a full parametric approach that is characterized by a volatility model with fixed parameters and the assumption that portfolio returns are normally distributed. This tool is known as RiskMetrics. From a theoretical point of view, this model is inadequate for several reasons that are discussed below. Nevertheless, the existing literature on VaR forecasting has shown that the RiskMetrics approach is able to provide a good out-of-sample forecasting performance, e.g. Bao et al. (2006). In Bao et al. (2006), several distributional modeling approaches are compared, while the volatility forecasting model set is limited to only one volatility model. It is argued that the distribution plays a more important role than the volatility model. In our work, we try to cover a range of different volatility modeling approaches and consider different ways of modeling the distribution. To be more precise, our work is related to the article of Koopman et al. (2005), where different classes of volatility models are compared in terms of their out-of-sample predictive abilities. They consider Generalized Autoregressive Conditional Heteroscedasticity (GARCH), Stochastic Volatility (SV) as well as realized volatility (RV) models. RV models are the most recent approach to volatility modeling and forecasting, while GARCH and SV models have been in use since the eighties. The comparison is done in a data-snooping robust framework that was developed by White (2000) and refined by Hansen (2005). In general, a large number of forecasting models is considered for one and the same time series. In order to avoid spurious statistical inference about the true predictive abilities, one needs such a robust testing framework. Contrary to the work of Kuester et al. (2006), we focus on the long memory properties of volatility, especially realized volatility. Furthermore, we explore the forecasting abilities of Stochastic Volatility models in the VaR framework. On the other hand, there are several links between this work and that of Kuester et al. (2006): we pay attention to extreme value theory and filtered historical simulation. In this context we use a new hybrid method in the sense that we combine extreme value theory and filtered historical simulation with Autoregressive Fractionally Integrated Moving Average (ARFIMA) time series models for realized volatility. This links our work to Giot and Laurent (2004), who analyzed a full parametric realized volatility model. Our approach can be viewed as a semi-parametric one.

The rest of this work is organized as follows: Section two introduces Value-at-Risk, and sections three to five discuss the competing modeling approaches, ranging from realized volatility and GARCH-type volatility models to the modeling of quantiles. Section six provides an overview of methods to evaluate VaR forecasts: it starts with a VaR forecast specification test, then two loss functions are discussed and finally the data-snooping robust testing framework of White (2000) and Hansen (2005) is explained. All previously described methods are applied in section seven, where a data set of S&P500 future contracts is analyzed. The conclusions are drawn in the last section.

2 Competing Value-at-Risk Specifications

Regarding the measurement of financial risk, the Value-at-Risk (VaR) approach has emerged as the standard tool. Its popularity is due to its simple interpretation: it transforms portfolio risk into a single number that can be interpreted easily. Generally, it is assumed that the investor holds a long position in the considered portfolio.
This means that he buys the portfolio today in order to sell it in the future. Hence, he suffers from a decreasing portfolio value. In our work, we assume that the imaginary investor buys today an amount of S&P500 future contracts with a value of one dollar in order to sell them tomorrow. Given a probability α ∈ (0, 1), the VaR states that the portfolio return in the future will be greater than or equal to the VaR with probability 1 − α. In other words, with probability α the return will be less than the VaR. Usually, α takes the value of one or five percent. In the rest of this work we will entirely focus on the case α = 1%, which is the most relevant case in practice. Note that VaR takes negative values in our work. With the knowledge of the VaR we are not able to answer the question which return to expect given that the return is less than the VaR. This issue can be treated with another risk measure, namely Expected Shortfall. Theoretically, Expected Shortfall is a better risk measure than VaR, in the sense that it is in line with the axiomatic system described in Artzner et al. (1999). They showed that VaR, in contrast to Expected Shortfall, is not coherent. This means that portfolio diversification may increase the VaR instead of reducing it. However, this issue is not a subject of our work. In order to give a precise definition of VaR, we have to introduce some notation. The price of a future contract at time t is denoted by F_t. Let r_t ≡ ln(F_t) − ln(F_{t−1}) be the daily return. Usually, it is assumed that the data generating process of the returns is

\[ r_t = \mu_t + \varepsilon_t , \qquad t = 1, \ldots, T . \tag{1} \]

The conditional expectation of the returns is denoted by µ_t = E(r_t | F_{t−1}) and the innovation of the process is ε_t. The information set F_{t−1} is the filtration generated by the innovations ε_1, ..., ε_{t−1}. The conditional expectation accounts for time-dependent dynamics in the returns and for a non-zero mean. In addition, returns are almost always heteroscedastic, so it is reasonable to incorporate this stylized fact in the process. The innovation is allowed to be heteroscedastic but serially uncorrelated with mean zero. This can be achieved by specifying a multiplicative process for ε_t as follows:

\[ r_t = \mu_t + \sigma_t z_t , \tag{2} \]

where σ_t z_t = ε_t. The components of the multiplicative innovation process are a scaling factor σ_t and an independently and identically distributed (iid) process z_t. Rearranging the previous equation yields z_t = (r_t − µ_t)/σ_t. Assuming E(z_t) = 0 and E(z_t²) = 1 yields σ_t² = E(ε_t² | F_{t−1}) = E(r_t² | F_{t−1}). Hence, z_t is the standardized return. In sum, the process defined in equation (2) is based on a quite simple construction and allows for dynamics in the first and the second moment. Now, we turn to a precise definition of the VaR with coverage rate α:

\[ P(r_t < -VaR^{\alpha}_t \,|\, F_{t-1}) = \alpha , \tag{3} \]

where P(· | F_{t−1}) denotes a conditional probability and VaR^α_t the Value-at-Risk for time t given the confidence level α. In order to proceed, let G denote the innovation distribution and assume that G belongs to the location-scale family of distributions that is commonly used in financial econometrics. A distribution that belongs to this family is fully described by its first and second moments. The best known example is the normal distribution. Given a coverage probability α and an information set F_{t−1}, one can establish the following result:

\[ VaR^{\alpha}_t = -\left( \mu_t + \sigma_t G^{-1}_{\alpha} \right) , \tag{4} \]

where G^{-1}_α is the α-quantile of G. This formulation is quite helpful in the sense that the VaR depends linearly on the conditional mean and the conditional standard deviation of the returns. The one-step ahead VaR forecast can easily be obtained from the previous equation by replacing t with t + 1 and inserting out-of-sample forecasts for µ_t, σ_t and G^{-1}_α:

\[ \widehat{VaR}^{\alpha}_{t+1} = -\left( \hat{\mu}_{t+1} + \hat{\sigma}_{t+1} \hat{G}^{-1}_{\alpha} \right) , \tag{5} \]

where µ̂_{t+1} and σ̂_{t+1} are one-step ahead forecasts conditioned on the information set F_t. Usually, µ_t is modeled with (at least weakly) stationary Autoregressive Moving Average (ARMA) models. The need for a volatility forecast opens the door for Generalized Autoregressive Conditional Heteroscedasticity (GARCH) and Stochastic Volatility (SV) models. The most recent approach to volatility modeling and forecasting is a concept named realized volatility (RV). The next sections describe the mentioned approaches. Furthermore, the quantile estimation of the innovation distribution is non-trivial; sections 4.4 and 5 are dedicated to this theme.
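To make equation (5) concrete, the following sketch computes a one-step ahead VaR forecast from given forecasts of the conditional mean and standard deviation. It is an illustration only (the paper's own computations are done in Ox and EViews), the numerical inputs are hypothetical, and the Gaussian and Student-t quantiles are just two of the distributional choices discussed later.

```python
import numpy as np
from scipy import stats

def var_forecast(mu_hat, sigma_hat, alpha=0.01, dist="normal", nu=6.0):
    """One-step ahead VaR as in equation (5): -(mu + sigma * G^{-1}(alpha)).

    The Student-t quantile is rescaled to unit variance so that sigma_hat
    keeps its interpretation as a conditional standard deviation forecast.
    """
    if dist == "normal":
        q = stats.norm.ppf(alpha)
    elif dist == "t":
        q = stats.t.ppf(alpha, df=nu) * np.sqrt((nu - 2.0) / nu)
    else:
        raise ValueError("unknown distribution")
    return -(mu_hat + sigma_hat * q)

# hypothetical forecasts for day t+1 (daily percentage returns)
print(var_forecast(mu_hat=0.02, sigma_hat=1.1, alpha=0.01, dist="normal"))
print(var_forecast(mu_hat=0.02, sigma_hat=1.1, alpha=0.01, dist="t", nu=6))
```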

3 Realized Volatility

Until recently, squared daily returns or absolute returns were often used as proxies for the true but unknown volatility of asset returns. To evaluate volatility forecasts, one needs an approximation for the true volatility. Squared returns turn out to be a bad approximation. A better one can be obtained by using realized volatility. This measure is based on intraday information. To be more precise, realized volatility relies on the sum of squared intraday returns. Using the theory of quadratic variation and arbitrage-free processes, one can show that realized volatility is a consistent measure of the true volatility. Reasons why realized volatility is interesting in the context of VaR forecasting are: First, its theoretical properties are very attractive. Second, it can be modeled and forecasted with standard time series models. Third, compared to other volatility modeling approaches like GARCH and SV, it is a nonparametric measure of volatility. Of course, there are several problems. Usually, intraday data is available at frequencies between one and thirty minutes. In general, the optimal sampling frequency is unknown. Consistency calls for an ultra-high frequency, while market microstructure effects prevent us from using the highest available frequency. The most important microstructure effects in this context are the bid-ask bounce, price discreteness and nonsynchronous trading, see Giot and Laurent (2004). In general, market microstructure effects can cause a substantial bias in realized volatility. On the other hand, a low sampling frequency reduces the precision of the estimates. A procedure to find the optimal sampling frequency requires the computation of realized volatility at different frequencies, which is computationally demanding, see Oomen (2005). However, we choose a frequency of 15 minutes as a compromise between a too noisy and a too smooth measure of RV.

Our raw data set is obtained from price-data.com and consists of S&P500 future prices recorded on 2700 trading days between 03/10/1994 and 12/31/2004. The Chicago Mercantile Exchange opens at 9.45 and usually closes at 16.30. However, the closing time varies. In order to get the most information out of the data we allow for a flexible closing time. The traded future contracts have a maturity of three months. As the time to maturity of a particular contract, e.g. the March contract, approaches zero, the next contract, e.g. the June contract, becomes more important and is therefore traded more frequently. Hence, we switch from the old contract to the new one when the newer one has a higher trading volume. This procedure has been applied in Bollerslev et al. (2005), where S&P500 future prices are analyzed as well.

3.1 Measuring and Modeling

In order to explain how realized volatility (RV) is constructed in this work, some notation has to be introduced. Let F_t^i denote the i-th intraday price observation of the trading day t, with i = 1, ..., N. We define realized volatility as follows:

\[ RV_t \equiv \sum_{i=3}^{N} \big( \ln(F_t^i) - \ln(F_t^{i-1}) \big)^2 = \sum_{i=3}^{N} r_{t,i}^2 = \sum_{i=1}^{M} r_{t,i+2}^2 , \]

where M equals N − 2. Following Bollerslev et al. (2005) we exclude overnight returns. Overnight information from day t − 1 to day t is naturally incorporated in the price F_t^1, which is the first observed intraday price of day t. To avoid a pricing error we exclude F_t^1. All in all, we have M = 26 if the Chicago Mercantile Exchange opens at 9.45 and closes at 16.30. In addition, we calculate no return between two different contracts in order to avoid artificial jumps in the realized volatility. If the number of squared intraday returns used in the calculation of realized volatility approaches infinity (M → ∞), then RV_t converges uniformly in probability to the integrated variance at rate √M, see Barndorff-Nielsen and Shephard (2002). Empirical evidence has established the following stylized facts about realized volatility. First, the autocorrelation function dies out at a hyperbolic rate rather than exponentially, which suggests the existence of long memory in the data. Second, the logarithm of RV is nearly Gaussian. Third, the distribution of RV in levels is right-skewed and leptokurtic, see Andersen et al. (2002), Corsi (2004) as well as Bollerslev et al. (2005). Next, we are going to discuss two different modeling approaches for realized volatility that can be exploited to obtain RV forecasts. As mentioned above, these forecasts can be useful for VaR prediction.
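As an illustration of the construction above, the sketch below computes daily realized volatility from a table of 15-minute future prices. The column names and the tidy one-table layout are assumptions for the example, not the original data format, and the treatment of contract rolls is omitted.

```python
import numpy as np
import pandas as pd

def realized_volatility(prices: pd.DataFrame) -> pd.Series:
    """Daily RV as the sum of squared intraday log returns.

    `prices` is assumed to have columns ['day', 'price'], sorted by time
    within each day. The first intraday return of every day (the overnight
    return into F_t^1) is dropped, mirroring the definition in the text.
    Returns across a contract switch would additionally need to be removed.
    """
    def one_day(p: pd.Series) -> float:
        r = np.diff(np.log(p.to_numpy()))
        return float(np.sum(r[1:] ** 2))   # drop the first return of the day
    return prices.groupby("day")["price"].apply(one_day)
```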

3.2 Autoregressive Fractionally Integrated Moving Average Models

To start this section about short and long memory time series models for realized volatility, we want to provide a common definition of long memory. We rely on Baillie (1996), who cited McLeod and Hipel (1978). Let ρ_j denote the autocorrelation at lag j of a discrete time process. This process exhibits long memory if lim_{k→∞} Σ_{j=−k}^{k} |ρ_j| is infinite. Of course, this is not the only way in which long memory can be defined. An econometric survey on the definition of long memory is Guégan (2005). Before we describe long memory time series models like the family of Autoregressive Fractionally Integrated Moving Average (ARFIMA) models, we turn to the special cases with short memory that are not fractionally integrated. Such ARMA(p, q)
models for a time series (y_t)_{t=1}^{T} are usually written as

\[ y_t = \mu + \phi_1 y_{t-1} + \ldots + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \ldots + \theta_q \varepsilon_{t-q} , \tag{6} \]

where (ε_t)_{t=1}^{T} is assumed to be a white noise process and µ denotes the constant. To simplify the notation we use the lag operator L, which is characterized by the property L y_t = y_{t−1}. Using lag polynomials, the previous equation becomes

\[ \Phi(L)(y_t - \mu) = \Theta(L)\varepsilon_t , \tag{7} \]

with Φ(L) = 1 − φ_1 L − ... − φ_p L^p as the AR polynomial and Θ(L) = 1 + θ_1 L + ... + θ_q L^q as the corresponding MA polynomial. We focus on stationary and invertible ARMA processes, which are characterized by the fact that all roots of the lag polynomials lie outside the unit circle. The autocorrelations of a stationary and invertible ARMA process are geometrically bounded. Therefore, such a process has short memory. The time series we want to analyze is realized volatility, but often it is more appropriate to consider its logarithm, since the logarithmic series is usually not as far from normality as the series in levels. The normality assumption is important when a maximum likelihood method is used for estimation. A misspecification may arise if the disturbance is not normally distributed. Pong et al. (2004) and Gallant et al. (1999) considered an ARMA(2,1) model because it can be deemed a simple alternative to the long memory ARFIMA model that is also able to reflect the most important properties of realized volatility. The reason for the special lag structure p = 2 and q = 1 is related to the fact that the sum of two AR(1) processes gives an ARMA(2,1) process, see Hamilton (1994). The ARMA(2,1) model can be interpreted as a component model with a transitory AR(1) component and a permanent AR(1) component, see Gallant et al. (1999). Estimation and forecasting are done in EViews 4.1 by applying a Marquardt nonlinear least squares algorithm. The nonlinear least squares estimator is asymptotically efficient and asymptotically equivalent to the maximum likelihood estimator.

We now turn to the class of ARFIMA models, following Andersen et al. (2001) and Andersen et al. (2003). The autocorrelation function of such models declines at a hyperbolic rate, while it declines at an exponential rate for ARMA models. Since empirical evidence suggests long memory, the application of fractionally integrated time series models seems reasonable. Using lag polynomials again, a common expression for an ARFIMA(p, d, q) model reads

\[ \Phi(L)(1 - L)^d (y_t - \mu) = \Theta(L)\varepsilon_t , \tag{8} \]

where d denotes the long memory parameter, which is not restricted to take integer values. If 0 < d < 0.5 holds, the ARFIMA process exhibits long memory and all autocorrelations are positive and decay hyperbolically, see Baillie (1996). The memory of the process increases as d increases. As discussed in one of the following sections, this is not the case in the family of FIGARCH processes. The short memory ARMA(p, q) model results if d = 0. One possible extension of the previous equation is considered by Giot and Laurent (2004), who applied an ARFIMAX specification with additional regressors to account for the correlation of realized volatility and past negative innovations. Other extensions can be found in Martens et al. (2004). The long memory parameter d can be estimated by several methods: Geweke and Porter-Hudak (1984) (henceforth: GPH) proposed a log-periodogram regression, while the Gaussian Semi-Parametric (GSP) method of Robinson and Henry (1998) is also available. We estimate all parameters simultaneously with the modified profile maximum likelihood method. The simulation study in Doornik and Ooms (2001) shows that this method is preferable to exact maximum likelihood estimation. The starting value of the long memory parameter d is an estimate obtained from the log-periodogram regression approach. For comparison, we apply the Gaussian Semi-Parametric estimator of Robinson and Henry (1998). Non-zero periodogram points are evaluated at Fourier frequencies, see Doornik and Ooms (2001).
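Since the GPH estimate only serves as a starting value here, a minimal sketch of that log-periodogram regression is given below. The bandwidth choice m = T^0.5 is a common rule of thumb and an assumption of the sketch, not necessarily the setting used for the results in this work.

```python
import numpy as np

def gph_estimate(x, bandwidth=None):
    """Geweke/Porter-Hudak log-periodogram estimate of the memory parameter d.

    Regress log I(lambda_j) on log(4 sin^2(lambda_j / 2)) over the first m
    Fourier frequencies; the negative of the slope estimates d.
    """
    x = np.asarray(x, dtype=float)
    T = x.size
    m = int(np.sqrt(T)) if bandwidth is None else bandwidth
    lam = 2.0 * np.pi * np.arange(1, m + 1) / T            # Fourier frequencies
    dft = np.fft.fft(x - x.mean())[1:m + 1]
    periodogram = (np.abs(dft) ** 2) / (2.0 * np.pi * T)
    regressor = np.log(4.0 * np.sin(lam / 2.0) ** 2)
    X = np.column_stack([np.ones(m), regressor])
    beta, *_ = np.linalg.lstsq(X, np.log(periodogram), rcond=None)
    return -beta[1]                                        # slope equals -d
```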


In order to determine the optimal lag order, we estimate different specifications with p = 1, ..., 5 if q = 0, 1 and p = 0, 1 if q = 1, ..., 5. Hence, we consider twenty different specifications. The optimal lag order (p*, q*) is chosen by minimizing the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). Estimation and forecasting are done in Ox, using the ARFIMA 1.0.1 package of Doornik and Ooms. Since the AR(FI)MA model is estimated in logarithms, we would introduce a data transformation bias if we simply applied the exponential transformation. For this reason, we adopt the bias correction of Giot and Laurent (2004):

\[ \widehat{RV}_{t+1} = \exp\!\left( \ln(RV_{t+1}) - \hat{\varepsilon}_{t+1} + \frac{1}{2}\hat{\sigma}_{\varepsilon}^2 \right) , \tag{9} \]

where σ bε2 denotes the estimated residual variance. Note, that we apply this bias correction formula for all logarithmic transformations of volatility.

3.3 Heterogeneous Autoregressive Realized Volatility Models

An alternative to the ARFIMA model is the Heterogeneous Autoregressive Realized Volatility (HAR) model that originates in Corsi (2004), who extended the Heterogeneous Autoregressive Conditional Heteroscedasticity (HARCH) model of Müller et al. (1997). This model can be interpreted as a component model that contains a daily, a weekly and a monthly realized volatility component. Like the ARMA(2,1) model, the HAR model is parsimonious and has short memory. Nevertheless, it might be able to reflect the most important properties. The model is described by the equation

\[ \sqrt{RV^{d}_{t}} = \alpha_0 + \alpha_1 \sqrt{RV^{d}_{t-1}} + \alpha_2 \sqrt{RV^{w}_{t-1}} + \alpha_3 \sqrt{RV^{m}_{t-1}} + \varepsilon_t , \tag{10} \]

where RV^d_t denotes the daily realized variance, RV^w_{t−1} = (1/5) Σ_{i=1}^{5} RV_{t−i} the weekly realized variance and, finally, RV^m_{t−1} = (1/22) Σ_{i=1}^{22} RV_{t−i} the monthly realized variance. The error term, denoted by ε_t, is assumed to be a Gaussian white noise process. Note that we introduce the superscript d just to distinguish between the particular time horizons. Andersen et al. (2005) consider a logarithmic version of the HAR model, which we want to apply in our work. The logarithmic version reads

\[ \ln(RV^{d}_{t}) = \alpha_0 + \alpha_1 \ln(RV^{d}_{t-1}) + \alpha_2 \ln(RV^{w}_{t-1}) + \alpha_3 \ln(RV^{m}_{t-1}) + \varepsilon_t . \tag{11} \]

The HAR specifications are estimated and forecasted in EViews 4.1 by OLS with Newey-West robust standard errors. Note that the estimated residuals are expected to be heteroscedastic. Therefore, a parametric volatility model might be estimated in order to cope with the time-dependent error variances. This 'volatility of realized volatility' approach is proposed in Corsi et al. (2005), but it is beyond the scope of this work.
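A compact sketch of the logarithmic HAR regression in equation (11), estimated by plain OLS. The Newey-West correction of the standard errors is omitted for brevity, so this is an illustration of the regression itself rather than of the full estimation output used in section 7.

```python
import numpy as np

def har_log_fit(rv):
    """OLS estimation of ln(RV_t) on daily, weekly and monthly log components."""
    rv = np.asarray(rv, dtype=float)
    y, X = [], []
    for t in range(22, rv.size):
        daily = rv[t - 1]
        weekly = rv[t - 5:t].mean()       # average of the last 5 days
        monthly = rv[t - 22:t].mean()     # average of the last 22 days
        y.append(np.log(rv[t]))
        X.append([1.0, np.log(daily), np.log(weekly), np.log(monthly)])
    coef, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    return coef  # alpha_0, alpha_1 (daily), alpha_2 (weekly), alpha_3 (monthly)

def har_log_forecast(rv, coef):
    """One-step ahead forecast of ln(RV_{t+1}) from the last 22 observations."""
    a0, a1, a2, a3 = coef
    return (a0 + a1 * np.log(rv[-1]) + a2 * np.log(np.mean(rv[-5:]))
            + a3 * np.log(np.mean(rv[-22:])))
```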

4 Volatility Models

4.1 GARCH Models with Short Memory

In the class of geometric memory volatility models we consider some of the best known specifications. Starting with the standard GARCH model, we examine two asymmetric versions, namely the Threshold GARCH and the Exponential GARCH model. In addition, the fractionally integrated and the hyperbolic GARCH models are considered in the next subsection, which deals with long memory GARCH models. Recall the stochastic process for asset returns, r_t = µ_t + ε_t = µ_t + σ_t z_t. In the seminal work of Engle (1982) the first Autoregressive Conditional Heteroscedasticity (ARCH) model was proposed. The conditional variance in an ARCH(1) model is specified as

\[ \sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 , \tag{12} \]

where ω > 0 and α_1 ≥ 0 are assumed to ensure the positivity of σ_t². Empirical work quickly showed that a higher number of lagged shocks is needed to capture the dynamics in the second moment. The result was the ARCH(p) model with p + 1 parameters. A more parsimonious specification is the Generalized ARCH model of Bollerslev (1986). The GARCH(p, q) model specifies the conditional expectation of ε_t² as follows:

\[ \sigma_t^2 = \omega + \alpha(L)\varepsilon_t^2 + \beta(L)\sigma_t^2 , \tag{13} \]

where α(L) = α_1 L + ... + α_q L^q and β(L) = β_1 L + ... + β_p L^p are lag polynomials, see Bollerslev (1986). The non-negative parameters α_i measure the short-run effects of innovations on the conditional variance, while the non-negative parameters β_j measure the long-run impact. Rewriting the previous equation yields

\[ \sigma_t^2 = \frac{\omega}{1-\beta(1)} + \frac{\alpha(L)}{1-\beta(L)}\, \varepsilon_t^2 . \tag{14} \]

This way of expressing the GARCH(p, q) model is called the ARCH(∞) representation, since the conditional variance is affected by an infinite number of past shocks. Note that we have to assume that all roots of 1 − β(L) lie outside the unit circle to derive this result. Many empirical investigations lead to the conclusion that (p, q) = (1, 1) is the most relevant lag structure for daily return data. Hence, the GARCH(1,1) model has only three unknown parameters, which is quite parsimonious. Given these lag orders, the GARCH model can be written more simply as

\[ \sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2 , \tag{15} \]

where α + β is a measure of shock persistence. Shocks have a permanent effect on the conditional variance when the parameter restriction α + β = 1 is fulfilled. This special case is well known as the Integrated GARCH(1,1) model, which leads to infinite unconditional second and fourth moments. Furthermore, the restriction also implies a unit root in the ARMA(1,1) representation of the GARCH process for ε_t². Davidson (2004) shows in detail that the covariance non-stationarity of an integrated GARCH process does not imply any long memory behavior. Using lag polynomials, the IGARCH model can be written as

\[ \sigma_t^2 = \frac{\omega}{1-\beta(1)} + \left[ 1 - \frac{\phi(L)(1-L)}{1-\beta(L)} \right] \varepsilon_t^2 , \tag{16} \]

where φ(L) denotes the filter φ(L) = 1 − α(L) − β(L). A simplification of this specification is used in risk management. It is called RiskMetrics and was invented by J.P. Morgan, see RiskMetrics (1995). At first, the lag order is set to p = q = 1, which is reasonable. However, the parameters are not estimated. They are fixed at the following values: ω = 0, β = 0.94 and therefore α = 0.06. In addition, it is assumed that z_t is normally distributed. Next, we consider some extensions of the standard GARCH specification. Since asymmetric effects are important to describe the dynamics of the volatility process, asymmetric GARCH models are often used in financial econometrics. We choose one simple and popular (Threshold GARCH) and one more sophisticated (Exponential GARCH) asymmetric specification from the huge set of possible specifications. Let us start with the Threshold GARCH model that originates in Glosten, Jagannathan and Runkle (1994). The standard GARCH equation is extended by a dummy variable that equals one if the last shock was negative and zero if it was not:

\[ \sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \gamma I_{\{\varepsilon_{t-1}<0\}} \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2 , \tag{17} \]

where the parameter γ captures the asymmetric effect of negative shocks. The second asymmetric specification is the Exponential GARCH (EGARCH) model of Nelson (1991). Since the logarithm of the conditional variance is modeled, positivity restrictions like ω > 0, α_1 ≥ 0, β_1 ≥ 0 are no longer necessary. The formal expression for the EGARCH model is

\[ \ln(\sigma_t^2) = \omega + \alpha g(z_{t-1}) + \beta \ln(\sigma_{t-1}^2) , \tag{18} \]

where g(z_{t−1}) = θ_1 z_{t−1} + θ_2 (|z_{t−1}| − E|z_t|) is a weighting function, see Tsay (2005). The parameters θ_1 and θ_2 measure the asymmetric effects. More precisely, θ_1 measures the sign effect, which is expected to be negative, while θ_2 gauges the magnitude effect. Note that E|z_t| depends on the distributional assumption. More details can be found in Tsay (2005).
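The following sketch filters a return series through the GARCH(1,1) recursion of equation (15) and produces a one-step ahead variance forecast; with (ω, α, β) = (0, 0.06, 0.94) it reduces to the RiskMetrics scheme described above. Parameter values are taken as given rather than estimated, so this illustrates the recursion, not the QML estimation performed with the G@RCH package.

```python
import numpy as np

def garch11_filter(returns, omega, alpha, beta, mu=0.0):
    """Conditional variances from equation (15) plus the t+1 forecast.

    The recursion is initialized at the sample variance of the demeaned returns.
    """
    eps = np.asarray(returns, dtype=float) - mu
    sigma2 = np.empty(eps.size)
    sigma2[0] = eps.var()
    for t in range(1, eps.size):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    forecast = omega + alpha * eps[-1] ** 2 + beta * sigma2[-1]
    return sigma2, forecast

# RiskMetrics benchmark: fixed parameters instead of estimates
# sigma2, f = garch11_filter(returns, omega=0.0, alpha=0.06, beta=0.94)
```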

4.2 GARCH Models with Long Memory

The properties of the IGARCH model are not very attractive from an empirical point of view. The exponentially decaying autocorrelation function of ε_t² might be too fast to mimic the observed autocorrelation patterns of the investigated return series, no matter how close the persistence measure is to unity. As suggested by Ding, Engle and Granger (1993), the sample autocorrelations decline only at a hyperbolic rate. The hyperbolic rate is usually modeled by a fractionally integrated GARCH (FIGARCH) process, which has been proposed by Baillie, Bollerslev and Mikkelsen (1996). FIGARCH processes are not covariance stationary but strictly stationary under some regularity conditions. Chung (1999) proposes a slightly different specification of the FIGARCH process. However, Baillie, Bollerslev and Mikkelsen (1996) introduce a memory parameter d ∈ [0, 1] into the volatility equation. Formally, the FIGARCH model is defined as

\[ \sigma_t^2 = \frac{\omega}{1-\beta(1)} + \left[ 1 - \frac{\phi(L)(1-L)^d}{1-\beta(L)} \right] \varepsilon_t^2 . \tag{19} \]

The memory of the FIGARCH process is measured by −d. In the special case of d = 1, the model reduces to an integrated GARCH with short memory, while the length of memory increases as d approaches zero. The characterization of the FIGARCH model as an intermediate case between the stable GARCH and the IGARCH is misleading, because the limiting case d = 0 is actually another short memory case, see Davidson (2004). An extension of the fractionally integrated GARCH models is the hyperbolic GARCH (HYGARCH) model that originates in Davidson (2004). The conditional volatility is specified as

\[ \sigma_t^2 = \frac{\omega}{1-\beta(1)} + \left[ 1 - \frac{\phi(L)\,[1 + \tau((1-L)^d - 1)]}{1-\beta(L)} \right] \varepsilon_t^2 . \tag{20} \]

In contrast to the FIGARCH model, which allows for long memory but not for weak stationarity, the HYGARCH model provides both. Long memory effects are included if d ∈ (0, 1), and weak stationarity is ensured as long as τ < 1. Note that the fractionally integrated version is a special case: if τ equals one, the model reduces to the FIGARCH model. The properties are discussed in depth in Davidson (2004). We take this specification into account because its theoretical properties are very attractive. Of course, it is possible to extend the HYGARCH model to take asymmetric effects into account, see Schoffer (2003), who analyzed the Hyperbolic Asymmetric Power ARCH model. Estimation and forecasting of GARCH models are done in Ox, using the G@RCH 4.0 package of Laurent and Peters. We choose bounded parameters in the Quasi Maximum Likelihood (QML) setting. The QML estimator of Bollerslev and Wooldridge (1992) is consistent but inefficient, and standard errors are robust against misspecification of the innovation distribution. In addition, we impose the restriction α + β < 1.

4.3 Stochastic Volatility Models

The Stochastic Volatility (SV) model, which was introduced by Taylor (1986), is a valuable alternative to deterministic volatility models like the GARCH models considered in the previous subsections. The key difference between the two classes of volatility models is that GARCH models specify the conditional variance deterministically, while SV models are characterized by a stochastic process for the latent volatility. Usually, a first-order autoregressive process for the logarithm of volatility is employed. The logarithm is used to ensure the positivity of the volatility, like in the Exponential GARCH model of Nelson (1991). The SV model recently applied by Koopman et al. (2005) is given by the system of equations

\[ r_t = \mu + \sigma_t z_t , \qquad z_t \overset{iid}{\sim} N(0,1) , \tag{21} \]

\[ \sigma_t^2 = \sigma_*^2 \exp(h_t) , \tag{22} \]

\[ h_t = \beta h_{t-1} + \sigma_\eta \eta_t , \qquad \eta_t \overset{iid}{\sim} N(0,1) . \tag{23} \]

If {z_t}_{t=1}^{T} and {η_t}_{t=1}^{T} are assumed to be iid and normally distributed, the volatility model is symmetric like a GARCH(1,1) model. Asymmetric modifications are considered in Harvey and Koopman (1996) and more recently in Asai and McAleer (2004). To avoid further complications we treat the mean µ as fixed, see Koopman et al. (2005). The parameter σ_*² is a scaling factor. Another way to specify the SV model is to set σ_*² = 1 and introduce a constant into the logarithmic AR process. Due to identification problems it is not possible to leave both parameters unrestricted. The persistence parameter β is restricted to (0, 1) to ensure weak stationarity of the autoregressive process. It can be interpreted like the sum of the parameters α and β in the GARCH(1,1) process, see equation (15). The efficient estimation of the SV model is very complicated. The likelihood function cannot be computed by linear methods, since the volatility process (h_t)_{t=1}^{T} is unobserved and related to the observed returns in a non-linear way, see Koopman et al. (2005). Several studies focus on the efficient estimation of SV-type models, e.g. Liesenfeld and Jung (2003) discuss the Efficient Importance Sampling method. We apply the exact maximum likelihood method that relies on Monte Carlo importance sampling techniques. To briefly describe the idea behind this method, let r = (r_1, ..., r_T)′, ψ = (β, σ_*, σ_η)′ and θ = (h_1, ..., h_T)′. The likelihood function L(ψ) = p(r|ψ) can be rewritten as

\[ L(\psi) = \int p(r, \theta \,|\, \psi)\, d\theta = \int p(r \,|\, \theta, \psi)\, p(\theta \,|\, \psi)\, d\theta . \]

The density p(r|θ, ψ) is called the importance density. It can be approximated by a simulated conditional Gaussian density. The Gaussian density is chosen due to its
simplicity. The likelihood function, which is estimated by Monte Carlo methods, can be used to obtain exact maximum likelihood parameter estimates numerically, see Shephard and Pitt (1997) and Durbin and Koopman (1997). Koopman and Shephard (2003) analyzed the assumptions that are made in the importance sampling framework. Details can be found in Shephard (1996), Durbin and Koopman (2002) as well as Lee and Koopman (2004). Alternatively, the model can also be estimated with the generalized method of moments, see Melino and Turnbull (1990). Details regarding SV models can be found in Taylor (1994), Shephard (1996) and Koopman et al. (2005). GARCH and SV models are compared in Fleming and Kirby (2003) and Carnero, Pena and Ruiz (2004).
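To illustrate the data generating process in equations (21)-(23), the sketch below simulates returns from the SV model for given parameter values. Simulation is not part of the estimation strategy described above (which relies on importance sampling); it only shows the structure of the model, and the parameter values are hypothetical.

```python
import numpy as np

def simulate_sv(T, mu=0.0, beta=0.97, sigma_star=1.0, sigma_eta=0.2, seed=0):
    """Simulate r_t = mu + sigma_t z_t with sigma_t^2 = sigma_*^2 exp(h_t)
    and h_t = beta h_{t-1} + sigma_eta eta_t, as in equations (21)-(23)."""
    rng = np.random.default_rng(seed)
    h = np.zeros(T)
    for t in range(1, T):
        h[t] = beta * h[t - 1] + sigma_eta * rng.standard_normal()
    sigma = sigma_star * np.exp(0.5 * h)      # sigma_t, not sigma_t^2
    r = mu + sigma * rng.standard_normal(T)
    return r, sigma
```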

4.4 Specification of the Innovation Distribution

One of the most popular approaches regarding the specification of a volatility model is the full parametric approach. The popularity of this choice can be explained by the simplicity of estimation and the easy implementation on computers. This section deals with the parametric specification of the innovation distribution G. One of the most frequently used distributions is the standard normal. Empirical evidence has shown that the Student t-distribution, which depends on the degrees of freedom ν, is usually more appropriate for describing asset returns. Note that the normal distribution is a limiting case (ν → ∞) and that ν is a shape parameter. Assuming that the innovations are t-distributed, fat tails can be modeled easily. The fatness increases when ν decreases, since the conditional kurtosis can be expressed as 3(ν − 2)/(ν − 4), see Angelidis et al. (2004). A problem is the possible non-existence of higher moments; moments of order ν and higher are infinite. In practice, the estimated number of degrees of freedom is roughly located around six, which might be viewed as high enough since the first four moments exist. Bollerslev (1987) was the first who published an article that deals with the Student-t distribution in the
context of GARCH models. Another aspect is the skewness, which might be significantly different from zero. Although skewness has not been treated as very important so far, some new approaches have recently become available. On the one hand, Harvey and Siddique (1999) introduced the Autoregressive Conditional Skewness model and, on the other hand, Fernandez and Steel (1998) proposed a skewed Student t-distribution. This version is a mixture of two truncated symmetric t-distributions. It depends on four parameters that determine location, dispersion, skewness and kurtosis. We consider the standardized, two-parameter version that was used by Giot and Laurent (2004) in the context of the Asymmetric Power ARCH model. The degree of asymmetry is controlled by the additional parameter ξ > 0. If ξ equals one, the well known symmetric version results. To make the statement more clear, let m and s denote the mean and the standard deviation of the asymmetric t-distribution, respectively. Following Giot and Laurent (2004),

\[ m = \frac{\Gamma\!\left(\frac{\nu-1}{2}\right)\sqrt{\nu-2}}{\sqrt{\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)}\left(\xi - \frac{1}{\xi}\right) , \qquad s^2 = \left(\xi^2 + \frac{1}{\xi^2} - 1\right) - m^2 , \]

where Γ(·) denotes the Gamma function. The density functions of the asymmetric Student t-distribution (g_t^a) and its symmetric special case (g_t) are given by

\[ g_t^a(z\,|\,\xi,\nu) = \begin{cases} \dfrac{2s}{\xi + \frac{1}{\xi}}\, g_t\!\big(\xi(sz+m)\,|\,\nu\big), & z < -\dfrac{m}{s} , \\[2ex] \dfrac{2s}{\xi + \frac{1}{\xi}}\, g_t\!\big(\tfrac{sz+m}{\xi}\,|\,\nu\big), & z \geq -\dfrac{m}{s} , \end{cases} \tag{24} \]

\[ g_t(z\,|\,\nu) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi(\nu-2)}\,\Gamma\!\left(\frac{\nu}{2}\right)} \left(1 + \frac{z^2}{\nu-2}\right)^{-\frac{\nu+1}{2}} . \tag{25} \]

If ξ = 1, it follows that m = 0 and s = 1, which implies g_t^a(z|1, ν) = g_t(z|ν). If ξ > 1 (< 1), the distribution is skewed to the right (left), see Lambert and Laurent (2002) for further details. The parameters ν and ξ are estimated together with the GARCH parameters in the QML framework.
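The following is a direct transcription of equations (24) and (25) into code, useful for checking the density numerically. It only evaluates the standardized skewed-t density for given (ξ, ν); it is not the QML estimation routine used in this work, and the example parameter values are hypothetical.

```python
import numpy as np
from scipy.special import gammaln

def skewt_density(z, xi, nu):
    """Standardized skewed Student-t density g_t^a(z | xi, nu), eqs. (24)-(25)."""
    z = np.asarray(z, dtype=float)
    # location m and scale s of the standardized Fernandez-Steel construction
    m = (np.exp(gammaln((nu - 1) / 2) - gammaln(nu / 2))
         * np.sqrt(nu - 2) / np.sqrt(np.pi) * (xi - 1 / xi))
    s = np.sqrt(xi ** 2 + 1 / xi ** 2 - 1 - m ** 2)

    def g_t(x):  # standardized symmetric Student-t density, equation (25)
        c = np.exp(gammaln((nu + 1) / 2) - gammaln(nu / 2)) / np.sqrt(np.pi * (nu - 2))
        return c * (1 + x ** 2 / (nu - 2)) ** (-(nu + 1) / 2)

    left = z < -m / s
    arg = np.where(left, xi * (s * z + m), (s * z + m) / xi)
    return 2 * s / (xi + 1 / xi) * g_t(arg)

# example evaluation with hypothetical parameters (left skew for xi < 1)
print(skewt_density(np.array([-2.0, 0.0, 2.0]), xi=0.85, nu=6.0))
```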


Another distribution that we want to include in our analysis is the Generalized Error Distribution (GED), which was proposed by Subbotin (1923). Nelson (1991) introduced the Exponential GARCH model with generalized error distributed shocks. The GED depends on a shape parameter, say ζ. This distribution contains the standard normal as the special case ζ = 2. If ζ < 2, the distribution exhibits fat tails. As mentioned in Schoffer (2003), the GED is not able to explain the high concentration of returns around zero and fat tails simultaneously. Another special case is the double exponential distribution, also known as the Laplace distribution, which is nested if ζ = 1. The limiting case (ζ → ∞) is a uniform distribution on (−√3, √3), see Bao et al. (2004) as well as Angelidis et al. (2004). A skewed version of the GED is also available, see Theodossiou (2000), but it is not included in our work. The parameter ζ is estimated together with the GARCH parameters in the QML framework. We decided to rule out other distributions since we have to restrict ourselves to the most important ones. Of course, many other distributions are very attractive from a theoretical point of view, but the implementation on computers also plays an important role in practice. Bao et al. (2004) provide an excellent overview of different distributions used in financial econometrics.

5 Quantile Modeling

This section is about modeling strategies and estimation procedures for G^{-1}_α, the α-quantile of the distribution of z_t. Below, three linked approaches, ranging from Historical Simulation to an Extreme Value Theory based method, are considered.

5.1 Historical Simulation

The full parametric approach has the disadvantage that a lot of restrictive assumptions have to be made. Many of them may be inadequate, which can result in poor VaR forecasts. A sharp contrast to the full parametric approach is a method called Historical Simulation (HS). The only assumption made is that returns are independently and identically distributed, which is crucial. In other words, no specific distribution is assumed, but volatility dynamics are ignored due to the iid assumption. Historical Simulation is based on a rolling window estimation scheme with a window size m that is typically 125, 250 or 500. The estimation of the unconditional α-quantile of the return distribution at time t is based on the most recent m observations. The one-step ahead VaR forecast for time t + 1 is

\[ \widehat{VaR}^{\alpha}_{t+1} = \hat{F}^{-1}_{\alpha}(m) , \tag{26} \]

where F̂ denotes the empirical distribution of returns. For example, let m = 500 and α = 1%. This implies that the VaR forecast is just the fifth sample order statistic. Naturally, the forecast depends heavily on the chosen window size. If m is very large, the forecast will be nearly constant over time, while the estimate of F^{-1}_α will be inaccurate if m is too small. The main reason for the typically poor performance of HS is its single assumption that the returns are iid. On the other hand, the HS approach is very easy to implement and does not demand much computer time.
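A rolling-window Historical Simulation forecast as in equation (26). The empirical quantile is taken with a simple order-statistic rule, which for m = 500 and α = 1% picks the fifth smallest return in the window, as noted above; other quantile conventions are possible.

```python
import numpy as np

def hs_var_forecasts(returns, m=500, alpha=0.01):
    """One-step ahead HS VaR (as a return quantile) after the first m days."""
    r = np.asarray(returns, dtype=float)
    forecasts = []
    for t in range(m, r.size):
        window = np.sort(r[t - m:t])
        k = max(int(np.floor(alpha * m)), 1)   # e.g. the 5th order statistic
        forecasts.append(window[k - 1])
    return np.array(forecasts)
```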

5.2 Filtered Historical Simulation

One way to evade the problematic iid assumption is the Filtered Historical Simulation (FHS) approach, see Pritsker (2001) and Barone-Adesi, Giannopoulos and Vosper (2002). This method uses filtered data ẑ_t = (r_t − µ̂_t)/σ̂_t in place of the raw return data r_t. The raw returns are filtered by estimates of µ_t and σ_t, which can be constructed with full parametric GARCH, SV as well as RV models. It might sound trivial, but it is quite important, that different specifications generate different volatility filters: a GARCH(1,1) model with normally distributed innovations, for example, generates another filter than the same volatility model with skewed-t distributed innovations. The VaR forecast based on a GARCH model under the Filtered Historical Simulation approach is given by

\[ \widehat{VaR}^{\alpha}_{t+1} = -\left( \hat{\mu}_{t+1} + \hat{\sigma}_{t+1} \hat{G}^{-1}_{\alpha}(m) \right) , \tag{27} \]

where Ĝ^{-1}_α(m) is the estimated α-quantile of the distribution of the filtered returns ẑ_t. The utilization of realized volatility in this context is new, although it is straightforward. We will use this nonparametric volatility filter as an alternative to full parametric GARCH and SV filters. Nevertheless, a forecast of realized volatility is required, since the VaR forecast is given by

\[ \widehat{VaR}^{\alpha}_{t+1} = -\left( \hat{\mu}_{t+1} + \sqrt{\widehat{RV}_{t+1}}\; \hat{G}^{-1}_{\alpha}(m) \right) , \tag{28} \]

where Ĝ^{-1}_α(m) is the estimated α-quantile of the distribution of standardized returns. We estimate the mean µ with the rolling window scheme. Another possibility is to fit an autoregressive process to account for the dynamics in the conditional mean. A full parametric approach is proposed by Giot and Laurent (2004), which suffers from the fact that the time series model for realized volatility and the return equation are estimated separately. However, a joint estimation in a system would be preferable.
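A sketch of the realized-volatility-based FHS forecast in equation (28): returns are standardized by past realized volatility, the empirical α-quantile of the standardized returns over a window of length m is estimated, and the quantile is rescaled by the RV forecast. The helper names, the window mean as the estimate of µ and the plain empirical quantile are illustrative choices under the assumptions stated in the text.

```python
import numpy as np

def fhs_var_rv(returns, rv, rv_forecast, mu_forecast=0.0, m=500, alpha=0.01):
    """Filtered Historical Simulation VaR with an RV volatility filter, eq. (28).

    `returns` and `rv` are aligned in-sample arrays; `rv_forecast` is the
    one-step ahead realized variance forecast from an ARFIMA or HAR model.
    """
    r = np.asarray(returns, dtype=float)[-m:]
    sigma = np.sqrt(np.asarray(rv, dtype=float))[-m:]
    z = (r - r.mean()) / sigma                  # filtered (standardized) returns
    q = np.quantile(z, alpha)                   # empirical alpha-quantile of z
    return -(mu_forecast + np.sqrt(rv_forecast) * q)
```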

5.3 Filtered Extreme Value Approach

Another method that might be combined with Filtered Historical Simulation is the Extreme Value approach. A comprehensive textbook on Extreme Value Theory and its applications is Embrechts et al. (1997). All previous methods try to estimate the quantile with information about the whole distribution. Alternatively, we can focus on the tails directly using extreme value theory, since the 1%-quantile is an extreme value of the distribution. In order to extract the extreme values from our sample we have to choose a threshold u. Next, we define the set of extreme values as {z_(i) | z_(i) > u}_{i=1}^{T_u}, where z_(1) ≥ z_(2) ≥ ... ≥ z_(T_u) > u are ordered filtered returns and T_u is the number of extreme values. Note that we focus on the left tail of the distribution and therefore work with negative returns. As mentioned in Kuester et al. (2006), there are two main strands in the literature. One of them is concerned with the parameter estimation of the Generalized Pareto distribution, see McNeil and Frey (2000). The other makes use of a tail index estimator, which is discussed in Christoffersen and Goncalves (2004). The main difference between the approaches is that the latter builds on the cumulative distribution function G(z) = 1 − L(z) z^{−1/ξ}, where L(z) is a slowly varying function, instead of the Generalized Pareto distribution. L(z) can be approximated by a constant c so that G(z) ≈ 1 − c z^{−1/ξ}. The positive parameter ξ is referred to as the tail index, which determines the shape of the tail. We employ the nonparametric Hill estimator (due to Hill (1975)), which assumes independently and identically distributed data. The Hill estimator for ξ is

\[ \hat{\xi} = \frac{1}{T_u} \sum_{i=1}^{T_u} \ln\!\left( \frac{z_{(i)}}{u} \right) . \tag{29} \]

Estimation of the α-quantile requires the restrictions G(u) = 1 − T_u/T and c = (T_u/T)\, u^{1/\hat{ξ}}, see Christoffersen and Goncalves (2004). The estimated distribution function and the corresponding α-quantile are given by

\[ \hat{G}(z) = 1 - \frac{T_u}{T}\left(\frac{z}{u}\right)^{-1/\hat{\xi}} , \qquad \hat{G}^{-1}_{\alpha}(\hat{\xi}) = u\left(\frac{\alpha T}{T_u}\right)^{-\hat{\xi}} . \tag{30} \]

Finally, the one-step ahead VaR forecast is obtained via

\[ \widehat{VaR}^{\alpha}_{t+1} = -\left( \hat{\mu}_{t+1} + \hat{\sigma}_{t+1} \hat{G}^{-1}_{\alpha}(\hat{\xi}) \right) , \tag{31} \]

where σ̂_{t+1} is a volatility forecast obtained from any of the described models and methods. Hence, the Filtered Extreme Value Approach is very flexible in the sense that it can be combined with several volatility modeling and forecasting approaches like RV, SV and GARCH.
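The filtered extreme value forecast of equations (29)-(31) in code form: the Hill estimator is applied to the largest losses of the filtered returns, with the threshold chosen as the 5% tail of the window, mirroring the rule of thumb adopted in section 7. Working with losses and mapping the tail quantile back to the return scale is one possible sign convention; the function and argument names are illustrative.

```python
import numpy as np

def evt_var(filtered_returns, mu_forecast, sigma_forecast, alpha=0.01, tail_frac=0.05):
    """Hill-estimator based VaR forecast, equations (29)-(31).

    Losses (negated filtered returns) are used so that the left tail of the
    return distribution becomes the upper tail analysed by the Hill estimator.
    """
    z = -np.asarray(filtered_returns, dtype=float)     # losses
    T = z.size
    Tu = max(int(np.floor(tail_frac * T)), 2)
    z_sorted = np.sort(z)[::-1]                        # descending order statistics
    u = z_sorted[Tu]                                   # threshold: Tu losses exceed u
    xi_hat = np.mean(np.log(z_sorted[:Tu] / u))        # Hill estimator, eq. (29)
    q_loss = u * (alpha * T / Tu) ** (-xi_hat)         # loss-scale tail quantile, eq. (30)
    g_inv = -q_loss                                    # back on the return scale
    return -(mu_forecast + sigma_forecast * g_inv)     # eq. (31)
```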

6 Evaluation of Value-at-Risk Forecasts

In order to distinguish between correctly specified and misspecified VaR forecasts we apply the conditional coverage criterion by Christoffersen (1998) that is commonly used. Furthermore, the null hypothesis that a VaR forecast is correctly specified can be tested in a likelihood ratio framework. After the application of this specification test we are able to construct a model set of correctly specified forecasts, which we call Mc ⊆ M. This section is continued with a discussion of two loss functions that are used to compare the forecasts k ∈ Mc directly. The end of this section is dedicated to the Superior Predictive Ability (SPA) test of Hansen (2005).
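A hedged sketch of the conditional coverage criterion just mentioned: the code below implements the standard Christoffersen (1998) decomposition into an unconditional coverage statistic and a first-order Markov independence statistic, whose sum is compared with a chi-squared distribution with two degrees of freedom. It is a generic implementation of that test under these standard assumptions, not a transcription of the exact formulas used in this work, and degenerate hit sequences without any hits are not handled.

```python
import numpy as np
from scipy.stats import chi2

def conditional_coverage_test(hits, alpha=0.01):
    """Christoffersen (1998) LR test of correct conditional coverage.

    `hits` is the 0/1 hit sequence H_t = 1{r_t < -VaR_t}. Returns the combined
    statistic LR_cc = LR_uc + LR_ind and its chi2(2) p-value.
    """
    h = np.asarray(hits, dtype=int)
    n1, n0 = h.sum(), (1 - h).sum()
    pi_hat = n1 / (n0 + n1)

    # unconditional coverage: observed hit rate versus nominal alpha
    ll_null = n0 * np.log(1 - alpha) + n1 * np.log(alpha)
    ll_alt = n0 * np.log(1 - pi_hat) + n1 * np.log(pi_hat)
    lr_uc = -2.0 * (ll_null - ll_alt)

    # independence: first-order Markov transition counts of the hit sequence
    n00 = np.sum((h[:-1] == 0) & (h[1:] == 0))
    n01 = np.sum((h[:-1] == 0) & (h[1:] == 1))
    n10 = np.sum((h[:-1] == 1) & (h[1:] == 0))
    n11 = np.sum((h[:-1] == 1) & (h[1:] == 1))
    p01, p11 = n01 / (n00 + n01), n11 / (n10 + n11)
    p1 = (n01 + n11) / (n00 + n01 + n10 + n11)
    ll_ind_null = (n00 + n10) * np.log(1 - p1) + (n01 + n11) * np.log(p1)
    ll_ind_alt = (n00 * np.log(1 - p01) + n01 * np.log(p01)
                  + n10 * np.log(1 - p11) + n11 * np.log(max(p11, 1e-12)))
    lr_ind = -2.0 * (ll_ind_null - ll_ind_alt)

    lr_cc = lr_uc + lr_ind
    return lr_cc, chi2.sf(lr_cc, df=2)
```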

6.1 Conditional Coverage and the Likelihood Ratio Test

The evaluation of a VaR forecast is closely related to the evaluation of an interval forecast. The key idea of interpreting the VaR forecast as an interval forecast was mentioned in Christoffersen (1998). A VaR forecast with confidence level α can be written equivalently as a half-closed interval forecast (y_{t+1})_{t=R}^{T} = ([VaR^α_{t+1}, ∞))_{t=R}^{T} with confidence level 1 − α. Note that we split the whole sample 1, ..., T into an in-sample period that reaches from observation 1 to R and an out-of-sample period that contains the observations R + 1, ..., T. The in-sample period is used for estimation of the parametric models, while the out-of-sample period is used to evaluate P = T − R one-step ahead VaR forecasts. If the interval forecast is correctly specified, a fraction 1 − α of the realized values are elements of the interval and a fraction α are not. Translated to the VaR forecasting context, this statement has the following meaning: if the VaR forecast (VaR^α_{t+1})_{t=R}^{T} is correctly specified, then a fraction α of the realized return values are smaller than the predicted VaR values. In order to obtain the number of return values that are smaller than the predicted VaR values we use an indicator function. The heart of the VaR forecast specification test is the hit sequence (H_t)_{t=R+1}^{T} that meets the condition H_t = I_{\{r_t ∉ y_t\}} ⇔ H_t = I_{\{r_t < −VaR^α_t\}}.

For the comparison of the correctly specified forecasts, let X_{t,k} denote the loss difference between the benchmark forecast and forecast k in period t, measured with one of the loss functions B or Q. If E(X_{t,k}) > 0 holds, we conclude that the benchmark is outperformed by forecast k. On the contrary, if E(X_{t,k}) ≤ 0 for all k, the benchmark is (weakly) superior to all alternative forecasting models that are elements of M_c. In this case we say that the benchmark exhibits superior predictive ability. We choose the RiskMetrics forecast as the benchmark, see section 4.1. In the context of VaR forecasting, we want to test the hypothesis that the RiskMetrics forecast is not outperformed by any other forecast. This idea leads to the multiple null hypothesis

\[ H_0 : \max_k E(X_{t,k}) \leq 0 . \tag{42} \]

In other words, no forecast k ∈ M_c is better than the benchmark model in terms of the particular loss function. The maximum operator appears in the previous formula due to the fact that the maximum expected loss difference is the most relevant one. If the null hypothesis is rejected, we conclude that there is at least one forecasting model that is superior to the benchmark. The expectation of X_{t,k} can be consistently estimated with the sample mean \bar{X}_k = \frac{1}{P}\sum_{t=R+1}^{T} X_{t,k}, where k = 1, ..., l and L = B, Q. White (2000) proposed the following test statistic for the null hypothesis in equation (42):

\[ t = \max_k P^{1/2} \bar{X}_k . \tag{43} \]

Since the distribution is not unique under the null hypothesis, the stationary bootstrap method of Politis and Romano (1994) is utilized to obtain probability values for t. The idea is to construct an empirical distribution of t, which we label t^*. The quantiles of the empirical distribution function can be compared with the test statistic in order to compute the probability values. In a bootstrapping framework, B new samples of X_{t,k} are generated. They are built by randomly choosing subsamples of different lengths. These subsamples are put together in order to obtain a re-sampled series of X_{t,k}. The lengths of the subsamples are independent and are drawn from a geometric distribution with mean q, see Koopman et al. (2005). Assuming stationarity of X_{t,k}, we consider B samples of this series, which we label X^i_{t,k} with i = 1, ..., B. Their averages are written as \bar{X}^i_k = \frac{1}{P}\sum_{t=R+1}^{T} X^i_{t,k}. The empirical distribution of t is then given by

\[ t^{*i} = \max_k P^{1/2} \big( \bar{X}^i_k - \bar{X}_k \big) , \tag{44} \]

see Hsu and Kuan (2005). There are two main problems with this approach, which are commented on in Bao et al. (2006). First, the choice of the forecasting scheme is not irrelevant. Usually, a rolling window scheme is employed, see Koopman et al. (2005), Hansen (2005), Hansen and Lunde (2005), Marcucci (2005), Bao et al. (2006) as well as Kuester et al. (2006). Nevertheless, a recursive scheme is also quite attractive, but the bootstrap method of Politis and Romano (1994) requires a special assumption that cannot be reconciled with such a forecasting scheme, see Hansen (2005). Recently, Corradi and Swanson (2005) developed a bootstrap method that is explicitly designed for the recursive forecasting scheme. Second, the Reality Check test of White (2000) is conservative and depends heavily on the structure of M_c.
If this set contains poor forecasts, White's test rejects the null hypothesis not frequently enough, because it assumes that all competing forecasting models perform exactly as well as the benchmark, formally E(X_1) = ... = E(X_l) = 0. This is called the least favorable configuration. A solution to the latter problem can be found in Hansen (2005), where a standardized test statistic is proposed. This test statistic is given by

\[ \tilde{t} = \max_k \frac{P^{1/2} \bar{X}_k}{\hat{\omega}_{kk}} , \tag{45} \]

where ω̂_{kk} is an estimate of ω²_{kk} = lim_{P→∞} var(P^{1/2} \bar{X}_k). To be more precise, ω̂²_{kk} = \frac{1}{B}\sum_{i=1}^{B} (\bar{X}^i_k - \bar{\bar{X}}_k)^2 with \bar{\bar{X}}_k = \frac{1}{B}\sum_{i=1}^{B} \bar{X}^i_k. In order to avoid the least favorable configuration and to identify the distribution of t̃ under the null hypothesis, Hansen (2005) proposed a different way of bootstrapping the distribution of t̃. This can be done by replacing \bar{X}^i_k with \bar{Z}^i_k in the test statistic, where

\[ \bar{Z}^i_k = \bar{X}^i_k - I_{\{\bar{X}_k > -A_k\}} \bar{X}_k , \tag{46} \]

\[ A_k = \frac{1}{4} P^{-1/4}\, \hat{\omega}_{kk} . \tag{47} \]

This choice of the threshold A_k was proposed in Hansen (2005) and has been adopted in nearly every empirical work in this field. One can show that the empirical distribution of

\[ \tilde{t}^i = \max_k \frac{P^{1/2} \bar{Z}^i_k}{\hat{\omega}_{kk}} \tag{48} \]

converges to the distribution of t̃ under the null hypothesis. The p-value of t̃ can be computed as

\[ \frac{1}{B} \sum_{i=1}^{B} I_{\{\tilde{t}^i > \tilde{t}\}} . \tag{49} \]

The resulting probability value can be interpreted as a consistent one, since it is defined as lim_{P→∞} P(t̃^i > t̃), where t̃ denotes the observed value of the test statistic, see Hansen and Lunde (2005). In addition, two inconsistent probability values can be provided in order to obtain a lower and an upper bound for the consistent one. The upper bound corresponds to the probability value of White's Reality Check test, which is conservative. In this case, \bar{Z}^i_k = \bar{Z}^{i,U}_k = \bar{X}^i_k - \bar{X}_k is specified. The lower bound corresponds to the probability value of a liberal test whose null hypothesis assumes that models with a worse performance than the benchmark are poor models in the limit, see Hansen et al. (2003). Therefore, \bar{Z}^i_k = \bar{Z}^{i,L}_k = \bar{X}^i_k - \max(\bar{X}_k, 0). One can observe that the relation \bar{Z}^{i,L}_k \leq \bar{Z}^i_k \leq \bar{Z}^{i,U}_k holds, which leads to an analogous relation for the probability values. Hansen (2005) showed in simulations that the consistent test is robust against the inclusion of poor forecasts and that it is more powerful than the Reality Check test of White (2000). The SPA test is implemented in Ox by Hansen, Kim and Lunde, see Hansen et al. (2003) for a manual. The default settings are B = 1000 and q = 0.5.

Application

This section deals with the application of the VaR forecasting models and methods that are discussed in section two. As mentioned in section 2.1, we are using a data set of S&P500 future contracts with 2700 observations. We split the whole sample into an in-sample period that reaches from observations 1 to 1833 and an out-ofsample period that contains the observations 1834,...,2700. The in-sample period is used for estimation of the parametric models, while the out-of-sample period is used to evaluate 867 one-step ahead VaR forecasts with a coverage rate that equals one percent. The chosen forecasting scheme is rolling window with a window size of 1833, which is the number of observations in the in-sample period. All models are re-estimated every trading day or two reasons. On the one hand, it is reasonable to account for new information, especially in turbulent periods. On the other hand, it is quite realistic that forecasting models are updated by risk managers when new information is available. 32

The set M consists of 107 VaR forecasts. We have five GARCH specifications with three different distributional assumptions which results in 15 full parametric forecasts beside the RiskMetrics model that assumes normality. In addition, we are considering six parametric time series models for RV. These 21 volatility forecasts and the SV forecast are exploited for filtered historical simulation with window sizes equal to 125, 250 and 500. Moreover, the volatility forecasts are applied in the filtered extreme value context, where we entertain the Hill estimator with 91 observations which roughly corresponds to five percent 1833. Note, that Danielsson and Morimoto (2000) proposed a bootstrap method in order to determine the optimal number of observations that are used for the Hill estimator. Instead of applying this bootstrap method, we follow the simulation results of Christoffersen and Goncalves (2004). They showed that five percent of the data is a satisfying rule of thumb. In sum, we consider 88 filtered, 16 full parametric and three non-parametric VaR forecasts. Our aim is to identify the best VaR forecast from the set of correct specified models Mc . The first step of our empirical analysis is dedicated to some graphical analysis and descriptive statistics of our data set. The next step involves the estimation of AR(FI)MA and HAR models. Furthermore, we report the estimation results of some selected GARCH models as well as the results for the SV model. After that, we use the LR test for conditional coverage to detect the misspecified forecasts. These forecasts will be omitted in our further forecast evaluation. For all correct specified forecast, the losses due to the loss functions Q and B are calculated. They are essential for Hansen’s SPA test which is finally applied. At the end, we are able to identify the best VaR forecast.

7.1

Graphical Data Analysis and Summary Statistics

Let us start with the graphical inspection of our data. In Figure A.1, we present graphs of the returns, the Realized Volatility and its logarithm together with their

33

estimated kernel densities and empirical autocorrelation functions (ACF). The graph of the return series clearly shows that there are some tranquil periods as well as turbulent ones which suggests volatility clustering. The graphical inspection of realized volatility confirms this. Note, that a high absolute daily return does not necessarily imply a high value of realized volatility. This may be the case if the opening price is not far away from the closing price, while the price fluctuates strongly in between. The second row shows the estimated Gaussian kernel densities of the three variables. The return distribution differs from the normal distribution because it exhibits fat tails. On the contrary, the distribution of the logarithmic realized volatility can be viewed as approximately Gaussian. The empirical autocorrelation functions suggests that the returns are serially uncorrelated and that RV exhibits long memory. The long memory property is very apparent for the logarithmic series since the autocorrelation function is dying out very slowly. Even the autocorrelation coefficient for lag 50 is significantly different from zero. Table B.1 provides summary statistics for the three series that we analyzed graphically. The first row of each block contains the statistics for the full sample, while the second and the third row contain the values for the in-sample and out-of-sample period respectively. Regarding the full sample, the return distribution is slightly skewed and exhibits a moderate value of excess kurtosis. Interestingly, the estimated skewness is negative in the in-sample period while it is positive in the out-of-sample period. The distribution of RV is skewed and clearly leptokurtic while the distribution of the logarithmic series is not. Note, that the skewness and excess kurtosis of the RV distribution are much smaller in the out-of-sample period. This is due to the fact that the in-sample period contains the maximum value 20.131. Our findings are mainly in line with other studies, e.g. Koopman et al. (2005). To get an impression of the volatility components of the HAR model, we present them in Figure A.2. The left column shows the daily, weekly and monthly component for the HAR model that is specified for the square root of RV, while the right column 34

provides the same variables for the logarithmic specification. The smoothness of the aggregated series increases with a decreasing frequency.

7.2 Estimation Results

The first estimation results we want to summarize and interpret are presented in Table B.2. All models are estimated for the square root of RV and its logarithm. The estimation is based on 1833 observations of the in-sample period, which reaches from 10 March 1994 to 5 June 2001. Note that the number of observations used for estimation varies with the lag structure of the particular model and that the ARFIMA models are estimated in deviations from the sample means. All estimated parameters are significantly different from zero, which implies that the dynamics of RV can be captured by the estimated models. Furthermore, the values of the coefficient of determination are quite high. For the square root of RV an ARFIMA(0, d, 0) model and for the logarithm of RV an ARFIMA(1, d, 1) model is chosen, respectively. For comparison, we report different estimates of the long memory parameter d. The GPH estimate is used as the starting value in the maximum likelihood estimation of the ARFIMA models. Little difference can be found when we compare the GPH and the GSP estimates. Nevertheless, the estimated values of the long memory parameter in the ARFIMA models are plausible. Note that the ARFIMA(1, d, 1) model may be covariance-stationary, since the estimated long memory parameter is close to the boundary value of 0.5. This finding is in line with the results of Koopman et al. (2005). Compared to Andersen et al. (2005), we obtain quite similar parameter estimates for the HAR model, although the coefficient of determination is noticeably higher. It should be noted that the parameter estimates get smaller for a decreasing aggregation frequency. This suggests that the daily volatility component is more persistent than the weekly and the monthly one.


We use the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) in order to compare the estimated time series models. We find slight evidence in favour of the ARFIMA model compared to the ARMA(2,1) and the HAR model. Finally, we analyze the estimated residuals by applying the ARCH Lagrange Multiplier (LM) test and the Ljung-Box statistic Q in order to test the null hypotheses of homoscedasticity and of zero autocorrelation. Since the probability values of the ARCH test with five lags are smaller than five percent, we reject the null hypothesis of homoscedasticity in each case. We find serially uncorrelated residuals only in the cases where the logarithmic transformation of RV is applied. The highest probability value of the Ljung-Box statistic is found for the ARFIMA(1, d, 1) model. All in all, the dynamics of realized volatility seem to be best captured by this specification.

Next, we comment on some selected estimation results for the GARCH models in Table B.3. We choose the standard GARCH(1,1), the Exponential GARCH(1,1) and the Fractionally Integrated GARCH(1,1) specification, each with three different assumptions about the innovation distribution. Further results can be found in Table B.4. For each GARCH model, a first-order autoregressive process is fitted to capture potential serial correlation in the returns, rt = µ + φ rt−1 + εt. The standard errors are robust and calculated with the sandwich formula, see Laurent and Peters (2002). For the normal distribution we find that the estimated autoregressive parameter φ is insignificant in each case. Regarding the GARCH model, we find typical values for α and β. The parameters of the EGARCH model are more difficult to interpret. Nevertheless, most of the parameters are significantly different from zero; especially the estimates of θ1 and θ2 are remarkably high in absolute value as well as highly significant. In contrast, the parameters of the FIGARCH model are mostly insignificant, in particular the estimates of α and β; apart from the constant ω, the long memory parameter d turns out to be the only significant parameter. The results for the Generalized Error Distribution (GED) are quite close to those

discussed previously. The shape parameter ζ is smaller than two in every case, indicating fat tails. The last column of Table B.3 displays the results for the skewed t-distribution. The logarithm of the asymmetry parameter ξ is negative and significantly different from zero in every case, which suggests that the data support the hypothesis of a negatively skewed innovation distribution. Interestingly, the estimated asymmetry parameters of the EGARCH model with skewed-t distributed innovations are clearly smaller in absolute value than those of the EGARCH model with normally distributed innovations. Furthermore, the estimated degrees of freedom ν are all close to six. We apply two tests in order to check the adequacy of the estimated models: the Residual Based Diagnostic (RBD) statistic of Tse (2002) and the well-known ARCH-LM test. The former is based on an autoregression of the squared standardized residuals, for which we choose 15 lags. The probability values of the ARCH-LM statistic and of Tse's RBD statistic suggest that nearly all models are correctly specified; the two exceptions are the GARCH and the FIGARCH model with a skewed t-distribution. The comparison of the Bayesian Information Criterion (BIC) values can be summarized as follows. First, the skewed t-distribution is preferred over the GED, which in turn is preferred over the normal distribution. Second, the EGARCH appears to be the best model, the standard GARCH the second best and the FIGARCH the worst. All in all, the EGARCH model with skewed-t distributed innovations seems to be the preferable specification.

For the standard Stochastic Volatility model we obtain the estimates ψ̂ = (β̂, σ̂∗, σ̂η)′ = (0.973, 0.035, 0.620)′. The estimated persistence parameter β̂ is close to one, indicating high persistence. Moreover, it is close to the sum of the estimated GARCH-N parameters α̂ and β̂.
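The ARCH(5) entries in Tables B.2 to B.4 are probability values of Engle's ARCH-LM test, i.e. the TR² statistic from an auxiliary regression of the squared (standardized) residuals on their own lags. A minimal sketch of this diagnostic, assuming five lags:

```python
import numpy as np
from scipy import stats

def arch_lm_test(residuals, lags: int = 5):
    """Engle's ARCH-LM test: regress e_t^2 on its lags, LM = T * R^2 ~ chi2(lags)."""
    e2 = np.asarray(residuals, dtype=float) ** 2
    y = e2[lags:]
    X = np.column_stack([np.ones(y.size)] +
                        [e2[lags - j:-j] for j in range(1, lags + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - resid.var() / y.var()
    lm = y.size * r2
    return lm, 1.0 - stats.chi2.cdf(lm, df=lags)
```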


7.3 VaR Forecast Evaluation

Some selected VaR forecasts are displayed in Figures A.3, A.4 and A.5. The first compares two forecasts that combine Filtered Historical Simulation with realized volatility models, namely the 250-Log HAR and the 250-Log ARFIMA forecasts. Note that we have introduced several abbreviations, which work as follows: a HAR model for the logarithm of RV, combined with the FHS approach using m observations to estimate the quantile, is abbreviated as m-Log HAR. In the same spirit, FIGARCH-SKEW-t denotes a fully parametric FIGARCH model with skewed-t distributed innovations, and a 500-TGARCH-GED model is a Threshold GARCH model with generalized error distributed innovations combined with FHS using 500 observations. Finally, a Stochastic Volatility model that is combined with the filtered extreme value approach is simply called EVT-SV. It is difficult to find large differences between the forecasts in Figure A.3. The second plot shows the fully parametric EGARCH model with a skewed t-distribution and the non-parametric VaR forecast obtained from the Historical Simulation method with a window size of 250. For the latter, the step-shaped pattern is typical; as the window size increases, the forecast path becomes flatter. Furthermore, the differences between the EVT-SV and the EVT-FIGARCH-SKEW-t forecasts seem to be surprisingly small.

The next step of our analysis is the test for conditional coverage, see Tables B.5 and B.6. The results for the 107 forecasts can be summarized as follows. Only one forecast is misspecified, namely the RiskMetrics forecast. Its coverage rate is approximately 2.1%, which is more than twice the rate expected under the null hypothesis of correct conditional coverage. All other forecasts have coverage rates within a moderate distance of 1%. The fully parametric models provide coverage rates around one percent that decrease when a fat-tailed distribution is assumed. Moreover, the coverage rates take


the smallest values if the skewed t-distribution is assumed. The EVT-based models tend to produce very low coverage rates, which suggests that the Value-at-Risk is generally over-predicted. The filtered models perform quite well, since they provide coverage rates that are close to one percent. Overall, we find many specifications that provide very accurate VaR forecasts. The estimated correlation coefficient between Ht and Ht−1 typically has the opposite sign of the coverage rate. Recall that λ2 = π11 − π01. In nearly all cases n11 = 0 holds, which implies that π11 equals zero; hence the estimated probability of observing a hit today, given that a hit occurred yesterday, is zero. If additionally HR+1 = 0 and HT = 0 hold, then n = n01 = n10 is also true, which implies that π̃ = π01 = π10. Hence λ̃2 = −π̃. In our application, HR+1,k = 0 and HT,k = 0 hold for all k. Therefore, the forecasts with n11 ≠ 0 can be identified quickly in Tables B.5 and B.6; these cases are all unfiltered Historical Simulation forecasts. The results of the Likelihood Ratio test for conditional coverage suggest that the only misspecified VaR forecast is the RiskMetrics forecast.

In order to proceed, we have to identify the set of correctly specified VaR forecasts Mc. Consequently, the RiskMetrics forecast is not an element of Mc. Nevertheless, this forecast serves as a benchmark since it is widely used in practice. Therefore, we analyze two different versions of Mc, one that contains RiskMetrics as the benchmark and another one that excludes it. For the latter, we choose the most similar forecasting model as the benchmark, which is the standard GARCH model with a normality assumption. We compute the losses due to the loss functions B and Q for each forecast k ∈ Mc. In order to compare their predictive ability we focus on the loss averages. Table B.7 provides a top ten list of VaR forecasts, ranked in ascending order by the averages of Bk and Qk. The results are striking: the top ten consists of six and seven realized volatility based forecasts for the loss functions B and Q, respectively.
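The asymmetric tick function underlying Q is the usual check loss for quantile forecasts, which weights exceedances by 1 − α and non-exceedances by α. Since the exact definitions of B and Q are given in an earlier section, the following sketch should be read as the generic check loss rather than as the paper's precise formula:

```python
import numpy as np

def tick_loss(returns: np.ndarray, var_forecasts: np.ndarray, alpha: float = 0.01) -> np.ndarray:
    """Asymmetric tick (check) loss for a lower-tail VaR forecast.

    An exceedance (return below the VaR forecast) is penalised with weight
    (1 - alpha); all other observations receive the much smaller weight alpha.
    """
    u = returns - var_forecasts
    return (alpha - (u < 0).astype(float)) * u

# Average loss over the out-of-sample period, as used for the rankings:
# q_bar = tick_loss(r_oos, var_oos).mean()
```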

Regarding the economic loss function B, the ARFIMA model that is estimated in logarithms and combined with a quantile estimator from the Filtered Historical Simulation approach with 250 observations appears to be the best performing forecasting model. The short memory ARMA(2,1) model is also very accurate. Moreover, the HAR model exhibits good forecasting abilities, with the logarithmic version performing slightly better than the square-root version. Note that the window size of 250 observations, which corresponds to roughly one year of trading data, clearly emerges as the most successful. In addition, three fully parametric GARCH models with a normality assumption appear in the top ten. This is quite surprising, since the existing literature suggests that this distributional assumption is misleading. Interestingly, the misspecified RiskMetrics forecast performs very well and is placed on rank five. On rank ten we find a hybrid model that combines an SV model with a quantile estimate based on the filtered extreme value theory approach.

Regarding the statistical loss function Q, the fully parametric Exponential GARCH model with normally distributed innovations appears to be the best forecasting model. It is followed by a group of six realized volatility based forecasts, which consists mainly of short memory HAR and ARMA(2,1) models. In contrast to the results for the loss function B, the window size of 500 observations seems to be the most appropriate here, and it is rather unclear whether the logarithmic or the square root transformation is preferable. Again, the EVT-SV model shows a good performance. The second EVT-based forecast that belongs to the top ten relies on the ARFIMA(0, d, 0) model for the square root of RV. The fully parametric EGARCH model provides an excellent performance overall. The RiskMetrics forecast is placed on rank 72.

In Figure A.6 we present the computed losses of the best and the worst VaR forecast for the loss function B; Figure A.7 does the same for the asymmetric tick function Q. Obviously, the asymmetric tick function causes more spikes, since the weights α and 1 − α are quite extreme. The losses due to the function B are smoother over time, although there is a big spike located approximately at observation 135. Moreover, it is eye-catching that at times the best models make large errors while the worst models behave much better. It would be interesting to analyze whether

combinations of some forecasts do a better job than the single ones, but this aspect is beyond the scope of this work.

The bottom of Table B.7 displays the results of Hansen's SPA test. The bootstrap parameters are B = 2000 and q ∈ {0.25, 0.50, 0.75}; the results and the conclusions drawn from them remain nearly the same if we vary the values of B and q. First, the misspecified RiskMetrics forecast is chosen as the benchmark. For both loss functions we find that the benchmark is not significantly outperformed by any forecasting model. The consistent probability values are 73.5% and 23.1% for B and Q, respectively. For the loss function B we observe a relatively large difference between the lower and the upper bound, which indicates that many models are poor compared to the benchmark. This is not surprising, since RiskMetrics is placed on rank five. Analogously, this difference is close to zero for the asymmetric tick function Q, where the benchmark has a relatively high average loss. Second, we choose the fully parametric GARCH model with normally distributed innovations as the benchmark from the version of Mc that excludes the RiskMetrics forecast. This model is placed on ranks 16 and 19 for B and Q, respectively. Compared to the previous analysis, the results and the conclusions for the GARCH-N model are not very different. The benchmark is not significantly outperformed by any forecast in Mc, regardless of the loss function. The consistent probability values are 26.6% and 61.5% for B and Q, respectively. In both cases the difference between the inconsistent probability values is about 20 percentage points. Group-wise comparisons of VaR forecasts are naturally based on a subset of Mc, so the SPA test would not be applied to the full set of forecasts. We do not conduct such an analysis since it would contradict the spirit of the data-snooping robust SPA test.
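For completeness, the likelihood ratio test of correct conditional coverage applied in Tables B.5 and B.6 can be sketched as follows. It combines the unconditional coverage and the first-order independence components in the spirit of Christoffersen (1998); the sketch assumes that the hit sequence contains at least one hit and one non-hit.

```python
import numpy as np
from scipy import stats

def lr_conditional_coverage(hits, alpha: float = 0.01):
    """LR test of correct conditional coverage: LR_cc = LR_uc + LR_ind ~ chi2(2)."""
    hits = np.asarray(hits, dtype=int)
    n, n1 = hits.size, hits.sum()
    pi_hat = n1 / n

    # unconditional coverage component
    lr_uc = -2.0 * ((n - n1) * np.log(1 - alpha) + n1 * np.log(alpha)
                    - (n - n1) * np.log(1 - pi_hat) - n1 * np.log(pi_hat))

    # first-order independence component based on transition counts n_ij
    prev, curr = hits[:-1], hits[1:]
    n00 = np.sum((prev == 0) & (curr == 0))
    n01 = np.sum((prev == 0) & (curr == 1))
    n10 = np.sum((prev == 1) & (curr == 0))
    n11 = np.sum((prev == 1) & (curr == 1))
    pi01 = n01 / (n00 + n01)
    pi11 = n11 / (n10 + n11) if (n10 + n11) > 0 else 0.0
    pi1 = (n01 + n11) / (n00 + n01 + n10 + n11)

    def loglik(p, zeros, ones):
        # Bernoulli log likelihood with the convention 0 * log(0) = 0
        out = 0.0
        if zeros > 0:
            out += zeros * np.log(1 - p)
        if ones > 0:
            out += ones * np.log(p)
        return out

    lr_ind = -2.0 * (loglik(pi1, n00 + n10, n01 + n11)
                     - loglik(pi01, n00, n01) - loglik(pi11, n10, n11))
    lr_cc = lr_uc + lr_ind
    return lr_cc, 1.0 - stats.chi2.cdf(lr_cc, df=2)
```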


8 Discussion

This work is concerned with several approaches to Value-at-Risk forecasting. We consider short and long memory time series models for realized volatility as well as a selection of volatility models such as GARCH and Stochastic Volatility. Furthermore, we apply numerous hybrid specifications; for example, the Filtered Historical Simulation and the filtered Extreme Value approach are entertained. In sum, we analyze the one-step-ahead out-of-sample VaR forecasting performance of 107 models for S&P500 futures contracts. In the evaluation procedure, we find that the RiskMetrics methodology, which is widely used in practice, generates the only misspecified forecast. Due to the high number of forecasts we face potential data snooping problems. In order to avoid spurious statistical inference about the forecasting performance, the robust and powerful Superior Predictive Ability test of Hansen (2005) is utilized. The results suggest that neither the RiskMetrics nor a simple GARCH model with normally distributed innovations is significantly outperformed by any other forecasting model. Nevertheless, the best performing forecasting models are hybrid specifications that are based on realized volatility and the Filtered Historical Simulation approach. In addition, our findings indicate that the fully parametric Exponential GARCH model is very successful. Moreover, the performance of models assuming normality is surprisingly good.

Acknowledgments: I would like to thank Jörg Breitung, Uta Pigorsch and Christina Ziegler for helpful comments.

References

Andersen, T.G., T. Bollerslev, P.F. Christoffersen and F.X. Diebold (2005), “Volatility Forecasting”, National Bureau of Economic Research, Working Paper.


Andersen, T.G., T. Bollerslev, F.X. Diebold and H. Ebens (2001), “The Distribution of Stock Return Volatility”, Journal of Financial Economics, 61, 43-76.

Andersen, T.G., T. Bollerslev, F.X. Diebold and P. Labys (2003), “Modeling and Forecasting Realized Volatility”, Econometrica, 71, 579-625.

Angelidis, T., A. Benos and S. Degiannakis (2004), “The use of GARCH models in VaR estimation”, Statistical Methodology, 1, 105–128.

Artzner, P., F. Delbaen, J.M. Eber and D. Heath (1999), “Coherent measures of risk”, Mathematical Finance, 3, 203-228.

Baillie, R.T. (1996), “Long Memory Processes and Fractional Integration in Econometrics”, Journal of Econometrics, 73, 5-59.

Baillie, R.T., T. Bollerslev and H.O. Mikkelsen (1996), “Fractionally Integrated Generalized Autoregressive Conditional Heteroskedasticity”, Journal of Econometrics, 74, 3-30.

Bao, Y., T.H. Lee and B. Saltoglu (2004), “A Test for Density Forecast Comparison with Applications to Risk Management”, Working Paper.

Bao, Y., T.H. Lee and B. Saltoglu (2006), “Evaluating Predictive Performance of Value-at-Risk Models in Emerging Markets: A Reality Check”, forthcoming in: Journal of Forecasting.

Barndorff-Nielsen, O.E. and N. Shephard (2002), “Estimating Quadratic Variation Using Realised Variance”, Journal of Applied Econometrics, 17, 457-477.

Barone-Adesi, G., K. Giannopoulos and L. Vosper (2002), “Backtesting derivative portfolios with filtered historical simulation (FHS)”, European Financial Management, 8, 31-58.

Basel Committee on Banking Supervision (1995), “An Internal Model-based Approach to Market Risk Capital Requirements”, Report.

Bollerslev, T. (1986), “Generalized Autoregressive Conditional Heteroskedasticity”, Journal of Econometrics, 31, 307-327.

Bollerslev, T. (1987), “A Conditionally Heteroskedastic Time Series Model for Speculative Prices and Rates of Return”, Review of Economics and Statistics, 69, 542-547.

Bollerslev, T., U. Kretschmer, C. Pigorsch and G. Tauchen (2005), “A Discrete-Time Model for Daily S&P500 Returns and Realized Variations: Jumps and Leverage Effects”, Working Paper.

Bollerslev, T. and J.M. Wooldridge (1992), “Quasi-Maximum Likelihood Estimation and Inference in Dynamic Models with Time Varying Covariances”, Econometric Reviews, 11, 143-172.

Carnero, M.A., D. Pena and E. Ruiz (2004), “Persistence and Kurtosis in GARCH and Stochastic Volatility Models”, Journal of Financial Econometrics, 2, 319-342.

Christoffersen, P. (1998), “Evaluating Interval Forecasts”, International Economic Review, 39, 841-862.


Christoffersen, P.F. and F.X. Diebold (2000), “How relevant is volatility forecasting for financial risk management?”, The Review of Economics and Statistics, 82, 12–22.

Christoffersen, P.F. and S. Goncalves (2004), “Estimation Risk in Financial Risk Management”, Journal of Risk, 7, 1-28.

Christoffersen, P.F. and D. Pelletier (2004), “Backtesting Value-at-Risk: A DurationBased Approach”, Journal of Financial Econometrics, 2, 84-108.

Chung, C.F. (1999), “Estimating the fractionally integrated GARCH model”, Working paper.

Clements, M.P. and N. Taylor (2003), “Evaluating Interval Forecasts of High-Frequency Financial Data”, Journal of Applied Econometrics, 18, 445-456.

Corradi, V. and N.R. Swanson (2005), “Nonparametric Bootstrap Procedures for Predictive Inference Based on Recursive Estimation Schemes”, Working Paper.

Corsi, F. (2004), “A Simple Long Memory Model of Realized Volatility”, Working paper.

Corsi, F., U. Kretschmer, S. Mittnik and C. Pigorsch (2005), “The Volatility of Realized Volatility”, Working paper.

Danielsson, J. and Y. Morimoto (2000), “Forecasting Extreme Financial Risk: A Critical Analysis of Practical Methods for the Japanese Market”, Monetary and Economic Studies, 18, 25-47.


Davidson, J. (2004), “Moment and memory properties of linear conditional heteroscedasticity models, and a new model”, Journal of Business and Economics Statistics, 22, 16-29.

Deutsche Bank (2004), Annual report, Internet: http://annualreport.deutsche-bank.com/2004/ar/riskreport.php.

Ding, Z., C.W.J. Granger and R.F. Engle (1993), “A long memory property of stock market returns and a new model”, Journal of Empirical Finance, 1, 83-106.

Doornik, J.A. and M. Ooms (2001), A Package for Estimating, Forecasting and Simulating Arfima Models: Arfima package 1.01 for Ox, Documentation.

Durbin, J. and S.J. Koopman (1997), “Monte Carlo Maximum Likelihood Estimation for Non-Gaussian State Space Models”, Biometrika, 84, 669-684.

Durbin, J. and S.J. Koopman (2002), “A simple and efficient simulation smoother for state space time series analysis”, Biometrika 89, 603–616.

Embrechts, P., C. Klüppelberg and T. Mikosch (1997), “Modelling Extremal Events for Insurance and Finance”, Springer-Verlag, Berlin.

Engle, R.F. and S. Manganelli (2004), “CAViaR: Conditional Autoregressive Value at Risk by Regression Quantiles”, Journal of Business and Economic Statistics, 22, 367-381.

Fernandez, C. and M.F.J. Steel (1998), “On Bayesian Modeling of Fat Tails and Skewness”, Journal of the American Statistical Association, 93, 359-371.

Fleming, J. and C. Kirby (2003), “A Closer Look at the Relation between GARCH and Stochastic Autoregressive Volatility”, Journal of Financial Econometrics, 1, 365-419.

Gallant, A.R., C.T. Hsu and G.E. Tauchen (1999), “Using Daily Range Data to Calibrate Volatility Diffusions and Extract the Forward Integrated Variance”, Review of Economics and Statistics, 81, 617-631.

Geweke, J. and S. Porter-Hudak (1984), “The Estimation and Application of Long Memory Time Series Models”, Journal of Time Series Analysis, 4, 221-238.

Giot, P. and S. Laurent (2004), “Modelling daily Value-at-Risk using realized volatility and ARCH type models”, Journal of Empirical Finance, 11, 379-398.

Glosten, L.R., R. Jagannathan and D.E. Runkle (1993), “On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks”, Journal of Finance, 48, 1779-1801.

Guégan, D. (2005), “How can we Define the Concept of Long Memory? An Econometric Survey”, Econometric Reviews, 21, 113-149.

Hamilton, J.D. (1994), Time Series Analysis, Princeton, NJ: Princeton University Press.

Hansen, P.R. (2005), “A Test for Superior Predictive Ability”, Journal of Business & Economic Statistics, 23, 4, 365-380.


Hansen, P.R., J. Kim and A. Lunde (2003), “Testing for Superior Predictive Ability using Ox. A Manual for SPA for Ox”, Documentation.

Hansen, P.R. and A. Lunde (2005), “A Forecast Comparison of Volatility Models: Does Anything Beat a GARCH (1,1)?”, Journal of Applied Econometrics, 20, 873889.

Harvey, A.C. and N. Shephard (1996), “Estimation of an Asymmetric Stochastic Volatility Model for Asset Returns”, Journal of Business & Economic Statistics, 14, 429-34.

Harvey, C.R. and A. Siddique (1999), “Autoregressive Conditional Skewness”, Journal of Financial and Quantitative Analysis, 34, 465–487.

Hill, B. (1975), “A simple general approach to inference about the tail of a distribution”, Annals of Statistics, 3, 1163-1174.

Hsu, P.H. and C.M. Kuan (2005), “Reexamining the Profitability of Technical Analysis with Data Snooping Checks”, Journal of Financial Econometrics, 3, 606-628.

Kim, S., N. Shephard and S. Chib (1998), “Stochastic Volatility: Likelihood Inference and Comparison with ARCH Models”, Review of Economic Studies, 65, 361-393.

Koenker, R. and G.J. Bassett (1978), “Regression quantiles”, Econometrica, 46, 33-50.

Koopman, S.J., B. Jungbacker and E. Hol (2005), “Forecasting daily variability of the S&P 100 stock index using historical, realised and implied volatility measurements”, Journal of Empirical Finance, 12, 445-475.

Kuester, K., S. Mittnik and M.S. Paolella (2006), “Value-at-Risk Prediction: A Comparison of Alternative Strategies”, Journal of Financial Econometrics, 4, 53-89.

Lambert, P. and S. Laurent (2002), “Modelling skewness dynamics in series of financial data using skewed location-scale distributions”, Working paper.

Laurent, S. and J.P. Peters (2002), “G@RCH 2.2: An Ox Package for Estimating and Forecasting Various ARCH Models”, Journal of Economic Surveys, 16, 447–485.

Lee, K. and S.J. Koopman (2004), “Estimating Stochastic Volatility Models: A Comparison of Two Importance Samplers”, Studies in Nonlinear Dynamics & Econometrics, 8, Article 5.

Liesenfeld, R. and R.C. Jung (2000), “Stochastic volatility models: conditional normality versus heavy-tailed distributions”, Journal of Applied Econometrics, 15, 137-160.

Lopez, J.A. (1997): “Regulatory Evaluation of Value-at-Risk-Models”, Working paper.

Asai, M. and M. McAleer (2005), “Dynamic Asymmetric Leverage in Stochastic Volatility Models”, Econometric Reviews, 24, 317-332.

Marcucci, J. (2005), “Forecasting Stock Market Volatility with Regime-Switching GARCH Models”, Studies in Nonlinear Dynamics and Econometrics, 9, Article 6.

Martens, M., D. van Dijk and M. de Pooter (2004), “Modeling and Forecasting S&P 500 Volatility: Long Memory, Structural Breaks and Nonlinearity”, Working paper.

McLeod, A.I. and K.W. Hipel (1978), “Preservation of the rescaled adjusted range, 1: A reassessment of the Hurst phenomenon”, Water Resources Research, 14, 543-553.

McNeil, A.J. and R. Frey (2000), “Estimation of Tail-Related Risk Measures for Heteroskedastic Financial Time Series: An Extreme Value Approach”, Journal of Empirical Finance, 7, 271-300.

Melino, A. and S.M. Turnbull (1990), “Pricing Foreign Currency Options with Stochastic Volatility”, Journal of Econometrics, 45, 239-265.

Mittnik, S. and M.S. Paolella (2000), “Conditional Density and Value-at-Risk Prediction of Asian Currency Exchange Rates”, Journal of Forecasting, 19, 313-333.

Müller, U.A., M.M. Dacorogna, R.D. Davé, R.B. Olsen, O.V. Pictet and J. von Weizsäcker (1997), “Volatilities of Different Time Resolutions — Analyzing the Dynamics of Market Components”, Journal of Empirical Finance, 4, 213-239.

Nelson, D.B. (1991), “Conditional Heteroskedasticity in Asset Returns: A New Approach”, Econometrica, 59, 347-370.

Oomen, R.C.A. (2005), “Properties of Bias-Corrected Realized Variance Under Alternative Sampling Schemes”, Journal of Financial Econometrics, 3, 555-577.

Politis, D.N. and J.P. Romano (1994), “The Stationary Bootstrap”, Journal of the American Statistical Association, 89, 1303–1313.


Pong, S.Y., M.B. Shackleton, S.J. Taylor and X. Xu (2004), “Forecasting currency volatility: A comparison of implied volatilities and AR(FI)MA models”, Journal of Banking & Finance, 28, 2541–2563.

Poon, S. and C.W.J. Granger (2003), “Forecasting Volatility in Financial Markets: A Review”, Journal of Economic Literature, XLI, 478-539.

Pritsker, M. (2001), “The Hidden Dangers of Historical Simulation”, Working paper.

RiskMetrics (1995), Technical Document, J.P. Morgan, New York.

Robinson, P.M. and M. Henry (1998), “Long and short memory conditional heteroscedasticity in estimating the memory parameter of levels”, Discussion paper.

Sarma, M., S. Thomas and A. Shah (2003), “Selection of Value-at-Risk Models”, Journal of Forecasting, 22, 337–358.

Schoffer, O. (2003), “HY-A-PARCH: A Stationary A-PARCH Model with Long Memory”, Working paper.

Shephard, N. (1996), “Statistical Aspects of ARCH and Stochastic Volatility Models”, in D.R. Cox, D.V. Hinkley and O.E. Barndorff-Nielsen (eds.), Time Series Models in Econometrics, Finance and Other Fields, 1-67. London: Chapman & Hall.

Shephard N. and M.K. Pitt (1997), “Likelihood analysis of non-Gaussian measurement time series”, Biometrika, 84, 653-667.


Subbotin, M. (1923), “On the law of frequency of errors”, Mathematicheskii Sbornik, 31, 296-301.

Taylor, S.J. (1986) Modeling Financial Time Series, Chichester, UK: John Wiley and Sons.

Theodossiou, P. (2000), “Distribution of financial asset prices, the skewed generalized error distribution, and the pricing of options”, Working paper.

Tsay, R.S. (2005), Analysis of Financial Time Series, Second edition, Wiley Series in Probability and Statistics.

Tse, Y. K. (2002), “Residual-based diagnostics for conditional heteroscedasticity models”, Econometrics Journal, 5, 358-374.

White, H. (2000), “A reality check for data snooping”, Econometrica, 68, 1097-1126.


A Graphics

[Figure 1 comprises nine panels: time series plots (Returns, RV, Log RV), Gaussian kernel density estimates (Density-Returns, Density-RV, Density-Log RV) and empirical autocorrelation functions (ACF-Returns, ACF-RV, ACF-Log RV).]

Figure 1: Empirical Properties of Returns, Realized Volatility and its Logarithm.

[Figure 2 comprises six panels: Sqrt RV daily, Log RV daily, Sqrt RV weekly, Log RV weekly, Sqrt RV monthly and Log RV monthly.]

Figure 2: Time Series Plots of daily, weekly and monthly Realized Volatility.

[Figure 3 comprises two panels: Returns with the 250-Log HAR VaR forecast and Returns with the 250-Log ARFIMA VaR forecast.]

Figure 3: Filtered Historical Simulation and Realized Volatility based VaR Forecasts.

[Figure 4 comprises two panels: Returns with the EGARCH VaR forecast and Returns with the HS 250 VaR forecast.]

Figure 4: Full parametric and Non-parametric VaR Forecasts.

[Figure 5 comprises two panels: Returns with the EVT-SV VaR forecast and Returns with the EVT-FIGARCH-SKEW-t VaR forecast.]

Figure 5: Extreme Value Theory based VaR Forecasts.

[Figure 6 plots the losses of the EVT-GARCH-N and 250-Log ARFIMA forecasts over the out-of-sample period.]

Figure 6: Losses due to B of the best and the worst VaR Forecasts.

[Figure 7 plots the losses of the 500-HS and EGARCH-N forecasts over the out-of-sample period.]

Figure 7: Losses due to Q of the best and the worst VaR Forecasts.

B Tables

Table 1: Summary Statistics for Returns, Realized Volatility and its Logarithm

Time Series               Mean    Std. Dev.  Skewness  Kurtosis     Min      Max
Returns, full             0.016     0.972     -0.087     9.743   -7.546    7.822
Returns, in-sample        0.014     0.947     -0.388    10.929   -7.546    7.465
Returns, out-of-sample    0.021     1.025      0.415     7.761   -3.623    7.822
RV, full                  0.866     1.261      6.852    74.192    0.011   20.131
RV, in-sample             0.832     1.293      7.643    87.659    0.011   20.131
RV, out-of-sample         0.939     1.187      4.723    34.405    0.039   12.309
ln(RV), full             -0.613     0.934      0.133     3.422   -4.516    3.002
ln(RV), in-sample        -0.674     0.953      0.127     3.429   -4.516    3.002
ln(RV), out-of-sample    -0.482     0.880      0.232     3.332   -3.244    2.510

Remarks: The first row of each block displays the summary statistics for the full sample t = 1, ..., T, the second for the in-sample period t = 1, ..., R and the third for the out-of-sample period t = R+1, ..., T.

Table 2: Estimation Results for AR(FI)MA and HAR Models

Model    Parameter   Square Root of RV      Log RV
ARMA     µ            0.810 (0.000)         -0.639 (0.000)
         φ1           1.240 (0.000)          1.203 (0.000)
         φ2          -0.257 (0.001)         -0.216 (0.000)
         θ1          -0.847 (0.000)         -0.837 (0.000)
         R2           0.454                  0.546
         AIC          0.568                  1.956
         BIC          0.580                  1.968
         Q(15)        (0.027)                (0.080)
         ARCH(5)      (0.000)                (0.008)
GPH      d            0.412 (0.000)          0.409 (0.000)
GSP      d            0.396 (0.000)          0.376 (0.000)
ARFIMA   φ1           0                      0.374 (0.011)
         d            0.386 (0.000)          0.468 (0.000)
         θ1           0                     -0.488 (0.001)
         AIC          0.564                  1.952
         BIC          0.571                  1.964
         Q(15)        (0.074)                (0.310)
         ARCH(5)      (0.000)                (0.007)
HAR      α0           0.110 (0.000)         -0.153 (0.017)
         α1           0.402 (0.000)          0.367 (0.000)
         α2           0.240 (0.000)          0.310 (0.000)
         α3           0.200 (0.000)          0.213 (0.000)
         R2           0.450                  0.543
         AIC          0.575                  1.962
         BIC          0.587                  1.974
         Q(15)        (0.080)                (0.129)
         ARCH(5)      (0.000)                (0.053)

Remarks: R2 denotes the coefficient of determination, AIC the Akaike Information Criterion and BIC the Bayesian Information Criterion. The rows ARCH(5) and Q(15) contain the probability values of Engle's test for ARCH effects with five lags and of the Ljung-Box test for serial correlation with 15 lags, respectively. For the estimated parameters the corresponding probability values of the significance tests are reported in brackets.

Table 3: Estimation Results for (FI)GARCH Models

Model     Parameter   N                  GED                SKEW-t
GARCH     µ            0.043 (0.015)      0.068 (0.000)      0.034 (0.041)
          φ           -0.026 (0.303)     -0.066 (0.008)     -0.073 (0.002)
          ω            0.012 (0.115)      0.009 (0.000)      0.010 (0.000)
          α            0.095 (0.013)      0.075 (0.000)      0.068 (0.000)
          β            0.899 (0.000)      0.918 (0.000)      0.924 (0.000)
          ζ                               1.274 (0.000)
          ν                                                  5.596 (0.000)
          ln(ξ)                                             -0.138 (0.000)
          BIC          2.545              2.486              2.473
          RBD(15)      (0.354)            (0.276)            (0.250)
          ARCH(5)      (0.144)            (0.069)            (0.043)
EGARCH    µ            0.017 (0.345)      0.051 (0.008)      0.020 (0.060)
          φ           -0.025 (0.297)     -0.061 (0.010)     -0.057 (0.011)
          ω           -0.157 (0.381)     -0.572 (0.004)     -1.346 (0.000)
          α           -0.477 (0.002)     -0.435 (0.002)     -0.330 (0.025)
          β            0.978 (0.000)      0.981 (0.000)      0.978 (0.000)
          θ1          -0.182 (0.002)     -0.149 (0.001)     -0.129 (0.000)
          θ2           0.212 (0.000)      0.192 (0.000)      0.164 (0.000)
          ζ                               1.344 (0.000)
          ν                                                  6.207 (0.000)
          ln(ξ)                                             -0.145 (0.000)
          BIC          2.511              2.472              2.459
          RBD(15)      (0.625)            (0.696)            (0.727)
          ARCH(5)      (0.439)            (0.521)            (0.568)
FIGARCH   µ            0.049 (0.035)      0.070 (0.001)      0.041 (0.023)
          φ           -0.022 (0.490)     -0.066 (0.005)     -0.073 (0.002)
          ω            0.666 (0.000)      0.592 (0.000)      0.552 (0.000)
          α            0.000 (1.000)      0.000 (1.000)      0.061 (0.918)
          β            0.163 (0.885)      0.191 (0.841)      0.284 (0.676)
          d            0.284 (0.002)      0.278 (0.003)      0.284 (0.000)
          ζ                               1.301 (0.000)
          ν                                                  6.372 (0.000)
          ln(ξ)                                             -0.136 (0.000)
          BIC          2.545              2.488              2.475
          RBD(15)      (0.538)            (0.337)            (0.150)
          ARCH(5)      (0.331)            (0.136)            (0.025)

Remarks: Probability values for the significance tests are reported in brackets. BIC is the Bayesian Information Criterion, ARCH(5) denotes the probability value of the ARCH-LM test by Engle (1982) with five lags, and the values in the row RBD(15) are the probability values of the specification test by Tse (2002) with 15 lags.

Table 4: Further Estimation Results for (HY)GARCH Models

Model     Parameter   N                  GED                SKEW-t
TGARCH    µ            0.018 (0.334)      0.056 (0.000)      0.022 (0.212)
          φ            0.005 (0.855)     -0.047 (0.013)     -0.048 (0.050)
          ω            0.016 (0.099)      0.012 (0.025)      0.013 (0.008)
          α            0.008 (0.618)      0.006 (0.627)      0.000 (0.966)
          β            0.906 (0.000)      0.918 (0.000)      0.925 (0.000)
          γ            0.134 (0.015)      0.111 (0.001)      0.113 (0.000)
          ζ                               1.318 (0.000)
          ν                                                  6.007 (0.000)
          ln(ξ)                                             -0.153 (0.000)
          BIC          2.493              2.476              2.461
          RBD(15)      (0.542)            (0.504)            (0.515)
          ARCH(5)      (0.509)            (0.396)            (0.427)
HYGARCH   µ            0.049 (0.031)      0.071 (0.001)      0.036 (0.048)
          φ           -0.024 (0.415)     -0.066 (0.005)     -0.074 (0.001)
          ω            0.034 (0.619)      0.021 (0.641)      0.027 (0.367)
          α            0.000 (1.000)      0.000 (1.000)      0.159 (0.662)
          β            0.177 (0.890)      0.202 (0.866)      0.455 (0.390)
          d            0.286 (0.116)      0.268 (0.201)      0.355 (0.122)
          ln(τ)        0.100 (0.577)      0.119 (0.643)      0.035 (0.796)
          ζ                               1.276 (0.000)
          ν                                                  5.603 (0.000)
          ln(ξ)                                             -0.139 (0.000)
          BIC          2.550              2.511              2.481
          RBD(15)      (0.606)            (0.447)            (0.264)
          ARCH(5)      (0.383)            (0.187)            (0.041)

Remarks: Probability values for the significance tests are reported in brackets. BIC is the Bayesian Information Criterion, ARCH(5) denotes the probability value of the ARCH-LM test by Engle (1982) with five lags, and the values in the row RBD(15) are the probability values of the specification test by Tse (2002) with 15 lags.

Table 5: Conditional Coverage Test Results

Model                     π̃       λ̃2       LR      Prob.
RiskMetrics               0.021   -0.021    8.526   0.018
GARCH-N                   0.013   -0.013    0.871   0.647
GARCH-GED                 0.007   -0.007    1.009   0.604
GARCH-SKEW-t              0.005   -0.005    3.203   0.202
TGARCH-N                  0.013   -0.013    0.871   0.647
TGARCH-GED                0.007   -0.007    1.009   0.604
TGARCH-SKEW-t             0.003   -0.003    5.018   0.081
EGARCH-N                  0.012   -0.012    0.433   0.805
EGARCH-GED                0.008   -0.008    0.458   0.795
EGARCH-SKEW-t             0.005   -0.005    3.203   0.202
FIGARCH-N                 0.013   -0.013    0.871   0.647
FIGARCH-GED               0.008   -0.008    0.458   0.795
FIGARCH-SKEW-t            0.007   -0.007    1.009   0.604
HYGARCH-N                 0.010   -0.011    0.202   0.904
HYGARCH-GED               0.007   -0.007    1.009   0.604
HYGARCH-SKEW-t            0.006   -0.006    1.901   0.387
EVT-Log ARMA              0.005   -0.005    3.203   0.202
EVT-Sqrt ARMA             0.006   -0.006    1.901   0.387
EVT-Log ARFIMA            0.005   -0.005    3.203   0.202
EVT-Sqrt ARFIMA           0.005   -0.005    3.203   0.202
EVT-Log HAR               0.006   -0.006    1.901   0.387
EVT-Sqrt HAR              0.006   -0.006    1.901   0.387
EVT-GARCH-N               0.005   -0.005    3.203   0.202
EVT-GARCH-GED             0.006   -0.006    1.901   0.387
EVT-GARCH-SKEW-t          0.005   -0.005    3.203   0.202
EVT-TGARCH-N              0.003   -0.003    5.018   0.081
EVT-TGARCH-GED            0.003   -0.003    5.018   0.081
EVT-TGARCH-SKEW-t         0.003   -0.003    5.018   0.081
EVT-EGARCH-N              0.005   -0.005    3.203   0.202
EVT-EGARCH-GED            0.007   -0.007    1.009   0.604
EVT-EGARCH-SKEW-t         0.007   -0.007    1.009   0.604
EVT-FIGARCH-N             0.007   -0.007    1.009   0.604
EVT-FIGARCH-GED           0.007   -0.007    1.009   0.604
EVT-FIGARCH-SKEW-t        0.007   -0.007    1.009   0.604
EVT-HYGARCH-N             0.007   -0.007    1.009   0.604
EVT-HYGARCH-GED           0.007   -0.007    1.009   0.604
EVT-HYGARCH-SKEW-t        0.007   -0.007    1.009   0.604
EVT-SV                    0.013   -0.013    0.871   0.647
125-Log ARMA              0.010   -0.011    0.202   0.904
125-Sqrt ARMA             0.010   -0.011    0.202   0.904
125-Log ARFIMA            0.010   -0.011    0.202   0.904
125-Sqrt ARFIMA           0.010   -0.011    0.202   0.904
125-Log HAR               0.013   -0.013    0.871   0.647
125-Sqrt HAR              0.009   -0.009    0.201   0.904
125-GARCH-N               0.008   -0.008    0.458   0.795
125-GARCH-GED             0.009   -0.009    0.201   0.904
125-GARCH-SKEW-t          0.010   -0.011    0.202   0.904
125-TGARCH-N              0.008   -0.008    0.458   0.795
125-TGARCH-GED            0.008   -0.008    0.458   0.795
125-TGARCH-SKEW-t         0.008   -0.008    0.458   0.795
125-EGARCH-N              0.010   -0.011    0.202   0.904
125-EGARCH-GED            0.008   -0.008    0.458   0.795
125-EGARCH-SKEW-t         0.008   -0.008    0.458   0.795
125-FIGARCH-N             0.010   -0.011    0.202   0.904
125-FIGARCH-GED           0.009   -0.009    0.201   0.904
125-FIGARCH-SKEW-t        0.012   -0.012    0.433   0.805
125-HYGARCH-N             0.010   -0.011    0.202   0.904
125-HYGARCH-GED           0.009   -0.009    0.201   0.904
125-HYGARCH-SKEW-t        0.010   -0.011    0.202   0.904
125-SV                    0.012   -0.012    0.433   0.805
125-HS                    0.012    0.090    2.912   0.233

Remarks: The empirical coverage rate is π̃, while λ̃2 denotes the estimated correlation between Ht+1 and Ht. LR is the likelihood ratio test statistic for conditional coverage and Prob. is the corresponding probability value.

Table 6: Further Conditional Coverage Test Results

Model                     π̃       λ̃2       LR      Prob.
250-Log ARMA              0.013   -0.013    0.871   0.647
250-Sqrt ARMA             0.012   -0.012    0.433   0.805
250-Log ARFIMA            0.013   -0.013    0.871   0.647
250-Sqrt ARFIMA           0.014   -0.014    1.499   0.473
250-Log HAR               0.012   -0.012    0.433   0.805
250-Sqrt HAR              0.012   -0.012    0.433   0.805
250-GARCH-N               0.013   -0.013    0.871   0.647
250-GARCH-GED             0.013   -0.013    0.871   0.647
250-GARCH-SKEW-t          0.013   -0.013    0.871   0.647
250-TGARCH-N              0.013   -0.013    0.871   0.647
250-TGARCH-GED            0.012   -0.012    0.433   0.805
250-TGARCH-SKEW-t         0.014   -0.014    1.499   0.473
250-EGARCH-N              0.012   -0.012    0.433   0.805
250-EGARCH-GED            0.010   -0.011    0.202   0.904
250-EGARCH-SKEW-t         0.012   -0.012    0.433   0.805
250-FIGARCH-N             0.013   -0.013    0.871   0.647
250-FIGARCH-GED           0.012   -0.012    0.433   0.805
250-FIGARCH-SKEW-t        0.012   -0.012    0.433   0.805
250-HYGARCH-N             0.012   -0.012    0.433   0.805
250-HYGARCH-GED           0.012   -0.012    0.433   0.805
250-HYGARCH-SKEW-t        0.013   -0.013    0.871   0.647
250-SV                    0.014   -0.014    1.499   0.473
250-HS                    0.009    0.117    3.636   0.162
500-Log ARMA              0.012   -0.012    0.433   0.805
500-Sqrt ARMA             0.014   -0.014    1.499   0.473
500-Log ARFIMA            0.012   -0.012    0.433   0.805
500-Sqrt ARFIMA           0.012   -0.012    0.433   0.805
500-Log HAR               0.012   -0.012    0.433   0.805
500-Sqrt HAR              0.012   -0.012    0.433   0.805
500-GARCH-N               0.007   -0.007    1.009   0.604
500-GARCH-GED             0.007   -0.007    1.009   0.604
500-GARCH-SKEW-t          0.009   -0.009    0.201   0.904
500-TGARCH-N              0.009   -0.009    0.201   0.904
500-TGARCH-GED            0.008   -0.008    0.458   0.795
500-TGARCH-SKEW-t         0.012   -0.012    0.433   0.805
500-EGARCH-N              0.012   -0.012    0.433   0.805
500-EGARCH-GED            0.009   -0.009    0.201   0.904
500-EGARCH-SKEW-t         0.010   -0.011    0.202   0.904
500-FIGARCH-N             0.008   -0.008    0.458   0.795
500-FIGARCH-GED           0.008   -0.008    0.458   0.795
500-FIGARCH-SKEW-t        0.009   -0.009    0.201   0.904
500-HYGARCH-N             0.008   -0.008    0.458   0.795
500-HYGARCH-GED           0.009   -0.009    0.201   0.904
500-HYGARCH-SKEW-t        0.010   -0.011    0.202   0.904
500-SV                    0.010   -0.011    0.202   0.904
500-HS                    0.009    0.117    3.636   0.162

Remarks: The empirical coverage rate is π̃, while λ̃2 denotes the estimated correlation between Ht+1 and Ht. LR is the likelihood ratio test statistic for conditional coverage and Prob. is the corresponding probability value.

Table 7: Top Ten Value-at-Risk Forecasts

Loss function B                             Loss function Q
Rank  Model              Avg. Bk           Rank  Model              Avg. Qk
1     250-Log ARFIMA     0.04208           1     EGARCH-N           0.02767
2     250-Log ARMA       0.04216           2     500-Log HAR        0.02768
3     250-Log HAR        0.04218           3     500-Sqrt ARMA      0.02769
4     250-Sqrt HAR       0.04229           4     500-Log ARMA       0.02769
5     RiskMetrics        0.04229           5     500-Sqrt HAR       0.02782
6     EGARCH-N           0.04230           6     250-Sqrt HAR       0.02783
7     250-Sqrt ARFIMA    0.04230           7     EVT-Sqrt ARFIMA    0.02791
8     FIGARCH-N          0.04239           8     EVT-SV             0.02792
9     250-Sqrt ARMA      0.04244           9     500-Log ARFIMA     0.02793
10    EVT-SV             0.04284           10    EGARCH-GED         0.02795

Benchmarks
5     RiskMetrics        0.04229           72    RiskMetrics        0.02927
16    GARCH-N            0.04318           19    GARCH-N            0.02812

SPA test results (ranges over q = 0.25, 0.50, 0.75)
Benchmark      Loss    SPA_L           SPA             SPA_U
RiskMetrics    B       0.607-0.625     0.732-0.756     0.813-0.845
RiskMetrics    Q       0.214-0.215     0.230-0.232     0.230-0.232
GARCH-N        B       0.215           0.266           0.439
GARCH-N        Q       0.464-0.548     0.594-0.710     0.671-0.763

Remarks: The VaR forecasts are ranked in ascending order by the average of their losses due to the functions B and Q. The average losses of the benchmarks (RiskMetrics and GARCH-N) are presented together with their ranks for comparison. The probability values of Hansen's Superior Predictive Ability test (SPA) are reported in the bottom panel together with their lower (SPA_L) and upper (SPA_U) bounds; the reported ranges cover the three choices of the bootstrap dependence parameter q = 0.25, 0.50, 0.75 with B = 2000 bootstrap replications.
