Stat Comput (2008) 18: 391–408 DOI 10.1007/s11222-008-9063-1
Bayesian inference and model comparison for asymmetric smooth transition heteroskedastic models Richard Gerlach · Cathy W.S. Chen
Received: 17 August 2007 / Accepted: 25 March 2008 / Published online: 23 April 2008 © Springer Science+Business Media, LLC 2008
Abstract Inference, quantile forecasting and model comparison for an asymmetric double smooth transition heteroskedastic model are investigated. A Bayesian framework is employed and an adaptive Markov chain Monte Carlo scheme is designed. A mixture prior is proposed that alleviates the usual identifiability problem as the speed of transition parameter tends to zero, and an informative prior for this parameter is suggested, that allows for reliable inference and a proper posterior, despite the non-integrability of the likelihood function. A formal Bayesian posterior model comparison procedure is employed to compare the proposed model with its two limiting cases: the double threshold GARCH and symmetric ARX GARCH models. The proposed methods are illustrated using both simulated and international stock market return series. Some illustrations of the advantages of an adaptive sampling scheme for these models are also provided. Finally, Bayesian forecasting methods are employed in a Value-at-Risk study of the international return series. The results generally favour the proposed smooth transition model and highlight explosive and smooth nonlinear behaviour in financial markets.

Keywords Markov chain Monte Carlo method · Mixture normal · Posterior model probability · Value-at-Risk · Asymmetric volatility model · Smooth transition
R. Gerlach: Econometrics and Business Statistics, University of Sydney, Sydney, Australia
C.W.S. Chen (corresponding author): Department of Statistics, Feng Chia University, Taichung, Taiwan; e-mail: [email protected]
1 Introduction

Modelling financial time series has received extensive recent attention, with many nonlinear models proposed in the literature. As noted by Priestley (1980), and more recently by Franses and van Dijk (2000), a natural approach to such modeling defines different states or regimes, and allows dynamic behaviour to be regime dependent. A famous example is the threshold autoregressive (TAR) model proposed by Tong (1978) and Tong and Lim (1980), where the regime is determined by a threshold variable, with a step function transition between regimes. A more gradual transition obtains via a smooth, continuous transition function; an idea first proposed by Bacon and Watts (1971) and introduced for nonlinear time series by Chan and Tong (1986), that gained popularity via Granger and Teräsvirta (1993) and Teräsvirta (1994). Teräsvirta (1998) and van Dijk et al. (2002) gave comprehensive reviews of the smooth transition autoregressive (STAR) model and several of its variants. For dynamic volatility Engle (1982) proposed the ARCH model, generalized by Bollerslev (1986) to the popular GARCH model. Volatility asymmetry is evident in financial markets, and many nonlinear GARCH models have been developed to capture it. The extensive literature on asymmetric GARCH models suggests there is potential for exploring smooth transition nonlinear heteroskedastic models: such a model specifies the conditional volatility as a smooth transition GARCH (ST-GARCH) model; see for example, González-Rivera (1998) and Anderson et al. (1999). Further, a natural idea would be to have a double threshold model that allowed for mean asymmetry also. Li and Li (1996) first combined asymmetric ARCH volatility and TAR mean equations to form the double threshold ARCH model (DT-ARCH); Brooks (2001) extended this to a DT-GARCH
model; Nam et al. (2001) used a smooth transition DT-GARCH model; while Chen et al. (2003) allowed for exogenous threshold variables and exogenous regressors to examine asymmetric return spillover effects in a DTX-GARCH model. These papers found clear evidence of threshold nonlinearity in both the mean and volatility equations in major markets. This paper will examine whether such asymmetry might be better modeled by a smooth transition function in both mean and volatility equations. Models similar to that used here were proposed by Nam et al. (2001) and an unpublished working paper by Lundbergh and Teräsvirta (1999): our model extends these via a non-zero threshold value, exogenous regressors, an exogenous threshold variable, allowing for possibly explosive volatility and t-distributed errors. Sharp transition double threshold models, such as those by Li and Li (1996), Brooks (2001) and the DTX-GARCH model of Chen et al. (2003), and the symmetric ARX-GARCH model are special cases of our proposed double smooth transition model. Estimation of the smoothing parameter and identification as it tends to zero have proven a challenge for both classical and Bayesian approaches: the likelihood function is not integrable for this parameter in an ST-GARCH model. Many papers report very high standard errors compared to estimates (see Chelley-Steeley 2005; Akram et al. 2005 and Lubrano 2001); or report large estimates, essentially giving a sharp threshold transition function (see Nam et al. 2001; Lopes and Salazar 2006 and Akram et al. 2005). Lubrano (2001) discusses Bayesian inference in an ST-GARCH model and suggests some informative prior distributions on the smoothing parameter. We also consider a Bayesian framework, designing an adaptive Markov chain Monte Carlo (MCMC) method for estimation, inference and forecasting that builds on the method proposed in So et al. (2005). We examine the logistic transition function in some detail and suggest a competitive, weakly informative prior for the smoothing parameter, that leads to an integrable, proper posterior distribution. We also design a mixture prior to aid parameter identification as the smoothing parameter tends to zero, allowing for sensible inference in that case. The Bayesian approach allows simultaneous inference on all model parameters, in contrast to methods that fix the delay (usually at 1) or that take a two stage approach: estimating the delay lag first and then the other parameters conditional on this choice; e.g. Li and Li (1996). Bayesian methods also provide valid inference under parameter constraints; see Silvapulle and Sen (2004) for problems with standard methods in constrained inference. Despite the popularity of nonlinear models, there exists little research on formally comparing nonlinear heteroskedastic specifications; an exception is Chen et al. (2006) who use reversible jump (RJ) MCMC to compare two threshold
GARCH models. As mentioned above, various studies find estimates of the smoothing parameter that are either consistent with a sharp threshold or with a constant transition function. A further goal is thus to examine formal model comparison of the proposed smooth transition model with its limiting special cases: the sharp transition DT-GARCH (with infinite smoothing parameter) and symmetric ARX GARCH (with zero smoothing parameter) models. We will also provide results from an out-of-sample forecasting exercise involving Value-at-Risk to compare these three models with other popular specifications such as RiskMetrics. Section 2 of this paper discusses threshold modeling in general and presents the double smooth transition model. Section 3 discusses the Bayesian methods for estimation, inference and model comparison, including the new model priors and the adaptive MCMC method. Section 4 describes Value-at-Risk forecasting. Section 5 presents a simulation study. Section 6 presents an empirical study of six international market indices, while Sect. 7 presents some conclusions.
2 The model

Motivated by the STAR and the ST-GARCH models, we consider the double smooth transition heteroskedastic model as follows:

y_t = μ_t^{(1)} + F(z_{t−d}; γ, c) μ_t^{(2)} + a_t,
μ_t^{(l)} = φ_0^{(l)} + Σ_{i=1}^{p} φ_i^{(l)} y_{t−i} + Σ_{i=1}^{r} ψ_i^{(l)} x_{t−i},   l = 1, 2,   (1)
where y_t is the observed data; l = 1, 2 is the regime indicator; d is the delay lag; x_t is an exogenous variable and z_t is the threshold variable. This is a standard STAR mean equation, plus an exogenous regressor, allowing for smooth nonlinearity in the mean. To allow for heteroskedasticity and volatility asymmetry, we next specify the errors with dynamic variance given by an ST-GARCH model:

a_t = √h_t ε_t,   ε_t ∼ D(0, 1),
h_t = h_t^{(1)} + F(z_{t−d}; γ, c) h_t^{(2)},
h_t^{(l)} = α_0^{(l)} + Σ_{i=1}^{g} α_i^{(l)} a_{t−i}² + Σ_{i=1}^{k} β_i^{(l)} h_{t−i},   l = 1, 2,   (2)
where D(0, 1) is a distribution with mean 0 and variance 1. The threshold zt could be lagged observations (i.e. yt−d ), or exogenous: e.g. an international market return, economic index or a combination of variables, see Chen and So (2006). The mean (variance) equation parameters in the second regime (l = 2) describe the difference between the mean (variance) parameters when F (zt−d ; γ , c) changes
fully from 0 to 1. F(z_{t−d}; γ, c) is usually assumed to be a logistic, exponential or cumulative distribution function. A popular choice is the logistic:

F(z_{t−d}; γ, c) = 1 / (1 + exp{−γ (z_{t−d} − c)/s_z}),   (3)
where γ is the smoothness or speed of transition parameter, commonly assumed positive; c is the threshold value and s_z is the (sample) standard deviation of the observed threshold variable z; this allows γ to be scale-free and hence comparable across different return series. The methods presented here can easily be adapted for other transition functions. This model (1)–(3) is a double smooth transition GARCH model, or a continuous mixture of two ARX-GARCH models, with F(z_{t−d}; γ, c) controlling the speed of transition between regimes. The model can simultaneously capture smooth endogenous and exogenous nonlinearity in the mean, as well as in the volatility equation. We follow the literature here and employ regime-switching between only two latent processes. Such a model has an immediate financial interpretation, building on the leverage argument of Black (1976) and others, allowing for different volatility (and mean) processes in rising and falling markets. Further, however, a smooth transition model is already a continuous mixture between the two underlying regimes, and thus can capture a vastly increased and more general set of mean and volatility processes than a two-regime sharp transition model, e.g. as in Chen et al. (2003). The typical financial asymmetry argument can also support the use of the same parameters c, γ and smooth transition function F(.) for the mean and volatility equations, which are the usual assumptions applied. However, naturally the model may be extended, in future research, by allowing more than two latent processes and different transition functions for the mean and volatility. van Dijk and Franses (1999) proposed a STAR model with three latent regimes (for the mean only), while the unpublished manuscript of Lundbergh and Teräsvirta (1999) allowed for a different speed of transition parameter γ, though retaining the same F(.), for the mean and volatility in their model. We leave these extensions for future work.

The symmetric ARX-GARCH and sharp transition DTX-GARCH models are special cases of model (1)–(3) above. As γ → 0, F(z_{t−d}; γ, c) → 0.5, the model reduces to a single regime ARX-GARCH and the parameters in (1)–(2) become unidentified. Further, as γ → ∞, z_{t−d} < c implies F(z_{t−d}; γ, c) → 0 so that the behaviour of y_t and h_t is:

y_t = φ_0^{(1)} + Σ_{i=1}^{p} φ_i^{(1)} y_{t−i} + Σ_{i=1}^{r} ψ_i^{(1)} x_{t−i} + a_t,
h_t = α_0^{(1)} + Σ_{i=1}^{g} α_i^{(1)} a_{t−i}² + Σ_{i=1}^{k} β_i^{(1)} h_{t−i}.
Similarly, when z_{t−d} > c, F(z_{t−d}; γ, c) → 1 as γ → ∞ and y_t and h_t become:

y_t = (φ_0^{(1)} + φ_0^{(2)}) + Σ_{i=1}^{p} (φ_i^{(1)} + φ_i^{(2)}) y_{t−i} + Σ_{i=1}^{r} (ψ_i^{(1)} + ψ_i^{(2)}) x_{t−i} + a_t,
h_t = (α_0^{(1)} + α_0^{(2)}) + Σ_{i=1}^{g} (α_i^{(1)} + α_i^{(2)}) a_{t−i}² + Σ_{i=1}^{k} (β_i^{(1)} + β_i^{(2)}) h_{t−i}.

This is exactly the DTX-GARCH model in Chen et al. (2003).
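As a quick numerical illustration of these two limiting cases, the short Python sketch below (our own illustration with arbitrary example values, not code from the paper) evaluates the logistic transition function (3) for a very small and a very large γ, showing the constant-0.5 and sharp-threshold limits.

```python
import numpy as np

def transition(z, gamma, c, s_z):
    """Logistic transition function F(z; gamma, c) of equation (3)."""
    return 1.0 / (1.0 + np.exp(-gamma * (z - c) / s_z))

z = np.linspace(-3, 3, 7)   # example values of the threshold variable
s_z = 1.0                   # sample standard deviation of z (arbitrary here)
c = 0.0                     # threshold value

print(transition(z, 0.01, c, s_z))   # ~0.5 everywhere: single-regime ARX-GARCH limit
print(transition(z, 100.0, c, s_z))  # ~0/1 either side of c: sharp-threshold DTX-GARCH limit
```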
Sufficient conditions usually enforced for the ST-GARCH model are:

(a) Non-negativity of the conditional variance:

α_0^{(1)} > 0;   α_i^{(1)} ≥ 0,   α_i^{(1)} + α_i^{(2)} ≥ 0, ∀ i;   β_j^{(1)} ≥ 0,   β_j^{(1)} + β_j^{(2)} ≥ 0, ∀ j.   (4)

(b) Covariance-stationarity:

Σ_{i=1}^{g} (α_i^{(1)} + 0.5 α_i^{(2)}) + Σ_{j=1}^{k} (β_j^{(1)} + 0.5 β_j^{(2)}) < 1.   (5)

These conditions can also be found in Anderson et al. (1999). In order to allow for possibly explosive volatility, and to ensure a proper prior, we generalize the restrictions as follows:

α_0^{(1)} < b_1,   Σ_i β_i^{(1)} < b_2,   Σ_i α_i^{(1)} + Σ_j β_j^{(1)} < b_3,   (6)
where b_1, b_2, b_3 are user-specified: we let b_2, b_3 ≥ 1 to allow explosive behaviour. When nonlinearity is introduced the likelihood function becomes complicated, e.g. it may be non-differentiable in the parameters. The conditional likelihood function for the double ST model (1)–(2) is:

p(y_{s+1,n} | Θ) = Π_{t=s+1}^{n} (1/√h_t) p_ε((y_t − μ_t)/√h_t),

where p_ε is the density function for ε_t; Θ represents all model parameters; y_{s+1,n} = (y_{s+1}, y_{s+2}, …, y_n); n is the sample size; s = max{p, r, g, k, d}; h_t = Var(y_t | I_{t−1}) and μ_t = E(y_t | I_{t−1}), where I_{t−1} is the information set. Evidence in the literature shows residuals from financial time
series models tend to be leptokurtic. Thus, we let ε_t follow a standardized t-distribution with ν degrees of freedom, so the conditional likelihood is:

p(y_{s+1,n} | Θ) = Π_{t=s+1}^{n} [Γ((ν+1)/2) / (Γ(ν/2) √((ν−2)π h_t))] [1 + (y_t − μ_t)² / ((ν−2) h_t)]^{−(ν+1)/2},   (7)

where Θ = (θ_i, α_i, γ, ν, c, d) with θ_i = (φ_0^{(i)}, …, φ_p^{(i)}, ψ_1^{(i)}, …, ψ_r^{(i)})′ and α_i = (α_0^{(i)}, …, β_k^{(i)})′, i = 1, 2. Hereafter we let φ_{p+1}^{(i)}, …, φ_{p+r}^{(i)} denote the exogenous terms ψ_1^{(i)}, …, ψ_r^{(i)}.
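The conditional likelihood (7) is straightforward to evaluate numerically. The Python sketch below is our illustration only: it assumes the conditional means μ_t and variances h_t have already been computed recursively from (1)–(2), and the function name and signature are ours, not the authors'.

```python
import numpy as np
from scipy.special import gammaln

def student_t_loglik(y, mu, h, nu):
    """Log of the conditional likelihood (7): standardized Student-t errors with
    nu degrees of freedom, given conditional means mu and variances h."""
    z2 = (y - mu) ** 2 / ((nu - 2.0) * h)
    return np.sum(
        gammaln((nu + 1.0) / 2.0) - gammaln(nu / 2.0)
        - 0.5 * np.log((nu - 2.0) * np.pi * h)
        - (nu + 1.0) / 2.0 * np.log1p(z2)
    )
```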
3 Priors and Bayesian inference

MCMC methods are employed since the posterior density in GARCH-type models is not of a standard form. We employ a mixture of random walk (RW) Metropolis and independent kernel Metropolis-Hastings (MH) algorithms in an adaptive parameter sampling scheme. The parameter constraints in (4)–(5) and necessary identification adjustments as γ → 0 (see below) are directly incorporated into the prior distribution. Lubrano (2001) cited problems in inference for γ, since the likelihood function does not tend to 0 as γ → ∞. We suggest a prior distribution that is weakly informative so that the posterior is proper, but the data still dominates estimation and inference through the likelihood. The proposed prior also specifically accounts for the limiting cases as γ approaches 0 or ∞. The prior is also chosen to have a finite range in the variance equation parameters and to be fully proper, for computational model comparison purposes.
3.1 A mixture prior specification

As mentioned above, when γ goes to 0, the parameters in (1)–(2) become unidentified. To remedy this problem, we introduce a specific prior formulation for (1) only, based on George and McCulloch (1993) who used discrete indicators to identify subset regression models. We define the latent variable δ_j^{(i)}, which determines the prior distribution of φ_j^{(i)}, via the mixture of two normals:

φ_j^{(i)} | δ_j^{(i)} ∼ (1 − δ_j^{(i)}) N(0, k² τ_j^{(i)2}) + δ_j^{(i)} N(0, τ_j^{(i)2}),   j = 1, …, p + r,   (8)
δ_j^{(i)} | γ = 1 if i = 1 or γ > ξ;   δ_j^{(i)} | γ = 0 if i = 2 and γ ≤ ξ.

Here i = 1, 2 denotes the regime, while j = 1, …, p + r denotes the lag order of the AR and exogenous mean terms in θ_i. We discuss specific choices for k, τ_j and ξ in later sections: choices of these quantities are made easier by having a scale-free parameter γ. Here ξ is a specified threshold and γ ≤ ξ indicates that F(z_{t−d}; γ, c) → 0.5, i.e. an ARX-GARCH model. As in George and McCulloch (1993), we choose k to be a small positive value so that if γ ≤ ξ and δ_j^{(2)} = 0, then the posterior for the parameters φ_j^{(2)} will be weighted by the prior towards 0. The mixture prior (8) is equivalent to:

θ_i | δ_i ∼ N_{p+r}(0, D_i V_i D_i),

where V_i is the prior correlation matrix, δ_i = (δ_1^{(i)}, δ_2^{(i)}, …, δ_{p+r}^{(i)})′ and D_i is a diagonal matrix diag(aτ_1^{(i)}, aτ_2^{(i)}, …, aτ_{p+r}^{(i)}), with a = 1 if δ_i = 1 and a = k if δ_i = 0. We assume prior independence for θ_i so that V_i = I and D_i ensures (8) is satisfied. In other words,

p(θ_i | δ_i) = Π_{j=1}^{p+r} p(φ_j^{(i)} | δ_j^{(i)}).

Setting k small ensures that θ_2 | δ_2 = 0 will still be partially identified (with finite variance) in the posterior, by directing it towards 0; as should be the case as F(z_{t−d}; γ, c) → 0.5. This prior effectively avoids the identification problem as γ → 0 in a sensible manner, for the mean equation parameters. We note that such a mixture prior is not strictly necessary for the variance parameters in (2), since they will be restricted to a finite range, as described below. Thus their posterior variance cannot become infinite, even when the likelihood is flat (as γ → 0), and so identifiability is not really a problem.
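A minimal sketch of how the mixture prior (8) translates into the equivalent normal prior covariance D_i V_i D_i (with V_i = I) is given below; the function and its arguments are our own illustration, with k, ξ and τ set to the values used later in the simulation study.

```python
import numpy as np

def mixture_prior_cov(gamma, xi, tau, k, regime):
    """Prior covariance D_i V_i D_i (V_i = I) implied by the mixture prior (8):
    in regime 2 with gamma <= xi the prior standard deviations shrink to k*tau,
    directing the second-regime mean parameters towards zero."""
    delta = 1 if (regime == 1 or gamma > xi) else 0
    a = 1.0 if delta == 1 else k
    D = np.diag(a * np.asarray(tau, dtype=float))
    return D @ D                     # V_i = I, so the covariance is D^2

tau = [0.05, 0.05, 0.05]             # tau_j^{(i)}, as in the simulation study
print(np.diag(mixture_prior_cov(0.5, 0.7, tau, 0.001, regime=2)))  # shrunk: (k*tau)^2
print(np.diag(mixture_prior_cov(5.0, 0.7, tau, 0.001, regime=2)))  # full: tau^2
```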
3.2 Further priors

Priors on the remaining parameters are mostly chosen to be flat over (4)–(6). We adopt the prior α_0^{(i)} ∼ Unif(0, b_1), i = 1, 2, and place a constrained uniform prior on the other parameters in α, defined by the indicator I(S), with S the set of α that satisfies (4), (5), and (6), following Chen et al. (2006). Thus inference is based on the likelihood function constrained to this region. See Gelman (2006) for a discussion of similar prior formulations.

Figure 1 shows some examples of the transition function for various different values of the smoothness parameter γ. We see that values of γ > 20 (roughly) and beyond essentially give a sharp logistic transition function, while γ < 1 gives close to a constant transition. This suggests that, even in very large data sets, it will be difficult to distinguish between large values of γ (e.g. between 50 and 100). Lubrano (2001) suggested an informative, positively truncated Cauchy prior for γ, allowing for an integrable posterior by putting sufficiently diminished weight on higher values of γ.
Fig. 1 Effects of γ on logistic function F (zt−d ; γ , c) as given in (3) with threshold c = 0
However, the estimates seemed heavily dependent on the choice of prior hyper-parameters. Geweke (1993) suggested an exponential prior for the degrees of freedom parameter in a t-distribution (another case where there is little information to distinguish large values and infinity is a valid value). When applied to γ it will also make the posterior integrable, but does have an undesirable mode at 0 (where parameters are not identified). To enforce γ > 0, we re-parameterize and assume ln γ ∼ N (μγ , σγ2 ). Figure 2 displays the log-normal prior density for γ for two choices of μγ , σγ2 , plus the truncated Cauchy (Lubrano 2001) and exponential densities (Geweke 1993), with parameters taken from Lubrano (2001). Based on Fig. 1, we consider only choices of μγ , σγ2 that ensure the prior density becomes small for γ > 20, γ → 0 and is not too informative inside the region (1, 20). An Appendix shows that the log-normal prior ensures that the posterior density for γ is integrable. Figure 2 shows that the log-normal priors chosen are less informative than the truncated Cauchy and also do not have a mode at 0.
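The sketch below (our own illustration) evaluates the log-normal prior density for γ under the two hyper-parameter settings used later (μ_γ = ln 5 with σ_γ ≈ 0.77 and 1.09), alongside a positive half-Cauchy and an exponential density for comparison; the Cauchy and exponential scale values here are placeholders rather than the exact choices of Lubrano (2001) or Geweke (1993).

```python
import numpy as np
from scipy import stats

gammas = np.linspace(0.01, 30, 500)

# Log-normal priors: ln(gamma) ~ N(mu, sigma^2), with mu = ln 5
prior1 = stats.lognorm.pdf(gammas, s=0.77, scale=5.0)   # sigma ~ 0.77
prior2 = stats.lognorm.pdf(gammas, s=1.09, scale=5.0)   # sigma ~ 1.09

# Comparison priors (illustrative placeholder scale values)
half_cauchy = 2.0 * stats.cauchy.pdf(gammas, loc=0.0, scale=5.0)  # positively truncated Cauchy
exponential = stats.expon.pdf(gammas, scale=5.0)

print(prior1[:3], half_cauchy[:3], exponential[:3])
```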
The prior for the delay lag is a discrete uniform, i.e.

Pr(d) = 1/d_0,   d = 1, …, d_0.

For the df ν, we re-parameterize by defining ρ = ν^{−1} as in Chen et al. (2003). The prior for ρ is set to be I(ρ ∈ [0, 1/4]), so that ν > 4, ensuring that the first four moments of ε_t are finite. This is equivalent to a prior of 4/ν² for ν which, restricted to the range (4, ∞), is a proper prior, thus ensuring a proper posterior (Geweke 1993). Finally we choose the flat prior c ∼ Unif(b_{z1}, b_{z2}), where again the range is defined by the user, but is usually contained within the range of z.

3.3 Posteriors for MCMC scheme

MCMC methods set up a Markov chain providing a dependent Monte Carlo sample from the joint posterior distribution. Where practical, parameters are generated in groups to speed mixing and convergence; see Carter and Kohn (1994).
Fig. 2 Some prior densities for γ . The upper panel is log-normal prior density for γ under two choices of μγ , σγ2 , the middle panel is the truncated Cauchy (Lubrano 2001), and the lower panel is exponential densities (Geweke 1993), with parameters taken from Lubrano (2001)
The conditional posterior for each group is proportional to the likelihood function (7) times the prior density for that group:

p(Θ_l | y_{s+1,n}, Θ_{≠l}) ∝ p(y_{s+1,n} | Θ) p(Θ_l | Θ_{≠l}),

where Θ_l is each parameter group, p(Θ_l) is its prior density and Θ_{≠l} is the vector of all model parameters, excepting Θ_l. We use the following groups: (i) θ_i, i = 1, 2; (ii) α; (iii) ρ; (iv) γ; (v) d; (vi) c. These groups were chosen by trial and error, using our experience with these models, and by examining many different configurations. The chosen groups displayed the best mixing, acceptance rates and convergence properties compared to many others that we tried. For all parameters except d, the posterior distributions are not of a standard form. We thus turn to the Metropolis and Metropolis-Hastings (MH) methods. We employ the standard random walk (RW) Metropolis method for the parameters in (i), (iii), (iv) and (vi) above, using a Gaussian proposal, to generate the MCMC sample. The choice of each variance-covariance matrix may be guided by the conditional posterior distribution, e.g. by the standard errors of the least squares estimates; while tuning of this matrix is also applied, during the burn-in period only, to achieve an appropriate acceptance rate between 25 and 50%, as in Gelman et al. (1996).

3.3.1 Adaptive sampling methods

In our experience with GARCH-type models, the RW Metropolis algorithm produces very high correlations and slow mixing among the iterates of the parameters α. The effect is enhanced when the true volatility persistence is high, i.e. α^{(1)} + β^{(1)}, and/or the sum in (5), is close to 1, which is the usual case with financial return data. In particular, when the MCMC iterates get very close to the boundaries in (4)–(6), it seems difficult for a RW sampler to efficiently move away from this area, when the last iterate (which is already close to the boundary) is the proposal mean. To speed up mixing and reduce the inter-iterate dependence in general, and especially in the region close to these boundaries, we revise the sampling method for α after the burn-in period and switch to the independent kernel (IK) MH algorithm for the sampling period, as in So et al. (2005).
For α we thus employ the RW Metropolis method for the first M MCMC iterations (M is the size of the burn-in sample). We then switch to an IK MH method with a Gaussian proposal distribution. The overall method is adaptive because the IK MH proposal mean and variance-covariance are chosen to be the sample mean and sample variance-covariance of the iterates for α from the burn-in period. This has the added advantage of using the posterior correlations among the α in the burn-in period in the sample period proposal, which should increase efficiency. In particular, since the burn-in sample's mean (now the proposal mean) is likely not too close to the boundaries in (4)–(6), the sampler should be more efficient in that region for these parameters. We note that this method will only work if the MCMC sample has converged and sufficiently covered the posterior inside the burn-in period. Convergence is thus monitored heavily using trace and ACF plots, while the tuning algorithm will also help to ensure sufficient coverage of the posterior by moderating the acceptance rate of the Metropolis method. MCMC results and convergence are extensively examined by starting the scheme from many different and varied starting values. While we have found favourable results in this paper, in general more efficient results might be obtained by using Student-t proposal densities, with low degrees of freedom, or better tailored proposals in the sample period. Finally, the delay parameter d is obtained in step (v) by sampling from the conditional multinomial distribution:

Pr(d = j | y_{s+1,n}, Θ_{≠d}) ∝ p(y_{s+1,n} | Θ, d = j) Pr(d = j) / Σ_{i=1}^{d_0} p(y_{s+1,n} | Θ, d = i) Pr(d = i),

where j = 1, …, d_0. The MCMC method proceeds by choosing initial values for all parameters, then iteratively sampling in turn from each conditional posterior distribution, using the methods above. The initial M iterations are discarded as a burn-in sample, and the final sample of N − M iterates is used for posterior inference.
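A schematic Python sketch of this burn-in/sampling-period switch for the α block is given below; it is our own simplified illustration, assuming a user-supplied log-posterior function that returns −∞ outside the constraint region (4)–(6), and it omits the acceptance-rate tuning applied during burn-in.

```python
import numpy as np

def adaptive_mh(log_post, alpha0, n_iter, n_burn, rw_cov, seed=0):
    """Sketch (not the authors' code): random-walk Metropolis during burn-in,
    then independent-kernel MH whose Gaussian proposal uses the burn-in
    sample mean and covariance, as in So et al. (2005)."""
    rng = np.random.default_rng(seed)
    dim = len(alpha0)
    draws = np.empty((n_iter, dim))
    current = np.asarray(alpha0, dtype=float)
    lp_cur = log_post(current)
    prop_mean = prop_cov = None

    def log_q(x):
        # log density (up to a constant) of the independence proposal
        d = x - prop_mean
        return -0.5 * d @ np.linalg.solve(prop_cov, d)

    for j in range(n_iter):
        if j == n_burn:  # build the independence kernel from the burn-in sample
            prop_mean = draws[:n_burn].mean(axis=0)
            prop_cov = np.cov(draws[:n_burn], rowvar=False) + 1e-8 * np.eye(dim)
        if j < n_burn:   # random-walk proposal centred at the last iterate
            cand = rng.multivariate_normal(current, rw_cov)
            lp_cand = log_post(cand)
            log_ratio = lp_cand - lp_cur
        else:            # independent Gaussian proposal, so a MH correction is needed
            cand = rng.multivariate_normal(prop_mean, prop_cov)
            lp_cand = log_post(cand)
            log_ratio = (lp_cand - lp_cur) + (log_q(current) - log_q(cand))
        if np.log(rng.uniform()) < log_ratio:
            current, lp_cur = cand, lp_cand
        draws[j] = current
    return draws
```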
3.4 Model selection

Model comparison and selection is an important part of empirical analysis. Information criteria such as AIC and BIC are commonly employed to choose between nested models. However, we prefer a formal Bayes factor approach, as in Kass and Raftery (1995); such formal comparison remains difficult for heteroskedastic models (Berg et al. 2004). The decision between models M_i, i = 1, …, K, is made by posterior probability, where we choose model M_k if Pr(M_k | y) is a maximum over k = 1, …, K.

To estimate this probability for each model involves an often highly multi-dimensional integration over each model's parameter space. Various numerical MCMC-hybrid methods have been proposed and are popular for heteroskedastic models: importance sampling, as in Geweke (1995) and Gerlach et al. (1999), applied to choose between four competing GARCH models in Chen and So (2006) and between GARCH and stochastic volatility models in Gerlach and Tuyl (2006); the reversible jump (RJ) MCMC method of Green (1995), applied to choose between a symmetric and nonlinear GARCH model in So et al. (2005), between GARCH and DT-GARCH in Chen et al. (2005) and between a GARCH and EGARCH model in Vrontos et al. (2000); and a method due to Chib (1995). However, all of these methods involve MCMC extensions, so that the (extra MCMC) computational time required is usually quite long: see the discussion in Carlin and Chib (1995), Godsill (2001) and Chen et al. (2008); e.g. the RJ method requires the specification of proposal distributions, whose choice can affect final model choice significantly; while importance sampling involves running the MCMC sampling scheme multiple times. Recently, Congdon (2006) introduced a more efficient approach, following on from work in Carlin and Chib (1995), Godsill (2001) and Scott (2002). This method requires MCMC iterates to be produced from all competing models, but no jumping between models is required, so that no jump proposal distributions and no extra MCMC iterations are needed:

Pr(M_i | y) = ∫ Pr(M_i, Θ | y) dΘ
            ≈ (1/(N − M)) Σ_{j=M+1}^{N} Pr(M_i | y, Θ^{(j)})
            = (1/(N − M)) Σ_{j=M+1}^{N} [ p(y_{s+1,n} | Θ_i^{(j)}, M_i) p(Θ_i^{(j)} | M_i) Pr(M_i) ] / [ Σ_{k=1}^{K} p(y_{s+1,n} | Θ_k^{(j)}, M_k) p(Θ_k^{(j)} | M_k) Pr(M_k) ],

where Θ_i are the parameters from model i; Θ_i^{(j)} is the j-th MCMC iterate from the posterior distribution of model i; Pr(M_i) is the prior probability in favour of model i; p(Θ_i | M_i) is the prior distribution for model i and p(y_{s+1,n} | Θ_i^{(j)}, M_i) is the likelihood function for model M_i. To avoid numerical problems common with likelihoods, log-likelihoods are employed, which are then scaled by subtracting the maximum of the log-likelihoods for each model at each MCMC iteration, from each model's log-likelihood value.
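The sketch below (our illustration, not the authors' code) computes these posterior model probabilities from stored per-iteration log-likelihood and log-prior values for each model, using a max-subtraction (log-sum-exp style) stabilization of the kind just described.

```python
import numpy as np

def posterior_model_probs(log_lik, log_prior, log_prior_model=None):
    """Congdon (2006)-style estimate of Pr(M_i | y).
    log_lik, log_prior: arrays of shape (n_iter, K) holding, for each retained
    MCMC iteration j and model k, log p(y | theta_k^(j), M_k) and
    log p(theta_k^(j) | M_k).  Returns the K estimated model probabilities."""
    n_iter, K = log_lik.shape
    if log_prior_model is None:
        log_prior_model = np.full(K, -np.log(K))     # equal prior model probabilities
    w = log_lik + log_prior + log_prior_model
    w -= w.max(axis=1, keepdims=True)                # subtract per-iteration maximum
    probs = np.exp(w)
    probs /= probs.sum(axis=1, keepdims=True)        # Pr(M_k | y, theta^(j))
    return probs.mean(axis=0)                        # average over iterations
```

Because the same constant is subtracted from every model at a given iteration, the stabilization cancels in the ratio and leaves the estimate unchanged.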
4 Forecasting Value-at-Risk

The Basel Capital Accord, originally signed by the Group of Ten countries in 1988, requires Authorized Deposit-taking Institutions (ADIs) to hold sufficient capital to provide a cushion against unexpected losses. Value-at-Risk (VaR) is a procedure designed to forecast the worst expected loss over a given time interval under normal market conditions, at a given confidence level α (Jorion 1997). That is,

α = Pr(y_{n+1} < −VaR_{n+1}).

The Monte Carlo approach provides a flexible way to estimate VaR. Bayesian estimators of the general k-period VaR are constructed via the percentiles of the predictive distribution p(y_{n+k} | y). A level α VaR estimate can be obtained using the sample α percentiles of the MCMC sample iterates y_{n+k}^{(j)}, j = M + 1, …, N, simulated from the forecast density p(y_{n+k} | y, Θ^{(j)}). For a one-period VaR under the smooth transition model in (1)–(2), we can simply simulate:

VaR^{(j)} = −( μ_{n+1}^{(j)} + t_α(ν^{(j)}) √( h_{n+1}^{(j)} / (ν^{(j)} / (ν^{(j)} − 2)) ) ),

which forms a posterior predictive MCMC sample of VaR estimates that accounts for parameter uncertainty; where t_α(ν^{(j)}) is the α-th quantile of a Student-t distribution with ν^{(j)} degrees of freedom; ν^{(j)}/(ν^{(j)} − 2) is an adjustment term for a standardized Student-t with ν^{(j)} degrees of freedom; and μ_{n+1}^{(j)}, the conditional mean in (1), and h_{n+1}^{(j)}, the conditional volatility in (2), j = M + 1, …, N, are evaluated conditional upon y and the parameter values at MCMC iteration j. The final VaR estimate is the average of this sample. A similar method is used for the DTX-GARCH and ARX-GARCH models considered in Sect. 6.

A standard criterion to test the VaR accuracy is the number of violations, defined as the number of days on which the actual return exceeds the forecasted VaR over a time horizon. This can be written as

Σ_{t=n+1}^{n+m} I(y_t < −VaR_t).
The violation rate divides this number by the time horizon m. When the violation rate is less than α, risk estimates are conservative and higher than actual and capital manipulation might be seen as ineffective. Alternately, when the violation rate is greater than α, risk estimates are lower than actual and financial institutions may face the possibility of bankruptcy. Usually, solvency outweighs profitability and lower violation rates are preferred to higher ones. Thus in practice it is more serious to under-estimate than overestimate risk. The implementation of the Basel II framework
only flags models where the violation rates are above nominal. This suggests that violation rates less than or equal to nominal are desirable and that a violation rate below nominal is preferable to a rate above nominal by the same or even slightly smaller distance.
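For concreteness, a small Python sketch of the one-step-ahead VaR construction and the violation count is given below; it is our own illustration and assumes arrays of posterior draws of μ_{n+1}, h_{n+1} and ν are available from the MCMC output.

```python
import numpy as np
from scipy import stats

def var_forecast(mu_next, h_next, nu, alpha=0.01):
    """One-step-ahead VaR estimate from the MCMC output: mu_next, h_next and nu
    are arrays of the conditional mean, conditional variance and degrees of
    freedom evaluated at each retained iteration (equations (1)-(2))."""
    t_q = stats.t.ppf(alpha, nu)                  # alpha quantile of Student-t(nu)
    scale = np.sqrt(h_next / (nu / (nu - 2.0)))   # standardized-t adjustment
    var_draws = -(mu_next + t_q * scale)          # VaR at each iteration
    return var_draws.mean()                       # final estimate: average of the sample

def violation_count(returns, var_series):
    """Number of days on which the realized return falls below -VaR."""
    return int(np.sum(np.asarray(returns) < -np.asarray(var_series)))
```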
5 Simulation study

We now examine the effectiveness of our MCMC sampling scheme, considering finite sample properties and consistency of the MCMC estimators in Sect. 3. We examine the double smooth transition GARCH model with p = r = g = k = 1. We simulate 100 replicated time series with sample size n = 2000 from the model:

y_t = −0.1 − 0.2 y_{t−1} + 0.4 z_{t−1} + F(z_{t−1})(0.1 + 0.3 y_{t−1} + 0.15 z_{t−1}) + a_t,
a_t = √h_t ε_t,   ε_t ∼ iid t(8),
F(z_{t−1}) = 1 / (1 + exp{−γ (z_{t−1} + 0.2)/s_z}),
h_t = 0.2 + 0.15 a_{t−1}² + 0.8 h_{t−1} + F(z_{t−1})(−0.1 − 0.1 a_{t−1}² − 0.1 h_{t−1}),

where, in turn, γ = 1, 2, 4, 5, 7 and 10. The exogenous threshold variable z_{t−1} is generated from the GARCH(1,1) process:

z_t = √h_t ε_t,   h_t = 0.3 + 0.1 z_{t−1}² + 0.8 h_{t−1},   ε_t ∼ iid N(0, 1).

The maximum lag for d was set to d_0 = 3; we set (ξ, k) = (0.7, 0.001) in the mixture specification (8). We chose α_0^{(1)} ∼ Unif(0, b_1 = s_y²), where s_y is the sample standard deviation of y, as in So et al. (2005); such a choice is sensible since in a GARCH model α_0 is by definition less than the unconditional variance. Results in So et al. (2005) and Chen et al. (2006) confirm that inference and model selection are insensitive to any choice of b_1 ∈ (0.5 s_y², 10 s_y²) in DT-GARCH and GARCH models. Also we set b_2 = 1, b_3 = 1.25, allowing possible explosiveness in the variance equation, and chose c ∼ Unif(Q_1, Q_3), where Q_i is the i-th quartile of y. We set all initial MCMC iterates of the mean equation parameters to 0, and all volatility equation parameters to 0.1 (these are not close to the true values). We also chose the hyper-parameters τ_j^{(i)} = 0.05 and, for the prior on γ, prior 1: (μ_γ = ln 5, σ_γ² = (ln 10 / 3)²) and prior 2: (μ_γ = ln 5, σ_γ² = 2 (ln 10 / 3)²), as in Fig. 2.
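A minimal Python sketch of this simulation design is given below; it is our own illustration (the pre-sample burn-in of 200 observations and the random seed are our own choices, and the Student-t(8) errors follow the specification in the text).

```python
import numpy as np

def simulate_dst_garch(n, gamma, seed=0):
    """Simulate one series from the double smooth transition GARCH design of the
    simulation study (threshold c = -0.2, orders p = r = g = k = 1)."""
    rng = np.random.default_rng(seed)
    burn = 200
    y = np.zeros(n + burn); a = np.zeros(n + burn)
    z = np.zeros(n + burn); hz = np.full(n + burn, 0.3 / (1 - 0.1 - 0.8))
    h = np.full(n + burn, 0.2)
    # pre-simulate the exogenous threshold variable from its GARCH(1,1) process
    for t in range(1, n + burn):
        hz[t] = 0.3 + 0.1 * z[t - 1] ** 2 + 0.8 * hz[t - 1]
        z[t] = np.sqrt(hz[t]) * rng.standard_normal()
    s_z = z[burn:].std()
    nu = 8.0
    for t in range(1, n + burn):
        F = 1.0 / (1.0 + np.exp(-gamma * (z[t - 1] + 0.2) / s_z))
        h[t] = (0.2 + 0.15 * a[t - 1] ** 2 + 0.8 * h[t - 1]
                + F * (-0.1 - 0.1 * a[t - 1] ** 2 - 0.1 * h[t - 1]))
        eps = rng.standard_t(nu) / np.sqrt(nu / (nu - 2.0))  # standardized t errors
        a[t] = np.sqrt(h[t]) * eps
        y[t] = (-0.1 - 0.2 * y[t - 1] + 0.4 * z[t - 1]
                + F * (0.1 + 0.3 * y[t - 1] + 0.15 * z[t - 1]) + a[t])
    return y[burn:], z[burn:]
```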
Table 1 Simulation results based on n = 2,000 and obtained from 100 replications. Each cell gives the average posterior median with the standard error in parentheses; "Real" is the true value. For γ the true value is the column's γ.

Panel A (γ = 1, 2, 4):
Param | Real | γ=1, prior 1 | γ=1, prior 2 | γ=2, prior 1 | γ=2, prior 2 | γ=4, prior 1 | γ=4, prior 2
φ0(1) | −0.10 | −0.075 (0.105) | −0.061 (0.104) | −0.083 (0.096) | −0.081 (0.106) | −0.097 (0.088) | −0.095 (0.085)
φ1(1) | −0.20 | −0.139 (0.051) | −0.143 (0.057) | −0.177 (0.053) | −0.197 (0.058) | −0.206 (0.048) | −0.198 (0.049)
ψ1(1) | 0.40 | 0.419 (0.066) | 0.427 (0.064) | 0.407 (0.064) | 0.410 (0.067) | 0.397 (0.063) | 0.399 (0.061)
φ0(2) | 0.10 | 0.046 (0.193) | 0.024 (0.194) | 0.076 (0.170) | 0.072 (0.185) | 0.100 (0.133) | 0.097 (0.134)
φ1(2) | 0.30 | 0.193 (0.084) | 0.207 (0.096) | 0.273 (0.081) | 0.298 (0.091) | 0.314 (0.065) | 0.305 (0.069)
ψ1(2) | 0.15 | 0.129 (0.074) | 0.129 (0.077) | 0.150 (0.068) | 0.147 (0.069) | 0.154 (0.064) | 0.153 (0.063)
α0(1) | 0.20 | 0.242 (0.095) | 0.249 (0.101) | 0.250 (0.090) | 0.257 (0.091) | 0.263 (0.085) | 0.269 (0.087)
α1(1) | 0.15 | 0.148 (0.051) | 0.158 (0.058) | 0.159 (0.054) | 0.166 (0.056) | 0.181 (0.055) | 0.173 (0.054)
β1(1) | 0.80 | 0.707 (0.116) | 0.712 (0.121) | 0.741 (0.109) | 0.739 (0.110) | 0.729 (0.107) | 0.725 (0.109)
α0(2) | −0.10 | −0.062 (0.150) | −0.068 (0.159) | −0.092 (0.134) | −0.089 (0.138) | −0.119 (0.121) | −0.130 (0.124)
α1(2) | −0.10 | −0.066 (0.074) | −0.078 (0.084) | −0.092 (0.072) | −0.101 (0.074) | −0.123 (0.066) | −0.112 (0.065)
β1(2) | −0.10 | −0.101 (0.188) | −0.113 (0.195) | −0.146 (0.167) | −0.154 (0.176) | −0.110 (0.157) | −0.105 (0.159)
ν | 7.00 | 7.239 (1.194) | 7.153 (1.179) | 7.173 (1.189) | 7.323 (1.268) | 7.222 (1.191) | 7.150 (1.188)
γ | 1 / 2 / 4 | 2.253 (3.021) | 1.666 (4.692) | 2.675 (2.497) | 2.232 (3.956) | 4.043 (2.797) | 4.123 (5.310)
c | −0.20 | −0.058 (0.286) | −0.067 (0.308) | −0.105 (0.224) | −0.167 (0.234) | −0.216 (0.153) | −0.186 (0.154)

Panel B (γ = 5, 7, 10):
Param | Real | γ=5, prior 1 | γ=5, prior 2 | γ=7, prior 1 | γ=7, prior 2 | γ=10, prior 1 | γ=10, prior 2
φ0(1) | −0.10 | −0.109 (0.079) | −0.098 (0.077) | −0.102 (0.075) | −0.098 (0.074) | −0.096 (0.071) | −0.106 (0.071)
φ1(1) | −0.20 | −0.197 (0.044) | −0.197 (0.046) | −0.207 (0.043) | −0.208 (0.044) | −0.207 (0.042) | −0.204 (0.041)
ψ1(1) | 0.40 | 0.392 (0.060) | 0.400 (0.059) | 0.395 (0.059) | 0.400 (0.057) | 0.400 (0.057) | 0.393 (0.057)
φ0(2) | 0.10 | 0.122 (0.116) | 0.100 (0.114) | 0.108 (0.105) | 0.101 (0.107) | 0.100 (0.097) | 0.118 (0.098)
φ1(2) | 0.30 | 0.300 (0.059) | 0.299 (0.062) | 0.312 (0.055) | 0.315 (0.059) | 0.313 (0.053) | 0.310 (0.053)
ψ1(2) | 0.15 | 0.148 (0.063) | 0.148 (0.062) | 0.149 (0.062) | 0.150 (0.061) | 0.144 (0.061) | 0.146 (0.062)
α0(1) | 0.20 | 0.256 (0.083) | 0.274 (0.086) | 0.261 (0.083) | 0.265 (0.079) | 0.258 (0.081) | 0.257 (0.078)
α1(1) | 0.15 | 0.174 (0.053) | 0.169 (0.052) | 0.175 (0.054) | 0.165 (0.051) | 0.172 (0.051) | 0.163 (0.049)
β1(1) | 0.80 | 0.739 (0.103) | 0.722 (0.108) | 0.739 (0.104) | 0.746 (0.102) | 0.734 (0.105) | 0.746 (0.099)
α0(2) | −0.10 | −0.129 (0.117) | −0.143 (0.120) | −0.114 (0.114) | −0.138 (0.110) | −0.126 (0.110) | −0.119 (0.108)
α1(2) | −0.10 | −0.117 (0.062) | −0.109 (0.062) | −0.119 (0.061) | −0.112 (0.059) | −0.114 (0.058) | −0.103 (0.056)
β1(2) | −0.10 | −0.093 (0.148) | −0.087 (0.156) | −0.112 (0.145) | −0.105 (0.143) | −0.093 (0.145) | −0.119 (0.141)
ν | 7.00 | 7.143 (1.222) | 7.381 (1.260) | 7.012 (1.125) | 7.029 (1.130) | 7.216 (1.205) | 7.120 (1.171)
γ | 5 / 7 / 10 | 5.157 (3.593) | 5.473 (7.091) | 6.137 (4.132) | 6.376 (7.637) | 7.133 (4.668) | 9.065 (10.953)
c | −0.20 | −0.210 (0.127) | −0.214 (0.126) | −0.232 (0.108) | −0.206 (0.113) | −0.228 (0.095) | −0.212 (0.090)

Prior 1: γ ∼ LN(1.61, 0.77²); Prior 2: γ ∼ LN(1.61, 1.09²)
We used a burn-in sample of M = 15,000 and a total sample of N = 40,000 iterations, but used only every 5th iterate in the sample period for inference. Table 1 displays the summary statistics from 100 replications for each value of γ with the two prior choices. Columns are the true or 'real' values, averages of the 100 posterior median estimates and standard errors. Although not shown, the average posterior probabilities that d = 1 are all very close to 1: the posterior mode of d very accurately estimates the delay parameter. Both panels illustrate favourable estimation performance, with posterior median estimates not significantly different from their true values. The choice of hyper-parameters for the prior on γ seems to have minimal effect, except that prior 1 allows smaller standard errors. Both panels show that this standard error is proportional to the true value of γ, as is standard, but that estimates of γ clearly increase appropriately with the true value, for both priors. Lubrano (2001) found that for the truncated Cauchy prior the estimate was always closer to the prior mean than to the maximum likelihood estimate (see Tables 2 and 4 therein). Our results thus reflect a slight improvement for estimation of γ. We also note that the parameters in the mean equation display larger standard errors for smaller values of γ, while still being roughly unbiased. This is expected as these parameters become close to unidentified for small γ; however, the mixture prior on these parameters has still led to sensible inference. We extensively examined trace and autocorrelation plots to check on MCMC convergence for these data sets.
6 Empirical study

We illustrate our methods using six daily stock market indices obtained from Datastream International: the MIBTel of Italy, the FTSE 100 of the U.K., the AORD of Australia, the TSE 300 of Canada, the Nikkei 225 of Japan, and the TWSI of Taiwan; from January 4, 1994, to March 24, 2005. Daily log returns are calculated as y_t = (log p_t − log p_{t−1}) × 100, where p_t is the price index at time t. The Nikkei 225 and TWSI are the only two markets showing negative mean return, possibly due to the Asian crisis in the late 1990s. The TWSI shows the largest standard deviation and range, illustrating that it was the highest risk market over this time period. All return series exhibit the standard property of asset return data: they have fat-tailed distributions as indicated by the excess kurtosis, with all return series failing the Jarque-Bera test for (the absence of) normality at the 1% level. For each market, the working model considered in this section is:

y_t = φ_0^{(1)} + φ_1^{(1)} y_{t−1} + ψ_1^{(1)} z_{t−1} + F(z_{t−d}) (φ_0^{(2)} + φ_1^{(2)} y_{t−1} + ψ_1^{(2)} z_{t−1}) + a_t,
a_t = √h_t ε_t,
F(z_{t−d}) = 1 / (1 + exp{−γ (z_{t−d} − c)/s_z}),
h_t = α_0^{(1)} + α_1^{(1)} a_{t−1}² + β_1^{(1)} h_{t−1} + F(z_{t−d}) (α_0^{(2)} + α_1^{(2)} a_{t−1}² + β_1^{(2)} h_{t−1}),

where ε_t has a Student-t distribution with ν degrees of freedom, standardized to have unit variance. We choose the prior for γ with (μ_γ, σ_γ) = (ln 5, √2 ln 10 / 3) and set the vector τ as the corresponding least squares standard errors from an ordinary ARX model fit. The exogenous threshold variable z is the daily return on the US S&P500 index.

6.1 Estimation results

The posterior median estimates and 95% credible intervals, for the working model applied to each market, are summarized in Table 2. The posterior mode for the delay d (not
shown) is 1 day for all markets, with posterior probabilities ≈ 1. Figure 3 shows some smoothed posterior densities (using the standard default density estimator in the software Matlab) of MCMC iterates for selected functions of parameters across the markets. Similar to before, the smoothness parameter γ had uncertainty (or standard error) that increased with the posterior estimate. Figure 3(a) shows the posterior for γ in each market. There are two groups: Italy, Australia and Canada show clear evidence of smooth and slow transitions, with almost all posterior weight below γ = 5 (see also interval estimates in Table 2); while Japan, Taiwan and the UK display somewhat sharper and faster transitions, but with some uncertainty concerning this result, still all modes below γ = 5 and almost all posterior weight below γ = 20. None of these posteriors suggest a sharp transition function. All estimates of ψ_1^{(1)} and ψ_1^{(1)} + ψ_1^{(2)} are positive and significant, excepting Canada, indicating the well-known result that the previous day's S&P500 return has a significant positive spillover effect on the mean of other markets. This effect also seems asymmetric: it is larger (excepting Canada and Taiwan) in the negative regime (z_{t−d} < c): significantly so in the UK, Italy and Australia (see Table 2); this is also shown in Figs. 3(c) and (e), respectively showing posterior densities of ψ_1^{(1)} and ψ_1^{(1)} + ψ_1^{(2)} in each market. Only Canada has no significant US spillover, possibly due to its market being in the same time zone as the US: the previous day's US return is 'older' news compared to other markets. Comparing the volatility estimates, the estimate of volatility persistence in the negative regime (z_{t−d} < c) is higher than that in the positive regime, i.e. α_1^{(2)} + β_1^{(2)} < 0, significantly and clearly in all markets; see Figs. 3(d) and (f), showing that the volatility becomes explosive (α_1^{(1)} + β_1^{(1)} > 1) after large negative shocks to the US market (in which case F(z_{t−d}; γ, c) ≈ 0) in at least 4 markets: Italy, UK, Canada and Australia, but remains stationary (α_1^{(1)} + β_1^{(1)} + α_1^{(2)} + β_1^{(2)} < 1) after large positive shocks in the US (where F(z_{t−d}; γ, c) ≈ 1). Based on these results, higher average volatility is evident when bad news (z_{t−1} < 0) arrives from the US market and this bad news has a stronger and more persistent, even explosive, effect on volatility in these markets. Regarding the distribution of ε_t, all degrees of freedom estimates are less than 10 (except the UK), suggesting the existence of conditional leptokurtosis and justifying the use of a fat-tailed error distribution. The estimates of the smoothing parameter γ are between 1.73 (Italy) and 7.75 (Japan); while posteriors for the threshold parameter c are shown in Fig. 3(b): all markets, barring Canada and the UK, favour a non-zero threshold cutoff. Japan and Taiwan favour a negative, while Italy and Australia favour a positive threshold. Figure 4 presents the
Table 2 Inference for six stock markets from the double ST-GARCH model. Each cell gives the posterior median (Med), the posterior standard deviation (std) in parentheses, and the 95% credible interval in brackets.

Param | Italy | UK | Canada
φ0(1) | 0.125 (0.080) [−0.012, 0.309] | 0.117 (0.067) [−0.076, 0.220] | −0.048 (0.064) [−0.185, 0.062]
φ1(1) | −0.043 (0.039) [−0.116, 0.037] | −0.169 (0.038) [−0.246, −0.098] | 0.061 (0.049) [−0.041, 0.156]
ψ1(1) | 0.339 (0.059) [0.238, 0.466] | 0.450 (0.051) [0.334, 0.536] | 0.025 (0.050) [−0.074, 0.123]
φ0(2) | −0.105 (0.196) [−0.541, 0.237] | −0.095 (0.108) [−0.279, 0.212] | 0.204 (0.126) [−0.017, 0.469]
φ1(2) | −0.030 (0.076) [−0.191, 0.112] | 0.065 (0.054) [−0.037, 0.174] | −0.035 (0.086) [−0.222, 0.126]
ψ1(2) | −0.188 (0.073) [−0.333, −0.048] | −0.184 (0.056) [−0.292, −0.075] | 0.021 (0.059) [−0.092, 0.141]
α0(1) | 0.013 (0.009) [0.001, 0.035] | 0.021 (0.012) [0.003, 0.048] | 0.016 (0.014) [0.001, 0.051]
α1(1) | 0.107 (0.019) [0.078, 0.156] | 0.085 (0.015) [0.061, 0.115] | 0.097 (0.014) [0.067, 0.122]
β1(1) | 0.977 (0.021) [0.925, 0.999] | 0.985 (0.014) [0.949, 1.000] | 0.986 (0.011) [0.957, 0.999]
α0(2) | 0.000 (0.017) [−0.030, 0.034] | −0.014 (0.015) [−0.043, 0.014] | −0.008 (0.018) [−0.050, 0.022]
α1(2) | −0.036 (0.031) [−0.099, 0.019] | −0.042 (0.022) [−0.101, −0.003] | −0.029 (0.025) [−0.092, 0.016]
β1(2) | −0.166 (0.039) [−0.235, −0.074] | −0.102 (0.026) [−0.145, −0.047] | −0.162 (0.030) [−0.230, −0.104]
γ | 1.726 (0.977) [0.920, 3.954] | 4.477 (4.562) [1.303, 17.613] | 2.406 (1.226) [1.240, 5.643]
c | 0.342 (0.171) [−0.032, 0.600] | −0.178 (0.135) [−0.462, 0.087] | 0.048 (0.237) [−0.388, 0.495]
ν | 9.120 (1.465) [6.980, 12.632] | 20.080 (20.163) [12.085, 48.025] | 7.716 (0.993) [6.174, 10.040]

Param | Australia | Japan | Taiwan
φ0(1) | 0.012 (0.050) [−0.101, 0.113] | −0.122 (0.100) [−0.339, 0.069] | −0.117 (0.090) [−0.262, 0.149]
φ1(1) | 0.008 (0.032) [−0.051, 0.076] | −0.073 (0.043) [−0.159, 0.009] | 0.026 (0.046) [−0.078, 0.108]
ψ1(1) | 0.333 (0.032) [0.273, 0.397] | 0.399 (0.067) [0.253, 0.518] | 0.311 (0.054) [0.213, 0.430]
φ0(2) | 0.094 (0.124) [−0.154, 0.360] | 0.175 (0.120) [−0.065, 0.428] | 0.214 (0.127) [−0.149, 0.444]
φ1(2) | −0.142 (0.062) [−0.281, −0.031] | 0.014 (0.054) [−0.091, 0.122] | −0.065 (0.064) [−0.172, 0.078]
ψ1(2) | −0.063 (0.035) [−0.133, 0.001] | −0.049 (0.072) [−0.182, 0.101] | 0.001 (0.075) [−0.155, 0.139]
α0(1) | 0.010 (0.007) [0.001, 0.027] | 0.082 (0.035) [0.015, 0.142] | 0.175 (0.054) [0.098, 0.304]
α1(1) | 0.082 (0.017) [0.055, 0.115] | 0.102 (0.023) [0.062, 0.153] | 0.077 (0.024) [0.034, 0.128]
β1(1) | 0.984 (0.012) [0.952, 1.000] | 0.911 (0.028) [0.854, 0.967] | 0.954 (0.030) [0.881, 0.993]
α0(2) | 0.006 (0.015) [−0.023, 0.033] | −0.066 (0.041) [−0.136, 0.018] | −0.144 (0.062) [−0.267, −0.029]
α1(2) | −0.016 (0.031) [−0.086, 0.033] | −0.054 (0.025) [−0.115, −0.012] | −0.019 (0.032) [−0.078, 0.046]
β1(2) | −0.189 (0.033) [−0.278, −0.144] | 0.008 (0.036) [−0.072, 0.076] | −0.064 (0.045) [−0.144, 0.017]
γ | 1.907 (0.942) [0.926, 4.462] | 7.748 (8.416) [1.763, 31.408] | 4.155 (3.741) [1.245, 14.985]
c | 0.352 (0.160) [0.001, 0.592] | −0.474 (0.102) [−0.541, −0.156] | −0.407 (0.113) [−0.543, −0.139]
ν | 7.380 (0.893) [5.911, 9.429] | 7.987 (1.108) [6.244, 10.680] | 5.256 (0.538) [4.393, 6.465]
(ξ, k) = (0.7,0.001). A total of 40,000 MCMC iterations are performed, with the first M = 15,000 warm-up iterates being discarded; only every 5th iterate is collected in the sampling period
estimated transition function F (zt−d ; γ , c) for each market (dark crosses), the steepness reflecting the magnitude of each estimate of γ . Also shown are transition functions based on 100 randomly selected MCMC iterates for (γ , c) (grey crosses). These plots roughly illustrate the posterior distribution for F (zt−d ; γ , c). They indicate that Taiwan, Japan and the UK are closer to having a sharp transition between regimes, but with still much evidence that γ is finite; whilst Italy, Australia and Canada appear more consistent
with a slower and smoother transition between regimes, in agreement with Fig. 3(a).

6.2 Empirical advantage of adaptive sampling

We now illustrate some of the efficiency gains of the adaptive sampling method in this paper, over a pure RW Metropolis sampler. We use the same parameter groups for each sampling scheme, given in Sect. 3.3. Since inter-
Fig. 3 Some posterior densities for parameters from the ST model, for each market. Plots are (a) γ ; (b) threshold parameter c; (c) exogenous mean effect ψ (1) ; (d) persistence in volatility α1(1) + β1(1) ; (e) ψ (1) + ψ (2) ; (f) α1(1) + β1(1) + α1(2) + β1(2)
iterate correlations can slow down mixing, we firstly illustrate via ACF plots that the adaptive method can indeed lead to reduced auto-correlations for MCMC iterates. We consider two of our chosen markets: Japan and the UK. Figure 5 shows the ACF plots for the parameters α_0^{(j)}, α_1^{(j)} and β_1^{(j)}, for j = 1, 2, from two MCMC runs: the first uses a pure RW Metropolis scheme (no proposal updating used); the second uses the adaptive method in this paper. Clear improvements, with the ACF dying down much faster,
are illustrated when using the adaptive MCMC sampling method, for both datasets. Further, we consider the potential scale reduction statistic ‘R’ from Gelman et al. (2005, pp. 296–297). R estimates the potential for reducing the standard error of a parameter estimate by running the MCMC chain for more than its current number of iterations. The measure requires multiple MCMC runs and employs the within and between sample estimates of variance (see p. 296, Gelman et al. 2005). This measure will decline to 1 as the MCMC sample size
Fig. 4 The estimated smoothing transition function F(z_{t−d}; γ, c) for each market (dark crosses); the transition functions based on 100 randomly selected joint MCMC iterates for γ and c (grey crosses) are also shown

Table 3 The statistic R measuring potential scale reduction for further MCMC sampling

Market | Method | α0(1) | α1(1) | β1(1) | α0(2) | α1(2) | β1(2)
Japan | Pure RW | 1.253 | 1.054 | 1.207 | 1.267 | 1.057 | 1.219
Japan | Adaptive | 1.243 | 1.005 | 1.065 | 1.227 | 1.005 | 1.058
UK | Pure RW | 1.029 | 1.024 | 1.022 | 1.028 | 1.018 | 1.013
UK | Adaptive | 1.013 | 1.003 | 1.007 | 1.011 | 1.004 | 1.014
increases, indicating that the standard error cannot be reduced by further MCMC iterations. We report this statistic from five separate MCMC runs for the UK and Japanese markets, using each of the two MCMC methods mentioned above. The two methods used the same five sets of (different) starting values. R is reported for α_0^{(j)}, α_1^{(j)} and β_1^{(j)}, j = 1, 2, in Table 3. The reduction in the statistic R when using the adaptive method, compared to the RW method, is clear, especially for the parameters α_1^{(j)}, β_1^{(j)}.
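A compact Python sketch of the potential scale reduction computation, in the spirit of the formula on pp. 296–297 of Gelman et al. (2005), is given below; it is our own illustration for a single scalar parameter.

```python
import numpy as np

def potential_scale_reduction(chains):
    """Potential scale reduction factor R for one scalar parameter, computed
    from an array of shape (n_chains, n_iter) of post burn-in MCMC draws,
    using within- and between-chain variance estimates."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)           # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # within-chain variance
    var_plus = (n - 1) / n * W + B / n        # pooled posterior variance estimate
    return np.sqrt(var_plus / W)              # declines to 1 as the sample grows
```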
6.3 Model selection results

Fig. 5 ACF plots for parameters α. (a) The pure RW method: Japan. (b) The adaptive method: Japan. (c) The pure RW method: UK. (d) The adaptive method: UK

We have found significant threshold nonlinearity in these six markets, in response to US market news. A model comparison analysis can now help determine the most appropriate type of nonlinear model for each market in sample. Many of the estimates of γ across markets have wide intervals, also illustrated in Fig. 4: it may be that a sharp threshold, or even a simple symmetric model, is statistically just as appropriate. Thus, we would like to compare the double ST model with the symmetric GARCH model (γ = 0) and the double sharp threshold model (γ → ∞). We consider a DTX-GARCH model given as follows:

y_t = φ_0^{(1)} + φ_1^{(1)} y_{t−1} + ψ_1^{(1)} z_{t−1} + a_t, if z_{t−d} ≤ r;
y_t = φ_0^{(2)} + φ_1^{(2)} y_{t−1} + ψ_1^{(2)} z_{t−1} + a_t, if z_{t−d} > r,

where a_t = √h_t ε_t, ε_t ∼ iid t(ν), and

h_t = α_0^{(1)} + α_1^{(1)} a_{t−1}² + β_1^{(1)} h_{t−1}, if z_{t−d} ≤ r;
h_t = α_0^{(2)} + α_1^{(2)} a_{t−1}² + β_1^{(2)} h_{t−1}, if z_{t−d} > r.

Table 4 contains the estimated posterior probabilities for the three models under consideration. We varied the hyper-parameter ξ across three choices for a sensitivity analysis; the model selected was quite insensitive to this choice, although the posterior probabilities did vary marginally. The numbers in parentheses compare only the DT-GARCH (DT) and proposed smooth transition (ST) model. The ST model is very strongly preferred in all markets. The posterior probabilities are slightly higher in Italy, Canada, UK and Aus-
tralia; this makes sense because the posterior densities in Fig. 3(a) have more weight on lower values of γ in these markets (except the UK). However the probabilities in favour of the ST model are still very convincingly close to 1 in Taiwan and Japan: these countries display the most evidence for higher values of γ, but not nearly enough to support a sharp threshold. Clearly, these markets all display a significant degree of asymmetry in response to US market returns, and this asymmetry seems to be best captured by a smooth, as opposed to sharp, transition function.

6.4 Empirical VaR forecasting results

Results from the VaR forecasting exercise are given in Tables 5 and 6. Five models are considered: the ARX-GARCH with normal errors and separately with t-errors; the proposed ST model; the DTX-GARCH model and the standard RiskMetrics model (with parameter set to 0.94 as standard). Table 5 contains violation rates for each model, for a 500 day period starting from March 25, 2005, the day after the end of the sample data period, for α = 0.01, 0.05. By themselves
Table 4 Posterior probability for each model for three choices of ξ

Model | Italy | UK | Canada | Australia | Japan | Taiwan
ξ = 0.5:
ST | 0.999 (0.996) | 1.000 (1.000) | 1.000 (0.999) | 0.997 (0.999) | 0.942 (0.999) | 0.967 (1.000)
DT | 0.000 (0.004) | 0.000 (0.000) | 0.000 (0.001) | 0.000 (0.001) | 0.000 (0.001) | 0.000 (0.000)
GARCH-t | 0.001 | 0.000 | 0.000 | 0.003 | 0.058 | 0.033
ξ = 0.7:
ST | 0.997 (0.997) | 1.000 (1.000) | 0.999 (1.000) | 0.986 (0.997) | 0.941 (0.995) | 0.980 (1.000)
DT | 0.000 (0.003) | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.003) | 0.000 (0.005) | 0.000 (0.000)
GARCH-t | 0.003 | 0.000 | 0.001 | 0.014 | 0.059 | 0.020
ξ = 1.0:
ST | 0.964 (0.966) | 1.000 (1.000) | 0.995 (0.994) | 0.951 (0.960) | 0.901 (0.968) | 0.962 (0.992)
DT | 0.000 (0.034) | 0.000 (0.000) | 0.000 (0.006) | 0.000 (0.040) | 0.000 (0.032) | 0.000 (0.008)
GARCH-t | 0.036 | 0.000 | 0.005 | 0.049 | 0.099 | 0.038

ST: double smoothing transition model; DT: double threshold GARCH model; GARCH-t: ARX-GARCH model. The parentheses give probabilities from comparing the ST and DT models only.

Table 5 Risk management evaluation—1% and 5% VaR violation rates (and numbers) are given
Market | GARCH-n | GARCH-t | ST | DT | RiskMetrics
α = 0.01:
Italy | 1.8% (9) | 1.8% (9) | 0.6% (3) | 0.6% (3) | 2.8% (14)
UK | 1.0% (5) | 0.8% (4) | 0.6% (3) | 0.8% (4) | 2.4% (12)
Canada | 2.0% (10) | 1.4% (7) | 0.8% (4) | 0.4% (2) | 3.0% (15)
Australia | 1.8% (9) | 1.8% (9) | 1.0% (5) | 0.8% (4) | 3.0% (15)
Japan | 1.2% (6) | 1.2% (6) | 0.6% (3) | 0.6% (3) | 2.0% (10)
Taiwan | 1.8% (9) | 1.4% (7) | 0.8% (4) | 0.8% (4) | 2.4% (12)
α = 0.05:
Italy | 3.8% (19) | 4.0% (20) | 4.6% (23) | 4.4% (22) | 5.8% (29)
UK | 5.0% (25) | 5.2% (26) | 4.0% (20) | 4.0% (20) | 5.4% (27)
Canada | 5.8% (29) | 6.2% (31) | 4.0% (20) | 3.8% (19) | 6.6% (33)
Australia | 5.2% (26) | 5.4% (27) | 5.0% (25) | 4.4% (22) | 6.4% (32)
Japan | 4.2% (21) | 4.2% (21) | 5.2% (26) | 4.8% (24) | 5.0% (25)
Taiwan | 3.0% (15) | 3.4% (17) | 4.4% (22) | 4.6% (23) | 5.2% (26)

The parentheses give the number of days, in 500 trading days, on which the loss of actual returns exceeded the forecasted VaR over a one-day time horizon.
these numbers can only be suggestive, but under the null hypothesis we would expect 5 violations for α = 0.01 and 25 violations for α = 0.05. The ST model clearly has violation rates closest to expected among the competing models, with 8 out of 12 instances (4 each at α = 0.01, 0.05) of being the closest or equal closest (+ or −1 violation is considered equal) to nominal rate out of 12 cases: these being Italy, Canada, Australia and Taiwan at α = 0.05 and Italy, Canada, Australia and Japan at α = 0.01. The next best
models are the DT and ARX-GARCH-n with 5 instances of (equal) closest to nominal rate. We employ two back-testing criteria (unconditional and conditional coverage) for examining the accuracy of the models for VaR. The simplest method tests the hypothesis that the violation rate is equal to α. Kupiec (1995) examines whether VaR estimates, on average, provide correct coverage of the lower α percent tails of the forecasted distributions. Christoffersen (1998) developed a conditional coverage test that examines whether VaR estimates exhibit cor-
Table 6 Evaluation of VaR interval forecasts. Each cell gives the p-values (LRuc, LRcc).

Market | GARCH-n | GARCH-t | ST | DT | RiskMetrics
α = 0.01:
Italy | 0.1060, 0.0935 | 0.1060, 0.0935 | 0.3315, 0.6128 | 0.3315, 0.6128 | 0.0009, 0.0029
UK | 1.000, 0.9506 | 0.6414, 0.8687 | 0.3315, 0.6128 | 0.6414, 0.8687 | 0.0077, 0.0025
Canada | 0.0479, 0.1152 | 0.3966, 0.6319 | 0.6414, 0.8687 | 0.1250, 0.3059 | 0.0003, 0.0009
Australia | 0.1060, 0.0935 | 0.1060, 0.0935 | 1.000, 0.9506 | 0.6414, 0.8687 | 0.0003, 0.0003
Japan | 0.6630, 0.8454 | 0.6630, 0.8454 | 0.3315, 0.6128 | 0.3315, 0.6128 | 0.0479, 0.0589
Taiwan | 0.1060, 0.2296 | 0.3966, 0.6319 | 0.6414, 0.8687 | 0.6414, 0.8687 | 0.0077, 0.0161
α = 0.05:
Italy | 0.1994, 0.4169 | 0.2885, 0.5553 | 0.6776, 0.6029 | 0.5301, 0.4859 | 0.4229, 0.1879
UK | 1.000, 0.8082 | 0.8384, 0.8418 | 0.2885, 0.5486 | 0.2885, 0.5486 | 0.6852, 0.4497
Canada | 0.4229, 0.1706 | 0.2346, 0.1663 | 0.2885, 0.5486 | 0.1994, 0.4097 | 0.1168, 0.1332
Australia | 0.8384, 0.4164 | 0.6852, 0.4497 | 1.0000, 0.7800 | 0.5301, 0.4859 | 0.1678, 0.1641
Japan | 0.3992, 0.3918 | 0.3992, 0.3918 | 0.8384, 0.8172 | 0.8364, 0.9732 | 1.000, 0.8082
Taiwan | 0.0271, 0.0665 | 0.0822, 0.0674 | 0.5301, 0.4859 | 0.6776, 0.6029 | 0.8384, 0.1341

Cells in bold font indicate rejection of the null hypothesis of correct VaR estimates at the 5% significance level.
Table 7 The ratio α̂/α of the violation rate to nominal at 5% and 1% VaR

Market | GARCH-n | GARCH-t | ST | DT | RiskMetrics
1% VaR:
Italy | 1.80 | 1.80 | 0.60 | 0.60 | 2.80
UK | 1.00 | 0.80 | 0.60 | 0.80 | 2.40
Canada | 2.00 | 1.40 | 0.80 | 0.40 | 3.00
Australia | 1.80 | 1.80 | 1.00 | 0.80 | 3.00
Japan | 1.20 | 1.20 | 0.60 | 0.60 | 2.00
Taiwan | 1.80 | 1.40 | 0.80 | 0.80 | 2.40
5% VaR:
Italy | 0.76 | 0.80 | 0.92 | 0.88 | 1.16
UK | 1.00 | 1.04 | 0.80 | 0.80 | 1.08
Canada | 1.16 | 1.24 | 0.80 | 0.76 | 1.32
Australia | 1.04 | 1.08 | 1.00 | 0.88 | 1.28
Japan | 0.84 | 0.84 | 1.04 | 0.96 | 1.00
Taiwan | 0.60 | 0.68 | 0.88 | 0.92 | 1.04
rect coverage at each point in time, under the null hypothesis that the failure process is independent and the expected violation rate is equal to α. The p-values from the two LR (likelihood ratio) tests are presented in Table 6. For α = 0.01 the RiskMetrics model is clearly rejected in all markets. For a 5% significance level only the ARX-GARCH-n model in Canada can be rejected; while at a 10% level only the ST and DT models cannot be rejected in any market. For α = 0.05 the ST, DT and RiskMetrics models all cannot be rejected in any market. The GARCH-n at the 5% level and GARCH-t at the 10% level can be rejected in Taiwan only when α = 0.05.
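For reference, the Python sketch below (our own illustration) computes the Kupiec unconditional coverage LR statistic and the independence component of Christoffersen's test from a 0/1 violation series; the conditional coverage test used in the paper combines the two components into a χ² statistic with 2 degrees of freedom.

```python
import numpy as np
from scipy import stats

def kupiec_lr(violations, alpha):
    """Kupiec (1995) unconditional coverage LR test from a 0/1 violation series."""
    v = np.asarray(violations, dtype=int)
    n, x = len(v), int(v.sum())
    pi_hat = x / n
    ll0 = x * np.log(alpha) + (n - x) * np.log(1 - alpha)
    ll1 = x * np.log(pi_hat) + (n - x) * np.log(1 - pi_hat) if 0 < x < n else ll0
    lr = -2.0 * (ll0 - ll1)
    return lr, 1.0 - stats.chi2.cdf(lr, df=1)     # statistic and p-value

def christoffersen_independence_lr(violations):
    """Independence component of Christoffersen's (1998) conditional coverage test."""
    v = np.asarray(violations, dtype=int)
    n00 = np.sum((v[:-1] == 0) & (v[1:] == 0)); n01 = np.sum((v[:-1] == 0) & (v[1:] == 1))
    n10 = np.sum((v[:-1] == 1) & (v[1:] == 0)); n11 = np.sum((v[:-1] == 1) & (v[1:] == 1))
    p01 = n01 / max(n00 + n01, 1); p11 = n11 / max(n10 + n11, 1)
    p = (n01 + n11) / (n00 + n01 + n10 + n11)

    def ll(pr, a, b):  # a: transitions to no violation, b: transitions to violation
        return (a * np.log(1 - pr) if a else 0.0) + (b * np.log(pr) if b else 0.0)

    lr = -2.0 * (ll(p, n00 + n10, n01 + n11) - (ll(p01, n00, n01) + ll(p11, n10, n11)))
    return lr, 1.0 - stats.chi2.cdf(lr, df=1)
```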
A generic but useful criterion for an optimal choice of model is that the ratio α̂/α be close to 1. Moreover, α̂/α < 1 is more desirable than α̂/α > 1 since the risk management loss function tends to be asymmetric. Table 7 shows the ratios of observed to expected violation rates in each market. Recall that VaR assessment requires an asymmetric loss function so that a violation rate below nominal is preferable to a rate above nominal, by the same or even slightly smaller distance. Under these criteria we have boxed the arguably 'best' models in each market. Again the ST model is optimal in 8 out of 12 instances: 5 at α = 0.01 (all but the UK) and 3 (Italy, Canada and Australia) at α = 0.05. The DT model
is next best with 5 instances of being the best or equal best model. In summary, the ST model seems to have performed optimally at 1 day ahead percentile forecasting overall among the models considered, followed closely by the DTX-GARCH model, except for the UK market where the ARX-GARCH-n was the best model. The forecast comparison tests proved a somewhat blunt instrument, although again only the ST and DT models could not be rejected at a 10% level across markets.
7 Conclusion

This paper presented a nonlinear double smooth transition heteroskedastic model to capture smooth mean and variance asymmetries in financial markets. We discussed adaptive MCMC methods for estimation, inference and percentile forecasting for this model, including the development of a prior distribution allowing reliable inference for the smoothing transition parameter and solving the identifiability problem as this parameter tends to zero. Further, we implemented a formal Bayes factor model comparison technique within the class of double smooth transition ARX-GARCH models. Simulations showed that the Bayesian approach can provide accurate estimates for all unknown parameters. The model selection method showed that the proposed double smooth transition model was strongly preferred over a sharp threshold nonlinear and a symmetric GARCH model, in all six markets. A VaR forecasting study confirmed the proposed model was clearly the optimal performing candidate across markets for 1 day ahead 1st and 5th percentile Value-at-Risk forecasting, compared to a range of competing GARCH models and the standard RiskMetrics method. Finally, the adaptive method was shown to provide faster mixing and greater efficiency than a pure random walk Metropolis MCMC algorithm for two real data sets.

Acknowledgements We thank the Co-ordinating Editor and two referees for their insightful comments that helped improve the paper. Cathy Chen is supported by National Science Council (NSC) of Taiwan grant NSC95-2118-M-035-001. Part of the work of Richard Gerlach, undertaken during a research visit to Feng Chia University (FCU), was supported by the grant 06G27022 of FCU. Support for Gerlach also came from the University of Newcastle via a networking grant and the University of Sydney via a R & D grant and various Faculty of Economics and Business internal research grants. The authors thank Mr. Eden Kao for an initial simulation study of a special case, and thank Mr. Tim Tien and Ms. Ann Lin for carrying out several computational tasks.
Appendix

The prior of γ follows a lognormal distribution; we show that the resulting posterior density of γ is integrable. The lognormal density is
$$f(\gamma) = \frac{1}{\sqrt{2\pi}\,\sigma\,\gamma}\exp\left\{-\frac{(\ln\gamma-\mu)^2}{2\sigma^2}\right\}, \qquad \gamma > 0.$$
We set μ = 0 and σ = 1/√2 w.l.o.g., hence
$$f(\gamma) \propto \frac{1}{\gamma}\,e^{-(\ln\gamma)^2}, \qquad \gamma > 0.$$
Lubrano (2001) shows that prior information is needed to force the posterior density to tend to zero quickly enough in its right tail in order to be integrable: the prior should be at least O(γ^{-(1+ν)}) as γ → ∞, with ν > 0. The proof therefore compares γ^{-1} e^{-(ln γ)^2} with γ^{-(1+ν)} and proceeds as follows:
$$\frac{\gamma^{-1}\,e^{-(\ln\gamma)^2}}{\gamma^{-(1+\nu)}} = \frac{e^{-(\ln\gamma)^2}}{\gamma^{-\nu}} = \frac{e^{-(\ln\gamma)^2}}{e^{-\nu\ln\gamma}} = e^{\nu\ln\gamma-(\ln\gamma)^2} \to 0$$
as γ → ∞, since ν < ln γ holds for any fixed ν > 0 once γ is sufficiently large. Hence the lognormal prior of γ is O(γ^{-(1+ν)}) as γ → ∞ and, by Theorem 1 of Lubrano (2001), the posterior density of γ is of the same order of integrability as the prior, i.e. it is integrable.
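As a quick numerical sanity check (not part of the original appendix), the tail ratio e^{ν ln γ − (ln γ)^2} can be evaluated directly for a fixed ν and increasing γ:

```python
# Numerical check of the tail comparison above: the ratio of the standardised
# lognormal prior (1/gamma) * exp(-(ln gamma)^2) to gamma^{-(1+nu)} should
# vanish as gamma -> infinity for any fixed nu > 0.
import numpy as np

nu = 0.5  # any fixed nu > 0 will do
for gamma in [10.0, 1e2, 1e4, 1e8]:
    log_ratio = nu * np.log(gamma) - np.log(gamma) ** 2  # log of e^{nu ln g - (ln g)^2}
    print(f"gamma = {gamma:>12.0f}   ratio = {np.exp(log_ratio):.3e}")
# The printed ratios decrease rapidly towards zero, consistent with the prior
# being O(gamma^{-(1+nu)}) in its right tail.
```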
References

Akram, Q.F., Eitrheim, O., Sarno, L.: Nonlinear dynamics in output, real exchange rates and real money balances: Norway, 1830–2003. Norges Bank Working Paper, 1502-8143 (2005)
Anderson, H.M., Nam, K., Vahid, F.: Asymmetric nonlinear smooth transition GARCH models. In: Rothman, P. (ed.) Nonlinear Time Series Analysis of Economic and Financial Data, pp. 191–207. Kluwer, Boston (1999)
Bacon, D.W., Watts, D.G.: Estimating the transition between two intersecting straight lines. Biometrika 58, 525–534 (1971)
Berg, A., Meyer, R., Yu, J.: Deviance information criterion for comparing stochastic volatility models. J. Bus. Econ. Stat. 22, 107–120 (2004)
Black, F.: Studies of stock market volatility changes. In: Proceedings of the American Statistical Association, Business and Economic Statistics Section, pp. 177–181 (1976)
Bollerslev, T.: Generalized autoregressive conditional heteroscedasticity. J. Econom. 31, 307–327 (1986)
Brooks, C.: A double-threshold GARCH model for the French Franc/Deutschmark exchange rate. J. Forecast. 20, 135–143 (2001)
Carlin, B.P., Chib, S.: Bayesian model choice via Markov chain Monte Carlo. J. R. Stat. Soc. Ser. B 57, 473–484 (1995)
Carter, C., Kohn, R.: On Gibbs sampling for state space models. Biometrika 81, 541–553 (1994)
Chan, K.S., Tong, H.: On estimating thresholds in autoregressive models. J. Time Ser. Anal. 7, 179–190 (1986)
Chelley-Steeley, P.: Modelling equity market integration using smooth transition analysis: a study of Eastern European stock markets. J. Int. Money Finance 24, 818–831 (2005)
Chen, C.W.S., Chiang, T.C., So, M.K.P.: Asymmetrical reaction to US stock-return news: evidence from major stock markets based on a double-threshold model. J. Econ. Bus. 55, 487–502 (2003)
Chen, C.W.S., Gerlach, R., So, M.K.P.: Comparison of non-nested asymmetric heteroskedastic models. Comput. Stat. Data Anal. 51, 2164–2178 (2006)
Chen, C.W.S., Gerlach, R., So, M.K.P.: Bayesian model selection for heteroskedastic models. In: Chib, S., Griffiths, B., Koop, G., Terrell, D. (eds.) Bayesian Econometric Methods. Advances in Econometrics. Elsevier Science (2008, to appear)
Chen, C.W.S., So, M.K.P.: On a threshold heteroskedastic model. Int. J. Forecast. 22, 73–89 (2006)
Chen, C.W.S., So, M.K.P., Gerlach, R.: Assessing and testing for threshold nonlinearity in stock returns. Aust. N.Z. J. Stat. 47, 473–488 (2005)
Chib, S.: Marginal likelihood from the Gibbs output. J. Am. Stat. Assoc. 90, 1313–1321 (1995)
Christoffersen, P.F.: Evaluating interval forecasts. Int. Econ. Rev. 39, 841–862 (1998)
Congdon, P.: Bayesian model choice based on Monte Carlo estimates of posterior model probabilities. Comput. Stat. Data Anal. 50, 346–357 (2006)
Engle, R.F.: Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987–1008 (1982)
Franses, P.H., van Dijk, D.: Non-Linear Time Series Models in Empirical Finance. Cambridge University Press, Cambridge (2000)
Gelman, A.: Prior distributions for variance parameters in hierarchical models. Bayesian Anal. 1, 515–533 (2006)
Gelman, A., Roberts, G.O., Gilks, W.R.: Efficient Metropolis jumping rules. In: Bernardo, J.M., Berger, J.O., Dawid, A.P., Smith, A.F.M. (eds.) Bayesian Statistics, vol. 5, pp. 599–607. Oxford University Press, Oxford (1996)
Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B.: Bayesian Data Analysis, 2nd edn. Chapman & Hall, Boca Raton (2005)
George, E.I., McCulloch, R.E.: Variable selection via Gibbs sampling. J. Am. Stat. Assoc. 88, 881–889 (1993)
Gerlach, R., Tuyl, F.: MCMC methods for comparing stochastic volatility and GARCH models. Int. J. Forecast. 22, 91–107 (2006)
Gerlach, R., Carter, C.K., Kohn, R.: Diagnostics for time series analysis. J. Time Ser. Anal. 20, 309–330 (1999)
Geweke, J.: Bayesian treatment of the independent Student-t linear model. J. Appl. Econom. 8(Suppl.), 19–40 (1993)
Geweke, J.: Bayesian comparison of econometric models. Working Paper 532, Research Department, Federal Reserve Bank of Minneapolis (1995)
Godsill, S.J.: On the relationship between Markov chain Monte Carlo methods for model uncertainty. J. Comput. Graph. Stat. 10, 1–19 (2001)
González-Rivera, G.: Smooth-transition GARCH models. Stud. Nonlinear Dyn. Econom. 3, 61–78 (1998)
Granger, C.W.J., Teräsvirta, T.: Modelling Nonlinear Economic Relationships. Oxford University Press, Oxford (1993)
Green, P.J.: Reversible jump MCMC computation and Bayesian model determination. Biometrika 82, 711–732 (1995)
Jorion, P.: Value at Risk: The New Benchmark for Controlling Market Risk. Irwin Professional (1997)
Kass, R., Raftery, A.: Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995)
Kupiec, P.: Techniques for verifying the accuracy of risk measurement models. J. Deriv. 2, 173–84 (1995)
Li, C.W., Li, W.K.: On a double-threshold autoregressive heteroscedastic time series model. J. Appl. Econom. 11, 253–274 (1996)
Lopes, H.F., Salazar, E.: Bayesian model uncertainty in smooth transition autoregressions. J. Time Ser. Anal. 27, 99–117 (2006)
Lubrano, M.: Smooth transition GARCH models: a Bayesian perspective. Rech. Econ. Louvain 67, 257–287 (2001)
Lundbergh, S., Teräsvirta, T.: Modelling economic high-frequency time series with STAR-STGARCH models. Working Paper Series in Economics and Finance No. 291, Stockholm School of Economics (1999)
Nam, K., Pyun, C.S., Avard, S.L.: Asymmetric reverting behavior of short-horizon stock returns: an evidence of stock market overreaction. J. Bank. Finance 25, 807–824 (2001)
Priestley, M.B.: State-dependent models: a general approach to nonlinear time series analysis. J. Time Ser. Anal. 1, 57–71 (1980)
Scott, S.: Bayesian methods for hidden Markov models: recursive computing in the 21st century. J. Am. Stat. Assoc. 97, 337–351 (2002)
Silvapulle, M.J., Sen, P.K.: Constrained Statistical Inference: Inequality, Order, and Shape Restrictions. Wiley-Interscience, Portland (2004)
So, M.K.P., Chen, C.W.S., Chen, M.T.: A Bayesian threshold nonlinearity test in financial time series. J. Forecast. 24, 61–75 (2005)
Teräsvirta, T.: Specification, estimation, and evaluation of smooth transition autoregressive models. J. Am. Stat. Assoc. 89, 208–218 (1994)
Teräsvirta, T.: Modeling economic relationships with smooth transition regression. In: Ullah, A., Giles, D.E. (eds.) Handbook of Applied Economic Statistics, pp. 507–552. Dekker, New York (1998)
Tong, H.: On a threshold model. In: Chen, C.H. (ed.) Pattern Recognition and Signal Processing. Sijhoff and Noordhoff, Amsterdam (1978)
Tong, H., Lim, K.S.: Threshold autoregression, limit cycles and cyclical data (with discussion). J. R. Stat. Soc. Ser. B 42, 245–292 (1980)
van Dijk, D., Franses, P.H.: Modelling multiple regimes in the business cycle. Macroecon. Dyn. 3, 311–340 (1999)
van Dijk, D., Teräsvirta, T., Franses, P.H.: Smooth transition autoregressive models: a survey of recent developments. Econom. Rev. 21, 1–47 (2002)
Vrontos, D., Dellaportas, P., Politis, D.N.: Full Bayesian inference for GARCH and EGARCH models. J. Bus. Econ. Stat. 18, 187–198 (2000)