Simulation Based Predictive Density Estimation and Testing for Diffusion Processes∗

Valentina Corradi¹ and Norman R. Swanson²
¹Queen Mary, University of London and ²Rutgers University

December 2005
Very Preliminary and Incomplete
Abstract

This paper makes two contributions. First, we develop a simple simulation based framework for constructing predictive densities for one-factor and stochastic volatility diffusion processes (at arbitrary prediction horizons) that is suitable for the case in which the functional form of the conditional density is not known. Second, we outline a simulation and bootstrap based methodology that yields tests for pairwise comparison as well as tests for multiple comparison of possibly misspecified diffusion processes, in terms of their out-of-sample predictive ability.
JEL classification: C22, C51.
1 Introduction
Financial assets are typically modelled as diffusion processes. Correct specification of the model describing the dynamics of the underlying asset is crucial for designing appropriate hedging strategies and for accurate pricing of derivative assets. Hence the focus on testing for the correct specification of diffusion models. A first generation of tests, initiated by Aït-Sahalia (1996), addresses the issue by comparing the marginal density implied by the null model with a nonparametric density estimator, for the case of one-factor models (see also Pritsker (1998), Jiang (1998), and Durham (2003)). While one-factor models may be a reasonable representation of the short-term rate, there is a somewhat widespread consensus that stock returns, or term structure models, require additional latent factors. To take this into account, Corradi and Swanson (2005a) provide a test for comparing the conditional distribution (marginal or joint) implied by the null model with the corresponding empirical distribution, where the former can be replaced by the empirical distribution of the simulated data when it is not known in closed form. Their test can be used in the context of multidimensional and/or multifactor models. Needless to say, tests based on the comparison of marginal distributions have no power against iid alternatives with the same marginal. Tests based on the comparison of joint distributions do not suffer from this problem, but may still have no power against alternatives with the same joint but a different conditional distribution. Hence the need to directly compare the conditional distribution (or density) with its model free, nonparametric counterpart. The main difficulty that arises in this context stems from the fact that knowledge of the drift and variance terms of a diffusion process does not, in general, imply knowledge of the transition density. Indeed, if the functional form of the transition density were known, we could test the hypothesis of correct specification of a diffusion via the probability integral transform approach of Diebold, Gunther and Tay (1998), the cross spectrum approach of Hong (2001), Hong, Li and Zhao (2004), and Hong and Li (2005), the test of Bai (2003) based on the joint use of a Kolmogorov test and a martingalization method, or via the normality transformation approach of Bontemps and Meddahi (2005) and Duan (2003). For the case in which the transition density is unknown, a test can be constructed by comparing kernel (conditional) density estimators of the actual and simulated data, as in Altissimo and Mele (2002, 2005) and Thompson (2004), or by comparing the conditional distributions of the simulated and of the historical data, as in Bhardwaj, Corradi and Swanson (2005). Recently, Aït-Sahalia,
Fan and Peng (2005) have extended the closed form approximation approach of Aït-Sahalia (2002) to the case of conditional densities. Basically, they compare a closed form approximation of the conditional density under the null with a kernel conditional density estimator. Their test has power against multifactor alternatives, but it requires a one-factor null model.

All the papers cited above deal with testing for the correct specification of a given diffusion model. Nevertheless, models are just approximations to reality, and so they are likely to be misspecified. Therefore, one often has to choose among misspecified models. Here, the focus is on predictive densities and conditional confidence intervals for diffusion processes. In fact, in financial risk management one is mainly concerned with having a good approximation of the entire conditional distribution, or of some specific conditional confidence interval, as in the case of Value at Risk evaluation. It is well known that the joint specification of drift, variance and initial values fully determines the specification of the transition density, and so of the conditional distribution. We want to choose the model which provides the more accurate approximation of the conditional distribution. Accuracy is measured in terms of a distributional generalization of mean square error, as defined in Corradi and Swanson (2005b). Let F_i^τ(u|X_t, θ_i^†) and F_0^τ(u|X_t, θ_0) be the distributions of X_{t+τ}, given X_t, evaluated at u, implied by diffusion model i and by the "true" model, respectively. We choose model i over model j if

E( ( F_i^τ(u|X_t, θ_i^†) − F_0^τ(u|X_t, θ_0) )² ) < E( ( F_j^τ(u|X_t, θ_j^†) − F_0^τ(u|X_t, θ_0) )² ).

We can then take a weighted average over u, or select particular confidence intervals. If we knew F_i^τ(u|X_t, θ_i^†) in closed form, then we could proceed as in Corradi and Swanson (2005b), if interested in in-sample predictive evaluation, or as in Corradi and Swanson (2006), if interested in out-of-sample evaluation. However, in general the functional form of the model implied conditional distribution is not known in closed form. Thus, we rely on a simulation-based approach. We simulate the process, say under model i, τ-steps ahead, using as initial value the observed value of the series, say X_t, and using previously estimated parameters; we then construct the empirical distribution of the simulated series. For the case of stochastic volatility models, we do not observe the initial value of the volatility process. To overcome this problem, we simulate the process (asset price and volatility) using different random initial values for the volatility process, then we construct the empirical distribution of the asset price process for any given initial value of the volatility process, and we take an average over the latter. This integrates out the effect of the volatility initial value.

As is customary in the out-of-sample evaluation literature, we split the sample of T observations as R + P, where only the last P observations are used for predictive evaluation. Parameters are estimated recursively using simulated generalized method of moments (SGMM). We then simulate P − τ τ-step ahead processes, using as starting values X_R, ..., X_{R+P−τ}. We then construct the empirical distribution of the simulated data and that of the historical data. The scaled difference between the two forms the basis of the suggested statistics. Both the case of pairwise comparison of two models and the case in which a benchmark model is compared against multiple competing models are considered. The limiting distribution of the suggested statistics is a (functional of a) Gaussian process, with a covariance reflecting the contribution of recursive SGMM estimators. Ready to use critical values are not available. We provide asymptotically (first order) valid critical values via the block bootstrap. To accomplish this, we introduce a new bootstrap procedure able to mimic the contribution of parameter estimation error for SGMM estimators in a recursive estimation setting.¹

¹ This complements the block bootstrap procedure suggested in Corradi and Swanson (2007) for recursively estimated m-estimators.

The rest of the paper is organized as follows. Section 2 defines the set-up, and Section 3 outlines the statistics and analyzes their asymptotic distribution. Section 4 provides a first order bootstrap procedure for recursively estimated SGMM estimators, and also provides asymptotically valid bootstrap critical values for the statistics of interest. Section 5 extends the analysis of the previous sections to the case of stochastic volatility models. An empirical illustration is reported in Section 6. All proofs are collected in an appendix. Hereafter, P* denotes the probability law governing the resampled series, conditional on the sample; E* and Var* denote the mean and variance operators associated with P*; o*_P(1) Pr−P denotes a term converging to zero in P*-probability, conditional on the sample except a subset of probability measure approaching zero; and, finally, O*_P(1) Pr−P denotes a term which is bounded in P*-probability, conditional on the sample except a subset of probability measure approaching zero.
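To fix ideas before turning to the formal set-up, the following minimal sketch (in Python) illustrates the basic simulation step described above: given an observed initial value X_t and previously estimated parameters, simulate S discretized paths τ periods ahead and use their empirical distribution to approximate P(u_1 ≤ X_{t+τ} ≤ u_2 | X_t). The drift and diffusion functions, all numerical values, and the use of a simple Euler step (rather than the Milstein scheme adopted in Section 2) are purely illustrative assumptions, not specifications taken from the paper.

    import numpy as np

    def simulate_tau_ahead(x_t, theta, b, sigma, tau, h, S, rng):
        # Simulate S discretized paths of a one-factor diffusion tau periods ahead
        # of the initial value x_t, using an Euler step of size h.
        n_steps = int(round(tau / h))
        x = np.full(S, float(x_t))
        for _ in range(n_steps):
            eps = rng.normal(0.0, np.sqrt(h), size=S)
            x = x + b(x, theta) * h + sigma(x, theta) * eps
        return x  # S draws of X_{t+tau} given X_t = x_t, under the candidate model

    def interval_probability(x_t, theta, b, sigma, tau, h, S, u1, u2, rng):
        # Simulation-based estimate of P(u1 <= X_{t+tau} <= u2 | X_t = x_t).
        draws = simulate_tau_ahead(x_t, theta, b, sigma, tau, h, S, rng)
        return np.mean((draws >= u1) & (draws <= u2))

    # Hypothetical mean-reverting square-root model: dX = k(m - X)dt + s*sqrt(X)dW.
    b = lambda x, th: th[0] * (th[1] - x)
    sigma = lambda x, th: th[2] * np.sqrt(np.maximum(x, 0.0))
    rng = np.random.default_rng(0)
    p_hat = interval_probability(0.05, (0.2, 0.06, 0.1), b, sigma,
                                 tau=1.0, h=1 / 252, S=10000, u1=0.04, u2=0.07, rng=rng)

In the testing framework developed below, this step is repeated for each starting value X_R, ..., X_{R+P−τ}, with the parameters re-estimated recursively at each t.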
2 Set-Up
Given m diffusion models describing the dynamics of X_t,

dX(t) = b_k(X(t), θ_k^†)dt + σ_k(X(t), θ_k^†)dW(t),   (1)

for k = 1, ..., m, if model k is correctly specified, then b_k(X(t), θ_k^†) = b_0(X(t), θ_0^†) and σ_k(X(t), θ_k^†) = σ_0(X(t), θ_0^†). Now, F_k^τ(u|X_t, θ_k^†) = P_{θ_k^†}^τ(X_{t+τ} ≤ u|X_t), i.e. F_k^τ(u|X_t, θ_k^†) defines the conditional distribution of X_{t+τ}, given X_t, evaluated at u, under the probability law generated by model k. Analogously, define F_0^τ(u|X_t, θ_0) = P_{θ_0}^τ(X_{t+τ} ≤ u|X_t) to be the "true" conditional distribution.

Suppose we are interested in approximating P_{θ_0}^τ(u_1 ≤ X_{t+τ} ≤ u_2|X_t). Accuracy is measured in terms of a distributional analog of mean square error. In particular, we say that model 1 is more accurate than model k if

E( ( (F_1^τ(u_2|X_t, θ_1^†) − F_1^τ(u_1|X_t, θ_1^†)) − (F_0^τ(u_2|X_t, θ_0) − F_0^τ(u_1|X_t, θ_0)) )² )
  < E( ( (F_k^τ(u_2|X_t, θ_k^†) − F_k^τ(u_1|X_t, θ_k^†)) − (F_0^τ(u_2|X_t, θ_0) − F_0^τ(u_1|X_t, θ_0)) )² ).

This measure defines a norm and implies a standard goodness of fit measure. Recalling that E(1{u_1 ≤ X_{t+τ} ≤ u_2}|X_t) = F_0^τ(u_2|X_t, θ_0) − F_0^τ(u_1|X_t, θ_0), we construct the sequence of P − τ τ-step ahead prediction errors under, say, model k as 1{u_1 ≤ X_{t+τ} ≤ u_2} − (F_k^τ(u_2|X_t, θ̂_{k,t}) − F_k^τ(u_1|X_t, θ̂_{k,t})), for t = R, ..., R + P − τ, where θ̂_{k,t} is an estimator of θ_k^† computed using all observations up to time t, and P + R = T. If we knew F_k^τ(u|X_t, θ̂_{k,t}) in closed form, we could proceed along the lines of Corradi and Swanson (2006). However, in general, knowledge of the drift and variance of the diffusion does not imply knowledge of F_k^τ(u|X_t, θ̂_{k,t}) in closed form. As mentioned in the introduction, we simulate P − τ paths of length τ, using as starting values X_{R+1}, ..., X_{R+P−τ}, and using the recursively estimated parameters θ̂_{k,t}, t = R, ..., R + P − τ. We then construct the empirical distribution of the series simulated under model k. Hence, the first step is to obtain the sequence of estimators θ̂_{k,t}. As, in general, we do not have closed form expressions for the moments of the model, in the sequel we use SGMM estimators, computed in a recursive manner.

In order to construct simulated estimators for, say, model k, we require simulated sample paths. If we use a Milstein scheme (see e.g. Pardoux and Talay (1985)), then

X_{qh}^θ − X_{(q−1)h}^θ = b_k(X_{(q−1)h}^θ, θ)h + σ_k(X_{(q−1)h}^θ, θ)ε_{qh} − (1/2)σ_k'(X_{(q−1)h}^θ, θ)σ_k(X_{(q−1)h}^θ, θ)h + (1/2)σ_k'(X_{(q−1)h}^θ, θ)σ_k(X_{(q−1)h}^θ, θ)ε_{qh}²,   (2)

where ε_{qh} ~ iid N(0, h), q = 1, ..., Q, and σ_k' denotes the derivative of σ_k with respect to its first argument. Also, Qh = N, so that N denotes the length of the simulated path.
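As a complement to (2), here is a minimal Python sketch of the Milstein step, written as X_{qh} = X_{(q−1)h} + b_k h + σ_k ε_{qh} + (1/2)σ_k'σ_k(ε_{qh}² − h), which is algebraically identical to (2). The square-root model used as an example, and all numerical values, are hypothetical.

    import numpy as np

    def milstein_path(x0, theta, b_k, sigma_k, dsigma_k, h, Q, rng):
        # One simulated path of length N = Q*h using the Milstein step in (2):
        # X_qh - X_(q-1)h = b_k*h + sigma_k*eps_qh + 0.5*sigma_k'*sigma_k*(eps_qh**2 - h),
        # where eps_qh ~ iid N(0, h) and sigma_k' is the derivative of sigma_k in x.
        x = np.empty(Q + 1)
        x[0] = x0
        eps = rng.normal(0.0, np.sqrt(h), size=Q)
        for q in range(1, Q + 1):
            xp = x[q - 1]
            x[q] = (xp + b_k(xp, theta) * h + sigma_k(xp, theta) * eps[q - 1]
                    + 0.5 * dsigma_k(xp, theta) * sigma_k(xp, theta) * (eps[q - 1] ** 2 - h))
        return x

    # Hypothetical square-root diffusion: b_k = kappa*(mu - x), sigma_k = s*sqrt(x).
    b_k = lambda x, th: th[0] * (th[1] - x)
    sigma_k = lambda x, th: th[2] * np.sqrt(max(x, 1e-12))
    dsigma_k = lambda x, th: 0.5 * th[2] / np.sqrt(max(x, 1e-12))
    path = milstein_path(0.05, (0.2, 0.06, 0.1), b_k, sigma_k, dsigma_k,
                         h=1 / 252, Q=252 * 40, rng=np.random.default_rng(1))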
In the current context, θ̂_{k,t,N,h} denotes the simulated generalized method of moments (SGMM) estimator for model k, which is defined as:

θ̂_{k,t,N,h} = argmin_{θ_k∈Θ_k} ( (1/t) Σ_{j=1}^t g_k(X_j) − (1/N) Σ_{j'=1}^N g_k(X_{j'h}^{θ_k}) )' Ω̂_{k,t} ( (1/t) Σ_{j=1}^t g_k(X_j) − (1/N) Σ_{j'=1}^N g_k(X_{j'h}^{θ_k}) )
  = argmin_{θ_k∈Θ_k} G_{k,t,N,h}(θ_k)' Ω̂_{k,t} G_{k,t,N,h}(θ_k),   (3)

where g_k denotes a vector of p moment conditions and Θ_k denotes the parameter space of model k.
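A minimal sketch of the SGMM estimator in (3): the sample moments of the data up to time t are matched to the moments of a single long simulated path of length N generated under candidate parameter θ_k. The particular moment vector, the optimizer, and the treatment of the weighting matrix below are illustrative assumptions only; in the recursive scheme the minimization is repeated for each t = R, ..., T − τ, using data up to t.

    import numpy as np
    from scipy.optimize import minimize

    def sgmm_objective(theta, X_data_t, g, simulate_path, Omega_hat):
        # Quadratic form G_{k,t,N,h}(theta)' Omega_hat G_{k,t,N,h}(theta) in (3):
        # distance between sample moments of the data up to time t and moments of
        # a long simulated path of length N generated under theta.  simulate_path
        # should reuse a fixed set of simulation innovations across calls, so that
        # the objective is a smooth function of theta.
        m_data = g(X_data_t).mean(axis=0)              # (1/t) sum_j g(X_j)
        m_sim = g(simulate_path(theta)).mean(axis=0)   # (1/N) sum_j' g(X^theta_{j'h})
        G = m_data - m_sim
        return G @ Omega_hat @ G

    def sgmm_estimate(theta0, X_data_t, g, simulate_path, Omega_hat):
        # One recursive step: in the paper the minimization is repeated for each
        # t = R, ..., T - tau, each time using only the data up to time t.
        res = minimize(sgmm_objective, np.asarray(theta0, dtype=float),
                       args=(X_data_t, g, simulate_path, Omega_hat),
                       method="Nelder-Mead")
        return res.x

    # Illustrative moment vector: first two uncentered moments and the lag-1 cross moment.
    def g(x):
        x = np.asarray(x, dtype=float)
        return np.column_stack([x[1:], x[1:] ** 2, x[1:] * x[:-1]])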
The following result then holds.

Theorem 1: Let Assumptions A and B hold. Also assume that model 1 and model k are nonnested. If, as P, R, S, N → ∞, h → 0, P/N → 0, P/S → 0, h²P → 0, and P/R → π, 0 < π < ∞, then:
(i) Under H_0,

D_{k,P,N,S}(u_1, u_2) →_d N(0, W_k(u_1, u_2)),

where

W_k(u_1, u_2) = (C(u_2) − C(u_1)) + (V(u_2) − V(u_1)) + (CV(u_2) − CV(u_1))
  + (P_11(u_2) − P_11(u_1)) + (P_kk(u_2) − P_kk(u_1)) − (P_1k(u_2) − P_1k(u_1))
  + (P_1C(u_2) − P_1C(u_1)) − (P_kC(u_2) − P_kC(u_1))
  + (P_1V(u_2) − P_1V(u_1)) − (P_kV(u_2) − P_kV(u_1)),

and where, for a generic u,

C(u) = E( Σ_{j=0}^∞ [ (F_{X_{1,1+τ}^{θ_1^†}}(u|X_1) − F_0^τ(u|X_1))² − (F_{X_{k,1+τ}^{θ_k^†}}(u|X_1) − F_0^τ(u|X_1))² ]
  × [ (F_{X_{1,1+j+τ}^{θ_1^†}}(u|X_{1+j}) − F_0^τ(u|X_{1+j}))² − (F_{X_{k,1+j+τ}^{θ_k^†}}(u|X_{1+j}) − F_0^τ(u|X_{1+j}))² ] ),

V(u) = Σ_{j=0}^∞ E( (F_0^τ(u|X_1) − 1{X_{1+τ} ≤ u}) [ F_{X_{1,1+τ}^{θ_1^†}}(u|X_1) − F_{X_{k,1+τ}^{θ_k^†}}(u|X_1) ]
  × (F_0^τ(u|X_{1+j}) − 1{X_{1+j+τ} ≤ u}) [ F_{X_{1,1+j+τ}^{θ_1^†}}(u|X_{1+j}) − F_{X_{k,1+j+τ}^{θ_k^†}}(u|X_{1+j}) ] ),

CV(u) = Σ_{j=0}^∞ E( [ (F_{X_{1,1+τ}^{θ_1^†}}(u|X_1) − F_0^τ(u|X_1))² − (F_{X_{k,1+τ}^{θ_k^†}}(u|X_1) − F_0^τ(u|X_1))² ]
  × (F_0^τ(u|X_{1+j}) − 1{X_{1+j+τ} ≤ u}) [ F_{X_{1,1+j+τ}^{θ_1^†}}(u|X_{1+j}) − F_{X_{k,1+j+τ}^{θ_k^†}}(u|X_{1+j}) ] ),

P_11(u) = 4Π μ_{F_1}² μ_{f_1,θ_1^†}(u)' (D_1^†' Ω_1^† D_1^†)^{−1} μ_{f_1,θ_1^†}(u),

P_1C(u) = 4Π μ_{F_1} μ_{f_1,θ_1^†}(u)' (D_1^†' Ω_1^† D_1^†)^{−1} D_1^†' Ω_1^†
  × Σ_{j=0}^∞ E( (g_1(X_1) − E(g_1(X_1))) [ (F_{X_{1,1+j+τ}^{θ_1^†}}(u|X_{1+j}) − F_0^τ(u|X_{1+j}))² − (F_{X_{k,1+j+τ}^{θ_k^†}}(u|X_{1+j}) − F_0^τ(u|X_{1+j}))² ] ),

P_1V(u) = 4Π μ_{F_1} μ_{f_1,θ_1^†}(u)' (D_1^†' Ω_1^† D_1^†)^{−1} D_1^†' Ω_1^†
  × Σ_{j=0}^∞ E( (g_1(X_1) − E(g_1(X_1))) (F_0^τ(u|X_{1+j}) − 1{X_{1+j+τ} ≤ u}) [ F_{X_{1,1+j+τ}^{θ_1^†}}(u|X_{1+j}) − F_{X_{k,1+j+τ}^{θ_k^†}}(u|X_{1+j}) ] ),

P_1k(u) = 8Π μ_{F_1} μ_{f_1,θ_1^†}(u)' (D_1^†' Ω_1^† D_1^†)^{−1} D_1^†' Ω_1^†
  × Σ_{j=0}^∞ E( (g_1(X_1) − E(g_1(X_1))) (g_k(X_{1+j}) − E(g_k(X_{1+j})))' ) Ω_k^† D_k^† (D_k^†' Ω_k^† D_k^†)^{−1} μ_{f_k,θ_k^†}(u) μ_{F_k},

with

μ_{F_1} = E( F_{X_{1,t+τ}^{θ_1^†}(X_t)}(u) − 1{X_{t+τ} ≤ u} ),

μ_{f_k,θ_k^†}(u)' = E( E_S( f_k^τ( u − X_{k,t+τ,i}^{θ_k^†}(X_t) ) | X_t ) ∇_{θ_k} X_{k,t+τ,i}^{θ_k^†}(X_t)' ),

where E_S denotes the expectation with respect to the probability law governing the simulated randomness, and E denotes the expectation with respect to the probability law governing the data.
(ii) Under H_A,
for any ε > 0,

Pr( (1/√P) |D_{k,P,N,S}(u_1, u_2)| > ε ) → 1.
Note that the covariance terms reflect the contribution of recursively estimated (SGMM) parameter estimation error. The intuitive argument underlying Theorem 1 is the following:

(1/S) Σ_{i=1}^S 1{ X_{k,t+τ,i}^{θ̂_{k,t,N,h}}(X_t) ≤ u }
  = (1/S) Σ_{i=1}^S 1{ X_{k,t+τ,i}^{θ_k^†}(X_t) ≤ u } + E( f_{X_{k,t+τ,i}^{θ_k^†}(X_t)}(u) ∇_{θ_k} X_{k,t+τ,i}^{θ_k^†}(X_t) ) (1/√P) Σ_{t=R}^T ( θ̂_{k,t,N,h} − θ_k^† ) + o_P(1)
  = F_{X_{k,t+τ}^{θ_k^†}(X_t)}(u) + E( f_{X_{k,t+τ,i}^{θ_k^†}(X_t)}(u) ∇_{θ_k} X_{k,t+τ,i}^{θ_k^†}(X_t) ) (1/√P) Σ_{t=R}^T ( θ̂_{k,t,N,h} − θ_k^† ) + o_P(1) + o_S(1),
where o_S(1) denotes a term approaching zero as S → ∞. The statement then follows by the same argument used in the case in which the closed form of the conditional distribution is known. Note that, as S/P → ∞, we can neglect the contribution of simulation error in the asymptotic covariance matrix.

So far we have considered the pairwise comparison of two nonnested models. In some circumstances, one may be interested in comparing one (benchmark) model against multiple competing models, as in White's (2000) reality check. In this latter case, the null hypothesis is that no model can outperform the benchmark.⁴ The hypotheses are:

H_0': max_{k=2,...,m} E_X( ( F_{X_{1,t+τ}^{θ_1^†}(X_t)}(u_2) − F_{X_{1,t+τ}^{θ_1^†}(X_t)}(u_1) − (F_0(u_2|X_t) − F_0(u_1|X_t)) )² )
  − E_X( ( F_{X_{k,t+τ}^{θ_k^†}(X_t)}(u_2) − F_{X_{k,t+τ}^{θ_k^†}(X_t)}(u_1) − (F_0(u_2|X_t) − F_0(u_1|X_t)) )² ) ≤ 0

versus

H_A': max_{k=2,...,m} E_X( ( F_{X_{1,t+τ}^{θ_1^†}(X_t)}(u_2) − F_{X_{1,t+τ}^{θ_1^†}(X_t)}(u_1) − (F_0(u_2|X_t) − F_0(u_1|X_t)) )² )
  − E_X( ( F_{X_{k,t+τ}^{θ_k^†}(X_t)}(u_2) − F_{X_{k,t+τ}^{θ_k^†}(X_t)}(u_1) − (F_0(u_2|X_t) − F_0(u_1|X_t)) )² ) > 0.
The statistic introduced above allows us to test the null hypothesis that models 1 and k produce equally accurate predictions for the interval of interest. For the multiple comparison, using the same approach as discussed in the previous sub-sections, the appropriate test statistic is:

MaxD_{P,N,S}(u_1, u_2) = max_{k=2,...,m} D_{k,P,N,S}(u_1, u_2).
⁴ The reality check of White (2000) was developed for the evaluation of conditional mean models. Corradi and Swanson (2005b) extend the test to the case of comparison of conditional distributions and densities.
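The exact definition of D_{k,P,N,S}(u_1, u_2) is given earlier in the paper and is not reproduced here; the sketch below only illustrates the generic structure suggested by the discussion above, namely a √P-scaled difference between the sample mean squared interval-prediction errors of the benchmark and of model k, with the maximum then taken over the competitors. All function and argument names are hypothetical.

    import numpy as np

    def pairwise_D(ind, F1_u2, F1_u1, Fk_u2, Fk_u1):
        # Hedged sketch of a pairwise statistic with the structure suggested in the
        # text: a sqrt(P)-scaled difference between the sample mean squared
        # interval-prediction errors of the benchmark (model 1) and of model k.
        # ind[t] = 1{u1 <= X_{t+tau} <= u2}; F1_u2[t], F1_u1[t], Fk_u2[t], Fk_u1[t]
        # are the simulation-based CDF estimates at u2 and u1, given X_t.
        P = len(ind)
        e1 = ind - (F1_u2 - F1_u1)   # interval-prediction errors, benchmark model
        ek = ind - (Fk_u2 - Fk_u1)   # interval-prediction errors, model k
        return np.sqrt(P) * np.mean(e1 ** 2 - ek ** 2)

    def max_statistic(D_values):
        # Reality-check style statistic: the maximum over competitors k = 2, ..., m.
        return np.max(D_values)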
Theorem 2: Let Assumptions A and B hold. Also assume that model 1 and model k are nonnested for at least one k = 2, ..., m. If, as P, R, S, N → ∞, h → 0, P/N → 0, P/S → 0, h²P → 0, and P/R → π, 0 < π < ∞, then

max_{k=2,...,m} ( D_{k,P,N,S}(u_1, u_2) − μ_k(u_1, u_2) ) →_d max_{k=2,...,m} Z_k(u_1, u_2),

where

μ_1(u_1, u_2) = E( ( F_{X_{1,t+τ}^{θ_1^†}(X_t)}(u_2) − F_{X_{1,t+τ}^{θ_1^†}(X_t)}(u_1) − (F_0(u_2|X_t) − F_0(u_1|X_t)) )² ),

with μ_k(u_1, u_2), k = 2, ..., m, defined analogously, and (Z_1(u_1, u_2), ..., Z_m(u_1, u_2)) is an m-dimensional Gaussian vector with covariance matrix whose kk-th element is given by W_k(u_1, u_2), as defined in the statement of Theorem 1(i).⁵

Note that the critical values of max_{k=2,...,m} Z_k(u_1, u_2) provide asymptotically correct critical values for MaxD_{P,N,S}(u_1, u_2) when μ_k(u_1, u_2) = μ_1(u_1, u_2) for all k = 2, ..., m, i.e. when all competitors are as good as the benchmark, which is the least favorable case under the null. If μ_k(u_1, u_2) > μ_1(u_1, u_2) for some k, then these critical values provide an upper bound, and inference drawn using them is conservative (see Hansen (2005) on recentering methods that alleviate this issue). If μ_k(u_1, u_2) > μ_1(u_1, u_2) for all k > 1, MaxD_{P,N,S}(u_1, u_2) approaches −∞, while under H_A' it approaches +∞.

In order to obtain critical values for max_{k=2,...,m} Z_k(u_1, u_2), we need to rely on the bootstrap.⁶ In particular, we need a bootstrap procedure able to mimic the limiting distribution of (1/√P) Σ_{t=R}^T ( θ̂_{1,t} − θ_1^† ).

⁵ The off-diagonal terms can be defined in an analogous way.
⁶ For the case of comparison of conditional mean models, White (2000) suggests Monte Carlo simulation as an alternative to the bootstrap. Nevertheless, this alternative is not available in the present context, as the asymptotic covariance contains terms, such as ∇_{θ_k} X_{k,t+τ,i}^{θ̂_{k,t,N,h}}(X_t), which are not known in closed form.
4 Bootstrapping Critical Values
In the recursive case, observations at the beginning of the sample are used more frequently than observations at the end of the sample. This introduces a location bias into the usual block bootstrap, as under standard resampling with replacement any block from the original sample has the same probability of being selected. Also, the bias term varies across samples and can be either positive or negative, depending on the specific sample. A first order valid bootstrap procedure for m-estimators in a recursive scheme has been suggested by Corradi and Swanson (2007).⁷ Here we complement the results of Corradi and Swanson (2007) by addressing the issue of bootstrapping SGMM estimators in a recursive setting.

⁷ For the case of a rolling estimation scheme, see Corradi and Swanson (2006).

Resample b blocks of length l from the full sample, with lb = T. For any given τ, we need to jointly resample X_t, X_{t+1}, ..., X_{t+τ}. More precisely, let Z^{t,τ} = (X_t, X_{t+1}, ..., X_{t+τ}), t = 1, ..., T − τ, and resample b overlapping blocks of length l from Z^{t,τ}. This yields Z^{t,*} = (X_t^*, X_{t+1}^*, ..., X_{t+τ}^*), t = 1, ..., T − τ. These data are used to construct θ̂_{k,t,N,h}^*. Recall that N is the length of the simulated series used to estimate the parameters. Note that, as we shall assume N/R, N/P → ∞, the simulation error vanishes, and hence we do not need to resample the simulated series. Indeed, if N/R → ∞, then GMM and simulated GMM are asymptotically equivalent. More precisely, for t ≥ R,
θ̂_{k,t,N,h}^* = argmin_{θ_k∈Θ_k} ( (1/t) Σ_{j=1}^t [ g_k(X_j^*) − (1/T) Σ_{j'=1}^T g_k(X_{j'}) ] − [ (1/N) Σ_{j=1}^N g_k(X_{j,h}^{θ_k}) − (1/N) Σ_{j=1}^N g_k(X_{j,h}^{θ̂_{k,t,N,h}}) ] )'
  × Ω̂_{k,t}^* ( (1/t) Σ_{j=1}^t [ g_k(X_j^*) − (1/T) Σ_{j'=1}^T g_k(X_{j'}) ] − [ (1/N) Σ_{j=1}^N g_k(X_{j,h}^{θ_k}) − (1/N) Σ_{j=1}^N g_k(X_{j,h}^{θ̂_{k,t,N,h}}) ] )
  = argmin_{θ_k∈Θ_k} G_{k,t,N,h}^*(θ_k)' Ω̂_{k,t}^* G_{k,t,N,h}^*(θ_k),   (6)

where

Ω̂_{k,t}^{*−1} = (1/t) Σ_{ν=−λ_t}^{λ_t} w_{ν,t} Σ_{j=ν+1+λ_t}^{t−λ_t} ( g_k(X_j^*) − (1/T) Σ_{j'=1}^T g_k(X_{j'}) ) ( g_k(X_{j−ν}^*) − (1/T) Σ_{j'=1}^T g_k(X_{j'}) )'.
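A sketch of the resampling and recentering steps just described, under illustrative assumptions (block length, moment function, and array layout are not taken from the paper): b = T/l overlapping blocks of length l are drawn with replacement from the tuples Z^{t,τ} = (X_t, ..., X_{t+τ}), and the bootstrap data moments are recentered by the full-sample moments as in (6).

    import numpy as np

    def block_bootstrap_Z(X, tau, l, rng):
        # Resample b = T // l overlapping blocks of length l (with replacement)
        # from Z^{t,tau} = (X_t, ..., X_{t+tau}), t = 1, ..., T - tau.
        T = len(X)
        Z = np.column_stack([X[j:T - tau + j] for j in range(tau + 1)])  # (T - tau) x (tau + 1)
        b = T // l
        starts = rng.integers(0, Z.shape[0] - l + 1, size=b)
        return np.concatenate([Z[s:s + l] for s in starts], axis=0)      # resampled Z^{t,*}

    def recentered_data_moments(X_star, X_full, g, t):
        # Bootstrap data moments recentered by the full-sample moments, as in (6):
        # (1/t) sum_{j<=t} g(X*_j) - (1/T) sum_{j'} g(X_j').
        # X_star is the resampled series, e.g. the first column of block_bootstrap_Z(...).
        return g(X_star[:t]).mean(axis=0) - g(X_full).mean(axis=0)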
The intuition behind the particular recentering used in (6) is that it ensures that the mean of the bootstrap moment conditions, evaluated at θ̂_{k,t,N,h}, is zero up to a negligible term. In fact, note that

E*( (1/t) Σ_{j=1}^t [ g_k(X_j^*) − (1/T) Σ_{j'=1}^T g_k(X_{j'}) ] − [ (1/N) Σ_{j=1}^N g_k(X_{j,h}^{θ̂_{k,t,N,h}}) − (1/N) Σ_{j=1}^N g_k(X_{j,h}^{θ̂_{k,t,N,h}}) ] )
  = E*( g_k(X_j^*) ) − (1/T) Σ_{j'=1}^T g_k(X_{j'}) = O(l/T), with l = o(T^{1/2}),
where the O(l/T) term is due to the end block effect.

Lemma 2: Let Assumption A hold. If, as P, R, N → ∞, h → 0, P/N → 0, h²P → 0, l → ∞, l/T^{1/4} → 0, and P/R → π, 0 < π < ∞, then for k = 1, ..., m,

P( ω : sup_v | P_T^*( (1/√P) Σ_{t=R}^T ( θ̂_{k,t,N,h}^* − θ̂_{k,t,N,h} ) ≤ v ) − P( (1/√P) Σ_{t=R}^T ( θ̂_{k,t,N,h} − θ_k^† ) ≤ v ) | > ε ) → 0.
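Given Lemma 2, first-order valid critical values can be obtained in the usual way: repeat the block resampling and recursive re-estimation B times, compute the bootstrap analog of the statistic on each replication, and use its empirical (1 − α) quantile. A hedged sketch, assuming a user-supplied function boot_statistic that performs one complete bootstrap replication (this helper is hypothetical and not defined in the paper):

    import numpy as np

    def bootstrap_critical_value(boot_statistic, B, alpha, rng):
        # Empirical (1 - alpha) quantile of B bootstrap replications of the statistic.
        # boot_statistic(rng) is assumed to perform one complete replication: block
        # resample the data, recompute the recursive SGMM estimators on the resampled
        # data as in (6), and return the bootstrap analog of the statistic.
        stats = np.array([boot_statistic(rng) for _ in range(B)])
        return np.quantile(stats, 1.0 - alpha)

    # Usage (hypothetical): reject the null at level alpha if the actual statistic
    # exceeds bootstrap_critical_value(boot_statistic, B=500, alpha=0.10,
    #                                  rng=np.random.default_rng(7)).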