Efficient asymptotic variance reduction when estimating volatility in high frequency data∗ arXiv:1701.01185v1 [q-fin.ST] 5 Jan 2017
Simon Clinet†and Yoann Potiron‡ This version: January 6, 2017
Abstract This paper shows how to carry out efficient asymptotic variance reduction when estimating volatility in the presence of stochastic volatility and microstructure noise with the realized kernels (RK) from [Barndorff-Nielsen et al., 2008] and the quasi-maximum likelihood estimator (QMLE) studied in [Xiu, 2010]. To obtain such a reduction, we chop the data into B blocks, compute the RK (or QMLE) on each block, and aggregate the block estimates. The ratio of asymptotic variance over the bound of asymptotic efficiency converges as B increases to the ratio in the parametric version of the problem, i.e. 1.0025 in the case of the fastest RK Tukey-Hanning 16 and 1 for the QMLE. The finite sample performance of both estimators is investigated in simulations, while empirical work illustrates the gain in practice.
Keywords: high frequency data ; market microstructure noise ; integrated volatility ; quasimaximum likelihood estimator ; realized kernels
∗
We would like to thank Markus Bibinger, Dacheng Xiu, Jean Jacod, the participants of Keio Econometrics Workshop, the Workshop on Portfolio dynamics and limit order books in Ecole Centrale Paris, The Quantitative Methods in Finance 2016 Conference in Sydney for helpful discussions and advice. The research of Simon Clinet is in part supported by CREST Japan Science and Technology Agency, Japan Society for the Promotion of Science: Grants-in-Aid for Scientific Research No. 24340015 (Scientific Research), No. 26540011 (Challenging Exploratory Research). The research of Yoann Potiron is supported by a special private grant from Keio University. All financial data is provided by the Chair of Quantitative Finance of the Ecole Centrale Paris. † Graduate School of Mathematical Sciences, University of Tokyo: 3-8-1 Komaba, Meguro-ku, Tokyo 1538914, Japan. E-mail:
[email protected] website : http://www.ms.u-tokyo.ac.jp/~simon/ ‡ Faculty of Business and Commerce, Keio University. 2-15-45 Mita, Minato-ku, Tokyo, 108-8345, Japan. Phone: +81-3-5418-6571. E-mail:
[email protected] website: http://www.fbc.keio.ac.jp/~potiron
1
1
Introduction
Over the past decades, the availability of high frequency data has led to a better understanding of asset prices. The main object of interest, the quadratic variation, can be used for example as a proxy for the spot volatility or the volatility parameter of a time-varying model. Moreover, forecasts of future volatility can be improved with it. Without microstructure noise, the realized variance (RV) estimator (e.g. [Andersen et al., 2001], [Meddahi, 2002], [Barndorff-Nielsen and Shephard, 2002]) is both consistent and efficient. The convergence rate n1/2 and the asymptotic variance (AVAR) were established in [Genon-Catalot and Jacod, 1993], [Jacod, 1994] and [Jacod and Protter, 1998] (see also [Zhang, 2001] and [Mykland et al., 2006]). Under market frictions, the RV is no longer consistent. [Zhang et al., 2005] bring forward the Two-Scale Realized Volatility nonparametric estimator, the first consistent estimator in the presence of noise and with a relatively slow convergence rate of n1/6 . [Zhang, 2006] modifies it to provide the Multi-Scale Realized Volatility (MSRV) which features the optimal rate of convergence n1/4 as documented in [Gloter and Jacod, 2001]. Other approaches consist in and are not limited to: pre-averaging (PAE) the observations ([Jacod et al., 2009]), [Barndorff-Nielsen et al., 2008] advocates for the realized kernels (RK) and [Xiu, 2010] studies the quasi-maximum likelihood estimator (QMLE). Those three approaches share the optimal rate property and only differ through edge-effects which impact their respective AVAR. 1 RT The nonparametric AVAR bound of efficiency is equal to 8ωT 2 0 σu3 du, where T stands for the horizon time and ω 2 corresponds to the noise variance. This was shown in [Reiß, 2011] under the deterministic volatility and Gaussian noise setting, but it is commonly assumed that it stays true under stochastic volatility. Subsequently, in a recent breakthrough paper, [Altmeyer and Bibinger, 2015] found an estimator based on the spectral approach introduced in [Reiß, 2011] which reaches the bound in a very general situation. More recently, [Jacod and Mykland, 2015] proposed an adapted version of the pre-averaging estimator using local estimates as in [Reiß, 2011] which gave rise to estimators that are within 7% of the bound. When comparing fairly how perform several estimators, we need the candidates to be equipped with the same technology. Following closely the local technique used in [Reiß, 2011] and [Jacod and Mykland, 2015], we aim to adapt accordingly the RK and the QMLE. Indeed, although both estimators behave remarkably well when volatility is constant, i.e. in the parametric case the ratio of AVAR over the bound of asymptotic efficiency is 1.0025 when considering the most efficient Tukey-Hanning 16 RK and 1 for the QMLE, they can actually be highly inefficient in the non parametric setting as documented in the following of this introduction and in Section 2. Under time-varying volatility, we aim to reduce significantly their AVAR and rend them as efficient as in the parametric problem. Although it would reduce the AVAR the same way, we didn’t implement the local version of the MSRV. One reason is that the MSRV doesn’t
2
perform as well with a parametric ratio around 1.15. To reduce the variance, we divide the interval 0, T into B non-overlapping regular blocks 0, T /B , T /B, 2T /B , · · · , (B − 1)T /B, T . We then compute the RK (QMLE) on each block, and take the sum of the B estimates. We show that the estimator ratio converges to the parametric ratio as B increases. More importantly for practical applications, the convergence is very fast, and the gain is already important in the case B = 2 blocks. There are two essential reasons from a practical perspective why in our setting the number of blocks B is fixed whereas [Altmeyer and Bibinger, 2015] and [Jacod and Mykland, 2015] let B → ∞. The first reason is that we find that the AVAR is in most cases already within 4% of the bound when estimating daily volatility with B = 8 blocks, which is reasonably close enough. The second reason is that we want to quantify the expected gain when choosing a fixed number of blocks. From a more theoretical perspective, [Altmeyer and Bibinger, 2015] actually require that the block length tends to 0, and the asymptotic properties of their estimator are unknown if B stays fixed. In [Jacod and Mykland, 2015], the authors choose a growing B because the local tuning parameter1 is optimal only in the parametric case. This is different from the RK (respectively QMLE) case where the tuning parameter is optimal even when the volatility is time-varying (there is no tuning parameter). Nonetheless, by choosing an adapted non-parametric version of the tuning parameter2 which might not be optimal, we can fix B with the PAE and compute the associated AVAR. As an example, we focus on the RK Tukey-Hanning 16 and consider the (apparently innocuous) block constant model σt = 1 for t ∈ [0, 21 ) and σt = 2 for t ∈ [ 12 , 1]. When choosing the optimal bandwidth, [Barndorff-Nielsen et al., 2008]3 showed that the AVAR is equal to Z T 3/4 (RK) σu4 du g, (1.1) AV AR[0,T ] = ω T 0
where g is defined as 16 g= 3
q
ρk•0,0 k•1,1
with
1 q + p 2 1 + 1 + 3d/ρ
q
p 1 + 1 + 3d/ρ2 ,
RT
σ 2 du k 0,0 k 2,2 , d = • 1,1• ρ = q 0R u T (k• )2 T 0 σu4 du and where k•i,i are constant functions of the kernel. We fix T = 1 and we compute in that case R1 2 R1 3 R1 4 σ du = 5/2, σ du = 9/2 and σ du = 17/2. Thus, the bound of efficiency is equal to u u 0 0 0 u 1
see (3.4) on p. 2916.
√
Φ(g,g 0 )2 +3Φ(g,g)Φ(g 0 ,g 0 )−Φ(g,g 0 ) −1/2 We can choose θ = γ T c du in the notation of 3Φ(g 0 ,g 0 ) 0 u [Jacod and Mykland, 2015] (see in particular (3.4) on p. 2916) as such adapted tuning parameter. 3 see pp. 1494-1495 for more details. 2
−1
−1
RT
3
(RK)
36ω, whereas AV AR[0,1] = 37.89ω. This can be expressed as a loss of 37.89−36 ≈ 5%, which is to 36 ≈ .25%. When fixing B = 2, the volatility be compared to the loss in the parametric case4 8.02−8 8 (RK) on each block is constant and thus yields AV AR[0,1/2] = 2−3/2 × 8.02ω on the first block and (RK)
AV AR[1/2,1] = 23/2 ×8.02ω on the second block. As both estimates are uncorrelated5 , we obtain √ R1 (RK) (RK) (RK) that the global AVAR is equal to AV AR[0,1] = 2(AV AR[0,1/2] +AV AR[1/2,1] ) = 8.02ω 0 σu3 du, i.e. .25 % loss which corresponds exactly to the parametric loss. From (1.1), we can see that the theoretical loss can be expressed as a deterministic function of the already well-known measure of volatility constancy ρ and another connected quantity which we denotes RT 3 σu du κ= . R0 T T 1/4 ( 0 σu4 du)3/4 Details can be found in Section 2, along with an expression for the QMLE loss as well. In the previous example where the loss was about 5%, the corresponding setting can be computed as p ρ = 5/2 × 2/17 ≈ .86 and κ = 9/2 × (2/17)3/4 ≈ .90. Volatility on real data is much more heteroskedastic than this toy example, corresponding to lower ρ and κ. In their empirical study, [Andersen et al., 2014] daily estimate ρ−1 on Dow Jones stocks between January 1, 2005 and May 31, 2007. They find that the typical value is around 1.3, and about 1.6 when restricting to the top 10% days in terms of intraday variation of volatility. This corresponds respectively to estimates of ρ as 1/1.3 ≈ .77 and 1/1.6 ≈ .62. When taking respectively those two realistic values, the corresponding RK and QMLE losses are expected to be around 20% (can go up to 100 %), depending on the other parameter value κ. With such highly inefficient estimators, we believe that there is a practical need for variance reduction. This is especially the case on days when the volatility is highly heteroskedastic. Clearly this estimator is related to local parametric methods in high-frequency data, i.e. aggregating local parametric estimates. For example, [Mykland and Zhang, 2009] investigated the ex post adjustment involving asymptotic likelihood ratios to make when assuming constant local volatility. [Reiß, 2011] showed the asymptotic equivalence in Le Cam’s sense between the non-parametric and locally constant volatility experiment. To estimate quarticity and other functionals of volatility, [Jacod et al., 2013] estimated the volatility locally and plugged the value into the sum. [Potiron, 2016] introduced a very general time-varying parameter model and used a sum of parametric estimates to estimate the integrated parameter. Using the same technology to estimate the time-varying parameter of a self-exciting process, our own work includes [Clinet and Potiron, 2016]. The remainder of the paper is structured as follows. Section 2 stretches the limitations of the global approach by expressing the loss as a function of ρ and κ. Section 3 investigates the 4 5
Details can be found on Table II (p. 1495, [Barndorff-Nielsen et al., 2008]). if we remove end-effects
4
RK case and the corresponding limit theory, while Section 4 considers the QMLE approach. Section 5 documents the AVAR reduction on one realistic example. Robustness to jumps and weaker assumptions on observation times and noise is discussed in Section 6. Section 7 performs a Monte Carlo experiment to assess finite sample performance and compare both approaches to the pre-averaging estimator. Section 8 provides an empirical illustration where we quantify the expected gain in practice. Theoretical details and proofs can be found in the Appendix.
2
Limitations of the global approach
This section documents the performance of the global RK and QMLE. In particular, we show how it deteriorates as a function of the volatility heteroskedasticity. Finally, we diagnose the reasons and provide the solution to this relative failure. One crucial feature common to both estimators is that they behave remarkably well when volatility is constant. Indeed, the QMLE is efficient and the RK Tukey-Hanning 16 almost efficient in that case. Even the RK Tukey-Hanning 2, with an AVAR over the bound of efficiency ratio of less than 1.04, can be considered as "practically efficient". To study what happens when volatility is time-varying, is is useful for 0 ≤ r < s ≤ T to define Rs 3 Rs 2 σ du du σ r Ru r u and κr,s = ρr,s = q s Rs 1/4 (s − r) ( r σu4 du)3/4 (s − r) r σu4 du to be measures of heteroskedasticity. In the following, we will be using ρ and κ in place of ρ0,T and κ0,T . The quantity ρ was already introduced in [Barndorff-Nielsen et al., 2008] and plays an important role in the AVAR of both RK and the QMLE. [Xiu, 2010] (Figure 1, p. 241) expresses the quotient of both AVARs as a function of ρ, but doesn’t assess their respective performance when compared to the (conjectured) bound of efficiency defined as AV
(Bound) AR[0,T ]
= 8ωT
1 2
Z
T
σu3 du.
0
In contrast, the other quantity κ is introduced to investigate that relative performance. More precisely, κ is needed to express the AVAR over the bound of efficiency ratio for both approaches since the AVAR doesn’t feature the tricity, i.e. the integrated third moment of volatility, which is key in the bound of efficiency. Evidently, both measures ρ and κ are very much connected and we can actually show the following bounds. Lemma 1. We have that a.s 3/2 1/2 0 < ρr,s ≤ κr,s ≤ ρr,s ≤ 1.
5
Note that the equality ρr,s = κr,s = 1 corresponds to the parametric case. In particular, Lemma 1 implies that for any given ρ, the value κ is a.s. in a small boundary around ρ. This is of particular interest because as far as the authors know the literature on quarticity estimation (e.g. [Jacod et al., 2009], [Andersen et al., 2014], [Mancino and Sanfelici, 2012], [Jacod et al., 2013]) is far more abundant than the work on estimating tricity, which implies that in practice ρ can be estimated relatively easily, whereas κ would require more effort. From [Andersen et al., 2014] (Figure 7, p. 41), when taking a pre-averaging window equal to one minute (chosen consistently with their recommendation in Section 5.2.4 on p. 34 where the authors argue that a resonable choice of window should lie between 30 seconds and 2 minutes) we infer that the estimates of ρ are about 1/1.2 ≈ .83, 1/1.3 ≈ .77 and 1/1.6 ≈ .62 when considering respectively the bottom 10% days in terms of intraday variation of volatility, all days and the top 10% days in terms of intraday variation of volatility. Correspondingly, we will be using ρhigh = .83, ρregular = .77, ρlow = .62 to refer respectively to high, regular and low values of ρ throughout the rest of the paper. It is not surprising to find such low values on stocks data as it has been understood for several decades now that many stylized facts describes volatility as highly heteroskedastic (see, e.g., [Ghysels et al., 1996], [Engle and Patton, 2001]). (RK)
When using the optimal bandwidth, AV AR[0,T ] is defined as AV
(RK) AR[0,T ]
T
Z =ω T
3/4 σu4 du g,
0
where we have 16 g= 3
q q p 1 k•0,0 k•2,2 0,0 1,1 2 ρk• k• q + 1 + 1 + 3d/ρ and d = p (k•1,1 )2 1 + 1 + 3d/ρ2
with k•i,i constant functions of the kernel. Correspondingly, we give the formal definition of the RK loss as (RK)
L
(RK)
=
AV AR[0,T ]
(Bound)
− 1.
(2.1)
AV AR[0,T ]
Obvious computations lead to L(RK) = gκ−1 /8 − 1. If we see g as a function of ρ, L(RK) is equal to g(1)/8 − 1 in the parametric case. The parametric values for several kernels can be directly inferred from [Barndorff-Nielsen et al., 2008] (Table II, p. 1495) and the loss is equal to .25 % when considering the Tukey-Hanning 16, 3.625 % for the Tukey-Hanning 2, 6.75 % for the Parzen and 13 % for the Cubic kernel. We have that g is an increasing function of ρ, and thus the effect of ρ and κ are reverse. Next we consider the AVAR of the QMLE expressed via RT Z T 3/2 5T ω 0 σu4 du (QM LE) AV AR[0,T ] = RT + 3ω σu2 du . ( 0 σu2 du)1/2 0 6
The formula can actually be found in Box V (p. 240, [Xiu, 2010]). The corresponding QMLE loss is defined in analogy with (2.1) and can be expressed as (QM LE)
L
(QM LE)
=
AV AR[0,T ]
(Bound)
−1
AV AR[0,T ] =
5 + 3ρ2 − 1. 8κρ1/2
Figure 1 plots the feasible loss region for three typical RK, the QMLE as well as the PAE and the MSRV. It is clear that they highly loose efficiency when ρ is decreasing. The QMLE is dominated by the RK approach when ρ is low, which was observed on Figure 1 (p. 241, [Xiu, 2010]). The problem behind this potentially high loss can be intuitively explained as follows. For the RK, although the optimal tuning parameter is robust to heteroskedastic volatility, it suffers from the fact that one day6 is too long to "stay optimal". This is a very similar situation to the PAE, which also features a tuning parameter. Subsequently, [Jacod and Mykland, 2015] used block estimations to heavily reduce variance. As for the QMLE, which in contrast is designed in a parametric way yielding no choice of tuning parameter, the smaller ρ and κ are, the further the misspecified model deviates from the truth. It is by nature a different estimator, but local methods are expected to reduce the misspecification as in [Reiß, 2011]. Thus, we aim to reduce the non-parametric loss into the parametric loss using adapted local methods. As we can see on Figure 1, the QMLE will benefit the most as it is efficient in the parametric case and deteriorates more than the RK in the non-parametric case.
3
Realized Kernels
3.1
Model for the observations
We assume that the latent log-price process follows dXt = at dt + σt dWt , where Wt is a standard Brownian motion. Moreover, we assume that the 2 dimensional process Vt = (Xt , σt ) is of the form ˜ t + dJt , dVt = a ˜t dt + σ ˜t dW ˜ t is a 2 dimensional Brownian motion, the drift a where W ˜t is locally bounded, the volatility matrix σ ˜t is symmetric positive definite, locally bounded and itself an Itô process and that if 6
or one week, one month, etc.
7
(2) (2) λt is the smallest eigenvalue of σ ˜t , then inf t λt > 0 a.s.. Jt = (0, J˜t ) where J˜t is a jump process with finite activity independent from the other quantities. This rules out jumps in Xt , an issue addressed in Section 6.1. In contrast the volatility process can include jumps (see, e.g., [Todorov and Tauchen, 2011] for empirical evidence). The observations are contaminated by the microstructure noise so that we observe Zτi = Xτi + Uτi , where τi are the corresponding observation times. We assume that the data are regularly spaced satisfying τi − τi−1 = ∆. Furthermore, we assume that the noise is independent and identically distributed (i.i.d), and independent of the volatility process and Wt , with null-mean, variance ω 2 and finite fourth moment. Next the horizon time is defined as T > 0. Finally, we consider the high frequency asymptotics and assume that n goes to infinity, where T = n∆. In particular, the time between two observations ∆ goes to 0.
The assumption on observation times is unsatisfactory from a number of viewpoints. First in practice they are not regular and stochastic. Also, they can be endogenous (i.e. correlated to the price process). For instance, [Li et al., 2014] and [Fukasawa, 2010] studied the behavior of the RV under this general setting. Also, [Robert and Rosenbaum, 2011] introduced a specific endogenous model based on financial stylized facts and retrieved the integrated volatility. More recently, [Potiron and Mykland, 2016] have introduced a more general endogenous model. We consider these issues in Section 6.2. The assumptions on the noise (i.e. i.i.d and its independence with Xt ) are not always consistent with empirical studies. This was documented in [Hansen and Lunde, 2006]. Actually, [Jacod et al., 2009] only require that the noise is an independent sequence conditioned on the price process to assess the limit theory of the PAE. Moreover, [Aït-Sahalia et al., 2011] show the robustness of the TSRV when noise exhibits time series dependence. [Kalnina and Linton, 2008] modified the TSRV to estimate integrated volatility in an endogeneous and heteroskedastic noise framework. An outline of possible extensions of the local method when weakening assumptions on noise can be found in Section 6.3 to Section 6.5.
3.2
Local RK definition
We consider first the framework B = 1 where the local RK coincides with the RK. The flat-top RK takes on the form H h − 1 X γh + γ−h , K = γ0 + k H h=1 where H > 0 and the deterministic kernel k(x) is defined for x ∈ [0, 1]. The realized autocovariance is defined as γh =
n X
(Z∆j − Z∆(j−1) )(Z∆(j−h) − Z∆(j−h−1) ),
j=1
8
where h = −H, · · · , −1, 0, 1, · · · , H. In the general case B > 1, we define Ki the estimate on the ith block Ti−1 , Ti where Ti = iT /n, and assume that n/B is an integer for simplicity of exposition. We aggregate the local estimates to obtain the adapted version of the RK defined as ˜ = K
B X
Ki .
i=1
The corresponding H is B-dimensional in this case. We also adapt the jittering introduced in Section 2.6 ([Barndorff-Nielsen et al., 2008], p. 1487), i.e. for i = 0, · · · , B we assume that XTi is an average of m distinct observations on the interval (Ti − ∆, Ti + ∆).
3.3
Asymptotic theory
We define LX for σ(X)-stable convergence. See Appendix 10.1 about stable convergence for formal definitions. We further define 2 ξr,s =q
ω2 Rs (s − r) r σu4 du
2 as the noise-to-signal ratio, and refer to ξ 2 = ξ0,T in the following. Finally, we define kernel weight functions k(x) that are two times continuously differentiable on [0, 1] and Z 1 Z 1 Z 1 0,0 2 1,1 0 2 2,2 k• = k(x) dx, k• = k (x) dx, k• = k 00 (x)2 dx. 0
0
0
We recall the main asymptotic result about the RK in [Barndorff-Nielsen et al., 2008], i.e. Theorem 4 (p. 1493), in the following theorem. Theorem 1. (Theorem 4 in [Barndorff-Nielsen et al., 2008]) When H = cn2/3 , we have 1/6
n
Z K− 0
T
σu2 du
LX
→ M N 0, 4T |
Z
T
0
σu4 du ck•0,0 + c−2 (k 0 (0)2 + k 0 (1)2 )ξ 4 , {z } (RK,c,1)
AV AR[0,T ]
where M N denotes a mixed normal distribution. When k 0 (0)2 + k 0 (1)2 = 0, m → ∞, and H = cn1/2 , we have Z T Z T 0,0 LX 1/4 2 4 −1 1,1 2 −3 2,2 4 n K− σu du → M N 0, 4T σu du ck• + c 2k• ρξ + c k• ξ . 0 | 0 {z } (RK,c,2)
AV AR[0,T ]
9
A straightforward application of Theorem 1 on each block i = 1, · · · , B yields Z Ti LX (RK,ci ,1) 1/6 n Ki − σu2 du → M N 0, B 1/3 AV AR[Ti−1 ,T , i]
(3.1)
Ti−1
where ∆B = T /B is defined as the block length and ci is the tuning parameter used on the ith block and Z Ti LX (RK,ci ,2) 1/4 2 1/2 n Ki − σu du → M N 0, B AV AR[Ti−1 ,Ti ] . (3.2) Ti−1
˜ is equal to the sum of variance terms in (3.1) in Next we show that the AVAR associated to K the case of the slowest RK and (3.2) when considering the fastest RK. Theorem 2. (CLT for local RK) When H = cn2/3 with c = (c1 , · · · , cB ), we have 1/6
n
˜− K
Z
T
0
B X LX (RK,ci ,1) σu2 du → M N 0, B 1/3 . AV AR[Ti−1 ,T i]
(3.3)
i=1
When k 0 (0)2 + k 0 (1)2 = 0, m → ∞, and H = cn1/2 , we have 1/4
n
˜− K
Z
T
σu2 du
LX
→ M N 0, B
0
1/2
B X
(RK,ci ,2) . AV AR[Ti−1 ,T i]
(3.4)
i=1
Remark 1. The requirement that m → ∞ in (3.4) is due to end-effects. The reader should refer to the discussion in [Barndorff-Nielsen et al., 2008] (p. 1493) in the case B = 1. When m is fixed, the relative contribution7 to the AVAR is proportional to ξ 2 /m, as it was already the case for the RK. [Barndorff-Nielsen et al., 2009] documented that this magnitude can reasonably be ignored in practice. To determine the B tuning parameters that minimize the AVAR in (3.3) and (3.4), we can consider each local AVAR independently as they depend on one distinct tuning parameter. To keep short8 , we treat only the more interesting case of the fastest RK with convergence rate n1/4 obtained in the convergence (3.4). For that purpose, we follow Section 4.3 in [Barndorff-Nielsen et al., 2008] (p. 1494-1496) and consider that p (H (1) , · · · , H (B) ) = (c1 ξ0,T1 , · · · , cB ξTB−1 ,T ) n/B. The optimal values are then shown to be equal to s q k•1,1 ∗ ci = ρTi−1 ,Ti 0,0 1 + 1 + 3d/ρ2Ti−1 ,Ti . k• 7
The corresponding expression can be found in the second term in (10.5). The following can also be adapted in the special case with n1/6 , considering Section 4.2 in [Barndorff-Nielsen et al., 2008] (p. 1493-1494), with no further difficulties. 8
10
The corresponding AVAR is equal to (RK,c∗i ,2) AR[Ti−1 ,T i]
AV
Z
Ti
3/4
σu4 du
= ω ∆B
g(ρTi−1 ,Ti ),
Ti−1
where g is considered here as a function of ρ. ˜ goes Finally, we show that when choosing the optimal values, the AVAR associated to K (Bound) to AR[0,T ] when B → ∞. The constant g(1)/8, when normalized to g(1)/8 − 1, corresponds to the parametric loss and depends solely on the shape of the kernel. The rationale of such result is that when B increases we have the volatility roughly constant on each block and thus g(1) AV 8
B X
B
1/2
(RK,c∗i ,2) AR[Ti−1 ,T i]
AV
=
i=1
≈
B X i=1 B X
ωB
1/2
Z
Ti
∆B
3/4 σu4 du g(ρTi−1 ,Ti ),
Ti−1 3/2
ωB 1/2 ∆B σT3 i−1 g(1).
i=1
Next we obtain by a Riemann sum argument that B X
ωB
1/2
3/2 ∆B σT3 i−1 g(1)
= ωT
1/2
i=1
B X
∆B σT3 i−1 g(1),
i=1 T
≈ ωT 1/2
Z
σu3 dug(1),
0
which can be expressed as sition.
g(1) AV 8
(Bound)
AR[0,T ]
. The formal result is given in the following propo-
Proposition 3. (Convergence of local RK AVAR) When B → +∞, we have B
1/2
B X
(RK,c∗ ,2) a.s.
i AV AR[Ti−1 ,T → i]
i=1
g(1) (Bound) AV AR[0,T ] . 8
(3.5)
Remark 2. In particular, the asymptotic loss for B → +∞ is g(1)/8 − 1, which is always smaller than L(RK) = gκ−1 /8 − 1 when using the RK with B = 1. The proof of this statement can be found in Appendix 10.6.
4
QMLE
In analogy with Section 3, we provide in this section a definition of the local estimator and equivalent asymptotic results in the case of the QMLE.
11
4.1
Local QMLE definition
We consider first the setting B = 1 where the local QMLE is equal to the global QMLE. We recapitulate the parametric approach, and introduce the quasi-estimator. [Aït-Sahalia et al., 2005] studied the parametric case assuming that the latent efficient log price process satisfies dXt = σdWt .
(4.1)
The observed log returns Yi = Zτi − Zτi−1 are following a MA(1) process in that situation. If we postulate that the noise distribution is Gaussian, then the log likelihood function for Y = (Y1 , · · · , Yn )0 can be expressed as n 1 1 l(σ 2 , ω 2 ) = − log det(Ω) − log(2π) − Y 0 Ω−1 Y, 2 2 2 where
σ 2 ∆ + 2ω 2
−ω 2
0
−ω 2
σ 2 ∆ + 2ω 2
−ω 2
0 .. .
−ω 2 .. .
σ 2 ∆ + 2ω 2 .. .
0
···
0
Ω=
··· 0 .. .. . . .. . 0 .. . −ω 2 −ω 2 σ 2 ∆ + 2ω 2
.
We define the corresponding MLE as (b σ2, ω b 2 ) and the estimator of integrated volatility as Q = Tσ b2 . When the log price Xt features stochastic volatility and drift as in Section 3.1 and/or when the noise is not normally distributed, (b σ2, ω b 2 ) is seen as the QMLE. When B > 1, we define the aggregate version of the QMLE as B X
˜= Q
Qi ,
i=1
where (b σi2 , ω bi2 ) are defined as the MLE on the ith block and Qi = ∆B σ bi2 .
4.2
Asymptotic theory
We provide in the following theorem the main result in [Xiu, 2010] (Box V, p. 240). Theorem 4. (Theorem 5 in [Xiu, 2010]) We have !! ! RT (QM LE) AV AR 0 0 n1/4 Q − 0 σu2 du LX [0,T ] → MN , , 1/2 2 2 0 n ω b −ω 0 2ω 4 + cum4 (U ) where we recall that AV
(QM LE) AR[0,T ]
=
5T ω RT 0
RT 0
4 du σu
1/2 + 3ω
2 du σu
the fourth cumulant of U0 . 12
RT 0
3/2 σu2 du . Also, cum4 (U ) refers to
An obvious application of Theorem 4 for each block i = 1, · · · , B gives us that Z Ti LX (QM LE) 1/4 n Qi − σu2 du → M N 0, B 1/2 AV AR[Ti−1 ,Ti ] . Ti−1
˜ can be decomposed as a sum We show in the following theorem that the AVAR associated to Q of local AVARs scaled by B 1/2 . Theorem 5. (CLT for local QMLE) We have ! R ˜ − T σu2 du n1/4 Q 0 LX 0 → M N , P B 0 n1/2 B −1 i=1 ω bi2 − ω 2
B 1/2
(QM LE)
PB
i=1 AV AR[Ti−1 ,Ti ]
!!
0 2ω 4 + cum4 (U )
0
˜ goes to AV AR(Bound) when B increases. We show now that the AVAR associated to Q [0,T ] Proposition 6. (Convergence of local QMLE AVAR) When B → +∞, we have B
1/2
B X
(QM LE) a.s.
(Bound)
AV AR[Ti−1 ,Ti ] → AV AR[0,T ]
.
(4.2)
i=1
5
AVAR reduction
From an empirical viewpoint, Proposition 3 and Proposition 6 are only partly informative since they consider the case B → ∞. We aim to deliver to the econometrician insights on the error magnitude when B is fixed. In that case, the right quantities to assess uncertainty are the AVARs provided in Theorem 2 and Theorem 5, but the formula are rather involved. In this section, we propose to express those formula on a realistic example and investigate the practical question of how fast the convergence in (3.5) and (4.2) is. In addition, we document that there is almost no further theoretical gain considering B bigger than 8. For that purpose, we define the fastest RK AVAR and the loss when B is fixed as (RK)
AV ARB
= B 1/2
B X
(RK)
(RK,c∗ ,2)
(RK)
i AV AR[Ti−1 ,T , LB i]
=
i=1
AV ARB
(Bound)
− 1,
AV AR[0,T ]
and the corresponding QMLE quantities as AV
(QM LE) ARB
=B
1/2
B X
(QM LE)
AV
(QM LE) AR[Ti−1 ,Ti ]
i=1 (RK)
(QM LE)
,
(QM LE) LB
=
AV ARB
(Bound)
− 1.
AV AR[0,T ]
The values of LB and LB are plotted as a function of ρ in the upper panels of Figure 2 for a realistic continuous U-shape with one jump volatility model where the sample mean .77
13
.
corresponds to a regular value of ρ9 . As we can see, the convergence in (3.5) is very fast. When ρ = .77, the QMLE loss is almost divided by 4 when considering 2 blocks instead of 1, with (QM LE) (QM LE) (RK) L1 ≈ 16% and L2 ≈ 5%. In the same setting the RK loss goes from L1 ≈ 16% (RK) to L2 ≈ 8%. If we consider the lower value ρ = .62, the QMLE losses for the first four values (QM LE) (QM LE) (QM LE) (QM LE) of B are equal to L1 ≈ 35%, L2 ≈ 19%, L3 ≈ 11% and L4 ≈ 6%. The (RK) (RK) (RK) (RK) corresponding RK values are L1 ≈ 28%, L2 ≈ 17%, L3 ≈ 11% and L4 ≈ 8%. This (RK) suggests that the convergence to the loss bounds (which we recall to be equal to L∞ = 3.625% (QM LE) when considering the RK Tukey-Hanning 2 and L∞ = 0% for the QMLE) is very fast for both approaches. Actually for any reasonable ρ taken to be between .5 and 1, choosing B = 8 (RK) (QM LE) is big enough for the loss to stay within L∞ + 4% (or L∞ + 4%), and it is usually far below this threshold with regular and high values of ρ. This indicates that in most realistic settings, there is only very small marginal gain to use a B bigger than 8. Moreover, we can see on the left lower panel in Figure 2 that when ρ is relatively high, the QMLE outperforms the RK approach when considering B = 1, and the gap gets bigger as we increase B. In contrast when ρ < .77, the QMLE is outperformed when considering only one block, but eventually makes it back when incrementing the value of B. The actual value required to fill up the gap is getting bigger as ρ decreases. This suggests that both approaches are complementary to each other. Finally, the lower left panel in Figure 2 documents that both approaches dominate the PAE regardless of the number of blocks.
6
Robustness
We discuss in this section possible applications of the local method when weakening the assumptions on the price process, the observation times and the microstructure noise. Some arguments are purely heuristic.
6.1
Jumps
We allow in this section the price Xt to feature finite activity jumps. As far as the authors know, the asymptotic theory of the RK remains unknown, although [Barndorff-Nielsen et al., 2008] (Section 5.5, pp. 1508-1510) suggest that the RK stays consistent with the same rate of convergence. The global QMLE is robust to the presence of jumps as shown in Theorem 2 (p. 10, [Ait-Sahalia and Xiu, 2016]). Accordingly, the proof of Theorem 5 can be adapted in this more general framework and thus the local QMLE is robust to jumps in Xt . An alternative possibility for both methods consists in removing first the jumps as in, e.g. [Mancini, 2001]. 9
Useful details on this model can be found in Section 7.
14
6.2
Stochastic and endogenous observation times
We follow and adapt here the analysis of [Barndorff-Nielsen et al., 2008] (Section 5.3, pp. 15051507). This is also very much connected to the quadratic variation of time (see, e.g., Assumption A on p. 1939 in [Mykland et al., 2006]). We assume that the observation times are of the form τi = Γi∆ , (6.1) Rt 2 ˜ t a strictly positive and cadlag ˜ u du, with Γ where Γt is a stochastic process satisfying Γt = 0 Γ ˜ t = XΓt so that for X ˜ t the process. We can then construct a (change of time) process X observations are regular. If we assume that Xt and Γt are adapted to a filtration Ft , then by Dambis Dubins-Schwarz theorem (see, e.g., Theorem 1.6 on p. 181 in [Revuz and Yor, 2013]) ˜ Γ − [Z] ˜Γ we have on each block [Z]Ti − [Z]Ti−1 = [Z] . The consequence of this analysis Ti Ti−1 is two-fold. First, we can apply local estimators even though the local observations are not regular. Second, it works even if the number of observations differ from one block to another. This works for both approaches. If the endogeneity of observation times is such that (6.1) doesn’t hold, it is unsure what would happen to the global RK. In contrast it might be possible to add to the misspecification of the QMLE information on endogenous observation times. This is mentioned in Section 4.3.2 in [Xiu, 2010] for the simple case where observation times are i.i.d and independent from the price process. Actually, we can change the model to incorporate more general settings such as the HBT model introduced in [Potiron and Mykland, 2016] as long as we can exhibit a parametric approximate model.
6.3
Heteroskedastic noise
[Barndorff-Nielsen et al., 2008] mention on p. 1492 that the proofs of Theorem 3 in their work can be adapted to account for heteroskedastic noise. As the proofs of Theorem 2 (CLT for local RK) mostly build on that proof, it means that they can be adapted to the heteroskedastic noise case. For the QMLE we can add another misspecification about the noise variance.
6.4
Serial-dependent noise
One can refer to Section 5.4 (pp. 1507-1508) for the global RK case. The authors relax the assumption that Uτi is independent from Uτj , and they still obtain the same rate of convergence (although with a different asymptotic distribution) when using the fastest kernels. Accordingly, if we assume that Uτi is serial-dependent up to a finite lag, this implies that there is no problem of correlation between block terms in the proof and thus we obtain the same rate of convergence with the local RK. For the QMLE, we can change the model to include the information about the serial-dependence of the microstructure noise. 15
6.5
Endogenous noise
We follow Section 5.5 (p. 1508) in [Barndorff-Nielsen et al., 2008]. We introduce a simple model of endogeneity ¯ H X Uτi = βh (Xτi−h − Xτi−1−h ) + U¯τi , h=0
with U¯t is a white noise independent of Xt . The global RK is robust to this kind of model, and similarly the local RK will be robust as the dependence between blocks will be controlled by ¯ For the QMLE, it would again be possible to include the distribution of the endogenous H. noise into the model.
7 7.1
Numerical study Goal of the study
˜ B and the local QMLE Q ˜B In this section, we examine the performance of the local RK K in a finite sample context for several values of B. We carry out Monte Carlo simulations for three different volatility models having realistic values of ρ. We then check whether asymptotic approximations of several statistics correctly kick in to illustrate to what extent the theory is affected when the sample data is finite of size n. First, we assess the central limit theories for the two infeasible statistics R ˜ B − T σu2 du L K ˜B K 0 → N (0, 1), Zn = q (RK) AV ARB and ˜ ZnQB
R ˜ B − T σ 2 du L Q u 0 =q → N (0, 1), (QM LE) AV ARB
for B = 1, 2, 4, 6, 8 in accordance with Section 5 where we showed that there is almost no further theoretical gain choosing a B bigger than 8. In particular, we investigate how increasing B affects the normal approximation of these two studentizations for several levels of sampling. Second, we compare the relative performance of the local RK, the local QMLE, and the adaptive version of the PAE defined in [Jacod and Mykland, 2015] that we denote by P . To do so, we ˜B, Q ˜ B , P } the empirical loss defined as report for any estimator Σ ∈ {K " R T 2 2# (Σ − σs ds) 0 ˜ Σ = EM − 1, L (Bound) AV AR[0,T ]
16
where EM [X] denotes the sample mean of X based on the M Monte Carlo simulations and 1 RT (Bound) we recall that AV AR[0,T ] = 8ωT 2 0 σu3 du is the bound of efficiency for the asymptotic (Σ) variance. We also compute the sample mean of the theoretical loss LΣ B = EM LB , where Σ (Σ) ∈ {(RK), (QM LE), (P )}. Note that LΣ B are close to the mean loss E LB if M is large ˜ Σ , which gives us a simple criterion to compare the estimators, enough. The empirical loss L can be decomposed as ˜Σ L B =
LΣ B |{z}
+
theoretical loss due to the finiteness of B
7.2
Σ ˜Σ (L B − LB ) | {z }
.
loss due to the finite sample n
Simulation design
We implement the above procedures for M = 10, 000 Monte Carlo simulations of intraday returns on the time interval [0, T ], T = 1/252 year (that is T = 1 working day). One working day is in turn subdivided in 23, 400 seconds corresponding to 6.5 hours of trading activity. For each model, the corresponding trajectories are generated from a classical Euler scheme based on n = 46, 800 intervals, that is one observation every 0.5 second. We simulate 1000 more observations prior and post main trading period in order to compute properly different Pn−H (Z∆j − γh that are necessary for the RK. Indeed, using their truncated versions γ˜h = j=H+1 Z∆(j−1) )(Z∆(j−h) −Z∆(j−h−1) ) tend to generate a non-negligible bias as pointed out in [Xiu, 2010] (see Table 2 on p. 243), so that we prefer to overcome this issue with a few minutes of outof-sample data. Finally, we also use observations based on sparsely sampled versions of the original trajectories, for a number of intervals taking on the values 23, 400, 11, 700, and 5, 850, the latter corresponding to having one observation every 4 seconds, which still corresponds to a fairly heavily traded stock. We do not report the results for lower frequencies, but the theory still kicks in for sparser samplings too. We consider three stochastic volatility models to simulate the intraday returns, along with two levels of mean noise-to-signal ratios ξ 2 = 0.001 and ξ 2 = 0.0002. The two values are empirically corroborated in [Hansen and Lunde, 2006], where the authors report empirical values of ξ 2 for several stocks ranging from 0.00004 to 0.006 (see Table 3 on p. 147). We introduce now the volatility models, which have been designed to reflect different average values of ρ ranging from 0.89 (corresponding to a high value) for Model 1, 0.77 (corresponding to a regular value) for Model 2 to 0.64 (corresponding to a low value) for Model 3 as reported on Table 1. The three models can all be represented as a Heston stochastic volatility model (SV) with U-shape intraday volatility pattern and a possible jump whose occurrence time is picked up uniformly randomly on a subinterval [T (0) , T (1) ] of [0, T ]. Except for the jump component, this general model is directly inspired from Model 4 in [Andersen et al., 2012], and [Xiu, 2010] (see Section 6.1 on p. 242). We assume that the log price process Xt and the volatility process σt follow the
17
dynamic dXt = µdt + σt− dWt , σt = σt,SV σt,U , with 2 2 ¯ t, dσt,SV = α(¯ σ 2 − σt,SV )dt + δσt,SV dW
σt,U = C + Ae−at/T + De−b(1−t/T ) − βστ −,U 1{t≥τ } . ¯ t are two standard Brownian motions with dhW, W ¯ it = φdt. Note that σt,U Here Wt and W jumps at time τ , that we define as a uniform random variable on [T (0) , T (1) ]. β controls the size of the jump. The choice of making σt,U jumps, instead of the global volatility σt , is merely a way to ensure that σt remains positive. Finally, the drift parameter µ and the stochastic volatility part remain constant for each model. The corresponding parameters are chosen consistently with the ones from Section 6.1 (p. 242) in [Xiu, 2010], that is µ = 0.03, α = 5, σ ¯ 2 = 0.1, 2 is sampled from a Gamma distribution of parameters δ = 0.4, φ = −0.75. Finally, σ0,SV 2 2 2 (2α¯ σ /δ , δ /2α), which corresponds to the stationary distribution of the CIR process. Model 1: SV + steep U (HIGH ρ) The first model does not incorporate the jump in volatility, i.e. we set β = 0. The parameters of the U-shape part are set to generate a steep sloap, which in turn lowers somewhat the value of ρ compared to Model 4 in [Andersen et al., 2012] where we find that the corresponding mean ρ value is too high to be consistent with ρhigh (which we recall is the empirical high value reported in Section 2). With C = 0.83, A = 1.26, D = 0.42, a = 10, b = 10, this model presents a sample mean value of ρmean = 0.89, which is slightly bigger than ρhigh = 0.83. We are conservative in this first model to show what happens to the local method in a very unlikely bad situation for AVAR reduction, i.e. a very high ρmean . Model 2: SV + normal U + 1 Jump (REGULAR ρ) In this model, the U-shape intraday volatility parameters are set to values that are consistent with those chosen in Model 4 in [Andersen et al., 2012], that is C = 0.75, A = 0.25, D = 0.89, and a = b = 10. The jump size parameter is set to β = 0.5, that is a jump of 50% in size at the random time τ . We set T (0) = 0, T (1) = T and thus let τ take values on the whole time interval. Such friction in the volatility process leads to lower values of ρ and κ compared to Model 1, with a sample mean equal to ρmean = 0.77. This is thus a very realistic model in terms of measure of heteroskedasticity as ρmean = ρregular . It is also possible to obtain ρmean = ρregular in an alternative continuous volatility model with normal U by taking a 2-factor stochastic volatility 18
model (SV2F) as in [Barndorff-Nielsen et al., 2008] (Section 6.2, p. 1511), with parameters tuned such that the trajectories are rough enough. The results from Section 7.3 would be similar. As a byproduct, Model 2 shows that a jump in the volatility can lower significantly the measures of heteroskedasticity ρ and κ. Model 3: SV + steep U + 1 Jump (LOW ρ) This last model is a combination of the first two models. U-shape volatility parameters are set to give the same sloap as for Model 1, and the jump size parameter is set to β = 0.5 as in Model 2. However, to keep the positivity of σt we restrain the values of the jump time and set T (0) = 0.05T , T (1) = 0.7T . This third scenario is designed to reach volatility paths presenting an heteroskedasticity with a low value of ρ and we report the sample mean ρmean = 0.64, which is almost equal to ρlow = 0.62. We are in the situation where the global estimators should deviate the most from the bound of efficiency. We now turn to the estimation procedure. First, to estimate K on [0, T ], we work with the Tukey-Hanning 2 kernel as for the numerical study in [Barndorff-Nielsen et al., 2008] (Section 6, pp. 1510-1513) since it requires reasonable bandwidth sizes H, which makes the estimator computable in an acceptable amount of time. Moreover, we do not need too many out-of-period data to compute γh . We focus on the feasible adaptive estimator, that is, first we pre-estimate √ b that we use in a second step to implement the the oracle quantity H ∗ = c∗ ξ n by some H, RK estimator. In view of the expressions of ρ and ξ, estimating H ∗ amounts clearly to preRT RT estimate the triplet (ω 2 , 0 σs2 ds, 0 σs4 ds). The noise variance is estimated by the classical 2 P formula ω ˆ 2 = (2n)−1 nj=1 Z∆(j+1) − Z∆j . The other two quantities are replaced by their pre-averaged versions (formulas 3.6 and 3.14 in [Jacod et al., 2009]) with the tuning parameter √ √ θ = k ∆ arbitrary set to T , and the triangular kernel g(x) = x ∧ (1 − x). In practice, we find b in terms of RMSE, so that that the realized kernel is not very sensitive to the dispersion of H it is not absolutely necessary to get very accurate pre-estimators. Such robustness proved to be crucial in our procedure as it is well known that estimators for the quarticity can be unstable in finite sample when the amount of data is not large. On blocks [Ti−1 , Ti ], we redo the same ˜ B by aggregation. procedure and obtain the corresponding K Next we compute the QMLE by a numerical maximization of the quasi-likelihood function ˜B. given in Section 4.1. This gives us Q and the local estimates Q Finally, for comparison purpose, we implement P following closely the adaptive procedure described in [Jacod and Mykland, 2015], using again the triangular kernel. Accordingly, we pre-estimate the integrated volatility and the noise variance on each block by the same method as for the RK. We obtain then a local estimator for the tuning parameter on each block that we use to compute P .
19
7.3
Results and discussion ˜
˜
We first report the finite sample properties of the two statistics ZnQB and ZnKB in Table 2 and Table 3 for Model 2 under the noise level ξ 2 = 0.001. We can see that the results are promising at any level of sampling, as the RMSE of the Z-statistic does not suffer much from ˜ the increasing in the number of blocks, especially for the QMLE for which the RMSE of ZnQB stays closely in line with ZnQ . The results also indicate that the asymptotic theory eventually kicks in for all the estimators as the standard deviation of the statistics decreases to 1 when the sampling frequency increases. Nevertheless, we can see a slight overdispersion compared to what was reported in [Xiu, 2010] and [Barndorff-Nielsen et al., 2008]. For the QMLE, this is due to the strong difference with the noise-to-signal ratio that was used in [Xiu, 2010] where ξ 2 ≈ 0.06. Concerning the RK, the difference in the studentization is due to the fact that the (RK) authors in [Barndorff-Nielsen et al., 2008] do not employ AV AR[0,T ] for the studentization, but a non-asymptotic variance as documented in Section 4.4 (pp. 1496-1498) of their work. ˜ Σ for two levels of We then report the theoretical loss values LΣ and the empirical loss L B
B
sampling n = 23, 400 and n = 46, 800, and two levels of noise-to-signal ratios ξ 2 = 0.001 and ξ 2 = 0.0002 in Table 4. First, we can note that the theoretical loss behaves in a very similar way as in Section 5 for the three models. In particular, this implies that neither the SV part nor the steep U component seems to have a bad impact for the local method. Also, one can see that when choosing B = 8 the theoretical loss is at most 3.2% more than the parametric loss (which we recall to be equal to 3.625 % for the RK Tukey-Hanning 2 and 0% in the case of the QMLE), which are in line with the threshold found in Section 5. Second, the loss due to the finite sample behaves in a very proper way when B increases. For any setting and both estimators, it is roughly constant as a function of B, although suffering more when ρ is higher and n smaller. This is perfectly in line with the findings in Table 2 and Table 3. In particular for Model 2 and Model 3, the finite sample effect is almost not moving as B increases. For Model 1, this is basically the same picture for the QMLE, but the empirical loss seems to stagnate between B = 4 and B = 8 when using the RK. This is not surprising as the RK suffers more from the finite sample effect than the QMLE as seen in Table 2 and Table 3. It seems to indicate that for those specific cases there is room for improvement of the local ˜ in finite sample rather than the rule-of-basis B = 8. method by possibly using an adaptive B This is true but the reader should keep in mind that Model 1 is rather improbable in terms of ρ with a very high value. We can see that the gain is very small in this setting anyway, and that choosing an adaptive B for such a small gain (here less than 1 %) could actually deteriorates ˜ the loss in practice due to the pre-estimation part of B.
20
Third, note that the decomposition ˜Σ L B +1 ≈
LΣ B +1 | {z }
×
Due to the theoretical loss
VarM ZnΣ | {z }
,
(7.1)
Due to the finite sample
where VarM [X] denotes the sample variance of X based on the M Monte Carlo simulations, is numerically well-verified and gives an intuitive interpretation of the main sources of deviation ˜ 2 on Model 2, with n = 23, 400, ξ 2 = 0.001. from the bound in practice. For instance, consider Q In that case, the previous decomposition (7.1) gives 0.218 + 1 = 1.218 for the left hand side, and (0.124 + 1) × 1.0482 ≈ 1.220 for the right hand side which is very close to the other value indeed. Finally, this simulation study indicates that the local version of RK and the QMLE performs very well in practice, with the QMLE slightly more robust to the values of n and B as free of tuning parameters. In the global case, this was one known advantage of the two methods compared to the alternatives. We can see in Table 4 that with both realistic levels of noise, the adapted version of the pre-averaging estimator is very far off the target, although the theoretical loss is very close to the theoretical losses of RK and the QMLE. A careful investigation shows that such deviation in the empirical loss is mainly due to a large bias compared to the bound of efficiency when the noise-to-signal ratio decreases. Moreover, although we haven’t implemented the spectral estimator for proper comparison in the current simulation design, the simulation study on pp. 4572-4573 in [Altmeyer and Bibinger, 2015] suggests that despite its efficiency, its behaviour in finite sample is not as appealing as the local estimates of the RK and QMLE approaches.
8
Empirical illustration
We conclude this study by the application of our method on transaction log prices of Intel Corporation (INTC) shares recorded on the NASDAQ stock market over the year 2015. We exclude January 1, the day after Thanksgiving and December 24 which are less active, thus this leaves us with 250 trading days of data. Moreover, we only keep transactions that were carried out between 9:30am and 4pm. Finally, we consider the data in tick time, for an average of 6, 139 daily trades. The most active days include more than 15, 000 trades. We first estimate the theoretical gain in AVAR of the local estimators compared to the global ones with B = 1, 2, 4, 6, 8. To do so, we estimate the integrated volatility, the integrated quarticity and ρ and ρTi−1,Ti for i = 1, · · · , B and we plug the results in the formulas (QM LE)
AV ARB
(QM LE)
(RK)
and
AV AR1
AV ARB
(RK)
AV AR1 21
which can be expressed as functions of those variables. We consider the above ratios to assess the performance of the local method and not the losses as in the rest of the paper because as far RT as we know, there is no known estimator of tricity 0 σs3 ds in noisy observations, and thus the (Bound) bound of efficiency AV AR[0,T ] cannot be estimated. In order to get stable values of ρ and ρTi−1,Ti , and for the sake of brevity, we opt for a simple method and we choose to estimate both the integrated volatility and the integrated quarticity by pre-averaging the log-returns on 30 second windows10 as one method used in [Andersen et al., 2014]. Although such a choice could be discussed more thoroughly, designing a more efficient estimator of ρ is beyond the scope of this paper. Also, we do not cap the block estimates of ρTi−1 ,Ti by 1 because this leads to a drastic downward bias in the numerical results on simulated data, especially when the actual value is close to 1 (e.g. on small blocks). Across the 250 days, different values of B and the corresponding blocks [Ti−1 .Ti ], the value 1.1 was crossed only a few times. Table 5 shows the average values of those estimates over the year. We get a global estimate of ρ around 0.74, which is very close to ρregular . Across the year the estimates of ρ ranged from around 0.3 to 1, and actually crossed 1 for two days where it reached 1.03 and 1.04. When B increases, we find as expected that ρbB , the mean value across days and blocks [Ti−1 , Ti ] of ρTi−1 ,Ti estimated values, also increases to reach a value of 0.86 for 8 blocks. Accordingly, the mean estimated ratios of AVAR decreases from 1 to 0.9 for the QMLE, and from 1 to 0.92 for the RK. Moreover, we find that those ratios are consistently smaller than 1 for the 250 days and different values of B bigger than 1, so that the local method never deteriorates the AVAR of the estimator. Note that the same ratios for Model 2 in our simulation study range from 1 to 102.4/121.5 = 0.84 for the QMLE and from 1 to 105.6/118.2 = 0.89 for the RK. The slight disparity between the empirical study and Model 2 can be explained in several ways. For example, it is likely that we still under-evaluate the difference between ρ and ρTi−1 ,Ti , or that the theoretical model is a little too optimistic about how fast ρTi−1 ,Ti gets close to 1 on local blocks. In any case, the results are approximately in line with what was expected, and present a substantial gain in terms of AVAR for both the QMLE and the RK. Second, we implement the estimators for each day, and we report a few statistics. The last ˜ B − Q and column in Table 5 shows the empirical correlation between the correction terms Q ˜ B − K for several values of B. The positive correlation indicates that the local method tends K to correct the global estimates in the same direction for both the QMLE and the RK. Moreover, increasing the number of blocks B amplifies the phenomenon. Table 6 shows the empirical mean and standard deviation of the 10 estimators. Note that the main source of randomness in the 10
The advised reader will have seen that this window size doesn’t correspond to the size used to report the empirical values from [Andersen et al., 2014] in the introduction and Section 2, which is one minute. Here we chose 30 seconds because we found that the estimator of ρ was less biased with this setting on numerical simulations performed on Model 2.
22
estimators being the target value itself, it is not suprising to find very close values for their respective standard deviation. We have reported in the last column the correlation between each estimator and the global QMLE. Again, we find results very close to 1 for all estimators. One should note that the global RK is less correlated to the QMLE than all the local QMLE ˜ B . This indicates that the order of magnitude of the correction induced by the local method Q is smaller than the difference between the two global estimators. Finally, Figure 3 shows daily ˜ 8 , K and K ˜ 8 in May 2015. We can see that the 95% theoretical confidence intervals for Q, Q confidence intervals for the local estimators are never longer than their global counterparts, and are often much shorter. Moreover, over the year the global and the local estimates confidence intervals always overlap, corroborating the fact that the local estimates are in line with their global versions.
9
Conclusion
In this paper, we have given a deep look into the efficiency of local methods to estimate integrated volatility. We have seen that for the RK and the QMLE, we can reduce the AVAR when B is fixed and retrieve the parametric loss when B goes to infinity. The theoretical gain is mostly preserved when looking at finite sample results. Given how simple to implement the methodology is, we expect that it will be very helpful for practitioners. Local methods can actually be applied to other estimators and problems, such as the highfrequency covariance estimation, the estimation of quarticity, the leverage effect, the volatility of volatility, etc. We expect similar gains for several popular estimators (such as the QMLE considered in [Aït-Sahalia et al., 2010] to estimate integrated quarticity) in those problems as well. In addition, a direction towards future research could be to look for an optimal B in finite sample.
10 10.1
Appendix: proofs Stable convergence
We define the canonical filtration generated by Xt as Ft = σ({Xu }0≤u≤t ). We introduce the definition of stable convergence, which is slightly stronger than the usual convergence in distribution and needed for the purpose of inference such as convergence of studentization and the construction of a confidence interval. Definition 1. (Stable convergence) Let Zn be a sequence of FT -measurable random variables. We say that Zn converges stably in distribution to Z as n → ∞ if Z is measurable with respect
23
to an extension of FT so that for all A ∈ FT and for all bounded continuous functions f , E [1A f (Zn )] → E [1A f (Z)] as n → ∞. Note that the continuity of f refers to continuity with respect to the Skorokhod topology of D[0, T ] (i.e. the space of all cadlag functions on [0, T ]). Nonetheless, since all our limits are in C[0, T ] (i.e. the space of all continuous paths on [0, T ]), we can also use continuity given by the sup-norm. Chapter VI of [Jacod and Shiryaev, 2003] can be consulted as a reference. Other definitions of stable convergence are available in [Rényi, 1963], [Aldous and Eagleson, 1978], Chapter 3 (p. 56) of [Hall and Heyde, 1980], [Rootzen, 1980] and Section 2 (pp. 169-170) of [Jacod and Protter, 1998].
10.2
Removing the drift and bounding volatility
Since we want to prove stable convergence, in view of the local boundedness of σt , and because (2) inf t∈(0,T ] λt > 0, we can without loss of generality assume that for all t ∈ [0, T ] there exists ˜t we have some nonrandom constants σ and σ such that for any eigenvalue λt of σ 0 < σ < λt < σ,
(10.1)
by using a standard localization argument such that the one that was used in Section 2.4.5 of [Mykland and Zhang, 2012]. In particular, this implies that the volatility and the volatility of volatility both satisfy (10.1). One can further suppress µt as in Section 2.2 (pp. 1407-1409) of [Mykland and Zhang, 2009], and act as if Xt is a martingale.
10.3
Proof of Lemma 1 2/3
We first show the left hand side inequality, that can be reformulated as κr,s ≥ ρr,s . Note that by an immediate application of Hölder’s inequality we have 2/3 Z s Z s 2 1/3 3 σu du ≤ (s − r) σu du . r
r
Thus, Rs
σu2 du 1/2 Rs (s − r)1/2 r σu4 du R s 3 2/3 σ du 2/3 r u ≤ 1/2 = κr,s . Rs 1/6 4 (s − r) σ du r u r
ρr,s =
For the right hand side inequality, we first consider the domination Z s 1/2 Z s 1/2 Z s 3 2 4 σu du ≤ σu du σu du , r
r
r
24
which is obtained by Cauchy-Schwartz’s inequality. Then we inject this expression in κr,s and we get Rs 3 σ du r u κr,s = 3/4 Rs (s − r)1/4 r σu4 du R s 2 1/2 σ du 1/2 r u ≤ 1/4 = ρr,s . Rs 1/4 4 (s − r) σ du r u
10.4
Proof of Theorem 2
We need to introduce some notation. We consider the block constant processes defined as c˜t = ci where Ti−1 ≤ t < Ti , ρt = ρTi−1 ,Ti where Ti−1 ≤ t < Ti . ξt2 = ξT2 i−1 ,Ti where Ti−1 ≤ t < Ti . Conditions (3.3) and (3.4) in Theorem 2 can be re-expressed respectively as Z T Z T LX 1/6 ˜ 2 1/3 0 2 0 2 4 n K− σu du → M N 0, 4B ∆B σu4 (˜ cu k•0,0 + c˜−2 (k (0) + k (1) )ξ du , u u 0 0 Z T Z T LX 1/4 ˜ 2 1/2 4 0,0 −1 1,1 2 −3 2,2 4 n K− σu du → M N 0, 4B ∆B σu (˜ cu k• + c˜u 2k• ρu ξu + c˜u k• ξu du . 0
0
We also define the kernels for general processes At and Ct as H h − 1 X k K(A, C) = γ0 (A, C) + γh (A, C) + γ−h (A, C) , H h=1
where the realized autocovariance is defined as γh (A, C) =
n X
(A∆j − A∆(j−1) )(C∆(j−h) − C∆(j−h−1) ),
j=1
with h = −H, · · · , −1, 0, 1, · · · , H. We further define Ki (A, C) the estimate on the ith block and we aggregate the local estimates to define the adapted version of K(A, C) as ˜ K(A, C) =
B X
Ki (A, C).
i=1
We follow the same line of reasoning as in the proof of Theorem 4 (p. 1530, [Barndorff-Nielsen et al., 2008]). Accordingly, we just need to show an adapted version of Theorem 3 (p. 1492). Theorem 2 will then follow from Lemma 1 (p. 1523) and Proposition 5 (p. 1524). From now on, we aim to show the adapted version of Theorem 3 (p. 1492, [Barndorff-Nielsen et al., 2008]) which is stated in the following. 25
Theorem 7. (Adapted version of Theorem 3 in [Barndorff-Nielsen et al., 2008]) We ˜ where H ˜ = nα with α ∈ {1/2, 2/3}. As n → ∞ we have that that H = cH, r Z T Z T n ˜ LX 2 0,0 1−α du , σu4 c˜−1 σu du → M N 0, 4k• B ∆B K(X, X) − u ˜ H 0 0 Z T p LX 2 1,1 α ˜ K(X, ˜ ˜ H( U ) + K(U, X)) → M N 0, 8ω k• B σu2 c˜u du , 0 s B X ˜2 H L ˜ )→ c2i . K(U N 0, 4ω 4 (k 0 (0)2 + k 0 (1)2 )B 2α−1 n i=1
assume
(10.2) (10.3) (10.4)
˜ ) is equivalent to In addition, when k 0 (0)2 + k 0 (1)2 = 0, the asymptotic variance of K(U ˜ 3 )B 3α−1 k•2,2 4ω (n/H 4
B X
c3i
B B X 1,1 X √ ˜ k˜•1,1 (ci , ci−1 ) ci ci−1 , (10.5) ci + + (B /Hm) k• α
i=1
where k˜•1,1 (c1 , c2 ) =
R1 0
i=1
i=2
˜ 2 /(mn) → 0 k 0 (x)k 0 (ax)dx with a = min(c1 , c2 )/ max(c1 , c2 ) and if H s B X ˜3 H LX ˜ K(U, X) → N 0, 4ω 4 k•2,2 B 3α−1 c3i . (10.6) n i=1
p ˜ X) − To show (10.2), we consider the continuous interpolated martingale Mt = Hn˜ K(X, Rt 2 σ du , i.e. when we assume that there is an observation at time t. We aim to use Theorem 0 u 5.42 ([Jacod and Shiryaev, 2003], p. 515). The nesting condition 5.41 (which implies condition 5.37) is satisfied with Gtn = Ft . Moreover, [Sup − β5 ] is satisfied on the account that there is no drift in Xt and [δ5,1 − D] since there is no jumps in Xt . To show [γ5 − D], i.e. that Rt PB P (i) (i) are such that [M, M ]t → 4k•0,0 B 1−α ∆B 0 σu4 c˜−1 u du, we express Mt as i=1 Mt , where Mt R p Ti (i) n ˜ 2 Mt = 0 for t ∈ [0, Ti−1 ], the continuous interpolation of σ du on K (X, X) − i u ˜ Ti−1 H [Ti−1 , Ti ] and Mt = MTi for t ∈ [Ti , T ]. We can easily show that [M, M ]t =
B X
[M (i) , M (i) ]t .
(10.7)
i=1
The K(X) case in the proof of Theorem 3 (p. 1528, [Barndorff-Nielsen et al., 2008]) is based on a martingale theorem which shows that Z Ti ∧t P (i) (i) 0,0 1−α [M , M ]t → 4k• B (t − Ti−1 ) σu4 c˜−1 (10.8) u du. Ti−1 ∧t
Rt P In view of (10.7) and (10.8), we have thus shown that [M, M ]t → 4k•0,0 B 1−α ∆B 0 σu4 c˜−1 u du. The proof for (10.3) can adapt directly from the cross-term K(X, U ) + K(U, X) part in the proof of Theorem 3 (p. 1528, [Barndorff-Nielsen et al., 2008]). Indeed, on each block we have 26
the convergence discussed on p. 1525, and it is clear that as the block terms are uncorrelated to each other conditionally on Xt , we obtain the convergence of the vector block estimates, with correlation limit between two different block terms equal to 0. We turn now to the proof of (10.4). Using the part of the proof on the noise term (pp. 1528-1529, [Barndorff-Nielsen et al., 2008]), we have that s ˜2 H L Ki (U ) → N 0, 4ω 4 (k 0 (0)2 + k 0 (1)2 )B 2α−1 c2i . n We now compute the covariance from (A.5) in [Barndorff-Nielsen et al., 2008] (p. 1529). We ˜ −1/2 Ki (U ), Hn ˜ −1/2 Kj (U )) is equivalent to have for i < j that Cov(Hn Cov
w(i) − w(i) 2
1
˜ 1/H
(i)
−1/2
n
(i) V1
(i)
w (i) − wH (i) −1 −1/2 (i) − H n VH (i) , ˜ 1/H
(j) (j) (j) (j) w2 − w1 −1/2 (j) wH (j) − wH (j) −1 −1/2 (j) n V1 − n VH (j) , ˜ ˜ 1/H 1/H Pin/B (i) (i) h−1 = j=(i−1)n/B+1 (Uτj Uτj−h +Uτj Uτj+h +Uτj−1 Uτj−1−h +Uτj−1 Uτj−1+h ). where wh = k( H (i) ) and Vh It is clear now that Cov(Ki (U ), Kj (U )) goes to 0. We aim to show now (10.5). In view of (A.3) on p. 1528 in [Barndorff-Nielsen et al., 2008], we have
˜ )= K(U
B n X
(i)
−
i=1
H X
(i)
(i) (wh+1
−
(i) 2wh
+
(i) (i) wh−1 )Vh
H o X (i) (i) (i) − (wh+1 − wh−1 )Rh ,
h=1
h=1
|
{z
}|
Ai
L
{z Ci
L
(l)
}
(l)
where Ci is due to end-effects. We have that Ai → Ai and Ci → Ci for some normally dis(l) (l) tributed variables Ai and Ci from the proof on p. 1529 in [Barndorff-Nielsen et al., 2008]. Actually, we can show that the convergence still holds for the random vector (A1 , · · · , AB , C1 , · · · , CB ) L ˜ )→ and thus we have that K(U N (0, V ) where V is equal to X (l) (l) (l) (l) Cov(Ai , Aj ) + Cov(Ci , Cj ). (10.9) 1≤i,j≤B
˜ 3 )−1 PB Var(A(i) ) = B 3α−1 k•2,2 PB c3i , which shows the convergence to We have that (n/H i=1 i=1 the first term in (10.5). The second term is obtained as ˜ −1 (B α /Hm)
B X i=1
Var(Ci ) + 2
B X
B B X X √ Cov(Ai , Ai−1 ) = k•1,1 ci + k˜•1,1 (ci , ci−1 ) ci ci−1 .
i=2
i=1
i=2
The other terms in (10.9) go to 0, thus we have shown (10.5). The convergence (10.6) is obtained as a straightforward consequence of (10.5). 27
10.5
Proof of Proposition 3 (RK,c∗ ,2)
i takes on the form AV AR[Ti−1 ,T i]
AV
(RK,c∗i ,2) AR[Ti−1 ,T i]
Z √ = ω B ∆B
Ti
σu4 du
3/4
g(ρTi−1 ,Ti ).
Ti−1 2
In view of (10.1), we easily obtain 0 < ρ ≤ ρTi−1 ,Ti ≤ 1 where ρ = σσ2 . This gives us the estimate 8 ≤ g(ρTi−1 ,Ti ) ≤ g < ∞ for some g. Let us define for B ∈ N, B ≥ 1, the random set JB := {i ∈ {1, · · · , B}|σ jumps on (Ti−1 , Ti ]}. Because the jumps in σ are of finite activity, almost surely the cardinal of JB , defined as |JB |, (RK,c∗i ,2) for which k is contained tends to a finite value. Thus we can get rid of the terms AV AR[Tk−1 ,T k] into JB because X a.s. (RK,c∗i ,2) AV AR[Ti−1 ,T ≤ |JB |∆B T 1/2 ωσ 3 g → 0, i] i∈JB
and similarly g(1)ωT
1/2
XZ i∈JB
Ti
a.s.
σu3 du → 0.
Ti−1
Thus, the proposition will be proved if we show X XZ (RK,c∗i ,2) 1/2 AV AR[Ti−1 ,Ti ] − g(1)ωT i∈J / B
i∈J / B
Ti
a.s.
σu3 du → 0.
Ti−1
As the continuous part of σ is assumed to be an Itô process with bounded components, some calculation shows that for any p > 0, q ≥ 1, and uniformly in i ∈ / JB we have the following expansion Z Ti 3/2 σup du = σTpi−1 ∆B + OLq (∆B ), Ti−1
q where A = OLq (C), C > 0 means that E CA is bounded. Thus, using again (10.1), we also obR Ti 4 1/2 1/2 tain the expansions ρTi−1 ,Ti = 1+OLq (∆B ), g(ρTi−1 ,Ti ) = g0 +OLq (∆B ), and (∆B Ti−1 σu du)3/4 = R 1/2 Ti ∆B Ti−1 σu3 du + OLq (∆2B ) to get finally the estimate Z Ti (RK,c∗i ,2) 3/2 1/2 AV AR[Ti−1 ,Ti ] = g(1)ωT σu3 du + OLq (∆B ) Ti−1
uniformly in i ∈ / JB . At this stage we have thus proved that X X Z Ti (RK,c∗i ,2) 1/2 1/2 AV AR[Ti−1 ,Ti ] − g0 ωT σu3 du = OLq (∆B ) i∈J / B
i∈J / B
Ti−1
To get the almost sure convergence to 0, we define YB as the left hand side of the previous P P+∞ q q equality and note that E +∞ B=1 |YB | < +∞ for any q > 2. This gives us that B=1 |YB | < a.s. +∞ a.s. and so |YB |q → 0, which concludes the proof. 28
10.6
Proof of Remark 2
We show the inequality g(ρ)κ−1 ≥ g(1) for any admissible couple (ρ, κ). Note that by the domination κ ≤ ρ1/2 obtained on the account of Lemma 1, it is sufficientqto show that the p function f : ρ → ρ−1/2 g(ρ) is decreasing on the interval (0, 1]. We let p(ρ) = 1 + 1 + 3d/ρ2 , and a short calculation shows us that f 0 (ρ) has the same sign as p0 (ρ)(1 − p(ρ)−2 ). Therefore, the inequality p(ρ) ≥ 1 implies that f is decreasing if and only if p is, which is obvious.
10.7
Proof of Theorem 5
We follow and adapt the proofs in [Xiu, 2010]. To do that, we define the 2B-dimensional (1,1) (B,1) (1,2) (B,2) ¯ n where quantities Ψn = (Ψn , · · · , Ψn , Ψn , · · · , Ψn ), and correspondingly Ψ n o n o −1 −1 (i,1) (i,2) Ω(i) ) ∂ log(det Ω(i) ) 1 (i) 0 ∂Ω(i) (i) (i) 0 ∂Ω(i) (i) , (Ψn , Ψn ) = 2√1 n ∂ log(det + Y Y + Y Y 2 2 2 2 ∂σ ∂σ 2n ∂ω ∂ω and (i,1)
(i,2)
¯n ,Ψ ¯n ) = (Ψ
1 √ 2 n
n
∂ log(det Ω(i) ) ∂σ 2
o n o (i) −1 (i) −1 (i) (i) ∂ log(det Ω(i) ) 1 ∂Ω , 2n + tr ∂Ω∂σ2 Σ0 + tr Σ 0 ∂ω 2 ∂ω 2
with for any i = 1, · · · , B, we define Ω(i) = (Ωj,k )1≤j,k≤n/B , Y (i) = (Yj )(i−1)n/B+1≤j≤in/B and R τ(i−1)n/B+1 Ti−1
(i)
Σ0 =
σu2 du + 2ω 2
−ω 2
−ω 2 R τ(i−1)n/B+2 τ(i−1)n/B+1
σu2 du + 2ω 2
0
−ω 2
.. . 0
..
···
−ω 2
..
.
0 .. .
..
.
0
R τ(i−1)n/B+3 τ(i−1)n/B+2
σu2 du + 2ω 2
..
. ···
..
. −ω 2 R Ti 2 σ 2 du + 2ω 2 −ω τin/B−1 u
.
0
We also define the 2B-dimensional quantity Z Z T1 ∗ 2 θn = σu du, · · · , 0
0
TB
σu2 du, ω 2 , · · ·
,ω
2
.
.
TB−1
At this stage note that a straightforward consequence of Theorem 3 ([Xiu, 2010], p. 237) can conclude the proof. Accordingly we thus show that the conditions of the theorem hold. We ¯ n satisfy the conditions of Theorem 2 ([Xiu, 2010], p. 237). This can first show that Ψn and Ψ be actually directly deduced from the proof of Theorem 4 on pp. 247-248 in [Xiu, 2010]. Next P ¯ n is stochastic equicontinuous and |∇Ψn − ∇Ψ ¯ n| → we show that ∇Ψ 0 uniformly. This can be shown as in the proof of Theorem 5 on p. 249 in [Xiu, 2010]. Finally we show (8) in Theorem 3 of [Xiu, 2010]. Following the proof of Theorem 5 on p. 249 in [Xiu, 2010], it is enough to (β) (β) (β) (β) compute the limit of adapted versions of the terms M1 , M2 , M3 and M4 introduced on p. 248 in [Xiu, 2010]. In addition, [Xiu, 2010] points out that the argument of the proof of Lemma 3 on pp. 248-249 is very similar to the proof of Theorem 1 in [Barndorff-Nielsen et al., 2008]. Thus, we can use the same adaptation as used on the proof of Theorem 2 to compute the limit (β) (β) (β) (β) of adapted versions of the terms M1 , M2 , M3 and M4 . 29
10.8
Proof of Proposition 6
This proof follows exactly the same line of reasoning as the proof of Proposition 3.
References [Aït-Sahalia et al., 2010] Aït-Sahalia, Y., Fan, J., and Xiu, D. (2010). High-frequency covariance estimates with noisy and asynchronous financial data. Journal of the American Statistical Association, 105(492):1504–1517. [Aït-Sahalia et al., 2005] Aït-Sahalia, Y., Mykland, P. A., and Zhang, L. (2005). How often to sample a continuous-time process in the presence of market microstructure noise. Review of Financial studies, 18(2):351–416. [Aït-Sahalia et al., 2011] Aït-Sahalia, Y., Mykland, P. A., and Zhang, L. (2011). Ultra high frequency volatility estimation with dependent microstructure noise. Journal of Econometrics, 160(1):160–175. [Ait-Sahalia and Xiu, 2016] Ait-Sahalia, Y. and Xiu, D. (2016). A hausman test for the presence of market microstructure noise in high frequency data. Chicago Booth Research Paper, (16-06). [Aldous and Eagleson, 1978] Aldous, D. J. and Eagleson, G. (1978). On mixing and stability of limit theorems. The Annals of Probability, 6(2):325–331. [Altmeyer and Bibinger, 2015] Altmeyer, R. and Bibinger, M. (2015). Functional stable limit theorems for quasi-efficient spectral covolatility estimators. Stochastic Processes and their Applications, 125(12):4556–4600. [Andersen et al., 2001] Andersen, T. G., Bollerslev, T., Diebold, F. X., and Labys, P. (2001). The distribution of realized exchange rate volatility. Journal of the American statistical association, 96(453):42–55. [Andersen et al., 2012] Andersen, T. G., Dobrev, D., and Schaumburg, E. (2012). Jump-robust volatility estimation using nearest neighbor truncation. Journal of Econometrics, 169(1):75– 93. [Andersen et al., 2014] Andersen, T. G., Dobrev, D., Schaumburg, E., et al. (2014). A robust neighborhood truncation approach to estimation of integrated quarticity. Econometric Theory, 30(01):3–59.
30
[Barndorff-Nielsen et al., 2008] Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., and Shephard, N. (2008). Designing realized kernels to measure the ex post variation of equity prices in the presence of noise. Econometrica, 76(6):1481–1536. [Barndorff-Nielsen et al., 2009] Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., and Shephard, N. (2009). Realized kernels in practice: Trades and quotes. The Econometrics Journal, 12(3):C1–C32. [Barndorff-Nielsen and Shephard, 2002] Barndorff-Nielsen, O. E. and Shephard, N. (2002). Estimating quadratic variation using realized variance. Journal of Applied econometrics, 17(5):457–477. [Clinet and Potiron, 2016] Clinet, S. and Potiron, Y. (2016). Estimating the integrated parameter of the time-varying parameter self-exciting process. arXiv preprint arXiv:1607.05831. [Engle and Patton, 2001] Engle, R. F. and Patton, A. J. (2001). What good is a volatility model. Quantitative finance, 1(2):237–245. [Fukasawa, 2010] Fukasawa, M. (2010). Realized volatility with stochastic sampling. Stochastic Processes and their Applications, 120(6):829–852. [Genon-Catalot and Jacod, 1993] Genon-Catalot, V. and Jacod, J. (1993). On the estimation of the diffusion coefficient for multi-dimensional diffusion processes. Annales de l’IHP Probabilités et statistiques, 29(1):119–151. [Ghysels et al., 1996] Ghysels, E., Harvey, A. C., and Renault, E. (1996). 5 stochastic volatility. Handbook of statistics, 14:119–191. [Gloter and Jacod, 2001] Gloter, A. and Jacod, J. (2001). Diffusions with measurement errors. i. local asymptotic normality. ESAIM: Probability and Statistics, 5:225–242. [Hall and Heyde, 1980] Hall, P. and Heyde, C. C. (1980). Martingale limit theory and its application. Academic press. [Hansen and Lunde, 2006] Hansen, P. R. and Lunde, A. (2006). Realized variance and market microstructure noise. Journal of Business & Economic Statistics, 24(2):127–161. [Jacod, 1994] Jacod, J. (1994). Limit of random measures associated with the increments of a brownian semimartingale. preprint, 120:155–162. [Jacod et al., 2009] Jacod, J., Li, Y., Mykland, P. A., Podolskij, M., and Vetter, M. (2009). Microstructure noise in the continuous case: the pre-averaging approach. Stochastic processes and their applications, 119(7):2249–2276. 31
[Jacod and Mykland, 2015] Jacod, J. and Mykland, P. A. (2015). Microstructure noise in the continuous case: Approximate efficiency of the adaptive pre-averaging method. Stochastic Processes and their Applications, 125(8):2910–2936. [Jacod and Protter, 1998] Jacod, J. and Protter, P. (1998). Asymptotic error distributions for the euler method for stochastic differential equations. Annals of Probability, pages 267–307. [Jacod et al., 2013] Jacod, J., Rosenbaum, M., et al. (2013). Quarticity and other functionals of volatility: efficient estimation. The Annals of Statistics, 41(3):1462–1484. [Jacod and Shiryaev, 2003] Jacod, J. and Shiryaev, A. (2003). Limit theorems for stochastic processes (2nd ed.), volume 288. Berlin: Springer-Verlag. [Kalnina and Linton, 2008] Kalnina, I. and Linton, O. (2008). Estimating quadratic variation consistently in the presence of endogenous and diurnal measurement error. Journal of Econometrics, 147(1):47–59. [Li et al., 2014] Li, Y., Mykland, P. A., Renault, E., Zhang, L., and Zheng, X. (2014). Realized volatility when sampling times are possibly endogenous. Econometric Theory, 30(03):580– 605. [Mancini, 2001] Mancini, C. (2001). Disentangling the jumps of the diffusion in a geometric jumping brownian motion. Giornale dell’Istituto Italiano degli Attuari, 64(19-47):44. [Mancino and Sanfelici, 2012] Mancino, M. E. and Sanfelici, S. (2012). Estimation of quarticity with high-frequency data. Quantitative finance, 12(4):607–622. [Meddahi, 2002] Meddahi, N. (2002). A theoretical comparison between integrated and realized volatility. Journal of Applied Econometrics, 17(5):479–508. [Mykland and Zhang, 2009] Mykland, P. A. and Zhang, L. (2009). Inference for continuous semimartingales observed at high frequency. Econometrica, 77(5):1403–1445. [Mykland and Zhang, 2012] Mykland, P. A. and Zhang, L. (2012). The econometrics of high frequency data. Statistical methods for stochastic differential equations, 124:109. [Mykland et al., 2006] Mykland, P. A., Zhang, L., et al. (2006). Anova for diffusions and ito processes. The Annals of Statistics, 34(4):1931–1963. [Potiron, 2016] Potiron, Y. (2016). Estimating the integrated parameter of the locally parametric model in high-frequency data. arXiv preprint arXiv:1603.05700v1. [Potiron and Mykland, 2016] Potiron, Y. and Mykland, P. A. (2016). Estimation of integrated quadratic covariation with endogenous sampling times. Journal of Econometrics. 32
[Reiß, 2011] Reiß, M. (2011). Asymptotic equivalence for inference on the volatility from noisy observations. The Annals of Statistics, 39(2):772–802. [Rényi, 1963] Rényi, A. (1963). On stable sequences of events. Sankhy¯a: The Indian Journal of Statistics, Series A, pages 293–302. [Revuz and Yor, 2013] Revuz, D. and Yor, M. (2013). Continuous martingales and Brownian motion, volume 293. Springer Science & Business Media. [Robert and Rosenbaum, 2011] Robert, C. Y. and Rosenbaum, M. (2011). A new approach for the dynamics of ultra-high-frequency data: The model with uncertainty zones. Journal of Financial Econometrics, 9(2):344–366. [Rootzen, 1980] Rootzen, H. (1980). Limit distributions for the error in approximations of stochastic integrals. The Annals of Probability, pages 241–251. [Todorov and Tauchen, 2011] Todorov, V. and Tauchen, G. (2011). Volatility jumps. Journal of Business & Economic Statistics, 29(3):356–371. [Xiu, 2010] Xiu, D. (2010). Quasi-maximum likelihood estimation of volatility with high frequency data. Journal of Econometrics, 159(1):235–250. [Zhang, 2001] Zhang, L. (2001). From martingales to ANOVA: Implied and realized volatility. PhD thesis, University of Chicago, Department of Statistics. [Zhang, 2006] Zhang, L. (2006). Efficient estimation of stochastic volatility using noisy observations: A multi-scale approach. Bernoulli, 12(6):1019–1043. [Zhang et al., 2005] Zhang, L., Mykland, P. A., and Aït-Sahalia, Y. (2005). A tale of two time scales: Determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association, 100(472):1394–1411.
33
Table 1: Sample mean and standard error of ρ and κ for the three models. Model
ρmean
ρstdv.
κmean
κstdv.
Model 1 Model 2 Model 3
0.89 0.77 0.64
0.01 0.15 0.14
0.92 0.83 0.74
0.01 0.12 0.1
34
˜
Table 2: Finite sample properties of ZnKB (Model 2)† No. Obs.
Mean
B = 1 block 5,850 -0.042 11,700 -0.032 23,400 -0.030 46,800 -0.027 B = 2 blocks 5,850 -0.065 11,700 -0.048 23,400 -0.042 46,800 -0.037 B = 4 blocks 5,850 -0.105 11,700 -0.082 23,400 -0.069 46,800 -0.059 B = 6 blocks 5,850 -0.144 11,700 -0.114 23,400 -0.099 46,800 -0.086 B = 8 blocks 5,850 -0.193 11,700 -0.154 23,400 -0.128 46,800 -0.109
Stdv.
RMSE
0.5%
2.5%
5%
1.102 1.067 1.044 1.041
1.103 1.068 1.044 1.041
0.29 0.39 0.41 0.46
1.75 1.96 2.13 2.25
1.105 1.069 1.048 1.044
1.106 1.070 1.049 1.045
0.24 0.32 0.37 0.43
1.110 1.074 1.051 1.043
1.115 1.077 1.053 1.044
1.115 1.077 1.054 1.043 1.119 1.08 1.054 1.042
97.5%
99.5%
3.77 96.62 3.98 96.01 4.16 95.63 4.35 95.58
98.62 98.20 97.84 98.18
99.85 99.80 99.70 99.74
1.55 1.85 2.01 2.11
3.53 96.49 3.65 95.89 3.91 95.52 4.10 95.54
98.43 98.20 97.88 98.06
99.82 99.71 99.65 99.71
0.21 0.28 0.36 0.37
1.38 1.54 1.77 1.89
3.02 96.25 3.33 95.74 3.66 95.22 3.89 95.31
98.37 98.15 97.73 97.96
99.81 99.66 99.65 99.64
1.124 1.083 1.059 1.047
0.19 0.23 0.31 0.38
1.23 1.40 1.65 1.65
2.72 95.81 3.09 95.37 3.49 94.95 3.56 94.95
98.33 97.88 97.57 97.76
99.75 99.67 99.60 99.57
1.136 1.091 1.062 1.047
0.15 0.21 0.28 0.32
1.03 1.26 1.52 1.64
2.31 95.40 2.83 95.27 3.27 94.91 3.51 94.72
98.14 97.88 97.56 97.56
99.75 99.66 99.59 99.55
† This
95%
table shows summary statistics and empirical quantiles benchmarked to the N (0,1) distribution for the infeasible Z-statistics related to the global and local RK (Tukey-Hanning 2). The simulation design is Model 2 with M = 10, 000 Monte-Carlo simulations.
35
˜
Table 3: Finite sample properties of ZnQB (Model 2)† No. Obs.
Mean
B = 1 block 5,850 -0.024 11,700 -0.015 23,400 -0.012 46,800 -0.013 B = 2 blocks 5,850 -0.023 11,700 -0.011 23,400 -0.007 46,800 -0.009 B = 4 blocks 5,850 -0.016 11,700 -0.007 23,400 -0.002 46,800 -0.005 B = 6 blocks 5,850 -0.012 11,700 -0.002 23,400 -0.0 46,800 -0.004 B = 8 blocks 5,850 -0.014 11,700 -0.005 23,400 -0.001 46,800 -0.003
Stdv.
RMSE
0.5%
2.5%
5%
1.084 1.058 1.039 1.034
1.084 1.058 1.039 1.034
0.36 0.43 0.51 0.59
2.09 2.26 2.19 2.38
1.086 1.06 1.042 1.036
1.086 1.06 1.042 1.036
0.34 0.42 0.54 0.56
1.089 1.063 1.042 1.035
1.089 1.063 1.042 1.035
1.089 1.062 1.041 1.033 1.093 1.066 1.043 1.033
97.5%
99.5%
4.12 96.48 4.51 96.32 4.48 95.87 4.67 95.74
98.57 98.34 97.97 98.06
99.84 99.75 99.70 99.73
1.90 2.08 2.12 2.22
4.03 96.48 4.28 96.24 4.31 95.71 4.51 95.71
98.46 98.31 98.01 98.08
99.85 99.72 99.66 99.63
0.34 0.42 0.51 0.56
1.98 2.09 2.16 2.20
3.86 96.42 4.19 96.24 4.52 95.64 4.74 95.60
98.57 98.34 98.02 98.08
99.82 99.72 99.65 99.69
1.089 1.062 1.041 1.033
0.36 0.43 0.51 0.57
2.01 2.01 2.10 2.21
3.93 96.46 4.26 96.33 4.63 95.60 4.72 95.64
98.56 98.33 98.06 98.07
99.82 99.74 99.71 99.69
1.093 1.066 1.043 1.033
0.36 0.40 0.47 0.57
1.82 1.94 2.05 2.29
3.83 96.42 4.15 96.33 4.53 95.64 4.67 95.60
98.63 98.37 98.15 98.10
99.82 99.74 99.67 99.66
† This
95%
table shows summary statistics and empirical quantiles benchmarked to the N (0,1) distribution for the infeasible Z-statistics related to the global and local QMLE. The simulation design is Model 2 with M = 10, 000 Monte-Carlo simulations.
36
Table 4: Losses† Model
Q
n = 23, 400, ξ 2 = 0.001 Emp. 17.2% Model 1 Theo. 6.9% Emp. 30.7% Model 2 Theo. 21.5% Emp. 51.1% Model 3 Theo. 38.7% n = 46, 800, ξ 2 = 0.001 Emp. 15.3% Model 1 Theo. 6.9% Emp. 29.7% Model 2 Theo. 21.5% Emp. 47.6% Model 3 Theo. 38.7% n = 23, 400, ξ 2 = 0.0002 Emp. 25.2% Model 1 Theo. 6.9% Emp. 45.5% Model 2 Theo. 21.8% Emp. 64.3% Model 3 Theo. 38.1% n = 46, 800, ξ 2 = 0.0002 Emp. 19.7% Model 1 Theo. 6.9% Emp. 38.9% Model 2 Theo. 21.8% Emp. 57.3% Model 3 Theo. 38.1% † Empirical
˜2 Q
˜4 Q
˜6 Q
˜8 Q
K
˜2 K
˜4 K
˜6 K
˜8 K
P
15.8% 5.4% 21.8% 12.4% 33.0% 20.9%
12.9% 11.7% 11.2% 21.7% 2.6% 1.4% 0.9% 9.9% 14.8% 12.1% 11.2% 28.7% 5.8% 3.5% 2.4% 18.2% 20.6% 16.2% 14.7% 43.4% 9.0% 5.0% 3.2% 26.8%
20.9% 8.4% 23.3% 12.1% 32.8% 17.0%
19.4% 19.7% 6.0% 5.0% 19.7% 19.2% 8.0% 6.4% 26.1% 23.8% 10.3% 7.6%
20.1% 4.5% 19.0% 5.6% 23.5% 6.3%
33.0% 6.8% 54.1% 6.8% 110.2% 6.8%
13.9% 5.4% 20.6% 12.4% 29.2% 20.9%
10.9% 9.6% 9.1% 20.0% 2.6% 1.4% 0.9% 9.9% 13.3% 10.4% 9.2% 28.2% 5.8% 3.5% 2.4% 18.2% 16.9% 12.6% 11.0% 38.3% 9.0% 5.0% 3.2% 26.8%
19.0% 8.4% 22.4% 12.1% 26.8% 17.0%
16.7% 16.3% 16.4% 6.0% 5.0% 4.5% 17.8% 16.5% 15.8% 8.0% 6.4% 5.6% 20.6% 17.6% 17.2% 10.3% 7.6% 6.3%
27.0% 6.8% 49.6% 6.8% 99.6% 6.8%
23.8% 20.6% 19.4% 18.6% 32.8% 31.7% 5.4% 2.6% 1.4% 0.9% 9.9% 8.4% 35.6% 28.2% 25.7% 24.5% 46.2% 40.5% 12.6% 5.9% 3.6% 2.5% 18.4% 12.2% 45.3% 32.5% 28.4% 26.2% 56.0% 46.4% 20.5% 8.8% 4.9% 3.1% 26.6% 16.9%
30.3% 30.0% 6.0% 5.0% 37.0% 36.4% 8.0% 6.4% 40.8% 39.5% 10.2% 7.5%
30.5% 4.5% 36.8% 5.6% 39.4% 6.2%
495.5% 6.8% 863.4% 6.8% 437.1% 6.8%
18.1% 5.4% 29.0% 12.6% 38.3% 20.5%
21.4% 21.0% 6.0% 5.0% 28.5% 27.6% 8.0% 6.4% 31.4% 29.5% 10.2% 7.5%
20.7% 4.5% 27.0% 5.6% 28.1% 6.2%
218.0% 6.8% 403.2% 6.8% 255.8% 6.8%
14.8% 13.6% 12.7% 24.9% 2.6% 1.4% 0.9% 9.9% 22.2% 19.8% 18.1% 37.7% 5.9% 3.6% 2.5% 18.4% 26.3% 22.1% 19.6% 47.3% 8.8% 4.9% 3.1% 26.6%
23.6% 8.5% 31.8% 12.2% 37.1% 16.9%
˜ Σ and theoretical losses LΣ for the three models and the 11 estimators. Two levels of samplings n = 23, 400, losses L B B n = 46, 800 and two noise-to-signal ratios ξ 2 = 0.001 and ξ 2 = 0.0002 are considered.
37
Table 5: Estimates of ρ, AVAR ratio estimates and empirical correlation of corrections.† B
ρbB
1 2 4 6 8
0.74 0.8 0.84 0.85 0.86
(QM LE)
AV ARB
(QM LE)
/AV AR1
(RK)
AV ARB
1 0.96 0.92 0.91 0.9
(RK)
/AV AR1
1 0.97 0.94 0.93 0.92
d Q ˜ B − Q, K ˜ B − K) Corr( 0.689 0.769 0.868 0.879
† For
B = 1, 2, 4, 6, 8, ρbB refers to the empirical mean value of estimates of ρ on blocks [Ti−1 , Ti ] across days and values of i for INTC in 2015. The AVAR ratios are estimated by plugging estimates of the integrated volatility, the integrated quarticity and ρ on blocks of different sizes. The last column shows the empirical correlation between the corrections induced by the local method.
Table 6: Summary statistics for the global and local estimators† Estimator
Mean
d Q) Stdv. Corr(.,
Q ˜2 Q ˜4 Q ˜6 Q ˜8 Q
1.771 1.770 1.766 1.761 1.757 1.818 1.818 1.813 1.808 1.804
1.789 1.781 1.769 1.762 1.753 1.795 1.780 1.770 1.756 1.751
K ˜2 K ˜4 K ˜6 K ˜8 K † Sample
1 ≈1 ≈1 0.9999 0.9999 0.9994 0.9992 0.9991 0.9989 0.9988
means, standard deviations, and correlations with the global QMLE for the 10 estimators implemented for INTC data in 2015. The estimators are scaled by a factor 104 .
38
Figure 1: Feasible loss region for three typical RK (Tukey-Hanning 16, Tukey-Hanning 2, Cubic), the QMLE, the PAE and the MSRV. For each estimator, the lower line corresponds to the lower boundary when considering the best possible scenario κ = ρ1/2 and the upper line stands for the upper boundary in the worst case scenario κ = ρ3/2 . The feasible loss region lies between those two lines.
39
(RK)
(QM LE)
Figure 2: For B = 1, · · · , 8 we plot LB (upper left panel), LB for Tukey-Hanning 2 (RK) (QM LE) kernel (upper right panel), the corresponding AVAR ratio defined as AV ARB /AV ARB (QM LE) (lower left panel) and the ratio of pre-averaging AVAR using B blocks over AV ARB (lower right panel) as a function of ρ. The model considered for volatility is a deterministic U shape + 1 Jump, which corresponds to Model 2 in Section 7 without the stochastic volatility part that doesn’t change the picture when added (see Section 7.3 for more details). Here we generate different values of ρ as a function of the jump time, which we restrict to be in 0.013T, T so that each ρ can be associated to a distinct jump time on that interval. We choose this particular model because the sample mean of ρ is .77 which corresponds to a regular value, and the panel of generated ρ values is sufficiently large.
40
˜ 8 (green, left), K and Figure 3: 95% Confidence intervals for the four estimators Q and Q ˜ 8 (blue, right) on INTC data in May 2015. The CIs are computed using the estimates of K (QM LE) (RK) AV ARB , and AV ARB for B = 1, 8 obtained as explained in Section 8. The estimators are scaled by a factor 104 .
41