The HYBRID GARCH Class of Models∗

Xilong Chen†  Eric Ghysels‡  Fangfang Wang§

First Draft: April 2009  This Draft: April 29, 2010

PRELIMINARY AND INCOMPLETE
Abstract We propose a general GARCH framework that allows the use of different frequency returns to model conditional heteroskedasticity. We call the class of models High FrequencY Data-Based PRojectIon-Driven GARCH models, as the GARCH dynamics are driven by what we call HYBRID processes. We study three broad classes of HYBRID processes: (1) parameter-free processes that are purely data-driven, (2) structural HYBRIDs, where one assumes an underlying DGP for the high frequency data, and finally (3) HYBRID filter processes. We develop the asymptotic theory of various estimators and study their properties in small samples via simulations.
∗ We would like to thank Rob Engle and Eric Renault for insightful comments. An early version of the paper was presented at the Stevanovich Center - CREATES conference Financial Econometrics and Statistics: Current Themes and New Directions, Skagen, Denmark, 4-6 June 2009, under the title High Frequency GARCH Models.
† ETS, SAS Institute Inc., Cary, NC 27513. Email: [email protected]
‡ Department of Finance, Kenan-Flagler Business School, and Department of Economics, University of North Carolina at Chapel Hill. Email: [email protected]
§ Department of Information and Decision Sciences, University of Illinois at Chicago. Email: [email protected]
1 Introduction
Suppose one wants to forecast the expected total volatility over the next five business days using past daily data. Or, suppose one wants to estimate tomorrow's expected volatility using a history of intra-daily returns. To address the first we can estimate a daily GARCH model and iterate the forecasts five days ahead. Alternatively, we could estimate a GARCH model with past five-day returns and produce a one-step ahead forecast, which covers the same horizon. The difference between the two approaches is that the former uses past daily data to produce an (iterated) forecast for the next five days, whereas the latter uses a (direct) forecast based on a coarser data set, namely a history of past five-day returns. Relatively little work has been done on the comparison between direct and iterated forecasts in the context of volatility. One exception is Ghysels, Rubia, and Valkanov (2009), who compare several approaches to producing multi-period ahead forecasts of volatility - iterated, direct, and mixed-data sampling (MIDAS). The latter is a combination of direct and iterated in the sense that the information set of iterated forecasts - i.e. past daily returns - is combined with a one-step direct forecast. This is achieved via MIDAS, meaning Mi(xed) Da(ta) S(ampling), regressions designed to deal with data sampled at different frequencies, which have emerged in recent work by Ghysels, Santa-Clara, and Valkanov (2002), Ghysels, Santa-Clara, and Valkanov (2006) and Andreou, Ghysels, and Kourtellos (2008). In the present paper we propose a unifying framework, based on a generic GARCH-type model, that addresses volatility forecasting when the forecast horizon differs in frequency from the information set. Hence, we propose a class of GARCH models that can handle volatility forecasts over the next five business days using past daily data, or tomorrow's expected volatility using intra-daily returns.
We call the class of models High FrequencY Data-Based PRojectIon-Driven GARCH models as the GARCH dynamics are driven by what we call HYBRID processes. We study three broad classes of HYBRID processes: (1) parameter-free processes that are purely data-driven, (2) structural HYBRIDs where one assumes an underlying DGP for the high frequency data and finally (3) HYBRID filter processes. HYBRID-GARCH models - by their very nature - relate to many topics discussed in the large literature on volatility forecasting. These topics include - but are not limited to - iterated versus direct forecasting (as already noted), temporal aggregation, weak versus semi-strong GARCH, as well as various estimation procedures. Since quite a few papers have been written on these topics already, it is hard to cite a comprehensive list here. The first class of processes - those that involve parameter-free HYBRID processes - relates to a flurry of recent papers: Engle and Gallo (2006), de Vilder and Visser (2008), Visser (2008), Shephard and Sheppard (2009), and Hansen, Huang, and Shek (2010) suggest the use of (daily) realized volatilities, high-low ranges, realized kernels or generic realized measures.

The paper is structured as follows. Section 2 provides a general overview of the models, their estimation and the various classes of HYBRID processes involved. The section is deliberately non-technical. It is followed by a technical section delving further into the various model specifications in section 3. Section 4 is devoted to the statistical properties of the HYBRID GARCH model and the derivation of the various parameter estimators, followed by section 5 where the small sample properties of those estimators are examined. The paper concludes with an extensive empirical study and a concluding section.
2 HYBRID Processes and Estimators
This section provides a general overview of the models, their estimation and the various classes of HYBRID processes involved. We deliberately compromise on technical details in order to provide the reader with a general overview of the paper's contributions. The subsequent sections of the paper will address in detail all the technicalities and provide the main results. The volatility dynamics of a generic HYBRID GARCH model are as follows:

V_{t+1|t} = α + β V_{t|t−1} + γ H_t    (2.1)
When H_t is simply a daily squared return we have the volatility dynamics of a standard daily GARCH(1,1); when H_t is a weekly squared return, those of a standard weekly GARCH(1,1). However, what would happen if we want to attribute an individual weight to each of the five days in a week? In this case we might consider a process H_t ≡ Σ_{j=0}^{4} ω_j r²_{t−j/5}, where we use the notation r_{t−j/5} to indicate the intermediate daily observations of week t.¹ This is an example of a parameter-driven HYBRID process H_t ≡ H(ϕ, ⃗r_t), where ⃗r_t = (r_{t−1+1/m}, r_{t−1+2/m}, . . . , r_{t−1/m}, r_t)^T is an R^m-valued random vector. The weights (ω_j, j = 0, . . . , m − 1) constitute the parameter vector ϕ in this case, and m = 5. One can think of at least two possibilities: (1) the weights are treated as additional parameters and estimated as such (with m small this is possible, but not as m gets large), or (2) the weights ω_j are anchored to an underlying daily GARCH(1,1), in which case the parameters α, β and γ and the weights in ϕ are jointly related to the assumed daily DGP. The discussion so far implicitly relates to many issues we discuss next.
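A minimal sketch of the weekly recursion (2.1) driven by a parameter-driven HYBRID process H_t = Σ_{j=0}^{4} ω_j r²_{t−j/5}; the parameter values and simulated returns below are illustrative assumptions, not estimates from the paper.

```python
import numpy as np

def hybrid_process(intra_returns, weights):
    """H_t as a weighted sum of the m squared intra-period returns."""
    return float(np.dot(weights, np.asarray(intra_returns) ** 2))

def hybrid_garch_step(V_prev, H, alpha, beta, gamma):
    """One step of V_{t+1|t} = alpha + beta V_{t|t-1} + gamma H_t."""
    return alpha + beta * V_prev + gamma * H

rng = np.random.default_rng(0)
m = 5                                   # five daily returns per week
weights = np.full(m, 1.0 / m)           # equal weights average the squared returns
r_week = rng.normal(0.0, 0.01, size=m)  # one (simulated) week of daily returns
H = hybrid_process(r_week, weights)
V_next = hybrid_garch_step(V_prev=1e-4, H=H, alpha=1e-6, beta=0.8, gamma=0.15)
```

With equal weights the HYBRID process collapses to the average squared daily return; unequal weights are what distinguish the HYBRID specification from a plain weekly GARCH(1,1).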
Parameter-free HYBRID Processes The HYBRID process H_t may be purely data-driven and not depend on parameters. The obvious case would be a simple squared return process, such that V_{t+1|t} has the typical GARCH(1,1) dynamics. More recently, however, other purely data-driven examples of what we call generic HYBRID processes have been suggested. For example, Engle and Gallo (2006), de Vilder and Visser (2008), Visser (2008), Shephard and Sheppard (2009), and Hansen, Huang, and Shek (2010) suggest the use of (daily) realized volatilities, high-low ranges, realized kernels or generic realized measures.
Structural HYBRID Processes Structural HYBRID processes appear in the context of temporal aggregation - a topic discussed extensively in the GARCH literature, see e.g. Drost and Nijman (1993), Drost and Werker (1996), Meddahi and Renault (2004), among others. Suppose we consider a daily weak GARCH(1,1); then the implied weekly prediction, using past daily returns, is:

V_{t+1|t} = α_m + β_m V_{t|t−1} + γ_m Σ_{j=0}^{m−1} β_m^{j/m} r²_{t−j/m},  t ∈ Z    (2.2)
with m = 5, and where α_m, β_m and γ_m depend on the daily GARCH(1,1) parameters α_1, β_1 and γ_1 (see equation (3.4) for details). Here all the parameters are driven by the daily parameters. Therefore, while the HYBRID process is parameter-driven, it is in principle an integral part of the volatility dynamics, and H(ϕ, ⃗r_t) in (2.1) does not involve stand-alone parameters ϕ. This will have consequences when we elaborate on the estimation of HYBRID GARCH models. Indeed, the context of temporal aggregation precludes us from using, say, standard QMLE methods, a topic that will be discussed later.

¹ When days spill over into the previous week, we assume r_{t−j/m} ≡ r_{t−1−(j−5)/m}.
HYBRID Filtering Processes

Here the HYBRID process H(ϕ, ⃗r_t) in (2.1) involves parameters that are not explicitly related to the α, β and γ appearing in (2.1). Unlike in the structural HYBRID case, no underlying DGP for the high frequency data is assumed. One can view this as a GARCH model driven by a filtered high frequency process, where the filter weights, (hyper-)parameterized by ϕ, are estimated jointly with the volatility dynamics parameters. The choice of the parameterization of ϕ is suggested by the work of Ghysels, Santa-Clara, and Valkanov (2006), Ghysels, Sinko, and Valkanov (2006), and Ghysels, Rubia, and Valkanov (2009). The commonly used specifications are exponential, beta, linear, hyperbolic, and geometric weights. This approach also has implications for estimation. Unlike the structural HYBRID case, we can now consider likelihood-based methods, although the regularity conditions required are novel and more involved than those of the usual QMLE approach to GARCH estimation, as in Bollerslev and Wooldridge (1992).
Estimation Procedures
Various estimation procedures will be considered - some tailored to specific cases of HYBRID processes. Let us first collect all the parameters of the model appearing in (2.1) in a parameter vector called θ ∈ Θ, with the true parameter denoted θ_0. One has to keep in mind that specific cases - notably involving structural HYBRID processes - may involve constraints across the parameters in (2.1), or the filtering weights of the HYBRID process may be hyper-parameterized, so that the dimension of θ (denoted d) depends on the specific circumstances considered. For this generic setting we have the following estimators:

θ̂_T^{mdrv} = arg min_{θ∈C} (1/T) Σ_{t=1}^{T} (RV_t − V_{t|t−1}(θ))²
where C is a convex compact subset of Θ such that θ_0 is in the interior of C. This minimum distance estimator involves observations on RV, realized volatility, or possibly a realized measure that corrects for microstructure effects. This estimator applies to volatility models involving all possible HYBRID processes, including structural ones, for which a weak GARCH assumption is required. Note that this means that V_{t|t−1}(θ) in the above estimator is based on a best linear predictor, not the conditional variance - a technical issue that will be discussed in the next section. A companion estimation procedure involves a single squared return process, namely:

θ̂_T^{mdr2} = arg min_{θ∈C} (1/T) Σ_{t=1}^{T} (R_t² − V_{t|t−1}(θ))²
The above estimator has a likelihood-based version, namely:

θ̂_T^{lhr2} = arg min_{θ∈C} (1/T) Σ_{t=1}^{T} ( log V_{t|t−1}(θ) + R_t² / V_{t|t−1}(θ) )

which is far more stringent in terms of regularity conditions, notably because V_{t|t−1}(θ) is a conditional variance, and in fact does not apply to all types of HYBRID processes - in particular structural ones. The estimator θ̂_T^{mdr2} is reminiscent of QMLE estimators for semi-strong GARCH models - yet the mixed data frequencies add an extra layer of complexity discussed later in the paper. One can again replace daily squared returns by, say, RV and consider the following estimator:

θ̂_T^{lhrv} = arg min_{θ∈C} (1/T) Σ_{t=1}^{T} ( log V_{t|t−1}(θ) + RV_t / V_{t|t−1}(θ) )

The choice of R² versus RV in θ̂_T^{mdr2} versus θ̂_T^{mdrv} and θ̂_T^{lhr2} versus θ̂_T^{lhrv} has efficiency implications that will be discussed as well. Consider the regression equivalent of equation (2.1), namely:

RV_{t+1} = α̃ + β̃ RV_t + γ̃ H(ϕ̃, ⃗r_t) + ε_{t+1}    (2.3)
This regression is reminiscent of a MIDAS regression, see Ghysels, Santa-Clara, and Valkanov (2006), Ghysels, Sinko, and Valkanov (2006), Ghysels, Rubia, and Valkanov (2009). This regression-based approach also yields a class of estimators, which relate to the regression-based estimators for weak GARCH models of Francq and Zakoïan (2000). To summarize, we will study three classes of estimators: (1) minimum distance estimators θ̂_T^{mdrv} and θ̂_T^{mdr2}, (2) likelihood-based estimators θ̂_T^{lhrv} and θ̂_T^{lhr2}, and (3) regression-based estimators.
3 Model Specifications
The process V_{t|t−1} appearing in (2.1) may be considered as the solution to the dynamic equation, but it is not necessarily the data generating process of returns. Even if the HYBRID process were a single squared return, the idea that GARCH(1,1) dynamics may not be the correctly specified model has been much discussed - most importantly in the work of Nelson (1992) and Nelson and Foster (1995). We introduce a process σ²_{t+1|t}, which is the projection of accumulated squared returns onto a Hilbert space; it is determined by the true nature of the data and is parameter-free. Two scenarios will be considered for this projection, and regularity conditions will be imposed that allow us to derive the asymptotic properties of the various estimators introduced in the previous section. First we need to fix some notation. We use ∥·∥ to denote the norm of an element in a Hilbert space: 1) ∥A∥ = √(EA²) when A ∈ L²(Ω, F, P); 2) ∥A∥ = √(tr(A^T A)) when A ∈ R^{n×n} or A ∈ R^{n×1} for n ≥ 1. For a matrix A, A > 0 if A is positive definite, and A ≥ 0 if A is positive semi-definite. If A is finite entrywise, we write A < ∞. For X ∈ L²(Ω, F, P), P_L(X|I_t) denotes the best mean square projection of X onto I_t, i.e. ∥X − P_L(X|I_t)∥² = inf_{Y∈I_t} ∥X − Y∥² = inf_{Y∈I_t} E(X − Y)². Moreover, to emphasize the role of ϕ in the process H(ϕ, ⃗r_t) we will use the notation H(ϕ, ⃗r_t) ≡ H_t(ϕ). In a first subsection we consider structural HYBRID processes, followed by filtering.
3.1 Structural HYBRID Processes
In the structural HYBRID processes setup, the underlying high frequency return r_s (where s is of the form t + k/m for some t ∈ Z and k = 0, 1, 2, . . . , m − 1) satisfies a weak GARCH(1,1), namely,

v_{s+1/m|s} = a + b v_{s|s−1/m} + c r_s²    (3.1)

where v_{s+1/m|s} is the best linear predictor of r²_{s+1/m} onto the closed span of {1, r_{s−j/m}, r²_{s−j/m}, j = 0, 1, 2, . . .}, denoted by L_s, and r_{s+1/m} is orthogonal to L_s, i.e., P_L(r_{s+1/m}|L_s) = 0.
With some algebra one can obtain

v_{s+k/m|s} = a (1 − (b + c)^{k−1}) / (1 − (b + c)) + (b + c)^{k−1} v_{s+1/m|s}    (3.2)
for k ∈ Z_+. Consequently, the total volatility over the period (t, t + 1], denoted by V_{t+1|t} ≡ Σ_{k=1}^{m} v_{t+k/m|t}, is

V_{t+1|t} = α_m + β_m V_{t|t−1} + γ_m Σ_{j=0}^{m−1} β_m^{j/m} r²_{t−j/m}    (3.3)
where α_m = a · (1 − b^m)/(1 − b) · (m(1 − b) − c d_m)/(1 − (b + c)), β_m = b^m, γ_m = c d_m, and d_m = (1 − (b + c)^m)/(1 − (b + c)). Clearly, (3.3) is of the form (2.1), and the HYBRID process in this case is H_t = Σ_{j=0}^{m−1} β_m^{j/m} r²_{t−j/m}, entering (2.1) with coefficient γ_m. It is worth noting that the structural HYBRID model links the parameters obtained under different sampling frequencies explicitly:

α_m = α_1 · (1 − β_1^m)/(1 − β_1) · (m(1 − β_1) − γ_1 d_m)/(1 − (β_1 + γ_1))
β_m = β_1^m
γ_m = γ_1 d_m    (3.4)

where d_m = (1 − (β_1 + γ_1)^m)/(1 − (β_1 + γ_1)). A direct application of the relationship (3.4) is that one can use parameter estimates from, say, a daily model with 5-minute returns to formulate a weekly model with the same 5-minute returns.
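A sketch of the frequency-aggregation mapping (3.4) for weak GARCH(1,1) parameters; with m = 1 the mapping is the identity, which serves as a sanity check. The numeric inputs are the daily parameter values used later in the simulation section.

```python
def aggregate_params(alpha1, beta1, gamma1, m):
    """Map one-period weak GARCH(1,1) parameters to their m-period counterparts,
    following the mapping in (3.4)."""
    d_m = (1.0 - (beta1 + gamma1) ** m) / (1.0 - (beta1 + gamma1))
    alpha_m = (alpha1 * (1.0 - beta1 ** m) / (1.0 - beta1)
               * (m * (1.0 - beta1) - gamma1 * d_m) / (1.0 - (beta1 + gamma1)))
    return alpha_m, beta1 ** m, gamma1 * d_m

# Daily parameters aggregated to a weekly (m = 5) HYBRID GARCH:
a5, b5, g5 = aggregate_params(2.8e-06, 0.9770, 0.0225, 5)
# m = 1 returns the inputs unchanged (identity check):
a1, b1, g1 = aggregate_params(2.8e-06, 0.9770, 0.0225, 1)
```

The mapping lets one recover, for instance, weekly HYBRID GARCH parameters from a fitted daily model without re-estimation.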
3.2 HYBRID Filtering Processes
The HYBRID filtering process extends the structural HYBRID process by allowing for more flexible return dynamics. Although the HYBRID filtering process is formulated without any explicit DGP, we still require the return process to satisfy some weak regularity conditions.
Assumption 3.1 Let r_s² ∈ L²(Ω, F, P) for some probability space (Ω, F, P), and

P_L(r_s|I_{s−1/m}) = 0,  P_L(RV_{t+1}|I_t) = σ²_{t+1|t}    (3.5)

where RV_{t+1} = Σ_{j=0}^{m−1} r²_{t+1−j/m}, and I_t is a closed subspace of L²(Ω, F, P) consisting of the information on the high frequency returns up to time t. We further assume that the returns are non-degenerate; hence the high frequency returns are linearly independent.

Since we use V_{t+1|t}, driven by the HYBRID filtering process H(ϕ, ⃗r_t), to mimic the dynamics of σ²_{t+1|t}, the HYBRID filtering process is assumed to satisfy the following conditions:

Assumption 3.2 H is a mapping from Φ × L² × . . . × L² (m copies of L²) to L². It is nonnegative almost surely. For fixed ⃗x, H(·, ⃗x) ∈ C³. For fixed ϕ, H, ∂_ϕ H, ∂²_ϕ H and ∂³_ϕ H are measurable. E sup_{ϕ∈Φ_0} H(ϕ, ⃗x)², E sup_{ϕ∈Φ_0} (∂_ϕ H(ϕ, ⃗x))², and E sup_{ϕ∈Φ_0} (∂²_ϕ H(ϕ, ⃗x))² are finite. Moreover, 1, H_t, and each component of ∂_ϕ(H_t) are linearly independent if γ is not a constant; otherwise, 1 and each component of ∂_ϕ(H_t) are linearly independent.

Therefore, a necessary condition is that the dimension of ϕ is not bigger than m. Clearly, H_t(ϕ) meets Assumption 3.2 when H_t(ϕ) = Σ_{j=0}^{m−1} ω_j r²_{t−j/m}, where Σ_{j=0}^{m−1} ω_j = 1 and ϕ = (ω_0, ω_1, . . . , ω_{m−1})^T. Note that I_t appearing in (3.5) is very generic; it does not have any specific probabilistic or algebraic structure. We will consider two specifications of I_t:

Assumption 3.3 (Scenario 1) I_t is the closed span of {1, r_{t−k/m}, r²_{t−k/m}; k = 0, 1, 2, . . .}. Namely, I_t = L_t, the notation considered in the previous section. In this scenario, σ²_{t+1|t} is the best linear predictor. Since we use V_{t+1|t} to replicate σ²_{t+1|t}, H_t is assumed to be a weighted sum of 1, the intermediate returns and squared returns from period t − 1 to t, and H_t should satisfy Assumption 3.2.
Alternatively, we also consider a more general situation:

Assumption 3.4 (Scenario 2) I_t is the sigma field generated by the high frequency returns up to time t, labeled F_t. Clearly, L_t is contained in F_t. The prediction equation (3.5) therefore indicates that E(r_s|F_{s−1/m}) = 0, and

E(RV_{t+1}|F_t) = E(R²_{t+1}|F_t) = σ²_{t+1|t}    (3.6)

where R_{t+1} ≡ Σ_{j=0}^{m−1} r_{t+1−j/m} (see chapter 2 of Brockwell and Davis (1991)).
Note that in both scenarios we can hyper-parameterize the filter weights, namely:

H_t(ϕ) ≡ Σ_{j=0}^{m−1} Ψ_j(ϕ) r²_{t−j/m},  Σ_{j=0}^{m−1} Ψ_j(ϕ) = 1    (3.7)
where the weights (Ψ_0(ϕ), Ψ_1(ϕ), Ψ_2(ϕ), . . . , Ψ_{m−1}(ϕ))^T are determined by a low-dimensional functional specification inspired by the MIDAS regression format of Ghysels, Santa-Clara, and Valkanov (2006), Ghysels, Sinko, and Valkanov (2006), and Ghysels, Rubia, and Valkanov (2009). As noted before, the commonly used specifications are exponential, beta, linear, hyperbolic, and geometric weights. Constrained weighting schemes were used to handle intra-daily seasonal patterns in the context of a HYBRID GARCH by Chen, Ghysels, and Wang (2009).
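A hedged sketch of one such hyper-parameterized weighting scheme Ψ_j(ϕ) for (3.7): an exponential Almon specification, normalized so the weights sum to one. The beta, linear, hyperbolic, and geometric schemes mentioned above work the same way with a different functional form; the parameter values here are illustrative.

```python
import numpy as np

def exp_almon_weights(phi1, phi2, m):
    """Exponential Almon weights Psi_j(phi) for j = 0, ..., m-1, normalized to 1."""
    j = np.arange(m)
    raw = np.exp(phi1 * j + phi2 * j ** 2)
    return raw / raw.sum()

w = exp_almon_weights(phi1=0.05, phi2=-0.01, m=5)  # weights on r^2_{t-j/m}
```

Whatever the shape parameters, only two numbers (ϕ_1, ϕ_2) govern all m weights, which is what keeps the dimension of ϕ manageable as m grows.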
3.2.1 News impact curve
Chen and Ghysels (2008) examine whether the sign and magnitude of intra-daily returns have an impact on expected volatility the next day or over longer future horizons. They revisit the concept of news impact curves introduced by Engle and Ng (1993). Overall, they find that moderately good (intra-daily) news reduces volatility (the next day), while both very good news (unusually high intra-daily positive returns) and bad news (negative returns) increase volatility, with the latter having a more severe impact. In light of these findings we consider HYBRID GARCH models that feature intra-daily news impact curves similar to the framework of Chen and Ghysels (2008), except that the latter use a MIDAS regression format. In particular, we fix γ = 1 in (2.1) and consider the following HYBRID process:

H_t(ϕ) = Σ_{j=0}^{m−1} Ψ_j(ϕ_1) NIC(r_{t−j/m}),  Σ_{j=0}^{m−1} Ψ_j(ϕ_1) = 1    (3.8)
where the weight Ψ_j(ϕ_1) is parameterized by a low-dimensional function of ϕ_1 and NIC(·) stands for a high frequency data news impact curve. The parameter vector ϕ combines both ϕ_1 and the parameters that determine the news impact curve - denoted by the subvector ϕ_2. Regarding the specification of the latter, we consider the parametric news impact curves studied in Chen and Ghysels (2008), namely:

NIC(r) = b(r − c)²,  b ≠ 0
NIC(r) = b r² 1_{r≥0} + c r² 1_{r<0}

4 Statistical Properties and Estimation

4.1 Estimation under Scenario 1

Assume the true parameter θ_0 = (α_0, β_0, γ_0, ϕ_0) satisfies α_0 > 0, 0 < β_0 < 1, γ_0 > 0, ϕ_0 ∈ Φ, where Φ is an open set that collects all the possible values of ϕ such that H_t(ϕ) meets Assumption 3.2, and the parameter space is defined as Θ = {(α, β, γ, ϕ) : α > 0, 0 < β < 1, γ > 0, ϕ ∈ Φ}. Furthermore, we assume that there exists a well-chosen convex compact subset C satisfying θ_0 ∈ C⁰ ⊂ C ⊂ Θ⁰. These conditions, together with the prediction equation, yield the identification of the 'true' parameter θ_0 from model (2.1), as stated in the following theorem:

Theorem 4.1 Suppose that returns satisfy Assumption 4.1 and Er⁴ < ∞. Under Scenario 1 (i.e., Assumption 3.3),

θ_0 = arg min_{θ∈C} E(RV_t − V_{t|t−1}(θ))²    (4.16)
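A minimal sketch of the two parametric news impact curves of Chen and Ghysels (2008) quoted above: the shifted quadratic NIC(r) = b(r − c)² and the asymmetric NIC(r) = b r² 1_{r≥0} + c r² 1_{r<0}. The parameter values are illustrative.

```python
import numpy as np

def nic_quadratic(r, b, c):
    """Shifted quadratic news impact curve NIC(r) = b (r - c)^2."""
    return b * (np.asarray(r, dtype=float) - c) ** 2

def nic_asymmetric(r, b, c):
    """Asymmetric news impact curve: b r^2 for good news, c r^2 for bad news."""
    r = np.asarray(r, dtype=float)
    return np.where(r >= 0.0, b * r ** 2, c * r ** 2)

# c > b encodes bad news (negative returns) having the larger volatility impact:
impact_good = float(nic_asymmetric(0.01, b=1.0, c=1.5))
impact_bad = float(nic_asymmetric(-0.01, b=1.0, c=1.5))
```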
Theorem 4.1 indicates that ∥RV_t − V_{t|t−1}(θ)∥ is not only minimized at ∥RV_t − σ²_{t|t−1}∥, but also that one can extract θ_0 from σ²_{t|t−1}. Hence it gives us a way to estimate θ_0, called minimum-distance-based estimation:

θ̃_T^{mdrv} = arg min_{θ∈C} (1/T) Σ_{t=1}^{T} (RV_t − Ṽ_t(θ))²    (4.17)

based on observations {r_{1/m}, r_{2/m}, . . . , r_1, . . . , r_{T−1+1/m}, . . . , r_T}, where Ṽ_t is defined recursively by

Ṽ_t(θ) = α + β Ṽ_{t−1}(θ) + γ H_{t−1}(ϕ), t ≥ 1, with Ṽ_0 = v    (4.18)
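The recursion (4.18) and the minimum-distance objective (4.17) can be sketched as follows. The HYBRID series H_t is taken as given here (for example, a weighted sum of squared intra-period returns), and all numeric values are illustrative assumptions.

```python
import numpy as np

def v_tilde_path(theta, H, v0=0.0):
    """Recursively build V~_t(theta) = alpha + beta V~_{t-1}(theta) + gamma H_{t-1}."""
    alpha, beta, gamma = theta
    V = np.empty(len(H) + 1)
    V[0] = v0                                   # arbitrary deterministic start value
    for t in range(1, len(V)):
        V[t] = alpha + beta * V[t - 1] + gamma * H[t - 1]
    return V[1:]

def mdrv_objective(theta, RV, H, v0=0.0):
    """(1/T) sum_t (RV_t - V~_t(theta))^2, as in (4.17)."""
    return float(np.mean((RV - v_tilde_path(theta, H, v0)) ** 2))

rng = np.random.default_rng(0)
H = 1e-4 * (1.0 + rng.random(250))              # a stand-in HYBRID series
RV = H + rng.normal(0.0, 1e-6, size=250)        # noisy realized-variance target
obj = mdrv_objective((1e-6, 0.8, 0.15), RV, H)  # value at one candidate theta
```

In practice the objective would be handed to a numerical optimizer over the compact set C; the sketch only evaluates it at a single candidate parameter.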
and v is any arbitrary deterministic value. Under the following additional assumption on the return process:

Assumption 4.2 There exists a v > 0 such that Er^{8+4v} < ∞ and the associated strong mixing coefficient α(k) satisfies Σ_{k=0}^{∞} α(k)^{v/(2+v)} < ∞,

the estimator θ̃_T^{mdrv} derived in (4.17) for Scenario 1 is asymptotically normal, as shown below:

Theorem 4.2 (Scenario 1) Suppose that Er⁴ < ∞ and Assumption 4.1 holds. θ̃_T^{mdrv} is a strongly consistent estimator of θ_0. Under the additional Assumption 4.2, we have

√T (θ̃_T^{mdrv} − θ_0) ⇒ N(0, (Σ^{md})^{−1} Ω^{mdrv} (Σ^{md})^{−1}),

where 0 < Σ^{md} = E∇V_{t|t−1}(θ_0)(∇V_{t|t−1}(θ_0))′ < ∞, 0 ≤ Ω^{mdrv} = lim_{T→∞} (1/4) var(√T ∇O_T(θ_0)) < ∞, and O_T(θ) = (1/T) Σ_{t=1}^{T} (RV_t − V_{t|t−1}(θ))².
4.2 Estimation under Scenario 2
The conclusion in Theorem 4.1 can be augmented under Scenario 2:

Theorem 4.3 Suppose that returns satisfy Assumption 4.1 and Er⁴ < ∞. Under Scenario 2 (i.e., Assumption 3.4):

θ_0 = arg min_{θ∈C} E(RV_t − V_{t|t−1}(θ))² = arg min_{θ∈C} E(R_t² − V_{t|t−1}(θ))².    (4.19)
One potential drawback of this approach is that Er⁴ needs to be finite. We can get around this restriction by using condition (3.6) directly and constructing an objective function as shown below:

Theorem 4.4 Suppose that Er² < ∞ and Assumption 4.1 holds. Under Scenario 2 (i.e., Assumption 3.4):

θ_0 = arg min_{θ∈C} E( log V_{t|t−1}(θ) + R_t² / V_{t|t−1}(θ) ) = arg min_{θ∈C} E( log V_{t|t−1}(θ) + RV_t / V_{t|t−1}(θ) )    (4.20)
Based on Theorem 4.3 and Theorem 4.4 we can proceed to construct the parameter estimates. Equality (4.19) yields the minimum-distance-based estimates:

θ̃_T^{mdrv} = arg min_{θ∈C} (1/T) Σ_{t=1}^{T} (RV_t − Ṽ_t(θ))²    (4.21)

θ̃_T^{mdr2} = arg min_{θ∈C} (1/T) Σ_{t=1}^{T} (R_t² − Ṽ_t(θ))²    (4.22)
Equality (4.20) implies a so-called likelihood-based estimation:

θ̃_T^{lhr2} = arg min_{θ∈C} (1/T) Σ_{t=1}^{T} ( log Ṽ_t(θ) + R_t² / Ṽ_t(θ) )    (4.23)

θ̃_T^{lhrv} = arg min_{θ∈C} (1/T) Σ_{t=1}^{T} ( log Ṽ_t(θ) + RV_t / Ṽ_t(θ) )    (4.24)

where Ṽ_t is defined recursively by

Ṽ_t(θ) = α + β Ṽ_{t−1}(θ) + γ H_{t−1}(ϕ), t ≥ 1, with Ṽ_0 = v    (4.25)
and v is any arbitrary deterministic value. The objective function appearing in (4.23) looks similar to the joint quasi-log-likelihood function (modulo a constant) of {R_1, R_2, R_3, . . . , R_T}. However, instead of conditioning on R_1, R_2, . . . , R_{t−1}, log Ṽ_t(θ) + R_t²/Ṽ_t(θ) is the log quasi-likelihood function of R_t conditioned on a finer set, the sigma field generated by the high frequency returns up to time t − 1. Next we study the asymptotic properties of θ̃_T^{mdrv}, θ̃_T^{mdr2}, θ̃_T^{lhrv}, θ̃_T^{lhr2} under Scenario 2.

Theorem 4.5 (Scenario 2) Suppose that Er⁴ < ∞ and Assumption 4.1 holds. θ̃_T^{mdrv} and θ̃_T^{mdr2} are strongly consistent estimates of θ_0. If we further assume that Er⁸ < ∞,

√T (θ̃_T^{mdrv} − θ_0) ⇒ N(0, (Σ^{md})^{−1} Ω^{mdrv} (Σ^{md})^{−1})

√T (θ̃_T^{mdr2} − θ_0) ⇒ N(0, (Σ^{md})^{−1} Ω^{mdr2} (Σ^{md})^{−1})

where

0 < Σ^{md} = E[∇V_{t|t−1}(θ_0)∇V_{t|t−1}(θ_0)′] < ∞,    (4.26)

0 < Ω^{mdrv} = E[(RV_t − V_{t|t−1}(θ_0))² ∇V_{t|t−1}(θ_0)∇V_{t|t−1}(θ_0)′] < ∞,    (4.27)

0 < Ω^{mdr2} = E[(R_t² − V_{t|t−1}(θ_0))² ∇V_{t|t−1}(θ_0)∇V_{t|t−1}(θ_0)′] < ∞    (4.28)
Theorem 4.6 (Scenario 2) Suppose that Er² < ∞ and Assumption 4.1 holds. θ̃_T^{lhr2} and θ̃_T^{lhrv} are strongly consistent estimates of θ_0. Under the additional assumption E(r⁴) < ∞,

√T (θ̃_T^{lhr2} − θ_0) ⇒ N(0, (Σ^{lh})^{−1} Ω^{lhr2} (Σ^{lh})^{−1})

√T (θ̃_T^{lhrv} − θ_0) ⇒ N(0, (Σ^{lh})^{−1} Ω^{lhrv} (Σ^{lh})^{−1})

where 0 < Σ^{lh} = E(V_{t|t−1}(θ_0)^{−2} ∇V_{t|t−1}(θ_0)∇V_{t|t−1}(θ_0)′) < ∞, 0 < Ω^{lhr2} = E[(R_t² − V_{t|t−1}(θ_0))² V_{t|t−1}(θ_0)^{−4} ∇V_{t|t−1}(θ_0)∇V_{t|t−1}(θ_0)′] < ∞, and Ω^{lhrv} is defined analogously with R_t² replaced by RV_t.

Corollary 4.1 Ω^{mdr2} − Ω^{mdrv} ≥ 0 and Ω^{lhr2} − Ω^{lhrv} ≥ 0. Hence, under the assumptions of Theorem 4.5 and Theorem 4.6, θ̃_T^{mdrv} (or θ̃_T^{lhrv}) has a smaller asymptotic variance than θ̃_T^{mdr2} (or θ̃_T^{lhr2}).
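An illustrative Monte Carlo for the intuition behind the efficiency ranking above: with constant volatility and Gaussian returns, RV_t is a far less noisy proxy for the period-t variance than the single squared low-frequency return R_t², which is why the RV-based estimators are more efficient. All numbers are made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(42)
sigma2, m, T = 1e-4, 78, 100_000
# Intra-period returns: m draws per period, each with variance sigma2 / m.
r = rng.normal(0.0, np.sqrt(sigma2 / m), size=(T, m))
RV = np.sum(r ** 2, axis=1)           # realized variance of each period
R2 = np.sum(r, axis=1) ** 2           # squared low-frequency period return
# Both are unbiased for sigma2, but Var(RV) ~ 2 sigma2^2 / m while
# Var(R2) ~ 2 sigma2^2, an m-fold difference in noise.
```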
4.3 Estimation corrected for microstructure noise
When we apply the HYBRID GARCH model to real data, a common practice is to replace the (true) log-return r appearing in the formulas by what we observe. An issue arises when we fit ultra high frequency data, since the observed log return is contaminated by market microstructure noise. In this subsection, we study the impact of market microstructure on model estimation and how to minimize this impact. We use r to denote the log return based on the efficient log-price P, while the observed log-price is labeled P* and r* is the observed log return. Let RV*_t be the proxy for RV_t in terms of r*, and let V*_{t|t−1} be V_{t|t−1} with H(ϕ, ⃗r_t) replaced by H(ϕ, ⃗r*_t). Clearly, the θ_1 which minimizes ∥RV*_{t+1} − V*_{t+1|t}(θ)∥ over Θ may not agree with θ_0, which minimizes ∥RV_{t+1} − V_{t+1|t}(θ)∥. However, this won't affect volatility forecasting so long as RV*_{t+1} is close to RV_{t+1}, in other words, lim_{m→∞} RV*_{t+1} − RV_{t+1} = 0 in probability. Therefore one may use sub-sampling (see Zhang, Mykland, and Ait-Sahalia (2005) and Aït-Sahalia, Mykland, and Zhang (2006)), the realized kernel (see Barndorff-Nielsen, Hansen, Lunde, and Shephard (2008a) and Barndorff-Nielsen, Hansen, Lunde, and Shephard (2008b)), or pre-averaging (see Jacod, Li, Mykland, Podolskij, and Vetter (2008)) to construct RV*_{t+1}. Next we study the connection between θ_0 and θ_1, and between V_{t+1|t}(θ_0) and V*_{t+1|t}(θ_1). Note that P_L(RV_{t+1}|I_t) = V_{t+1|t}(θ_0) and P_L(RV*_{t+1}|I*_t) = V*_{t+1|t}(θ_1), where I*_t is I_t with r replaced by r*. We have the following proposition:
Proposition 4.1 If P_L(RV*_{t+1} − RV_{t+1}|I_t) = 0 and P_L(H(ϕ, ⃗r*_t) − H(ϕ, ⃗r_t)|I_t) = 0 for any ϕ ∈ Φ, then

P_L(RV*_{t+1} − V*_{t+1|t}(θ)|I_t) = 0 ⟺ θ = θ_0.    (4.32)

Thus θ_0 can be identified by projecting RV*_{t+1} − V*_{t+1|t}(θ) onto I_t. (4.32) implies that

E[(RV*_{t+1} − V*_{t+1|t}(θ))g] = 0, ∀g ∈ I_t ⟺ θ = θ_0    (4.33)
Therefore, one can estimate θ_0 via the generalized method of moments. Proposition 4.1 also indicates that P_L(V*_{t+1|t}(θ_1) − V_{t+1|t}(θ_0)|I_t) = P_L(V_{t+1|t}(θ_1) − V_{t+1|t}(θ_0)|I_t), and EV*_{t+1|t}(θ_1) = EV_{t+1|t}(θ_0). Finally we show the existence of RV*_{t+1} and H(ϕ, ·) that satisfy the assumptions of Proposition 4.1. Suppose that the observed log prices are {P*_{t−j/m}, j = 0, 1, . . . , m − 1, t = 1, 2, 3, . . .}², with P*_{t−j/m} = P_{t−j/m} + u_{t−j/m}, where u_{t−j/m} represents the market microstructure effect observed at time t − j/m. Hence r*_{t−j/m} = r_{t−j/m} + e_{t−j/m}, where e_{t−j/m} = u_{t−j/m} − u_{t−j/m−1/m}. Consider RV*_{t+1} as the two-scale realized volatility of Zhang, Mykland, and Ait-Sahalia

² The prices would not be observed at equidistant time points; however, one can use the previous-tick method of Wasserfallen and Zimmermann (1985) to construct an equidistant observed price process.
(2005), which is defined as

RV*_{t+1} ≡ (1/K) Σ_{k=0}^{K−1} [P*, P*]^{(k)}_{t,t+1} − ((m − K + 1)/(Km)) Σ_{j=1}^{m} (r*_{t+j/m})²    (4.34)

where [P*, P*]^{(k)}_{t,t+1} = Σ_{j=1}^{⌊(m−k)/K⌋} (P*_{t+(jK+k)/m} − P*_{t+(jK+k−K)/m})² for k = 0, 1, 2, . . . , K − 1. K is a positive integer³ and K = O(m^{2/3}).

We use the realized kernel of Barndorff-Nielsen, Hansen, Lunde, and Shephard (2010) to construct H(ϕ, ⃗r_t). Define x_{t−j/m} = r_{t−j/m} for j = 1, 2, . . . , m − 2, x_t = (1/h̃) Σ_{j=0}^{h̃−1} r_{t−j/m}, and x_{t−1+1/m} = (1/h̃) Σ_{j=0}^{h̃−1} r_{t−1+1/m+j/m} for some h̃ with h̃ = o(m^{1/5})⁴. x*_{t−j/m} is defined in the same way as x_{t−j/m} with r replaced by r*. Define

H(ϕ, ⃗r_t) = Σ_{h=−(m−1)}^{m−1} k_h γ_h(⃗x_t),  H(ϕ, ⃗r*_t) = Σ_{h=−(m−1)}^{m−1} k_h γ_h(⃗x*_t)    (4.35)

where γ_h(⃗x_t) = Σ_{j=0}^{m−1−|h|} x_{t−j/m} x_{t−j/m−|h|/m}, γ_h(⃗x*_t) = Σ_{j=0}^{m−1−|h|} x*_{t−j/m} x*_{t−j/m−|h|/m}, and k_h is the Parzen kernel. H(ϕ, ·) defined above is nonnegative. Suppose that P is generated from a Brownian semimartingale dP_s = a_s ds + v_s dW_s, where a is a predictable locally bounded drift, v is a càdlàg volatility process which is also locally bounded, and W is a Wiener process. Assume P is independent of U, and U is iid white noise. On one hand, the work of Zhang, Mykland, and Ait-Sahalia (2005) shows that P_L(RV*_{t+1} − ∫_t^{t+1} v_u² du|I_{t+1}) = O_p(m^{−1/6}) and P_L(RV_{t+1} − ∫_t^{t+1} v_u² du|I_{t+1}) = O_p(m^{−1/2}). By iterated expectations, ⟨RV*_{t+1} − RV_{t+1}, g⟩ = o(1) for any g ∈ I_t. On the other hand, Barndorff-Nielsen, Hansen, Lunde, and Shephard (2010) indicates that H(ϕ, ⃗r*_t) − H(ϕ, ⃗r_t) = o_p(1), and hence ⟨H(ϕ, ⃗r*_t) − H(ϕ, ⃗r_t), g⟩ = o(1). Therefore, for sufficiently large m - which is exactly the case for ultra high frequency data - P_L(RV*_{t+1} − RV_{t+1}|I_t) and P_L(H(ϕ, ⃗r*_t) − H(ϕ, ⃗r_t)|I_t) will be close to 0, with RV*_{t+1} defined in (4.34) and H defined in (4.35).
³ We refer the reader to Zhang, Mykland, and Ait-Sahalia (2005) for the optimal choice of K.
⁴ For the optimal choice of h̃, see Barndorff-Nielsen, Hansen, Lunde, and Shephard (2009) and Barndorff-Nielsen, Hansen, Lunde, and Shephard (2010).
4.4 Parametric estimate of the structural parameters
Inspired by the Multiplicative Error Model of Engle (2002) and the subsequent work of Engle and Gallo (2006), Lanne (2006), and Cipollini, Engle, and Gallo (2006), we consider the following model:

RV_{t+1} = σ²_{t+1|t} ε_{t+1}    (4.36)

where, conditional on F_t, ε_{t+1} is independent and identically distributed with mean 1. Suppose the cumulative distribution function of ε is F. The choice of F could be a unit exponential (see Engle (2002)), a gamma distribution as suggested in Engle and Gallo (2006), or a mixture of two gamma distributions as in Lanne (2006). Denote the solution to equation (2.1) as V_{t+1|t}(θ), and suppose that there exists θ_0 ∈ Θ such that σ²_{t+1|t} = V_{t+1|t}(θ_0). We can then use the distribution of ε to formulate the estimation of θ_0; the resulting estimator is denoted θ̃_T^{mem}. Suppose that the appropriate probability distribution for the error term is a gamma distribution, i.e., the conditional density of ε_{t+1} is

f(ε_{t+1}|F_t) = (g^g / Γ(g)) ε_{t+1}^{g−1} exp(−g ε_{t+1}).

The parameter space becomes Θ × {g > 0}. Hence E(RV_{t+1}|F_t) = σ²_{t+1|t} and Var(RV_{t+1}|F_t) = σ⁴_{t+1|t}/g. As pointed out by Engle and Gallo (2006) and Cipollini, Engle, and Gallo (2006), the estimates of θ_0 and g are asymptotically independent, and the point estimate of θ_0, θ̃_T^{mem}, is the same as θ̃_T^{lhrv}. Hence we have

√T (θ̃_T^{mem} − θ_0) ⇒ N(0, (gΣ^{lh})^{−1}),

which follows directly from the proof for θ̃_T^{lhrv}. If the existence of an appropriate parametric density function cannot be verified, one can consider quasi-likelihood estimation, which yields the same results as the likelihood-based estimation discussed in section 4.2.
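A sketch of the multiplicative-error log-likelihood above, assuming the Gamma(g, g) error (mean 1): the log density of RV_t given σ²_{t|t−1} follows from the change of variables RV_t = σ²_{t|t−1} ε_t.

```python
import math
import numpy as np

def mem_gamma_loglik(RV, sigma2, g):
    """Log likelihood of RV under RV_t = sigma2_t * eps_t, eps_t ~ Gamma(g, g)."""
    eps = RV / sigma2
    # log f(RV_t) = g log g - log Gamma(g) + (g - 1) log eps - g eps - log sigma2
    return float(np.sum(g * np.log(g) - math.lgamma(g)
                        + (g - 1.0) * np.log(eps) - g * eps - np.log(sigma2)))

# g = 1 reduces to the unit-exponential case of Engle (2002):
ll = mem_gamma_loglik(np.array([1.0]), np.array([1.0]), 1.0)
```

Maximizing this over θ (through σ²_{t|t−1} = V_{t|t−1}(θ)) and g separately is what makes the point estimate of θ coincide with the likelihood-based estimator θ̃_T^{lhrv}.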
5 Simulation study
In this section, we perform several simulation studies. They help us understand the small-sample distributional properties of the estimators discussed in section 4, investigate the volatility forecasts produced by the HYBRID GARCH model in comparison with competing approaches, and examine two special formulations of the HYBRID process - RV (i.e., assign the same weight to every intra-daily return) versus a MIDAS filter (i.e., assign different weights to different returns) - to show the role played by the weights in improving the prediction of relatively low frequency volatility using high frequency data.
5.1
Aggregation bias
Consider the following strong GARCH(1,1):

r_{s+1} = √(v_{s+1|s}) ε_{s+1},   v_{s+1|s} = a + b v_{s|s−1} + c r_s²,   ε_{s+1} ~ iid N(0, 1)    (5.37)
as the DGP, where a = 2.8E−06, b = 0.9770, c = 0.0225 are taken from Meddahi and Renault (2004). The structural HYBRID GARCH process (3.12) therefore has structural parameters

α = a (1 − b^m)(m(1 − b) − c·d(m)) / ((1 − b)(1 − (b + c))),   β = b^m,   γ = c·d(m),

where d(m) = (1 − (b + c)^m)/(1 − (b + c)). It is easy to check that the high frequency return has a finite 8th moment (see Bollerslev (1986)) and that it satisfies Assumption 4.1. Hence the assumptions in Theorem 4.5 and Theorem 4.6 are met. We first want to assess via simulation how far the estimators deviate from the true values under the four estimation algorithms described before, how the aggregation frequency m affects the accuracy of the estimation, and the speed of convergence to the true values. We consider 1000 replications of the sample path (3.12); in each, the first 1000 observations serve as burn-in, leaving samples of 500, 1000 and 2500 observations. The aggregation frequency m takes the values 1, 5, 22, 78 and 288. Table 1, Table 2, Table 3, and Table 4 report the simulation results for the four estimation methods. As the sample size increases, the mean squared error (MSE) decreases for every sampling frequency m; however, a bigger m results in a slower convergence rate. Consistent with Corollary 4.1, the RV-based estimators have a smaller MSE than the R²-based estimators. In particular, the likelihood-based estimator based on RV (i.e., θ̃_T^lhrv) outperforms the other three estimators in terms of both the scale of the MSE and the convergence speed, while θ̃_T^mdr2 is the worst. When m is small (see m = 1, 5, 78), θ̃_T^mdrv performs well too.
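The mapping from the daily GARCH(1,1) parameters to the structural HYBRID parameters can be checked numerically. The snippet below is a sketch (not the authors' code) that evaluates the formulas above for the Meddahi–Renault values and m = 5; the results are, up to rounding, the weekly parameter values quoted later for model (5.38).

```python
# Structural HYBRID GARCH parameters implied by the daily GARCH(1,1) DGP (5.37):
#   beta = b^m,  gamma = c*d(m),  d(m) = (1 - (b+c)^m) / (1 - (b+c)),
#   alpha = a*(1 - b^m)*(m*(1 - b) - c*d(m)) / ((1 - b)*(1 - (b + c))).
a, b, c = 2.8e-06, 0.9770, 0.0225   # Meddahi and Renault (2004)
m = 5                                # daily -> weekly aggregation

d = (1 - (b + c) ** m) / (1 - (b + c))
beta = b ** m
gamma = c * d
alpha = a * (1 - b ** m) * (m * (1 - b) - c * d) / ((1 - b) * (1 - (b + c)))
# approximately alpha = 0.00007, beta = 0.8902, gamma = 0.1124,
# matching the weekly parameters used in Section 5.2 up to rounding
```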
5.2
Forecast comparison
This subsection is concerned with comparing out-of-sample volatility predictions from various models. We are particularly interested in the forecast performance of the weekly HYBRID GARCH with daily data, versus a weekly GARCH with weekly returns (standard) and a daily GARCH that is iterated forward one week. The daily return is generated from model (5.37). Based on that, we have 1) the weekly HYBRID GARCH:

V_{t+1|t} = α + βV_{t|t−1} + γ ∑_{j=0}^{4} β^{j/5} r²_{t−j/5}    (5.38)

where α = 0.00007, β = 0.89020, and γ = 0.11239. We consider two parameter estimators: θ̃_T^mdrv and θ̃_T^lhrv (that is, the line m = 5 in Table 1 and Table 3).
2) the weekly (weak) GARCH with weekly returns:

R_{t+1} = √(V^A_{t+1|t}) ε^A_{t+1},   V^A_{t+1|t} = a^A + b^A V^A_{t|t−1} + c^A R_t²    (5.39)

where R_{t+1} = ∑_{j=0}^{4} r_{t+1−j/5} and r is generated from (5.37) (note that s in model (5.37) is measured at the daily frequency, while t in model (5.39) refers to the weekly frequency). According to Drost and Nijman (1993), we have a^A = 0.00007, b^A = 0.95054, and c^A = 0.04696. We consider two approaches to estimating the parameters: 1) the minimum-distance-based estimator, and 2) the likelihood-based estimator. In either case, the 'r2' and 'rv' specifications coincide, since the model can be treated as a HYBRID GARCH model with m = 1.

We consider 1000 trials in the simulation study. In each trial we drop the first 1000 observations and obtain a sample path of T + N observations (measured in weeks). We use the first T = 1000 observations to estimate the parameters in model (5.38) and model (5.39). The rest of the data (N = 500) are reserved for the out-of-sample forecast comparison. Given the estimates, i.e., α̃, β̃, γ̃ for model (5.38) and ã^A, b̃^A, c̃^A for model (5.39), we construct one-step-ahead forecasts of the weekly volatility based on the weekly HYBRID GARCH (5.38) and the weekly GARCH (5.39), denoted by Ṽ_{t+1|t} and Ṽ^A_{t+1|t} respectively
(t = T, T + 1, . . . , T + N − 1). Meanwhile, we also consider the weekly volatility forecast from the daily GARCH (5.37) iterated forward one week. We use the daily returns from the first T weeks to estimate the parameters in model (5.37) and then construct the weekly volatility forecast as the sum of the one-day-ahead, two-day-ahead, up to five-day-ahead forecasts of daily volatility, with forecast origin the end of the previous week. Namely,

Ṽ^I_{t+1|t} = ∑_{j=1}^{5} v_{t+j/5|t},   t = T, T + 1, . . . , T + N − 1    (5.40)

where v_{t+j/5|t} satisfies equation (3.2). The parameters are estimated via both minimum-distance-based estimation and likelihood estimation.

To evaluate forecast accuracy, we consider two loss functions discussed in Patton (2009), namely L_1 = MSE and L_2 = QLIKE. The proxy for the true weekly volatility is the ex post weekly realized volatility. The empirical losses across the N observations are computed as

ẼL_i ≡ (1/N) ∑_{t=T}^{T+N−1} L_i(Ṽ_{t+1|t}, RV_{t+1}),   ẼL^A_i ≡ (1/N) ∑_{t=T}^{T+N−1} L_i(Ṽ^A_{t+1|t}, RV_{t+1}),   ẼL^I_i ≡ (1/N) ∑_{t=T}^{T+N−1} L_i(Ṽ^I_{t+1|t}, RV_{t+1})

for i = 1, 2, respectively. Taking the empirical loss of the weekly HYBRID GARCH as the benchmark, we construct d^A_i = ẼL^A_i − ẼL_i and d^I_i = ẼL^I_i − ẼL_i, the deviations of the empirical losses of the aggregated GARCH and the iterated GARCH from those of the HYBRID GARCH model.

Note that we obtain one value of d^A_i and one value of d^I_i in each run of the simulation. After 1000 iterations, their summary statistics, i.e., the sample mean and sample standard deviation, are reported in the first two panels of Table 5. Figure 1 and Figure 2 display boxplots of ẼL_i, ẼL^A_i, ẼL^I_i, d^A_i and d^I_i under the two criteria (i = 1, 2) for the different estimation procedures. Plot (a) in Figure 1 and Figure 2 is a boxplot of the empirical loss under the MSE loss function; Plot (c) is under the QLIKE criterion. Plot (b) and Plot (d) in Figure 1 and Figure 2 display the empirical losses of the aggregated GARCH and the iterated GARCH relative to the HYBRID GARCH. The data and the plots indicate that in over 75% of the iterations, the weekly HYBRID GARCH model produces a smaller out-of-sample forecast error than the weekly GARCH. The forecast from the weekly HYBRID is, however, not significantly different from the one from the iterated GARCH, which is consistent with the theory.
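The iterated forecast (5.40) and the two loss functions can be sketched as follows. This is an illustration under the paper's setup, not the authors' code: for a GARCH(1,1) the j-day-ahead variance forecast mean-reverts to the unconditional variance at rate (b + c), and we use the log V + RV/V form of QLIKE (one of the Patton (2009) variants).

```python
import numpy as np

def iterated_weekly_forecast(a, b, c, v_next):
    """Sum of 1- to 5-day-ahead daily variance forecasts, as in (5.40).

    v_next is the one-day-ahead forecast v_{t+1/5|t}; multi-step forecasts
    mean-revert to the unconditional daily variance at rate (b + c)."""
    sigma2 = a / (1 - b - c)          # unconditional daily variance
    return sum(sigma2 + (b + c) ** (j - 1) * (v_next - sigma2)
               for j in range(1, 6))

def mse(v_hat, rv):
    return (rv - v_hat) ** 2

def qlike(v_hat, rv):                 # Patton (2009)-style QLIKE loss
    return np.log(v_hat) + rv / v_hat

a, b, c = 2.8e-06, 0.9770, 0.0225
# starting at the unconditional variance, the weekly forecast is 5 * a/(1-b-c)
v_week = iterated_weekly_forecast(a, b, c, v_next=a / (1 - b - c))
```

QLIKE is minimized when the forecast tracks the proxy, which is why it can rank forecasts robustly even with a noisy volatility proxy.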
Next we increase the aggregation frequency to 22, and compare the monthly HYBRID GARCH (with daily data), the monthly GARCH (with monthly returns) and the daily GARCH iterated forward one month. The simulation design is the same as before. The numerical results are presented in the last two panels of Table 5 and in Figure 3 and Figure 4. The forecast error from the monthly HYBRID GARCH is dominated by (i.e., uniformly smaller than) the one from the monthly GARCH, which gives a clear picture that the monthly HYBRID GARCH outperforms the monthly GARCH.
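Two well-known identities of temporal aggregation make the weekly weak-GARCH parameters quoted for (5.39) easy to sanity-check; the full Drost–Nijman (1993) mapping also pins down the split between b^A and c^A via higher moments, which the sketch below does not reproduce.

```python
# Consistency check (a sketch) of the aggregated weak-GARCH parameters:
#   persistence:  b^A + c^A = (b + c)^m
#   intercept:    a^A = m * a / (1 - b - c) * (1 - (b + c)^m)
a, b, c, m = 2.8e-06, 0.9770, 0.0225, 5
aA, bA, cA = 0.00007, 0.95054, 0.04696   # values reported in the text

persistence = (b + c) ** m
intercept = m * a / (1 - b - c) * (1 - (b + c) ** m)
# both agree with the reported weekly values up to rounding
```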
5.3
MIDAS correction to RV-driven HYBRID GARCH model
Last but not least, we revisit the parameter-free HYBRID GARCH model – the HYBRID process driven by RV – and compare it with a HYBRID GARCH model driven by a MIDAS filter of high frequency returns, to show that the MIDAS filter picks up the misspecification. We start with the following framework:

V_{t+1|t} = α + βV_{t|t−1} + γ_1 RV_t + γ_2 MIDAS(r⃗_t)    (5.41)

where the high frequency returns r⃗_t = (r_{t−1+1/m}, r_{t−1+2/m}, . . . , r_t)^T are generated from model (5.37), MIDAS(r⃗_t) = ∑_{j=0}^{m−1} b_j(η) r²_{t−j/m}, and the weight b_j(η) is parameterized by a low-dimensional function of η. The choice of parameterization of b_j(η) is suggested by the work of Ghysels, Santa-Clara, and Valkanov (2006), Ghysels, Sinko, and Valkanov (2006), and Ghysels, Rubia, and Valkanov (2009). Commonly used specifications are exponential weights, beta weights, linear weights, hyperbolic weights, and geometric weights, the last of which is equivalent to the weights appearing in the structural HYBRID process (3.12). Here, for succinctness, we use the exponential Almon MIDAS filter. Since the generic setup (5.41) is not estimable (considering Assumption 3.2), we instead consider a HYBRID GARCH model of the form:

V_{t+1|t} = α + βV_{t|t−1} + γ ∑_{j=0}^{m−1} b_j(η) r²_{t−j/m}    (5.42)

where

b_j(η) = exp{η_1(j/m) + η_2(j/m)²} / ∑_{k=0}^{m−1} exp{η_1(k/m) + η_2(k/m)²}    (5.43)

and α > 0, 0 < β < 1, γ > 0, η_1, η_2 ∈ R. It is easy to check that H_t(ϕ) = γ ∑_{j=0}^{m−1} b_j(η) r²_{t−j/m} (where ϕ = (η_1, η_2)^T) satisfies Assumption 3.2. When η_1 = η_2 = 0, b_j(0) = 1/m and H_t(0) = RV_t/m; model (5.42) is then driven by RV. To separate the two cases explicitly, we use γ^MIDAS to denote the γ appearing in (5.42) and γ^RV the coefficient in front of the RV. Note that γ^RV = γ^MIDAS/m when η_1 = η_2 = 0.
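The exponential Almon weights (5.43) can be computed in a few lines; the sketch below (not the authors' code) also verifies the special case just described, in which η_1 = η_2 = 0 collapses the MIDAS filter to the equal weights 1/m of the RV-driven model.

```python
import numpy as np

def almon_weights(eta1, eta2, m):
    """Exponential Almon MIDAS weights b_j(eta), j = 0, ..., m-1, as in (5.43)."""
    j = np.arange(m) / m
    w = np.exp(eta1 * j + eta2 * j ** 2)
    return w / w.sum()               # normalized to sum to one

w_flat = almon_weights(0.0, 0.0, m=5)   # all equal to 1/5: the RV special case
w_decl = almon_weights(-2.0, 0.0, m=5)  # declining in j: more weight on the
                                        # most recent return r_t (j = 0)
```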
Again we consider 1000 iterations in the simulation study, with the DGP specified in (5.37). The parameters are estimated via (4.24), i.e., the likelihood-based estimation based on RV. Table 6 reports the summary statistics of the parameter estimates for the RV-driven HYBRID GARCH and the MIDAS-driven HYBRID GARCH with m = 5, m = 22, and m = 78 respectively. In each panel, the first row 'mean' is the sample mean of the point estimates for each parameter, and the second row gives their sample standard deviation. The third row 'se' reports the sample mean of the standard errors of the point estimates, and its sample standard deviation is given in the following line. When m = 5 and m = 22, both η̃_1 and η̃_2 are insignificant, and the two models yield similar estimates of β and γ^MIDAS (see γ̃^MIDAS = .5530 versus 5·γ̃^RV = .5555 when m = 5, and γ̃^MIDAS = 8.5812 versus 22·γ̃^RV = 8.7934 when m = 22). This implies that the MIDAS filter makes a contribution similar to that of the RV component. However, as m gets larger, β̃ becomes smaller in both models and more weight shifts to the HYBRID component. When m = 78, β̃ = .1665 in the MIDAS-driven HYBRID GARCH model but remains significant, while it becomes negligible in the RV-driven HYBRID GARCH model, so that all the weight is put on the RV component (compare γ̃^RV·78 = 78.741 vs γ̃^MIDAS = 62.9466). All the parameters in the MIDAS-driven HYBRID GARCH model are significant or marginally significant except η̃_2, and therefore it looks more realistic than the RV-driven HYBRID GARCH. This is also supported by Figure 5. Plot (a), Plot (b), and Plot (c) are boxplots of the objective function

Obj = ∑_{t=1}^{T} [ log Ṽ_t(θ) + RV_t/Ṽ_t(θ) ]

produced from the 1000 simulations for each model. Plot (d) displays boxplots of the difference in Obj between the MIDAS-driven HYBRID GARCH and the RV-driven HYBRID GARCH in each iteration, for m = 5, m = 22, and m = 78. Out of the 1000 trials, there are over 750 in which the MIDAS-driven HYBRID GARCH model produces a smaller Obj than the RV-driven HYBRID GARCH model, and the proportion is even bigger when m = 78.
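The comparison criterion above is a Gaussian quasi-likelihood objective and can be sketched directly; this is an illustration of the criterion, not the authors' code, and the toy series below are hypothetical.

```python
import numpy as np

def obj(v_fitted, rv):
    """Quasi-likelihood objective Obj = sum_t [ log V_t + RV_t / V_t ]."""
    v_fitted, rv = np.asarray(v_fitted), np.asarray(rv)
    return np.sum(np.log(v_fitted) + rv / v_fitted)

rv = np.array([1.2, 0.8, 1.1, 0.9])       # toy realized-volatility series
v_good = rv.copy()                        # fitted variances tracking the proxy
v_flat = np.full_like(rv, rv.mean())      # constant-variance fit
# obj is smaller for the path that tracks rv, which is why a lower Obj
# signals a better-specified variance filter
```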
6
Empirical evidence Incomplete
7
Conclusion Incomplete
References

Aït-Sahalia, Y., P. Mykland, and L. Zhang (2006): "Ultra high frequency volatility estimation with dependent microstructure noise," Unpublished paper, Department of Economics, Princeton University.

Aknouche, A., and A. Bibi (2009): "Quasi-maximum likelihood estimation of periodic GARCH and periodic ARMA-GARCH processes," Journal of Time Series Analysis, 30(1), 19–46.

Andersen, T., T. Bollerslev, and F. Diebold (2007): "Roughing it up: Including jump components in the measurement, modeling, and forecasting of return volatility," The Review of Economics and Statistics, 89(4), 701–720.

Andreou, E., E. Ghysels, and A. Kourtellos (2008): "Regression Models With Mixed Sampling Frequencies," Discussion paper, UNC and University of Cyprus.

Barndorff-Nielsen, O., P. Hansen, A. Lunde, and N. Shephard (2008a): "Designing realized kernels to measure the ex post variation of equity prices in the presence of noise," Econometrica, 76(6), 1481–1536.

Barndorff-Nielsen, O., P. Hansen, A. Lunde, and N. Shephard (2008b): "Subsampling realised kernels," Econometrica.

Barndorff-Nielsen, O., P. Hansen, A. Lunde, and N. Shephard (2009): "Realized kernels in practice: Trades and quotes," Econometrics Journal, 12(3), 1–32.

Barndorff-Nielsen, O., P. Hansen, A. Lunde, and N. Shephard (2010): "Multivariate realised kernels: consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading," Review of Financial Studies.

Bollerslev, T. (1986): "Generalized autoregressive conditional heteroskedasticity," Journal of Econometrics, 31(3), 307–327.

Bollerslev, T., and J. M. Wooldridge (1992): "Quasi-Maximum Likelihood Estimation and Inference in Dynamic Models with Time-Varying Covariances," Econometric Reviews, 11(2), 143–172.

Bougerol, P. (1993): "Kalman filtering with random coefficients and contractions," SIAM Journal on Control and Optimization, 31, 942.

Brockwell, P., and R. Davis (1991): Time Series: Theory and Methods. Springer.

Chen, X., and E. Ghysels (2008): "News - Good or Bad - and its Impact on Volatility Forecasts over Multiple Horizons," Discussion paper, UNC.

Chen, X., E. Ghysels, and F. Wang (2009): "On the role of Intra-Daily Seasonality in HYBRID GARCH Models," Discussion paper, UIC and UNC.

Cipollini, F., R. Engle, and G. Gallo (2006): "Vector multiplicative error models: representation and inference."

Davydov, Y. (1968): "On convergence of distributions generated by stationary stochastic processes," Teoriya Veroyatnostei i ee Primeneniya, 13(4), 730–737.

de Vilder, R., and M. Visser (2008): "Ranking and Combining Volatility Proxies for Garch and Stochastic Volatility Models," MPRA paper 11001.

Domowitz, I., and H. White (1982): "Misspecified models with dependent observations," Journal of Econometrics, 20(1), 35–58.

Drost, F., and T. Nijman (1993): "Temporal aggregation of GARCH processes," Econometrica, pp. 909–927.

Drost, F. C., and B. M. J. Werker (1996): "Closing the GARCH Gap: Continuous Time GARCH Modeling," Journal of Econometrics, 74, 31–57.

Engle, R. (2002): "New frontiers for ARCH models," Journal of Applied Econometrics, 17(5), 425–446.

Engle, R., and G. Gallo (2006): "A multiple indicators model for volatility using intra-daily data," Journal of Econometrics, 131, 3–27.

Engle, R. F., and V. Ng (1993): "Measuring and Testing the Impact of News on Volatility," Journal of Finance, 48, 1749–1778.

Francq, C., and J. Zakoïan (2000): "Estimating weak GARCH representations," Econometric Theory, 16(5), 692–728.

Ghysels, E., A. Rubia, and R. Valkanov (2009): "Multi-Period Forecasts of Volatility: Direct, Iterated, and Mixed-Data Approaches," Working paper, UNC.

Ghysels, E., P. Santa-Clara, and R. Valkanov (2002): "The MIDAS touch: Mixed data sampling regression models," Working paper, UNC and UCLA.

Ghysels, E., P. Santa-Clara, and R. Valkanov (2006): "Predicting volatility: getting the most out of return data sampled at different frequencies," Journal of Econometrics, 131, 59–95.

Ghysels, E., A. Sinko, and R. Valkanov (2006): "MIDAS Regressions: Further Results and New Directions," Econometric Reviews, 26, 53–90.

Hansen, P., Z. Huang, and H. Shek (2010): "Realized GARCH: A Complete Model of Returns and Realized Measures of Volatility," Working paper, Stanford University.

Huang, X., and G. Tauchen (2005): "The relative contribution of jumps to total price variance," Journal of Financial Econometrics, 3(4), 456.

Ibragimov, I. (1962): "Some limit theorems for stationary processes," Theory of Probability and its Applications, 7, 349.

Jacod, J., Y. Li, P. Mykland, M. Podolskij, and M. Vetter (2008): "Microstructure noise in the continuous case: the pre-averaging approach," Stochastic Processes and their Applications.

Lanne, M. (2006): "A mixture multiplicative error model for realized volatility," Journal of Financial Econometrics, 4(4), 594.

Meddahi, N., and E. Renault (2004): "Temporal aggregation of volatility models," Journal of Econometrics, 119(2), 355–379.

Nelson, D. (1992): "Filtering and forecasting with misspecified ARCH models I: Getting the right variance with the wrong model," Journal of Econometrics, 60, 61–90.

Nelson, D., and D. Foster (1995): "Filtering and forecasting with misspecified ARCH models II: Getting the right variance with the wrong model," Journal of Econometrics, 67, 303–335.

Patton, A. (2009): "Volatility forecast evaluation and comparison using imperfect volatility proxies."

Shephard, N., and K. Sheppard (2009): "Realising the future: forecasting with high frequency based volatility (HEAVY) models," Journal of Applied Econometrics (forthcoming).

Straumann, D., and T. Mikosch (2006): "Quasi-maximum-likelihood estimation in conditionally heteroscedastic time series: A stochastic recurrence equations approach," Annals of Statistics, 34(5), 2449.

Visser, M. (2008): "Garch parameter estimation using high-frequency data," MPRA paper 9076.

Wasserfallen, W., and H. Zimmermann (1985): "The behavior of intra-daily exchange rates," Journal of Banking & Finance, 9(1), 55–72.

Zhang, L., P. Mykland, and Y. Aït-Sahalia (2005): "A tale of two time scales: Determining integrated volatility with noisy high-frequency data," Journal of the American Statistical Association, 100(472), 1394–1411.
Technical Appendices

A

Preliminary results
We first present some useful results in this section which will facilitate the proofs of the theorems appearing in the paper. In what follows, we use ∇ to denote the vector differential operator, so that ∇f is the gradient (column vector) of a scalar function f, and Hess(f) the Hessian matrix of f, i.e., (Hess(f))_{i,j} = ∂_i∂_j f, where ∂_k denotes the partial derivative with respect to the k-th parameter in θ.

Lemma A.1 Suppose that the HYBRID filtering H satisfies Assumption 3.2. For θ ∈ C, p^T ∇V_{t|t−1}(θ) = 0 ∀ t ∈ Z almost surely implies that p ≡ 0.

Proof of Lemma A.1: Consider V_{t+1|t}(θ) = α + βV_{t|t−1}(θ) + γH_t(ϕ). Note that

∇V_{t+1|t}(θ) = ∇α + (∇β)V_{t|t−1}(θ) + β(∇V_{t|t−1}(θ)) + ∇(γH_t(ϕ)),

which implies

0 = p^T ∇α + p^T (∇β)V_{t|t−1}(θ) + p^T ∇(γH_t(ϕ))   a.s.

Let p ≡ (p_1, p_2, p_3, p_4) ∈ R^d, where p_4 is of the same dimension as ϕ. The above equation becomes

0 = p_1 + p_2 V_{t|t−1}(θ) + p_3 H_t(ϕ) + p_4 γ∇H_t(ϕ)   a.s.

Since p_3 H_t(ϕ) + p_4 γ∇H_t(ϕ) is a function of returns in period t, p_2 = 0. Hence we have

0 = p_1 + p_3 H_t(ϕ) + p_4 γ∇H_t(ϕ)   a.s.

Assumption 3.2 then implies that p_1 = p_3 = p_4 = 0 (since γ > 0).
Lemma A.2 Suppose that the HYBRID filtering H satisfies Assumption 3.2. Then for θ ∈ C, V_{t|t−1}(θ) = V_{t|t−1}(θ_0) ∀ t ∈ Z almost surely if and only if θ = θ_0.

Proof of Lemma A.2: Sufficiency is apparent. Now we check the necessity. An application of the mean value theorem implies that 0 = V_{t|t−1}(θ) − V_{t|t−1}(θ_0) = ∇V_{t|t−1}(θ*)(θ − θ_0), where θ* lies between θ and θ_0. Clearly θ* ∈ C, but θ* may depend on t. Lemma A.1 indicates that, for fixed t and θ ∈ C, the components of ∇V_{t|t−1} are linearly independent of each other, which, when applied to ∇V_{t|t−1}(θ*)(θ − θ_0) = 0, yields θ = θ_0.

Lemma A.3 Suppose that Er⁴ < ∞ and H satisfies Assumption 3.2. Under Assumption 4.1, ∂_iV_{t|t−1}(θ), ∂_i∂_jV_{t|t−1}(θ), ∂_i∂_j∂_kV_{t|t−1}(θ) are strictly stationary ergodic with finite second moment for θ ∈ C and i, j, k ∈ {1, 2, 3, . . . , d}.
Proof of Lemma A.3: Note that, with probability one,

V_{t|t−1} = α/(1 − β) + γ ∑_{k=0}^{∞} β^k H_{t−1−k}(ϕ).    (A.1)

Assumptions 4.1 and 3.2 imply that H_t, ∂_iH_t, ∂_i∂_jH_t, ∂_i∂_j∂_kH_t are strictly stationary ergodic with finite second moment, and hence sup_{θ∈C} EH_t², sup_{θ∈C} E(∂_iH_t)², sup_{θ∈C} E(∂_i∂_jH_t)², and sup_{θ∈C} E(∂_i∂_j∂_kH_t)² are bounded. Meanwhile, it is easy to check that ∑_{k=0}^{∞} ∂_i(γβ^k), ∑_{k=0}^{∞} ∂_i∂_j(γβ^k) and ∑_{k=0}^{∞} ∂_i∂_j∂_l(γβ^k) are absolutely summable uniformly in θ ∈ C, which implies that, almost surely,

∂_iV_{t|t−1} = ∂_i(α/(1 − β)) + ∑_{k=0}^{∞} ∂_i(γβ^k H_{t−1−k}(ϕ))    (A.2)

∂_i∂_jV_{t|t−1} = ∂_i∂_j(α/(1 − β)) + ∑_{k=0}^{∞} ∂_i∂_j(γβ^k H_{t−1−k}(ϕ))    (A.3)

∂_i∂_j∂_kV_{t|t−1} = ∂_i∂_j∂_k(α/(1 − β)) + ∑_{k=0}^{∞} ∂_i∂_j∂_k(γβ^k H_{t−1−k}(ϕ))    (A.4)

are strictly stationary ergodic with finite second moment.
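Representation (A.1) can be illustrated numerically: iterating the HYBRID recursion from its deterministic fixed point reproduces the moving-average form exactly. The sketch below uses illustrative parameter values and a stand-in positive series for the HYBRID process H; it is not tied to any particular DGP from the paper.

```python
import numpy as np

# Check (a sketch) that the recursion V_{t+1|t} = alpha + beta*V + gamma*H_t
# agrees with the expansion alpha/(1-beta) + gamma * sum_k beta^k H_{t-1-k}.
rng = np.random.default_rng(1)
alpha, beta, gamma = 0.0001, 0.9, 0.08      # illustrative values
H = rng.uniform(0.5, 1.5, size=2000)        # stand-in for the HYBRID process

v = alpha / (1 - beta)                      # start at the fixed point
for h in H:
    v = alpha + beta * v + gamma * h        # iterate; v is V_{T+1|T}

k = np.arange(len(H))
v_ma = alpha / (1 - beta) + gamma * np.sum(beta ** k * H[::-1])
# v and v_ma coincide up to floating-point error
```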
B
Proofs of Theorem 4.1, Theorem 4.3, Theorem 4.4
Proof of Theorem 4.1: Note that ∥RV_t − V_{t|t−1}(θ)∥ ≤ ∥RV_t − v∥ + ∥v − V_{t|t−1}(θ)∥ for any v ∈ I_{t−1}. Hence, we have

∥RV_t − V_{t|t−1}(θ)∥ ≤ ∥RV_t − V_{t|t−1}(θ_0)∥ + min_{v∈I_{t−1}} ∥v − V_{t|t−1}(θ)∥,

which implies that

min_{θ∈C} ∥RV_t − V_{t|t−1}(θ)∥ ≤ ∥RV_t − V_{t|t−1}(θ_0)∥ + min_{θ∈C} min_{v∈I_{t−1}} ∥v − V_{t|t−1}(θ)∥ = ∥RV_t − V_{t|t−1}(θ_0)∥.

Therefore, ∥RV_t − V_{t|t−1}(θ_0)∥ = min_{θ∈C} ∥RV_t − V_{t|t−1}(θ)∥. Suppose that there exists θ_1 ∈ C such that ∥RV_t − V_{t|t−1}(θ_1)∥ = min_{θ∈C} ∥RV_t − V_{t|t−1}(θ)∥. Clearly V_{t|t−1}(θ_1) = V_{t|t−1}(θ_0), and hence θ_1 = θ_0, which follows from Lemma A.2.

The proof of Theorem 4.3 is the same as that of Theorem 4.1.

Proof of Theorem 4.4: It suffices to justify the first equality in equation (4.20); the proof of the second equality is the same. Define

l_t(θ) = log V_{t|t−1}(θ) + R_t²/V_{t|t−1}(θ).

Under Assumption 4.1 and 0 < β < 1, 0 < EV_{t|t−1}(θ) < ∞ and E log V_{t|t−1}(θ) < ∞. Note also that l_t(θ) ≥ log α, and l_t(θ) ≤ log V_{t|t−1}(θ) + R_t²/α. Hence E|l_t(θ)| < ∞. Under Assumption 3.4,

E[ log(V_{t|t−1}(θ)/V_{t|t−1}(θ_0)) + R_t²/V_{t|t−1}(θ) − R_t²/V_{t|t−1}(θ_0) ] = E[ V_{t|t−1}(θ_0)/V_{t|t−1}(θ) − 1 − log(V_{t|t−1}(θ_0)/V_{t|t−1}(θ)) ] ≥ 0,

and the equality holds if and only if V_{t|t−1}(θ) = V_{t|t−1}(θ_0), i.e., θ = θ_0, which is implied by Lemma A.2. Therefore, El_t(θ) is uniquely minimized at θ_0.
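The identification step above rests on the elementary inequality x − 1 − log x ≥ 0, with equality only at x = 1 (here x = V_{t|t−1}(θ_0)/V_{t|t−1}(θ)). A quick numerical sanity check of this inequality:

```python
import numpy as np

# x - 1 - log(x) is nonnegative and vanishes only at x = 1; this is what
# makes the quasi-likelihood criterion uniquely minimized at theta_0.
x = np.linspace(0.05, 20.0, 10_000)
f = x - 1 - np.log(x)
# f >= 0 everywhere on the grid, with its minimum attained near x = 1
```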
C
Proof of Theorem 4.2
First we introduce some notation: ε_t(θ) ≡ RV_t − V_{t|t−1}(θ), e_t(θ) ≡ RV_t − Ṽ_t(θ), O_T(θ) ≡ (1/T)∑_{t=1}^{T} ε_t²(θ), Q_T(θ) ≡ (1/T)∑_{t=1}^{T} e_t²(θ). The proof of Theorem 4.2 starts with the properties of θ̂_T^mdrv ≡ arg min_{θ∈C} O_T(θ).

Lemma C.1 Under Assumptions 3.3 and 4.1, θ̂_T^mdrv converges to θ_0 almost surely.

Proof of Lemma C.1: The proof is an application of Theorem 2.2 of Domowitz and White (1982). O_T(θ) converges to E(RV_t − V_{t|t−1}(θ))² almost surely, uniformly in θ. Theorem 4.1 implies that θ_0 is the unique minimizer of E(RV_t − V_{t|t−1}(θ))² and that it is identifiable. Therefore, θ̂_T^mdrv converges to θ_0 almost surely.

Lemma C.2 Suppose that Er⁴ < ∞. Under Assumptions 3.3 and 4.1, Hess(O_T)(θ_0) converges to 2Σ^md almost surely as T goes to infinity, where 0 < Σ^md = E∇V_{t|t−1}(θ_0)(∇V_{t|t−1}(θ_0))′ < ∞.

Proof of Lemma C.2: Note that Hess(O_T)(θ_0) ≡ (1/T)∑_{t=1}^{T} Hess(ε_t²)(θ_0), and ∂_i∂_j ε_t² = 2ε_t ∂_i∂_j ε_t + 2∂_i ε_t ∂_j ε_t. Since ∂_i∂_j ε_t belongs to I_{t−1}, E∂_i∂_j ε_t²(θ_0) = 2E∂_i ε_t(θ_0) ∂_j ε_t(θ_0). Moreover, ∂_i∂_j ε_t² is strictly stationary ergodic with finite mean (by Lemma A.3). It follows from the ergodic theorem that Hess(O_T)(θ_0) converges almost surely to EHess(ε_t²)(θ_0) = 2E∇ε_t(θ_0)(∇ε_t(θ_0))′ < ∞, denoted by 2Σ^md. Next, we need to show that Σ^md is positive definite. Clearly, it is positive semi-definite. Suppose that there exists p ∈ R^d, where d is the dimension of the parameter space, such that p′E∇ε_t(θ_0)(∇ε_t(θ_0))′p = 0. This is equivalent to p′∇ε_t(θ_0) = 0 almost surely for all t, or p′∇V_{t|t−1}(θ_0) = 0 almost surely. Lemma A.1 implies p ≡ 0. Therefore Σ^md is positive definite.
Under Assumptions 3.3 and 4.1, limT →∞ Hess(OT )(θˆTmdrv ) −
Proof of Lemma C.3: It is equivalent to show that limT →∞ ∂i ∂j OT (θˆTmdrv ) − ∂i ∂j OT (θ0 ) = 0 almost surely for any i, j. Since limT →∞ θˆTmdrv − θ0 = 0 almost surely, it suffices to show that supθ∈C |∂i ∂j ∂k OT (θ)| ∑T is stochastically bounded for any i, j, k. Note that ∂i ∂j ∂k OT (θ) = T1 t=1 ∂i ∂j ∂k (ε2t )(θ). Lemma A.3 5
implies that ∂_i∂_j∂_k O_T(θ) converges to E∂_i∂_j∂_k(ε_t²)(θ) almost surely, uniformly in θ ∈ C. In other words, for sufficiently large T, |∂_i∂_j∂_k O_T(θ)| < 1/2 + |E∂_i∂_j∂_k(ε_0²)(θ)| almost surely. Hence sup_{θ∈C} |∂_i∂_j∂_k O_T(θ)| < 1/2 + sup_{θ∈C} |E∂_i∂_j∂_k(ε_0²)(θ)| almost surely for large T, and sup_{θ∈C} |E∂_i∂_j∂_k(ε_0²)(θ)| < ∞.

Lemma C.4 Under Assumptions 3.3 and 4.2, √T ∇O_T(θ_0) converges to N(0, 4Ω^mdrv) in distribution as T → ∞, where 0 ≤ Ω^mdrv = lim_{T→∞} (1/4)var(√T ∇O_T(θ_0)) < ∞.

Proof of Lemma C.4: Note that for p ∈ R^d, √T p^T ∇O_T = (2/√T) ∑_{t=1}^{T} ∑_{l=1}^{d} p_l Y_{l,t}, where Y_{l,t} = ε_t ∂_l ε_t. Since ε_t(θ) = RV_t − α/(1 − β) − γ ∑_{j=0}^{∞} β^j H_{t−1−j}(ϕ), we can rewrite ε_t as

ε_t(θ) = ∑_{j=0}^{∞} c_j(θ) H̃_{t−j}(θ),

where c_0(θ) = 1, H̃_t(θ) = RV_t − α/(1 − β), and c_j(θ) = −γβ^{j−1}, H̃_{t−j}(θ) = H_{t−j}(ϕ) for j ≥ 1. Hence

Y_{l,t} = ∑_{i,j≥0} c_i H̃_{t−i} ∂_l(c_j H̃_{t−j}) = ∑_{0≤i,j≤N} + ∑_{0≤i≤N, j>N} + ∑_{i>N, j≥0} ≡ Y_{l,t}^{(N)} + ξ_{l,t}^{(N)} + η_{l,t}^{(N)}

for an arbitrary N > 0. Since EY_{l,t}(θ_0) = 0, we have

√T p^T ∇O_T(θ_0) = (2/√T) ∑_{t=1}^{T} ∑_{l=1}^{d} p_l ( Y_{l,t}^{(N)}(θ_0) + ξ_{l,t}^{(N)}(θ_0) + η_{l,t}^{(N)}(θ_0) )
= (2/√T) ∑_{t=1}^{T} ∑_{l=1}^{d} p_l ( Y_{l,t}^{(N)}(θ_0) − EY_{l,t}^{(N)}(θ_0) + ξ_{l,t}^{(N)}(θ_0) − Eξ_{l,t}^{(N)}(θ_0) + η_{l,t}^{(N)}(θ_0) − Eη_{l,t}^{(N)}(θ_0) )
= (2/√T) ∑_{t=1}^{T} ( Y_t^{(N)}(θ_0) + ξ_t^{(N)}(θ_0) + η_t^{(N)}(θ_0) ),

where Y_t^{(N)} ≡ ∑_{l=1}^{d} p_l (Y_{l,t}^{(N)} − EY_{l,t}^{(N)}), ξ_t^{(N)} ≡ ∑_{l=1}^{d} p_l (ξ_{l,t}^{(N)} − Eξ_{l,t}^{(N)}), and η_t^{(N)} ≡ ∑_{l=1}^{d} p_l (η_{l,t}^{(N)} − Eη_{l,t}^{(N)}). Under Assumption 4.2, Y_{l,t}^{(N)}, ξ_{l,t}^{(N)}, η_{l,t}^{(N)} are strictly stationary, and so are Y_t^{(N)}, ξ_t^{(N)} and η_t^{(N)}. The proof of the lemma is done in 3 steps:
1) Note that Y_t^{(N)} is strictly stationary with E(Y_t^{(N)}) = 0. Y_{l,t}^{(N)} is linear in {r_{t−i}^a r_{t−j}^b, i, j = 0, 1/m, 2/m, . . . , N + 1 − 1/m, a, b = 0, 1, 2}, and Er^{8+4v} < ∞, thus E|Y_t^{(N)}|^{2+v} < ∞.

Let F_{t_1}^{t_2} ≡ σ(r_s, t_1 ≤ s ≤ t_2) and F_{t_1}^{t_2}(Y) ≡ σ(Y_s^{(N)}, t_1 ≤ s ≤ t_2). For any t and n > N, since F_{−∞}^{t}(Y) ⊂ F_{−∞}^{t} and F_{t+n}^{∞}(Y) ⊂ F_{t+n−N−1+1/m}^{∞},

α_Y(n) ≡ sup_{t∈Z} sup_{A∈F_{−∞}^{t}(Y), B∈F_{t+n}^{∞}(Y)} |P(AB) − P(A)P(B)| ≤ α(n − N − 1),

and hence ∑_{n=0}^{∞} α_Y(n)^{v/(2+v)} < ∞. Theorem 1.7 of Ibragimov (1962) implies that

σ_N² = ∑_{k=−∞}^{∞} cov(Y_t^{(N)}, Y_{t−k}^{(N)}) < ∞,   and   (1/√T) ∑_{t=1}^{T} Y_t^{(N)} ⟹ N(0, σ_N²).    (C.5)

2) Next we need to show that lim_{N→∞} sup_T Var((1/√T) ∑_{t=1}^{T} ξ_t^{(N)}) = 0.
Since

Var((1/√T) ∑_{t=1}^{T} ξ_t^{(N)}) = ∑_{l,s=1}^{d} p_l p_s ∑_{k=−(T−1)}^{T−1} (1 − |k|/T) Cov(ξ_{l,k}^{(N)}, ξ_{s,0}^{(N)}),

and

Cov(ξ_{l,k}^{(N)}, ξ_{s,0}^{(N)}) = Cov( ∑_{0≤i≤N, j>N} c_i H̃_{k−i} ∂_l(c_j H̃_{k−j}), ∑_{0≤i≤N, j>N} c_i H̃_{−i} ∂_s(c_j H̃_{−j}) ),

it suffices to show that

lim_{N→∞} ∑_{k=−∞}^{∞} |γ_{l,s}^{(N)}(k)| = 0,    (C.6)

where γ_{l,s}^{(N)}(k) = Cov( ∑_{i≥0, j>N} c_i H̃_{k−i} ∂_l(c_j H̃_{k−j}), ∑_{i≥0, j>N} c_i H̃_{−i} ∂_s(c_j H̃_{−j}) ) for l, s = 1, . . . , d.

Note that

Cov( H̃_{k−i} ∂_l(c_j H̃_{k−j}), H̃_{−i′} ∂_s(c_{j′} H̃_{−j′}) )
= c_j c_{j′} Cov(H̃_{k−i} ∂_l H̃_{k−j}, H̃_{−i′} ∂_s H̃_{−j′}) + c_j ∂_s(c_{j′}) Cov(H̃_{k−i} ∂_l H̃_{k−j}, H̃_{−i′} H̃_{−j′})
+ ∂_l(c_j) c_{j′} Cov(H̃_{k−i} H̃_{k−j}, H̃_{−i′} ∂_s H̃_{−j′}) + ∂_l(c_j) ∂_s(c_{j′}) Cov(H̃_{k−i} H̃_{k−j}, H̃_{−i′} H̃_{−j′}).

Since H̃_{t−i} is linear in {1, r_{t−i−h/m}, r²_{t−i−h/m}, h = 0, 1, 2, . . . , m − 1}, (H̃_{t−i} H̃_{t−j}) and (H̃_{t−i} ∂_l H̃_{t−j}) are linear in {r^a_{t−i−h/m} r^b_{t−j−g/m}, h, g = 0, 1, 2, . . . , m − 1, a, b = 0, 1, . . . , 4}. Assumptions 3.2 and 4.2 imply that

E(H̃_{t−i} H̃_{t−j})² < M_1,   E(H̃_{t−i} ∂_l H̃_{t−j})² < M_1

for some positive constant M_1. Note also that one can always find 0 < ρ < 1 and M_2 > 0 such that |c_i| < M_2 ρ^i and |∂_l c_i| < M_2 ρ^i for i ≥ 0. Therefore, for any i, j, i′, j′ ≥ 0,

| c_i c_{i′} Cov( H̃_{k−i} ∂_l(c_j H̃_{k−j}), H̃_{−i′} ∂_s(c_{j′} H̃_{−j′}) ) | ≤ 4M_2⁴ ρ^{i+i′+j+j′} M_1.

It implies that there exists a constant M_3 > 0 such that

∑_{i,i′≥0, j,j′>N} | c_i c_{i′} Cov( H̃_{k−i} ∂_l(c_j H̃_{k−j}), H̃_{−i′} ∂_s(c_{j′} H̃_{−j′}) ) | ≤ M_3 ρ^N.    (C.7)
For k > 2m(N + 1),

∑_{i,i′≥0, j,j′>N} | c_i c_{i′} Cov( H̃_{k−i} ∂_l(c_j H̃_{k−j}), H̃_{−i′} ∂_s(c_{j′} H̃_{−j′}) ) |
≤ ∑_{i=0}^{[k/(2m)]} ∑_{j=N+1}^{[k/(2m)]} ∑_{i′≥0, j′>N} | c_i c_{i′} Cov( H̃_{k−i} ∂_l(c_j H̃_{k−j}), H̃_{−i′} ∂_s(c_{j′} H̃_{−j′}) ) |
+ ∑_{i>[k/(2m)], i′≥0, j,j′>N} | c_i c_{i′} Cov( H̃_{k−i} ∂_l(c_j H̃_{k−j}), H̃_{−i′} ∂_s(c_{j′} H̃_{−j′}) ) |
+ ∑_{i,i′≥0, j>[k/(2m)], j′>N} | c_i c_{i′} Cov( H̃_{k−i} ∂_l(c_j H̃_{k−j}), H̃_{−i′} ∂_s(c_{j′} H̃_{−j′}) ) |.

Since H̃_{t−i} H̃_{t−j} and H̃_{t−i} ∂_l H̃_{t−j} are linear in {r^a_{t−i−h/m} r^b_{t−j−g/m}, h, g = 0, 1, 2, . . . , m − 1, a, b = 0, 1, 2}, H̃_{t−i} H̃_{t−j} and H̃_{t−i} ∂_l H̃_{t−j} are measurable with respect to F_{t−j−1+1/m}^{t−i} ≡ σ(r_{t−i−h/m}, h = 0, 1, . . . , m − 1, m, . . . , m(j + 1 − i) − 1) (here we assume i < j). The Davydov inequality (Lemma 2.1 of Davydov (1968)) implies that, for 0 ≤ i ≤ [k/(2m)], N < j ≤ [k/(2m)], i′ ≥ 0, j′ > N, and k > 2m(N + 1),

| Cov(H̃_{k−i} ∂_l H̃_{k−j}, H̃_{−i′} ∂_s H̃_{−j′}) |
≤ 12 (E|H̃_{k−i} ∂_l H̃_{k−j}|^{2+v})^{1/(2+v)} (E|H̃_{−i′} ∂_s H̃_{−j′}|^{2+v})^{1/(2+v)} (α([k/(2m)]))^{v/(2+v)}
≤ M_4 (α([k/(2m)]))^{v/(2+v)}

for some M_4 > 0. Similarly, |Cov(H̃_{k−i} ∂_l H̃_{k−j}, H̃_{−i′} H̃_{−j′})|, |Cov(H̃_{k−i} H̃_{k−j}, H̃_{−i′} ∂_s H̃_{−j′})|, and |Cov(H̃_{k−i} H̃_{k−j}, H̃_{−i′} H̃_{−j′})| are bounded by M_4 (α([k/(2m)]))^{v/(2+v)} for a well-chosen M_4. Therefore, for 0 ≤ i ≤ [k/(2m)], N < j ≤ [k/(2m)], i′ ≥ 0, j′ > N, and k > 2m(N + 1),

| c_i c_{i′} Cov( H̃_{k−i} ∂_l(c_j H̃_{k−j}), H̃_{−i′} ∂_s(c_{j′} H̃_{−j′}) ) | ≤ 4M_2⁴ ρ^{i+i′+j+j′} (α([k/(2m)]))^{v/(2+v)}.

Hence, for k > 2m(N + 1),

∑_{i,i′≥0, j,j′>N} | c_i c_{i′} Cov( H̃_{k−i} ∂_l(c_j H̃_{k−j}), H̃_{−i′} ∂_s(c_{j′} H̃_{−j′}) ) |
≤ ∑_{i=0}^{[k/(2m)]} ∑_{j=N+1}^{[k/(2m)]} ∑_{i′≥0, j′>N} 4M_2⁴ ρ^{i+i′+j+j′} (α([k/(2m)]))^{v/(2+v)}
+ ∑_{i>[k/(2m)], i′≥0, j,j′>N} 4M_2⁴ ρ^{i+i′+j+j′} M_1 + ∑_{i,i′≥0, j>[k/(2m)], j′>N} 4M_2⁴ ρ^{i+i′+j+j′} M_1
≤ M_5 (α([k/(2m)]))^{v/(2+v)} ρ^N + M_5 ρ^{[k/(2m)]+N}

for some M_5 > 0.
It follows that

∑_{k=−∞}^{∞} |γ_{l,s}^{(N)}(k)| = ∑_{|k|≤2m(N+1)} |γ_{l,s}^{(N)}(k)| + ∑_{|k|>2m(N+1)} |γ_{l,s}^{(N)}(k)|
≤ (4m(N + 1) + 1) M_3 ρ^N + ∑_{|k|>2m(N+1)} ( M_4 (α([k/(2m)]))^{v/(2+v)} ρ^N + M_5 ρ^{[k/(2m)]+N} )
−→ 0 as N −→ ∞.

3) Using the same argument, it can be shown that lim_{N→∞} sup_T Var((1/√T) ∑_{t=1}^{T} η_t^{(N)}) = 0. Therefore lim_{T→∞} Var(√T p^T ∇O_T(θ_0)) exists and is finite. The proof in step 2 (taking N = 0) also implies that ∑_{k=−∞}^{∞} |C_{l,s}(k)| < ∞, where C_{l,s}(k) = Cov(Y_{l,t}(θ_0), Y_{s,t−k}(θ_0)). Hence we have

lim_{T→∞} Var(√T p^T ∇O_T(θ_0)) = 4 ∑_{k=−∞}^{∞} ∑_{l,s=1}^{d} C_{l,s}(k) = 4 p^T Ω^mdrv p.
It can easily be shown that lim_{N→∞} σ_N² = p^T Ω^mdrv p, and hence lim_{N→∞} exp(−2σ_N² t²) = exp(−2p^T Ω^mdrv p t²). For any ϵ > 0, there exists N_0 such that

|exp(−2σ_{N_0}² t²) − exp(−2p^T Ω^mdrv p t²)| < ϵ,   sup_T Var((1/√T) ∑_{t=1}^{T} η_t^{(N_0)}) < ϵ,   sup_T Var((1/√T) ∑_{t=1}^{T} ξ_t^{(N_0)}) < ϵ.

Let X_{T,N}^{(1)} = (2/√T) ∑_{t=1}^{T} Y_t^{(N)}(θ_0) and X_{T,N}^{(2)} = (2/√T) ∑_{t=1}^{T} ( ξ_t^{(N)}(θ_0) + η_t^{(N)}(θ_0) ). Then

|E exp(it √T p^T ∇O_T(θ_0)) − exp(−2p^T Ω^mdrv p t²)|
= |E exp(it(X_{T,N_0}^{(1)} + X_{T,N_0}^{(2)})) − exp(−2p^T Ω^mdrv p t²)|
= |E e^{itX_{T,N_0}^{(1)}} (1 + itX_{T,N_0}^{(2)} + o(X_{T,N_0}^{(2)})) − exp(−2p^T Ω^mdrv p t²)|
≤ |E e^{itX_{T,N_0}^{(1)}} − e^{−2σ_{N_0}² t²}| + |e^{−2σ_{N_0}² t²} − e^{−2p^T Ω^mdrv p t²}| + |t| E( |X_{T,N_0}^{(2)}| + o(|X_{T,N_0}^{(2)}|) )
≤ |E e^{itX_{T,N_0}^{(1)}} − e^{−2σ_{N_0}² t²}| + ϵ + 8|t|√ϵ,

hence lim_{T→∞} |E exp(it √T p^T ∇O_T(θ_0)) − exp(−2p^T Ω^mdrv p t²)| ≤ ϵ + 8|t|√ϵ. Therefore, √T ∇O_T(θ_0) converges to N(0, 4Ω^mdrv) in distribution.
Lemma C.5 Under Assumptions 3.3 and 4.2,

√T (θ̂_T^mdrv − θ_0) ⟹ N(0, (Σ^md)^{−1} Ω^mdrv (Σ^md)^{−1}),

where Σ^md = E∇V_{t|t−1}(θ_0)(∇V_{t|t−1}(θ_0))′ and Ω^mdrv = lim_{T→∞} (1/4)var(√T ∇O_T(θ_0)) < ∞.

Proof of Lemma C.5: Note that −∇O_T(θ_0) = Hess(O_T)(θ_T*)(θ̂_T^mdrv − θ_0), where θ_T* lies between θ_0 and θ̂_T^mdrv. Hess(O_T)(θ_0) converges to 2Σ^md almost surely and Σ^md is positive definite. Lemma C.3 implies that Hess(O_T)(θ_T*) converges to 2Σ^md almost surely, and that Hess(O_T)(θ_T*) is positive definite for sufficiently large T. Hence we have, for large T,

√T (θ̂_T^mdrv − θ_0) = −(Hess(O_T)(θ_T*))^{−1} √T ∇O_T(θ_0),

which converges to N(0, (Σ^md)^{−1} Ω^mdrv (Σ^md)^{−1}) in distribution.
Lemma C.6 Suppose that $Er^4 < \infty$. Under Assumptions 3.3 and 4.1, $\lim_{T\to\infty}\sup_{\theta\in C}|O_T(\theta) - Q_T(\theta)| \overset{a.s.}{=} 0$, and $\lim_{T\to\infty}\sqrt{T}\sup_{\theta\in C}\|\nabla O_T(\theta) - \nabla Q_T(\theta)\| \overset{a.s.}{=} 0$.

Proof of Lemma C.6: (1) We first show that $\lim_{T\to\infty}\sup_{\theta\in C}|O_T(\theta) - Q_T(\theta)| \overset{a.s.}{=} 0$. Note that there exists $\kappa > 1$ such that
$$\lim_{t\to\infty}\kappa^t\sup_{\theta\in C}|\varepsilon_t(\theta) - e_t(\theta)| = \lim_{t\to\infty}\kappa^t\sup_{\theta\in C}|V_{t|t-1}(\theta) - \tilde V_t(\theta)| \overset{a.s.}{=} 0$$
according to Theorem 3.1 of Bougerol (1993) or Theorem 2.8 of Straumann and Mikosch (2006). In other words, $\forall\,\delta > 0$, $\exists\, T_0 > 0$ such that $\kappa^t\sup_{\theta\in C}|\varepsilon_t(\theta) - e_t(\theta)| < \delta$ for $t > T_0$. Hence $\sup_{\theta\in C}|e_t^2(\theta) - \varepsilon_t^2(\theta)| \le 2\delta\kappa^{-t}\sup_{\theta\in C}|\varepsilon_t(\theta)| + \delta^2\kappa^{-2t}$ when $t > T_0$. Since under Assumption 4.1 $E\sup_{\theta\in C}|\varepsilon_t(\theta)|$ is bounded from above, $E\log\sup_{\theta\in C}|\varepsilon_t(\theta)|$ is finite as well. Considering Lemma 2.1 of Straumann and Mikosch (2006), we have $\lim_{t\to\infty}\sup_{\theta\in C}|e_t^2(\theta) - \varepsilon_t^2(\theta)| = 0$ almost surely, and hence $\lim_{T\to\infty}\sup_{\theta\in C}|O_T(\theta) - Q_T(\theta)| \le \lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^T\sup_{\theta\in C}|e_t^2(\theta) - \varepsilon_t^2(\theta)| = 0$ almost surely.

(2) Secondly, we need to show $\lim_{T\to\infty}\sqrt{T}\sup_{\theta\in C}\|\nabla O_T(\theta) - \nabla Q_T(\theta)\| = 0$ almost surely. Note that for $t \ge 1$, $\tilde V_t(\theta) = \alpha\frac{1-\beta^t}{1-\beta} + \beta^t v + \gamma\sum_{k=0}^{t-1}\beta^k H_{t-k}(\phi)$, and $\partial_i \tilde V_t(\theta) = \partial_i\big(\alpha\frac{1-\beta^t}{1-\beta}\big) + \partial_i(\beta^t)v + \sum_{k=0}^{t-1}\partial_i\big(\gamma\beta^k H_{t-k}(\phi)\big)$. Note also that $\partial_i V_{t|t-1}(\theta) = \partial_i(\alpha/(1-\beta)) + \sum_{k=0}^{\infty}\partial_i\big(\gamma\beta^k H_{t-1-k}(\phi)\big)$ (see Lemma A.3). It is easy to check that both $\partial_i V_{t+1|t}(\theta)$ and $\partial_i \tilde V_t(\theta)$ satisfy
$$\partial_i X_t = \partial_i\alpha + (\partial_i\beta)X_{t-1} + \beta(\partial_i X_{t-1}) + \partial_i(\gamma H_t(\phi)), \qquad t \in \mathbb{Z}^+ \qquad \text{(C.8)}$$
for each $i$. Since under Assumption 4.1 the conditions of Proposition 6.1 of Straumann and Mikosch (2006) are met, $\partial_i V_{t|t-1}(\theta)$ is the unique stationary ergodic solution to (C.8) and $\lim_{t\to\infty}\kappa_1^t\sup_{\theta\in C}|\partial_i V_{t|t-1}(\theta) - \partial_i \tilde V_t(\theta)| \overset{a.s.}{=} 0$ for some $\kappa_1 > 1$, which implies
$$\lim_{t\to\infty}\kappa_1^t\sup_{\theta\in C}|\partial_i\varepsilon_t(\theta) - \partial_i e_t(\theta)| \overset{a.s.}{=} 0.$$
In other words, $\forall\,\delta > 0$, $\exists\, T_0 > 0$ such that $\sup_{\theta\in C}|\partial_i\varepsilon_t(\theta) - \partial_i e_t(\theta)| < \kappa_1^{-t}\delta$ and $\sup_{\theta\in C}|\varepsilon_t(\theta) - e_t(\theta)| < \kappa^{-t}\delta$ for $t > T_0$. Consequently,
$$\sqrt{t}\sup_{\theta\in C}|\varepsilon_t(\theta)\partial_i\varepsilon_t(\theta) - e_t(\theta)\partial_i e_t(\theta)| \le \delta\Big[\sqrt{t}\,\kappa^{-2t}\delta + \sqrt{t}\,\kappa^{-t}\sup_{\theta\in C}|\varepsilon_t(\theta)| + \sqrt{t}\,\kappa_1^{-t}\sup_{\theta\in C}|\partial_i\varepsilon_t(\theta)|\Big]$$
for $t > T_0$. Note that $E\sup_{\theta\in C}|\partial_i\varepsilon_t(\theta)|$ and $E\sup_{\theta\in C}|\varepsilon_t(\theta)|$ are bounded from above. We have $\lim_{t\to\infty}\sqrt{t}\sup_{\theta\in C}|\varepsilon_t(\theta)\partial_i\varepsilon_t(\theta) - e_t(\theta)\partial_i e_t(\theta)| = 0$ almost surely. Therefore,
$$\lim_{T\to\infty}\sup_{\theta\in\Theta}\sqrt{T}|\partial_i O_T(\theta) - \partial_i Q_T(\theta)| \le \lim_{T\to\infty}\frac{1}{\sqrt{T}}\sum_{t=1}^T 2\sup_{\theta\in\Theta}|\varepsilon_t(\theta)\partial_i\varepsilon_t(\theta) - e_t(\theta)\partial_i e_t(\theta)| \le \lim_{T\to\infty}\frac{2\sup_{\theta\in\Theta}|\varepsilon_T(\theta)\partial_i\varepsilon_T(\theta) - e_T(\theta)\partial_i e_T(\theta)|}{\sqrt{T} - \sqrt{T-1}} \overset{a.s.}{=} 0$$
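The exponential rates $\kappa^t$ and $\kappa_1^t$ above reflect that the effect of the initial condition on the volatility recursion dies out geometrically. A minimal sketch of this forgetting property (assuming a stylized recursion $V_t = \alpha + \beta V_{t-1} + \gamma H_t$ with made-up parameter values; not the paper's code):

```python
import numpy as np

# Sketch (not the paper's code) of the exponential "forgetting" of initial
# conditions behind the kappa^t rates in Lemma C.6: two copies of the
# stylized volatility recursion
#     V_t = alpha + beta * V_{t-1} + gamma * H_t
# started from different initial values differ by exactly beta^t * |v1 - v2|,
# so kappa^t * |V_t - V~_t| -> 0 for any 1 < kappa < 1/beta.
# All parameter values and the distribution of H_t are made up.

rng = np.random.default_rng(0)
alpha, beta, gamma = 0.05, 0.9, 0.1
T = 200
H = rng.chisquare(df=1, size=T)   # stand-in for the HYBRID process H_t(phi)

def recurse(v0):
    V, v = np.empty(T), v0
    for t in range(T):
        v = alpha + beta * v + gamma * H[t]
        V[t] = v
    return V

diff = np.abs(recurse(1.0) - recurse(10.0))
assert np.allclose(diff, beta ** np.arange(1, T + 1) * 9.0, atol=1e-8)
assert (1.05 ** np.arange(1, T + 1) * diff)[-1] < 0.01   # kappa^t * diff -> 0
```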
Proof of Theorem 4.2: It follows from Lemma C.6 that $Q_T(\theta)$ converges to $E\varepsilon_t^2(\theta)$ almost surely, uniformly in $\theta\in C$. Similar to the proof of Lemma C.1, $\tilde\theta_T^{mdrv}$ converges to $\theta_0$ almost surely. Note that $\nabla O_T(\tilde\theta_T^{mdrv}) - \nabla O_T(\hat\theta_T^{mdrv}) = \mathrm{Hess}(O_T)(\theta_T^*)(\tilde\theta_T^{mdrv} - \hat\theta_T^{mdrv})$, where $\theta_T^* \in C$ is between $\tilde\theta_T^{mdrv}$ and $\hat\theta_T^{mdrv}$. $\sqrt{T}(\nabla O_T(\tilde\theta_T^{mdrv}) - \nabla O_T(\hat\theta_T^{mdrv})) = \sqrt{T}(\nabla O_T(\tilde\theta_T^{mdrv}) - \nabla Q_T(\tilde\theta_T^{mdrv}))$ converges to 0 almost surely by Lemma C.6. Meanwhile, since $\lim_{T\to\infty}\theta_T^* = \theta_0$ almost surely, an argument similar to the proof of Lemma C.3 yields that $\mathrm{Hess}(O_T)(\theta_T^*)$ converges to $\Sigma^{md}$ almost surely. That $\Sigma^{md}$ is positive definite implies that $\mathrm{Hess}(O_T)(\theta_T^*)$ is positive definite as well for large $T$. Hence,
$$\sqrt{T}(\tilde\theta_T^{mdrv} - \hat\theta_T^{mdrv}) = (\mathrm{Hess}(O_T)(\theta_T^*))^{-1}\sqrt{T}\big(\nabla O_T(\tilde\theta_T^{mdrv}) - \nabla Q_T(\tilde\theta_T^{mdrv})\big)$$
converges to 0 in probability, which further implies that $\sqrt{T}(\tilde\theta_T^{mdrv} - \theta_0)$ converges to $N\big(0, (\Sigma^{md})^{-1}\Omega^{mdrv}(\Sigma^{md})^{-1}\big)$ in distribution.

D Proof of Theorem 4.5
(I) First, we will show that $\tilde\theta_T^{mdrv}$ is a strongly consistent estimator and $\sqrt{T}(\tilde\theta_T^{mdrv} - \theta_0) \Longrightarrow N\big(0, (\Sigma^{md})^{-1}\Omega^{mdrv}(\Sigma^{md})^{-1}\big)$. This is done in three steps:

1. $\hat\theta_T^{mdrv} \equiv \arg\min_{\theta\in C}\frac{1}{T}\sum_{t=1}^T(RV_t - V_{t|t-1}(\theta))^2$ converges to $\theta_0$ almost surely. The proof is the same as that of Lemma C.1.

2. $\sqrt{T}(\hat\theta_T^{mdrv} - \theta_0) \Longrightarrow N\big(0, (\Sigma^{md})^{-1}\Omega^{mdrv}(\Sigma^{md})^{-1}\big)$. The proof consists of three parts.

(1) Let $O_T(\theta) = \frac{1}{T}\sum_{t=1}^T\varepsilon_t^2(\theta)$, where $\varepsilon_t(\theta) = RV_t - V_{t|t-1}(\theta)$. Because $\varepsilon_t(\theta_0)$ is orthogonal to $\mathcal{I}_{t-1}$, $E\partial_i\partial_j\varepsilon_t^2(\theta_0) = 2E\partial_i\varepsilon_t(\theta_0)\partial_j\varepsilon_t(\theta_0)$. Following the proofs of Lemma C.2 and Lemma C.3, we have $\lim_{T\to\infty}\mathrm{Hess}(O_T)(\hat\theta_T^{mdrv}) \overset{a.s.}{=} \Sigma^{md}$ where $\Sigma^{md} = E[\nabla V_{t|t-1}(\theta_0)\nabla V_{t|t-1}(\theta_0)'] > 0$, and hence $\mathrm{Hess}(O_T)(\hat\theta_T^{mdrv})$ is positive definite for sufficiently large $T$.

(2) $Y_{i,t}(\theta) \equiv \varepsilon_t(\theta)\partial_i\varepsilon_t(\theta)$ is a martingale difference sequence, and it is strictly stationary ergodic with finite second moment (see Lemma A.3). Then, for any nonzero $p\in\mathbb{R}^d$,
$$\sqrt{T}p'\nabla O_T(\theta_0) = \frac{1}{\sqrt{T}}\sum_{t=1}^T 2p'Y_t(\theta_0) \qquad \text{(D.9)}$$
converges to $N(0, 4p'\Omega^{mdrv}p)$ in distribution, where $\Omega^{mdrv} = EY_t(\theta_0)Y_t(\theta_0)' = E[(RV_t - V_{t|t-1}(\theta_0))^2\nabla V_{t|t-1}(\theta_0)\nabla V_{t|t-1}(\theta_0)']$. Note that $p'\Omega^{mdrv}p > 0$ if and only if $p'\nabla V_{t|t-1}(\theta_0) \ne 0$ almost surely. Hence $\Omega^{mdrv}$ is positive definite.

(3) Note that for sufficiently large $T$, $\sqrt{T}(\hat\theta_T^{mdrv} - \theta_0) = -(\mathrm{Hess}(O_T)(\theta_T^*))^{-1}\sqrt{T}\nabla O_T(\theta_0)$, where $\theta_T^*$ is between $\theta_0$ and $\hat\theta_T^{mdrv}$. It converges to $N\big(0, (\Sigma^{md})^{-1}\Omega^{mdrv}(\Sigma^{md})^{-1}\big)$ in distribution by Slutsky's theorem.

3. $\lim_{T\to\infty}\sqrt{T}(\tilde\theta_T^{mdrv} - \hat\theta_T^{mdrv}) = 0$ in probability. The proof is the same as that of Theorem 4.2.

The above discussion implies that $\tilde\theta_T^{mdrv}$ converges to $\theta_0$ almost surely and $\sqrt{T}(\tilde\theta_T^{mdrv} - \theta_0) = \sqrt{T}(\tilde\theta_T^{mdrv} - \hat\theta_T^{mdrv}) + \sqrt{T}(\hat\theta_T^{mdrv} - \theta_0)$ converges to $N\big(0, (\Sigma^{md})^{-1}\Omega^{mdrv}(\Sigma^{md})^{-1}\big)$ in distribution by Slutsky's theorem.

(II) The proof regarding $\tilde\theta_T^{mdr2}$ is the same as that of $\tilde\theta_T^{mdrv}$ in (I).
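In practice, the asymptotic covariance $(\Sigma^{md})^{-1}\Omega^{mdrv}(\Sigma^{md})^{-1}$ can be estimated by the usual plug-in sandwich formula, replacing expectations with sample averages evaluated at the estimate. A hedged sketch of such an estimator (the function name and the simulated inputs are ours, not the paper's):

```python
import numpy as np

# Hedged sketch of a plug-in sandwich covariance estimator suggested by
# Theorem 4.5 (function name and simulated inputs are ours, not the paper's):
#   Sigma_hat = (1/T) sum_t g_t g_t',          g_t   = grad V_{t|t-1}
#   Omega_hat = (1/T) sum_t eps_t^2 g_t g_t',  eps_t = RV_t - V_{t|t-1}
#   Avar_hat  = Sigma_hat^{-1} Omega_hat Sigma_hat^{-1} / T

def sandwich_cov(eps, grads):
    """eps: (T,) residuals; grads: (T, d) gradients of V_{t|t-1}."""
    T = len(eps)
    Sigma = grads.T @ grads / T
    Omega = (grads * eps[:, None] ** 2).T @ grads / T
    Sinv = np.linalg.inv(Sigma)
    return Sinv @ Omega @ Sinv / T

# Placeholder inputs, purely to exercise the formula:
rng = np.random.default_rng(1)
T, d = 500, 3
grads = rng.normal(size=(T, d))
eps = rng.normal(size=T)
V_hat = sandwich_cov(eps, grads)
assert V_hat.shape == (d, d)
assert np.allclose(V_hat, V_hat.T)             # symmetric
assert np.all(np.linalg.eigvalsh(V_hat) > 0)   # positive definite
```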
E Proof of Theorem 4.6

The proof regarding $\tilde\theta_T^{lhr2}$ is similar to that of $\tilde\theta_T^{lhrv}$, so we will focus on $\tilde\theta_T^{lhrv}$. Let $l_t(\theta) \equiv \log V_{t|t-1}(\theta) + \frac{RV_t}{V_{t|t-1}(\theta)}$, and $L_T(\theta) \equiv \frac{1}{T}\sum_{t=1}^T l_t(\theta)$. Define $\hat\theta_T^{lhrv} = \arg\min_{\theta\in C} L_T(\theta)$. Meanwhile, define $\tilde l_t(\theta) = \log\tilde V_t(\theta) + \frac{RV_t}{\tilde V_t(\theta)}$, and $\tilde L_T(\theta) = \frac{1}{T}\sum_{t=1}^T \tilde l_t(\theta)$.
Lemma E.1 Under Assumptions 3.4 and 4.1, $\hat\theta_T^{lhrv}$ converges to $\theta_0$ almost surely.

Proof of Lemma E.1: Due to Assumption 4.1, $\{l_t\}$ is stationary ergodic. Theorem 4.4 implies that $\theta_0$ is the unique minimizer of $L(\theta) \equiv \frac{1}{T}\sum_{t=1}^T El_t(\theta) = El_1(\theta)$ for $\theta\in C$. Therefore, $\theta_0$ is identifiably unique (see Domowitz and White (1982)). Besides, $L_T(\theta)$ converges to $L(\theta)$ almost surely, uniformly for $\theta\in C$, because $E\sup_{\theta\in C}|l_t(\theta)| < \infty$. For sufficiently large $T$, there exists a measurable function $\hat\theta_T$ that attains the minimum of $L_T(\theta)$ within the compact set $C$, and it converges to $\theta_0$ almost surely.

Lemma E.2 Under Assumptions 3.4 and 4.1, $\mathrm{Hess}(L_T)(\theta_0)$ converges to $\Sigma^{lh}$ almost surely, where $0 < \Sigma^{lh} \equiv E\big(V_{t|t-1}^{-2}(\theta_0)\nabla V_{t|t-1}(\theta_0)\nabla V_{t|t-1}(\theta_0)'\big) < \infty$.

Proof: Lemma A.3 implies that $\partial_i V_{t|t-1}$ and $\partial_i\partial_j V_{t|t-1}$ exist, and they are strictly stationary ergodic and square integrable. Note that
$$\partial_i\partial_j L_T = \frac{1}{T}\sum_{t=1}^T \partial_i\partial_j l_t = \frac{1}{T}\sum_{t=1}^T\left[\left(1 - \frac{RV_t}{V_{t|t-1}}\right)\frac{\partial_i\partial_j V_{t|t-1}}{V_{t|t-1}} + \left(\frac{2RV_t}{V_{t|t-1}} - 1\right)\frac{\partial_i V_{t|t-1}\,\partial_j V_{t|t-1}}{V_{t|t-1}^2}\right].$$
Since $\big(EV_{t|t-1}^{-2}\partial_i V_{t|t-1}\partial_j V_{t|t-1}\big)^2 \le \alpha^{-4}E\big(\partial_i V_{t|t-1}\big)^2 E\big(\partial_j V_{t|t-1}\big)^2 < \infty$ and $E\partial_i\partial_j V_{t|t-1}(\theta_0) < \infty$, we have $E\partial_i\partial_j l_t(\theta_0) = E\big(V_{t|t-1}^{-2}(\theta_0)\partial_i V_{t|t-1}(\theta_0)\partial_j V_{t|t-1}(\theta_0)\big) < \infty$. It follows from Birkhoff's ergodic theorem that $\partial_i\partial_j L_T(\theta_0)$ converges to $E[V_{t|t-1}^{-2}(\theta_0)\partial_i V_{t|t-1}(\theta_0)\partial_j V_{t|t-1}(\theta_0)]$ almost surely. Therefore, $\mathrm{Hess}(L_T)(\theta_0)$ converges to $\Sigma^{lh} = E[V_{t|t-1}^{-2}(\theta_0)\nabla V_{t|t-1}(\theta_0)\nabla V_{t|t-1}(\theta_0)']$ almost surely. Moreover, $\Sigma^{lh}$ is positive definite because $p'\nabla V_{t|t-1}(\theta_0) \ne 0$ almost surely for nonzero $p\in\mathbb{R}^d$ (see Lemma A.1).
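The second-derivative formula for $\partial_i\partial_j l_t$ above can be sanity-checked numerically via the chain rule: with $l(v) = \log v + RV/v$, one has $l'(v) = (1 - RV/v)/v$ and $l''(v) = (2RV/v - 1)/v^2$. A small finite-difference check (all derivative values of $V$ below are made up for illustration):

```python
import numpy as np

# Finite-difference sanity check (made-up values) of the second-derivative
# formula used in Lemma E.2.  With l(v) = log v + RV / v,
#   l'(v)  = (1 - RV/v) / v,
#   l''(v) = (2 RV/v - 1) / v^2,
# so by the chain rule the mixed partial of l(V(theta)) is
#   (1 - RV/V) * Vij / V + (2 RV/V - 1) * Vi * Vj / V^2.

RV, V = 1.3, 2.0
Vi, Vj, Vij = 0.5, -0.4, 0.2   # hypothetical derivatives of V w.r.t. theta_i, theta_j

analytic = (1 - RV / V) * Vij / V + (2 * RV / V - 1) * Vi * Vj / V ** 2

def l(v):
    return np.log(v) + RV / v

def V_of(ti, tj):   # local model of V(theta) with the given derivatives
    return V + Vi * ti + Vj * tj + Vij * ti * tj

# central cross-difference approximation of the mixed partial at theta = 0
h = 1e-5
num = (l(V_of(h, h)) - l(V_of(h, -h)) - l(V_of(-h, h)) + l(V_of(-h, -h))) / (4 * h * h)
assert abs(num - analytic) < 1e-4
```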
Lemma E.3 Under Assumptions 3.4 and 4.1 and $Er^4 < \infty$,
$$\lim_{T\to\infty}\big(\mathrm{Hess}(L_T)(\hat\theta_T^{lhrv}) - \mathrm{Hess}(L_T)(\theta_0)\big) \overset{a.s.}{=} 0.$$

Proof of Lemma E.3: The proof is similar to that of Lemma C.3. We only need to show that $\sup_{\theta\in C}|\partial_i\partial_j\partial_k L_T(\theta)|$ is stochastically bounded. Since under Assumption 4.1 $\partial_i\partial_j\partial_k l_t(\theta)$ is strictly stationary ergodic with finite second moment (see Lemma A.3), then $\sup_{\theta\in C}|\partial_i\partial_j\partial_k L_T(\theta)| < 1/2 + \sup_{\theta\in C}|E\partial_i\partial_j\partial_k l_0(\theta)| < \infty$ for sufficiently large $T$.

Lemma E.4 Under Assumptions 3.4 and 4.1 and $Er^4 < \infty$,
$$\sqrt{T}\,\nabla L_T(\theta_0) \Longrightarrow N(0, \Omega^{lhrv})$$
where $\Omega^{lhrv} = E\big(V_{t|t-1}^{-4}(\theta_0)(RV_t - V_{t|t-1}(\theta_0))^2\nabla V_{t|t-1}(\theta_0)\nabla V_{t|t-1}(\theta_0)'\big)$ is positive definite.

Proof of Lemma E.4: Note that
$$\nabla L_T = \frac{1}{T}\sum_{t=1}^T \nabla l_t = \frac{1}{T}\sum_{t=1}^T\left(1 - \frac{RV_t}{V_{t|t-1}}\right)\frac{1}{V_{t|t-1}}\nabla V_{t|t-1}.$$
Since $Er_t^4 < \infty$, $EV_{t|t-1} < \infty$, $E(\partial_i V_{t|t-1}(\theta_0))^2 < \infty$, and $E(RV_t\,|\,\mathcal{F}_{t-1}) = V_{t|t-1}(\theta_0)$, then $\{\partial_i l_t(\theta_0), t\in\mathbb{Z}\}$ is a martingale difference sequence with finite second moment. It is strictly stationary ergodic in that it is a measurable function of a strictly stationary ergodic sequence. Therefore, for nonzero $p\in\mathbb{R}^d$, $\sqrt{T}p'\nabla L_T(\theta_0) = \frac{1}{\sqrt{T}}\sum_{t=1}^T p'\nabla l_t(\theta_0)$ converges to $N(0, p'\Omega^{lhrv}p)$ in distribution by the martingale central limit theorem, where $\Omega^{lhrv} = E\big(V_{t|t-1}^{-4}(\theta_0)(RV_t - V_{t|t-1}(\theta_0))^2\nabla V_{t|t-1}(\theta_0)\nabla V_{t|t-1}(\theta_0)'\big)$. Since $p'\nabla V_{t|t-1}(\theta_0) \ne 0$ almost surely, $p'\Omega^{lhrv}p > 0$. In other words, $\Omega^{lhrv}$ is positive definite. Therefore, $\sqrt{T}\,\nabla L_T(\theta_0) \Longrightarrow N(0, \Omega^{lhrv})$.

Lemma E.5 Under Assumptions 3.4 and 4.1,
$$\lim_{T\to\infty}\sup_{\theta\in C}\|L_T(\theta) - \tilde L_T(\theta)\| \overset{a.s.}{=} 0.$$

Proof of Lemma E.5: Note that $L_T(\theta) - \tilde L_T(\theta) = \frac{1}{T}\sum_{t=1}^T(l_t(\theta) - \tilde l_t(\theta))$. It suffices to show that $\lim_{t\to\infty}\sup_{\theta\in C}\|l_t(\theta) - \tilde l_t(\theta)\| = 0$ almost surely. An application of the mean value theorem yields
$$\|l_t(\theta) - \tilde l_t(\theta)\| \le \|\log V_{t|t-1}(\theta) - \log\tilde V_t(\theta)\| + RV_t\left|\frac{1}{V_{t|t-1}(\theta)} - \frac{1}{\tilde V_t(\theta)}\right| \le (1/\alpha + RV_t/\alpha^2)\|V_{t|t-1}(\theta) - \tilde V_t(\theta)\|.$$
Thus $\sup_{\theta\in C}\|l_t(\theta) - \tilde l_t(\theta)\| \le (1/\alpha_u + RV_t/\alpha_u^2)\sup_{\theta\in C}\|V_{t|t-1}(\theta) - \tilde V_t(\theta)\|$, where $\alpha_u > 0$ is the lower bound of $\alpha$ within the compact set $C$. Since $E\log^+ RV_t < \infty$, $\lim_{t\to\infty}\sup_{\theta\in C}\|l_t(\theta) - \tilde l_t(\theta)\| = 0$ almost surely by Lemma 2.1 of Straumann and Mikosch (2006).
Lemma E.6 Under Assumptions 3.4 and 4.1,
$$\lim_{T\to\infty}\sqrt{T}\sup_{\theta\in C}\|\nabla L_T(\theta) - \nabla\tilde L_T(\theta)\| \overset{a.s.}{=} 0.$$

Proof of Lemma E.6: Since $\sqrt{T}\sup_{\theta\in C}\|\partial_i L_T(\theta) - \partial_i\tilde L_T(\theta)\| \le \frac{1}{\sqrt{T}}\sum_{t=1}^T\sup_{\theta\in C}\|\partial_i l_t(\theta) - \partial_i\tilde l_t(\theta)\|$ for each $i$, it suffices to show that $\lim_{t\to\infty}\sqrt{t}\sup_{\theta\in C}\|\partial_i l_t(\theta) - \partial_i\tilde l_t(\theta)\| = 0$ almost surely. Note that
$$\partial_i l_t - \partial_i\tilde l_t = \left(1 - \frac{RV_t}{V_{t|t-1}}\right)\frac{\partial_i V_{t|t-1}}{V_{t|t-1}} - \left(1 - \frac{RV_t}{\tilde V_t}\right)\frac{\partial_i\tilde V_t}{\tilde V_t}.$$
Applying the mean value theorem to $\partial_i l_t - \partial_i\tilde l_t$, we have
$$\|\partial_i l_t - \partial_i\tilde l_t\| \le \frac{\|y_i(\partial_i\tilde V_t - \partial_i V_{t|t-1}) + \partial_i V_{t|t-1}\|}{x^2}\left|\frac{2RV_t}{x} - 1\right|\|V_{t|t-1} - \tilde V_t\| + \left|1 - \frac{RV_t}{x}\right|\frac{1}{x}\|\partial_i V_{t|t-1} - \partial_i\tilde V_t\|$$
$$\le \frac{\|\partial_i\tilde V_t - \partial_i V_{t|t-1}\| + \|\partial_i V_{t|t-1}\|}{\alpha_u^2}\left(\frac{2RV_t}{\alpha_u} + 1\right)\|V_{t|t-1} - \tilde V_t\| + \left(1 + \frac{RV_t}{\alpha_u}\right)\frac{1}{\alpha_u}\|\partial_i V_{t|t-1} - \partial_i\tilde V_t\|,$$
where $x$ is between $V_{t|t-1}$ and $\tilde V_t$, $0 \le y_i \le 1$, and $\alpha_u$ is the lower bound of $\alpha$ within the compact set $C$. Note that
$$\sqrt{t}\,\frac{\|\partial_i\tilde V_t - \partial_i V_{t|t-1}\|}{\alpha_u^2}\left(\frac{2RV_t}{\alpha_u} + 1\right)\|V_{t|t-1} - \tilde V_t\| = \kappa_1^t\|\partial_i\tilde V_t - \partial_i V_{t|t-1}\|\,\kappa^t\|V_{t|t-1} - \tilde V_t\|\,\sqrt{t}\,\kappa^{-t}\kappa_1^{-t}\alpha_u^{-2}\left(\frac{2RV_t}{\alpha_u} + 1\right) \qquad \text{(E.10)}$$
$$\sqrt{t}\,\frac{\|\partial_i V_{t|t-1}\|}{\alpha_u^2}\left(\frac{2RV_t}{\alpha_u} + 1\right)\|V_{t|t-1} - \tilde V_t\| = \kappa^t\|V_{t|t-1} - \tilde V_t\|\,\sqrt{t}\,\kappa^{-t}\frac{\|\partial_i V_{t|t-1}\|}{\alpha_u^2}\left(\frac{2RV_t}{\alpha_u} + 1\right) \qquad \text{(E.11)}$$
$$\sqrt{t}\left(1 + \frac{RV_t}{\alpha_u}\right)\frac{1}{\alpha_u}\|\partial_i V_{t|t-1} - \partial_i\tilde V_t\| = \kappa_1^t\|\partial_i V_{t|t-1} - \partial_i\tilde V_t\|\,\sqrt{t}\,\kappa_1^{-t}\frac{1}{\alpha_u}\left(1 + \frac{RV_t}{\alpha_u}\right) \qquad \text{(E.12)}$$
Since $E\log^+ RV_t < \infty$, $E\log^+(\sup_{\theta\in C}\|\partial_i V_{t|t-1}(\theta)\|RV_t) \le E\log^+(\sup_{\theta\in C}\|\partial_i V_{t|t-1}(\theta)\|) + E\log^+ RV_t < \infty$, and $\lim_{t\to\infty}\kappa_1^t\|\partial_i V_{t|t-1} - \partial_i\tilde V_t\| \overset{a.s.}{=} 0$, $\lim_{t\to\infty}\kappa^t\|V_{t|t-1} - \tilde V_t\| \overset{a.s.}{=} 0$ ($\kappa$ and $\kappa_1$ are defined in the proof of Lemma C.6), we have $\lim_{t\to\infty}\sqrt{t}\sup_{\theta\in C}\|\partial_i l_t(\theta) - \partial_i\tilde l_t(\theta)\| = 0$ almost surely.

Proof of Theorem 4.6: Similar to the proof of Theorem 4.5, the above discussion implies that $\tilde\theta_T^{lhrv}$
converges to $\theta_0$ almost surely. Under the additional condition that $Er^4 < \infty$, $\sqrt{T}(\hat\theta_T^{lhrv} - \theta_0)$ converges to $N(0, (\Sigma^{lh})^{-1}\Omega^{lhrv}(\Sigma^{lh})^{-1})$ in distribution and $\sqrt{T}(\hat\theta_T^{lhrv} - \tilde\theta_T^{lhrv})$ converges to 0 in probability. Hence $\sqrt{T}(\tilde\theta_T^{lhrv} - \theta_0)$ converges to $N(0, (\Sigma^{lh})^{-1}\Omega^{lhrv}(\Sigma^{lh})^{-1})$ in distribution.
F Proof of Corollary 4.1
For ease of exposition, we define $r_{i,t} \equiv r_{t-1+i/m}$ for $i = 1, 2, \ldots, m$. Note that $E_{t-1}(R_t^2 - RV_t)(R_t^2 + RV_t) = 4E_{t-1}\big(\sum_{1\le k<l\le m} r_{k,t}r_{l,t}\big)\big(\sum_{1\le i}$