Variance Estimation With Applications

Paul Doukhan, Jérémie Jakubowicz, José Rafael León

Abstract. Self-normalized central limit theorems are important for statistical purposes. A simple way to achieve them is to consider estimators of the limit variance; under weak dependence this expression is a complicated covariance series. Using an argument of Carlstein (1986), we work out this program for a new procedure, in the case of λ-weakly dependent vector valued stationary sequences, a notion introduced by Doukhan & Louhichi (1999) and rich in examples. Our estimator admits a limiting variance which is useful for technical purposes. Applications to linear models with dependent inputs, sea wave modelling and stochastic differential equations exhibit explicit examples for which such procedures are proved useful through simulation studies.

Key words: Variance estimation, Weak dependence.
AMS Subject Classification: 60F17, 62G05, 62G09, 62M10.
Running title: Limiting variance estimation.
1 Introduction
In the case of time series, various extensions of the classical central limit theorem (CLT) have been proved. The seminal works by Rosenblatt (1956) [25] and Ibragimov (1962) both gave a very general approach to weak dependence and asymptotics under such conditions; Rosenblatt's (1984) book [26] gives important hints on this question, and one also needs to mention Dehling & Philipp (1982) [11]. Unlike the independent case, the limiting variance often has an intricate expression. In many weakly dependent cases the CLT writes

$$\frac{1}{\sqrt{n}} \sum_{k=1}^{n} X_k \xrightarrow{d}_{n\to\infty} \mathcal{N}_D(0, \Sigma), \qquad \text{where } \Sigma = \sum_{k=-\infty}^{\infty} E X_0 X_k',$$

where $(X_t)_{t\in\mathbb{Z}}$ denotes an $\mathbb{R}^D$-valued and centered stationary random process satisfying convenient additional weak dependence assumptions. In order to make such CLTs directly useful to statisticians, one needs to provide self-normalized versions of such results, i.e. one needs an estimator $\hat\Sigma$ of $\Sigma$.
It is well known that CLTs provide confidence sets for empirical estimators. E.g. if D = 1, an asymptotic 95%-confidence interval for the mean writes

$$\left[\bar X - 1.96\sqrt{\frac{\hat\Sigma}{n}},\ \bar X + 1.96\sqrt{\frac{\hat\Sigma}{n}}\right].$$

For independent and stationary real valued sequences the expression of $\Sigma = EX_0^2$ is very simple and it may be empirically estimated by $\frac{1}{n}\sum_{k=1}^{n} X_k^2$. In the dependent case, there are several strategies aiming at estimating $\Sigma$.
• One can use the fact that Σ can be simply recovered from the spectral density of $(X_t)_{t\in\mathbb{Z}}$. This is the approach of Bardet et al. (2008) in [3].

• Another way to estimate Σ relies on the Donsker invariance principle. It consists in splitting the data into smaller blocks $(X_t)$, with t ranging in an interval $B_i$ corresponding to block i. Block i then yields $\frac{1}{\sqrt{\#B_i}} \sum_{t\in B_i} X_t$, and the set of all blocks gives a sample from a distribution close to $\mathcal{N}(0, \Sigma)$ when the block size gets large, according to the Donsker invariance principle. This is the approach of Carlstein (1986) [8] and Peligrad and Shao (1995) [21]. It is also ours.

Our work differs from [8] and [21] in two distinct ways. On the one hand, we show how using nonoverlapping blocks enables us to simply derive not only an estimator $\hat\Sigma$ but also an estimator of $\mathrm{Var}(\hat\Sigma)$. This is essential if one wants to obtain confidence intervals for $\hat\Sigma$. On the other hand, our framework is not that of mixing but of λ-weak dependence, which encompasses a large range of applications (please refer to Dedecker et al. (2007) [12] and references therein for further details on weak dependence).

The paper is organized as follows. Section 2 introduces our estimator and states the main results on subsampling, including both a LLN and CLTs. Section 2.3 is devoted to elaborating further on the weak dependence notions which underpin most of the proofs in Section 2. Three applications pertaining to various domains are presented in Section 3. The first application deals with the linear model with dependent inputs. The second application shows how to apply the estimator of Section 2 in the context of sea waves, which are modelled by a small nonlinear perturbation of a weakly dependent Gaussian process, following Azaïs et al. (2007) [2]. We conclude Section 3 with the problem of the number of zero crossings for oscillator models. We chose these three applications because they exhibit intricate expressions for the limiting variance.
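The block-splitting strategy just described can be sketched in a few lines. The following Python sketch (the function names `block_variance_estimate` and `mean_confidence_interval` are ours, not from the paper) builds non-overlapping blocks, averages F(∆) = ∆² to estimate Σ in the simplest case D = 1, and returns the confidence interval of the previous display.

```python
import math
import random

def block_variance_estimate(xs, m):
    """Carlstein-style estimate of the long-run variance Sigma:
    average of F(Delta_i) = Delta_i**2 over non-overlapping blocks,
    where Delta_i is a normalized block sum."""
    n = len(xs)
    N = n // m  # number of complete blocks
    deltas = [sum(xs[i * m:(i + 1) * m]) / math.sqrt(m) for i in range(N)]
    sigma_hat = sum(d * d for d in deltas) / N        # estimates Sigma
    g_hat = sum(d ** 4 for d in deltas) / N           # second moment of F(Delta)
    return sigma_hat, g_hat

def mean_confidence_interval(xs, m, z=1.96):
    """Asymptotic 95% confidence interval for the mean using the block estimate."""
    n = len(xs)
    xbar = sum(xs) / n
    sigma_hat, _ = block_variance_estimate([x - xbar for x in xs], m)
    half = z * math.sqrt(sigma_hat / n)
    return xbar - half, xbar + half

rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
sigma_hat, _ = block_variance_estimate(xs, m=100)
lo, hi = mean_confidence_interval(xs, m=100)
```

For i.i.d. N(0, 1) inputs the estimate should be close to Σ = 1; the second returned quantity `g_hat` is the kind of block second moment used later for self-normalization.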
However, using the very same simple procedure for all of them, we are able to approximate the sought variance in a straightforward manner. The last section is devoted to proofs, as well as to results of independent interest. Subsection 4.1 relates moments of sums to cumulants for stationary sequences. For this purpose, a new covariance inequality for products is needed, along with weak dependence properties of triangular arrays related to the Donsker theorem.
2 Subsampling

2.1 Preliminaries
Let $(X_t)$ be a centered $\mathbb{R}^D$-valued stationary process such that the following Donsker invariance principle holds:

$$\frac{1}{\sqrt{n}} \sum_{ns < j \le nt} X_j \to Z(t) - Z(s),$$

and such that $\|X_0\|_b^b = E\|X_0\|^b < \infty$. Let $h : \mathbb{R}^{Du} \to \mathbb{R}$ denote an arbitrary function, and set

$$\mathrm{Lip}\, h = \sup_{(y_1,\dots,y_u) \ne (x_1,\dots,x_u)} \frac{|h(y_1,\dots,y_u) - h(x_1,\dots,x_u)|}{\|y_1-x_1\| + \cdots + \|y_u-x_u\|},$$

where $x_i, y_i \in \mathbb{R}^D$ for $1 \le i \le u$. For each $u \ge 1$ we identify the sets $(\mathbb{R}^D)^u$ and $\mathbb{R}^{Du}$. Now, let Λ denote the set of functions $h : \mathbb{R}^{Du} \to \mathbb{R}$, for some $u \in \mathbb{N}$, such that $\mathrm{Lip}\, h < \infty$, and set $\Lambda^{(1)} = \{h \in \Lambda : \|h\|_\infty \le 1\}$.
Definition 2 The sequence $(X_n)_{n\in\mathbb{Z}}$ is λ-weakly dependent if there exists a sequence $\lambda = (\lambda(r))_{r\in\mathbb{N}}$ decreasing to zero at infinity such that

$$\big|\mathrm{Cov}\big(g_1(X_{i_1},\dots,X_{i_u}),\ g_2(X_{j_1},\dots,X_{j_v})\big)\big| \le \big(u\,\mathrm{Lip}\,g_1 + v\,\mathrm{Lip}\,g_2 + uv\,\mathrm{Lip}\,g_1\,\mathrm{Lip}\,g_2\big)\,\lambda(r)$$

for any u-tuple $(i_1,\dots,i_u)$ and any v-tuple $(j_1,\dots,j_v)$ with $i_1 \le \cdots \le i_u < i_u + r \le j_1 \le \cdots \le j_v$, where $g_1, g_2$ are two real functions of $\Lambda^{(1)}$ respectively defined on $\mathbb{R}^{Du}$ and $\mathbb{R}^{Dv}$ ($u, v \in \mathbb{N}^*$).

Remark on weak dependence conditions. A variety of alternative conditions on the RHS may also be found in the literature. For instance, θ-weak dependence corresponds to the bound $v\,\mathrm{Lip}\,g_2 \cdot \theta(r)$ and η-weak dependence to $(u\,\mathrm{Lip}\,g_1 + v\,\mathrm{Lip}\,g_2)\,\eta(r)$. Various examples of weakly dependent sequences with memory may be found in Dedecker et al. (2007) [12]; they are usually η-weakly dependent, except in causal cases where θ-dependence holds. Now λ-weak dependence holds both for η-dependent systems, since $\theta(r) \le \eta(r) \le \lambda(r)$, and for associated or Gaussian processes. This is the reason why we shall prefer
the wider notion of λ-weak dependence, suitable for most of the examples. The large variety of standard models that λ-weak dependence encompasses makes it attractive; see [12] for precisions. We now derive both a Law of Large Numbers for $\tilde F_n$ and $\hat F_n$ and a Central Limit Theorem for $\hat F_n$.

Theorem 1 Consider $F : \mathbb{R}^D \to \mathbb{R}$ and $\beta > 0$ such that

• if $\beta \le 1$, then $|F(x)| \prec \|x\|^\beta + 1$ and F is Lipschitz, or

• if $\beta > 1$, then $|F(x) - F(y)| \le c\|x - y\|(1 + \|x\|^{\beta-1} + \|y\|^{\beta-1})$, for all $x, y \in \mathbb{R}^D$.

Assume that the sequence $(X_n)_{n\in\mathbb{Z}}$ is stationary with $\|X_0\|_b < \infty$ and λ-weakly dependent, that $2\beta < q < b$ with q an even integer, and that, for positive constants $\lambda > (q-1)(b-1)/(b-q)$ and $\delta < \gamma$, we have $\lambda(t) \prec t^{-\lambda}$; set $m = n^\gamma$, $\ell = n^\delta$ and $\zeta = q/\beta - 2 > 0$. Then:

• LLN for $\tilde F_n$. $\mathrm{Var}\,\tilde F_n \prec m_n/n$ if moreover $\lambda > 1 + \frac{(\beta-1)_+ + \beta}{\zeta}$; thus $\tilde F_n \to_{n\to\infty} EF(\Delta)$ in $L^2$.

• LLN for $\hat F_n$. $\mathrm{Var}\,\hat F_n \prec m_n/n$ if $\lambda > \frac{(\beta-1)_+ (\delta-\gamma) + \beta\delta}{\zeta\delta}$; thus $\hat F_n \to_{n\to\infty} EF(\Delta)$ in $L^2$.

Almost sure convergence also holds for each of those estimators if moreover $\gamma < \frac12$.

• CLT for the estimator (6). If $\lambda\delta > \beta$ then

$$Z_n = \sqrt{N}\,(\hat F_n - E\hat F_n) \xrightarrow{D}_{n\to\infty} \mathcal{N}(0, \mathrm{Var}\, F(\Delta)).$$
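As a hedged numerical illustration of the overlapping-block statistic (our reading of $\hat F_n$), the sketch below applies it to a simulated AR(1) sequence, whose long-run variance $\Sigma = \sigma^2/(1-a)^2$ is known in closed form; all function names and tuning choices here are ours, not prescribed by the theorem.

```python
import math
import random

def ar1_sample(n, a=0.5, rng=None):
    """Simulate a centered AR(1): X_t = a X_{t-1} + eps_t, eps ~ N(0,1)."""
    rng = rng or random.Random(1)
    x, xs = 0.0, []
    for _ in range(n + 1000):         # 1000 burn-in steps
        x = a * x + rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs[1000:]

def overlapping_block_estimate(xs, m):
    """Average of Delta_{m,j}**2 over all n - m + 1 overlapping blocks,
    where Delta_{m,j} is the normalized sum over the j-th window."""
    n = len(xs)
    s = sum(xs[:m])
    total, count = s * s, 1
    for j in range(1, n - m + 1):     # slide the window in O(1) per step
        s += xs[j + m - 1] - xs[j - 1]
        total += s * s
        count += 1
    return total / (count * m)

xs = ar1_sample(200_000)
sigma_hat = overlapping_block_estimate(xs, m=200)
# true long-run variance for a = 0.5: 1 / (1 - 0.5)**2 = 4
```

The estimate converges to Σ = 4 as the block size and the sample size grow, in line with the LLN part of Theorem 1.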
Remark. Due to the expression of the limiting variance (1) of $\tilde F_n$, we did not work out a CLT for this estimator; indeed, the goodness-of-fit procedure is much more difficult to handle in this case.

Another CLT for $\hat F_n$. Coming back to our estimator $\hat F_n$, an alternative, more applicable result involves

$$T_n = \sqrt{N}\,(\hat F_n - EF(\Delta)).$$
A CLT for this non-centered quantity is convenient when estimating the parameter $EF(\Delta)$. In order to prove such a CLT one can still reuse the Lindeberg technique with blocks. For some bounded and $C^3$ function f we bound

$$|E(f(T_n) - f(Z))| \le |E(f(T_n) - f(Z_n))| + |E(f(Z_n) - f(Z))| \le \sqrt{N}\,\|f'\|_\infty\, |EF(\Delta_{1,m}) - EF(\Delta)| + |E(f(Z_n) - f(Z))|$$

for Gaussian r.v.s $Z_n \sim \mathcal{N}(0, \mathrm{Var}\, T_n)$ and $Z \sim \mathcal{N}(0, \mathrm{Var}\, F(\Delta))$. The previous convergence relies on the decay rate of $|EF(\Delta_{1,m}) - EF(\Delta)|$. As we will prove,

$$|EF(\Delta_{1,m}) - EF(\Delta)| = O(1/m) \qquad (8)$$

for $F(x) = (x'a)^2$. The Lindeberg bound then tends to zero under the assumption $\lim_{n\to\infty} N/m^2 = 0$.
In this case a self-normalized variant of Theorem 1 is obtained by setting

$$\hat G_n = \frac{1}{N(n)} \sum_{i=1}^{N(n)} F^2(\Delta_{i,m(n)}). \qquad (9)$$

This yields:

Theorem 2 Assume that the assumptions of Theorem 1 hold. Assume moreover that $\mathrm{Var}\, F(\Delta) > 0$ and $\lambda > \frac{(\beta-1)_+ (\delta-\gamma) + 2\beta\delta}{\zeta\delta}$; then

$$Z_n = \sqrt{N}\, \frac{\hat F_n - EF(\Delta)}{\sqrt{(\hat G_n - \hat F_n^2)_+}} \to_{n\to\infty} \mathcal{N}(0, 1).$$

Remark on centering.
The previous results deal with centered processes. As we shall see in the applications section, this is a restrictive assumption. Fortunately, there is an easy way to circumvent it: split the sample into two parts, use the first part to estimate the mean, then apply the variance estimation procedure to the second part.
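Combining Theorem 2's self-normalized statistic with the splitting device of this remark gives, in outline (all names and tuning choices below are illustrative, not prescribed by the paper):

```python
import math
import random

def self_normalized_stat(xs, m, mu_ref):
    """Z_n = sqrt(N) (F_hat - mu_ref) / sqrt((G_hat - F_hat**2)_+),
    over non-overlapping blocks, following the display of Theorem 2;
    mu_ref plays the role of E F(Delta)."""
    N = len(xs) // m
    deltas = [sum(xs[i * m:(i + 1) * m]) / math.sqrt(m) for i in range(N)]
    f_vals = [d * d for d in deltas]           # F(x) = x**2
    f_hat = sum(f_vals) / N
    g_hat = sum(v * v for v in f_vals) / N     # estimator (9)
    denom = math.sqrt(max(g_hat - f_hat ** 2, 0.0))
    return math.sqrt(N) * (f_hat - mu_ref) / denom

rng = random.Random(2)
data = [0.3 + rng.gauss(0.0, 1.0) for _ in range(40_000)]
# split-sample centering: estimate the mean on the first half only
first, second = data[:20_000], data[20_000:]
mu_hat = sum(first) / len(first)
centered = [x - mu_hat for x in second]
z = self_normalized_stat(centered, m=100, mu_ref=1.0)  # E F(Delta) = 1 here
```

Under Theorem 2 the statistic is asymptotically standard normal, so z should typically be of order 1.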
2.4 On the function F

The main goal of this paper is to estimate variances. Hence the function F defined by $x \in \mathbb{R}^D \mapsto x\,x'$ is a natural candidate. However, for technical reasons, it turns out to be interesting to broaden the scope of functions F under consideration. In all cases, we need Σ to be a function of $E[F(\Delta)]$. Besides $x \mapsto x\,x'$, below are some examples of functions of interest.
• $F(x) = x\,x'$ yields the estimation of the complete covariance matrix Σ. The conditions of Theorem 1 hold with β = 2. The special case $F(x) = (x'a)^2$, for some $a \in \mathbb{R}^D$, is the simplest multi-dimensional one and also fits this condition. In this case, setting $\xi_t = X_t'a$, we still assume $E\xi_t = 0$ and one may write

$$EF(\Delta_{1,m}) = \sum_{|s|<m} \Big(1 - \frac{|s|}{m}\Big) E\xi_0 \xi_s,$$

and $F(x) = \sum_{i=0}^{3} \theta_i H_i(x)$ involves only 4 parameters.
Figure 2: $\sqrt{n}(\hat\theta_n - \theta)$ admits the limit variance $\frac{\sigma_\xi^2}{1-a_z^2}\,\frac{\sigma_z^2}{1-a_\xi^2}\Big(1 + 2\,\frac{a_\xi a_z}{1 - a_\xi a_z}\Big) = .6$.

It is straightforward to compute $EY_t = \theta_0$ and $EY_t^2 = \sum_{i=0}^{3} i!\,\theta_i^2$. The Hermite decomposition also gives the higher order moments:

$$EY_t = g_1(\theta) = \theta_0$$
$$EY_t^2 = g_2(\theta) = \theta_0^2 + \theta_1^2 + 2\theta_2^2 + 6\theta_3^2$$
$$EY_t^3 = g_3(\theta) = \theta_0^3 + 3\theta_0\theta_1^2 + 6\theta_1^2\theta_2 + 6\theta_0\theta_2^2 + 8\theta_2^3 + 36\theta_1\theta_2\theta_3 + 18\theta_0\theta_3^2 + 108\theta_2\theta_3^2$$
$$EY_t^4 = g_4(\theta) = \theta_0^4 + 6\theta_0^2\theta_1^2 + 3\theta_1^4 + 24\theta_0\theta_1^2\theta_2 + 12\theta_0^2\theta_2^2 + 60\theta_1^2\theta_2^2 + 32\theta_0\theta_2^3 + 60\theta_2^4 + 24\theta_1^3\theta_3 + 144\theta_0\theta_1\theta_2\theta_3 + 576\theta_1\theta_2^2\theta_3 + 36\theta_0^2\theta_3^2 + 252\theta_1^2\theta_3^2 + 432\theta_0\theta_2\theta_3^2 + 2232\theta_2^2\theta_3^2 + 1296\theta_1\theta_3^3 + 3348\theta_3^4$$
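The moment formulas $g_1, g_2, g_3$ above can be checked numerically: since $Y_t$ is a polynomial in a standard Gaussian variable, $EY_t^k$ is a one-dimensional integral against the normal density, computable by Simpson quadrature. The code below is a verification sketch only, not part of the estimation procedure.

```python
import math

def Y(x, th):
    """Y = sum of theta_i H_i(x) with probabilists' Hermite H_0..H_3."""
    t0, t1, t2, t3 = th
    return t0 + t1 * x + t2 * (x * x - 1) + t3 * (x ** 3 - 3 * x)

def gauss_moment(func, lo=-12.0, hi=12.0, steps=24_000):
    """Simpson quadrature of func(x) * standard normal density."""
    h = (hi - lo) / steps
    phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    s = func(lo) * phi(lo) + func(hi) * phi(hi)
    for i in range(1, steps):
        x = lo + i * h
        s += (4 if i % 2 else 2) * func(x) * phi(x)
    return s * h / 3

th = (0.3, 1.0, 0.05, 0.01)
t0, t1, t2, t3 = th
g1 = t0
g2 = t0**2 + t1**2 + 2*t2**2 + 6*t3**2
g3 = (t0**3 + 3*t0*t1**2 + 6*t1**2*t2 + 6*t0*t2**2 + 8*t2**3
      + 36*t1*t2*t3 + 18*t0*t3**2 + 108*t2*t3**2)

m1 = gauss_moment(lambda x: Y(x, th))
m2 = gauss_moment(lambda x: Y(x, th) ** 2)
m3 = gauss_moment(lambda x: Y(x, th) ** 3)
```

The quadrature values agree with the closed-form polynomials to high precision, confirming the Hermite computations.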
and $g(\theta) = (g_1(\theta), g_2(\theta), g_3(\theta), g_4(\theta))$ is estimated by $\mu_n = (\mu_{i,n})_{1\le i\le 4}$ with

$$\mu_{i,n} = \frac{1}{n} \sum_{j=1}^{n} Y_j^i, \qquad i = 1, 2, 3, 4.$$

Set $H = \sum_{j=-\infty}^{\infty} E Z_0 Z_j'$ where $Z_j = (Y_j, Y_j^2, Y_j^3, Y_j^4)'$. Using again the Hermite decomposition, along with suitable software, we get tedious explicit expressions for the expansions of $F^j$ in terms of a matrix $C(\theta) = (c_{i,j}(\theta))$ with $1 \le i \le 13$, $1 \le j \le 4$ (please refer to the Appendix for the expansion of $C(\theta)$):

$$F^j(x) = \sum_{i=0}^{3j} c_{i+1,j}(\theta)\, H_i(x), \qquad \text{for } j = 1, 2, 3, 4.$$
Here $c_{i,j}(\theta)$ is a polynomial of degree j in θ. Now denote, for $a = (a_1, a_2, a_3, a_4) \in \mathbb{R}^4$,

$$U_n(a) = a \cdot \frac{1}{\sqrt{n}} \sum_{s=1}^{n} Z_s = \frac{1}{\sqrt{n}} \sum_{s=1}^{n} \sum_{i=0}^{12} \alpha_i H_i(X_s), \qquad \text{with } \alpha_i = \sum_{k=1}^{4} a_k\, c_{i+1,k}(\theta).$$

We have the following limit for $\mathrm{Var}\, U_n(a)$:

$$\mathrm{Var}\, U_n(a) \to \sum_{i=1}^{12} \alpha_i^2\, i! \sum_{k=-\infty}^{\infty} r_k^i.$$
Lemma 1 Let $C(\theta) = (c_{i,j}(\theta))_{1\le i\le 12,\, 1\le j\le 4}$, and let R be the $12 \times 12$ diagonal matrix with entries $R_i = i! \sum_{k=-\infty}^{\infty} r_k^i$; then

$$H = C' R C.$$

Moreover $\nabla g(\theta)$ is invertible for small enough $\theta_2, \theta_3$, for each $\theta_0$ and $\theta_1 \ne 0$, and then

$$\Sigma(\theta) = (\nabla g(\theta))^{-1}\, H\, \big((\nabla g(\theta))^{-1}\big)'.$$

Proof of the lemma. Assume that the process is close to Gaussian, so that the function F is only considered for small $\theta_2, \theta_3$; then

$$\nabla g(\theta_0, \theta_1, 0, 0) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2\theta_0 & 2\theta_1 & 0 & 0 \\ 3\theta_0^2 + 3\theta_1^2 & 6\theta_0\theta_1 & 6\theta_1^2 & 0 \\ 4\theta_0^3 + 12\theta_0\theta_1^2 & 12\theta_0^2\theta_1 + 12\theta_1^3 & 24\theta_0\theta_1^2 & 24\theta_1^3 \end{pmatrix},$$

thus $\nabla g(\theta_0, \theta_1, 0, 0)$ is obviously invertible since $\theta_1 \ne 0$ (the matrix is lower triangular, with determinant $288\,\theta_1^6$). Continuity of $\theta \mapsto \nabla g(\theta)$ entails that if $\theta_2, \theta_3$ are small enough then $\nabla g(\theta)$ is invertible. Now

$$\sqrt{n}\,(\mu_n - g(\theta)) \to \mathcal{N}(0, H),$$

hence $\Sigma(\theta) = (\nabla g(\theta))^{-1} H \big((\nabla g(\theta))^{-1}\big)'$.
We now show that if we replace all these cumbersome computations by our simple variance estimator, we can still hope for decent approximations.

The sequence $X_s$ is weakly dependent by assumption. The matrix H is thus estimated by a matrix $\hat H$ according to Theorem 1. From $\hat H$ we can deduce a convenient estimator for Σ, applying the previous lemma:

$$\hat\Sigma = \big(\nabla g(\hat\theta)\big)^{-1}\, \hat H\, \Big(\big(\nabla g(\hat\theta)\big)^{-1}\Big)'.$$

We used, for instance, the parameter values $\theta = (0, 1, .05, .01)^T$ with $X_t$ following an AR(1) process with autoregressive parameter $a = -.5$ and such that $EX_t^2 = 1$. We compared H and $\hat H$ for several sample sizes n:

H =
0.337   0.258   1.410    2.804
0.258   3.932   4.188   31.040
1.410   4.188  16.013   52.872
2.804  31.040  52.872  345.794

$\hat H$ (n = 10^4) =
0.362   0.084   1.097    0.602
0.084   3.970   2.299   28.574
1.097   2.299  12.205   42.051
0.602  28.574  42.051  340.405

$\hat H$ (n = 10^6) =
0.342   0.252   1.288    2.334
0.252   3.910   3.947   30.369
1.288   3.947  14.804   49.352
2.334  30.369  49.352  335.354

3.3 Oscillatory systems
Let $U : \mathbb{R} \to \mathbb{R}$ be a $C^1$ function such that $U(0) = 0$ and $\lim_{|x|\to\infty} x U'(x) = \infty$. It is known from Abbaoui & Bendjeddou (2001) [1] that there exists a time-stationary solution of the equation

$$dx(t) = y(t)\,dt, \qquad dy(t) = -\big(b\,y(t) + U'(x(t))\big)\,dt + \sigma\,dW_t. \qquad (11)$$

The joint density of the stationary solution of (11) and its derivative, $(x(0), x'(0))$, is then

$$f_{(x(0),x'(0))}(x, v) = C \exp\Big(-\frac{2b}{\sigma^2}\Big(\frac{1}{2}v^2 + U(x)\Big)\Big)$$

for a suitable normalization $C > 0$. A special case, addressed by Callenbach et al. (2002) [7], is $U(x) = \frac{x^4}{4} - \frac{x^2}{2}$, and ergodicity properties hold in this case (at least).
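A trajectory of (11) can be produced by a straightforward Euler–Maruyama scheme; the discretization step and parameter values below are our own illustrative choices, with the double-well potential $U(x) = x^4/4 - x^2/2$.

```python
import math
import random

def simulate_oscillator(T=100.0, dt=0.01, b=1.0, sigma=1.0, seed=3):
    """Euler-Maruyama discretization of (11) with U(x) = x**4/4 - x**2/2,
    so that U'(x) = x**3 - x. Returns the sampled x-path."""
    rng = random.Random(seed)
    n = int(T / dt)
    x, y = 0.0, 0.0
    path = []
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))
        x, y = x + y * dt, y - (b * y + (x ** 3 - x)) * dt + sigma * dw
        path.append(x)
    return path

path = simulate_oscillator()
```

The path oscillates between the two wells near x = ±1 and stays bounded, consistently with the stationary density above.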
Ergodicity implies that the number of crossings of the process x with level u on the interval [0, t], $N_t^x(u)$, satisfies

$$E|\dot x(0)| = \frac{C b^2}{\sigma^2}\, \lim_{t\to\infty} \frac{N_t^x(0)}{t}. \qquad (12)$$

General results introduced in Soize (1994) [30] are proved in the recent monograph by Villani (2007) [31], where much more general models are considered. We also point out a nice paper by Wu (2001) [32]; one important result for us is his Corollary 2.5 (b), where exponential ergodicity is stated for the invariant distribution under the previous assumptions on U. This clearly entails the geometric decay of the β-mixing coefficients, see Rio (2000) [24] or Doukhan (1994) [13]. In this case, all the mixing machinery thus also works, and one may either use the results in Peligrad and Shao (1995) [21] or restate our results under the present β-mixing assumptions. Another way to conclude to both the convergence and the central limit theorem for our variance estimates is to recall that β-mixing implies θ-weak dependence (see Dedecker & Doukhan (2003) [10]), and thus λ-weak dependence also holds; our estimation results for Σ thus directly apply in this case again. Indeed, the sequence $X_t = N_t^x(0) - N_{t-1}^x(0)$ ($t = 1, 2, \dots$) is stationary and weakly dependent (as a function of a weakly dependent process with a marginal density, see Dedecker et al. (2007) [12], Lemma 4.1, page 67); it is also strong mixing (both with geometrically decaying coefficients). We thus obtain:

$$\sqrt{t}\,\Big(\frac{N_t^x(0)}{t} - \frac{\sigma^2}{C b^2}\, E|\dot x(0)|\Big) \to_{t\to\infty} \mathcal{N}(0, \Sigma).$$
Hence our result applies to this limit variance (whose expression does not seem to be accessible in general) and $\hat\Sigma_n \to \Sigma$; moreover, a CLT also holds for this convergence.

3.3.1 Linear oscillators
We come to the simplest case $U(x) = \theta x^2/2$. We now aim at making this central limit theorem precise in the case of a linear oscillator, for which the Gaussian structure is completely explicit. In this case at least it is possible to make the limit variance Σ explicit. Here the stationary solution is a completely determined Gaussian process. The spectral density of the Gaussian process solution of this equation writes

$$f(\lambda) = \frac{\sigma^2}{2\pi\big((\lambda^2 - \theta^2)^2 + b^2\lambda^2\big)}, \qquad \forall \lambda \in \mathbb{R};$$

its spectral moments $\lambda_p = \int_{\mathbb{R}} \lambda^p f(\lambda)\, d\lambda$ are

$$\lambda_0 = \mathrm{Var}\, x(0) = \frac{\sigma^2}{2b\theta^2}, \qquad \lambda_2 = \frac{\sigma^2}{2b},$$

and

$$E N_t^x(u) = \frac{t}{\pi}\,\theta\, e^{-u^2/2\lambda_0}.$$
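The closed forms for $\lambda_0$ and $\lambda_2$ can be verified by direct numerical integration of the spectral density; the truncation level L and step size in this verification sketch are our own choices.

```python
import math

def spectral_density(lam, theta, b, sigma):
    """f(lambda) = sigma**2 / (2*pi*((lambda**2 - theta**2)**2 + b**2 lambda**2))."""
    return sigma ** 2 / (2 * math.pi * ((lam * lam - theta * theta) ** 2
                                        + b * b * lam * lam))

def spectral_moment(p, theta, b, sigma, L=2000.0, steps=400_000):
    """Simpson quadrature of lambda**p f(lambda) over [-L, L]."""
    h = 2 * L / steps
    total = (-L) ** p * spectral_density(-L, theta, b, sigma) \
          + L ** p * spectral_density(L, theta, b, sigma)
    for i in range(1, steps):
        lam = -L + i * h
        w = 4 if i % 2 else 2
        total += w * lam ** p * spectral_density(lam, theta, b, sigma)
    return total * h / 3

theta, b, sigma = 1.3, 0.8, 1.0
lam0 = spectral_moment(0, theta, b, sigma)
lam2 = spectral_moment(2, theta, b, sigma)
# closed forms: lam0 = sigma**2/(2*b*theta**2), lam2 = sigma**2/(2*b)
```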
Figure 3: Simulation of the oscillatory system given by (11) with $U = \theta x^2/2$. We chose θ = 1, b = 1 and σ = 1. Left: process x until time T = 100. Right: same process until T = 1000.

Now the Gaussian stationary solution of (11) is ergodic, hence relation (12) holds; indeed, $f(\lambda) > 0$ everywhere. In fact the process is even strong mixing, and both the strong mixing coefficients and the covariances of this solution admit a geometric decay to 0 (see Doukhan, 1994 [13] for references). Then

$$\lim_{t\to\infty} \frac{N_t^x(0)}{t} = \frac{\theta}{\pi}, \qquad \text{thus } \hat\theta_t = \pi\,\frac{N_t^x(0)}{t} \text{ is a consistent estimator of } \theta,$$

and

$$\sqrt{t}\,\Big(\frac{N_t^x(0)}{t} - \frac{\theta}{\pi}\Big) \to_{t\to\infty} \mathcal{N}(0, \Sigma)$$

seems to hold here for at least three distinct reasons:

• Gaussianity allows to prove this through the diagram formula, Slud (1994) [29]; note that Slud's method also allows to derive that the limit variance is not 0,

• mixing, see e.g. Doukhan (1994) [13],

• weak dependence, Doukhan & Louhichi (1999) [15], see Dedecker et al. (2007) [12]; a CLT is proved in Doukhan & Wintenberger (2007) [18].

Using the subsampling approach on the data simulated with the parameter set above leads to Σ = 0.03 ± 0.01. Unfortunately no higher moment conditions have been derived, hence this is more an intuition than a real result at this point. We now derive the explicit expression of Σ in this case:

Lemma 2 Here

$$\Sigma = \frac{1}{2\pi} \int_0^\infty \frac{\tau(u)}{\sqrt{1 - r^2(u)}} \sum_{k=1}^{\infty} a_{2k}^2\, (2k)!\, \rho^{2k}(u)\, du + \frac{\theta}{\pi}, \qquad \text{with,}$$
Figure 4: N process with parameters θ = 1, b = 1 and σ = 1. Left: $N_t^x(0)$ until time T = 100. Right: same process until T = 1000. One can observe the convergence of $N_t^x(0)/t$ towards $1/\pi$, as predicted by the LLN.

$$\tau(u) = -r''(0) - \frac{r'^2(u)}{1 - r^2(u)}, \qquad \rho(u) = -\frac{1}{\tau(u)}\Big(r''(u) + \frac{r(u)(r'(u))^2}{1 - r^2(u)}\Big),$$

and

$$a_0 = \sqrt{\frac{2}{\pi}}, \qquad a_{2k} = \sqrt{\frac{2}{\pi}}\,\frac{(-1)^{k+1}}{2^k\, k!\,(2k-1)}, \quad \text{for } k \ge 1.$$
Proof of the lemma. The following computations, with some minor modifications, may be found in Cuzick (1976) [9]. Given that

$$E(N_t(0))^2 = E\big(N_t^x(0)(N_t(0) - 1)\big) + t\,\frac{\theta}{\pi},$$

we get

$$\mathrm{Var}\, N_t(0) = E N_t^x(0)(N_t^x(0) - 1) + t\,\frac{\theta}{\pi} - \Big(t\,\frac{\theta}{\pi}\Big)^2.$$

By using Rice's formula for the second factorial moment,

$$E N_t^x(0)(N_t^x(0) - 1) = \int_0^t (t-u)\, E|\zeta(u)\zeta^*(u)|\, du,$$

where, for each u, $(\zeta(u), \zeta^*(u))$ is a Gaussian vector with

$$\mathrm{Var}\,\zeta(u) = \mathrm{Var}\,\zeta^*(u) = -r''(0) - \frac{r'^2(u)}{1 - r^2(u)}, \qquad \mathrm{Cov}(\zeta(u), \zeta^*(u)) = -r''(u) - \frac{r(u)(r'(u))^2}{1 - r^2(u)}.$$

Defining $\rho(u) = \mathrm{Corr}(\zeta(u), \zeta^*(u))$, we can write

$$E N_t^x(0)(N_t^x(0) - 1) = \frac{1}{2\pi} \int_0^t \frac{(t-u)\,\tau(u)}{\sqrt{1 - r^2(u)}} \sum_{k=0}^{\infty} a_{2k}^2\, (2k)!\, \rho^{2k}(u)\, du.$$

Here the $a_{2k}$ are the Hermite coefficients of the function |x|, made explicit in the lemma. Dropping the first (k = 0) term in this series and letting $t \to \infty$ we get

$$\lim_{t\to\infty} \frac{1}{2\pi t} \int_0^t \frac{(t-u)\,\tau(u)}{\sqrt{1 - r^2(u)}} \sum_{k=1}^{\infty} a_{2k}^2\, (2k)!\, \rho^{2k}(u)\, du = \frac{1}{2\pi} \int_0^\infty \frac{\tau(u)}{\sqrt{1 - r^2(u)}} \sum_{k=1}^{\infty} a_{2k}^2\, (2k)!\, \rho^{2k}(u)\, du,$$

hence we obtain $\lim_{t\to\infty} \mathrm{Var}(N_t^x(0))/t = \Sigma$.
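The Hermite coefficients $a_{2k}$ of |x| used in this proof satisfy the Parseval identity $\sum_{k\ge 0} a_{2k}^2 (2k)! = E|X|^2 = 1$ for $X \sim \mathcal{N}(0,1)$, which gives a quick numerical check of their closed form (a verification sketch, truncating the series at 300 terms):

```python
import math

def a2k_sq_times_fact(k):
    """a_{2k}**2 * (2k)! for the Hermite coefficients of |x|:
    a_0 = sqrt(2/pi), a_{2k} = sqrt(2/pi) (-1)**(k+1) / (2**k k! (2k-1))."""
    if k == 0:
        return 2.0 / math.pi
    # a_{2k}**2 (2k)! = (2/pi) * C(2k, k) / (4**k * (2k-1)**2)
    return (2.0 / math.pi) * math.comb(2 * k, k) / (4 ** k * (2 * k - 1) ** 2)

total = sum(a2k_sq_times_fact(k) for k in range(300))
# Parseval: sum over k of a_{2k}**2 (2k)! = E|X|**2 = 1 for X ~ N(0,1)
```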
The previous lemma proves that Σ has a decidedly intractable expression, and thus justifies our subsampling approach.
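To close this application, here is a minimal simulation sketch of the crossing-count estimator $\hat\theta_t = \pi N_t^x(0)/t$ for the linear oscillator (Euler–Maruyama discretization; step size, horizon and the sign-change crossing count are our own choices, and discrete sampling slightly undercounts crossings):

```python
import math
import random

def crossing_estimate(theta=1.0, b=1.0, sigma=1.0, T=1000.0, dt=0.01, seed=4):
    """Simulate the linear oscillator (U(x) = theta*x**2/2) by Euler-Maruyama,
    count zero crossings of x, and return theta_hat = pi * N_t(0) / t."""
    rng = random.Random(seed)
    n = int(T / dt)
    x, y = 1.0, 0.0       # any starting point; the process mixes quickly
    crossings = 0
    prev = x
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))
        x, y = x + y * dt, y - (b * y + theta * x) * dt + sigma * dw
        if prev * x < 0:  # sign change between consecutive samples
            crossings += 1
        prev = x
    return math.pi * crossings / T

theta_hat = crossing_estimate()  # theta = 1, as in Figures 3 and 4
```

With θ = 1 the expected crossing rate is $1/\pi$ per unit time, so the estimate should be close to 1, in line with the consistency statement above.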
4 Proofs

4.1 Moments, cumulants and weak dependence
Let now $(X_n)_{n\in\mathbb{Z}}$ denote a vector valued stationary sequence (with values in $\mathbb{R}^D$); extending the coefficients of Doukhan and Louhichi (1999) [15], we define

$$c_{X,q}(r) = \max_{1 \le l < q}\ \sup \Big|\mathrm{Cov}\Big(X_{t_1}^{(a_1)} \cdots X_{t_l}^{(a_l)},\ X_{t_{l+1}}^{(a_{l+1})} \cdots X_{t_q}^{(a_q)}\Big)\Big|, \qquad (13)$$

where the supremum is taken over $1 \le a_1, \dots, a_q \le D$ and $t_1 \le \cdots \le t_q$ with $t_{l+1} - t_l \ge r$.

Proposition 1 If $\mu = \max_{1\le a\le D} \|X_0^{(a)}\|_b < \infty$ for some $b > q$, then the coefficient defined in eqn. (13) satisfies

$$c_{X,q}(r) \le q^2\, 4^{\frac{b}{b-1}}\, \mu^{\frac{(q-1)b}{b-1}}\, \lambda(r)^{\frac{b-q}{b-1}}.$$
Proof of Proposition 1. Consider integers $1 \le \ell < q$, $1 \le a_1, \dots, a_q \le D$ and $t_1 \le \cdots \le t_q$ such that $t_{\ell+1} - t_\ell \ge r$; we need to bound (uniformly w.r.t. ℓ, the $a_i$ and the $t_i$) the expression

$$c = \Big|\mathrm{Cov}\Big(X_{t_1}^{(a_1)} \cdots X_{t_\ell}^{(a_\ell)},\ X_{t_{\ell+1}}^{(a_{\ell+1})} \cdots X_{t_q}^{(a_q)}\Big)\Big| = |\mathrm{Cov}(A, B)|. \qquad (15)$$

In order to bound (15), for some $M > 0$ depending on r and to be defined later, we set $\overline X_j = (\overline X_j^{(1)}, \dots, \overline X_j^{(D)})$ with $\overline X_j^{(a)} = X_j^{(a)} \vee (-M) \wedge M$ for each $1 \le a \le D$, if $X_j = (X_j^{(1)}, \dots, X_j^{(D)})$. Then we also have

$$c \le |\mathrm{Cov}(\overline A, \overline B)| + |\mathrm{Cov}(A - \overline A, B)| + |\mathrm{Cov}(\overline A, B - \overline B)|,$$

setting $\overline A = \overline X_{t_1}^{(a_1)} \cdots \overline X_{t_\ell}^{(a_\ell)}$ and $\overline B = \overline X_{t_{\ell+1}}^{(a_{\ell+1})} \cdots \overline X_{t_q}^{(a_q)}$.
Now, if $Y_i$ denotes a product of $q-1$ factors $Z_{i,j}$, where $Z_{i,j} = |X_{t_j}^{(a_j)}|$ or $|\overline X_{t_j}^{(a_j)}|$ for $1 \le j \le q$ and $j \ne i$, we obtain

$$|(A - \overline A)\,B| \le \sum_{i=1}^{\ell} Y_i\, \big|X_{t_i}^{(a_i)} - \overline X_{t_i}^{(a_i)}\big|, \qquad |\overline A\,(B - \overline B)| \le \sum_{i=\ell+1}^{q} Y_i\, \big|X_{t_i}^{(a_i)} - \overline X_{t_i}^{(a_i)}\big|.$$

It is thus clear from the Hölder inequality that an analogous representation of the centering terms yields

$$|\mathrm{Cov}(A - \overline A, B)| + |\mathrm{Cov}(\overline A, B - \overline B)| \le 2 \sum_{i=1}^{q} \max_{1\le a\le D} \|X_0^{(a)}\|_b^{\,q-1}\ \max_{1\le a\le D} \|X_0^{(a)} - \overline X_0^{(a)}\|_p,$$

where $\frac{q-1}{b} + \frac{1}{p} = 1$. Set $h(x) = x \vee (-M) \wedge M$; then $\mathrm{Lip}\, h = 1$ and $\|h\|_\infty = M$, so that $f_\ell(x_1, \dots, x_\ell) = h(x_1) \cdots h(x_\ell)$ satisfies $\overline A = f_\ell\big(X_{t_1}^{(a_1)}, \dots, X_{t_\ell}^{(a_\ell)}\big)$ and $\overline B = f_{q-\ell}\big(X_{t_{\ell+1}}^{(a_{\ell+1})}, \dots, X_{t_q}^{(a_q)}\big)$, with $\|f_\ell\|_\infty \le M^\ell$ and $\mathrm{Lip}\, f_\ell \le M^{\ell-1}$; the definition of λ-dependence thus implies

$$|\mathrm{Cov}(\overline A, \overline B)| \le \big(\ell\,\mathrm{Lip}\, f_\ell\, \|f_{q-\ell}\|_\infty + (q-\ell)\,\mathrm{Lip}\, f_{q-\ell}\, \|f_\ell\|_\infty + \ell(q-\ell)\,\mathrm{Lip}\, f_\ell\, \mathrm{Lip}\, f_{q-\ell}\big)\lambda(r) \le \Big(q M^{q-1} + \frac{q^2}{4} M^{q-1}\Big)\lambda(r) \le q^2 M^{q-1} \lambda(r),$$

as soon as $M \ge 1$ and $q > 1$.
Now, for each real valued random variable $X_0^{(a)}$ and $1 \le a \le D$,

$$E|X_0^{(a)} - \overline X_0^{(a)}|^p = E|X_0^{(a)} - \overline X_0^{(a)}|^p\, \mathbb{1}_{|X_0^{(a)}| \ge M} \le 2^p\, E|X_0^{(a)}|^p\, \mathbb{1}_{|X_0^{(a)}| \ge M} \le 2^p\, E|X_0^{(a)}|^b\, M^{p-b},$$

so that

$$\|X_0^{(a)} - \overline X_0^{(a)}\|_p \le 2\, \|X_0^{(a)}\|_b^{b/p}\, M^{1 - b/p}.$$

Thus, setting $\mu = \max_{1\le a\le D} \|X_0^{(a)}\|_b$, we obtain with the previous inequalities

$$|\mathrm{Cov}(A - \overline A, B)| + |\mathrm{Cov}(\overline A, B - \overline B)| \le 4 q\, \mu^{q-1+b/p}\, M^{1-b/p},$$

$$|\mathrm{Cov}(A, B)| \le q\big(4 \mu^{q-1+b/p} M^{1-b/p} + q M^{q-1} \lambda(r)\big) \le q^2\big(4 \mu^b M^{q-b} + M^{q-1} \lambda(r)\big).$$

The order of the previous expression is optimized with $4\mu^b M^{1-b} = \lambda(r)$, which gives

$$c = |\mathrm{Cov}(A, B)| \le q^2\, 4^{\frac{b}{b-1}}\, \mu^{\frac{(q-1)b}{b-1}}\, \lambda(r)^{\frac{b-q}{b-1}}.$$
As a first application of this relation, it seems useful to state the following corollary; such moment inequalities also entail laws of large numbers.

Corollary 1 (One dimensional case: D = 1) Let $(X_t)_{t\in\mathbb{Z}}$ be a real valued, stationary and λ-weakly dependent time series. Assume that $E|X_0|^b < \infty$ for some $b > q$, $EX_0 = 0$, and

$$\sum_{r=0}^{\infty} (r+1)^{q-2}\, \lambda(r)^{\frac{b-q}{b-1}} < \infty;$$

then there exists a constant $C > 0$, only depending on q and on the previous series, such that

$$E\Big|\sum_{j=1}^{n} X_j\Big|^q \le C\, n^{[q/2]}.$$

Remarks.

• The result of Doukhan & Louhichi (1999) [15] in fact needs $\sum_{r=0}^{\infty} (r+1)^{s-2} \lambda(r)^{\frac{b-s}{b-1}} < \infty$ for each integer $s \in [2, q]$; we note here that the present condition implies $\lim_{r\to\infty} \lambda(r) = 0$, but if $\lambda(r) \le 1$ it is clear that our condition is indeed enough to control q-th order moments.

• Note that η-weak dependence yields exactly the same result, by simply replacing λ by η.

• Such moment inequalities clearly entail strong laws of large numbers for q = 4. More precisely, a careful analysis of the bound in Doukhan & Louhichi (1999) [15], which states that

$$E\Big|\sum_{j=1}^{n} X_j\Big|^4 \le c\,\Big(n \sum_{r=0}^{n-1} C_2(r)\Big)^2 + c'\, n \sum_{r=0}^{n-1} (r+1)^2\, C_4(r),$$

proves that, for some constant C and a centered stationary sequence with $\lambda(r) \le c r^{-\lambda}$ and $E|X_0|^b < \infty$,

$$E\Big|\sum_{j=1}^{n} X_j\Big|^4 \le C n^2, \qquad \text{if } \lambda > 2\,\frac{b-1}{b-4}. \qquad (16)$$

• Multidimensional extensions are easy: for a norm on $\mathbb{R}^D$ there exists a constant C with $\|x\| \le C \max\{|x_1|, \dots, |x_D|\}$; now for an $\mathbb{R}^D$-valued random variable $X = (X_1, \dots, X_D)$ we obtain $E\|X\|^q \le C \sum_{i=1}^{D} E|X_i|^q$, which concludes up to some constant only depending on D and on the norm.

Proof. We note that $\sum_{r\ge 0} (r+1)^{q-2}\, c_{X,q}(r) < \infty$, and we use the bound of Doukhan & Louhichi (1999) [15] together with Proposition 1 to conclude.
4.2 A weak dependence condition for $(F(\Delta_{m,i}))_i$

We set

$$\Delta_{m,j} = \frac{1}{\sqrt{m}} \sum_{i=1}^{m} X_{i+j-1}$$

for $1 \le j \le N$, and $\Delta_m = \Delta_{m,1}$. Corollary 1 then entails, with the equivalence of norms in $\mathbb{R}^D$ (as noted in a remark above), that

$$\|\Delta_m\|_q \prec 1 \qquad \text{if } \sum_{r=0}^{\infty} (r+1)^{q-2}\, \lambda(r)^{\frac{b-q}{b-1}} < \infty.$$

Set $F_i = F(\Delta_i^m)$, where $\Delta_i^m$ is either $\Delta_{m,(i-1)\tilde m+\ell}$ with $N = n/\tilde m$ (in our setting (6)), or $\Delta_{m,i}$ with $N = n - \tilde m$ and $\ell = 0$ in the case (2). In order to work out both cases, it will be convenient to consider the set $I(i)$ of m integers such that $\Delta_i^m = \frac{1}{\sqrt{m}} \sum_{j \in I(i)} X_j$. Let $M \ge 1$, to be determined below for the sake of truncation; we then define $H_M : (\mathbb{R}^D)^m \to \mathbb{R}^D$ by $H_M(x_1, \dots, x_m) = h_M\big(\frac{1}{\sqrt{m}}(x_1 + \cdots + x_m)\big)$; then $\mathrm{Lip}\, H_M = \frac{1}{\sqrt{m}}$ if

$$h_M(x) = \begin{cases} x & \text{if } \|x\| \le M, \\ M\,\dfrac{x}{\|x\|} & \text{else,} \end{cases} \qquad \forall x \in \mathbb{R}^D.$$
We write for short $F_i^M = F(h_M(\Delta_i^m))$. For Lipschitz functions f and g bounded by 1, in order to state weak dependence properties of this sequence, we shall bound

$$\mathrm{Cov}(f, g) \equiv \mathrm{Cov}\big(f(F_{i_1}, \dots, F_{i_u}),\ g(F_{j_1}, \dots, F_{j_v})\big),$$

where, as usual (see [12]), we assume $i_1 \le \cdots \le i_u \le i_u + r \le j_1 \le \cdots \le j_v$. Then $f_M \equiv f(F_{i_1}^M, \dots, F_{i_u}^M) = \Phi\big((X_j)_{j \in I(s), 1 \le s \le u}\big)$ with $\Phi\big((x_{s,i})_{s \in I(i), 1 \le i \le u}\big) = f\big((F(H_M((x_{s,i})_{s \in I(i)})))_{1 \le i \le u}\big)$, and

$$\mathrm{Cov}(f, g) = \mathrm{Cov}(f - f_M, g) + \mathrm{Cov}(f_M, g - g_M) + \mathrm{Cov}(f_M, g_M),$$
$$|\mathrm{Cov}(f, g)| \le |\mathrm{Cov}(f_M, g_M)| + 2E|f - f_M| + 2E|g - g_M|.$$

Now $E|f - f_M| \le \mathrm{Lip}\, f \sum_{l=1}^{u} E|F_{i_l} - F(h_M(\Delta_{i_l}^m))|$; the terms vanish if $\|\Delta_i^m\| < M$, and otherwise they are bounded by $|F(x)| + |F(xM/\|x\|)| \prec \|x\|^\beta$. Since $E\|\Delta_m\|^\beta\, \mathbb{1}_{\|\Delta_m\| > M} \le M^{\beta-q}\, E\|\Delta_m\|^q$,

$$|\mathrm{Cov}(f - f_M, g)| + |\mathrm{Cov}(f_M, g - g_M)| \prec (u\,\mathrm{Lip}\, f + v\,\mathrm{Lip}\, g)\, M^{\beta-q}.$$

If $\phi(\delta_1, \dots, \delta_u) = f(F(\delta_1), \dots, F(\delta_u))$ on $\mathbb{R}^{Du}$, we consider $\Phi\big((x_{s,i})_{1\le s\le m, 1\le i\le u}\big) = \phi\big(H_M((x_{s,1})_{1\le s\le m}), \dots, H_M((x_{s,u})_{1\le s\le m})\big)$. For $\beta \le 1$,

$$|\phi(\delta) - \phi(\delta')| \le \mathrm{Lip}\, f\, \mathrm{Lip}\, F \sum_i |\delta_i - \delta_i'|,$$

and if $\beta > 1$,

$$|\phi(\delta) - \phi(\delta')| \le c\,\mathrm{Lip}\, f \sum_i |\delta_i - \delta_i'|\big(1 + |\delta_i|^{\beta-1} + |\delta_i'|^{\beta-1}\big).$$

A common bound writes $\mathrm{Lip}\, f\, M^{\beta-1} \sum_i |\delta_i - \delta_i'|$, up to some constant only depending on F. For both cases $\beta \le 1$ and $\beta > 1$, we thus obtain

$$|\phi(\delta) - \phi(\delta')| \le C(F)\,\mathrm{Lip}\, f\, M^{(\beta-1)_+} \sum_i |\delta_i - \delta_i'| \qquad \text{and} \qquad \mathrm{Lip}\,\Phi \le \frac{C(F)\, M^{(\beta-1)_+}}{\sqrt{m}}\, \mathrm{Lip}\, f.$$

Combining the previous inequalities yields, with $A = \sqrt{m}\, M^{(\beta-1)_+}$ (below Ψ is defined analogously to Φ but w.r.t. g, with the immediate changes in notation),

$$|\mathrm{Cov}(f, g)| \le C(F)\big(u m\,\mathrm{Lip}\,\Phi + v m\,\mathrm{Lip}\,\Psi + u v m^2\,\mathrm{Lip}\,\Phi\,\mathrm{Lip}\,\Psi\big)\lambda + M^{\beta-q}\big(u\,\mathrm{Lip}\, f + v\,\mathrm{Lip}\, g\big) \le \big(A(u\,\mathrm{Lip}\, f + v\,\mathrm{Lip}\, g) + A^2 u v\,\mathrm{Lip}\, f\,\mathrm{Lip}\, g\big)\lambda + M^{\beta-q}\big(u\,\mathrm{Lip}\, f + v\,\mathrm{Lip}\, g\big).$$
In this relation λ denotes $\lambda(r - \tilde m)$ if $r \ge \tilde m$ for the estimator (2), or $\lambda((r-1)\tilde m + \ell)$ if $r \ge 1$ for the estimator (6); we also set $\lambda = 1$ in the other cases. For simplicity also set $\alpha = (\beta-1)_+$ (this is 0 for $\beta \le 1$ and $\beta - 1$ else). We choose M so that $A M^{q-\beta} \lambda = 1$, i.e. $M^{q+\alpha-\beta} = (\sqrt{m}\,\lambda)^{-1}$; then

$$\mathrm{Cov}(f, g) \prec (\sqrt{m}\,\lambda)^{\frac{q-\beta}{q+\alpha-\beta}}\,(u\,\mathrm{Lip}\, f + v\,\mathrm{Lip}\, g) + \lambda\,(\sqrt{m}\,\lambda)^{-\frac{2}{q+\alpha-\beta}}\, u v\,\mathrm{Lip}\, f\,\mathrm{Lip}\, g \prec m^{\frac{q-\beta}{2(q+\alpha-\beta)}}\,(u\,\mathrm{Lip}\, f + v\,\mathrm{Lip}\, g)\,\lambda^{\frac{q-\beta}{q+\alpha-\beta}} + m^{-\frac{1}{q+\alpha-\beta}}\, u v\,\mathrm{Lip}\, f\,\mathrm{Lip}\, g\,\lambda^{\frac{q+\alpha-\beta-2}{q+\alpha-\beta}}.$$
Note now that $q + \alpha - \beta = q - \beta$ if $\beta \le 1$ and $q - 1$ else; thus it equals $q - \beta \vee 1$. We have derived the following bound.

Lemma 3 (weak dependence of $(F(\Delta_{m,i}))_i$) Assume that $\mathrm{Lip}\, f, \mathrm{Lip}\, g < \infty$, $\|f\|_\infty, \|g\|_\infty \le 1$, and that for some $r \ge 0$ the indices satisfy $i_1 \le \cdots \le i_u \le i_u + r \le j_1 \le \cdots \le j_v$; then there exist constants C, C′, only depending on F and on the weakly dependent sequence, such that

$$|\mathrm{Cov}(f, g)| = |\mathrm{Cov}(f(F_{i_1}, \dots, F_{i_u}),\, g(F_{j_1}, \dots, F_{j_v}))| \le C\,(u\,\mathrm{Lip}\, f + v\,\mathrm{Lip}\, g)\,(\sqrt{m}\,\lambda)^{\frac{q-\beta}{q-\beta\vee 1}} + C'\, u v\,\mathrm{Lip}\, f\,\mathrm{Lip}\, g\, m^{-\frac{1}{q-\beta\vee 1}}\, \lambda^{\frac{q-\beta\vee 1-2}{q-\beta\vee 1}}.$$

Here $\lambda = \lambda(r - \tilde m)$ if $r \ge \tilde m$ for the estimator (2), $\lambda = \lambda((r-1)\tilde m + \ell)$ if $r \ge 1$ for the estimator (6), and $\lambda = 1$ in the other cases.
4.3 Proof of Theorem 1 (LLN)

We come to the law of large numbers for the statistics $\tilde F_n$ and $\hat F_n$; the proofs are given in a unified frame. We shall prove here that

$$\mathrm{Var}\Big(\frac{1}{N}\sum_{i=1}^{N} F(\Delta_{i,m})\Big) \le \frac{2}{N}\sum_{i=1}^{N} |\mathrm{Cov}(F(\Delta_{1,m}), F(\Delta_{i,m}))| \to_{N\to\infty} 0.$$

Set $\lambda = \lambda((i-1)\tilde m + \ell)$ (for $i \ge 1$) or $\lambda = \lambda(i - \tilde m)$ (for $i \ge \tilde m$), respectively in the non-overlapping case and in the overlapping case. If respectively $i = 0$ or $i < \tilde m$, the covariances are bounded, up to some constant, by $E\|\Delta_m\|^{2\beta} \prec 1$ (since $q \ge 2\beta$), through the Cauchy–Schwarz inequality. Now, for respectively $i \ge 1$ or $i \ge \tilde m$, the Hölder inequality implies, for $p^{-1} + p'^{-1} = 1$,

$$|\mathrm{Cov}(F_1, F_i)| \le |\mathrm{Cov}(F_1^M, F_i^M)| + 4\,\|F_1 - F_1^M\|_p\, \|F_1\|_{p'}.$$

Up to constants, the first term is associated with functions $f = F \circ H_M$ with Lipschitz constant $M^{(\beta-1)_+}/\sqrt{m}$ and supremum bound $M^\beta$. Set $\zeta = q/\beta - 2$. In the second term, the first factor is bounded using $E[\|\Delta_m\|^{p\beta}\, \mathbb{1}_{\|\Delta_m\| \ge M}]$, while the second factor is $\prec \|\Delta_m\|_{p'\beta}^\beta$. The best choice of p′ is $p'\beta = q$, and then $E[\|\Delta_m\|^{p\beta}\, \mathbb{1}_{\|\Delta_m\| \ge M}] \le E\|\Delta_m\|^q\, M^{p\beta - q}$. Hence

$$|\mathrm{Cov}(F_1, F_i)| \prec |\mathrm{Cov}(F_1^M, F_i^M)| + M^{2-q/\beta} \prec (\sqrt{m}\, M^{\alpha+\beta} + m M^{2\alpha})\lambda + M^{-\zeta} \prec m^{\frac{\beta+\zeta}{\alpha+\beta+\zeta}}\,\lambda^{\frac{\zeta}{\alpha+\beta+\zeta}}.$$

Hence, in the considered non-overlapping case:

$$\mathrm{Var}(\tilde F_n) \prec \frac{1}{N}\Big(1 + m^{\frac{\beta+\zeta}{\alpha+\beta+\zeta}} \sum_{i=1}^{N} \lambda^{\frac{\zeta}{\alpha+\beta+\zeta}}\big((i-1)\tilde m + \ell\big)\Big) \prec \frac{m}{n}, \quad \text{if } \sum_{i=1}^{\infty} \lambda^{\frac{\zeta}{\alpha+\beta+\zeta}}\big((i-1)\tilde m + \ell\big) = O\Big(m^{-\frac{\beta+\zeta}{\alpha+\beta+\zeta}}\Big).$$

Note that if $a : \mathbb{R}_+ \to \mathbb{R}_+$ is non-increasing then

$$a(i\tilde m + \ell) \le \frac{1}{\tilde m} \int_{(i-1)\tilde m + \ell}^{i\tilde m + \ell} a(x)\, dx.$$

Hence the previous relation holds if $\lambda(r) \prec r^{-\lambda}$, provided

$$\ell^{\,1 - \frac{\zeta\lambda}{\alpha+\beta+\zeta}} \prec m^{\,1 - \frac{\beta+\zeta}{\alpha+\beta+\zeta}}.$$
Now, in the overlapping case ($N = n - \tilde m$ and $\ell = 0$):

$$\mathrm{Var}(\hat F_n) \prec \frac{1}{n}\Big(m + 1 + m^{\frac{\beta+\zeta}{\alpha+\beta+\zeta}} \sum_{i=\tilde m+1}^{n-\tilde m} \lambda^{\frac{\zeta}{\alpha+\beta+\zeta}}(i - \tilde m)\Big) \prec \frac{m}{n}, \qquad \text{if } \sum_{i=1}^{\infty} \lambda^{\frac{\zeta}{\alpha+\beta+\zeta}}(i) < \infty.$$
We now prove a strong law of large numbers; at this point we have obtained sufficient conditions in each case such that either $E\tilde F_n^2 \le c\, m_n/n = v(n)$ or $E\hat F_n^2 \le c\, m_n/n = v(n)$. We recall that $N_n = n/\tilde m_n$ or $N_n = n - \tilde m_n$ depending on the considered estimator, $\tilde F_n$ or $\hat F_n$, unifying the proofs. Thus $P(\sup_{n\ge\nu} |\hat F_n| \ge 3t) \le A_\nu + B_\nu + C_\nu$, where $k(\nu) = [\sqrt{\nu}]$ and, using the decomposition

$$\hat F_n - \hat F_{k^2} = \frac{1}{N_n} \sum_{i=N_{k^2}+1}^{N_n} F_i + \Big(\frac{N_{k^2}}{N_n} - 1\Big)\hat F_{k^2},$$

$$A_\nu = \sum_{k=k(\nu)}^{\infty} \frac{v(k^2)}{t^2}, \qquad B_\nu = \sum_{k=k(\nu)}^{\infty} P\Big(\max_{k^2 < n \le (k+1)^2} \frac{1}{N_n}\Big|\sum_{i=N_{k^2}+1}^{N_n} F_i\Big| \ge t\Big),$$