Strong approximation of the empirical distribution function for absolutely regular sequences in R^d

08/03/2013

Jérôme Dedecker (a), Florence Merlevède (b) and Emmanuel Rio (c)

(a) Université Paris Descartes, Laboratoire MAP5, UMR 8145 CNRS, 45 rue des Saints-Pères, F-75270 Paris cedex 06, France. E-mail: [email protected]

(b) Université Paris-Est, LAMA (UMR 8050), UPEMLV, CNRS, UPEC, F-77454 Marne-La-Vallée, France. E-mail: [email protected]

(c) Université de Versailles, Laboratoire de mathématiques, UMR 8100 CNRS, Bâtiment Fermat, 45 Avenue des Etats-Unis, 78035 Versailles, France. E-mail: [email protected]

Running head: Strong approximation for the empirical process.

Key words: Strong approximation, Kiefer process, empirical process, stationary sequences, absolutely regular sequences.

Mathematical Subject Classification (2010): 60F17, 60G10.

Abstract

We prove a strong approximation result with rates for the empirical process associated to an absolutely regular stationary sequence of random variables with values in $\mathbb{R}^d$. As soon as the absolutely regular coefficients of the sequence decrease more rapidly than $n^{1-p}$ for some $p \in ]2,3]$, we show that the error of approximation between the empirical process and a two-parameter Gaussian process is of order $n^{1/p}(\log n)^{\lambda(d)}$ for some positive $\lambda(d)$ depending on $d$, both in $\mathbb{L}^1$ and almost surely. The power of $n$ being independent of the dimension, our results are new even in the independent setting, and improve earlier results. In addition, for absolutely regular sequences, we show that the rate of approximation is optimal up to the logarithmic term.

1 Introduction

Let $(X_i)_{i\in\mathbb{Z}}$ be a strictly stationary sequence of random variables in $\mathbb{R}^d$ equipped with the usual product order, with common distribution function $F$. Define the empirical process of $(X_i)_{i\in\mathbb{Z}}$ by
$$R_X(s,t) = \sum_{1\le k\le t}\bigl(\mathbf{1}_{X_k\le s} - F(s)\bigr), \quad s\in\mathbb{R}^d,\ t\in\mathbb{R}^+. \tag{1.1}$$
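For a finite sample, (1.1) can be evaluated directly. The following minimal sketch is our own illustration, not part of the paper: the helper name and the uniform test case (where $F(s) = s_1 s_2$ on $[0,1]^2$) are assumptions made for the example.

```python
import numpy as np

def empirical_process(X, s, t, F_s):
    """R_X(s, t) = sum_{1 <= k <= t} (1{X_k <= s} - F(s)), for integer t.

    X   : (n, d) array of observations,
    s   : (d,) evaluation point (componentwise order),
    F_s : value of the distribution function F at s.
    """
    below = np.all(X[:t] <= s, axis=1)   # indicators 1{X_k <= s}
    return below.sum() - t * F_s

# Quick check with iid uniforms on [0,1]^2, where F(s) = s_1 * s_2:
rng = np.random.default_rng(1)
X = rng.uniform(size=(1000, 2))
s = np.array([0.3, 0.7])
print(empirical_process(X, s, 1000, s.prod()))  # fluctuation of order sqrt(n)
```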

In this paper we are interested in extensions of the results of Kiefer for the process $R_X$ to absolutely regular processes. Let us start by recalling the known results in the case of independent and identically distributed (iid) random variables $X_i$. Kiefer (1972) obtained the first result in the case $d=1$. He constructed a continuous centered Gaussian process $K_X$ with covariance function
$$\mathbb{E}\bigl(K_X(s,t)K_X(s',t')\bigr) = (t\wedge t')\bigl(F(s\wedge s') - F(s)F(s')\bigr)$$
in such a way that
$$\sup_{(s,t)\in\mathbb{R}\times[0,1]} |R_X(s,[nt]) - K_X(s,[nt])| = O(a_n) \quad\text{almost surely}, \tag{1.2}$$
with $a_n = n^{1/3}(\log n)^{2/3}$. The two-parameter Gaussian process $K_X$ is known in the literature as the Kiefer process. Csörgő and Révész (1975a) extended Kiefer's result to the multivariate case. For iid random variables with the uniform distribution over $[0,1]^d$, they obtained (1.2) with $a_n = n^{(d+1)/(2d+4)}(\log n)^2$. Next they extended this result to iid random variables in $\mathbb{R}^d$ with a density satisfying some smoothness conditions (see Csörgő and Révész (1975b)). In the univariate case, a major advance was made by Komlós, Major and Tusnády (1975): they obtained (1.2) with $a_n = (\log n)^2$ (we refer to Castelle and Laurent-Bonvalot (1998) for a detailed proof)

via a new method of construction of the Gaussian process. Concerning the strong approximation by a sequence of Gaussian processes in the case $d=2$, Tusnády (1977) proved that when the random variables $X_i$ are iid with uniform distribution over $[0,1]^2$, one can construct a sequence of centered continuous Gaussian processes $(G_n)_{n\ge1}$ in $\mathbb{R}^2$ with covariance function
$$\mathrm{Cov}(G_n(s), G_n(s')) = n\bigl((s_1\wedge s'_1)(s_2\wedge s'_2) - s_1 s_2 s'_1 s'_2\bigr), \quad\text{with } s=(s_1,s_2) \text{ and } s'=(s'_1,s'_2),$$
such that
$$\sup_{s\in[0,1]^2} |R_X(s,n) - G_n(s)| = O(\log^2 n) \quad\text{almost surely}. \tag{1.3}$$

Adapting the dyadic method of Komlós, Major and Tusnády (sometimes called the Hungarian construction), several authors obtained new results in the multivariate case. For iid random variables in $\mathbb{R}^d$ with distribution with dependent components (without regularity conditions on the distribution), Borisov (1982) obtained the almost sure rate of approximation $O(n^{(d-1)/(2d-1)}\log n)$ in the Tusnády strong approximation. Next, starting from the result of Borisov (1982), Csörgő and Horváth (1988) obtained the almost sure rate $O(n^{(2d-1)/(4d)}(\log n)^{3/2})$ for the strong approximation by a Kiefer process. To our knowledge, this result has not yet been improved in the case of general distributions with dependent components. For $d\ge3$ and Tusnády type results, Rio (1994) obtained the rate $O(n^{(d-1)/(2d)}(\log n)^{1/2})$ for random variables with the uniform distribution, or more generally with a smooth positive density on the unit cube (see also Massart (1989) in the uniform case). Still in the uniform case, concerning the strong approximation by a Kiefer process, Massart (1989) obtained the almost sure rate $O(n^{d/(2d+2)}(\log n)^2)$ for any $d\ge2$, which improves the results of Csörgő and Révész (1975a). In fact the results of Massart (1989) and Rio (1994) also apply to Vapnik-Chervonenkis classes of sets with uniformly bounded perimeters, such as the class of Euclidean balls. In that case, Beck (1985) proved that the error term cannot be better than $n^{(d-1)/(2d)}$. Consequently the result of Rio (1994) for Euclidean balls is optimal, up to the factor $\sqrt{\log n}$. However, there is a gap in the lower bounds between the class of Euclidean balls and the class of orthants, which corresponds to the empirical distribution function. Indeed, concerning the lower bounds in Tusnády type results, Beck (1985) showed that the rate of approximation cannot be less than $c_d(\log n)^{(d-1)/2}$, where $c_d$ is a positive constant depending on $d$. To be precise, he proved (see his Theorem 2) that when the random variables $X_i$ are iid with the uniform distribution over $[0,1]^d$, then for any sequence of Brownian bridges $(G_n)_{n\ge1}$ in $\mathbb{R}^d$,
$$\mathbb{P}\Bigl(\sup_{s\in[0,1]^d} |R_X(s,n) - G_n(s)| \le c_d(\log n)^{(d-1)/2}\Bigr) < e^{-n}.$$
Beck's result implies in particular that, for any $n\ge2$,
$$(\log n)^{(1-d)/2}\,\mathbb{E}\Bigl(\sup_{s\in[0,1]^d} |R_X(s,n) - G_n(s)|\Bigr) \ge c_d/2. \tag{1.4}$$

The results of Beck (1985) motivated new research in the multivariate case. For the empirical distribution function and Tusnády type results, Rio (1996) obtained the rate $O(n^{5/12}(\log n)^{c(d)})$ for random variables with the uniform distribution, where $c(d)$ is a positive constant depending on the dimension $d$, without the help of the Hungarian construction. Here the power of $n$ does not depend on the dimension: consequently this result is better than the previous results if $d\ge7$. It is worth noticing that, although this subject has been treated intensively, up to now the best known rates for the strong approximation by a Kiefer process in the multivariate case are of the order $n^{1/3}$ for $d=2$, up to some power of $\log n$, even in the uniform case. Furthermore these rates depend on the dimension, contrary to the result of Rio (1996) for Tusnády type approximations.

We now come to the weakly dependent case. Contrary to the iid case, there are only few results concerning the rate of approximation. To our knowledge, when $(X_i)_{i\in\mathbb{Z}}$ is a geometrically strongly mixing (in the sense of Rosenblatt (1956)) strictly stationary sequence of random variables in $\mathbb{R}^d$, the best known result concerning rates of convergence is due to Doukhan and Portal (1987), stating that one can construct a sequence of centered continuous Gaussian processes $(G_n)_{n\ge1}$ in $\mathbb{R}^d$ with common covariance function
$$\Lambda(s,s') = \sum_{k\in\mathbb{Z}}\mathrm{Cov}\bigl(\mathbf{1}_{X_0\le s}, \mathbf{1}_{X_k\le s'}\bigr),$$
such that the Ky-Fan distance between $\{n^{-1/2}R_X(s,n), s\in\mathbb{R}^d\}$ and $\{G_n(s), s\in\mathbb{R}^d\}$ is $o(n^{-a})$ for any $a < 1/(15d+12)$. In their paper, they also give some rates in the case of polynomial decay of the mixing coefficients. Concerning the strong approximation by a Kiefer process in the stationary and strongly mixing case, Theorem 3 in Dhompongsa (1984) yields the rate $O(n^{1/2}(\log n)^{-\lambda})$ for some positive $\lambda$, under the strong mixing condition $\alpha_n = O(n^{-a})$ for some $a > 2+d$, improving slightly previous results of Philipp and Pinzur (1980) (here $\lambda$ depends on $a$ and $d$). Strong mixing conditions seem to be too poor to get optimal rates of convergence. Now recall that, for irreducible, aperiodic and positively recurrent Markov chains, the coefficients of strong mixing and the coefficients of absolute regularity are of the same order (see for example Rio (2000), chap. 9). Since absolute regularity is a stronger condition, it is more convenient to consider absolute regularity, at least in the case of irreducible Markov chains. Let
$$\beta(\mathcal{A},\mathcal{B}) = \frac12\sup\sum_{i\in I}\sum_{j\in J}\bigl|\mathbb{P}(A_i\cap B_j) - \mathbb{P}(A_i)\mathbb{P}(B_j)\bigr|,$$

the maximum being taken over all finite partitions $(A_i)_{i\in I}$ and $(B_j)_{j\in J}$ of $\Omega$ with elements in $\mathcal{A}$ and $\mathcal{B}$ respectively. For a strictly stationary sequence $(X_k)_{k\in\mathbb{Z}}$, let $\mathcal{F}_0 = \sigma(X_i, i\le0)$ and $\mathcal{G}_k = \sigma(X_i, i\ge k)$. The sequence $(X_k)_{k\in\mathbb{Z}}$ is said to be absolutely regular in the sense of Rozanov and Volkonskii (1959), or $\beta$-mixing, if
$$\beta_n = \beta(\mathcal{F}_0, \mathcal{G}_n) \to 0, \quad\text{as } n\to\infty.$$
Concerning the strong approximation by a Kiefer process in the stationary and $\beta$-mixing case, Theorem 1 in Dhompongsa (1984) yields the rate $O(n^{(1/2)-\lambda})$ for some positive $\lambda$, under the assumption $\beta_n = O(n^{-a})$ for some $a > 2+d$. Nevertheless this mixing condition is clearly too restrictive, and $\lambda$ is not explicit.

We now come to our results. For absolutely regular sequences, the finite dimensional convergence of $\{n^{-1/2}R_X(s,n) : s\in\mathbb{R}^d\}$ to a Gaussian process holds under the summability condition $\sum_{k\ge0}\beta_k < \infty$, and this condition is sharp. Rio (1998) proved that this summability condition also implies the functional central limit theorem for $\{n^{-1/2}R_X(s,n) : s\in\mathbb{R}^d\}$ in the sense of Dudley (1978) for any $d\ge1$. Assume now that the stronger $\beta$-mixing condition
$$\beta_n = O(n^{1-p}) \quad\text{for some } p > 2 \tag{1.5}$$
holds true. In Section 2, we shall prove that, in the case $d=1$, one can construct a stationary absolutely regular Markov chain satisfying (1.5), whose marginals are uniformly distributed over $[0,1]$, and such that, for any construction of a sequence $(G_n)_{n>0}$ of continuous Gaussian processes on $[0,1]$,
$$\liminf_{n\to\infty}\,(n\log n)^{-1/p}\,\mathbb{E}\Bigl(\sup_{s\in(0,1]}|R_X(s,n) - G_n(s)|\Bigr) > 0.$$

Concerning the upper bound, Dedecker, Merlevède and Rio (2012) obtained a strong approximation by a Kiefer process under a weak dependence condition implied by the above condition, with a power-type rate $O(n^{(1/2)-\delta})$ for some positive $\delta$ depending on $p$. Nevertheless their result holds only for $d=1$ and the value of $\delta$ is far from the optimal value $(1/2)-(1/p)$. This gap motivates the present work. In Section 3, we prove that, if $(X_i)_{i\in\mathbb{Z}}$ is a strictly stationary sequence of random variables in $\mathbb{R}^d$ satisfying (1.5) for $p\in]2,3]$, there exists a two-parameter Kiefer type process $K_X$, continuous with respect to the pseudo-metric defined by (3.1), such that
$$\mathbb{E}\Bigl(\sup_{s\in\mathbb{R}^d,\,t\in[0,1]}|R_X(s,[nt]) - K_X(s,[nt])|\Bigr) = O\bigl(n^{1/p}(\log n)^{\lambda(d)}\bigr).$$
We also prove that, for another Kiefer process $K_X$,
$$\sup_{s\in\mathbb{R}^d,\,k\le n}|R_X(s,k) - K_X(s,k)| = O\bigl(n^{1/p}(\log n)^{\lambda(d)+\varepsilon+1/p}\bigr) \quad\text{almost surely, for any }\varepsilon>0.$$
More precisely, the covariance function $\Gamma_X$ of $K_X$ is given by $\Gamma_X(s,s',t,t') = \min(t,t')\Lambda_X(s,s')$, where
$$\Lambda_X(s,s') = \sum_{k\in\mathbb{Z}}\mathrm{Cov}\bigl(\mathbf{1}_{X_0\le s}, \mathbf{1}_{X_k\le s'}\bigr). \tag{1.6}$$

Our proofs can be adapted to iid random variables with values in $\mathbb{R}^d$ and arbitrary distribution function, for any $d\ge2$, yielding the error term $O(n^{1/3}(\log n)^{\varepsilon+(2d+4)/3})$ in the almost sure strong approximation by a Kiefer process. This result improves the results of Csörgő and Horváth (1988) for general distributions as soon as $d\ge2$, and the result of Massart (1989) concerning the specific case of the uniform law as soon as $d\ge3$ (recall that Massart's rate is $O(n^{1/3}(\log n)^2)$ for $d=2$ and $O(n^{3/8}(\log n)^2)$ for $d=3$).

We now describe our methods of proof. We shall apply the conditional version of the Kantorovich-Rubinstein transport theorem to the trajectories to get a bound on the error. However, in the dependent setting, we do not apply the transport theorem directly. Indeed, we start by approximating the initial process $R_X$ by a Gaussian process with the same covariance structure as $R_X$, using the conditional Kantorovich-Rubinstein theorem applied in the space of trajectories, together with the Lindeberg method. Next, we use a martingale method to approximate this Gaussian process by the Kiefer process. This extra step is needed because the Lindeberg method in the space of trajectories applies only to processes with the same covariance structure. In all these steps the error terms can be bounded by $n^{1/p}$ up to some power of $\log n$, which leads to the (nearly) optimal rates of convergence for absolutely regular sequences. Note that the Lindeberg method in the space of trajectories was introduced by Sakhanenko (1987) in the real case to bound the Prokhorov distance between the partial sum process and the Brownian motion. This result was then extended to random vectors in Banach spaces by Bentkus and Lyubinskas (1987) using smoothing techniques introduced by the first author in his doctoral dissertation. Later, Sakhanenko (2000) improved the results of Bentkus and Lyubinskas (1987) in the specific case of the $L^\infty$-norm, yielding efficient estimates under some assumptions on the moments of order two and three of the Euclidean norms of the random vectors. Sakhanenko (1988, 2000) also gives some results for martingale differences under very restrictive assumptions on the conditional moments of order two. In our opinion, the smoothing technique used in Sakhanenko (2000) is not suitable in the dependent case. Indeed, the assumption on the conditional moments cannot be relaxed.

Our paper is organized as follows. In Section 2 we give an example of an absolutely regular process for which we can derive lower bounds for the rates of approximation by any continuous Gaussian processes. In Section 3 we formulate our main results concerning the upper bounds for the rates of approximation, both in the dependent setting and in the independent one. The proofs of these results are given in Sections 4 and 5. Section 6 is devoted to the very technical proofs of key intermediate lemmas leading to our main results. Finally, in Section 7 we collect some auxiliary assertions and general facts.

2 Lower bounds for the rate of approximation

In this section, we give an example of a stationary absolutely regular Markov chain with state space $[0,1]$ and absolute regularity coefficients $\beta_k$ of order $k^{1-p}$ for $p>2$ which has the following property: with probability one, the error in the strong approximation by Gaussian processes is bounded from below by $(n\log n)^{1/p}$, for any construction of a sequence of continuous Gaussian processes, whereas the $\mathbb{L}^1$-error is bounded from below by $n^{1/p}$.

Theorem 2.1 For any $p>2$, there exists a stationary Markov chain $(X_i)_{i\in\mathbb{Z}}$ of random variables with uniform distribution over $[0,1]$ and $\beta$-mixing coefficients $(\beta_n)_{n>0}$, such that:

(i) $0 < \liminf_{n\to+\infty} n^{p-1}\beta_n \le \limsup_{n\to+\infty} n^{p-1}\beta_n < \infty$.

(ii) There exists a positive constant $C$ such that, for any construction of a sequence $(G_n)_{n>0}$ of continuous Gaussian processes on $[0,1]$,
$$(a)\qquad \liminf_{n\to\infty} n^{-1/p}\,\mathbb{E}\Bigl(\sup_{s\in(0,1]}|R_X(s,n) - G_n(s)|\Bigr) \ge C.$$
Furthermore,
$$(b)\qquad \limsup_{n\to\infty}\,(n\log n)^{-1/p}\sup_{s\in(0,1]}|R_X(s,n) - G_n(s)| > 0 \quad\text{almost surely}.$$

Before proving this result, we give a second theorem, which proves that the strong approximation of partial sums of functionals of the chain holds with the same error term.


Theorem 2.2 Let $(X_i)_{i\in\mathbb{Z}}$ be the stationary Markov chain defined in Theorem 2.1 and let $f$ be a map from $[0,1]$ to $\mathbb{R}$, with continuous and strictly positive derivative $f'$ on $[0,1]$. Let
$$S_n(f) = \sum_{k=1}^n f(X_k) - n\int_0^1 f(t)\,dt.$$
Then the series $\mathrm{Var}\,f(X_0) + 2\sum_{k>0}\mathrm{Cov}(f(X_0), f(X_k))$ is absolutely convergent to some nonnegative $\sigma^2(f)$. Furthermore, for $2<p<3$ and any positive $\varepsilon$, one can construct a sequence of iid Gaussian random variables $(g'_k)_{k>0}$ with law $N(0,\sigma^2(f))$ such that
$$(a)\qquad S_n(f) - \sum_{k=1}^n g'_k = o\bigl(n^{1/p}\sqrt{\log n}\,(\log\log n)^{(1+\varepsilon)/p}\bigr) \quad\text{almost surely}.$$
In addition, for any $p>2$ and any stationary Gaussian centered sequence $(g_k)_{k\in\mathbb{Z}}$ with convergent series of covariances,
$$(b)\qquad \limsup_{n\to\infty}\,(n\log n)^{-1/p}\Bigl|S_n(f) - \sum_{k=1}^n g_k\Bigr| > 0 \quad\text{almost surely}.$$

Note that Part (a) of this theorem was proved in Merlevède and Rio (2012). Part (b) shows that the result in Merlevède and Rio (2012) is optimal up to the factor $(\log n)^{(1/2)-(1/p)}(\log\log n)^{(1+\varepsilon)/p}$. It is worth noticing that the power of the logarithm in the loss tends to $0$ as $p$ tends to $2$.

Proof of Theorem 2.1. The sequence $(X_i)_{i\in\mathbb{Z}}$ is defined from a strictly stationary Markov chain $(\xi_i)_{i\in\mathbb{Z}}$ on $[0,1]$ as in Rio (2000), Section 9.7. Let $\lambda$ be the Lebesgue measure, $a = p-1$ and $\nu = (1+a)x^a\mathbf{1}_{[0,1]}\lambda$. The conditional distribution $\Pi(x,\cdot)$ of $\xi_{n+1}$, given $(\xi_n = x)$, is defined by
$$\Pi(x,\cdot) = \Pi(\delta_x,\cdot) = (1-x)\delta_x + x\nu,$$
where $\delta_x$ is the Dirac measure at point $x$. Then the $\beta$-mixing coefficients $(\beta_n)_{n>0}$ of the stationary chain $(\xi_i)_{i\in\mathbb{Z}}$ with transition probability $\Pi(x,\cdot)$ satisfy (i) of Theorem 2.1 (see Rio (2000), Section 9.7). Moreover, the stationary distribution $\pi$ has distribution function $F(x) = x^a$, and consequently, setting $X_i = \xi_i^a$, we obtain a stationary Markov chain $(X_i)_{i\in\mathbb{Z}}$ of random variables with uniform distribution over $[0,1]$ and the adequate rate of $\beta$-mixing. Define then the empirical measure $P_n$ by
$$P_n = n^{-1}\sum_{i=1}^n \delta_{X_i}.$$
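The chain is easy to simulate by inverse-transform sampling, which gives a concrete feel for the construction. The sketch below is our own illustration (the function name and parameter choices are assumptions, not from the paper); it starts the chain from the stationary law $\pi$, whose distribution function is $x^a$, and checks the uniformity of the $X_i$.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_chain(n, p):
    """Simulate xi_0, ..., xi_{n-1} with kernel Pi(x, .) = (1 - x) delta_x + x nu,
    where nu has density (1 + a) y**a on [0, 1] and a = p - 1; return X_i = xi_i**a."""
    a = p - 1.0
    xi = np.empty(n)
    xi[0] = rng.uniform() ** (1.0 / a)  # xi_0 ~ pi, since F(x) = x**a
    for i in range(1, n):
        x = xi[i - 1]
        if rng.uniform() < x:           # jump with probability x ...
            xi[i] = rng.uniform() ** (1.0 / (1.0 + a))  # ... to a draw from nu
        else:                           # hold with probability 1 - x
            xi[i] = x
    return xi ** a

X = simulate_chain(200_000, p=2.5)
print(X.mean(), (X <= 0.5).mean())  # both close to 1/2: uniform marginals
```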

The regeneration times $(T_k)_k$ of the Markov chain $(\xi_i)_{i\in\mathbb{Z}}$ are defined by induction as follows: $T_0 = \inf\{n>0 : \xi_n\ne\xi_{n-1}\}$ and $T_k = \inf\{n>T_{k-1} : \xi_n\ne\xi_{n-1}\}$. Let $\tau_k = T_{k+1}-T_k$. It follows that the empirical measure at time $T_k-1$ satisfies the equality
$$(T_k-1)P_{T_k-1} = (T_0-1)\delta_{X_0} + \sum_{j=0}^{k-1}\tau_j\,\delta_{X_{T_j}}. \tag{2.1}$$
Consequently, for $n\ge T_k-1$, the maximal jump of $R_X(s,n)$ is at least
$$\Delta_k = \max_{j\in[0,k-1]}\tau_j.$$
Next, from the continuity of $G_n$, for $n\ge T_k-1$,
$$D_n := \sup_{s\in(0,1]}|R_X(s,n) - G_n(s)| \ge \Delta_k/2. \tag{2.2}$$
Now the sequence $(\Delta_k)_k$ is a nondecreasing sequence of positive integers. Notice that the random variables $(\xi_{T_k}, \tau_k)$ are independent and identically distributed. Moreover, $\xi_{T_k}$ has the distribution $\nu$ and the conditional distribution of $\tau_k$ given $(\xi_{T_k}=x)$ is the geometric distribution $\mathcal{G}(x)$. Hence,
$$\mathbb{P}(\Delta_k\le m) = \bigl(\mathbb{P}(\tau_0\le m)\bigr)^k,$$

and, since $\mathbb{P}(\tau_0>m\,|\,\xi_{T_0}=x) = (1-x)^m$, the change of variables $x = y/m$ in $\mathbb{P}(\tau_0>m) = \int_0^1(1-x)^m(1+a)x^a\,dx$ gives
$$\mathbb{P}(\tau_0>m) = (1+a)m^{-1-a}\int_0^m(1-y/m)^m y^a\,dy \sim (1+a)\Gamma(1+a)\,m^{-1-a} \quad\text{as } m\uparrow\infty. \tag{2.3}$$
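The asymptotics (2.3) are easy to check numerically from the integral representation $\mathbb{P}(\tau_0>m) = \int_0^1(1-x)^m(1+a)x^a\,dx$. The following quick verification is our own, using scipy for the quadrature (an assumption of the example, not part of the paper's argument):

```python
from math import gamma
from scipy.integrate import quad

p = 2.5
a = p - 1.0
for m in [10, 100, 1000]:
    exact, _ = quad(lambda x: (1 - x) ** m * (1 + a) * x ** a, 0, 1)
    approx = (1 + a) * gamma(1 + a) * m ** (-1 - a)
    print(m, exact / approx)  # ratio tends to 1 as m grows
```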

From the above facts, it follows that
$$\mathbb{E}(\Delta_k) = \sum_{m\ge0}\mathbb{P}(\Delta_k>m) \ge c_p k^{1/p}. \tag{2.4}$$
In the same way, one can prove that
$$\|\Delta_k\|_{(2+a)/2} \le C_p k^{1/p}. \tag{2.5}$$

Here $c_p$ and $C_p$ are positive constants depending only on $p$. Now, by the strong law of large numbers, $T_k/k$ converges to $\mathbb{E}(\tau_0)$ almost surely, and therefore in probability. Consequently, for $k = k_n = [n/(2\mathbb{E}(\tau_0))]$,
$$\lim_{n\to\infty}\mathbb{P}(n < T_{k_n}-1) = 0.$$
Now
$$2\mathbb{E}(D_n) \ge \mathbb{E}\bigl(\Delta_{k_n}\mathbf{1}_{n\ge T_{k_n}-1}\bigr) \ge \mathbb{E}(\Delta_{k_n}) - \|\Delta_{k_n}\|_{(2+a)/2}\,\bigl(\mathbb{P}(n<T_{k_n}-1)\bigr)^{a/(2+a)}.$$
From (2.5), (2.4) and the above inequality, we get that there exists some positive constant $C$ (depending on $p$) such that, for $n$ large enough, $\mathbb{E}(D_n)\ge Cn^{1/p}$, which completes the proof of (a) of Theorem 2.1.

To prove (b), we note that, by (2.3),
$$\mathbb{P}\bigl(\tau_k > (k\log k)^{1/p}\bigr) \sim c_p/(k\log k). \tag{2.6}$$
Since the regeneration times $\tau_k$ are independent, by the converse Borel-Cantelli lemma, it follows that
$$\mathbb{P}\bigl(\tau_k > (k\log k)^{1/p}\ \text{infinitely often}\bigr) = 1. \tag{2.7}$$
Hence, by (2.2),
$$\limsup_n\,(n\log n)^{-1/p} D_{T_n-1} \ge 1/2 \quad\text{almost surely}.$$

Both this inequality and the strong law of large numbers for $T_n$ then imply (b) of Theorem 2.1.

Proof of Theorem 2.2. Let $b$ be a real in $]0,1]$ such that $f(b) < \int_0^1 f(t)\,dt$ (note that such a positive $b$ exists). With the same notations as in the proof of the previous theorem, the random variables $(X_{T_k}, \tau_k)$ are independent and identically distributed, and
$$\mathbb{P}(\tau_0>m,\ X_{T_0}<b) = (1+a)m^{-1-a}\int_0^{mb^{1/a}}(1-y/m)^m y^a\,dy \sim (1+a)\Gamma(1+a)m^{-1-a} \quad\text{as } m\uparrow\infty.$$
Consequently, by the converse Borel-Cantelli lemma,
$$\mathbb{P}\bigl(\tau_k > (k\log k)^{1/p}\ \text{and}\ X_{T_k}<b\ \text{infinitely often}\bigr) = 1. \tag{2.8}$$
Since $T_n/n$ converges to $\mathbb{E}(\tau_0)$ almost surely, it follows that, for some positive constant $c$ depending on $\mathbb{E}(\tau_0)$,
$$\limsup_n \sum_{i=n+1}^{n+[c(n\log n)^{1/p}]}\bigl(f(b) - f(X_i)\bigr) \ge 0 \quad\text{almost surely}. \tag{2.9}$$
Consider now a stationary Gaussian centered sequence $(g_k)_{k\in\mathbb{Z}}$ with convergent series of covariances. It follows from both the Borel-Cantelli lemma and the usual tail inequality for Gaussian random variables that, for any positive $\theta$,
$$\liminf_n \sum_{i=n+1}^{n+[c(n\log n)^{1/p}]}(g_i+\theta) \ge 0 \quad\text{almost surely}.$$
Taking $\theta = \bigl(\int_0^1 f(t)\,dt - f(b)\bigr)/2$ in the above inequality, we then infer from the two above inequalities that
$$\limsup_{n\to\infty}\frac{1}{[c(n\log n)^{1/p}]}\sum_{i=n+1}^{n+[c(n\log n)^{1/p}]}\Bigl(g_i + \int_0^1 f(t)\,dt - f(X_i)\Bigr) \ge \theta \quad\text{almost surely},$$
which implies Theorem 2.2.

3 Upper bounds for the rate of approximation

In this section, we state the main result of this paper, which is a Kiefer type approximation theorem for absolutely regular sequences. Throughout this section, we assume that the underlying probability space $(\Omega,\mathcal{A},\mathbb{P})$ is rich enough to contain a sequence $(U_i)_{i\in\mathbb{Z}} = (\eta_i,\delta_i,\upsilon_i,\epsilon_i)_{i\in\mathbb{Z}}$ of iid random variables with uniform distribution over $[0,1]^4$, independent of $(X_i)_{i\in\mathbb{Z}}$.

Theorem 3.1 Let $(X_i)_{i\in\mathbb{Z}}$ be a strictly stationary sequence of random variables in $\mathbb{R}^d$. Let $F_j$ be the distribution function of the $j$-th marginal of $X_0$. Assume that the absolutely regular coefficients of $(X_i)_{i\in\mathbb{Z}}$ satisfy $\beta_n = O(n^{1-p})$ for some $p\in]2,3]$. Then

1. For all $(s,s')$ in $\mathbb{R}^{2d}$, the series $\Lambda_X(s,s')$ defined by (1.6) converges absolutely.

2. For any $(s,s')\in\mathbb{R}^{2d}$ and $(t,t')$ in $\mathbb{R}^+\times\mathbb{R}^+$, let $\Gamma_X(s,s',t,t') = \min(t,t')\Lambda_X(s,s')$. There exists a centered Gaussian process $K_X$ with covariance function $\Gamma_X$, whose sample paths are almost surely uniformly continuous with respect to the pseudo-metric
$$d\bigl((s,t),(s',t')\bigr) = |t-t'| + \sum_{j=1}^d |F_j(s_j) - F_j(s'_j)|, \tag{3.1}$$
and such that
$$(a)\qquad \mathbb{E}\Bigl(\sup_{s\in\mathbb{R}^d,\,k\le n}|R_X(s,k) - K_X(s,k)|\Bigr) = O\bigl(n^{1/p}(\log n)^{\lambda(d)}\bigr).$$
Furthermore, one can construct another centered Gaussian process $K_X$ with the above covariance function in such a way that
$$(b)\qquad \sup_{s\in\mathbb{R}^d,\,k\le n}|R_X(s,k) - K_X(s,k)| = O\bigl(n^{1/p}(\log n)^{\lambda(d)+\varepsilon+1/p}\bigr) \quad\text{almost surely, for any }\varepsilon>0.$$
In both items,
$$\lambda(d) = \Bigl(\frac{3d}{2}+2-\frac{2+d}{2p}\Bigr)\mathbf{1}_{p\in]2,3[} + \Bigl(2+\frac{4d}{3}\Bigr)\mathbf{1}_{p=3}.$$
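For concreteness, the exponent $\lambda(d)$ is easy to tabulate. The helper below is our direct transcription of the formula (the function name is ours, not from the paper):

```python
def lam(d, p):
    """lambda(d) from Theorem 3.1, for p in ]2, 3]."""
    if p == 3:
        return 2 + 4 * d / 3
    return 3 * d / 2 + 2 - (2 + d) / (2 * p)

print(lam(1, 2.5), lam(2, 3))  # d = 1, p = 2.5 gives 2.9; d = 2, p = 3 gives 14/3
```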

From the above theorem, in the independent setting, the error in the $\mathbb{L}^1$ approximation is bounded by $n^{1/3}(\log n)^{2+4d/3}$, whereas the almost sure error is bounded by $n^{1/3}(\log n)^{\varepsilon+(9+4d)/3}$, for any $\varepsilon>0$. However, in that case, the powers of $\log n$ can be improved as follows.

Theorem 3.2 Let $(X_i)_{i\in\mathbb{Z}}$ be a sequence of independent and identically distributed random variables in $\mathbb{R}^d$. Then one can construct a centered Gaussian process $K_X$ with covariance function
$$\Gamma_X(s,s',t,t') = \min(t,t')\bigl(F(s\wedge s') - F(s)F(s')\bigr), \quad\text{where } s\wedge s' = (\min(s_1,s'_1),\dots,\min(s_d,s'_d)),$$
whose sample paths are almost surely uniformly continuous with respect to the pseudo-metric $d$ defined in (3.1), and such that
$$(a)\qquad \mathbb{E}\Bigl(\sup_{s\in\mathbb{R}^d,\,k\le n}|R_X(s,k) - K_X(s,k)|\Bigr) = O\bigl(n^{1/3}(\log n)^{(2d+3)/3}\bigr).$$
Furthermore, one can construct another centered Gaussian process $K_X$ with the above covariance function in such a way that
$$(b)\qquad \sup_{s\in\mathbb{R}^d,\,k\le n}|R_X(s,k) - K_X(s,k)| = O\bigl(n^{1/3}(\log n)^{\varepsilon+(2d+4)/3}\bigr) \quad\text{almost surely, for any }\varepsilon>0.$$

Recently, Merlevède and Rio (2012) obtained efficient strong approximation results for partial sums of real-valued random variables. In the bounded case, under the mixing condition $\beta_n = O(n^{1-p})$, they obtained in their Theorem 2.1 (see Item 1(b)) the rate of almost sure approximation $O(n^{1/p}(\log n)^{(1/2)+\varepsilon})$. According to the results of Section 2, the power of $n$ cannot be improved, contrary to the previous papers on the same subject. Starting from Theorem 3.1, we can derive an extension of this result to partial sums of random vectors in $\mathbb{R}^d$, in the same way as Borisov (1983) derives strong approximation of partial sums from the strong Kiefer approximation of Komlós, Major and Tusnády.

Corollary 3.1 Let $(X_i)_{i\in\mathbb{Z}}$ be a strictly stationary and absolutely regular sequence of bounded random vectors in $\mathbb{R}^d$. Assume that its absolutely regular coefficients satisfy $\beta_n = O(n^{1-p})$ for some $p\in]2,3]$. Then the series of covariance matrices $\sum_{k\in\mathbb{Z}}\mathrm{Cov}(X_0, X_k)$ converges to a non-negative definite symmetric matrix $\Gamma$. Furthermore, there exists a sequence $(Z_i)_{i\ge1}$ of iid random vectors with law $N(0,\Gamma)$ such that, setting $\Delta_k = \sum_{i=1}^k(X_i - \mathbb{E}(X_i) - Z_i)$,
$$(a)\qquad \mathbb{E}\Bigl(\sup_{k\le n}\|\Delta_k\|\Bigr) = O\bigl(n^{1/p}(\log n)^{\lambda(d)}\bigr).$$
In addition, there exists another sequence $(Z_i)_{i\ge1}$ of iid random vectors with law $N(0,\Gamma)$ such that, for any positive $\varepsilon$,
$$(b)\qquad \sup_{k\le n}\|\Delta_k\| = o\bigl(n^{1/p}(\log n)^{\lambda(d)+\varepsilon+1/p}\bigr) \quad\text{almost surely}.$$
In both items, $\lambda(d)$ is defined in Theorem 3.1.

Proof of Corollary 3.1. Adding some constant vector to the initial random vectors if necessary, we may assume that the components of the random vectors $X_i$ are non-positive. For each integer $i$, we set $X_i = (X_i^{(1)}, X_i^{(2)}, \dots, X_i^{(d)})$. From our assumptions, there exists some positive constant $M$ such that, for any integer $i$, $X_i$ belongs to $[-M,0]^d$. Then, for any $j$ in $[1,d]$,
$$X_i^{(j)} = -\int_{-M}^0 \mathbf{1}_{X_i^{(j)}\le t}\,dt.$$
Let then $K_X^{(j)}(t,k) = K_X((1,\dots,1,t,1,\dots,1),k)$, where $t$ is the $j$-th component. Define the random vectors $Z_k$ for any positive integer $k$ by
$$Z_k^{(j)} = -\int_{-M}^0\bigl(K_X^{(j)}(t,k) - K_X^{(j)}(t,k-1)\bigr)\,dt \quad\text{for any } j\in[1,d].$$
Then the so defined sequence $(Z_k)_{k>0}$ is a Gaussian sequence (this means that, for any positive integer $n$, $(Z_1,\dots,Z_n)$ is a Gaussian vector) and, from the definition of the covariance of $K_X$, the random vectors $Z_k$ are uncorrelated. It follows that $(Z_k)_{k>0}$ is a sequence of independent Gaussian random vectors. Now, from the definition of $Z_k$,
$$\mathrm{Cov}\bigl(Z_k^{(i)}, Z_k^{(j)}\bigr) = \sum_{m\in\mathbb{Z}}\int_{-M}^0\int_{-M}^0 \mathrm{Cov}\bigl(\mathbf{1}_{X_0^{(i)}\le t}, \mathbf{1}_{X_m^{(j)}\le s}\bigr)\,dt\,ds.$$
Hence, interchanging the summation and the integral in the above formula, we get that
$$\mathrm{Cov}\bigl(Z_k^{(i)}, Z_k^{(j)}\bigr) = \sum_{m\in\mathbb{Z}}\mathrm{Cov}\bigl(X_0^{(i)}, X_m^{(j)}\bigr) = \Gamma_{ij},$$
which implies that $Z_k$ has the prescribed covariance. Next
$$X_k^{(j)} - \mathbb{E}(X_k^{(j)}) - Z_k^{(j)} = \int_{-M}^0\bigl(K_X^{(j)}(t,k) - K_X^{(j)}(t,k-1) + \mathbb{P}(X_k^{(j)}\le t) - \mathbf{1}_{X_k^{(j)}\le t}\bigr)\,dt.$$
Let then $\Delta_k^{(j)}$ denote the $j$-th component of $\Delta_k$ and $R_X^{(j)}(t,k) = R_X((1,\dots,1,t,1,\dots,1),k)$, where $t$ is the $j$-th component. From the above identity,
$$\Delta_k^{(j)} = \int_{-M}^0\bigl(K_X^{(j)}(t,k) - R_X^{(j)}(t,k)\bigr)\,dt.$$
It follows that, for any integer $j$ in $[1,d]$,
$$\sup_{k\le n}\bigl|\Delta_k^{(j)}\bigr| \le M\sup_{s\in\mathbb{R}^d,\,k\le n}|R_X(s,k) - K_X(s,k)|.$$
Part (a) (resp. Part (b)) of Corollary 3.1 then follows from these inequalities and Part (a) (resp. Part (b)) of Theorem 3.1.
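The elementary identity behind the proof, $X = -\int_{-M}^0\mathbf{1}_{X\le t}\,dt$ for $X\in[-M,0]$, can be sanity-checked by discretizing the integral. This is our own illustration (the numerical values are arbitrary):

```python
import numpy as np

M, X = 3.0, -1.7
t = np.linspace(-M, 0.0, 300_001)
integral = np.trapz((X <= t).astype(float), t)  # integral of 1{X <= t} over [-M, 0]
print(-integral)  # approximately -1.7 = X
```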

4 Proof of Theorem 3.1

In this section we shall sometimes use the notation $a_n \ll b_n$ to mean that there exists a numerical constant $C$ not depending on $n$ such that $a_n \le Cb_n$ for all positive integers $n$. We shall also use the notations $\mathcal{F}_k = \sigma(X_j, j\le k)$ and $\mathcal{F}_\infty = \bigvee_{k\in\mathbb{Z}}\mathcal{F}_k$.

For any $(s,s')\in\mathbb{R}^{2d}$, by using Lemma 7.4 with $U = \mathbf{1}_{X_0\le s}$, $V = \mathbf{1}_{X_k\le s'}$, $r=1$ and $s=\infty$, we get that
$$\bigl|\mathrm{Cov}\bigl(\mathbf{1}_{X_0\le s}, \mathbf{1}_{X_k\le s'}\bigr)\bigr| \le 2\beta_k.$$
Since $\sum_{k\ge0}\beta_k < \infty$, Item 1 of Theorem 3.1 follows.

To prove Item 2, we first transform the random variables $X_i$. With this aim, for any $k$ in $\mathbb{Z}$ and any $j$ in $\{1,\dots,d\}$, we denote by $X_k^{(j)}$ the $j$-th marginal of $X_k$. By Lemma 7.4 applied with $r=1$ and $s=\infty$, there exists some non-negative random variable $b(X_0,k)$ with values in $[0,1]$ such that, for any functions $f : \mathbb{R}^d\to\mathbb{R}$ and $g : \mathbb{R}^d\to[-1,1]$,
$$\mathrm{Cov}\bigl(f(X_0), g(X_k)\bigr) \le 2\,\mathbb{E}\bigl(b(X_0,k)|f(X_0)|\bigr) \quad\text{and}\quad \mathbb{E}\bigl(b(X_0,k)\bigr) \le \beta_k. \tag{4.1}$$
Let then
$$b_j(X_0,k) = \mathbb{E}\bigl(b(X_0,k)\,\big|\,X_0^{(j)}\bigr).$$
We now introduce another probability on $\Omega$. Let $\mathbb{P}^*_{0,j}$ be the probability on $\Omega$ whose density with respect to $\mathbb{P}$ is
$$C_j^{-1}\Bigl(1+4\sum_{k=1}^\infty b_j(X_0,k)\Bigr) \quad\text{with}\quad C_j = 1+4\sum_{k=1}^\infty\mathbb{E}\bigl(b_j(X_0,k)\bigr). \tag{4.2}$$
Let $P_j$ be the law of $X_0^{(j)}$. Notice then that the image measure $P_j^*$ of $\mathbb{P}^*_{0,j}$ by $X_0^{(j)}$ is absolutely continuous with respect to $P_j$ with density
$$C_j^{-1}\Bigl(1+4\sum_{k=1}^\infty b_j(x,k)\Bigr). \tag{4.3}$$
Let $F_{P_j^*}$ be the distribution function of $P_j^*$, and let $F_{P_j^*}(x-0) = \sup_{z<x} F_{P_j^*}(z)$.

Note that, if $(u_1,\dots,u_d) = (F_{P_1^*}(s_1),\dots,F_{P_d^*}(s_d))$ and $(v_1,\dots,v_d) = (F_{P_1^*}(s'_1),\dots,F_{P_d^*}(s'_d))$, then the following equalities hold in distribution: $G_Y(u,n) = G_X(s,n)$ and $G_Y(v,n) = G_X(s',n)$. Hence
$$\mathrm{Var}\bigl(G_Y(u,n) - G_Y(v,n)\bigr) = \mathrm{Var}\bigl(G_X(s,n) - G_X(s',n)\bigr).$$
Now, by definition of the covariance of $G_X$,
$$n^{-1}\mathrm{Var}\bigl(G_X(s,n) - G_X(s',n)\bigr) = \lim_{N\to\infty} N^{-1}\mathrm{Var}\bigl(R_X(s,N) - R_X(s',N)\bigr).$$
Hence, by (4.1) and Corollary 1.4 in Rio (2000),
$$n^{-1}\mathrm{Var}\bigl(G_X(s,n) - G_X(s',n)\bigr) \le \mathbb{E}\Bigl(\bigl|\mathbf{1}_{X_0\le s'} - \mathbf{1}_{X_0\le s}\bigr|\Bigl(1+4\sum_{k>0}b(X_0,k)\Bigr)\Bigr).$$
Now
$$\bigl|\mathbf{1}_{X_0\le s'} - \mathbf{1}_{X_0\le s}\bigr| \le \sum_{j=1}^d\bigl|\mathbf{1}_{X_0^{(j)}\le s'_j} - \mathbf{1}_{X_0^{(j)}\le s_j}\bigr|.$$
Taking into account the definition of $\mathbb{P}^*_{0,j}$, it follows that
$$\mathrm{Var}\bigl(G_X(s,n) - G_X(s',n)\bigr) \le n\sum_{j=1}^d C_j\,\mathbb{P}^*_{0,j}\bigl(\bigl|\mathbf{1}_{X_0^{(j)}\le s'_j} - \mathbf{1}_{X_0^{(j)}\le s_j}\bigr|\bigr).$$
Now, by definition of $P_j^*$,
$$\mathbb{P}^*_{0,j}\bigl(\bigl|\mathbf{1}_{X_0^{(j)}\le s'_j} - \mathbf{1}_{X_0^{(j)}\le s_j}\bigr|\bigr) = P_j^*\bigl(]\min(s_j,s'_j),\max(s_j,s'_j)]\bigr) = |u_j - v_j|,$$
whence
$$\mathrm{Var}\bigl(G_X(s,n) - G_X(s',n)\bigr) \le n\sum_{j=1}^d C_j|u_j-v_j| \le nC(\beta)\sum_{j=1}^d|u_j-v_j|,$$
which completes the proof of (4.7).

We shall prove in what follows that the conclusion of Theorem 3.1 holds for the stationary sequence $(Y_i)_{i\in\mathbb{Z}}$ and the associated continuous Gaussian process $K_Y$ with covariance function $\Gamma_Y$ defined by (4.6). This will imply Theorem 3.1 by taking, for $s = (s_1,\dots,s_d)$,
$$K_X(s,t) = K_Y\bigl((F_{P_1^*}(s_1),\dots,F_{P_d^*}(s_d)), t\bigr),$$
since for any $(s,s') = \bigl((s_1,\dots,s_d),(s'_1,\dots,s'_d)\bigr)\in\mathbb{R}^{2d}$,
$$\Gamma_X(s,s',t,t') = \Gamma_Y\bigl((F_{P_1^*}(s_1),\dots,F_{P_d^*}(s_d)), (F_{P_1^*}(s'_1),\dots,F_{P_d^*}(s'_d)), t, t'\bigr).$$
We start by a reduction to a grid and a discretization.

4.1 Reduction to a grid

In this section, we consider a strictly stationary sequence of random variables $Z_i$ in $\mathbb{R}^d$ with marginal distributions with support included in $[0,1]$ and bounded densities. Our aim is to compare the maximal deviation over the unit cube with the maximal deviation over a grid. Let $A_n$ denote the set of $x$ in $[0,1]^d$ such that $nx$ is a multivariate integer. The main result of the section is that, if the marginal densities are each bounded by $M$, then, for any integer $k\le n$,
$$\sup_{s\in[0,1]^d}|R_Z(s,k) - G_Z(s,k)| \le \sup_{s\in A_n}|R_Z(s,k) - G_Z(s,k)| + dM + \sup_{\|s-s'\|_\infty\le1/n}|G_Z(s,k) - G_Z(s',k)|, \tag{4.8}$$
where we recall that $G_Z$ is a two-parameter Gaussian process with covariance function $\Gamma_Z$ defined by (4.6).

We now prove the above inequality. For each $s = (s_1,\dots,s_d)$, we set $\pi_-(s) = n^{-1}([ns_1],\dots,[ns_d])$ and $\pi_+(s) = \pi_-(s) + n^{-1}(1,\dots,1)$. From the monotonicity of the multivariate distribution function $F$ and the empirical distribution function $F_k$,
$$kF_k(\pi_-(s)) - kF(\pi_+(s)) \le R_Z(s,k) \le kF_k(\pi_+(s)) - kF(\pi_-(s)).$$
Next let $F_i$ denote the distribution function of the $i$-th coordinate of $Z_0$. From our assumption,
$$F_i(t) - F_i(s) \le M|t-s|.$$
Now, for any $s = (s_1,\dots,s_d)$ and $t = (t_1,\dots,t_d)$ with $s\le t$,
$$0 \le F(t) - F(s) \le \sum_{i=1}^d\bigl(F_i(t_i) - F_i(s_i)\bigr),$$
which, together with the above inequality, ensures that $0 \le kF(\pi_+(s)) - kF(\pi_-(s)) \le k(Md/n) \le Md$ since $k\le n$. Hence
$$R_Z(\pi_-(s),k) - dM \le R_Z(s,k) \le R_Z(\pi_+(s),k) + dM.$$
Let then
$$D_Z(k,n) = \sup_{\|s-s'\|_\infty\le1/n}|G_Z(s,k) - G_Z(s',k)|.$$
Clearly
$$-G_Z(\pi_-(s),k) - D_Z(k,n) \le -G_Z(s,k) \le -G_Z(\pi_+(s),k) + D_Z(k,n).$$
Let $\Delta_Z = R_Z - G_Z$. Adding the two above inequalities, we now get that
$$\Delta_Z(\pi_-(s),k) - dM - D_Z(k,n) \le \Delta_Z(s,k) \le \Delta_Z(\pi_+(s),k) + dM + D_Z(k,n),$$
which implies (4.8) immediately.
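The grid projections $\pi_-$ and $\pi_+$ used above are simple floor maps onto $A_n$; a minimal sketch (our own illustration, with hypothetical helper names):

```python
import numpy as np

def pi_minus(s, n):
    """Largest grid point of A_n below s, componentwise."""
    return np.floor(n * np.asarray(s)) / n

def pi_plus(s, n):
    """pi_-(s) shifted by 1/n in every coordinate."""
    return pi_minus(s, n) + 1.0 / n

s = np.array([0.37, 0.82])
print(pi_minus(s, 10), pi_plus(s, 10))  # [0.3 0.8] and [0.4 0.9]
```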

4.2 Discretization

We now apply the inequality (4.8) to our problem. Let $N\in\mathbb{N}^*$ and let $k\in]1,2^{N+1}]$. We first notice that, for any construction of a Kiefer process $G_Y$ with covariance function $\Gamma_Y$ defined by (4.6),
$$\sup_{1\le k\le 2^{N+1}}\sup_{s\in[0,1]^d}\bigl|R_Y(s,k) - G_Y(s,k)\bigr| \le \sup_{s\in[0,1]^d}\bigl|R_Y(s,1) - G_Y(s,1)\bigr| + \sum_{L=0}^N D_L(G_Y), \tag{4.9}$$
where
$$D_L(G_Y) := \sup_{2^L<\ell\le 2^{L+1}}\sup_{s\in[0,1]^d}\bigl|(R_Y(s,\ell) - R_Y(s,2^L)) - (G_Y(s,\ell) - G_Y(s,2^L))\bigr|.$$

Suppose now that $j \ge [(2^L-i-1)/2]+1$. Starting from (6.69) and using Lemma 7.5, we get that
$$\bigl|\mathbb{E}\bigl(D^2 g*\varphi_{a_L}(\vec S_{i-1}).(\vec N^*_{i+j,L}\otimes\vec N^*_{i,L})\bigr)\bigr| \ll a_L^{-1}(L+1)^{2(d+1)}\beta_j \ll a_L^{-1}(L+1)^{2(d+1)}\beta_{[(2^L-i-1)/2]}. \tag{6.70}$$
Starting from (6.61), summing the inequalities (6.63), (6.65), (6.66), (6.67), (6.68) and (6.70) in $j$, adding them, and taking into account that $\beta_k = O(k^{1-p})$ with $p\in]2,3]$, we then infer that
$$\Bigl|\mathbb{E}\bigl(Dg*\varphi_{a_L}(\vec S_{i-1}+\vec T_{i+1}).\vec N^*_{i,L}\bigr) - \sum_{j=1}^{u_L-1}\mathbb{E}\bigl(D^2 g*\varphi_{a_L}(\vec S_{i-1}+\vec T_{i+2j})\bigr).\mathbb{E}\bigl(\vec V^*_{i+j,L}\otimes\vec V^*_{i,L}\bigr)\mathbf{1}_{j\le[(2^L-i+1)/2]}\Bigr| \ll a_L^{-1}(L+1)^{2(d+1)}\sum_{j=(2^L-i)\wedge u_L}^{2^L-i} j^{1-p} + a_L^{-3}(L+1)^{4(d+1)}2^{L(3-p)} + a_L^{-3}(L+1)^{4d+5}\mathbf{1}_{p=3} + a_L^{-5}(L+1)^{6(d+1)}u_L^{6-2p} + a_L^{-5}(L+1)^{6(d+1)}(\log u_L)^2\mathbf{1}_{p=3} + a_L^{-3}(L+1)^{4(d+1)}u_L^{3-p} + a_L^{-3}(L+1)^{4(d+1)}\log(u_L)\mathbf{1}_{p=3} + a_L^{-1}(L+1)^{2(d+1)}u_L\,\beta_{[(2^L-i-1)/2]}.$$
Next, summing on $i$ and taking into account the choice of $u_L$ and that $a_L\ge(L+1)^{d+1}$, we get that
$$\sum_{i=1}^{2^L}\Bigl|\mathbb{E}\bigl(Dg*\varphi_{a_L}(\vec S_{i-1}+\vec T_{i+1}).\vec N^*_{i,L}\bigr) - \sum_{j=1}^{u_L-1}\mathbb{E}\bigl(D^2 g*\varphi_{a_L}(\vec S_{i-1}+\vec T_{i+2j})\bigr).\mathbb{E}\bigl(\vec V^*_{i+j,L}\otimes\vec V^*_{i,L}\bigr)\mathbf{1}_{j\le[(2^L-i+1)/2]}\Bigr| \ll a_L^{1-p}(L+1)^{p(d+1)}2^L + a_L^{-3}(L+1)^{4(d+1)}2^L\bigl(2^{L(3-p)}+L\mathbf{1}_{p=3}\bigr). \tag{6.71}$$
Hence, starting from (6.39) and considering the upper bounds (6.44) and (6.50) together with the fact that $a_L\ge(L+1)^{d+1}$, we get that
$$\Bigl|\sum_{i=1}^{2^L}\mathbb{E}\bigl(\Delta_{2,i,L}(g)\bigr) - \sum_{i=1}^{2^L}\sum_{j=1}^{u_L-1}\mathbb{E}\bigl(D^2 g*\varphi_{a_L}(\vec S_{i-1}+\vec T_{i+2j})\bigr).\mathbb{E}\bigl(\vec V^*_{i+j,L}\otimes\vec V^*_{i,L}\bigr)\mathbf{1}_{j\le[(2^L-i+1)/2]} - \frac12\sum_{i=1}^{2^L}\mathbb{E}\bigl(D^2 g*\varphi_{a_L}(\vec S_{i-1}+\vec T_{i+1}).\mathbb{E}(\vec V^{*\otimes2}_{i,L})\bigr)\Bigr| \ll a_L^{1-p}(L+1)^{p(d+1)}2^L + a_L^{-3}(L+1)^{4(d+1)}2^L\bigl(2^{L(3-p)}+L\mathbf{1}_{p=3}\bigr). \tag{6.72}$$

Gathering (6.38) and (6.72), it follows that
$$\Bigl|\sum_{i=1}^{2^L}\mathbb{E}\bigl(\Delta_{1,i,L}(g)\bigr) - \sum_{i=1}^{2^L}\mathbb{E}\bigl(\Delta_{2,i,L}(g)\bigr) - \sum_{i=1}^{2^L} R_{i,L}\Bigr| \ll (L+1)^{d+1} + a_L^{1-p}(L+1)^{p(d+1)}2^L + a_L^{-2}(L+1)^{3d+4}2^L\mathbf{1}_{p=3} + a_L^{-3}(L+1)^{4(d+1)}2^L\bigl(2^{L(3-p)}+L\mathbf{1}_{p=3}\bigr), \tag{6.73}$$
where
$$R_{i,L} := \sum_{j=1}^{u_L-1}\mathbb{E}\bigl(D^2 g*\varphi_{a_L}(\vec S_{i-2j}+\vec T_{i+1})\bigr).\mathbb{E}\bigl(\vec V^*_{i-j,L}\otimes\vec V^*_{i,L}\bigr)\mathbf{1}_{j\le[i/2]} - \sum_{j=1}^{u_L-1}\mathbb{E}\bigl(D^2 g*\varphi_{a_L}(\vec S_{i-1}+\vec T_{i+2j})\bigr).\mathbb{E}\bigl(\vec V^*_{i+j,L}\otimes\vec V^*_{i,L}\bigr)\mathbf{1}_{j\le[(2^L-i+1)/2]}. \tag{6.74}$$

We now get an upper bound on $\sum_{i=1}^{2^L} R_{i,L}$. We first write that
$$\mathbb{E}\bigl(D^2 g*\varphi_{a_L}(\vec S_{i-1}+\vec T_{i+1}) - D^2 g*\varphi_{a_L}(\vec S_{i-2j}+\vec T_{i+1})\bigr).\mathbb{E}\bigl(\vec V^*_{i-j,L}\otimes\vec V^*_{i,L}\bigr) = \sum_{m=i-2j+1}^{i-1}\int_0^1\mathbb{E}\Bigl(D^3 g*\varphi_{a_L}\bigl(\vec S_{i-2j}+\vec T_{i+1}+t(\vec S_{i-1}-\vec S_{i-2j})\bigr).\vec V^*_{m,L}\otimes\mathbb{E}\bigl(\vec V^*_{i-j,L}\otimes\vec V^*_{i,L}\bigr)\Bigr)\,dt.$$
Next,
$$\mathbb{E}\Bigl(D^3 g*\varphi_{a_L}\bigl(\vec S_{i-2j}+\vec T_{i+1}+t(\vec S_{i-1}-\vec S_{i-2j})\bigr).\vec V^*_{m,L}\otimes\mathbb{E}\bigl(\vec V^*_{i-j,L}\otimes\vec V^*_{i,L}\bigr)\Bigr) = \sum_{K,k_K}\sum_{P,p_P}\sum_{Q,q_Q}\mathbb{E}\Bigl(\frac{\partial^3 g*\varphi_{a_L}}{\partial x^{(K,k_K)}\partial x^{(P,p_P)}\partial x^{(Q,q_Q)}}\bigl(\vec S_{i-2j}+\vec T_{i+1}+t(\vec S_{i-1}-\vec S_{i-2j})\bigr)\times\widetilde V^{(K,k_K)}_{m,L}\Bigr)\times\mathbb{E}\bigl(\widetilde V^{(P,p_P)}_{i-j,L}\widetilde V^{(Q,q_Q)}_{i,L}\bigr).$$
Using Lemma 7.4 with $U = \widetilde V^{(P,p_P)}_{i-j,L}$,
$$V = \sum_{K,k_K}\sum_{Q,q_Q}\widetilde V^{(Q,q_Q)}_{i,L}\,\mathbb{E}\Bigl(\frac{\partial^3 g*\varphi_{a_L}}{\partial x^{(K,k_K)}\partial x^{(P,p_P)}\partial x^{(Q,q_Q)}}\bigl(\vec S_{i-2j}+\vec T_{i+1}+t(\vec S_{i-1}-\vec S_{i-2j})\bigr)\times\widetilde V^{(K,k_K)}_{m,L}\Bigr),$$
$\mathcal{U} = \sigma(Y_\ell, \ell\le i+2^L-j)$, $\mathcal{V} = \sigma(Y_{i+2^L})$, $r=1$ and $s=\infty$, we get that there exists a $\mathcal{U}$-measurable random variable $b_{\mathcal{U}}(i+2^L)$ such that
$$\Bigl|\mathbb{E}\Bigl(D^3 g*\varphi_{a_L}\bigl(\vec S_{i-2j}+\vec T_{i+1}+t(\vec S_{i-1}-\vec S_{i-2j})\bigr).\vec V^*_{m,L}\otimes\mathbb{E}\bigl(\vec V^*_{i-j,L}\otimes\vec V^*_{i,L}\bigr)\Bigr)\Bigr| \le 2\sum_{P\in I_L^{d+1}}\sum_{p_P\in E^{(d+1)}_{L,P}}\mathbb{E}\bigl(|\widetilde V^{(P,p_P)}_{i-j,L}|\,b_{\mathcal{U}}(i+2^L)\bigr)\,\|V\|_\infty.$$

Using Lemma 7.3 and the fact that $\sum_{K,k_K}\sum_{Q,q_Q}\bigl|\widetilde V^{(Q,q_Q)}_{i,L}\widetilde V^{(K,k_K)}_{m,L}\bigr| \le 4(L+1)^{2(d+1)}$, we derive that
$$\|V\|_\infty \ll a_L^{-2}(L+1)^{2(d+1)}.$$
On the other hand, $\sum_{P\in I_L^{d+1}}\sum_{p_P\in E^{(d+1)}_{L,P}}\bigl|\widetilde V^{(P,p_P)}_{i-j,L}\bigr| \le 2(L+1)^{d+1}$ and $\mathbb{E}\bigl(b_{\mathcal{U}}(i+2^L)\bigr) \le \beta_j$. Therefore
$$\Bigl|\mathbb{E}\Bigl(D^3 g*\varphi_{a_L}\bigl(\vec S_{i-2j}+\vec T_{i+1}+t(\vec S_{i-1}-\vec S_{i-2j})\bigr).\vec V^*_{m,L}\otimes\mathbb{E}\bigl(\vec V^*_{i-j,L}\otimes\vec V^*_{i,L}\bigr)\Bigr)\Bigr| \ll a_L^{-2}(L+1)^{3(d+1)}\beta_j.$$
So overall,
$$\sum_{j=1}^{u_L-1}\Bigl|\mathbb{E}\bigl(D^2 g*\varphi_{a_L}(\vec S_{i-1}+\vec T_{i+1}) - D^2 g*\varphi_{a_L}(\vec S_{i-2j}+\vec T_{i+1})\bigr).\mathbb{E}\bigl(\vec V^*_{i-j,L}\otimes\vec V^*_{i,L}\bigr)\Bigr| \ll a_L^{-2}(L+1)^{3(d+1)}\sum_{j=1}^{u_L} j\beta_j. \tag{6.75}$$

On the other hand, setting $A_{i,j} = \vec S_{i+j-1}+\vec T_{i+j+1}-\vec S_{i-1}-\vec T_{i+2j}$, we write that
$$\mathbb{E}\bigl(D^2 g*\varphi_{a_L}(\vec S_{i+j-1}+\vec T_{i+j+1}) - D^2 g*\varphi_{a_L}(\vec S_{i-1}+\vec T_{i+2j})\bigr).\mathbb{E}\bigl(\vec V^*_{i+j,L}\otimes\vec V^*_{i,L}\bigr) = \sum_{m=i}^{i+j-1}\int_0^1\mathbb{E}\Bigl(D^3 g*\varphi_{a_L}\bigl(\vec S_{i-1}+\vec T_{i+2j}+tA_{i,j}\bigr).\bigl(\vec V^*_{m,L}-\vec N^*_{m+1,L}\bigr)\otimes\mathbb{E}\bigl(\vec V^*_{i+j,L}\otimes\vec V^*_{i,L}\bigr)\Bigr)\,dt.$$
By the previous computations,
$$\sum_{m=i}^{i+j-1}\Bigl|\mathbb{E}\Bigl(D^3 g*\varphi_{a_L}\bigl(\vec S_{i-1}+\vec T_{i+2j}+tA_{i,j}\bigr).\vec V^*_{m,L}\otimes\mathbb{E}\bigl(\vec V^*_{i+j,L}\otimes\vec V^*_{i,L}\bigr)\Bigr)\Bigr| \ll a_L^{-2}(L+1)^{3(d+1)}\,j\beta_j.$$
On the other hand, using (6.45) together with (6.15), we derive that
$$\mathbb{E}\Bigl(D^3 g*\varphi_{a_L}\bigl(\vec S_{i-1}+\vec T_{i+2j}+tA_{i,j}\bigr).\vec N^*_{m+1,L}\otimes\mathbb{E}\bigl(\vec V^*_{i+j,L}\otimes\vec V^*_{i,L}\bigr)\Bigr) = \sum_{\ell=m+1}^{i+j}\int_0^1(1-t)\,\mathbb{E}\Bigl(D^4 g*\varphi_{a_L}*\varphi_{a_L}\bigl(\vec S_{i-1}+\vec T_{i+2j}+tA_{i,j}\bigr).\mathbb{E}\bigl(\vec V^*_{m+1,L}\otimes\vec V^*_{\ell,L}\bigr)\otimes\mathbb{E}\bigl(\vec V^*_{i+j,L}\otimes\vec V^*_{i,L}\bigr)\Bigr)\,dt.$$
Hence, by Lemma 7.5, it follows that
$$\sum_{m=i}^{i+j-1}\Bigl|\mathbb{E}\Bigl(D^3 g*\varphi_{a_L}\bigl(\vec S_{i-1}+\vec T_{i+2j}+tA_{i,j}\bigr).\vec N^*_{m+1,L}\otimes\mathbb{E}\bigl(\vec V^*_{i+j,L}\otimes\vec V^*_{i,L}\bigr)\Bigr)\Bigr| \ll a_L^{-3}(L+1)^{4(d+1)}\,j\beta_j\sum_{\ell\ge0}\beta_\ell.$$
So overall, since $\sum_{\ell\ge0}\beta_\ell<\infty$ and $a_L\ge(L+1)^{d+1}$,
$$\sum_{j=1}^{u_L-1}\Bigl|\mathbb{E}\bigl(D^2 g*\varphi_{a_L}(\vec S_{i+j-1}+\vec T_{i+j+1}) - D^2 g*\varphi_{a_L}(\vec S_{i-1}+\vec T_{i+2j})\bigr).\mathbb{E}\bigl(\vec V^*_{i+j,L}\otimes\vec V^*_{i,L}\bigr)\Bigr| \ll a_L^{-2}(L+1)^{3(d+1)}\sum_{j=1}^{u_L} j\beta_j. \tag{6.76}$$

Starting from (6.74) and considering the upper bounds (6.75) and (6.76), together with the assumption that $\beta_k = O(k^{1-p})$ for $p\in]2,3]$ and the choice of $u_L$, we derive that
$$\sum_{i=1}^{2^L}\bigl|R_{i,L} - \widetilde R_{i,L}\bigr| \ll a_L^{1-p}(L+1)^{p(d+1)}2^L + a_L^{-2}(L+1)^{3d+4}2^L\mathbf{1}_{p=3}, \tag{6.77}$$
where
$$\widetilde R_{i,L} := \sum_{j=1}^{u_L-1}\mathbb{E}\bigl(D^2 g*\varphi_{a_L}(\vec S_{i-1}+\vec T_{i+1})\bigr).\mathbb{E}\bigl(\vec V^*_{i-j,L}\otimes\vec V^*_{i,L}\bigr)\mathbf{1}_{j\le[i/2]} - \sum_{j=1}^{u_L-1}\mathbb{E}\bigl(D^2 g*\varphi_{a_L}(\vec S_{i+j-1}+\vec T_{i+j+1})\bigr).\mathbb{E}\bigl(\vec V^*_{i+j,L}\otimes\vec V^*_{i,L}\bigr)\mathbf{1}_{j\le[(2^L-i+1)/2]}. \tag{6.78}$$
Observe now that
$$\sum_{i=1}^{2^L}\widetilde R_{i,L} = \sum_{j=1}^{u_L-1}\sum_{i=2^L-2j+2}^{2^L}\mathbb{E}\bigl(D^2 g*\varphi_{a_L}(\vec S_{i-1}+\vec T_{i+1})\bigr).\mathbb{E}\bigl(\vec V^*_{i-j,L}\otimes\vec V^*_{i,L}\bigr) - \sum_{j=1}^{u_L-1}\sum_{i=j+1}^{2j-1}\mathbb{E}\bigl(D^2 g*\varphi_{a_L}(\vec S_{i-1}+\vec T_{i+1})\bigr).\mathbb{E}\bigl(\vec V^*_{i-j,L}\otimes\vec V^*_{i,L}\bigr).$$
Hence, by Lemma 7.5, the assumption that $\beta_k = O(k^{1-p})$ for $p\in]2,3]$ and the choice of $u_L$, it follows that
$$\Bigl|\sum_{i=1}^{2^L}\widetilde R_{i,L}\Bigr| \ll a_L^{-1}(L+1)^{2(d+1)}\sum_{j=1}^{u_L} j\beta_j \ll a_L^{2-p}(L+1)^{(5-p)(d+1)} + a_L^{-1}(L+1)^{2d+3}\mathbf{1}_{p=3}. \tag{6.79}$$
Starting from (6.18) and considering the upper bounds (6.73), (6.77) and (6.79), the inequality (4.46) follows for $\sup_{g\in\mathrm{Lip}(c^*_{(d+1)L},F_{2^L})}\bigl|\mathbb{E}\,g(\vec S^*_{L,d}+\vec W_{a_L}) - \mathbb{E}\,g(\vec N^*_{L,d}+\vec W_{a_L})\bigr|$. This ends the proof of the lemma.

6.2 Proof of Lemma 5.1

Recall that
$$\mathbb{E}\,c_{(d+1)L}\bigl(\vec S_{L,d}+\vec G_{a_L},\,\vec T_{L,d}+\vec G'_{a_L}\bigr) = W_{c_{(d+1)L}}\bigl(P_{\vec S_{L,d}}*P_{\vec G_{a_L}},\,N_{C_{L,d}}*P_{\vec G_{a_L}}\bigr).$$
As in the proof of Lemma 4.2, we shall use the Lindeberg method. With this aim, we consider a sequence $(\vec N_{i,L})_{1\le i\le 2^L}$ of independent centered Gaussian vectors of $\mathbb{R}^{2^{(d+1)L}}$, independent of $\mathcal{F}_\infty\vee\sigma(\eta_i, i\in\mathbb{Z})$, such that
$$\mathbb{E}\bigl(\vec N_{k,L}\vec N^{\,t}_{k,L}\bigr) = \vec e_{k,L}\vec e^{\;t}_{k,L}\otimes\mathbb{E}\bigl(\vec U^{(0)}_{1,L}(\vec U^{(0)}_{1,L})^t\bigr) = \mathbb{E}\bigl(\vec V_{k,L}\vec V^{\,t}_{k,L}\bigr).$$
Defining
$$\vec N_{L,d} = \vec N_{1,L}+\vec N_{2,L}+\cdots+\vec N_{2^L,L},$$
we notice that $\mathcal{L}(\vec T_{L,d}) = \mathcal{L}(\vec N_{L,d})$. Let now $\vec W^*_{a_L}$ be a random vector in $\mathbb{R}^{2^{(d+1)L}}$ with law $N(0, a_L^2 I_{2^{(d+1)L}})$, independent of $\mathcal{F}_\infty\vee\sigma(\vec N_{i,L}, 1\le i\le 2^L)\vee\sigma(\eta_i, i\in\mathbb{Z})$. Let $\vec W_{a_L} = P_L^{\otimes(d+1)}\vec W^*_{a_L}$. With these notations, we can write that
$$W_{c_{(d+1)L}}\bigl(P_{\vec S_{L,d}}*P_{\vec G_{a_L}},\,N_{C_{L,d}}*P_{\vec G_{a_L}}\bigr) = \sup_{f\in\mathrm{Lip}(c_{(d+1)L})}\bigl|\mathbb{E}\bigl(f(\vec S_{L,d}+\vec W_{a_L})\bigr) - \mathbb{E}\bigl(f(\vec N_{L,d}+\vec W_{a_L})\bigr)\bigr|.$$
Using the notations (6.7) and setting
$$f^* = f\circ P_L^{\otimes(d+1)}, \tag{6.80}$$
we get overall that
$$\mathbb{E}\,c_{(d+1)L}\bigl(\vec S_{L,d}+\vec G_{a_L},\,\vec T_{L,d}+\vec G'_{a_L}\bigr) = \sup_{f\in\mathrm{Lip}(c_{(d+1)L})}\bigl|\mathbb{E}\bigl(f^*(\vec S^*_{L,d}+\vec W^*_{a_L})\bigr) - \mathbb{E}\bigl(f^*(\vec N^*_{L,d}+\vec W^*_{a_L})\bigr)\bigr|.$$

Using Notation 6.2, we then write that
$$\mathbb{E}\,c_{(d+1)L}\bigl(\vec S_{L,d}+\vec G_{a_L},\,\vec T_{L,d}+\vec G'_{a_L}\bigr) = \sup_{f\in\mathrm{Lip}(c_{(d+1)L})}\Bigl|\sum_{i=1}^{2^L}\mathbb{E}\Bigl(f^**\varphi_{a_L}\bigl(\vec S_{i-1}+\vec V^*_{i,L}+\vec T_{i+1}\bigr) - f^**\varphi_{a_L}\bigl(\vec S_{i-1}+\vec N^*_{i,L}+\vec T_{i+1}\bigr)\Bigr)\Bigr|.$$
Recall Definition 6.2 and, for any $i\in\{1,\dots,2^L\}$, let
$$\Delta_{1,i,L}(f) = f^**\varphi_{a_L}\bigl(\vec S_{i-1}+\vec V^*_{i,L}\bigr) - f^**\varphi_{a_L}\bigl(\vec S_{i-1}\bigr) - \tfrac12\,D^2 f^**\varphi_{a_L}\bigl(\vec S_{i-1}\bigr).\vec V^{*\otimes2}_{i,L},$$
and
$$\Delta_{2,i,L}(f) = f^**\varphi_{a_L}\bigl(\vec S_{i-1}+\vec N^*_{i,L}\bigr) - f^**\varphi_{a_L}\bigl(\vec S_{i-1}\bigr) - \tfrac12\,D^2 f^**\varphi_{a_L}\bigl(\vec S_{i-1}\bigr).\vec N^{*\otimes2}_{i,L}.$$
With these notations, it follows that
$$\mathbb{E}\,c_{(d+1)L}\bigl(\vec S_{L,d}+\vec G_{a_L},\,\vec T_{L,d}+\vec G'_{a_L}\bigr) \le \sup_{f\in\mathrm{Lip}(c_{(d+1)L})}\sum_{i=1}^{2^L}\bigl|\mathbb{E}(\Delta_{1,i,L}(f)) - \mathbb{E}(\Delta_{2,i,L}(f))\bigr|. \tag{6.81}$$

By using Taylor's integral formula, independence, and noticing that $\mathbb{E}(\vec N^{*\otimes3}_{i,L}) = 0$, we get
$$\mathbb{E}(\Delta_{1,i,L}(f)) - \mathbb{E}(\Delta_{2,i,L}(f)) = \mathbb{E}\int_0^1\frac{(1-t)^2}{2}\,D^3 f^**\varphi_{a_L}\bigl(\vec S_{i-1}+t\vec V^*_{i,L}\bigr).\vec V^{*\otimes3}_{i,L}\,dt + \mathbb{E}\int_0^1\frac{(1-t)^3}{6}\,D^4 f^**\varphi_{a_L}\bigl(\vec S_{i-1}+t\vec N^*_{i,L}\bigr).\vec N^{*\otimes4}_{i,L}\,dt. \tag{6.82}$$
Notice first that, by the properties of the convolution product,
$$\bigl|\mathbb{E}\bigl(D^3 f^**\varphi_{a_L}(\vec S_{i-1}+t\vec V^*_{i,L}).\vec V^{*\otimes3}_{i,L}\bigr)\bigr| = \bigl|\mathbb{E}\bigl((Df^*(\cdot).\vec V^*_{i,L} * D^2\varphi_{a_L}(\cdot).\vec V^{*\otimes2}_{i,L})(\vec S_{i-1}+t\vec V^*_{i,L})\bigr)\bigr| \le \mathbb{E}\Bigl(\sup_z\bigl|Df^*(z).\vec V^*_{i,L}\bigr|\int_{\mathbb{R}^{2^{(d+1)L}}}\bigl|D^2\varphi_{a_L}(z).\vec V^{*\otimes2}_{i,L}\bigr|\,dz\Bigr).$$
But
$$\bigl|Df^*(z).\vec V^*_{i,L}\bigr| = \bigl|Df\bigl(P_L^{\otimes(d+1)}(z)\bigr).P_L^{\otimes(d+1)}\vec V^*_{i,L}\bigr| \le \sup_{u\in\mathbb{R}^{2^{(d+1)L}}}\bigl|Df(u).\vec V_{i,L}\bigr| \le c_{(d+1)L}(\vec0, \vec V_{i,L}) \le 1.$$
In addition, according to Lemma 5.4 in Dedecker, Merlevède and Rio (2012), there exists a constant $c$ depending neither on $d$ nor on $L$ such that
$$\int_{\mathbb{R}^{2^{(d+1)L}}}\bigl|D^2\varphi_{a_L}(z).\vec V^{*\otimes2}_{i,L}\bigr|\,dz \le c\,a_L^{-2}\,\|\vec V_{i,L}\|^2_{2,d,L},$$
where
$$\|\vec V_{i,L}\|^2_{2,d,L} = \sum_{K\in\{0,\dots,L\}^{(d+1)}}\sum_{k_K\in E^{(d+1)}_{L,K}}\bigl(\widetilde V^{(K,k_K)}_{i,L}\bigr)^2 \le 2(L+1)^{d+1}.$$
So overall,
$$\sum_{i=1}^{2^L}\bigl|\mathbb{E}\bigl(D^3 f^**\varphi_{a_L}(\vec S_{i-1}+t\vec V^*_{i,L}).\vec V^{*\otimes3}_{i,L}\bigr)\bigr| \ll a_L^{-2}(L+1)^{d+1}\,2^L. \tag{6.83}$$

We now deal with the second term on the right-hand side of (6.82). With this aim, we notice that (6.80) together with (6.6) imply that if $f\in\mathrm{Lip}(c_{(d+1)L})$ then $f^*\in\mathrm{Lip}(c^*_{(d+1)L})$, where $c^*_{(d+1)L}$ is defined in Definition 6.1. Therefore
$$\sup_{f\in\mathrm{Lip}(c_{(d+1)L})}\bigl|\mathbb{E}\bigl(D^4 f^**\varphi_{a_L}(\vec S_{i-1}+t\vec N^*_{i,L}).\vec N^{*\otimes4}_{i,L}\bigr)\bigr| \le \sup_{g\in\mathrm{Lip}(c^*_{(d+1)L})}\bigl|\mathbb{E}\bigl(D^4 g*\varphi_{a_L}(\vec S_{i-1}+t\vec N^*_{i,L}).\vec N^{*\otimes4}_{i,L}\bigr)\bigr|.$$
Applying Lemma 7.2 as we did to get (6.44), we infer that, for any $i\in\{1,\dots,2^L\}$ and any $g\in\mathrm{Lip}(c^*_{(d+1)L})$,
$$\bigl|\mathbb{E}\bigl(D^4 g*\varphi_{a_L}(\vec S_{i-1}+t\vec N^*_{i,L}).\vec N^{*\otimes4}_{i,L}\bigr)\bigr| \ll a_L^{-3}(L+1)^{(d+1)/2}\sum_{K\in I_L^{d+1}}\sum_{k_K\in E^{(d+1)}_{L,K}}\mathbb{E}\bigl((\widetilde N^{(K,k_K)}_{i,L})^2\bigr).$$
Since $\mathbb{E}(\vec N_{i,L}\vec N^{\,t}_{i,L}) = \mathbb{E}(\vec V_{i,L}\vec V^{\,t}_{i,L})$, we get that
$$\sum_{K\in I_L^{d+1}}\sum_{k_K\in E^{(d+1)}_{L,K}}\mathbb{E}\bigl((\widetilde N^{(K,k_K)}_{i,L})^2\bigr) = \sum_{K\in I_L^{d+1}}\sum_{k_K\in E^{(d+1)}_{L,K}}\mathbb{E}\bigl((\widetilde V^{(K,k_K)}_{i,L})^2\bigr) \le 2(L+1)^{d+1}. \tag{6.84}$$
Therefore, starting from (6.84), we derive that, for any $i\in\{1,\dots,2^L\}$, any $g\in\mathrm{Lip}(c^*_{(d+1)L})$ and any $t\in[0,1]$,
$$\bigl|\mathbb{E}\bigl(D^4 g*\varphi_{a_L}(\vec S_{i-1}+t\vec N^*_{i,L}).\vec N^{*\otimes4}_{i,L}\bigr)\bigr| \ll a_L^{-3}(L+1)^{(d+1)/2}(L+1)^{d+1} \ll a_L^{-3}(L+1)^{3(d+1)/2}. \tag{6.85}$$
Starting from (6.81) and considering (6.82) together with the upper bounds (6.83) and (6.85), the lemma follows.

7 Appendix B

This section is devoted to various technical lemmas.

7.1 Upper bounds for the partial derivatives

We now gather some lemmas concerning the upper bounds for partial derivatives. Their proofs are omitted, since they are based on the same arguments as those used in Appendix A of Dedecker, Merlevède and Rio (2012). In what follows, $d$ and $L$ are nonnegative integers and $K = (K_0,\dots,K_d)\in\{0,\dots,L\}^{(d+1)}$. We shall denote, for any $i=0,\dots,d$,
$$E(L,K_i) = \{1,\dots,2^{L-K_i}\}\cap(2\mathbb{N}+1),$$
and
$$E^{(d+1)}_{L,K} = \prod_{i=0}^d E(L,K_i).$$
Therefore the notation $k_K\in E^{(d+1)}_{L,K}$ means that $k_K = (k_{K_0},\dots,k_{K_d})\in\prod_{i=0}^d E(L,K_i)$. Let $x$ and $y$ be two column vectors of $\mathbb{R}^{2^{(d+1)L}}$ with coordinates
$$x = \Bigl(\bigl(x^{(K,k_K)},\,k_K\in E^{(d+1)}_{L,K}\bigr)^t\Bigr)^t_{K\in\{0,\dots,L\}^{(d+1)}} \quad\text{and}\quad y = \Bigl(\bigl(y^{(K,k_K)},\,k_K\in E^{(d+1)}_{L,K}\bigr)^t\Bigr)^t_{K\in\{0,\dots,L\}^{(d+1)}}.$$
Let $f$ be a function from $\mathbb{R}^{2^{(d+1)L}}$ into $\mathbb{R}$ that is Lipschitz with respect to the distance $c^*_{(d+1)L}$ defined in Definition 6.1. This means that
$$|f(x) - f(y)| \le \sup_{K\in\{0,\dots,L\}^{(d+1)}}\sum_{k_K\in E^{(d+1)}_{L,K}}\bigl|x^{(K,k_K)} - y^{(K,k_K)}\bigr|.$$
Let $a>0$ and let $\varphi_a$ be the density of a centered Gaussian law on $\mathbb{R}^{2^{(d+1)L}}$ with covariance $a^2 I_{2^{(d+1)L}}$ ($I_{2^{(d+1)L}}$ being the identity matrix on $\mathbb{R}^{2^{(d+1)L}}$). Let also
$$\|x\|_{\infty,d,L} = \sup_{K\in\{0,\dots,L\}^{(d+1)}}\sum_{k_K\in E^{(d+1)}_{L,K}}|x^{(K,k_K)}| \quad\text{and}\quad \|x\|_{2,d,L} = \Bigl(\sum_{K\in\{0,\dots,L\}^{(d+1)}}\sum_{k_K\in E^{(d+1)}_{L,K}}(x^{(K,k_K)})^2\Bigr)^{1/2}.$$

For the statements of the next lemmas, we refer to Definition 6.2.

Lemma 7.1 The partial derivatives of $f$ exist almost everywhere and the following inequality holds:
$$\sup_{y\in\mathbb{R}^{2^{(d+1)L}}}\ \sup_{u\in\mathbb{R}^{2^{(d+1)L}},\,\|u\|_{\infty,d,L}\le1}\bigl|Df(y).u\bigr| \le 1.$$
In addition,
$$\sup_{K\in\{0,\dots,L\}^{(d+1)}}\sum_{k_K\in E^{(d+1)}_{L,K}}\Bigl|\frac{\partial f}{\partial x^{(K,k_K)}}(y)\Bigr| \le 1. \tag{7.1}$$

Lemma 7.2 Let $X$ and $Y$ be two random variables with values in $\mathbb{R}^{2^{(d+1)L}}$. For any positive integer $m$ and any $t\in[0,1]$, there exists a positive constant $\gamma_m$ depending only on $m$ such that
$$\bigl|\mathbb{E}\bigl(D^m f*\varphi_a(Y+tX).X^{\otimes m}\bigr)\bigr| \le \gamma_m a^{1-m}\,\mathbb{E}\bigl(\|X\|_{\infty,d,L}\times\|X\|^{m-1}_{2,d,L}\bigr).$$

Lemma 7.3 For any integer $m\ge1$, there exists a positive constant $\kappa_m$ depending only on $m$ such that
$$\sup_{(K(i),k_{K(i)}),\,i=1,\dots,m}\Bigl\|\frac{\partial^m\varphi_a}{\prod_{i=1}^m\partial x^{(K(i),k_{K(i)})}}\Bigr\|_1 \le \kappa_m a^{-m}.$$
In addition, for any integer $m\ge1$ and any $y\in\mathbb{R}^{2^{(d+1)L}}$,
$$\sup_{(K(i),k_{K(i)}),\,i=1,\dots,m}\Bigl|\frac{\partial^m f*\varphi_a}{\prod_{i=1}^m\partial x^{(K(i),k_{K(i)})}}(y)\Bigr| \le \kappa_{m-1} a^{1-m}.$$
The suprema above are taken over all indexes $K(i)\in\{0,\dots,L\}^{(d+1)}$ and $k_{K(i)}\in E^{(d+1)}_{L,K(i)}$, for $i=1,\dots,m$.

7.2 Covariance inequalities

We first recall the following covariance inequality due to Delyon (1990) (see also Theorem 1.4 in Rio (2000)).

Lemma 7.4 Let $r$ and $s$ in $[1,\infty]$ be such that $r^{-1}+s^{-1}=1$. Let $U$ and $V$ be real random variables in $\mathbb{L}^r$ and $\mathbb{L}^s$ respectively, that are respectively $\mathcal{U}$- and $\mathcal{V}$-measurable. Then there exist two random variables $b_{\mathcal{U}}$ and $b_{\mathcal{V}}$ with values in $[0,1]$, measurable with respect to $\mathcal{U}$ and to $\mathcal{V}$ respectively, such that $\mathbb{E}(b_{\mathcal{U}}) = \mathbb{E}(b_{\mathcal{V}}) = \beta(\mathcal{U},\mathcal{V})$ and
$$|\mathrm{Cov}(U,V)| \le 2\bigl(\mathbb{E}(|U|^r b_{\mathcal{U}})\bigr)^{1/r}\bigl(\mathbb{E}(|V|^s b_{\mathcal{V}})\bigr)^{1/s}.$$

Notice that if $U = f(X,\eta)$ and $V = g(Y,\delta)$, where $X,\eta,Y,\delta$ are random variables such that $(X,Y)$ is independent of $(\eta,\delta)$ and $\eta$ is independent of $\delta$, then the random variables $b_{\mathcal{U}}$ and $b_{\mathcal{V}}$ satisfy $\mathbb{E}(b_{\mathcal{U}}) = \mathbb{E}(b_{\mathcal{V}}) = \beta(\sigma(X),\sigma(Y))$.

For the next lemma, we refer to Definition 6.2.

Lemma 7.5 Let $\vec Z$ be a random variable with values in $\mathbb{R}^{2^{(d+1)L}}$. Let $(\vec V^*_{i,L})_i$ be the random variables in $\mathbb{R}^{2^{(d+1)L}}$ defined in (6.11), and let $(\beta_\ell)_{\ell\ge0}$ be the sequence of absolutely regular coefficients associated to the strictly stationary sequence $(X_i)_{i\in\mathbb{Z}}$. Let $m$ be a positive integer and $(k_i)_{i\ge1}$, $(\ell_i)_{i\ge1}$ two sequences of integers. Then there exists a positive constant $\gamma_m$ depending only on $m$ such that
$$\Bigl|\mathbb{E}\Bigl(D^{2m}g*\varphi_a(\vec Z).\bigotimes_{i=1}^m\mathbb{E}\bigl(\vec V^*_{k_i,L}\otimes\vec V^*_{\ell_i,L}\bigr)\Bigr)\Bigr| \le \gamma_m a^{1-2m}(L+1)^{2m(d+1)}\prod_{i=1}^m\beta_{|k_i-\ell_i|}.$$

Proof of Lemma 7.5. We use the notation (6.12) and write
$$\mathbb{E}\Bigl(D^{2m}g*\varphi_a(\vec Z).\bigotimes_{i=1}^m\mathbb{E}\bigl(\vec V^*_{k_i,L}\otimes\vec V^*_{\ell_i,L}\bigr)\Bigr) = \sum_{P,p_P}\sum_{Q,q_Q}\mathbb{E}\bigl(\widetilde V^{(P,p_P)}_{k_1,L}\,\widetilde V^{(Q,q_Q)}_{\ell_1,L}\bigr)\,\mathbb{E}\Bigl(\frac{\partial^2}{\partial x^{(P,p_P)}\partial x^{(Q,q_Q)}}D^{2m-2}g*\varphi_a(\vec Z).\bigotimes_{i=2}^m\mathbb{E}\bigl(\vec V^*_{k_i,L}\otimes\vec V^*_{\ell_i,L}\bigr)\Bigr).$$
We now apply Lemma 7.4 with $U = \widetilde V^{(P,p_P)}_{k_1,L}$, $\mathcal{U} = \sigma(Y_{k_1+2^L})$,
$$V = \sum_{Q,q_Q}\widetilde V^{(Q,q_Q)}_{\ell_1,L}\,\mathbb{E}\Bigl(\frac{\partial^2}{\partial x^{(P,p_P)}\partial x^{(Q,q_Q)}}D^{2m-2}g*\varphi_a(\vec Z).\bigotimes_{i=2}^m\mathbb{E}\bigl(\vec V^*_{k_i,L}\otimes\vec V^*_{\ell_i,L}\bigr)\Bigr),$$
$\mathcal{V} = \sigma(Y_{\ell_1+2^L})$, $r=1$ and $s=\infty$. Hence we derive that there exists a $\mathcal{U}$-measurable random variable $b_{\mathcal{U}}(\ell_1+2^L)$ such that
$$\Bigl|\mathbb{E}\Bigl(D^{2m}g*\varphi_a(\vec Z).\bigotimes_{i=1}^m\mathbb{E}\bigl(\vec V^*_{k_i,L}\otimes\vec V^*_{\ell_i,L}\bigr)\Bigr)\Bigr| \le 2\sum_{P,p_P}\mathbb{E}\bigl(|\widetilde V^{(P,p_P)}_{k_1,L}|\,b_{\mathcal{U}}(\ell_1+2^L)\bigr)\Bigl(\sum_{Q,q_Q}\bigl\|\widetilde V^{(Q,q_Q)}_{\ell_1,L}\bigr\|_\infty\Bigr)\times\sup_{(P,p_P),(Q,q_Q)}\Bigl|\mathbb{E}\Bigl(\frac{\partial^2}{\partial x^{(P,p_P)}\partial x^{(Q,q_Q)}}D^{2m-2}g*\varphi_a(\vec Z).\bigotimes_{i=2}^m\mathbb{E}\bigl(\vec V^*_{k_i,L}\otimes\vec V^*_{\ell_i,L}\bigr)\Bigr)\Bigr|.$$
Since $\mathbb{E}(b_{\mathcal{U}}(\ell_1+2^L)) \le \beta_{|k_1-\ell_1|}$ and $\sum_{Q,q_Q}\bigl|\widetilde V^{(Q,q_Q)}_{\ell_1,L}\bigr| \le 2(L+1)^{d+1}$, we get that
$$\Bigl|\mathbb{E}\Bigl(D^{2m}g*\varphi_a(\vec Z).\bigotimes_{i=1}^m\mathbb{E}\bigl(\vec V^*_{k_i,L}\otimes\vec V^*_{\ell_i,L}\bigr)\Bigr)\Bigr| \le 8(L+1)^{2(d+1)}\beta_{|k_1-\ell_1|}\times\sup_{(P,p_P),(Q,q_Q)}\Bigl|\mathbb{E}\Bigl(\frac{\partial^2}{\partial x^{(P,p_P)}\partial x^{(Q,q_Q)}}D^{2m-2}g*\varphi_a(\vec Z).\bigotimes_{i=2}^m\mathbb{E}\bigl(\vec V^*_{k_i,L}\otimes\vec V^*_{\ell_i,L}\bigr)\Bigr)\Bigr|.$$

Next, if $m\ge2$, we write that
$$\mathbb{E}\Bigl(\frac{\partial^2}{\partial x^{(P,p_P)}\partial x^{(Q,q_Q)}}D^{2m-2}g*\varphi_a(\vec Z).\bigotimes_{i=2}^m\mathbb{E}\bigl(\vec V^*_{k_i,L}\otimes\vec V^*_{\ell_i,L}\bigr)\Bigr) = \sum_{M,m_M}\sum_{R,r_R}\mathbb{E}\bigl(\widetilde V^{(M,m_M)}_{k_2,L}\,\widetilde V^{(R,r_R)}_{\ell_2,L}\bigr)\times\mathbb{E}\Bigl(\frac{\partial^4}{\partial x^{(P,p_P)}\partial x^{(Q,q_Q)}\partial x^{(M,m_M)}\partial x^{(R,r_R)}}D^{2m-4}g*\varphi_a(\vec Z).\bigotimes_{i=3}^m\mathbb{E}\bigl(\vec V^*_{k_i,L}\otimes\vec V^*_{\ell_i,L}\bigr)\Bigr).$$
Applying Lemma 7.4 with $U = \widetilde V^{(M,m_M)}_{k_2,L}$, $\mathcal{U} = \sigma(Y_{k_2+2^L})$,
$$V = \sum_{R,r_R}\widetilde V^{(R,r_R)}_{\ell_2,L}\,\mathbb{E}\Bigl(\frac{\partial^4}{\partial x^{(P,p_P)}\partial x^{(Q,q_Q)}\partial x^{(M,m_M)}\partial x^{(R,r_R)}}D^{2m-4}g*\varphi_a(\vec Z).\bigotimes_{i=3}^m\mathbb{E}\bigl(\vec V^*_{k_i,L}\otimes\vec V^*_{\ell_i,L}\bigr)\Bigr),$$
$\mathcal{V} = \sigma(Y_{\ell_2+2^L})$, $r=1$ and $s=\infty$, we get that there exists a $\mathcal{U}$-measurable random variable $b_{\mathcal{U}}(\ell_2+2^L)$ such that
$$\sup_{(P,p_P),(Q,q_Q)}\Bigl|\mathbb{E}\Bigl(\frac{\partial^2}{\partial x^{(P,p_P)}\partial x^{(Q,q_Q)}}D^{2m-2}g*\varphi_a(\vec Z).\bigotimes_{i=2}^m\mathbb{E}\bigl(\vec V^*_{k_i,L}\otimes\vec V^*_{\ell_i,L}\bigr)\Bigr)\Bigr| \le 2\sum_{M,m_M}\mathbb{E}\bigl(|\widetilde V^{(M,m_M)}_{k_2,L}|\,b_{\mathcal{U}}(\ell_2+2^L)\bigr)\Bigl(\sum_{R,r_R}\bigl\|\widetilde V^{(R,r_R)}_{\ell_2,L}\bigr\|_\infty\Bigr)\times\sup_{(P,p_P),(Q,q_Q)}\ \sup_{(M,m_M),(R,r_R)}\Bigl|\mathbb{E}\Bigl(\frac{\partial^4}{\partial x^{(P,p_P)}\partial x^{(Q,q_Q)}\partial x^{(M,m_M)}\partial x^{(R,r_R)}}D^{2m-4}g*\varphi_a(\vec Z).\bigotimes_{i=3}^m\mathbb{E}\bigl(\vec V^*_{k_i,L}\otimes\vec V^*_{\ell_i,L}\bigr)\Bigr)\Bigr|.$$
Since $\mathbb{E}(b_{\mathcal{U}}(\ell_2+2^L)) \le \beta_{|k_2-\ell_2|}$ and $\sum_{R,r_R}\bigl|\widetilde V^{(R,r_R)}_{\ell_2,L}\bigr| \le 2(L+1)^{d+1}$, we get that
$$\sup_{(P,p_P),(Q,q_Q)}\Bigl|\mathbb{E}\Bigl(\frac{\partial^2}{\partial x^{(P,p_P)}\partial x^{(Q,q_Q)}}D^{2m-2}g*\varphi_a(\vec Z).\bigotimes_{i=2}^m\mathbb{E}\bigl(\vec V^*_{k_i,L}\otimes\vec V^*_{\ell_i,L}\bigr)\Bigr)\Bigr| \le 8(L+1)^{2(d+1)}\beta_{|k_2-\ell_2|}\times\sup_{(P,p_P),(M,m_M)}\ \sup_{(Q,q_Q),(R,r_R)}\Bigl|\mathbb{E}\Bigl(\frac{\partial^4}{\partial x^{(P,p_P)}\partial x^{(Q,q_Q)}\partial x^{(M,m_M)}\partial x^{(R,r_R)}}D^{2m-4}g*\varphi_a(\vec Z).\bigotimes_{i=3}^m\mathbb{E}\bigl(\vec V^*_{k_i,L}\otimes\vec V^*_{\ell_i,L}\bigr)\Bigr)\Bigr|.$$
The lemma follows after $m-2$ additional steps, by using Lemma 7.3 at the end of the procedure.

References

[1] Beck J. (1985), Lower bounds on the approximation of the multivariate empirical process. Z. Wahrsch. Verw. Gebiete 70, 289-306.

[2] Bentkus V. Y. and Lyubinskas K. (1987), Rate of convergence in the invariance principle in Banach spaces. Litovsk. Mat. Sb. 27, 423-434.

[3] Borisov I. S. (1982), Approximation of empirical fields, constructed with respect to vector observations with dependent components. Sibirsk. Mat. Zh. 23, no. 5, 31-41.

[4] Borisov I. S. (1983), An approach to the problem of approximation of distributions of sums of independent random elements. Dokl. Akad. Nauk SSSR 272, 271-275.

[5] Castelle N. and Laurent-Bonvalot F. (1998), Strong approximations of bivariate uniform empirical processes. Ann. Inst. H. Poincaré Probab. Statist. 34, 425-480.

[6] Cirel'son B. S., Ibragimov I. A. and Sudakov V. N. (1976), Norms of Gaussian sample functions. Proceedings of the Third Japan-USSR Symposium on Probability Theory, pp. 20-41. Lecture Notes in Math. 550, Springer, Berlin.

[7] Csörgő M. and Révész P. (1975a), A new method to prove Strassen type laws of invariance principle. II. Z. Wahrsch. Verw. Gebiete 31, no. 4, 261-269.

[8] Csörgő M. and Révész P. (1975b), A strong approximation of the multivariate empirical process. Studia Sci. Math. Hungar. 10, no. 3-4, 427-434.

[9] Csörgő M. and Horváth L. (1988), A note on strong approximations of multivariate empirical processes. Stochastic Process. Appl. 28, no. 1, 101-109.

[10] Delyon B. (1990), Limit theorem for mixing processes. Tech. report IRISA, Rennes 1, 546.

[11] Dedecker J., Merlevède F. and Rio E. (2012), On strong approximation for the empirical process of stationary sequences. To appear in Ann. Probab. (online).

[12] Dedecker J., Prieur C. and Raynaud De Fitte P. (2006), Parametrized Kantorovich-Rubinstein theorem and application to the coupling of random variables. Dependence in Probability and Statistics, Lecture Notes in Statistics 187, 105-121.

[13] Dhompongsa S. (1984), A note on the almost sure approximation of empirical process of weakly dependent random vectors. Yokohama Math. J. 32, 113-121.

[14] Doukhan P. and Portal F. (1987), Principe d'invariance faible pour la fonction de répartition empirique dans un cadre multidimensionnel et mélangeant. Probab. Math. Statist. 8, 117-132.

[15] Dudley R. M. (1978), Central limit theorems for empirical measures. Ann. Probab. 6, 899-929.

[16] Dudley R. M. and Philipp W. (1983), Invariance principles for sums of Banach space valued random elements and empirical processes. Z. Wahrsch. Verw. Gebiete 62, 509-552.

[17] Horn R. A. and Johnson C. R. (1991), Topics in matrix analysis. Cambridge University Press, Cambridge.

[18] Kiefer J. (1972), Skorohod embedding of multivariate rv's, and the sample df. Z. Wahrsch. Verw. Gebiete 24, 1-35.

[19] Komlós J., Major P. and Tusnády G. (1975), An approximation of partial sums of independent RV's and the sample DF. I. Z. Wahrsch. Verw. Gebiete 32, 111-131.

[20] Ledoux M. and Talagrand M. (1991), Probability in Banach spaces. Isoperimetry and processes. Ergebnisse der Mathematik und ihrer Grenzgebiete 23, Springer-Verlag, Berlin.

[21] Liu J. S. (1994), Siegel's formula via Stein's identities. Statist. Probab. Lett. 21, 247-251.

[22] Massart P. (1989), Strong approximation for multivariate empirical and related processes, via KMT constructions. Ann. Probab. 17, 266-291.

[23] Merlevède F. and Rio E. (2012), Strong approximation of partial sums under dependence conditions with application to dynamical systems. Stochastic Process. Appl. 122, 386-417.

[24] Neumann M. (2011), A central limit theorem for triangular arrays of weakly dependent random variables, with applications in statistics. ESAIM Probab. Stat., published online.

[25] Peligrad M. and Utev S. (2006), Central limit theorem for stationary linear processes. Ann. Probab. 34, 1608-1622.

[26] Philipp W. and Pinzur L. (1980), Almost sure approximation theorems for the multivariate empirical process. Z. Wahrsch. Verw. Gebiete 54, 1-13.

[27] Rio E. (1994), Local invariance principles and their application to density estimation. Probab. Theory Related Fields 98, 21-45.

[28] Rio E. (1996), Vitesse de convergence dans le principe d'invariance faible pour la fonction de répartition empirique multivariée. C. R. Acad. Sci. Paris Sér. I Math. 322, no. 2, 169-172.

[29] Rio E. (1998), Processus empiriques absolument réguliers et entropie universelle. Probab. Theory Related Fields 111, 585-608.

[30] Rio E. (2000), Théorie asymptotique des processus aléatoires faiblement dépendants. Mathématiques et Applications de la SMAI 31, Springer.

[31] Rosenblatt M. (1956), A central limit theorem and a strong mixing condition. Proc. Nat. Acad. Sci. U.S.A. 42, 43-47.

[32] Rozanov Y. A. and Volkonskii V. A. (1959), Some limit theorems for random functions I. Theory Probab. Appl. 4, 178-197.

[33] Rüschendorf L. (1985), The Wasserstein distance and approximation theorems. Z. Wahrsch. Verw. Gebiete 70, 117-129.

[34] Sakhanenko A. I. (1988), Simple method of obtaining estimates in the invariance principle. Probability theory and mathematical statistics, 430-443, Lecture Notes in Math. 1299, Springer, Berlin.

[35] Sakhanenko A. I. (2000), A new way to obtain estimates in the invariance principle. High dimensional probability II, Progr. Probab. 47, 223-245, Birkhäuser Boston.

[36] Skorohod A. V. (1976), On a representation of random variables. Teor. Verojatnost. i Primenen. 21, 645-648.

[37] Tusnády G. (1977), A remark on the approximation of the sample DF in the multidimensional case. Period. Math. Hungar. 8, 53-55.

[38] Wu W. B. (2007), Strong invariance principles for dependent random variables. Ann. Probab. 35, 2294-2320.

