Gaussian Approximation of Local Empirical Processes Indexed by Functions

Uwe Einmahl*
Department of Mathematics, Indiana University, Bloomington, IN 47405, USA

David M. Mason**
Department of Mathematical Sciences, University of Delaware, Newark, DE 19716, USA
Summary. An extended notion of a local empirical process indexed by functions is introduced, which includes kernel density and regression function estimators and the conditional empirical process as special cases. Under suitable regularity conditions a central limit theorem and a strong approximation by a sequence of Gaussian processes are established for such processes. A compact law of the iterated logarithm (LIL) is then inferred from the corresponding LIL for the approximating sequence of Gaussian processes.
* Research partially supported by an NSF Grant and SFB 343, University of Bielefeld. ** Research partially supported by the Alexander von Humboldt Foundation, an NSF Grant and SFB 343, University of Bielefeld.
1. INTRODUCTION AND STATEMENTS OF MAIN RESULTS.

Deheuvels and Mason (1994a) introduced a notion of a local uniform empirical process indexed by a class of sets and proved a functional law of the iterated logarithm. Such local processes are very useful in the study of statistics which are functions of the observations in a suitable neighborhood of a point. For instance, Deheuvels and Mason (1994a) show how the pointwise Bahadur-Kiefer representation for the sample quantile and the law of the iterated logarithm for kernel density estimators follow readily from their results. In this paper we extend the notion of the local empirical process to allow us to include kernel regression function estimators and the conditional empirical process within our setup. We then establish a weak convergence result and a strong invariance principle for such local processes. From our strong invariance principle we derive a general compact law of the iterated logarithm, which yields, among other results, the Deheuvels and Mason (1994a) functional law of the iterated logarithm as a special case.

Let us begin by fixing some notation. Let $X, X_1, X_2, \dots$ be a sequence of i.i.d. $\mathbb{R}^d$-valued random vectors with distribution $\mathbb{P}$ on the Borel subsets $\mathcal{B}$ of $\mathbb{R}^d$. Given any $x \in \mathbb{R}^d$ and any measurable set $J \subset \mathbb{R}^d$, we set for any invertible bimeasurable transformation $h: \mathbb{R}^d \to \mathbb{R}^d$,

(1.1) $A(h) = x + hJ$.

Let $\{h_n\}$ denote a sequence of invertible bimeasurable transformations from $\mathbb{R}^d$ to $\mathbb{R}^d$ and assume, with $A_n = A(h_n)$ and $a_n = \mathbb{P}(A_n)$, $n \ge 1$,

(A.i) $a_n > 0$ for all $n \ge 1$;

(A.ii) $na_n \to \infty$ as $n \to \infty$;

and for some $0 \le a \le 1$,

(A.iii) $a_n \to a$ as $n \to \infty$.

For each integer $n \ge 1$, let $k(n)$ denote the largest positive integer $\le na_n$ and let $P_n$ be the probability measure on $(\mathbb{R}^d, \mathcal{B})$ defined by

(1.2) $P_n(B) = \mathbb{P}(x + h_n(J \cap B))/a_n$ for $B \in \mathcal{B}$.
Let $\mathcal{F}$ denote a class of square $\mathbb{P}$-integrable functions on $\mathbb{R}^d$ with supports contained in $J$. To avoid measurability problems we shall assume that there exists a countable subclass $\mathcal{F}_c$ of $\mathcal{F}$ and a measurable set $D$ with $P_n(D) = 0$ for all $n \ge 0$ such that for any $x_1, \dots, x_m \in \mathbb{R}^d \setminus D$ and $f \in \mathcal{F}$ there exists a sequence $\{f_j\} \subset \mathcal{F}_c$ satisfying

(S.i) $\lim_{j\to\infty} f_j(x_k) = f(x_k)$, $k = 1, \dots, m$;

(S.ii) $\lim_{j\to\infty} P_n(f_j) = P_n(f)$ for each $n \ge 1$;

and

(S.iii) $\lim_{j\to\infty} P_n(f_j^2) = P_n(f^2)$ for each $n \ge 1$.
Given each integer $n \ge 1$ and invertible bimeasurable transformation $h: \mathbb{R}^d \to \mathbb{R}^d$, we introduce the local empirical process at $x \in \mathbb{R}^d$ indexed by $\mathcal{F}$,

(1.3) $L_n(f,h) = \sum_{i=1}^n \big\{ f(h^{-1}(X_i - x)) - Ef(h^{-1}(X_i - x)) \big\} \Big/ \sqrt{n\mathbb{P}(A(h))}$,

and define the local empirical distribution function at $x$ indexed by $\mathcal{F}$ by

(1.4) $\mu_n(f,h) = \sum_{i=1}^n f(h^{-1}(X_i - x)) \Big/ \big(n\mathbb{P}(A(h))\big)$.
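Computationally, (1.3) and (1.4) are simple functionals of the sample. The following sketch (hypothetical names; a scalar affine $h$, an indicator $f$, and a known centering mean, chosen only to illustrate the definitions) evaluates both:

```python
import numpy as np

def local_empirical_process(X, x, h_inv, f, Ef, p_A):
    """L_n(f, h) of (1.3): the centered sum of f(h^{-1}(X_i - x)),
    normalized by sqrt(n * P(A(h))).  Ef is the (known) mean of
    f(h^{-1}(X - x)) under the law of X."""
    vals = f(h_inv(X - x))
    return (vals - Ef).sum() / np.sqrt(len(X) * p_A)

def local_empirical_df(X, x, h_inv, f, p_A):
    """mu_n(f, h) of (1.4)."""
    return f(h_inv(X - x)).sum() / (len(X) * p_A)

# Toy illustration: X uniform on [0,1], J = [-1/2, 1/2], h(u) = delta*u,
# f = indicator of J; then P(A(h)) = delta and Ef = delta as well.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, 100_000)
delta, x0 = 0.1, 0.5
h_inv = lambda u: u / delta
f = lambda u: 1.0 * (np.abs(u) <= 0.5)
Ln = local_empirical_process(X, x0, h_inv, f, Ef=delta, p_A=delta)
mu = local_empirical_df(X, x0, h_inv, f, p_A=delta)
```

In this toy case $\mu_n$ is close to 1 (the uniform density at $x_0$), while $L_n$ is approximately standard normal.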
This general setup allows us to consider the following interesting examples, among others, as special cases.
Example 1. Let $U_1, U_2, \dots$ be independent uniform $[0,1]^d$, $d \ge 1$, random variables. Choose an $x \in [0,1]^d$ and a subclass $\mathcal{C}$ of the Borel subsets of $J := [r,s]^d$, where $r < s$ with $s - r = 1$. Setting $\mathcal{F} = \{1_C : C \in \mathcal{C}\}$, where each $1_C$ is the indicator function of $C$, and defining $h: \mathbb{R}^d \to \mathbb{R}^d$ by $h(x_1, \dots, x_d) = (a^{1/d}x_1, \dots, a^{1/d}x_d)$ with $0 < a \le 1$, we get whenever $x + hJ \subset [0,1]^d$

(1.5) $L_n(1_C, h) = \sum_{i=1}^n \big\{1(U_i \in x + hC) - a|C|\big\}\Big/\sqrt{na}$,

where $|C|$ denotes the Lebesgue measure of $C$. This is a version of the local uniform $[0,1]^d$ empirical process first studied by Deheuvels and Mason (1994a).
Example 2. Let $X_1, X_2, \dots$ be i.i.d. real valued random variables with a density $g$ continuous and positive in a neighborhood of a fixed $x$. Set $J = [-\frac12, \frac12]$ and define $h: \mathbb{R} \to \mathbb{R}$ by $h(u) = \delta u$ with $0 < \delta \le 1$. Further set $\mathcal{F} = \{K\}$, where $K$ is a kernel function satisfying

(K.1) $\int_{-\infty}^{\infty} K(u)\,du = 1$;

(K.2) $K$ is of bounded variation;

(K.3) $K(u) = 0$ if $|u| > \frac12$.

Then

(1.6) $\hat g_n(x) := \sum_{i=1}^n K((X_i - x)/\delta)\Big/(n\delta) = \mu_n(K,h)\,\mathbb{P}(A(h))/\delta$

is the usual kernel density estimator of $g(x)$ with window size $\delta$.
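The estimator (1.6) can be sketched in a few lines; the code below is a minimal illustration (hypothetical names; the uniform kernel $K = 1_{[-1/2,1/2]}$, which satisfies (K.1)-(K.3)):

```python
import numpy as np

def kernel_density(X, x, delta, K):
    """Kernel density estimator (1.6) at the point x with window size delta."""
    return K((X - x) / delta).sum() / (len(X) * delta)

# Uniform kernel: integrates to 1, has bounded variation, vanishes off [-1/2, 1/2].
uniform_K = lambda u: 1.0 * (np.abs(u) <= 0.5)

rng = np.random.default_rng(2)
X = rng.standard_normal(200_000)
g_hat = kernel_density(X, x=0.0, delta=0.2, K=uniform_K)   # target: phi(0) ~ 0.399
```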
Example 3. Let $(X_1, Y_1), (X_2, Y_2), \dots$ be i.i.d. $G$ with joint density $g_{XY}$ and marginal densities $g_X$ and $g_Y$. Choose $J = [-\frac12,\frac12] \times \mathbb{R}$, $h(u,v) = (\delta u, v)$ with $0 < \delta \le 1$, and $A(h) = (x,0) + hJ$. Further, set $R(u,v) = vK(u)$ for $(u,v) \in \mathbb{R}^2$, where $K$ is a kernel function as in Example 2, and $\mathcal{F} = \{R\}$. Now

(1.7) $\mu_n(R,h) = \hat r_n(x)\hat g_n(x)\,\delta/\mathbb{P}(A(h))$,

where $\hat g_n(x)$ is the kernel density estimator of the marginal density $g_X(x)$ and $\hat r_n(x)$ is the kernel regression estimator

(1.8) $\hat r_n(x) = \sum_{i=1}^n Y_iK((X_i - x)/\delta)\Big/\big(n\delta\,\hat g_n(x)\big)$

of $r(x) = E(Y \mid X = x)$.
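The estimator (1.8) is a kernel-weighted average of the responses near $x$; a minimal sketch (hypothetical names, uniform kernel again) is:

```python
import numpy as np

def nw_regression(X, Y, x, delta, K):
    """Kernel regression estimator (1.8) of r(x) = E(Y | X = x):
    the ratio of sum Y_i K((X_i-x)/delta) to sum K((X_i-x)/delta)."""
    w = K((X - x) / delta)
    return (Y * w).sum() / w.sum()

uniform_K = lambda u: 1.0 * (np.abs(u) <= 0.5)

rng = np.random.default_rng(3)
X = rng.uniform(-1.0, 1.0, 100_000)
Y = X**2 + 0.1 * rng.standard_normal(100_000)               # r(x) = x^2
r_hat = nw_regression(X, Y, x=0.5, delta=0.1, K=uniform_K)  # target: 0.25
```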
Example 4. Keeping the notation of Example 3, now choose the class of functions $\mathcal{F} = \{f_y : y \in \mathbb{R}\}$, where

(1.9) $f_y(u,v) = 1(v \le y)K(u)$, $(u,v) \in \mathbb{R}^2$.

Then

(1.10) $\mu_n(f_y, h) = F_n(y\mid x)\hat g_n(x)\,\delta/\mathbb{P}(A(h))$, $y \in \mathbb{R}$,

where

(1.11) $F_n(y\mid x) = \frac{1}{n\delta}\sum_{i=1}^n 1(Y_i \le y)K((X_i - x)/\delta)\Big/\hat g_n(x)$
is the conditional empirical distribution function first intensively studied by Stute (1986).

In order to state our main weak convergence result we must introduce some further notation and assumptions. Here we shall borrow heavily from Sheehy and Wellner (1992), especially from their Section 3. For integers $m \ge 1$ and $n \ge 1$ define the empirical process indexed by $\mathcal{F}$,

(1.12) $\alpha_m^{(n)}(f) = \Big(\sum_{i=1}^m f(Y_i^{(n)}) - mP_n(f)\Big)\Big/\sqrt{m}$, $f \in \mathcal{F}$,
where $Y_1^{(n)}, \dots, Y_m^{(n)}$ are assumed to be i.i.d. $P_n$. Let $\mathcal{F}' = \{f - g : f,g \in \mathcal{F}\}$, $\mathcal{F}^2 = \{f^2 : f \in \mathcal{F}\}$, $(\mathcal{F}')^2 = \{(f-g)^2 : f,g \in \mathcal{F}\}$ and $\mathcal{G} = \mathcal{F} \cup \mathcal{F}^2 \cup \mathcal{F}' \cup (\mathcal{F}')^2$. For any functional $T$ defined on a subset $\mathcal{H}$ of the real valued functions on $J$ we denote

(1.13) $\|T\|_{\mathcal{H}} = \sup\{|T(f)| : f \in \mathcal{H}\}$.
We shall require the following additional assumptions on the class of functions $\mathcal{F}$ and the sequence of probability measures $\{P_n\}$.

(F.i) $\mathcal{F}$ has a uniformly square integrable envelope function $F$, namely,

(1.14) $\lim_{\lambda\to\infty} \limsup_{n\to\infty} P_n\big(F^2 1(F \ge \lambda)\big) = 0$.

(F.ii) There exists a probability measure $P_0$ such that

(1.15) $\|P_n - P_0\|_{\mathcal{G}} \to 0$ as $n \to \infty$.
For any $f, g \in L_2(\mathbb{R}^d, \mathcal{B})$ and $n \ge 0$, set

(1.16) $\sigma_n^2(f,g) = \mathrm{Var}_{P_n}(f - g)$

and

(1.17) $e_n^2(f,g) = E_{P_n}(f-g)^2$.

Set for each $n \ge 0$ and $\varepsilon > 0$

(1.18) $\mathcal{F}_n'(\varepsilon) = \{(f,g) \in \mathcal{F}\times\mathcal{F} : \sigma_n(f,g) \le \varepsilon\}$,

and for any real valued functional $T$ on $\mathcal{F}$ set

(1.19) $\|T\|_{\mathcal{F}_n'(\varepsilon)} = \sup\{|T(f) - T(g)| : \sigma_n(f,g) \le \varepsilon\}$.
Then our next assumption on $\mathcal{F}$ is:

(F.iii) $(\mathcal{F}, \sigma_0)$ is totally bounded and for all $\tau > 0$

(1.20) $\lim_{\varepsilon\downarrow 0} \limsup_{n\to\infty} P\big(\|\alpha_{k(n)}^{(n)}\|_{\mathcal{F}_n'(\varepsilon)} > \tau\big) = 0$.
We shall denote by

(1.21) $(B_0(f))_{f\in\mathcal{F}}$

a $P_0$-Brownian bridge indexed by $\mathcal{F}$, that is, $B_0$ is a Gaussian process indexed by $\mathcal{F}$ with mean zero and covariance function

(1.22) $\mathrm{Cov}(B_0(f), B_0(g)) = P_0(fg) - P_0fP_0g$, $f,g \in \mathcal{F}$.

We will assume that $B_0(f)$ has uniformly $\sigma_0$-continuous sample paths, whenever such a version of $B_0$ exists. Note, in particular, that this is the case under the above assumptions, since they imply that $\mathcal{F}$ is $P_0$-pregaussian. Let $Z$ be a standard normal random variable independent of $B_0$ and for each $0 \le a \le 1$ introduce the Gaussian process indexed by $\mathcal{F}$,

(1.23) $W(f; a) := B_0(f) + \sqrt{1-a}\,ZP_0(f)$, $f \in \mathcal{F}$.

For use later on we note that we can assume that $B_0$ has been extended to be a Gaussian process indexed by the larger class of all measurable real valued functions $f$ on $\mathbb{R}^d$ such that $P_0f^2 < \infty$, with covariance function given by (1.22). This can be justified using the Kolmogorov consistency theorem. Refer to page 5 of Ibragimov and Rozanov (1978) for details. Thus we can assume that the Gaussian process $W(f;a)$ is also well-defined on this class.
We are now prepared to state our main weak convergence result. As in Sheehy and Wellner (1992), we use the notion of weak convergence in the sense of Hoffmann-Jorgensen, and it will be denoted by the symbol `$\Rightarrow$'.
Theorem 1.1 Under assumptions (A), (S) and (F) the sequence of local empirical processes satisfies

(1.24) $(L_n(f,h_n))_{f\in\mathcal{F}} \Rightarrow (W(f;a))_{f\in\mathcal{F}}$.
Remark 1.1 Corollary 3.1 of Sheehy and Wellner (1992) implies that whenever (S), (F.i) and (F.ii) hold and $\mathcal{F}$ is sparse in the sense of Pollard (1982), then (F.iii) is also satisfied. In particular, each of the classes $\mathcal{F}$ in Examples 2, 3 and 4 is sparse and satisfies (S), (F.i) and (F.ii).

Let $Lx = \log(x \vee e)$ and $LLx = L(Lx)$ for all $x$. We shall now state our strong approximation result.
Theorem 1.2 Assume (S) and (A), where in (A.iii), $a = 0$. Also assume that

(A.iv) $a_n \ge d_n$, where $d_n \searrow 0$, $nd_n \nearrow \infty$ and $nd_n/LLn \to \infty$ as $n \to \infty$.

Further assume (F.ii) and replace (F.i) by

(F.iv) $|f| \le K$ for all $f \in \mathcal{F}$ and some $K \ge 1$.

In addition, assume

(F.v) for each $n \ge 1$ and $m \ge n$, $f(h_n^{-1}\,\cdot)$ and $f(h_m^{-1}h_n\,\cdot) \in \mathcal{F}$;

(F.vi) there exists a sequence of positive constants $(b_n)_{n\ge1}$ such that for all $f,g \in \mathcal{F}$ and $n \ge 1$

(1.25) $\int_{\mathbb{R}^d} f(h_nx)g(h_nx)\,dP_0(x) = b_n^{-1}\int_{\mathbb{R}^d} f(x)g(x)\,dP_0(x)$,

where

(1.26) $a_n/b_n \to 1$ as $n \to \infty$.

Further assume

(F.vii) $\|\alpha_{k(n)}^{(n)}\|_{\mathcal{F}}\big/\sqrt{LLn} \to_P 0$ as $n \to \infty$;

(F.viii) $\mathcal{F}$ is $P_0$-pregaussian.

Then one can construct $X_1, X_2, \dots$ i.i.d. $\mathbb{P}$ and a sequence $W_1, W_2, \dots$ of independent $P_0$-Brownian motions indexed by $\mathcal{F}$, such that with probability one, as $n\to\infty$,

(1.27) $\sup_{f\in\mathcal{F}} \Big|L_n(f,h_n) - \frac{1}{\sqrt{nb_n}}\sum_{i=1}^n W_i(f\circ h_n^{-1})\Big|\Big/\sqrt{LLn} \to 0$.
Notice that due to assumption (F.vi), for each $n \ge 1$

(1.28) $\widetilde W_n := \Big(\sum_{i=1}^n W_i(f\circ h_n^{-1})\Big/\sqrt{nb_n}\Big)_{f\in\mathcal{F}} \stackrel{D}{=} (W_1(f))_{f\in\mathcal{F}}$.

Thus $(\widetilde W_n)_{n\ge1}$ is a sequence of $P_0$-Brownian motions taking values in the separable Banach space $B = C_u(\mathcal{F}, e_0)$, the space of uniformly $e_0$-continuous functions on $\mathcal{F}$. Let $\mathcal{K}$ be the unit ball of the reproducing kernel Hilbert space pertaining to $W_1$. Further, let $B(\mathcal{F})$ denote the class of all bounded functionals on $\mathcal{F}$ equipped with the supremum norm

(1.29) $d(\phi, \psi) = \sup_{f\in\mathcal{F}}|\phi(f) - \psi(f)|$ for $\phi, \psi \in B(\mathcal{F})$.

Our next result gives as a corollary to Theorem 1.2 the compact law of the iterated logarithm for the local empirical process. Its proof is a consequence of the strong approximation (1.27) and the fact that the same result holds for the sequence $\widetilde W_n/\sqrt{2LLn}$, $n \ge 1$, which will be shown in Section 3.
Corollary 1.1 In addition to the assumptions of Theorem 1.2 assume

(F.ix) for all $f,g$ such that $P_0(f-g)^2 < \infty$, $n \ge 1$ and $2n \ge m \ge n$,

(1.30) $\int_{\mathbb{R}^d}\big(f(h_n^{-1}h_m(x)) - g(h_n^{-1}h_m(x))\big)^2\,dP_0(x) \le M\int_{\mathbb{R}^d}(f(x)-g(x))^2\,dP_0(x)$,

where $M > 0$;

(F.x) for every compact set $A \subset \mathbb{R}^d$ and $\varepsilon > 0$ there exists a $q_0 > 1$ such that for all $1 < q < q_0$, with $n_k = [q^k]$,

(1.31) $\max_{n_k \le m \le n_{k+1}} \sup_{x\in A}|x - h_m^{-1}h_{n_k}x| \le \varepsilon$

for all large enough $k$ depending on $A$ and $\varepsilon > 0$. Then with probability one the sequence of processes

(1.32) $\big(L_n(f,h_n)\big/\sqrt{2LLn}\big)_{f\in\mathcal{F}}$

is relatively compact in $B(\mathcal{F})$ with set of limit points equal to $\mathcal{K}$.
Remark 1.2 Whenever $na_n/LLn$ is bounded in $n$, the almost sure limiting behavior of the local empirical process is much different. In this case it is more appropriate to replace the independent Wiener processes by independent Poisson processes to obtain a useful strong approximation. See Deheuvels and Mason (1990, 1994b).
Example 1 (Continued) For each integer $n \ge 1$ set $h_n(x_1,\dots,x_d) = (a_n^{1/d}x_1, \dots, a_n^{1/d}x_d)$, where $a_n \searrow 0$, $na_n \nearrow \infty$ and $na_n/LLn \to \infty$. Also assume that $x + \lambda[r,s]^d \subset [0,1]^d$ for all small enough $\lambda > 0$. Then for all $n$ sufficiently large $P_n = P_0 = $ uniform $[r,s]^d$. Now let $\mathcal{F}$ be a uniformly bounded class of real valued measurable functions on $\mathbb{R}^d$ satisfying

(C.1) $f(\lambda\,\cdot) \in \mathcal{F}$ for all $0 < \lambda \le 1$;

(C.2) $\mathcal{F}$ satisfies (S);

(C.3) $\|\alpha_{k(n)}^{(n)}\|_{\mathcal{F}}\big/\sqrt{LLn} \to_P 0$ as $n \to \infty$;

(C.4) $\mathcal{F}$ is $P_0$-pregaussian.

In particular, (C.1)-(C.4) are satisfied whenever $\mathcal{F} = \{1_C : C \in \mathcal{C}\}$, where $\mathcal{C}$ is a $P_0$-Donsker class of sets satisfying (S) and $\lambda C \in \mathcal{C}$ whenever $C \in \mathcal{C}$ and $0 < \lambda \le 1$. It is trivial to verify that all of the conditions of Theorem 1.2 and Corollary 1.1 hold. Hence Theorem 1.1 of Deheuvels and Mason (1994a) is a special case of our Corollary 1.1. We mention here that Arcones (1994) has formulated a compact law of the iterated logarithm for the local uniform $[0,1]^d$ empirical process indexed by a uniformly bounded class of functions. Also we note that when $d = 1$, $x = 0$ and $\mathcal{F} = \{1_{[0,t]} : 0 \le t \le 1\}$, Example 1 specializes to the uniform $[0,1]$ tail empirical process. Therefore Theorem 1.2 yields the Mason (1988) strong approximation to this process.

We are now going to show that our strong approximation and compact law of the iterated logarithm apply to Examples 2 and 4. We need some facts about
VC graph classes. For the basic definition refer to Gine and Zinn (1984) or Pollard (1984).

Fact 1.1 (Dudley (1978), Alexander (1987)). Let $\phi$ be a monotone function on $\mathbb{R}$ and $G$ be a finite dimensional class of real valued functions defined on a set $S$. Then the class $\{\phi(g) : g \in G\}$ is a VC graph class.

Fact 1.2 (Pollard (1984)). Let $\mathcal{C}$ be a VC class of sets. Then $\{1_C : C \in \mathcal{C}\}$ is a VC graph class.
Let $\mathcal{F}$ be a class of measurable real valued functions on $\mathbb{R}^d$ with envelope function $F$, and let $Q$ be a probability measure on $\mathbb{R}^d$ such that $Q(F^2) < \infty$. For any $u > 0$ let $N(u(Q(F^2))^{1/2}, \mathcal{F}, e_Q)$ be the minimum $m \ge 1$ for which there exist functions $f_1, \dots, f_m$, not necessarily in $\mathcal{F}$, such that for all $f \in \mathcal{F}$, $\min_{1\le i\le m} e_Q(f, f_i) < u(Q(F^2))^{1/2}$, where $e_Q(f, f_i) = (Q(f - f_i)^2)^{1/2}$.

Fact 1.3 (Alexander (1987)). If $\mathcal{F}$ is a VC graph class of measurable real valued functions on $\mathbb{R}^d$ with bounded envelope function $F$, then there exist constants $C > 0$ and $\nu > 0$ such that

(1.33) $N\big(u(Q(F^2))^{1/2}, \mathcal{F}, e_Q\big) \le Cu^{-\nu}$

for all $u > 0$ and probability measures $Q$ on $\mathbb{R}^d$.
Fact 1.4 Suppose $\mathcal{F}_1$ and $\mathcal{F}_2$ are two classes of uniformly bounded measurable real valued functions on $\mathbb{R}^d$ such that for constants $C_1 > 0$, $C_2 > 0$, $\nu_1 > 0$ and $\nu_2 > 0$

(1.34) $N(uM_i, \mathcal{F}_i, e_Q) \le C_iu^{-\nu_i}$, $i = 1, 2$,

for all $u > 0$ and probability measures $Q$ on $\mathbb{R}^d$. Then there exist constants $C_3 > 0$ and $C_4 > 0$ such that

(1.35) $N\big(u(M_1 + M_2), \mathcal{F}_1 + \mathcal{F}_2, e_Q\big) \le C_3u^{-(\nu_1+\nu_2)}$

and

(1.36) $N(uM_1M_2, \mathcal{F}_1\mathcal{F}_2, e_Q) \le C_4u^{-(\nu_1+\nu_2)}$

for all $u > 0$ and probability measures $Q$ on $\mathbb{R}^d$, where $\mathcal{F}_1 + \mathcal{F}_2 = \{f + g : f \in \mathcal{F}_1, g \in \mathcal{F}_2\}$ and $\mathcal{F}_1\mathcal{F}_2 = \{fg : f \in \mathcal{F}_1, g \in \mathcal{F}_2\}$.
Proof. Trivial.

Fact 1.5 Let $K$ be a real valued function of bounded variation on $[a,b]$, $-\infty < a < b < \infty$, and equal to zero on $\mathbb{R}\setminus[a,b]$, and let $\mathcal{C}$ be a VC class of subsets of $\mathbb{R}$. Then for the class of real valued functions

(1.37) $\mathcal{F}_{K,\mathcal{C}} = \{K(xt)1_C(y) : -\infty < t < \infty,\ C \in \mathcal{C}\}$

there exist constants $C > 0$ and $\nu > 0$ such that (1.33) holds for all probability measures $Q$ on $\mathbb{R}^2$.

Proof. Write $K = K_1 - K_2$, where $K_1$ and $K_2$ are two bounded nondecreasing functions on $\mathbb{R}$, and then apply Facts 1.1, 1.2, 1.3 and 1.4.

Our next fact is a special case of Theorem 3.1 of Alexander (1987).
Fact 1.6 Let $\{P_n\}_{n\ge0}$ be a sequence of probability measures on $\mathbb{R}^d$ and $\mathcal{F}$ be a class of measurable real valued functions on $\mathbb{R}^d$ bounded by $M$ such that (S) and (F.ii) hold. Assume there exist constants $C > 0$ and $\nu > 0$ such that for all $n \ge 0$ and $u > 0$

(1.38) $N(uM^{1/2}, \mathcal{F}, e_n) \le Cu^{-\nu}$.

Then for any sequence of integers $m(n) \to \infty$

(1.39) $\big(\alpha_{m(n)}^{(n)}(f)\big)_{f\in\mathcal{F}} \Rightarrow (B_0(f))_{f\in\mathcal{F}}$ as $n \to \infty$.
Example 2 (Continued) Set $h_n(u) = \delta_nu$ for $n \ge 1$, where $\delta_n \searrow 0$, $n\delta_n \nearrow \infty$ and $n\delta_n/LLn \to \infty$ as $n \to \infty$. In this case $P_0$ is uniform $[-1/2, 1/2]$. We now choose

(1.40) $\mathcal{F} = \{K(ut) : t \ge 1\}$.

Since $X_1$ is assumed to have a density continuous and positive in a neighborhood of $x$, it is readily established using Scheffe's theorem that $P_n$ converges to $P_0$ in total variation. Therefore (F.ii) holds. Also one sees that (S) is satisfied. Thus by Fact 1.5 with $\mathcal{C} = \{(-\infty,\infty)\}$ and by Fact 1.6, (1.39) holds, which implies (F.vii) and (F.viii). The rest of the assumptions of Theorem 1.2 and Corollary 1.1 are easily checked. After a little manipulation one infers from Corollary 1.1 that with probability one

(1.41) $\limsup_{n\to\infty}\sqrt{\frac{n\delta_n}{2LLn}}\,\big(\hat g_n(x) - E\hat g_n(x)\big) = \Big(g(x)\int_{-\infty}^{\infty}K^2(u)\,du\Big)^{1/2}$.

This result was first proved in Deheuvels and Mason (1994a).
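To get a feel for the normalization in (1.41), the sketch below (hypothetical names; uniform kernel, standard normal data, and one fixed $n$ rather than a limsup) computes the left-hand quantity; for a single $n$ it is roughly Gaussian with standard deviation smaller than the right-hand constant:

```python
import numpy as np
from math import erf, log, sqrt

n, delta = 100_000, 0.1
rng = np.random.default_rng(5)
X = rng.standard_normal(n)

# Uniform-kernel density estimate at x = 0 and its exact mean under N(0,1):
g_hat = (np.abs(X) <= delta / 2).sum() / (n * delta)
Eg_hat = erf(delta / (2 * sqrt(2))) / delta      # P(|X| <= delta/2) / delta

LLn = log(log(n))
lil_stat = sqrt(n * delta / (2 * LLn)) * (g_hat - Eg_hat)
# Right-hand side of (1.41): sqrt(g(0) * int K^2) with g(0) = 1/sqrt(2*pi),
# and int K^2 = 1 for the uniform kernel.
lil_const = (1.0 / sqrt(2 * np.pi)) ** 0.5
```

For this single $n$ the statistic should comfortably fall below a small multiple of the limiting constant.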
Example 4 (Continued) Set $h_n(u,v) = (\delta_nu, v)$, $n \ge 1$, where $\delta_n \searrow 0$, $n\delta_n \nearrow \infty$ and $n\delta_n/LLn \to \infty$ as $n \to \infty$. Assume that $g_{XY}$ is continuous on $(x - \varepsilon, x + \varepsilon)\times\mathbb{R}$ for some $\varepsilon > 0$ and $g_X(x) > 0$. In this case $P_0 = P_{0,1}\times P_{0,2}$, where $P_{0,1}$ is uniform $[-1/2,1/2]$ and $P_{0,2}(B) = P(Y_1 \in B \mid X_1 = x)$ for $B \in \mathcal{B}$. Choose

(1.42) $\mathcal{F} = \{f_y(ut, v) = K(ut)1(v \in (-\infty, y]) : t \ge 1,\ y \in \mathbb{R}\}$.

By Facts 1.5 and 1.6, (F.vii) and (F.viii) hold. Since $P_n$ converges to $P_0$ in total variation, (F.ii) is also satisfied. It is straightforward to show that the other assumptions of Theorem 1.2 and Corollary 1.1 hold. Here

$L_n(f_y, h_n)\big/\sqrt{2LLn} = \Big(\sum_{i=1}^n 1(Y_i \le y)K((X_i - x)/\delta_n) - na_nP_n(f_y)\Big)\Big/\sqrt{2na_nLLn}$,

and we get that

$\limsup_{n\to\infty}\sup_{y\in\mathbb{R}} L_n(f_y,h_n)\big/\sqrt{2LLn} = \Big(\int_{-\infty}^{\infty}K^2(u)\,du\Big)^{1/2}$, a.s.
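For completeness, the conditional empirical distribution function (1.11) underlying Example 4 can also be sketched directly (hypothetical names; uniform kernel as in the earlier sketches):

```python
import numpy as np

def cond_edf(X, Y, y, x, delta, K):
    """Conditional empirical distribution function (1.11): the
    kernel-weighted fraction of responses Y_i <= y among observations
    with X_i near x."""
    w = K((X - x) / delta)
    return ((Y <= y) * w).sum() / w.sum()

uniform_K = lambda u: 1.0 * (np.abs(u) <= 0.5)

rng = np.random.default_rng(4)
X = rng.uniform(-1.0, 1.0, 100_000)
Y = X + rng.standard_normal(100_000)
F0 = cond_edf(X, Y, y=0.0, x=0.0, delta=0.1, K=uniform_K)  # target: P(Y<=0|X=0)=1/2
F2 = cond_edf(X, Y, y=2.0, x=0.0, delta=0.1, K=uniform_K)
```

As expected of a distribution function estimate, the values are nondecreasing in $y$.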
The remainder of our paper is organized as follows. We first prove in Section 2 a coupling inequality for the empirical process. Our method of proof is somewhat similar to the one employed by Dudley and Philipp (1983). We first establish a coupling inequality for multidimensional random vectors, where we use a result of Zaitsev (1987) on the rate of convergence in the multidimensional central limit theorem in combination with the Strassen-Dudley theorem. We then employ a recent inequality of Talagrand (1994) to approximate the empirical process by a suitable finite dimensional process. In Section 3, we prove the main results. The basic idea is that the local empirical process behaves to some extent like a randomly stopped empirical process (see Proposition 3.1 below), which allows us to reduce the approximation problem to one involving the usual empirical process, which in turn can be solved using our coupling inequality.
2. A USEFUL COUPLING INEQUALITY

In this section we establish a useful coupling inequality for the empirical process, which will be essential for the proof of our strong invariance principle. For probability measures $P$ and $Q$ on the Borel subsets of $\mathbb{R}^d$ and $\lambda > 0$, let

(2.1) $\pi(P, Q; \lambda) := \sup\{P(A) - Q(A^\lambda) : A \subset \mathbb{R}^d\ \mathrm{closed}\}$,

where $A^\lambda$ denotes the closed $\lambda$-neighborhood of $A$,

(2.2) $A^\lambda := \{x \in \mathbb{R}^d : \inf_{y\in A}|x - y| \le \lambda\}$,

with $|\cdot|$ being the Euclidean norm on $\mathbb{R}^d$. Further, let $X_1, \dots, X_m$, $m \ge 1$, be independent mean zero random vectors satisfying for some $M > 0$

(2.3) $|X_i| \le M$, $1 \le i \le m$.

Denote the distribution of $X_1 + \dots + X_m$ by $\mathbb{P}_m$ and let $Q_m$ be the $d$-dimensional normal distribution with mean zero and covariance matrix

(2.4) $\mathrm{cov}(X_1) + \dots + \mathrm{cov}(X_m)$.

The following inequality follows from the work of Zaitsev (1987).

Fact 2.1 For all integers $m \ge 1$,

(2.5) $\pi(\mathbb{P}_m, Q_m; \lambda) \le c_1\exp(-c_2\lambda/M)$,

where $c_1$ and $c_2$ are positive constants depending only on $d$.
Using the Strassen-Dudley theorem (see Dudley (1968)), Fact 2.1 and standard arguments from measure theory such as Lemmas 1.2.2 and 1.2.3 in Dudley (1984), we readily infer the next fact.

Fact 2.2 Let $X_1, \dots, X_m$ be independent mean zero $d$-dimensional random vectors satisfying (2.3). If the underlying probability space $(\Omega, \mathcal{F}, P)$ is rich enough, one can define independent normally distributed mean zero random vectors $V_1, \dots, V_m$ with $\mathrm{cov}(V_i) = \mathrm{cov}(X_i)$, $1 \le i \le m$, such that

(2.6) $P\Big(\Big|\sum_{i=1}^m(X_i - V_i)\Big| \ge \lambda\Big) \le c_1\exp(-c_2\lambda/M)$.
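Fact 2.2 is an existence statement. In one dimension, the flavor of such a coupling can be seen concretely via the classical quantile (inverse-cdf) transform, sketched below for centered Bernoulli sums (hypothetical names; this illustrates the common-probability-space idea, not the actual Strassen-Dudley construction):

```python
import numpy as np
from math import comb, sqrt
from statistics import NormalDist

def quantile_couple(m, p, rng):
    """Couple S = (sum of m Bernoulli(p)) - m*p with a normal V of the
    same variance, by pushing a randomized cdf value of S through the
    normal inverse cdf.  Returns the pair (S, V), typically very close."""
    k = rng.binomial(m, p)
    # randomized cdf value, uniform on [F(k-1), F(k)] given the atom k
    cdf_lo = sum(comb(m, j) * p**j * (1 - p)**(m - j) for j in range(k))
    atom = comb(m, k) * p**k * (1 - p)**(m - k)
    u = cdf_lo + rng.uniform() * atom
    v = NormalDist(0.0, sqrt(m * p * (1 - p))).inv_cdf(u)
    return k - m * p, v

rng = np.random.default_rng(6)
pairs = [quantile_couple(400, 0.5, rng) for _ in range(200)]
max_gap = max(abs(s - v) for s, v in pairs)
mean_gap = sum(abs(s - v) for s, v in pairs) / len(pairs)
```

The realized gaps are of the order of the lattice spacing, far smaller than the scale of the sums themselves.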
The following Bernstein type inequality follows from Doob's maximal inequality and the proof of Bernstein's inequality on page 14 of Dudley (1984).

Fact 2.3 Let $\xi_1, \dots, \xi_m$ be independent mean zero random variables satisfying

(2.7) $|\xi_i| \le M$, $1 \le i \le m$.

Then for all $t \ge 0$

(2.8) $P\Big(\max_{1\le j\le m}\Big|\sum_{i=1}^j \xi_i\Big| \ge t\Big) \le 2\exp\Big(-t^2\Big/\Big(2B_m + \frac{2Mt}{3}\Big)\Big)$,

where $B_m := \sum_{i=1}^m E\xi_i^2$.
Let $|\cdot|_\infty$ be the maximal norm on $\mathbb{R}^d$. Using Facts 2.2 and 2.3 we shall establish the following simple coupling inequality for sums of independent $d$-dimensional random vectors.
Proposition 2.1 Let $X_j = (X_j^{(1)}, \dots, X_j^{(d)})$, $1 \le j \le n$, be independent mean zero random vectors satisfying (2.3) and let $\sigma^2 > 0$ be such that

(2.9) $E\big(X_j^{(i)}\big)^2 \le \sigma^2$, $1 \le i \le d$, $1 \le j \le n$.

Let $1 \le L \le n$ be an integer and $x > 0$ be fixed. If the underlying probability space is rich enough, one can construct independent normally distributed mean zero random vectors $V_1, \dots, V_n$ with $\mathrm{cov}(V_i) = \mathrm{cov}(X_i)$, $1 \le i \le n$, such that

(2.10) $P\Big(\max_{1\le j\le n}\Big|\sum_{i=1}^j(X_i - V_i)\Big|_\infty \ge x\Big) \le c_3(L+1)\big(\exp(-c_4x/(ML)) + \exp(-Lx^2/(64n\sigma^2))\big)$,

where $c_3$ and $c_4$ are positive constants depending only on $d$.
Proof. Set $m = [n/L]$,

(2.11) $U_j := \sum_{i=(j-1)m+1}^{jm} X_i$, $1 \le j \le L$,

and

(2.12) $W_j := \sum_{i=(j-1)m+1}^{jm} V_i$, $1 \le j \le L$,

where $V_1, \dots, V_n$ are constructed using Fact 2.2 to be mean zero independent normally distributed random vectors with $\mathrm{cov}(V_i) = \mathrm{cov}(X_i)$, $1 \le i \le n$, such that

(2.13) $P\big(|U_j - W_j| \ge x/2L\big) \le c_1\exp(-c_2x/2ML)$

for $1 \le j \le L$. Since $|x|_\infty \le |x|$, $x \in \mathbb{R}^d$, (2.13) clearly implies that for $1 \le j \le L$,

(2.14) $P\big(|U_j - W_j|_\infty \ge x/2L\big) \le c_1\exp(-c_2x/2ML)$.

It follows that

(2.15) $P\Big(\max_{1\le k\le n}\Big|\sum_{i=1}^k(X_i - V_i)\Big|_\infty \ge x\Big) \le c_1L\exp(-c_2x/2ML) + \beta_{L,1} + \beta_{L,2}$,

where

$\beta_{L,1} := P\Big(\max_{0\le j\le L}\max_{0\le k\le m}\Big|\sum_{i=jm+1}^{jm+k}X_i\Big|_\infty \ge x/4\Big)$

and

$\beta_{L,2} := P\Big(\max_{0\le j\le L}\max_{0\le k\le m}\Big|\sum_{i=jm+1}^{jm+k}V_i\Big|_\infty \ge x/4\Big)$.

To bound $\beta_{L,1}$ we note that

$\beta_{L,1} \le \sum_{j=0}^{L}\sum_{i=1}^{d} P\Big(\max_{0\le k\le m}\Big|\sum_{l=jm+1}^{jm+k}X_l^{(i)}\Big| \ge x/4\Big)$,

which, in turn, by Fact 2.3 is

$\le 2d(L+1)\exp\big(-x^2/\big(32(m\sigma^2 + Mx/6)\big)\big) \le 2d(L+1)\big(\exp(-x^2/(64m\sigma^2)) + \exp(-3x/(ML))\big)$.

Using the fact that the $V_i$'s have a symmetric distribution, along with a standard exponential inequality for the tail probabilities of the normal distribution, we get similarly

$\beta_{L,2} \le 2d(L+1)\exp(-x^2/(32m\sigma^2))$.

Setting $c_3 = c_1\vee(4d)$ and $c_4 = (c_2/2)\wedge 3$ completes the proof of (2.10), since $m \le n/L$.

Recall definitions (1.3), (1.12), (1.16) and (1.18). The following inequality is readily inferred from Theorem 3.5 of Talagrand (1994).
Fact 2.4 Assume that $\mathcal{F}$ satisfies (S) and for some $K > 0$

(2.16) $|f| \le K$, $f \in \mathcal{F}$.

Then there exists an absolute constant $A > 0$ such that for all $t > 0$, $\varepsilon > 0$, $m \ge 1$ and $n \ge 1$,

(2.17) $P\big(\|\sqrt{m}\,\alpha_m^{(n)}\|_{\mathcal{F}_n'(\varepsilon)} \ge t + AE\|\sqrt{m}\,\alpha_m^{(n)}\|_{\mathcal{F}_n'(\varepsilon)}\big) \le \exp\big(-t^2/(mA^2\varepsilon^2)\big) + \exp\big(-t/(KA)\big)$.

Proof. To see (2.17) for $K = 1$, we note that an inspection of Talagrand's proof shows that his Theorem 3.5 is also valid if we replace his $H$ by

(2.18) $H := E\Big\|\sum_{i=1}^m \eta_i\big(f(Y_i^{(n)}) - P_n(f)\big)\Big\|_{\mathcal{G}'}$,

where the $\eta_i$ are independent Rademacher random variables and $\mathcal{G}' = \{(f+2)/4 : f \in \mathcal{F}_n'(\varepsilon)\}$. Noticing that

(2.19) $H \le 2E\|\sqrt{m}\,\alpha_m^{(n)}\|_{\mathcal{F}_n'(\varepsilon)}$,

we obtain (2.17) after some straightforward manipulation.

Using a well known inequality for Gaussian random variables (see, e.g., Lemma 3.1 of Ledoux and Talagrand (1991)), we readily obtain the following inequality.
Fact 2.5 Assume that for a probability measure $P_0$ the class $\mathcal{F}$ is $P_0$-pregaussian, and let for $f,g \in \mathcal{F}$

(2.20) $\sigma_0^2(f,g) = \mathrm{Var}_{P_0}(f-g)$.

Let $B, B_1, B_2, \dots$ be independent $P_0$-Brownian bridges indexed by $\mathcal{F}$. For all $t > 0$, $\varepsilon > 0$ and $m \ge 1$

(2.21) $P\Big(\Big\|\sum_{j=1}^m B_j\Big\|_{\mathcal{F}'(\varepsilon)} \ge 2\sqrt{m}\,E\|B\|_{\mathcal{F}'(\varepsilon)} + t\Big) \le \exp\big(-t^2/(2m\varepsilon^2)\big)$,

where

(2.22) $\mathcal{F}'(\varepsilon) = \{f - g : \sigma_0(f,g) \le \varepsilon\}$.
We can now state and prove the main result of this section.

Proposition 2.2 Let $\mathcal{F}$ be a class of real valued functions satisfying (S) and (2.16), and let $\{P_n\}_{n\ge1}$ be a sequence of probability measures on $(\mathbb{R}^d, \mathcal{B})$ for which (F.ii) holds for a probability measure $P_0$. Assume further that $\mathcal{F}$ is $P_0$-pregaussian. Then given any $0 < \varepsilon < 1$ and sequence of positive integers $\{k_n\}_{n\ge1}$ there exists an $n(\varepsilon)$ such that for every $n \ge n(\varepsilon)$ and $u > 0$ one can construct empirical processes $(\alpha_m^{(n)}(f))_{f\in\mathcal{F}}$, $1 \le m \le k_n$, and independent $P_0$-Brownian bridges $(B_m(f))_{f\in\mathcal{F}}$, $1 \le m \le k_n$, such that

(2.23) $P\Big(\max_{1\le m\le k_n}\Big\|\sqrt{m}\,\alpha_m^{(n)} - \sum_{i=1}^m B_i\Big\|_{\mathcal{F}} \ge u + \Delta_{n,k_n}(\varepsilon)\Big) \le K_1\big(\exp(-K_2u/K) + \exp(-u^2/(k_nA_0^2\varepsilon^2K^2))\big)$,

where

(2.24) $\Delta_{n,k_n}(\varepsilon) := A_1\sqrt{k_n}\big(E\|B\|_{\mathcal{F}'(2\varepsilon)} + E\|\alpha_{k_n}^{(n)}\|_{\mathcal{F}_n'(\varepsilon)}\big)$,

$K_1 = K_1(\varepsilon)$, $K_2 = K_2(\varepsilon)$ are constants depending on $\varepsilon$ only, and $A_0$ and $A_1$ are absolute constants.
Proof. First note that since $\mathcal{F}$ is $P_0$-pregaussian, it is totally bounded with respect to $\sigma_0$. Thus for any $0 < \varepsilon < 1$ we can find a subclass $\{f_1, \dots, f_r\}$ of $\mathcal{F}$, where $r$ depends on $\varepsilon$, such that for all $n \ge n_1(\varepsilon)$, for some $n_1(\varepsilon)$,

(2.25) $\min_{1\le i\le r}\sigma_n(f, f_i) \le \varepsilon$ for all $f \in \mathcal{F}$.

So if the sequence $(\alpha_m^{(n)}(f))_{f\in\mathcal{F}}$, $1 \le m \le k_n$, $n \ge n_1(\varepsilon)$, is given, we set for $1 \le j \le k_n$,

$X_j^{(i)} = f_i(Y_j^{(n)}) - P_n(f_i)$, $i = 1, \dots, r$,

and we clearly have for $1 \le m \le k_n$

(2.26) $\big(\sqrt{m}\,\alpha_m^{(n)}(f_1), \dots, \sqrt{m}\,\alpha_m^{(n)}(f_r)\big) = \Big(\sum_{i=1}^m X_i^{(1)}, \dots, \sum_{i=1}^m X_i^{(r)}\Big)$.

Using Proposition 2.1 we first define normally distributed random vectors $V_i$, $1 \le i \le k_n$, such that

(2.27) $\rho_1 := P\Big(\max_{1\le m\le k_n}\Big|\sum_{i=1}^m(X_i - V_i)\Big|_\infty > \frac{u}{4}\Big) \le c_3(r)(1 + \varepsilon^{-2})\big(\exp(-c_4(r)\varepsilon^2u/(4K\sqrt{r})) + \exp(-u^2/(1025k_n\varepsilon^2K^2))\big)$,

where we apply (2.10) with $\sigma = K$, $L = [\varepsilon^{-2}]$ and $M = \sqrt{r}K$. Next, let $\Gamma = \mathrm{cov}(B(f_1), \dots, B(f_r))$ and $\Gamma_n = \mathrm{cov}(X_1^{(1)}, \dots, X_1^{(r)})$. We can assume that each

$\big(V_i^{(1)}, \dots, V_i^{(r)}\big) = \Gamma_n^{1/2}Z_i$,

where $Z_1, \dots, Z_{k_n}$ are independent standard normal $r$-vectors. Set

(2.28) $W_i = \Gamma^{1/2}Z_i$, $i = 1, \dots, k_n$.

Clearly $(W_i^{(1)}, \dots, W_i^{(r)}) \stackrel{d}{=} (B_i(f_1), \dots, B_i(f_r))$, $i = 1, \dots, k_n$, where $B_1, \dots, B_{k_n}$ are independent $P_0$-Brownian bridges. Therefore, without loss of generality, we can assume that

(2.29) $(W_i^{(1)}, \dots, W_i^{(r)}) = (B_i(f_1), \dots, B_i(f_r))$, $1 \le i \le k_n$.

Also, since $\Gamma_n \to \Gamma$ as $n \to \infty$, we have by using symmetry of the $Z_i$'s and a standard bound on the tail of a normal distribution that for all large $n$

(2.30) $\rho_2 := P\Big(\max_{1\le m\le k_n}\Big|\sum_{i=1}^m(V_i - W_i)\Big|_\infty \ge \frac{u}{4}\Big) \le \exp(-u^2/2k_n)$ for all $u > 0$ and $n \ge n_2(\varepsilon)$.

It is easy to see now that for all $n \ge n(\varepsilon) = n_1(\varepsilon)\vee n_2(\varepsilon)$ large enough so that $\mathcal{F}_n'(\varepsilon) \subset \mathcal{F}'(2\varepsilon)$ (by (1.15)),

(2.31) $P\Big(\max_{1\le m\le k_n}\Big\|\sqrt{m}\,\alpha_m^{(n)} - \sum_{i=1}^m B_i\Big\|_{\mathcal{F}} \ge u + \Delta_{n,k_n}(\varepsilon)\Big)$
$\le \rho_1 + \rho_2 + P\Big(\max_{1\le m\le k_n}\|\sqrt{m}\,\alpha_m^{(n)}\|_{\mathcal{F}_n'(\varepsilon)} \ge \frac{u}{4} + A_1\sqrt{k_n}\,E\|\alpha_{k_n}^{(n)}\|_{\mathcal{F}_n'(\varepsilon)}\Big)$
$\quad + P\Big(\max_{1\le m\le k_n}\Big\|\sum_{i=1}^m B_i\Big\|_{\mathcal{F}'(2\varepsilon)} \ge \frac{u}{4} + A_1\sqrt{k_n}\,E\|B\|_{\mathcal{F}'(2\varepsilon)}\Big) =: \rho_1 + \rho_2 + \rho_3 + \rho_4$.

To bound $\rho_3$, we note by the Ottaviani inequality (see, for instance, Lemma 3.2.7 of Dudley (1984)),

(2.32) $\rho_3 \le 2P\Big(\|\sqrt{k_n}\,\alpha_{k_n}^{(n)}\|_{\mathcal{F}_n'(\varepsilon)} \ge \frac{u}{4} + (A_1 - 2)\sqrt{k_n}\,E\|\alpha_{k_n}^{(n)}\|_{\mathcal{F}_n'(\varepsilon)}\Big)$,

where we use the fact that by Jensen's inequality

$\sqrt{m}\,E\|\alpha_m^{(n)}\|_{\mathcal{F}_n'(\varepsilon)} \le \sqrt{k_n}\,E\|\alpha_{k_n}^{(n)}\|_{\mathcal{F}_n'(\varepsilon)}$, $1 \le m \le k_n$.

Now applying (2.17), we get

(2.33) $\rho_3 \le 2\exp\big(-u^2/(16k_nA^2\varepsilon^2)\big) + 2\exp\big(-u/(4KA)\big)$,

provided we have chosen $A_1 \ge A + 2$. Finally, using Fact 2.5 along with Levy's inequality, Lemma 3.2.11 of Dudley (1984), we find that

(2.34) $\rho_4 \le 2\exp\big(-u^2/(128k_n\varepsilon^2)\big)$,

provided of course we have chosen $A_1 \ge 2$. Combining (2.31), (2.30), (2.32), (2.33) and (2.34) we obtain (2.23).
3. PROOFS OF MAIN RESULTS

In our proofs we shall make repeated use of the following proposition. Given any integer $n \ge 1$ and invertible transformation $h_n: \mathbb{R}^d \to \mathbb{R}^d$, let $Y_1^{(n)}, Y_2^{(n)}, \dots$ be i.i.d. $P_n$. Set for $f \in \mathcal{F}$, $n \ge 1$ and $j \ge 1$

(3.1) $S_j^{(n)}(f) = \sum_{i=1}^j f\big(h_n^{-1}(X_i - x)\big) - ja_nP_n(f)$

and

(3.2) $T_j^{(n)}(f) = \sqrt{j}\,\alpha_j^{(n)}(f)$.

Further, independently of $Y_1^{(n)}, Y_2^{(n)}, \dots$, let $\varepsilon_1, \varepsilon_2, \dots$ be i.i.d. Bernoulli($a_n$) random variables and set $\nu(j) = \varepsilon_1 + \dots + \varepsilon_j$, $j \ge 1$.

Proposition 3.1 With the above notation and assumptions, for all $n \ge 1$

(3.3) $\big(S_j^{(n)}(f)\big)_{j\ge1} \stackrel{d}{=} \big(T_{\nu(j)}^{(n)}(f) + (\nu(j) - ja_n)P_n(f)\big)_{j\ge1}$

as vectors indexed by $\mathcal{F}$.
Proof. Let $Y_1^{(n)}, Y_2^{(n)}, \dots$ be i.i.d. $P_n$ and $\zeta_1, \zeta_2, \dots$ be i.i.d. $Q_n$, independent of the $Y_i^{(n)}$'s, where

(3.4) $Q_n(B) = \mathbb{P}\big(B \cap A_n^c\big)\big/\mathbb{P}\big(A_n^c\big)$

for all $B \in \mathcal{B}$. Further let $\varepsilon_1, \varepsilon_2, \dots$ be i.i.d. Bernoulli($a_n$) random variables independent of the $Y_i^{(n)}$'s and $\zeta_i$'s. Then it is readily checked that the random variables $X_j$ defined for any $j \ge 1$ to be

(3.5) $X_j = \big(h_nY_{\nu(j)}^{(n)} + x\big)1(\varepsilon_j = 1) + \zeta_{j-\nu(j)}1(\varepsilon_j = 0)$

are i.i.d. $\mathbb{P}$, from which we see that

$\big(S_j^{(n)}(f)\big)_{j\ge1} \stackrel{d}{=} \Big(\sum_{i=1}^{\nu(j)} f\big(Y_i^{(n)}\big) - ja_nP_n(f)\Big)_{j\ge1}$.
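The interleaving construction (3.5) is easy to simulate. The toy sketch below (hypothetical names; $\mathbb{P}$ uniform on $[0,1]$ and $A_n = [0, a_n]$, so that $P_n$ and $Q_n$ become uniform on $[0,a_n]$ and $[a_n,1]$ respectively) checks that the interleaved sample is again distributed like $\mathbb{P}$:

```python
import numpy as np

def thinned_sample(n, a, rng):
    """Build X_1..X_n as in (3.5): take the nu(j)-th 'local' point when
    eps_j = 1 and the (j - nu(j))-th 'outside' point otherwise."""
    eps = rng.random(n) < a           # i.i.d. Bernoulli(a_n) indicators
    Y = rng.uniform(0.0, a, n)        # i.i.d. P_n  (uniform on [0, a])
    Zeta = rng.uniform(a, 1.0, n)     # i.i.d. Q_n  (uniform on [a, 1])
    nu = np.cumsum(eps)               # nu(j) = eps_1 + ... + eps_j
    j = np.arange(1, n + 1)
    # indices are 1-based in (3.5); the unselected branch of np.where may
    # momentarily form index -1, which is harmless since it is never used
    return np.where(eps, Y[nu - 1], Zeta[j - nu - 1])

rng = np.random.default_rng(7)
X = thinned_sample(200_000, 0.3, rng)
frac_local = (X <= 0.3).mean()        # should be close to a_n = 0.3
mean_X = X.mean()                     # should be close to 1/2
```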
Proof of Theorem 1.1. First, by Proposition 3.1, for each integer $n \ge 1$,

(3.6) $\{L_n(f,h_n) : f \in \mathcal{F}\} \stackrel{d}{=} \Big\{\big(T_{\nu(n)}^{(n)}(f) + (\nu(n) - na_n)P_n(f)\big)\Big/\sqrt{na_n} : f \in \mathcal{F}\Big\}$.

Using this representation, one readily checks that in order to prove the theorem it is enough to show that for all $c > 0$, as $n \to \infty$,

(3.10) $\max_{|k(n)-m|\le c\sqrt{k(n)}}\big\|\sqrt{k(n)}\,\alpha_{k(n)}^{(n)} - \sqrt{m}\,\alpha_m^{(n)}\big\|_{\mathcal{F}}\Big/\sqrt{na_n} \to_P 0$.

To establish (3.10), in turn, it is enough to verify that for all $c > 0$, as $n \to \infty$,

(3.11) $\max_{1\le m\le c\sqrt{k(n)}}\|\sqrt{m}\,\alpha_m^{(n)}\|_{\mathcal{F}}\Big/\sqrt{k(n)} \to_P 0$.

This will be accomplished by a number of lemmas.
Lemma 3.1 Under assumptions (A), (S) and (F) there exists a constant $M$ such that for all $n \ge 1$

(3.12) $E\|\alpha_{k(n)}^{(n)}\|_{\mathcal{F}}^2 \le M$.

Proof. Let $\widetilde Y_1, \dots, \widetilde Y_{k(n)}$ be independent copies of $Y_1^{(n)}, \dots, Y_{k(n)}^{(n)}$ with corresponding empirical process $\widetilde\alpha_{k(n)}^{(n)}$. Since $E\alpha_{k(n)}^{(n)}(f) = 0$ for all $f \in \mathcal{F}$, by Jensen's inequality it is enough to show that there exists a constant $M > 0$ such that for all $n \ge 1$

(3.13) $E\|\alpha_{k(n)}^{(n)} - \widetilde\alpha_{k(n)}^{(n)}\|_{\mathcal{F}}^2 =: E_n(\mathcal{F}) < M$.

Set

$t_n := \inf\big\{t > 0 : P\big(\|\alpha_{k(n)}^{(n)} - \widetilde\alpha_{k(n)}^{(n)}\|_{\mathcal{F}} > t\big) \le 1/72\big\}$.

Applying the Hoffmann-Jorgensen inequality, cf. Ledoux and Talagrand (1991), page 156, we get

$E_n(\mathcal{F}) \le 18\Big(E\max_{1\le i\le k(n)}\|f(Y_i^{(n)}) - f(\widetilde Y_i^{(n)})\|_{\mathcal{F}}^2\Big/k(n) + t_n^2\Big)$,

which, in turn, is

$\le 18\Big(E\big(\max_{1\le i\le k(n)} Z_{i,n}^2\big)\Big/k(n) + t_n^2\Big)$,

where $Z_{i,n} := F(Y_i^{(n)}) + F(\widetilde Y_i^{(n)})$, $i = 1, \dots, k(n)$. By assumption (F.i), we have $EZ_{i,n}^2 \le \widetilde M$, $1 \le i \le k(n)$, for some $\widetilde M < \infty$, and it follows that

(3.14) $E\max_{1\le i\le k(n)} Z_{i,n}^2\Big/k(n) \le \sum_{i=1}^{k(n)} EZ_{i,n}^2\Big/k(n) \le \widetilde M$.

Finally, noting that by (1.15) and (1.20), $t_n$ is bounded, we obtain (3.12).
Lemma 3.2 Let $\Lambda_n(\varepsilon)$, $0 < \varepsilon < 1$, $n \ge 1$, be a set of non-negative random variables such that for all $\tau > 0$

(3.15) $\lim_{\varepsilon\downarrow0}\limsup_{n\to\infty} P\big(\Lambda_n(\varepsilon) > \tau\big) = 0$

and for some constant $R$

(3.16) $E\Lambda_n^2(\varepsilon) \le R$ for all $n \ge 1$ and $0 < \varepsilon < 1$.

Then

(3.17) $\lim_{\varepsilon\downarrow0}\limsup_{n\to\infty} E\Lambda_n(\varepsilon) = 0$.

Proof. The proof is trivial. Note that for any $\tau > 0$

$E\Lambda_n(\varepsilon) \le \tau + R^{1/2}\big(P(\Lambda_n(\varepsilon) > \tau)\big)^{1/2}$.
Lemma 3.3 Let $m(n)$ be any sequence of positive integers such that

(3.18) $m(n)/k(n) \to 0$ as $n \to \infty$.

Then

(3.19) $\lim_{n\to\infty}\max_{1\le m\le m(n)} E\|\sqrt{m}\,\alpha_m^{(n)}\|_{\mathcal{F}}\Big/\sqrt{k(n)} = 0$.

Proof. By (F.ii) and (F.iii), for each $0 < \varepsilon < 1$ there exist $f_1, \dots, f_{r(\varepsilon)} \in \mathcal{F}$ such that for all $1 \le m \le m(n)$

(3.20) $E\|\sqrt{m}\,\alpha_m^{(n)}\|_{\mathcal{F}}\Big/\sqrt{k(n)} \le \sum_{i=1}^{r(\varepsilon)} E\big|\sqrt{m}\,\alpha_m^{(n)}(f_i)\big|\Big/\sqrt{k(n)} + E\Lambda_{n,m}(\varepsilon)$,

where

$\Lambda_{n,m}(\varepsilon) = \|\sqrt{m}\,\alpha_m^{(n)}\|_{\mathcal{F}_n'(\varepsilon)}\Big/\sqrt{k(n)}$.

Now for all $n$ large enough so that $m(n) \le k(n)$, we have by Jensen's inequality for all $1 \le m \le m(n)$

(3.21) $E\Lambda_{n,m}(\varepsilon) \le E\Lambda_n(\varepsilon)$, where $\Lambda_n(\varepsilon) = \|\alpha_{k(n)}^{(n)}\|_{\mathcal{F}_n'(\varepsilon)}$.

Further, by Cauchy-Schwarz and (F.i), for all $1 \le m \le m(n)$

(3.22) $E\big|\sqrt{m}\,\alpha_m^{(n)}(f_i)\big|\Big/\sqrt{k(n)} \le \frac{\sqrt{m(n)}}{\sqrt{k(n)}}\sqrt{P_n(F^2)}$ for $i = 1, \dots, r(\varepsilon)$.

Noting that

(3.23) $E\Lambda_n^2(\varepsilon) \le 4E\|\alpha_{k(n)}^{(n)}\|_{\mathcal{F}}^2$,

we see that (3.19) follows from (3.21), (3.22), (3.23) and Lemmas 3.1 and 3.2.

The proof of (3.11) is now an easy consequence of Lemma 3.3 and Ottaviani's inequality.
Proof of Theorem 1.2. For notational convenience we assume $K = 1$ in (F.iv). First we record an inequality for the empirical process which is obtained by combining Lemmas 3.2.1 and 3.2.7 of Dudley (1984) with his Remark 3.2.5.

Fact 3.1 We have for all $x \ge 0$ and $n \ge 1$

(3.24) $P\Big(\max_{1\le k\le n}\|T_k^{(n)}\|_{\mathcal{F}} \ge x + 3E\|T_n^{(n)}\|_{\mathcal{F}}\Big) \le 2\big(\exp(-x^2/12n) + \exp(-x/4)\big)$.
Lemma 3.4 For any $x \ge 0$ and $na_n \ge 1$

(3.25) $P\Big(\max_{1\le k\le n}\big\|T_{\nu(k)}^{(n)} - T_{[ka_n]}^{(n)}\big\|_{\mathcal{F}} \ge x + \gamma_n\Big) \le 4na_n\Big(\exp\Big(\frac{-x^2}{192\sqrt{na_nLLn}}\Big) + \exp(-x/8)\Big) + 2(Ln)^{-1.5} + 2\exp\big(-2\sqrt{na_nLLn}\big)$,

where $\gamma_n := 12E\|T_{l_n}^{(n)}\|_{\mathcal{F}}$, $l_n := 4\sqrt{na_nLLn}$ and $\nu(k)$, $1 \le k \le n$, is as in Proposition 3.1.

Proof. Obviously

(3.26) $P\Big(\max_{1\le k\le n}\big\|T_{\nu(k)}^{(n)} - T_{[ka_n]}^{(n)}\big\|_{\mathcal{F}} \ge x + \gamma_n\Big) \le P\Big(\max_{1\le k\le n}|\nu(k) - ka_n| \ge 3\sqrt{na_nLLn}\Big) + P\Big(\max_{1\le k\le[na_n]}\max_{|l-k|\le l_n}\big\|T_l^{(n)} - T_k^{(n)}\big\|_{\mathcal{F}} \ge x + \gamma_n\Big) =: \rho_{n,1} + \rho_{n,2}$.

Using Fact 2.3 we see that

(3.27) $\rho_{n,1} \le 2(Ln)^{-1.5} + 2\exp\big(-2\sqrt{na_nLLn}\big)$.

Next observe that

(3.28) $\rho_{n,2} \le 2na_nP\Big(\max_{1\le k\le 2l_n}\|T_k^{(n)}\|_{\mathcal{F}} \ge \frac{x}{2} + \frac{\gamma_n}{2}\Big)$.

Since we have

(3.29) $E\|T_{2l_n}^{(n)}\|_{\mathcal{F}} \le 2E\|T_{l_n}^{(n)}\|_{\mathcal{F}}$,

we get via Fact 3.1

(3.30) $\rho_{n,2} \le 4na_n\Big(\exp\Big(-\frac{x^2}{48l_n}\Big) + \exp\Big(-\frac{x}{8}\Big)\Big)$.

Combining this with our bound for $\rho_{n,1}$, we get (3.25).

From Lemma 3.4 and Proposition 2.2 we shall derive the following crucial lemma.
Lemma 3.5 Given $0 < \varepsilon < 1$, there exists an $m(\varepsilon)$ such that for each $n \ge m(\varepsilon)$ one can construct independent Brownian bridges $B_j$, $1 \le j \le [na_n]$, satisfying

(3.31) $P\Big(\max_{1\le k\le n}\Big\|T_{\nu(k)}^{(n)} - \sum_{j=1}^{[ka_n]} B_j\Big\|_{\mathcal{F}} \ge 2A_0\varepsilon\sqrt{na_nLLn} + \gamma_n + \Delta_{n,[na_n]}(\varepsilon)\Big) \le K_3\big\{na_n\exp\big(-K_4\varepsilon\sqrt{na_nLLn}\big) + (Ln)^{-1.5}\big\}$,

where $K_3$ and $K_4$ are constants depending only on $\varepsilon$.

Proof. Just use the inequality

$P\Big(\max_{1\le k\le n}\Big\|T_{\nu(k)}^{(n)} - \sum_{i=1}^{[ka_n]} B_i\Big\|_{\mathcal{F}} \ge 2A_0\varepsilon\sqrt{na_nLLn} + \gamma_n + \Delta_{n,[na_n]}(\varepsilon)\Big)$
$\le P\Big(\max_{1\le k\le n}\big\|T_{\nu(k)}^{(n)} - T_{[ka_n]}^{(n)}\big\|_{\mathcal{F}} \ge \frac{A_0}{2}\varepsilon\sqrt{na_nLLn} + \gamma_n\Big) + P\Big(\max_{1\le k\le n}\Big\|T_{[ka_n]}^{(n)} - \sum_{i=1}^{[ka_n]} B_i\Big\|_{\mathcal{F}} \ge \frac{3A_0}{2}\varepsilon\sqrt{na_nLLn} + \Delta_{n,[na_n]}(\varepsilon)\Big)$

and apply Lemma 3.4 and Proposition 2.2 (with $k_n = [na_n]$).

Remark 3.1 From the proof it is clear that we can choose the Brownian bridges $B_j$, $1 \le j \le [na_n]$, independent of $\nu(k)$, $1 \le k \le n$, which we will also assume from now on.
Next, let K (t; f ); t 0; f 2 F be a Kiefer process indexed by F such that
Xl
K (l; f ) =
(3:32)
i=1
Bi (f ); f 2 F ; 1 l [nan ]:
We can assume that K is independent of (k); 1 k n:
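For the reader's convenience we record the covariance structure that drives the increment computations below; this is a standard fact about Kiefer processes, not part of the original argument:

```latex
% Covariance of a Kiefer process indexed by F:
E\,K(s,f)\,K(t,g) \;=\; (s\wedge t)\,\big(P_0(fg)-P_0(f)\,P_0(g)\big),
% so K has independent increments in t, and for each fixed t > 0
% the process f -> K(t,f)/\sqrt{t} is a Brownian bridge indexed by F.
```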
Lemma 3.6. For each $n\ge1$ and $x>0$

$$(3.33)\qquad P\Big\{\max_{1\le k\le n}\|K(ka_n,f)-K([ka_n],f)\|_{\mathcal F}\ge x+2E\|B\|_{\mathcal F}\Big\}\le(2na_n+2)\exp\big(-x^2/2\big).$$

Proof. It is easy to see that
$$\max_{1\le k\le n}\|K(ka_n,f)-K([ka_n],f)\|_{\mathcal F}\le\max_{0\le k\le[na_n]}\ \max_{0\le y\le1}\|K(k+y,f)-K(k,f)\|_{\mathcal F}.$$
Since the Kiefer process $K$ has stationary independent increments, we clearly have that the above probability is
$$\le\big([na_n]+1\big)\,P\Big\{\max_{0\le y\le1}\|K(y,f)\|_{\mathcal F}\ge x+2E\|B\|_{\mathcal F}\Big\},$$
which by Lévy's inequality is
$$\le2(na_n+1)\,P\big\{\|B_1\|_{\mathcal F}\ge x+2E\|B_1\|_{\mathcal F}\big\},$$
from which the assertion follows, using Lemma 3.1 of Ledoux and Talagrand (1991). $\square$

Setting for any $n\ge1$
$$\bar B_j(f)=\frac{1}{\sqrt{a_n}}\big(K(ja_n,f)-K((j-1)a_n,f)\big),\qquad 1\le j\le n,$$
we obtain independent Brownian bridges which are also independent of $\nu(k)$, $1\le k\le n$. Combining Lemmas 3.5 and 3.6 we get the following essential result.
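A routine check (ours, assuming the Kiefer covariance recalled above) shows why $1/\sqrt{a_n}$ is the right normalization:

```latex
% By independent increments of K, the increment over ((j-1)a_n, ja_n]
% has covariance a_n (P_0(fg) - P_0(f)P_0(g)); hence
E\,\bar B_j(f)\,\bar B_j(g)
 \;=\; \frac{1}{a_n}\,a_n\big(P_0(fg)-P_0(f)\,P_0(g)\big)
 \;=\; P_0(fg)-P_0(f)\,P_0(g),
% the Brownian-bridge covariance; increments over disjoint
% intervals are independent, giving independent bridges \bar B_j.
```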
Proposition 3.2. Given $0<\varepsilon<1$, there exists an $m(\varepsilon)$ such that for each $n\ge m(\varepsilon)$, one can construct independent Brownian bridges $\bar B_j$, $1\le j\le n$, which are independent of $\nu(k)$, $1\le k\le n$, such that

$$P\Big\{\max_{1\le k\le n}\Big\|S^{(n)}_k-\sqrt{a_n}\sum_{j=1}^{k}\bar B_j\Big\|_{\mathcal F}\ge 3A_0\sqrt{na_nLLn}+\delta_{n,[na_n]}(\varepsilon)+\beta_n+2E\|B\|_{\mathcal F}\Big\}\le K_5\,na_n\exp\big(-K_6\sqrt{na_nLLn}\big)+(Ln)^{-1.5},$$

where $K_5$ and $K_6$ are positive constants depending on $\varepsilon$ only.
We now approximate $\nu(k)$, $1\le k\le n$, for which we use once more Proposition 2.1. Employing more sophisticated strong approximation techniques it would be possible to obtain much better inequalities; however, the subsequent Lemma 3.7 will be sufficient for our purposes.
Lemma 3.7. Given $0<\varepsilon<1$ there exists an $m(\varepsilon)$ such that for each $n\ge m(\varepsilon)$, one can construct independent standard normal random variables $Z_i$, $1\le i\le n$, such that

$$P\Big\{\max_{1\le k\le n}\Big|\nu(k)-ka_n-\sqrt{a_n(1-a_n)}\sum_{i=1}^{k}Z_i\Big|\ge10\sqrt{na_nLLn}\Big\}\le K_7\exp\big(-K_8\sqrt{na_nLLn}\big)+(Ln)^{-1.5},$$

where $K_7$, $K_8$ are positive constants.
Proof. Apply Proposition 2.1 with $p=a_n$, $M=1$ and $L=\varepsilon^{-2}$. $\square$
Remark 3.2. It is clear that we can choose the $Z_i$'s independent of the Brownian bridges $\bar B_i$, $1\le i\le n$, defined in Proposition 3.2.

Lemma 3.8. We have for all $n\ge1$ and $x>0$
$$P\Big\{\max_{1\le k\le n}\Big|\big(\sqrt{a_n(1-a_n)}-\sqrt{a_n}\big)\sum_{i=1}^{k}Z_i\Big|\ge x\Big\}\le 4\exp\big(-x^2/2na_n^3\big).$$

Proof. By Lévy's inequality the above probability is bounded above by
$$2P\Big\{\Big|\big(\sqrt{a_n(1-a_n)}-\sqrt{a_n}\big)\sum_{i=1}^{n}Z_i\Big|\ge x\Big\},$$
which by a standard exponential inequality is
$$\le 4\exp\Big(\frac{-x^2}{2na_n\big(1-\sqrt{1-a_n}\big)^2}\Big).$$
Using the trivial fact that $|1-\sqrt{1-a_n}|\le a_n$, we can finish the proof. $\square$

Now set
$$(3.34)\qquad \overline W_i(f):=\bar B_i(f)+Z_iP_0(f),\qquad f\in\mathcal F,\ 1\le i\le n.$$
Due to the independence of $Z_i$ and $\bar B_i$, we obtain in this way independent $P_0$-Brownian motions indexed by $\mathcal F$.
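The assertion that (3.34) produces $P_0$-Brownian motions is a one-line covariance computation; we record it here as a sketch, since it is not spelled out in the text:

```latex
% With \bar B_i a Brownian bridge and Z_i standard normal, independent:
E\,\overline W_i(f)\,\overline W_i(g)
  \;=\; E\,\bar B_i(f)\bar B_i(g) + P_0(f)\,P_0(g)\,E Z_i^2
  \;=\; \big(P_0(fg)-P_0(f)P_0(g)\big) + P_0(f)\,P_0(g)
  \;=\; P_0(fg),
% which is the covariance of a P_0-Brownian motion indexed by F;
% the cross terms vanish since E\bar B_i(f) = E Z_i = 0 and
% \bar B_i, Z_i are independent.
```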
Lemma 3.9. Given $0<\varepsilon<1$ there exists an $m(\varepsilon)$ such that for each $n\ge m(\varepsilon)$ and all $t>0$
$$P\Big\{\max_{1\le k\le n}\Big\|\big(\sqrt{a_n}-b_n\big)\sum_{i=1}^{k}\overline W_i\Big\|_{\mathcal F}\ge 2\varepsilon\sqrt{na_n}\,E\|\overline W_1\|_{\mathcal F}+t\Big\}\le 2\exp\big(-t^2/2\varepsilon^2na_n\big).$$

Proof. The proof follows from (1.26), Lévy's inequality and the following fact: for all $x>0$
$$(3.35)\qquad P\Big\{\Big\|\sum_{i=1}^{n}\overline W_i\Big\|_{\mathcal F}>2\sqrt{n}\,E\|\overline W_1\|_{\mathcal F}+x\Big\}\le\exp\big(-x^2/2n\big),$$
which can be readily derived from Lemma 3.1 of Ledoux and Talagrand (1991), where we use $\sup_{f\in\mathcal F}P_0\big(f^2\big)\le1$. $\square$
We can now infer from Propositions 3.1 and 3.2 and Lemmas 3.7, 3.8 and 3.9:
Proposition 3.3. Given $0<\varepsilon<1$, there exists an $m(\varepsilon)$ such that for each $n\ge m(\varepsilon)$, one can construct independent $P_0$-Brownian motions $\overline W_i$, $1\le i\le n$, indexed by $\mathcal F$ so that

$$(3.36)\qquad P\Big\{\max_{1\le k\le n}\Big\|S^{(n)}_k-b_n\sum_{i=1}^{k}\overline W_i\Big\|_{\mathcal F}\ge\widetilde A\sqrt{na_nLLn}+\beta_n+\widetilde\delta_{n,[na_n]}\Big\}\le K_9\,na_n\exp\big(-K_{10}\sqrt{na_nLLn}\big)+(Ln)^{-1.5}+(Ln)^{-(\varepsilon/a_n)^2},$$

where $\widetilde A\ge1$ is an absolute constant, $K_9$ and $K_{10}$ are constants depending on $\varepsilon$ only, and

$$(3.37)\qquad \widetilde\delta_{n,[na_n]}=E\|T^{(n)}_{[na_n]}\|_{\mathcal F}+\sqrt{na_n}\,E\|B\|_{\mathcal F}+\sqrt{na_n}\,E\|\overline W_1\|_{\mathcal F}.$$
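The mixed tail term $(Ln)^{-(\varepsilon/a_n)^2}$ in (3.36) is, on our reading, the contribution of Lemma 3.8; here is the arithmetic under the (assumed) choice $x=\varepsilon\sqrt{na_nLLn}$ in that lemma:

```latex
% Sketch (our reading, not spelled out in the text): apply Lemma 3.8
% with x = \varepsilon\sqrt{na_n LLn}. Then
\frac{x^2}{2na_n^3}
  \;=\; \frac{\varepsilon^2\, na_n\, LLn}{2na_n^3}
  \;=\; \frac{\varepsilon^2}{2a_n^2}\,LLn,
% so the bound 4\exp(-x^2/2na_n^3) becomes
4\exp\Big(-\frac{\varepsilon^2}{2a_n^2}\,LLn\Big)
  \;=\; 4\,(Ln)^{-\varepsilon^2/2a_n^2},
% which is of the same order as the (Ln)^{-(\varepsilon/a_n)^2}
% term in (3.36), up to the constant in the exponent.
```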
Next set for $1\le i\le n$
$$(3.38)\qquad W_i(f)=b_n\,\overline W_i(f\circ h_n),\qquad f\in\mathcal F,$$
and observe that by (1.25), $W_i$, $1\le i\le n$, are again independent $P_0$-Brownian motions indexed by $\mathcal F$. It is easy now to see that inequality (3.36) is still valid if we replace $b_n\overline W_i(f)$ by $W_i(f;h_n):=W_i\big(f\circ h_n^{-1}\big)$, $1\le i\le n$. We now have all of the necessary tools to finish the proof of Theorem 1.2. It is enough to show that for any given $0<\varepsilon<1/2$ a construction is possible such that with probability one

$$(3.39)\qquad \limsup_{n\to\infty}\Big\|S^{(n)}_n(f)-\sum_{i=1}^{n}W_i(f;h_n)\Big\|_{\mathcal F}\Big/\sqrt{na_nLLn}\le D,$$

where $D$ is a positive constant not depending on $\varepsilon$. Statement (1.27) then follows from (3.39) by a known argument of Major (1976). Set $m_k:=2^{k-1}$, $n_k:=m_{k+1}-1$, $k=1,2,\dots$. Using Proposition 3.3 in combination with (3.38), we can construct independent $P_0$-Brownian motions $W_i$, $m_k\le i\le n_k$, such that for large $k$
$$(3.40)\qquad P\big\{\Delta_k\ge1.5\,\widetilde A\,c_{m_k}\big\}\le2K_9(Lm_k)^{-1.5},$$

where $c_n=\sqrt{na_nLLn}$ and

$$\Delta_k:=\max_{m_k\le n\le n_k}\Big\|S^{(m_k)}_n(f)-S^{(m_k)}_{n_{k-1}}(f)-\sum_{i=m_k}^{n}W_i(f;h_{m_k})\Big\|_{\mathcal F}.$$

Notice that we are using here the fact, which follows from assumption (F.vii), that $\big(E\|T^{(n)}_{[na_n]}\|_{\mathcal F}+\beta_n\big)/c_n\to0$ as $n\to\infty$.
Due to condition (F.v) we have for each $m_k\le n\le n_k$ and $1\le l\le k$,

$$\Big\|S^{(n)}_n(f)-\sum_{i=1}^{n}W_i(f;h_n)\Big\|_{\mathcal F}\le\sum_{i=1}^{l}\Delta_{k+1-i}+Z_k(1)+Z_k(2),$$

where
$$Z_k(1):=\max_{1\le m\le n_{k-l}}\|S^{(m_k)}_m(f)\|_{\mathcal F},\qquad Z_k(2):=\max_{1\le m\le n_{k-l}}\Big\|\sum_{i=1}^{m}W_i(f;h_{m_k})\Big\|_{\mathcal F}.$$

It is easy now to see that

$$(3.41)\qquad \limsup_{n\to\infty}\Big\|S^{(n)}_n(f)-\sum_{i=1}^{n}W_i(f;h_n)\Big\|_{\mathcal F}\Big/c_n\le\sum_{i=1}^{l}\limsup_{k\to\infty}\Delta_{k+1-i}/c_{m_k}+\limsup_{k\to\infty}Z_k(1)/c_{m_k}+\limsup_{k\to\infty}Z_k(2)/c_{m_k}.$$
Using (3.40) in conjunction with the Borel-Cantelli lemma, we readily obtain for $1\le i\le l$ with probability one

$$(3.42)\qquad \limsup_{k\to\infty}\Delta_{k+1-i}/c_{m_k}\le\tfrac{3}{2}\widetilde A\,2^{-(i-1)/2}.$$

In view of Proposition 3.3, we can assume that for any large $k$ there are independent $P_0$-Brownian motions $\overline W_i$, $1\le i\le m_k$, such that

$$(3.43)\qquad P\Big\{\Big|Z_k(1)-b_{m_k}\max_{1\le m\le n_{k-l}}\Big\|\sum_{i=1}^{m}\overline W_i\Big\|_{\mathcal F}\Big|\ge\tfrac{3}{2}\widetilde A\,c_{m_k}\Big\}\le2K_9(Lm_k)^{-1.5}.$$

Moreover, by Lévy's inequality and (3.35), we have for all large $k$

$$(3.44)\qquad P\Big\{\max_{1\le m\le n_{k-l}}\Big\|\sum_{i=1}^{m}\overline W_i\Big\|_{\mathcal F}\ge\frac{\widetilde A}{2}\,c_{m_k}/b_{m_k}\Big\}\le2(Lm_k)^{-\widetilde A^22^{2l}/17}.$$

Setting $l:=[7\varepsilon^{-1/2}]$ and recalling that $\widetilde A\ge1$, we can infer from (3.43) and (3.44) by a Borel-Cantelli argument that with probability one

$$(3.45)\qquad \limsup_{k\to\infty}Z_k(1)/c_{m_k}\le2\widetilde A.$$

Finally note that (3.44) also implies with probability one

$$(3.46)\qquad \limsup_{k\to\infty}Z_k(2)/c_{m_k}\le\widetilde A/2.$$

Combining (3.41), (3.42), (3.45) and (3.46), we obtain statement (3.39) with $D=13\widetilde A$, which finishes the proof of Theorem 1.2. $\square$
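For orientation, here is our arithmetic for the final constant; it is a sketch, assuming the bound on $\limsup_k\Delta_{k+1-i}/c_{m_k}$ in (3.42) carries the geometric factor $2^{-(i-1)/2}$ (which comes from $c_{m_{k+1-i}}/c_{m_k}\lesssim2^{-(i-1)/2}$ along the blocks $m_k=2^{k-1}$):

```latex
% How the limsups in (3.41) add up to D = 13\widetilde A:
\sum_{i=1}^{l}\tfrac32\widetilde A\,2^{-(i-1)/2}
 \;\le\; \tfrac32\widetilde A\;\frac{1}{1-2^{-1/2}}
 \;<\; 5.2\,\widetilde A,
% and adding the contributions of Z_k(1) and Z_k(2),
5.2\,\widetilde A + 2\widetilde A + \tfrac12\widetilde A
 \;<\; 8\,\widetilde A \;\le\; 13\,\widetilde A,
% so D = 13\widetilde A is a comfortable valid choice,
% independent of \varepsilon.
```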
Proof of Corollary 1.1. Recall the notation from the introduction: $\rho_0$, $e_0$, $\widetilde W_n$ and $\mathcal K$.

Step 1. Let $n_k:=[q^k]$, where $q>1$. We claim that with probability one

$$(3.47)\qquad d\big(\widetilde W_{n_k}/\sqrt{2Lk},\,\mathcal K\big)\to0,\quad\text{as }k\to\infty,$$

and

$$(3.48)\qquad \text{the set of limit points of }\big\{\widetilde W_{n_k}/\sqrt{2Lk}\big\}_{k\ge1}\text{ is equal to }\mathcal K.$$
In view of Theorem 4.1 of Carmona and Kôno (1976) it suffices to show that for any continuous linear functional $H$ on $B$ and $k\ge1$

$$(3.49)\qquad \lim_{m\to\infty}E\,H\big(\widetilde W_{n_k}\big)H\big(\widetilde W_{n_{k+m}}\big)=0.$$

To see (3.49), note that by independence

$$E\,H\big(\widetilde W_{n_k}\big)H\big(\widetilde W_{n_{k+m}}\big)=\sum_{i=1}^{n_k}E\big(H(W_{i,k})H(W_{i,k+m})\big)\Big/\sqrt{n_kn_{k+m}}\,b_{n_k}b_{n_{k+m}},$$

where $W_{i,k+m}=W_i\big(f;h_{n_{k+m}}\big)$, $f\in\mathcal F$, $m\ge0$. Moreover, we have

$$E\big(H(W_{i,k})H(W_{i,k+m})\big)\le\big(EH^2(W_{i,k})\big)^{1/2}\big(EH^2(W_{i,k+m})\big)^{1/2}\le\|H\|^2\big(E\|W_{i,k}\|^2_{\mathcal F}\,E\|W_{i,k+m}\|^2_{\mathcal F}\big)^{1/2}.$$

Recalling (3.38), we readily obtain that

$$E\,H\big(\widetilde W_{n_k}\big)H\big(\widetilde W_{n_{k+m}}\big)\le\|H\|^2\,E\|W_1\|^2_{\mathcal F}\,\big(n_k/n_{k+m}\big)^{1/2},$$

and consequently (3.49).
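For the record, a trivial check (ours) that the last bound indeed tends to zero along the geometric subsequence $n_k=[q^k]$:

```latex
% With n_k = [q^k], q > 1, and k fixed,
\big(n_k/n_{k+m}\big)^{1/2} \;\le\; C_q\,q^{-m/2}
 \;\longrightarrow\; 0 \qquad (m\to\infty),
% where C_q is a constant accounting for the integer parts;
% this is exactly what yields (3.49).
```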
Step 2. Let $n_k=n_k(q)=[q^k]$ and set
$$\Delta_k(q):=\max_{n_k\le n\le n_{k+1}}\|\widetilde W_n-\widetilde W_{n_k}\|_{\mathcal F}.$$
To complete the proof it is obviously enough to show that
$$(3.50)\qquad \limsup_{k\to\infty}\Delta_k(q)/\sqrt{2Lk}\le c(q)\quad\text{a.s.},$$
where $\lim_{q\downarrow1}c(q)=0$. Towards this end we note that
$$\Delta_k(q)\le\Delta_{k,1}(q)+\Delta_{k,2}(q)+\Delta_{k,3}(q),$$
where
$$\Delta_{k,1}(q):=\max_{n_k\le n\le n_{k+1}}\Big|\frac{1}{\sqrt{n}\,b_n}-\frac{1}{\sqrt{n_k}\,b_{n_k}}\Big|\,\Big\|\sum_{i=1}^{n_k}W_i(f;h_{n_k})\Big\|_{\mathcal F},$$
$$\Delta_{k,2}(q):=\max_{n_k\le n\le n_{k+1}}\Big\|\sum_{i=1}^{n}\big(W_i(f;h_n)-W_i(f;h_{n_k})\big)\Big\|_{\mathcal F}\Big/\sqrt{n_k}\,b_{n_k},$$
$$\Delta_{k,3}(q):=\max_{n_k\le n\le n_{k+1}}\Big\|\sum_{i=n_k+1}^{n}W_i(f;h_{n_k})\Big\|_{\mathcal F}\Big/\sqrt{n_k}\,b_{n_k}.$$
Since (3.47) holds, we have, using (A.iv) and (1.26),
$$(3.51)\qquad \limsup_{k\to\infty}\Delta_{k,1}(q)/\sqrt{2Lk}\le1-q^{-1/2}\quad\text{a.s.}$$
Notice that
$$\Delta_{k,2}(q)=\max_{n_k\le n\le n_{k+1}}\Big\|\sum_{i=1}^{n}\big(W_i(f_{n,k};h_{n_k})-W_i(f;h_{n_k})\big)\Big\|_{\mathcal F}\Big/\sqrt{n_k}\,b_{n_k},$$
where $f_{n,k}=f\circ h_n^{-1}\circ h_{n_k}$. Next we record the fact, which follows from Lemma 3.10 below, that
$$(3.52)\qquad \limsup_{k\to\infty}\sup_{f\in\mathcal F}\max_{n_k\le n\le n_{k+1}}e_0(f_{n,k},f)\le\gamma(q),$$
where $\gamma(q)\to0$ as $q\downarrow1$. Now for any $\delta>0$ set
$$(3.53)\qquad \widetilde{\mathcal F}(\delta)=\big\{(f,g)\in\mathcal F\times\mathcal F:e_0(f,g)\le\delta\big\}$$
and for any real valued function $T$ on $\mathcal F$ set
$$(3.54)\qquad \|T\|_{\widetilde{\mathcal F}(\delta)}=\sup\big\{|T(f)-T(g)|:e_0(f,g)\le\delta\big\}.$$
We now can see that
$$(3.55)\qquad \Delta_{k,2}(q)\le\max_{n_k\le n\le n_{k+1}}\Big\|\sum_{i=1}^{n}W_i(f;h_{n_k})\Big\|_{\widetilde{\mathcal F}(\gamma(q))}\Big/\sqrt{n_k}\,b_{n_k}\stackrel{d}{=}\max_{n_k\le n\le n_{k+1}}\Big\|\sum_{i=1}^{n}\overline W_i\Big\|_{\widetilde{\mathcal F}(\gamma(q))}\Big/\sqrt{n_k}.$$
Using Lévy's inequality and the exponential inequality for $P_0$-Brownian motions given in (3.35), we find that with probability one
$$(3.56)\qquad \limsup_{k\to\infty}\Delta_{k,2}(q)/\sqrt{2Lk}\le\sqrt{q}\,\gamma(q).$$
Finally, noting that
$$\Delta_{k,3}(q)\stackrel{d}{=}\max_{1\le n\le n_{k+1}-n_k}\Big\|\sum_{i=1}^{n}\overline W_i\Big\|_{\mathcal F}\Big/\sqrt{n_k},$$
we obtain by a similar argument that with probability one
$$(3.57)\qquad \limsup_{k\to\infty}\Delta_{k,3}(q)/\sqrt{2Lk}\le\sqrt{q-1}.$$
Combining (3.51), (3.56) and (3.57) we obtain (3.50) as soon as we have proved the following lemma.
Lemma 3.10. Let $\mathcal F$ be a class of functions satisfying
(i) $\mathrm{support}(f)\subseteq J$, $f\in\mathcal F$;
(ii) $|f|\le K$, $f\in\mathcal F$, for some $K>0$;
(iii) $\mathcal F$ is totally bounded for $\rho_0$.
If, in addition, (F.ix) and (F.x) hold for a given sequence of bimeasurable invertible transformations $h_n:\mathbb R^d\to\mathbb R^d$, then
$$(3.58)\qquad \limsup_{k\to\infty}\sup_{f\in\mathcal F}\max_{n_k\le n\le n_{k+1}}e_0(f_{n,k},f)\le\gamma(q),$$
where $\gamma(q)\to0$ as $q\downarrow1$.
Proof. First note that (ii) and (iii) imply that $\mathcal F$ is also totally bounded for $e_0$. Thus for any given $\varepsilon>0$ we can find $\mathcal F(\varepsilon)=\{f_1,\dots,f_m\}\subseteq\mathcal F$ such that for all $f\in\mathcal F$
$$(3.59)\qquad \min_{1\le i\le m}e_0(f,f_i)<\varepsilon.$$
Moreover, by Lusin's theorem each function $f_i\in\mathcal F(\varepsilon)$ can be approximated by a continuous $g_i$ bounded by $K$ such that
$$(3.60)\qquad e_0(f_i,g_i)<\varepsilon,\qquad 1\le i\le m.$$
Denote this class of functions by $\mathcal G(\varepsilon)=\{g_1,\dots,g_m\}$. Next we choose a compact set $A\subseteq\mathbb R^d$ such that $P_0(A^c)\le\varepsilon/K^2$. By uniform continuity of each of the functions $g_i$ on $A$ we can select a $\delta>0$ such that $|g_i(x)-g_i(y)|<\varepsilon$ whenever $x\in A$ and $|x-y|\le\delta$. Now choose $q_0$ as in (F.x) such that for all $1<q<q_0$, (1.31) holds. Hence for all large $k$ and $n_k\le n\le n_{k+1}$, for $1\le i\le m$,
$$e_0^2\big(g_i\circ h_n^{-1}\circ h_{n_k},\,g_i\big)\le\varepsilon^2+2\int_{\mathbb R^d}g_i^2(x)1_{A^c}(x)\,dP_0(x)+2\int_{\mathbb R^d}g_i^2\big(h_n^{-1}h_{n_k}x\big)1_{A^c}(x)\,dP_0(x)\le\varepsilon^2+2\varepsilon+2\varepsilon.$$
Therefore for all large $k$ and $n_k\le n\le n_{k+1}$, whenever $1<q<q_0$, we have uniformly in $f\in\mathcal F$, using (F.ix) with (3.59), that
$$e_0(f,f_{n,k})\le e_0(f,g_i)+e_0\big(f_{n,k},\,g_i\circ h_n^{-1}\circ h_{n_k}\big)+e_0\big(g_i,\,g_i\circ h_n^{-1}\circ h_{n_k}\big)\le\varepsilon+\sqrt{M}\,\varepsilon+\big(\varepsilon^2+4\varepsilon\big)^{1/2},$$
where $g_i$ is selected so that $e_0(f,g_i)<\varepsilon$. Since $\varepsilon>0$ can be chosen arbitrarily close to zero, this completes the proof. $\square$
Acknowledgements. The authors would like to thank the University of Bielefeld, the University of Düsseldorf and the University of Paris VI for their hospitality. This paper was written while the authors were visiting these universities in the spring of 1994.
43
References

Alexander, K. S.: Central limit theorems for stochastic processes under random entropy conditions. Probab. Th. Rel. Fields 75, 351-378 (1987).
Arcones, M.: Necessary and sufficient conditions for the law of the iterated logarithm of local processes. Preprint (1994).
Carmona, R., Kôno, N.: Convergence en loi et lois du logarithme itéré pour les vecteurs gaussiens. Z. Wahrscheinlichkeitstheorie verw. Gebiete 36, 241-267 (1976).
Deheuvels, P., Mason, D. M.: Nonstandard functional laws of the iterated logarithm for tail empirical and quantile processes. Ann. Probability 18, 1693-1722 (1990).
Deheuvels, P., Mason, D. M.: Functional laws of the iterated logarithm for local empirical processes indexed by sets. Ann. Probability, to appear (1994a).
Deheuvels, P., Mason, D. M.: Nonstandard local empirical processes indexed by sets. J. Stat. Plan. Inf., to appear (1994b).
Dudley, R. M.: Distances of probability measures and random variables. Ann. Math. Stat. 39, 1563-1572 (1968).
Dudley, R. M.: Central limit theorems for empirical measures. Ann. Probability 6, 899-929 (1978).
Dudley, R. M.: A course on empirical processes. École d'Été de Probabilités de Saint-Flour XII-1982. Lecture Notes in Math. 1097, 2-142. Berlin: Springer, 1984.
Dudley, R. M., Philipp, W.: Almost sure invariance principles for sums of Banach space valued random elements and empirical processes. Z. Wahrscheinlichkeitstheorie verw. Gebiete 62, 509-552 (1983).
Giné, E., Zinn, J.: Some limit theorems for empirical processes. Ann. Probability 12, 929-989 (1984).
Ibragimov, I. A., Rozanov, Y. A.: Gaussian Random Processes. New York: Springer, 1978.
Ledoux, M., Talagrand, M.: Probability in Banach Spaces. New York: Springer, 1991.
Major, P.: Approximation of partial sums of i.i.d.r.v.'s when the summands have only two moments. Z. Wahrscheinlichkeitstheorie verw. Gebiete 35, 221-229 (1976).
Mason, D. M.: A strong invariance theorem for the tail empirical process. Ann. Inst. Henri Poincaré Probab. Statist. 24, 491-506 (1988).
Pollard, D.: A central limit theorem for empirical processes. J. Austral. Math. Soc. (Series A) 33 (1982).
Pollard, D.: Convergence of Stochastic Processes. New York: Springer, 1984.
Sheehy, A., Wellner, J. A.: Uniform Donsker classes of functions. Ann. Probability 20, 1983-2030 (1992).
Stute, W.: Conditional empirical processes. Ann. Statist. 14, 638-647 (1986).
Talagrand, M.: Sharper bounds for Gaussian and empirical processes. Ann. Probability 22, 28-76 (1994).
Zaitsev, A. Yu.: On the Gaussian approximation of convolutions under multidimensional analogues of S. N. Bernstein's inequality conditions. Probab. Th. Rel. Fields 74, 534-566 (1987).