THE TWO-SAMPLE PROBLEM IN $\mathbb{R}^m$ AND MEASURE-VALUED MARTINGALES

J.H.J. Einmahl
Eindhoven University of Technology

E.V. Khmaladze
University of New South Wales and A. Razmadze Mathematical Institute
The so-called two-sample problem is one of the classical problems in mathematical statistics. It is well known that in dimension one the two-sample Smirnov test possesses two basic properties: it is distribution free under the null hypothesis and it is sensitive to `all' alternatives. In the multidimensional case, i.e. when the observations in the two samples are random vectors in $\mathbb{R}^m$, $m \ge 2$, the Smirnov test loses its first basic property. In correspondence with the above, we define a solution of the two-sample problem to be a stochastic process, based on the two samples, which is ($\alpha$) asymptotically distribution free under the null hypothesis, and which is, intuitively speaking, ($\beta$) as sensitive as possible to all alternatives. Despite the fact that the two-sample problem has a long and very diverse history, starting with some famous papers in the thirties, the problem is essentially still open for samples in $\mathbb{R}^m$, $m \ge 2$. In this paper we present an approach based on measure-valued martingales and we show that the stochastic process obtained with this approach is a solution to the two-sample problem, i.e. it has both the properties ($\alpha$) and ($\beta$), for any $m \in \mathbb{N}$.
AMS 1991 subject classifications. Primary 62G10, 62G20, 62G30; secondary 60F05, 60G15, 60G48.
Key words and phrases. Dirichlet (Voronoi) tessellation, distribution free process, empirical process, innovation process, measure-valued martingale, non-parametric test, two-sample problem, weak convergence, Wiener process.
Running head: The two-sample problem in $\mathbb{R}^m$.
Research partially supported by European Union HCM grant ERB CHRX-CT 940693. Research partially supported by the Netherlands Organization for Scientific Research (NWO) while the author was visiting the Eindhoven University of Technology, and partially by the International Science Foundation (ISF), Grant MXI200.
1. Introduction. Suppose we are given two samples, that is, two independent sequences $\{X_i'\}_{i=1}^{n_1}$ and $\{X_i''\}_{i=1}^{n_2}$ of i.i.d. random variables taking values in $m$-dimensional Euclidean space $\mathbb{R}^m$, $m \ge 1$. Denote by $P_1$ and $P_2$ the probability distributions of each of the $X_i'$ and $X_i''$, and write $\hat P_{n_1}$ and $P_n$ for the empirical distributions of the first sample and of the pooled sample $\{X_i'\}_{i=1}^{n_1} \cup \{X_i''\}_{i=1}^{n_2}$, respectively, i.e.

(1.1) $\displaystyle \hat P_{n_1}(B) = \frac{1}{n_1}\sum_{i=1}^{n_1} 1\!\!1_B(X_i'), \qquad P_n(B) = \frac{1}{n}\Bigl(\sum_{i=1}^{n_1} 1\!\!1_B(X_i') + \sum_{i=1}^{n_2} 1\!\!1_B(X_i'')\Bigr), \qquad n = n_1 + n_2,$

where $B$ is a measurable set in $\mathbb{R}^m$ and $1\!\!1_B$ is its indicator function. Consider the difference

(1.2) $\displaystyle v_n(B) = \Bigl(\frac{n_1 n}{n_2}\Bigr)^{1/2}\bigl(\hat P_{n_1}(B) - P_n(B)\bigr), \qquad B \in \mathcal{B},$

and call the random measure $v_n(\cdot)$ the (classical) two-sample empirical process with the indexing class $\mathcal{B}$. Throughout we avoid the double index $(n_1, n_2)$; this can be done without any ambiguity by letting $n_1 = n_1(n)$ and $n_2 = n_2(n)$. We will always assume $n_1, n_2 \to \infty$ as $n \to \infty$. The indexing class $\mathcal{B}$ is important for functional weak convergence of $v_n$ and will be specified in Sections 3-5. The problem of testing the null hypothesis
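For concreteness, (1.1) and (1.2) can be evaluated directly on simulated data. The following sketch is ours, not part of the paper; it uses Python, with axis-parallel boxes standing in for the indexing class $\mathcal{B}$, and the normalization read off (1.2).

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, m = 200, 300, 2
X1 = rng.random((n1, m))          # first sample, uniform on [0,1]^2 (H0 holds)
X2 = rng.random((n2, m))          # second sample
pooled = np.vstack([X1, X2])
n = n1 + n2

def indicator_box(X, lower, upper):
    # 1_B(x) for the box B = prod_j [lower_j, upper_j]
    return np.all((X >= lower) & (X <= upper), axis=1).astype(float)

def v_n(B_lower, B_upper):
    # classical two-sample empirical process (1.2) at a box B
    P_hat_n1 = indicator_box(X1, B_lower, B_upper).mean()       # \hat P_{n1}(B)
    P_n = indicator_box(pooled, B_lower, B_upper).mean()        # P_n(B)
    return np.sqrt(n1 * n / n2) * (P_hat_n1 - P_n)
```

Since $\hat P_{n_1}$ and $P_n$ are both probability measures, $v_n$ of the whole cube is always exactly zero.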
$H_0 : P_1 = P_2$,

called `the two-sample problem', is one of the classical problems of statistics. The literature on the two-sample problem is enormous. Here we are able to mention only very few of the papers on the subject, namely those in direct relation to the aims of the present work. The specific feature of the two-sample problem is that the common distribution $P$ ($= P_1 = P_2$) presumed under $H_0$ remains unspecified and can be any member of some, typically very large, class $\mathcal{P}$. Hence, it is important to have some supply of test statistics whose null distributions, at least asymptotically as $n \to \infty$, are independent of this common distribution $P \in \mathcal{P}$. Such statistics are called asymptotically distribution free. The classical solution of the two-sample problem when the dimension $m = 1$ is associated with Smirnov (1939), where the two-sample empirical process

(1.3) $\displaystyle v_n(x) = \Bigl(\frac{n_1 n}{n_2}\Bigr)^{1/2}\bigl(\hat F_{n_1}(x) - F_n(x)\bigr), \qquad x \in \mathbb{R},$

was first introduced, where $\hat F_{n_1}$ and $F_n$ stand for the empirical distribution functions of the first and the pooled sample respectively, and the limiting distribution of its supremum was derived. This limiting distribution was shown to be free from $P$ provided $P \in \mathcal{P}_c$, the class of all distributions on $\mathbb{R}$ with a continuous distribution function. This classical statement was an early reflection of the now well-known fact that the process

(1.4) $v_n \circ F_n^{-1}(t), \qquad t \in [0,1],$

converges in distribution, for all $P \in \mathcal{P}_c$, to a standard Brownian bridge $v$ (see, e.g., Shorack and Wellner (1986)). Then, for a large collection of functionals $\varphi$, the statistics $\varphi(v_n \circ F_n^{-1})$ converge in distribution to $\varphi(v)$ and hence are asymptotically distribution free. Recently Urinov (1992) considered another version of the two-sample empirical process in $\mathbb{R}$:
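Numerically, the time change by $F_n^{-1}$ in (1.4) amounts to reading $v_n$ at the pooled order statistics, where $F_n$ runs through the values $i/n$. A minimal sketch (ours, not the paper's; Python, standard normal data under $H_0$):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 150, 250
x1 = rng.standard_normal(n1)
x2 = rng.standard_normal(n2)             # H0: same distribution
pooled = np.sort(np.concatenate([x1, x2]))
n = n1 + n2

# v_n evaluated at the pooled order statistics: F_n equals i/n there,
# so v_n o F_n^{-1} is v_n read on the grid t = i/n
F1 = np.searchsorted(np.sort(x1), pooled, side='right') / n1   # \hat F_{n1}
Fn = np.arange(1, n + 1) / n                                   # F_n at pooled points
vn = np.sqrt(n1 * n / n2) * (F1 - Fn)
smirnov = np.abs(vn).max()               # sup_x |v_n(x)|
```

At the largest pooled observation both distribution functions equal one, so the last value of the process vanishes.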
(1.5) $\displaystyle M_n(x) = \Bigl(\frac{n_1 n}{n_2}\Bigr)^{1/2}\Bigl(\hat F_{n_1}(x) - \int_{-\infty}^x \frac{1 - \hat F_{n_1}(y-)}{1 - F_n(y-)}\, F_n(dy)\Bigr), \qquad x \in \mathbb{R},$

and proved that the process $M_n \circ F_n^{-1}$ is also asymptotically distribution free: for all $P \in \mathcal{P}_c$ it converges to a standard Brownian motion $W$ on $[0,1]$. The convergence in distribution of the process (1.3) when $x \in \mathbb{R}^m$, $m \ge 2$, was studied in Bickel (1969). Though asymptotically distribution free processes or statistics were not obtained in that paper, the general approach was well motivated. Namely, to obtain an asymptotically correct approximation for the distribution of statistics based on $v_n$, like, for example, the Smirnov statistic

(1.6) $\displaystyle \sup_{x \in \mathbb{R}^m} |v_n(x)|,$

he studied the conditional distribution of $v_n$ given $F_n$. This conditioning, also adopted in Urinov (1992), and being also a part of the approach of the present paper (see Sections 3 and 4), is motivated by the fact that, under $H_0$, one can construct the two-sample situation as follows. Let $\{X_i\}_{i=1}^n$ be a sample of size $n$ from a distribution $P \in \mathcal{P}$. Let also $\{\delta_i\}_{i=1}^n$ be $n$ Bernoulli random variables independent of $\{X_i\}_{i=1}^n$ and sampled without replacement from an urn containing $n_1$ `ones' and $n_2$ `zeros'. Now the set of $X_i$'s with $\delta_i = 1$ is called the first sample and the set of those with $\delta_i = 0$ is called the second sample. Any permutation of $\{(X_i, \delta_i)\}_{i=1}^n$ independent of $\{\delta_i\}_{i=1}^n$ will not alter the distribution of $\{\delta_i\}_{i=1}^n$. Hence, for statistics $\varphi(\{X_i\}_{i=1}^n, \{\delta_i\}_{i=1}^n)$ the conditional distribution given $F_n$ is induced by a distribution free from $P$. Actually, this is the basic approach of all permutation tests and dates back at least as far as Fisher (1936) and Wald and Wolfowitz (1944). A good review of the theory of permutation tests can be found in Puri and Sen (1971). It should be noted, however, that most permutation tests are based on statistics that are asymptotically linear in $\{\delta_i\}_{i=1}^n$, and hence asymptotically normal. To essentially nonlinear statistics, like (1.6), this approach was first applied in Bickel (1969), to the best of our knowledge. There are several other methods for obtaining statistically important versions of, or substitutes for, the two-sample empirical process; see, e.g., Friedman and Rafsky (1979), Bickel and Breiman (1983), Kim and Foutz (1987), and Henze (1988) for interesting approaches. Though we have just discussed the two-sample problem and its solution, the precise mathematical formulation of the problem has not been given yet. The requirement of asymptotic distribution freeness cannot be sufficient to formulate the problem, for it can be trivially satisfied. Another condition, on ``sensitivity'' towards alternatives, must also be imposed.
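The urn construction above is exactly what a permutation test implements in practice: holding the pooled sample (hence $F_n$) fixed, one resamples the labels $\delta_i$ and recomputes the statistic. A hypothetical sketch (ours) for the Smirnov statistic (1.6) in dimension one:

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2 = 100, 100
n = n1 + n2
pooled = np.sort(rng.random(n))       # pooled sample from any continuous P

def smirnov_stat(labels):
    # labels: 0/1 vector; delta_i = 1 marks "first sample" points
    F1 = np.cumsum(labels) / n1       # \hat F_{n1} at the pooled order statistics
    Fn = np.arange(1, n + 1) / n
    return np.sqrt(n1 * n / n2) * np.max(np.abs(F1 - Fn))

# conditional (permutation) null distribution given F_0: resample the
# urn of n1 ones and n2 zeros, i.e. permute the labels
labels_obs = rng.permutation(np.r_[np.ones(n1), np.zeros(n2)])
obs = smirnov_stat(labels_obs)
perm = np.array([smirnov_stat(rng.permutation(labels_obs)) for _ in range(500)])
p_value = (1 + np.sum(perm >= obs)) / (1 + len(perm))
```

The permutation distribution does not depend on the underlying $P$, which is the distribution-freeness of the conditional approach in action.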
In this paper we propose two related formulations of the problem (Section 2), one of which imposes quite strong requirements. Then in Section 3 we construct a (signed-)measure-valued martingale $M_n$, which is a generalization of the process (1.5) of Urinov (1992), and its renormalized versions $u_n$ and $w_n$. We prove limit theorems for the asymptotically distribution free modifications $u_n$ and $w_n$ as well as for $M_n$, both under the null hypothesis (Section 4) and under alternatives (Section 5), and show that under natural conditions $u_n$ and $w_n$ are solutions of the two-sample problem.
2. General notations; some preliminaries; formulation of the two-sample problem. As we remarked in the Introduction, in the classical two-sample problem in $\mathbb{R}$ it is required that under $H_0$ the common distribution $P$ belongs to the class $\mathcal{P}_c$ of distributions having continuous distribution functions, and for this class of $P$'s the Smirnov process $v_n \circ F_n^{-1}$ and the Urinov process $M_n \circ F_n^{-1}$ are asymptotically distribution free. In $\mathbb{R}^m$ we also need some requirements under $H_0$. Let $\lambda$ denote Lebesgue measure and let from now on $\mathcal{P}$ denote the class of all distributions $P$ with the properties

(C1) $P([0,1]^m) = 1$;
(C2) $f = dP/d\lambda > 0$ $\lambda$-a.e. on $[0,1]^m$.

Condition (C1) is not an essential restriction since it can be satisfied in several ways. For example, if $Y_1, \ldots, Y_m$ denote the coordinates of some absolutely continuous $m$-dimensional random vector $Y$ and if $G_1, \ldots, G_m$ are absolutely continuous distribution functions on $\mathbb{R}$ such that the range of $Y_k$ is contained in the support of $G_k$, $k = 1, \ldots, m$, then the random vector $X$ with coordinates $X_k = G_k(Y_k)$, $k = 1, \ldots, m$, has an absolutely continuous distribution on $[0,1]^m$. Another, perhaps better, possibility is to reduce the pooled sample to the sequence $\{R_i\}_{i=1}^n$, where the coordinates of each $R_i$ are the normalized coordinatewise ranks of the corresponding coordinates of the $i$-th observation. (Note that the two-sample empirical process thus obtained is equal to $v_n(F_{1n}^{-1}, \ldots, F_{mn}^{-1})$, where $F_{jn}$, $j = 1, \ldots, m$, are the pooled marginal empirical distribution functions.) Though there is definitely no absolute continuity of the distribution of the $R_i$, $i = 1, \ldots, n$, we will indicate below how the subsequent program can go through for these ranks (see, e.g., Lemma 3.5). Condition (C2) represents a certain restriction. Observe, however, that the processes $u_n$ and $w_n$, defined below, have limiting distributions which depend on $P$ only through its support.
Besides the classical two-sample empirical process $v_n$ there can be many other random measures which are also functionals of $\hat P_{n_1}$ and $P_n$ and could also be called two-sample empirical processes. We will obtain versions of such processes which are asymptotically distribution free over $P \in \mathcal{P}$. It is also needed that such a process is sufficiently sensitive to possible alternatives. To formulate both requirements precisely we need to describe the class of alternatives. In fact, it will be the class of all compact contiguous alternatives to the two-sample null hypothesis. Here is the precise condition:

(C3) The distributions $P_1$ and $P_2$ of each of the $X_i'$ and $X_i''$, respectively, depend on $n$ and are, for each $n$, absolutely continuous w.r.t. some $P \in \mathcal{P}$, and the densities $dP_j/dP$, $j = 1, 2$, admit the following asymptotic representation:

(2.1) $\displaystyle \Bigl(\frac{dP_1}{dP}\Bigr)^{1/2} = 1 + \frac{1}{2\sqrt{n}}\sqrt{\frac{1-p_0}{p_0}}\, h_{1n}, \qquad \Bigl(\frac{dP_2}{dP}\Bigr)^{1/2} = 1 - \frac{1}{2\sqrt{n}}\sqrt{\frac{p_0}{1-p_0}}\, h_{2n},$

and $\int (h_{jn} - h)^2\, dP \to 0$, $j = 1, 2$, for some $h$ with $0 < \|h\|^2 := \int h^2\, dP < \infty$, while $n_1/n \to p_0 \in (0,1)$.

The distribution of the pooled sample $\{X_i'\}_{i=1}^{n_1} \cup \{X_i''\}_{i=1}^{n_2}$ under $P$ is of course the $n$-fold direct product $P^n = P \times \cdots \times P$. It is well known (Oosterhoff and van Zwet (1979)) that its distribution under the alternative (2.1), which is the direct product $P_1^{n_1} \times P_2^{n_2}$, is contiguous with respect to $P^n$, and that under $P^n$
(2.2) $\displaystyle L_n := \ln \frac{d(P_1^{n_1} \times P_2^{n_2})}{dP^n} \to_d N\bigl(-\tfrac12 \|h\|^2,\ \|h\|^2\bigr),$

with $N(\mu, \sigma^2)$ denoting a normal random variable with mean $\mu$ and variance $\sigma^2$. Hence, under $P_1^{n_1} \times P_2^{n_2}$,

(2.3) $L_n \to_d N\bigl(\tfrac12 \|h\|^2,\ \|h\|^2\bigr).$

Note that, in (2.1), it could seem more natural to start with some functions $h_{1n}$ and $h_{2n}$ converging to $h_1$ and $h_2$, instead of both converging to $h$. However, it can be shown that this general situation reduces to (2.1) as it stands when we replace $P$ by a strategically chosen new $P$, namely the one such that $(P, P)$ is `closest' to $(P_1, P_2)$, where this closeness is measured in terms of the distance in variation between $P^n$ and $P_1^{n_1} \times P_2^{n_2}$:

(2.4) $d(P_1^{n_1} \times P_2^{n_2}, P^n) = P_1^{n_1} \times P_2^{n_2}(L_n > 0) - P^n(L_n > 0),$

a very proper distance in a statistical context. Indeed, it is clear that

$\displaystyle d(P_1^{n_1} \times P_2^{n_2}, P^n) = \max_{0 \le \alpha \le 1} (\pi_n(\alpha) - \alpha),$

where $\pi_n(\alpha)$ is the power of the $\alpha$-level Neyman-Pearson test for $P^n$ against $P_1^{n_1} \times P_2^{n_2}$. According to (2.2) and (2.3),

(2.5) $d(P_1^{n_1} \times P_2^{n_2}, P^n) \to 2\Phi(\tfrac12 \|h\|) - 1 =: \delta,$
where $\Phi$ is the standard normal distribution function. Now we can formulate the two-sample problem. Let $\mathcal{B} \subseteq \mathcal{B}_0$, where $\mathcal{B}_0$ denotes the class of all Borel-measurable subsets of $[0,1]^m$, and consider a sequence of random measures $\{\nu_n\}_{n \ge 1}$ restricted to $\mathcal{B}$. The sequence $\{\nu_n\}_{n \ge 1}$ will be called a strong $\mathcal{P}$-solution of the two-sample problem if there exists a measurable space $\mathcal{X}$ such that

($\alpha$) under $P^n$, for each $P \in \mathcal{P}$, $\nu_n \to_d \nu$ in $\mathcal{X}$ and the distribution $Q_\nu$ of $\nu$ is the same for all $P \in \mathcal{P}$;

($\beta$) under $P_1^{n_1} \times P_2^{n_2}$, for each sequence of alternatives (2.1), $\nu_n \to_d \tilde\nu$ in $\mathcal{X}$ and the distribution $Q_{\tilde\nu}$ of $\tilde\nu$ is such that $d(Q_{\tilde\nu}, Q_\nu) = \delta$.

Such a $\nu_n$ provides a proper background to base two-sample tests upon. It might be convenient, from computational or other points of view, to sacrifice a bit of power in favour of, say, computational simplicity: the sequence $\{\nu_n\}_{n \ge 1}$ is called a weak $\mathcal{P}$-solution of the two-sample problem if it possesses property ($\alpha$) and if

($\beta'$) under $P_1^{n_1} \times P_2^{n_2}$, for each sequence of alternatives (2.1), $\nu_n \to_d \tilde\nu$ in $\mathcal{X}$ and $d(Q_{\tilde\nu}, Q_\nu) > 0$.

In the subsequent sections our choice of the space $\mathcal{X}$ will be the space $\ell^\infty(\mathcal{B})$. We will prove that the sequence of random measures $\{u_n\}_{n \ge 1}$ (see (3.15)) is a strong $\mathcal{P}$-solution, and we also consider a sequence of differently normalized random measures $\{w_n\}_{n \ge 1}$ (see (3.16)) and show that under natural assumptions it is a weak $\mathcal{P}$-solution. Remark that for an appropriate indexing class $\mathcal{B}$ the classical two-sample empirical process possesses the property ($\beta$), though not the property ($\alpha$). The processes introduced in Friedman and Rafsky (1979) and Henze (1988) obviously have the property ($\alpha$). We do not know if they have the property ($\beta$).

3. Two-sample scanning martingales. No filtration is intrinsic to the classical two-sample problem. However, we will now introduce one, and we will consider the two-sample empirical process along with this filtration to obtain another process with a simpler and more convenient behaviour. Let $\mathcal{A} = \{A_t : t \in [0,1]\}$ be a family of closed subsets of $[0,1]^m$ with the following properties: 1) $A_0 = \emptyset$, $A_1 = [0,1]^m$; 2) $A_t \subseteq A_{t'}$ if $t \le t'$; 3) $\lambda(A_t)$ is continuous and strictly increasing in $t$.
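For illustration, one admissible family of this kind on $[0,1]^m$ is $A_t = [0,t] \times [0,1]^{m-1}$, for which the scanning time of a point is simply its first coordinate. The following sketch (ours, not the paper's) computes the induced ordering of the pooled sample and the first-sample labels used below:

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2, m = 120, 180, 2
n = n1 + n2
pooled = rng.random((n, m))                                  # pooled sample under H0
labels = rng.permutation(np.r_[np.ones(n1), np.zeros(n2)])   # 1 = first sample

# scanning family A_t = [0,t] x [0,1]^(m-1): the scanning time of a
# point is its first coordinate
t_scan = pooled[:, 0]
order = np.argsort(t_scan)
t_sorted = t_scan[order]        # order statistics t_1 <= ... <= t_n
delta = labels[order]           # label of the i-th scanned point
```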
This family will be called a scanning family. Denote by $X_i$ an element of the pooled sample $\{X_i'\}_{i=1}^{n_1} \cup \{X_i''\}_{i=1}^{n_2}$, with the $X_i'$ and $X_i''$ reordered in some arbitrary and for us unimportant way. Under the two-sample null hypothesis this pooled sample $\{X_i\}_{i=1}^n$ is simply a sequence of i.i.d. random variables, each with distribution $P \in \mathcal{P}$. Let

(3.1) $t(X_i) = \min\{t : X_i \in A_t\}$

and denote by $\{t_i\}_{i=1}^n$ the order statistics based on $\{t(X_i)\}_{i=1}^n$ and let $\{X_{(i)}\}_{i=1}^n$ be the correspondingly reordered $X_i$'s. Put also $t_0 = 0$ and $t_{n+1} = 1$ when needed. Later it will be useful to keep in mind that absolute continuity of $P$ (condition (C2)) implies that all the $t_i$ are different a.s. Define Bernoulli random variables $\delta_i$, $1 \le i \le n$, by

$\delta_i = 1\!\!1\{X_{(i)} \in \text{first sample}\}.$

Now under $H_0$ the sequence $\{\delta_i\}_{i=1}^n$ is independent of $\{X_{(i)}\}_{i=1}^n$ and the distribution of $\{\delta_i\}_{i=1}^n$ is that of sampling without replacement from an urn containing $n_1$ `ones' and $n_2$ `zeros' (see Section 1). Now we define the filtration based on the scanning family $\mathcal{A}$. Let
$\mathcal{F}_0 = \sigma\{P_n(B) : B \in \mathcal{B}_0\}, \qquad \mathcal{F}_t = \sigma\{\hat P_{n_1}(B \cap A_t) : B \in \mathcal{B}_0\} \vee \mathcal{F}_0, \quad t \in (0,1],$

$\mathcal{F}_{(i)} = \{E \in \mathcal{F}_1 : E \cap \{t_i \le t\} \in \mathcal{F}_t \text{ for all } t\} = \sigma\{\delta_j : j \le i\} \vee \mathcal{F}_0.$

If $P$ is continuous, then the conditional distribution of $\hat P_{n_1}$ given $\mathcal{F}_0$ is free from $P$; but conditioning on $\mathcal{F}_{(j)}$ also produces a simple distribution for $\hat P_{n_1}$:

(3.2) $\displaystyle \mathrm{I\!P}\{X_{(i)} \in \text{first sample} \mid \mathcal{F}_0\} = \mathrm{I\!P}\{\hat P_{n_1}(\{X_{(i)}\}) = \tfrac{1}{n_1} \mid \mathcal{F}_0\} = \frac{n_1}{n}$

and, for $j \le i - 1$,

(3.3) $\displaystyle \mathrm{I\!P}\{X_{(i)} \in \text{first sample} \mid \mathcal{F}_{(j)}\} = \frac{n_1 \hat P_{n_1}(A^c_{t_j})}{n P_n(A^c_{t_j})},$

where $A^c = [0,1]^m \setminus A$; note that $n P_n(A^c_{t_j}) = n - j$ a.s. We will write

$\displaystyle \hat p = \hat p(t) = \frac{n_1 \hat P_{n_1}(A^c_{t-})}{n P_n(A^c_{t-})}, \quad 0 \le t \le 1; \qquad \hat p_i = \frac{n_1 \hat P_{n_1}(A^c_{t_{i-1}})}{n P_n(A^c_{t_{i-1}})}, \quad i = 1, \ldots, n.$
It is clear that $\hat p(t) = \hat p_i$ for $t_{i-1} < t \le t_i$, $i = 1, \ldots, n$; that is, $\hat p(t)$ is $\mathcal{F}_t$-predictable. The classical two-sample empirical process in (1.2) can now be rewritten as

(3.4) $\displaystyle v_n(B) = \Bigl(\frac{n}{n_1 n_2}\Bigr)^{1/2} \sum_{i=1}^n 1\!\!1_B(X_{(i)})\bigl(\delta_i - \tfrac{n_1}{n}\bigr).$
Consider now $v_n$ along with the filtration $\{\mathcal{F}_t, t \in [0,1]\}$ in a way similar to the construction introduced in Khmaladze (1993), i.e. for each $B$ consider $v_n(B \cap A_t)$, or, equivalently, consider $\hat P_{n_1}(B \cap A_t)$, as $P_n$ is $\mathcal{F}_0$-measurable. By doing this we obtain a new object in the two-sample theory, which is for each $B$ a semimartingale with respect to $\{\mathcal{F}_t, t \in [0,1]\}$ and for each $t$ a random measure on $\mathcal{B}_0$. Hence we gain the possibility to apply to $v_n$ and $\hat P_{n_1}$ the well-developed theory of martingales and of marked point processes; see, e.g., Bremaud (1981), Jacod and Shiryayev (1987), Karr (1991, Chapter 2). More specifically, for given $B$ consider the (normalized) martingale part of the submartingale $\{\hat P_{n_1}(B \cap A_t), \mathcal{F}_t, t \in [0,1]\}$. We obtain

$\displaystyle \mathrm{I\!E}\bigl(\hat P_{n_1}(B \cap A_{ds}) \mid \mathcal{F}_s\bigr) = \frac{n}{n_1}\, \hat p(s)\, P_n(B \cap A_{ds}),$

so that

(3.5) $\displaystyle M_n(B, t) = \Bigl(\frac{n_1 n}{n_2}\Bigr)^{1/2} \Bigl(\hat P_{n_1}(B \cap A_t) - \int_0^t \frac{n}{n_1}\, \hat p(s)\, P_n(B \cap A_{ds})\Bigr)$

is a martingale in $t$. For a class $\mathcal{B} \subseteq \mathcal{B}_0$ let

(3.6) $a(\mathcal{B}) = \{B \cap A_t : B \in \mathcal{B},\ A_t \in \mathcal{A}\}.$

It is clear that $M_n(B, t) = M_n(B \cap A_t, 1)$ for any $B \in a(\mathcal{B})$. Therefore most of the time we will consider the random measure $M_n(\cdot, 1)$ on $a(\mathcal{B})$ and denote it simply by $M_n(\cdot)$. However, because the classes $\mathcal{B}$ and $\mathcal{A}$ will play an asymmetric role, we will also keep the notation $M_n(B, t)$. It is easily seen that $M_n(B \cap A_t)$ can be rewritten as
(3.7) $\displaystyle M_n(B \cap A_t) = v_n(B \cap A_t) + \int_0^t \frac{v_n(A_{s-})}{P_n(A^c_{s-})}\, P_n(B \cap A_{ds}).$

Also

(3.8) $\displaystyle M_n(B) = \Bigl(\frac{n}{n_1 n_2}\Bigr)^{1/2} \sum_{i=1}^n 1\!\!1_B(X_{(i)})(\delta_i - \hat p_i).$

It is convenient to compare this expression with (3.4) to see the difference between $M_n$ and $v_n$. Among the first properties of $M_n$ let us mention one: denote $\Delta A_t = A_{t+\Delta t} \setminus A_t$; then $M_n(B \cap \Delta A_t) = 0$ if $P_n(B \cap \Delta A_t) = 0$. The next property is stated in the following

Lemma 3.1. Let the class $\mathcal{B} \subseteq \mathcal{B}_0$ be such that $[0,1]^m \in \mathcal{B}$. Given $\mathcal{F}_0$, the restriction of the random measure $v_n$ to $a(\mathcal{B})$ determines the restriction of the random measure $M_n$ to $a(\mathcal{B})$ in a one-to-one way.
The proof of Lemma 3.1 is rather easy, but the lemma itself is important for the justification of inference based on $M_n$, for it says, heuristically, that whatever can be achieved in testing based on $v_n$ can also be achieved based on $M_n$, and vice versa.

Proof of Lemma 3.1. For each $C = B \cap A_t$ the value of $M_n(C)$ can be derived using (3.7), since all $A_t \in a(\mathcal{B})$. Now the other way around. Choose $B = [0,1]^m$ and consider the equation

(3.9) $\displaystyle M_n(A_t) = v_n(A_t) + \int_0^t \frac{v_n(A_{s-})}{P_n(A^c_{s-})}\, P_n(A_{ds}).$

It is well known that it has the unique solution

$\displaystyle v_n(A_{t-}) = P_n(A^c_{t-}) \int_0^{t-} \frac{M_n(A_{ds})}{P_n(A^c_s)}.$

Hence, for any $B$, the unique inverse of (3.7) is

(3.10) $\displaystyle v_n(B \cap A_t) = M_n(B \cap A_t) - \int_0^t \Bigl(\int_0^{s-} \frac{M_n(A_{du})}{P_n(A^c_u)}\Bigr) P_n(B \cap A_{ds}) = M_n(B \cap A_t) - \int_0^t \frac{P_n(B \cap A_t) - P_n(B \cap A_s)}{P_n(A^c_s)}\, M_n(A_{ds}). \quad \Box$
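In the discrete setting the integrals in (3.7) and (3.10) are finite sums over the atoms $t_i$, so the correspondence of Lemma 3.1 can be checked numerically. The following sketch (ours, with the half-space scanning family and the set $B = A_{0.6}$, an arbitrary illustrative choice) verifies that (3.7), evaluated as a sum over the atoms, reproduces the coefficient form (3.8):

```python
import numpy as np

rng = np.random.default_rng(5)
n1, n2 = 80, 120
n = n1 + n2
delta = rng.permutation(np.r_[np.ones(n1), np.zeros(n2)])
x = np.sort(rng.random(n))                  # ordered scanning times t_i
inB = (x <= 0.6).astype(float)              # B = A_{0.6}

c = np.sqrt(n / (n1 * n2))
S = np.concatenate([[0.0], np.cumsum(delta)[:-1]])   # sum of delta_j, j < i
p_hat = (n1 - S) / (n - np.arange(n))

# (3.8): M_n(B) directly from the compensated coefficients
Mn_direct = c * np.sum(inB * (delta - p_hat))

# (3.7): M_n(B) = v_n(B) + sum over atoms of v_n(A_{t_i-})/P_n(A_{t_i-}^c) * (1/n)
vn_B = c * np.sum(inB * (delta - n1 / n))
vn_left = c * (S - n1 * np.arange(n) / n)   # v_n(A_{t_i -})
Pn_c_left = (n - np.arange(n)) / n          # P_n(A_{t_i -}^c)
Mn_via_37 = vn_B + np.sum(inB * vn_left / Pn_c_left) / n
```

The two evaluations agree exactly, which is the discrete content of the identity (3.7).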
Not only are the properties of $M_n(B, t)$ in $t$ convenient; its properties in $B$ are also substantially simpler than those of $v_n(B)$, as the next lemma will show. The two martingales $M_n(B, \cdot)$ and $M_n(C, \cdot)$ are called orthogonal if the process

$\displaystyle \langle M_n(B, \cdot), M_n(C, \cdot)\rangle(t) = \sum_{t_i \le t} \mathrm{I\!E}\bigl(\Delta M_n(B, t_i)\, \Delta M_n(C, t_i) \mid \mathcal{F}_{t_{i-1}}\bigr)$

is identically 0. For $C = B$ the process $\langle M_n(B, \cdot), M_n(B, \cdot)\rangle = \langle M_n(B, \cdot)\rangle$ is called the quadratic variation process. In words, $\langle M_n(B, \cdot), M_n(C, \cdot)\rangle$ is a partial sum process of conditional covariances, whereas $\langle M_n(B, \cdot)\rangle$ is a partial sum process of conditional variances. According to (3.3) and (3.8),

(3.11) $\displaystyle \langle M_n(B, \cdot), M_n(C, \cdot)\rangle(t) = \frac{n}{n_1 n_2} \sum_{t_i \le t} 1\!\!1_{B \cap C}(X_{(i)})\, \hat p_i (1 - \hat p_i),$

$\displaystyle \langle M_n(B, \cdot) - M_n(C, \cdot)\rangle(t) = \frac{n}{n_1 n_2} \sum_{t_i \le t} 1\!\!1_{B \triangle C}(X_{(i)})\, \hat p_i (1 - \hat p_i).$
This leads to the aforementioned lemma.

Lemma 3.2. If $B$ and $C$ are disjoint, then $M_n(B, \cdot)$ and $M_n(C, \cdot)$ are orthogonal. Therefore, given $\mathcal{F}_0$, $M_n(\cdot)$ is a random measure with uncorrelated increments: for $B, C \in a(\mathcal{B})$ with $P_n(B \cap C) = 0$,

$\mathrm{I\!E}(M_n(B)\, M_n(C) \mid \mathcal{F}_0) = 0.$

Note that this is certainly not the case for $v_n$, since

$\displaystyle \mathrm{I\!E}(v_n(B)\, v_n(C) \mid \mathcal{F}_0) = \frac{n}{n-1}\bigl(P_n(B \cap C) - P_n(B) P_n(C)\bigr).$

The process $M_n(B, t)$ is essentially an analogue in $\mathbb{R}^m$ of the process in (1.5), studied in Urinov (1992) in $\mathbb{R}$; see also Caba\~na and Caba\~na (1994), Section 3, where `wave components' of a Wiener process are studied in $\mathbb{R}^2$. The quadratic variation describes the `intrinsic' time of a martingale. As formula (3.11) suggests, $\langle M_n(B, \cdot)\rangle(t) \approx P_n(B \cap A_t)$. Indeed we have, with $a'(\mathcal{B}) = \{C_1 \cap C_2 : C_1, C_2 \in a(\mathcal{B})\}$,

Lemma 3.3. Assume $P \in \mathcal{P}$ and $n_1/n \to p_0 \in (0,1)$.
i) For a.a. sequences $\{P_n\}_{n \ge 1}$, conditionally on $\mathcal{F}_0$,
(3.12) $\displaystyle \sup_{B \in \mathcal{B}_0} \bigl|\langle M_n(B, \cdot)\rangle(1) - P_n(B)\bigr| \to 0 \quad a.s.\ (n \to \infty),$

and hence in particular for any $B \in \mathcal{B}_0$, conditionally on $\mathcal{F}_0$,

$\langle M_n(B, \cdot)\rangle(t) \to P(B \cap A_t) \quad a.s.\ (n \to \infty).$

ii) Suppose the class $\mathcal{B}$ is a Vapnik-Chervonenkis (VC) class. Then

$\displaystyle \sup_{C \in a'(\mathcal{B})} |P_n(C) - P(C)| \to 0 \quad a.s.\ (n \to \infty).$
The proof of this lemma is essentially contained in the proof of Lemma 3.4 below and will hence be omitted. Observe that (ii) is just a version of the Glivenko-Cantelli theorem; it is stated here under the above explicit condition to stay in line with Lemma 3.4 and Theorem 4.1. It follows that the provisional limit, under $H_0$, for $M_n(B)$ is the restriction to $a(\mathcal{B})$ of a mean zero Gaussian random measure $W_P(B)$ with covariance function

(3.13) $\mathrm{I\!E}\, W_P(B)\, W_P(C) = P(B \cap C)$

(i.e. a Wiener random measure with `time' $P$); apparently, $M_n(B)$ is not asymptotically distribution free. Therefore we will renormalize it in two different ways; see $u_n$ and $w_n$ below. The idea of both these normalizations is inspired by the following simple result (see, e.g., Khmaladze (1988)): If $W_P$ is a Wiener random measure on $\mathcal{B}_0$ with covariance function (3.13) and $P \in \mathcal{P}$, then
(3.14) $\displaystyle W(B) = \int_B \Bigl(\frac{1}{f(x)}\Bigr)^{1/2} W_P(dx), \qquad B \in \mathcal{B}_0,$

is a standard Wiener random measure, i.e. it has covariance function

$\mathrm{I\!E}\, W(B)\, W(C) = \lambda(B \cap C).$

An empirical version of this transformation will be applied to $M_n$. Suppose

(C4) there exists an $\mathcal{F}_0$-measurable density estimator $\hat f_n$ ($0 \le \hat f_n < \infty$), such that if each $X_i$ has distribution $P \in \mathcal{P}$, then for all $c_0 > 0$

$\displaystyle \limsup_{n \to \infty}\ c_n \sup_{x : f(x) \ge c_0} |\hat f_n(x) - f(x)| \le c \quad a.s.,$

with $c = c(c_0)$, $c_n$ known and $c_n \to \infty$ $(n \to \infty)$; moreover, for all sufficiently large $c_0$,

$\displaystyle \liminf_{n \to \infty}\ \inf_{x : f(x) \ge c_0} \hat f_n(x) > 1 \quad a.s.$
It is not difficult to see that such an estimator exists: under mild smoothness conditions on $f$, such an $\hat f_n$ exists (see, e.g., Silverman (1986) and Scott (1992)). Set $f_n(x) = \hat f_n(x) \vee c_n^{-1}$, $x \in [0,1]^m$, and introduce the random measure $u_n$ by

(3.15) $\displaystyle u_n(B \cap A_t) = u_n(B, t) = \int_{B \cap A_t} \Bigl(\frac{1}{f_n(x)}\Bigr)^{1/2} M_n(dx) = \Bigl(\frac{n}{n_1 n_2}\Bigr)^{1/2} \sum_{i=1}^n \Bigl(\frac{1}{f_n(X_{(i)})}\Bigr)^{1/2} 1\!\!1_{B \cap A_t}(X_{(i)})(\delta_i - \hat p_i).$

Since $f_n$ is $\mathcal{F}_0$-measurable, $u_n(B, \cdot)$ is indeed a martingale, and

$\displaystyle \langle u_n(B, \cdot)\rangle(t) = \frac{n}{n_1 n_2} \sum_{i=1}^n \frac{1}{f_n(X_{(i)})}\, 1\!\!1_{B \cap A_t}(X_{(i)})\, \hat p_i (1 - \hat p_i).$
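As an illustration of (3.15), any $\mathcal{F}_0$-measurable density estimator may be plugged in. The sketch below (ours) uses a simple histogram estimator as a stand-in for $\hat f_n$ and the truncation sequence $c_n = \ln n$; both are illustrative choices of ours, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(6)
n1, n2 = 200, 300
n = n1 + n2
delta = rng.permutation(np.r_[np.ones(n1), np.zeros(n2)])
x = np.sort(rng.random(n))             # pooled, ordered by scanning time; true f = 1

# histogram density estimator as a stand-in for \hat f_n in (C4)
bins = np.linspace(0, 1, 11)
counts, _ = np.histogram(x, bins=bins)
f_hat = counts / (n * np.diff(bins))   # piecewise-constant estimate of f
c_n = np.log(n)                        # illustrative truncation sequence
f_n = np.maximum(f_hat[np.clip(np.digitize(x, bins) - 1, 0, 9)], 1 / c_n)

S = np.concatenate([[0.0], np.cumsum(delta)[:-1]])
p_hat = (n1 - S) / (n - np.arange(n))

def u_n(t):
    # u_n([0,1] cap A_t) per (3.15), with A_t = [0,t]
    w = (delta - p_hat) / np.sqrt(f_n)
    return np.sqrt(n / (n1 * n2)) * np.sum(w[x <= t])
```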
Eventually we will prove that $u_n$ is a strong $\mathcal{P}$-solution of the two-sample problem. To prove convergence in distribution of $u_n$ we will need the following result. Set

$\displaystyle \lambda_n(C) = \frac{1}{n} \sum_{i=1}^n \frac{1}{f_n(X_i)}\, 1\!\!1_C(X_i).$

Lemma 3.4. Assume $P \in \mathcal{P}$, (C4) holds and $n_1/n \to p_0 \in (0,1)$.
(i) For a.a. sequences $\{P_n\}_{n \ge 1}$, we have conditionally on $\mathcal{F}_0$, for all $B \in \mathcal{B}_0$,

$\langle u_n(B, \cdot)\rangle(t) \to \lambda(B \cap A_t) \quad a.s.\ (n \to \infty).$

(ii) Suppose the class $\mathcal{B}$ is a VC class. Then

$\displaystyle \sup_{C \in a'(\mathcal{B})} |\lambda_n(C) - \lambda(C)| \to 0 \quad a.s.\ (n \to \infty).$

It could be noted that the initial observation behind the proof of the lemma is that, according to the Kolmogorov strong law of large numbers (SLLN), for each $B \in \mathcal{B}_0$,

$\displaystyle \frac{1}{n} \sum_{i=1}^n \frac{1}{f(X_i)}\, 1\!\!1_B(X_i) \to \lambda(B) \Bigl(= \int_B \frac{1}{f}\, dP\Bigr) \quad a.s.\ (n \to \infty).$
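This observation is easy to check by simulation. The sketch below (ours) uses the density $f(x) = \frac{2}{3}(1+x)$ on $[0,1]$, simulated by inverse transform, for which $\int_B (1/f)\, dP = \lambda(B)$:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200_000
# P with density f(x) = (2/3)(1+x) on [0,1]; F(x) = (x^2 + 2x)/3,
# so the inverse transform is x = sqrt(1 + 3u) - 1
X = np.sqrt(1 + 3 * rng.random(n)) - 1
f = (2 / 3) * (1 + X)

b = 0.3
# (1/n) sum 1/f(X_i) 1_{[0,b]}(X_i)  ->  lambda([0,b]) = b  a.s.
est = np.mean((X <= b) / f)
```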
Before we prove this lemma let us introduce another normalization of $M_n$. Consider the Dirichlet (or Voronoi) tessellation of $[0,1]^m$ associated with the sequence $\{X_i\}_{i=1}^n$: for each $X_i$ let

$\displaystyle \Delta(X_i) = \bigl\{x \in [0,1]^m : \|x - X_i\| = \min_{1 \le j \le n} \|x - X_j\|\bigr\}$

and let for each $C$

$\displaystyle \tilde C_n = \bigcup_{X_i \in C} \Delta(X_i), \qquad \tilde\lambda_n(C) = \lambda(\tilde C_n) = \sum_{i=1}^n \lambda(\Delta(X_i))\, 1\!\!1_C(X_i) \quad a.s.$

Now introduce

(3.16) $\displaystyle w_n(C) = \frac{n}{(n_1 n_2)^{1/2}} \sum_{i=1}^n \bigl(\lambda(\Delta(X_{(i)}))\bigr)^{1/2}\, 1\!\!1_C(X_{(i)})(\delta_i - \hat p_i).$

Then again, since the sequence $\{\lambda(\Delta(X_{(i)}))\}_{i=1}^n$ is $\mathcal{F}_0$-measurable, $w_n(B \cap A_t)$ is for each $t$ a random measure in $B$ and for each $B$ a martingale in $t$, and, in the obvious notation,

$\displaystyle \langle w_n(B, \cdot)\rangle(t) = \frac{n^2}{n_1 n_2} \sum_{i=1}^n \lambda(\Delta(X_{(i)}))\, 1\!\!1_{B \cap A_t}(X_{(i)})\, \hat p_i (1 - \hat p_i).$
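The cell volumes $\lambda(\Delta(X_i))$ rarely have a closed form, but they can be approximated by Monte Carlo, which is enough to illustrate (3.16). A sketch (ours; the Monte Carlo sample size and the scanning family are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(7)
n1, n2, m = 60, 90, 2
n = n1 + n2
X = rng.random((n, m))
delta = rng.permutation(np.r_[np.ones(n1), np.zeros(n2)])

# Monte Carlo volumes of the Voronoi cells Delta(X_i) in [0,1]^2:
# lambda(Delta(X_i)) is estimated by the fraction of uniform points
# whose nearest sample point is X_i
N_mc = 50_000
U = rng.random((N_mc, m))
nearest = np.concatenate([
    np.argmin(((U[a:a + 5000, None, :] - X[None, :, :]) ** 2).sum(axis=2), axis=1)
    for a in range(0, N_mc, 5000)
])
vol = np.bincount(nearest, minlength=n) / N_mc     # ~ lambda(Delta(X_i))

# order by the scanning family A_t = [0,t] x [0,1] and form p_hat_i
order = np.argsort(X[:, 0])
d, v = delta[order], vol[order]
S = np.concatenate([[0.0], np.cumsum(d)[:-1]])
p_hat = (n1 - S) / (n - np.arange(n))

# (3.16) with C = [0,1]^2
wn_full = n / np.sqrt(n1 * n2) * np.sum(np.sqrt(v) * (d - p_hat))
```

Exact cell volumes could instead be obtained from a computational-geometry library; the Monte Carlo route keeps the sketch self-contained.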
The expression in (3.16) can also be viewed as another empirical analogue of (3.14):

$\displaystyle w_n(C) = \int_{\tilde C_n} \Bigl(\frac{1}{\tilde f_n(x)}\Bigr)^{1/2} M_n(dx),$

since the step function $\tilde f_n(x) = (n \lambda(\Delta(X_i)))^{-1}$ for all inner points $x \in \Delta(X_i)$ (and let it be 1 on the boundaries $\Delta(X_i) \cap \Delta(X_j)$) can be considered as a density estimator, though an inconsistent one. Its analogue on $\mathbb{R}$ is essentially the 1-nearest neighbour estimator. Denote

$\displaystyle \rho(X_i) = \max_{x \in \Delta(X_i)} \|x - X_i\|.$

We will consider $\{X_i\}_{i=1}^n$ which do not necessarily form a random sample, in order to justify to some extent the possibility of using the normalized ranks $\{R_i\}_{i=1}^n$ as mentioned in Section 2. For these more general $X_i$, the $\delta_i$ which determine the first and second sample are as in Section 1.
Lemma 3.5. Suppose that the $X_i$, $1 \le i \le n$, are random vectors in $[0,1]^m$ ($X_i \ne X_j$ a.s. for $i \ne j$), such that for their empirical distribution $P_n$ we have $P_n \to_w P$ a.s. $(n \to \infty)$, for some $P \in \mathcal{P}$.

(i) Then

$\displaystyle \rho_n = \max_{1 \le i \le n} \rho(X_i) \to 0 \quad a.s.\ (n \to \infty).$

(ii) For $C \in \mathcal{B}_0$, set $C^\varepsilon = \{x \in [0,1]^m : \|x - C\| < \varepsilon\}$ and $C_\varepsilon = ((C^c)^\varepsilon)^c$. Suppose $\mathcal{C} \subseteq \mathcal{B}_0$ is such that $\mathcal{A} \subseteq \mathcal{C}$ and

(3.17) $\displaystyle \lim_{\varepsilon \downarrow 0}\ \sup_{C \in \mathcal{C}} \lambda(C^\varepsilon \setminus C_\varepsilon) = 0.$

If $n_1/n \to p_0 \in (0,1)$, then for a.a. sequences $\{P_n\}_{n \ge 1}$, conditionally on $\mathcal{F}_0$,

$\displaystyle \sup_{C \in \mathcal{C}} \bigl|\langle w_n(C, \cdot)\rangle(1) - \lambda(C)\bigr| \to 0 \quad a.s.\ (n \to \infty).$

(iii) Also, under (3.17),

$\displaystyle \sup_{C \in \mathcal{C}} |\tilde\lambda_n(C) - \lambda(C)| \to 0 \quad a.s.\ (n \to \infty).$
Proof of Lemma 3.4. (i) Consider

$\displaystyle |\langle u_n(B, \cdot)\rangle(t) - \lambda(B \cap A_t)| \le \Bigl|\langle u_n(B, \cdot)\rangle(t) - \frac{1}{n}\sum_{i=1}^n \frac{1}{f(X_{(i)})}\, 1\!\!1_{B \cap A_t}(X_{(i)})\Bigr| + \Bigl|\frac{1}{n}\sum_{i=1}^n \frac{1}{f(X_i)}\, 1\!\!1_{B \cap A_t}(X_i) - \lambda(B \cap A_t)\Bigr|.$

By the SLLN, as $n \to \infty$,

$\displaystyle \frac{1}{n}\sum_{i=1}^n \frac{1}{f(X_i)}\, 1\!\!1_{B \cap A_t}(X_i) \to \int \frac{1}{f}\, 1\!\!1_{B \cap A_t}\, dP = \lambda(B \cap A_t) \quad a.s.$

So it suffices to consider

$\displaystyle \frac{1}{n}\Bigl|\sum_{i=1}^n \Bigl\{\frac{1}{f_n(X_{(i)})}\, \frac{n^2}{n_1 n_2}\, \hat p_i(1 - \hat p_i) - \frac{1}{f(X_{(i)})}\Bigr\}\, 1\!\!1_{B \cap A_t}(X_{(i)})\Bigr| \le \frac{n^2}{4 n_1 n_2} \cdot \frac{1}{n}\sum_{i=1}^n \Bigl|\frac{1}{f_n(X_i)} - \frac{1}{f(X_i)}\Bigr| + \frac{1}{n}\sum_{i=1}^n \frac{1}{f(X_{(i)})}\, \Bigl|\frac{n^2}{n_1 n_2}\, \hat p_i(1 - \hat p_i) - 1\Bigr|.$
First we show that the last term above converges to 0 a.s. Obviously, for any $\varepsilon \in (0,1)$ this term is equal to

(3.18) $\displaystyle \frac{1}{n}\sum_{X_{(i)} \in A_{1-\varepsilon}} \frac{1}{f(X_{(i)})}\, \Bigl|\frac{n^2}{n_1 n_2}\, \hat p_i(1 - \hat p_i) - 1\Bigr| + \frac{1}{n}\sum_{X_{(i)} \notin A_{1-\varepsilon}} \frac{1}{f(X_{(i)})}\, \Bigl|\frac{n^2}{n_1 n_2}\, \hat p_i(1 - \hat p_i) - 1\Bigr|.$

Since $P \in \mathcal{P}$, we have $P(A_{1-\varepsilon}) < 1$ and hence it follows from a kind of conditional Glivenko-Cantelli theorem that

$\displaystyle \max_{X_{(i)} \in A_{1-\varepsilon}} \bigl|\hat p_i - \tfrac{n_1}{n}\bigr| \le \sup_{t \le 1-\varepsilon} \bigl|\hat p(t) - \tfrac{n_1}{n}\bigr| \to 0 \quad a.s.\ (n \to \infty).$

(Actually this conditional Glivenko-Cantelli theorem is well known and is essentially proved in a version of the proof of the ordinary Glivenko-Cantelli theorem for VC classes; see Gaenssler (1983, pp. 28-34).) This yields, in combination with the SLLN, that

$\displaystyle \frac{1}{n}\sum_{X_{(i)} \in A_{1-\varepsilon}} \frac{1}{f(X_{(i)})}\, \Bigl|\frac{n^2}{n_1 n_2}\, \hat p_i(1 - \hat p_i) - 1\Bigr| \to 0 \quad a.s.$

The second term in (3.18) is not greater than

$\displaystyle \frac{n^2}{4 n_1 n_2} \cdot \frac{1}{n}\sum_{X_i \notin A_{1-\varepsilon}} \frac{1}{f(X_i)} \to \frac{1}{4 p_0 (1 - p_0)}\, \lambda(A^c_{1-\varepsilon}) \quad a.s.$

For arbitrary $\eta > 0$, this last expression is less than $\eta$ for $\varepsilon$ sufficiently small. Gathering everything, we see that the proof of part (i) is complete if we show that
$\displaystyle \frac{1}{n}\sum_{i=1}^n \Bigl|\frac{1}{f_n(X_i)} - \frac{1}{f(X_i)}\Bigr| = \int_{[0,1]^m} \Bigl|\frac{1}{f_n} - \frac{1}{f}\Bigr|\, dP_n \to 0 \quad a.s.$

Define for $0 < \gamma < 1 < c_0$: $D_1 = \{x \in [0,1]^m : f(x) < \gamma\}$, $D_2 = \{x \in [0,1]^m : \gamma \le f(x) \le c_0\}$, and $D_3 = \{x \in [0,1]^m : f(x) > c_0\}$. Then for large enough $c_0$ we have, by (C4),

$\displaystyle \limsup_{n \to \infty} \int_{D_3} \Bigl|\frac{1}{f_n} - \frac{1}{f}\Bigr|\, dP_n \le \limsup_{n \to \infty} \int_{D_3} dP_n = P(D_3) < \eta \quad a.s.$

Also, for $\gamma$ small enough we have, from the definition of $f_n$,

(3.19) $\displaystyle \limsup_{n \to \infty} \int_{D_1} \Bigl|\frac{1}{f_n} - \frac{1}{f}\Bigr|\, dP_n \le \limsup_{n \to \infty} \int_{D_1} \frac{|f_n - f|}{f_n f}\, dP_n \le c \lim_{n \to \infty} \int_{D_1} \frac{1}{f}\, dP_n = c\, \lambda(D_1) < \eta \quad a.s.$

Finally, by (C4),

$\displaystyle \limsup_{n \to \infty} \int_{D_2} \Bigl|\frac{1}{f_n} - \frac{1}{f}\Bigr|\, dP_n = \limsup_{n \to \infty} \int_{D_2} \frac{|f_n - f|}{f_n f}\, dP_n \le \frac{1}{\gamma^2} \limsup_{n \to \infty} \int_{D_2} |f_n - f|\, dP_n \le \frac{1}{\gamma^2} \lim_{n \to \infty} \sup_{x \in D_2} |f_n(x) - f(x)| = 0 \quad a.s.$
Since $\eta$ is arbitrary, this completes the proof of part (i). Now we prove part (ii). We have

$\displaystyle \sup_{C \in a'(\mathcal{B})} |\lambda_n(C) - \lambda(C)| = \sup_{C \in a'(\mathcal{B})} \Bigl|\int_C \frac{1}{f_n}\, dP_n - \int_C \frac{1}{f}\, dP\Bigr| \le \sup_{C \in a'(\mathcal{B})} \int_C \Bigl|\frac{1}{f_n} - \frac{1}{f}\Bigr|\, dP_n + \sup_{C \in a'(\mathcal{B})} \Bigl|\int_C \frac{1}{f}\,(dP_n - dP)\Bigr|.$

But

$\displaystyle \sup_{C \in a'(\mathcal{B})} \int_C \Bigl|\frac{1}{f_n} - \frac{1}{f}\Bigr|\, dP_n \le \int_{[0,1]^m} \Bigl|\frac{1}{f_n} - \frac{1}{f}\Bigr|\, dP_n,$

which converges to 0 a.s., as we just showed. So finally we have to prove

$\displaystyle \sup_{C \in a'(\mathcal{B})} \Bigl|\int_C \frac{1}{f}\,(dP_n - dP)\Bigr| \to 0 \quad a.s.\ (n \to \infty).$

This is, however, a routine matter: since $\mathcal{B}$ is a VC class and $\int_{[0,1]^m} \frac{1}{f}\, dP = 1 < \infty$, the class of functions $\{\frac{1}{f}\, 1\!\!1_C : C \in a'(\mathcal{B})\}$ is a Glivenko-Cantelli class. $\Box$
of functions f f C : C 2 a0 (B)g is a Glivenko-Cantelli class. 2 Proof of Lemma 3.5. (i) For k 2 IN , let Hk be the nite set of hypercubes of the 1 11
form mj=1[rj =k; (rj + 1)=k], rj 2 f0; 1; :::; k ? 1g. Since Pn !w P a.s. and P 2 P , we see that for all k 2 IN sup jPn (H ) ? P (H )j ! 0 a.s.
H 2Hk
14
But since inf H 2Hk P (H ) > 0, this easily implies that n ! 0 a.s. We now prove part (iii). Let " > 0. Since n ! 0 a.s. and for all C 2 C , Cen C " and (Ce c )n (C c )" , that is C" Cn C ", as soon as n < ", we have limsup sup jen (C ) ? (C )j sup (C " nC" ) a.s. n!1
C 2C
C 2C
Now using (3.17) proves part (iii). Finally we consider part (ii). Because of part (iii), it is sucient to show that sup jhwn (C; )i(1) ? en (C )j
C 2C
= sup j
n X
C 2C
i=1
2 ((X(i) ))11C (X(i) )f nnn pbi(1 ? pbi) ? 1gj 1 2
n X
2 ((X(i) ))j nnn pbi (1 ? pbi) ? 1j ! 0 a.s. 1 2 i=1
This last expression, however, can be treated in much the same way as
2 n pbi(1 ? pbi) ? 1 ; n i=1 f (X(i) ) n1 n2 n 1X
1
2
in the proof of Lemma 3.4.
4. Weak convergence under $H_0$: property ($\alpha$). Let $\mathcal{B} \subseteq \mathcal{B}_0$ be the indexing class for the random measures $M_n$, $u_n$ and $w_n$ defined by (3.8), (3.15) and (3.16) respectively, and consider the space $\ell^\infty(\mathcal{B})$ as the space of trajectories of these measures. To prove convergence in distribution in $\ell^\infty(\mathcal{B})$ one needs the convergence of the finite-dimensional distributions and the asymptotic equicontinuity property, studied in the empirical process context, e.g., in Pollard (1990), see Theorem 10.2, and Sheehy and Wellner (1992). This property follows from Lemma 4.2 below (in combination with Lemmas 3.3-3.5), which in turn follows from appropriate exponential inequalities. The first lemma of this section provides these inequalities. Consider the process

(4.1) $\displaystyle \xi(t) = \sum_{t_i \le t} \nu_i (\delta_i - \hat p_i)$

with $\mathcal{F}_0$-measurable coefficients $\nu_i$, $i = 1, \ldots, n$. The process $\xi$ is a martingale and

$\displaystyle \langle \xi \rangle(t) = \sum_{t_i \le t} \nu_i^2\, \hat p_i (1 - \hat p_i) \le \Gamma(t)/4, \quad \text{with} \quad \Gamma(t) = \sum_{t_i \le t} \nu_i^2.$
Lemma 4.1. (i) The process $\{\exp(\lambda \xi(t) - \frac{\lambda^2}{8} \Gamma(t)),\ 0 \le t \le 1\}$ is a supermartingale and $\mathrm{I\!E}[\exp(\lambda \xi(t) - \frac{\lambda^2}{8} \Gamma(t)) \mid \mathcal{F}_0] \le 1$.

(ii) We have, for $z \ge 0$,

(4.2) $\displaystyle \mathrm{I\!P}\{|\xi(1)| > z \mid \mathcal{F}_0\} \le 2 e^{-2z^2/\Gamma(1)}.$

Corollary 4.1. For $z \ge 0$,

$\displaystyle \mathrm{I\!P}\Bigl\{|M_n(B) - M_n(C)| > z \Bigl(\frac{n^2}{n_1 n_2}\Bigr)^{1/2} \Bigm| \mathcal{F}_0\Bigr\} \le 2 \exp\bigl(-2z^2 / P_n(B \triangle C)\bigr),$

$\displaystyle \mathrm{I\!P}\Bigl\{|u_n(B) - u_n(C)| > z \Bigl(\frac{n^2}{n_1 n_2}\Bigr)^{1/2} \Bigm| \mathcal{F}_0\Bigr\} \le 2 \exp\bigl(-2z^2 / \lambda_n(B \triangle C)\bigr),$

$\displaystyle \mathrm{I\!P}\Bigl\{|w_n(B) - w_n(C)| > z \Bigl(\frac{n^2}{n_1 n_2}\Bigr)^{1/2} \Bigm| \mathcal{F}_0\Bigr\} \le 2 \exp\bigl(-2z^2 / \tilde\lambda_n(B \triangle C)\bigr).$
Proof. Take in inequality (4.2) $\nu_i$ equal to $1\!\!1_B(X_{(i)}) - 1\!\!1_C(X_{(i)})$ multiplied by $n^{-1/2}$, $(n f_n(X_{(i)}))^{-1/2}$ and $(\lambda(\Delta(X_{(i)})))^{1/2}$, respectively. $\Box$

Proof of Lemma 4.1. The proof follows the well-known pattern. We give it here, though briefly, because the references we know of represent the exponential inequality for a martingale in Bennett's form (see, e.g., Freedman (1975) or Shorack and Wellner (1986, pp. 899-900)) rather than in Hoeffding's form (4.2). Observe that

$\displaystyle \mathrm{I\!E}\bigl[e^{\lambda \nu_i (\delta_i - \hat p_i)} \mid \mathcal{F}_{(i-1)}\bigr] = e^{-\lambda \nu_i \hat p_i}\, e^{\ln(e^{\lambda \nu_i} \hat p_i + 1 - \hat p_i)} \le e^{\lambda^2 \nu_i^2 / 8},$

which can be found by expanding the $\ln$, as a function of $\lambda \nu_i$, up to the second term and observing that the second derivative is bounded by $1/4$. Therefore

$\displaystyle \mathrm{I\!E}\bigl[e^{\lambda \nu_i (\delta_i - \hat p_i) - \frac{\lambda^2}{8} \nu_i^2} \mid \mathcal{F}_{(i-1)}\bigr] \le 1,$

which proves (i). Now

$\displaystyle \mathrm{I\!P}\{|\xi(1)| > z \mid \mathcal{F}_0\} = \mathrm{I\!P}\bigl\{e^{\lambda \xi(1) - \frac{\lambda^2}{8} \Gamma(1)} > e^{\lambda z - \frac{\lambda^2}{8} \Gamma(1)} \mid \mathcal{F}_0\bigr\} + \mathrm{I\!P}\bigl\{e^{-\lambda \xi(1) - \frac{\lambda^2}{8} \Gamma(1)} > e^{\lambda z - \frac{\lambda^2}{8} \Gamma(1)} \mid \mathcal{F}_0\bigr\} \le 2 e^{\frac{\lambda^2}{8} \Gamma(1) - \lambda z}.$

Minimization of this bound in $\lambda$ leads to (4.2). $\Box$

The next lemma is the main step towards the asymptotic equicontinuity property of our random measures. For the rest of this section we assume our indexing class to be a Vapnik-Chervonenkis (VC) class (see, e.g., Dudley (1978)). Before we proceed formally let
2e 8 ?(1)?z : Minimization of this bound in leads to (4.2). 2 The next lemma is the main step towards the asymptotic equicontinuity property of our random measures. For the rest of this section we assume our indexing class to be a Vapnik-Chervonenkis (VC) class (see, e.g., Dudley (1978)). Before we proceed formally let 16
us recall under what distributions this asymptotic equicontinuity will be obtained. All three random measures considered are functions of $\{X_{(i)}\}_{i=1}^n$ (or of $P_n$) and of $\{\delta_i\}_{i=1}^n$, which is independent of $P_n$ and has the distribution described in Section 3 (see (3.2) and (3.3)). We consider the distributions of $M_n$, $u_n$ and $w_n$ induced in $\ell^\infty(\mathcal B)$ by the distribution of $\{\delta_i\}_{i=1}^n$ with $P_n$ fixed, and call them conditional distributions given $P_n$, or given $\mathcal F_0$. Observe that with this construction there is no need to worry about possible non-measurability of $P_n$ as a random element in $\ell^\infty(\mathcal B)$, nor to require ``enough measurability'' of $\mathcal B$. So, in most cases $\mathcal B$ will be required to be just a VC class. There are two further reasons, specific to the two-sample problem, to use indexing classes no wider than a VC class. The first is that, though we have to study weak convergence under a fairly simple sequence of distributions, there are several different distances, induced by different distributions on $[0,1]^m$, which occur in the inequalities of the above corollary. We would need assumptions on the covering numbers $N(\varepsilon; Q)$ of $\mathcal B$ in each of these distances, that is, for $Q$ being $P_n$, $\lambda_n$ or $e_n$, $n \in \mathrm{I\!N}$, which would be inconvenient from the applied point of view. However, for VC classes we have a uniform-in-$Q$ bound for $\ln N(\varepsilon; Q)$ (see Dudley (1978), Lemma 7.13, or van der Vaart and Wellner (1996), p. 86, or the proof of Lemma 4.2 below), and this makes any VC class an appropriate indexing class for each of $M_n$, $u_n$ and $w_n$. The second reason is this: though $M_n$ is not the process of eventual interest for the two-sample problem, since it is not asymptotically distribution free, we want it to have a limit in distribution for each $P \in \mathcal P$. Therefore the indexing class $\mathcal B$ should be $P$-pregaussian for each $P \in \mathcal P$ (see, e.g., Sheehy and Wellner (1992)). However, if the class is pregaussian for all $P$, then it must be a VC class (Dudley, 1984, Theorem 11.4.1).

Though our $\mathcal P$ is narrower than the class of all distributions on $[0,1]^m$, it still seems wide enough to motivate the choice of $\mathcal B$ as a VC class. Let us now formulate the next lemma. For a finite (non-negative) measure $Q$ on $\mathcal B_0$ and a subclass $\mathcal B \subseteq \mathcal B_0$, let $\mathcal B_0(\varepsilon; Q) = \{(A,B) \in \mathcal B \times \mathcal B : Q(A \triangle B) \le \varepsilon\}$. Call $\{M_n\}_{n \ge 1}$ conditionally asymptotically equicontinuous, uniformly over the discrete distributions (CAEC$_{ud}$), if for any $\tau > 0$
(4.3)
$$\lim_{\varepsilon \downarrow 0}\, \limsup_{n \to \infty}\, \sup_{P_n}\, \mathrm{IP}\,\Big\{ \sup_{(A,B) \in \mathcal B_0(\varepsilon;\, P_n)} |M_n(A) - M_n(B)| > \tau \,\Big|\, \mathcal F_0 \Big\} = 0,$$
where $P_n$ runs over all discrete distributions on $[0,1]^m$ concentrated on at most $n$ points. Call $\{u_n\}_{n\ge1}$ and $\{w_n\}_{n\ge1}$ CAEC$_{ud}$ if for these sequences a property similar to (4.3) holds with $\mathcal B_0(\varepsilon; P_n)$ replaced by $\mathcal B_0(\varepsilon; \lambda_n)$ and $\mathcal B_0(\varepsilon; e_n)$, respectively. See Sheehy and Wellner (1992).

Lemma 4.2. Let $\mathcal B \subseteq \mathcal B_0$ be a VC class. Then under the null hypothesis $P_1 = P_2$, all three sequences $\{M_n\}_{n\ge1}$, $\{u_n\}_{n\ge1}$ and $\{w_n\}_{n\ge1}$ are CAEC$_{ud}$.

Proof. As above, let $N(\varepsilon; Q)$ denote the covering number of the class $\mathcal B$ in the pseudometric $d(A,B) = Q(A \triangle B)$, and let $\omega$ denote the index of the VC class $\mathcal B$. Fix $\tau > 0$ and $\eta > 0$. Then for all $Q$ and some constant $K$, depending on $\omega$ and depending on $Q$ only through $Q([0,1]^m)$,
$$N(\varepsilon; Q) \le K \Big(\frac{1}{\varepsilon}\Big)^{\omega - 1}.$$
For $B \in \mathcal B$ let $B_0$ denote an element of an $\varepsilon_0$-net for $\mathcal B$ in this pseudometric, with $Q = P_n$, such that $P_n(B \triangle B_0) \le \varepsilon_0$; then for $(A,B) \in \mathcal B_0(\varepsilon_0; P_n)$ we have $P_n(A_0 \triangle B_0) \le 3\varepsilon_0$, and
(4.4)
$$\mathrm{IP}\,\Big\{ \sup_{(A,B) \in \mathcal B_0(\varepsilon_0;\, P_n)} |M_n(A) - M_n(B)| > \tau \,\Big|\, \mathcal F_0 \Big\} \le \mathrm{IP}\,\Big\{ \sup_{(A_0,B_0):\, P_n(A_0 \triangle B_0) \le 3\varepsilon_0} |M_n(A_0) - M_n(B_0)| > \frac{\tau}{2} \,\Big|\, \mathcal F_0 \Big\} + 2\,\mathrm{IP}\,\Big\{ \sup_{B \in \mathcal B} |M_n(B_0) - M_n(B)| > \frac{\tau}{4} \,\Big|\, \mathcal F_0 \Big\}.$$
Using Corollary 4.1, the first term on the right in (4.4) can be bounded from above by
$$2 N^2(\varepsilon_0; P_n) \exp\Big(-2\Big(\frac{\tau}{2}\Big)^2 \frac{n_1 n_2}{n^2}\, \frac{1}{3\varepsilon_0}\Big) = 2 \exp\Big(2 \ln N(\varepsilon_0; P_n) - \frac{\tau^2}{6\varepsilon_0}\, \frac{n_1 n_2}{n^2}\Big) \le 2 \exp\Big(2 \ln N(\varepsilon_0; P_n) - \frac{\tau^2}{12\varepsilon_0}\, p_0(1 - p_0)\Big),$$
where for the last inequality $n$ is taken large enough. Now, taking $\varepsilon_0$ small enough, this last expression is not larger than
$$2 \exp\Big(2\omega \ln \frac{1}{\varepsilon_0} - \frac{\tau^2}{12\varepsilon_0}\, p_0(1 - p_0)\Big) < \frac{\eta}{2}.$$
Now consider the probability in the second term on the right in (4.4). Let $\varepsilon_j = \varepsilon_0 2^{-j}$ and, for $B \in \mathcal B$, let $B_j$ denote an element of an $\varepsilon_j$-net with $P_n(B \triangle B_j) \le \varepsilon_j$, so that $P_n(B_j \triangle B_{j+1}) \le 2\varepsilon_j$; note that $M_n(B_j) = M_n(B)$ for $j$ large, since $P_n$ is concentrated on $n$ points. We have for small enough $\varepsilon_0$ and large enough $n$
$$\mathrm{IP}\,\Big\{ \sup_{B \in \mathcal B} |M_n(B_0) - M_n(B)| > \frac{\tau}{4} \,\Big|\, \mathcal F_0 \Big\} \le \sum_{j=0}^\infty \mathrm{IP}\,\Big\{ \sup_{B \in \mathcal B} |M_n(B_j) - M_n(B_{j+1})| > 2\Big(\frac{\omega\,\varepsilon_j \ln(1/\varepsilon_{j+1})}{p_0(1-p_0)}\Big)^{1/2} \,\Big|\, \mathcal F_0 \Big\}$$
$$\le \sum_{j=0}^\infty N^2(\varepsilon_{j+1}; P_n)\, \sup_{B \in \mathcal B} \mathrm{IP}\,\Big\{ |M_n(B_j) - M_n(B_{j+1})| > 2\Big(\frac{\omega\,\varepsilon_j \ln(1/\varepsilon_{j+1})}{p_0(1-p_0)}\Big)^{1/2} \,\Big|\, \mathcal F_0 \Big\}$$
$$\le 2 \sum_{j=0}^\infty N^2(\varepsilon_{j+1}; P_n) \exp\Big(-\frac{n_1 n_2}{n^2}\, \frac{4\omega \ln(1/\varepsilon_{j+1})}{p_0(1-p_0)}\Big) \le 2 \sum_{j=0}^\infty \exp\big(2 \ln N(\varepsilon_{j+1}; P_n) - 3\omega \ln(1/\varepsilon_{j+1})\big)$$
$$\le 2 \sum_{j=0}^\infty \exp\big(-\omega \ln(1/\varepsilon_{j+1})\big) \le 2 \sum_{j=0}^\infty \varepsilon_{j+1} \le 2\varepsilon_0 < \frac{\eta}{2},$$
where we used $2 \ln N(\varepsilon_{j+1}; P_n) \le 2\omega \ln(1/\varepsilon_{j+1})$ for $\varepsilon_{j+1}$ small.
Hence, since the `$n$ large enough' requirements do not depend on $P_n$,
$$\limsup_{n \to \infty}\, \sup_{P_n}\, \mathrm{IP}\,\Big\{ \sup_{(A,B) \in \mathcal B_0(\varepsilon_0;\, P_n)} |M_n(A) - M_n(B)| > \tau \,\Big|\, \mathcal F_0 \Big\} \le \eta,$$
which gives (4.3), since $\eta$ was arbitrary. $\square$

We are now prepared to formulate the statement on weak convergence of $M_n$, $u_n$ and $w_n$ under the null hypothesis. What mainly remains to be proved is the finite-dimensional convergence in distribution, which we will obtain via the martingale central limit theorem. Let `$\stackrel{D(P_n)}{\to}$' denote convergence in distribution under the sequence of conditional distributions, given $P_n$.

Theorem 4.1. Let $n_1/n \to p_0 \in (0,1)$. For any VC class $\mathcal B \subseteq \mathcal B_0$ and any $P \in \mathcal P$ we have for a.a. sequences $\{P_n\}_{n\ge1}$
$$M_n \stackrel{D(P_n)}{\to} W_P \quad (n \to \infty)$$
in the space $\ell^\infty(\mathcal B)$. If condition (C4) holds, then also
$$u_n \stackrel{D(P_n)}{\to} W \quad (n \to \infty)$$
in $\ell^\infty(\mathcal B)$; and, if condition (3.17) of Lemma 3.5 holds for $\mathcal C = a(\mathcal B)$, then also
$$w_n \stackrel{D(P_n)}{\to} W \quad (n \to \infty)$$
in $\ell^\infty(\mathcal B)$. Moreover, since $\mathcal B$ is a VC class, the limiting processes $W_P$ and $W$ are bounded and uniformly continuous w.r.t. $Q(\cdot \triangle \cdot)$, for $Q$ being $P$ and $\lambda$, respectively.

Proof. First note that Lemma 4.2 in conjunction with Lemma 3.3 yields that for any $\tau > 0$ and a.a. sequences $\{P_n\}_{n\ge1}$
(4.5)
$$\lim_{\varepsilon \downarrow 0}\, \limsup_{n \to \infty}\, \mathrm{IP}\,\Big\{ \sup_{(A,B) \in \mathcal B_0(\varepsilon;\, P)} |M_n(A) - M_n(B)| > \tau \,\Big|\, \mathcal F_0 \Big\} = 0.$$
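The conditional (given $P_n$) second-moment structure that drives these limits can be illustrated numerically. The sketch below assumes particular forms of $\hat p_i$ and of $M_n(B)$ (uniform $P$ on $[0,1]$, $m = 1$, $\hat p_i = (n_1 - S_{i-1})/(n - i + 1)$, $M_n(B) = (n/(n_1 n_2))^{1/2}\sum_i 1_B(X_{(i)})(\delta_i - \hat p_i)$); these are our assumptions for illustration, not the paper's exact definitions.

```python
import math
import random

def mn_stat(n, n1, b_right, rng):
    # One draw of M_n(B) for B = [0, b_right], with uniform X_i on [0,1],
    # exchangeable 0/1 labels delta_i (n1 ones) attached to the order
    # statistics, and phat_i = (n1 - S_{i-1})/(n - i + 1).
    n2 = n - n1
    x = sorted(rng.random() for _ in range(n))
    labels = [1] * n1 + [0] * n2
    rng.shuffle(labels)
    s, m = 0, 0.0
    for i, (xi, di) in enumerate(zip(x, labels)):
        phat = (n1 - s) / (n - i)
        if xi <= b_right:
            m += di - phat
        s += di
    return math.sqrt(n / (n1 * n2)) * m

rng = random.Random(3)
vals = [mn_stat(500, 250, 0.3, rng) for _ in range(600)]
mean = sum(vals) / len(vals)
var = sum((v - mean) ** 2 for v in vals) / len(vals)
```

The empirical variance comes out close to $P(B) = 0.3$, in agreement with the Wiener-type limit covariance $\mathrm{IE}\,W_P^2(B \cap A_t) = P(B \cap A_t)$ at $t = 1$.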
Similarly, Lemmas 4.2 and 3.4 lead to (4.5) with $P$ replaced by $\lambda$ and $M_n$ by $u_n$; also Lemmas 4.2 and 3.5 give (4.5) with $P$ replaced by $\lambda$ and $M_n$ by $w_n$ (note that the conditions of Lemma 3.5 also hold for $\mathcal C = a_0(\mathcal B)$). This settles the proper asymptotic equicontinuity for the three processes.

Now we turn to the convergence of the finite-dimensional distributions. For a sequence of martingales $\gamma_n(t) = \sum_{t_i \le t} \alpha_{in}(\delta_i - \hat p_i)$, the convergence $\gamma_n \stackrel{D}{\to} \gamma$ in $D[0,1]$ to a Brownian motion with covariance $F(s \wedge t)$ is equivalent to the conditions:

1) $\langle \gamma_n \rangle(t) \stackrel{\mathrm{IP}}{\to} F(t)$ for all $t \in [0,1]$, and

2) $\sum_{i=1}^n \alpha_{in}^2\, 1\{\alpha_{in}^2 > \varepsilon\}\, \hat p_i(1 - \hat p_i) \stackrel{\mathrm{IP}}{\to} 0$ for all $\varepsilon > 0$ (the Lindeberg condition);

see, e.g., Liptser and Shiryayev (1981) or Shorack and Wellner (1986, pp. 894-895). To obtain the finite-dimensional convergence for $M_n(B_1, \cdot), \ldots, M_n(B_k, \cdot)$, according to the Cram\'er-Wold device it is sufficient to prove the convergence of all linear combinations $\beta_1 M_n(B_1, \cdot) + \cdots + \beta_k M_n(B_k, \cdot)$, which leads to the choice
$$\alpha_{in} = \Big(\frac{n}{n_1 n_2}\Big)^{1/2} \sum_{j=1}^k \beta_j\, 1_{B_j}(X_{(i)}),$$
but since $1_B 1_C = 1_{B \cap C}$ the choice of just one indicator is sufficient. Now, according to Lemma 3.3, condition 1) is satisfied (even almost surely) with $F(t) = \mathrm{IE}\,W_P^2(B \cap A_t) = P(B \cap A_t)$, whereas condition 2) is trivially satisfied. For $u_n$, according to Lemma 3.4, condition 1) also holds a.s., with $F(t) = \mathrm{IE}\,W^2(B \cap A_t) = \lambda(B \cap A_t)$, and condition 2) as well (see (3.19)), while for $w_n$, again, a.s. versions of both conditions follow from Lemma 3.5 with the same $F$ as for $u_n$. This completes the proof of the finite-dimensional convergence of $M_n$, $u_n$ and $w_n$, and hence of the theorem. $\square$

5. Weak convergence under alternatives: properties $(\beta)$ and $(\tilde\beta)$. According to Theorem 4.1 the processes $u_n$ and $w_n$ have property $(\alpha)$ (see Section 2). In this section we study whether they also have property $(\beta)$ or $(\tilde\beta)$; that is, we will study the weak convergence of $u_n$ and $w_n$, as well as $M_n$, under the alternatives (C3). Since these are contiguous alternatives, the asymptotic equicontinuity follows, and we need only study the finite-dimensional convergence of these processes. The usual way to do this is to study the joint weak convergence of each of our processes with the logarithm of the likelihood ratio
$$L_n = \sum_{i=1}^n \Big[\delta_i \ln\frac{dP_1}{dP}(X_{(i)}) + (1 - \delta_i) \ln\frac{dP_2}{dP}(X_{(i)})\Big]$$
and then to apply LeCam's Third Lemma; see, e.g., Shorack and Wellner (1986, p. 156). It is well known that
(5.1)
$$L_n = \Big(\frac{n}{n_1 n_2}\Big)^{1/2} \sum_{i=1}^n \Big(\delta_i - \frac{n_1}{n}\Big) h(X_{(i)}) + z_n,$$
and there exists a constant $c$ such that $z_n \stackrel{\mathrm{IP}}{\to} c$ under the distribution $P^n$. It would, however, be more consistent with the presentation in Section 4 to consider, instead of $L_n$, the logarithm of the likelihood ratio of the conditional distributions, given $\mathcal F_0$, which is
$$L_n' = L_n - \ln \mathrm{IE}\,[e^{L_n} \mid \mathcal F_0],$$
and to study the joint convergence of our processes and $L_n'$ under a.a. sequences of the conditional hypothetical distributions $\mathrm{IP}\,\{\cdot \mid \mathcal F_0\}$. As is shown in Urinov (1992), for $L_n'$ it is again true that
$$L_n' = \Big(\frac{n}{n_1 n_2}\Big)^{1/2} \sum_{i=1}^n \Big(\delta_i - \frac{n_1}{n}\Big) h(X_{(i)}) + z_n',$$
with the remainder term $z_n'$ such that $z_n' \stackrel{\mathrm{IP}}{\to} c$, still under $P^n$. However, it is not true, as far as we understand it, that condition (C3) implies convergence of $z_n'$ to a constant in $\mathrm{IP}\,\{\cdot \mid \mathcal F_0\}$-probability for a.a. sequences $\{P_n\}_{n\ge1}$. Hence, though it is possible to show convergence in this sense to the appropriate limits if $L_n'$ is replaced by its leading first term (see the proof of Theorem 5.1 below), the eventual statement of convergence is true under the unconditional distributions $P^n$ and $P_1^{n_1} \times P_2^{n_2}$ only. Write $H(t) = \int_{A_t^c} h\,dP$ and let
$$g(x) = h(x) - \frac{H(t(x))}{P(A_{t(x)}^c)},$$
where $t(x)$ is defined as in (3.1). Remark that the linear operator that maps $h$ into $g$ is norm preserving (though not one-to-one, since it annihilates constant functions):
(5.2)
$$\int g^2\,dP = \int \Big(h - \int h\,dP\Big)^2 dP = \int h^2\,dP \ (= \|h\|^2).$$
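The parenthetical claim that the map $h \mapsto g$ annihilates constants can be verified in one line: for $h \equiv c$,

```latex
H(t) = \int_{A_t^c} c \, dP = c\,P(A_t^c),
\qquad
g(x) = c - \frac{H(t(x))}{P(A_{t(x)}^c)} = c - c = 0 .
```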
Now denote by $Z$ a $N(0, \|h\|^2)$ random variable (cf. (2.2)) such that $(W_P, Z)$ is jointly Gaussian, that is, for any finite collection $B_1, \ldots, B_k \in \mathcal B$ the vector $(W_P(B_1), \ldots, W_P(B_k), Z)$ is Gaussian, and let
$$\mathrm{Cov}(W_P(B), Z) = \int_B g\,dP.$$
Similarly, let $(W, Z)$ be jointly Gaussian with
$$\mathrm{Cov}(W(B), Z) = \int_B g f^{1/2}\,d\lambda.$$
Let `$\stackrel{D}{\to}$' and `$\stackrel{\tilde D}{\to}$' denote convergence in distribution under $P^n$ and $P_1^{n_1} \times P_2^{n_2}$, respectively.

Theorem 5.1. If the class $\mathcal B \subseteq \mathcal B_0$ is such that $M_n \stackrel{D}{\to} W_P$ and/or $u_n \stackrel{D}{\to} W$ $(n \to \infty)$ in $\ell^\infty(\mathcal B)$, then
(5.3)
$$M_n \stackrel{\tilde D}{\to} W_P + \int g\,dP \quad (n \to \infty)$$
and/or
(5.4)
$$u_n \stackrel{\tilde D}{\to} W + \int g f^{1/2}\,d\lambda \quad (n \to \infty)$$
in $\ell^\infty(\mathcal B)$.

The proof of this theorem is deferred to the second half of this section, but to explain the nature of the function $g$ already here, let us remark that the leading term of $L_n$ and $L_n'$ has the following explicit representation (see (4.1)):
(5.5)
$$\sum_{i=1}^n \Big(\delta_i - \frac{n_1}{n}\Big) h(X_{(i)}) = \sum_{i=1}^n (\delta_i - \hat p_i)\, g_n(X_{(i)}),$$
where
$$g_n(x) = h(x) - \frac{\int_{A_{t(x)}^c} h\,dP_n}{P_n(A_{t(x)}^c)}$$
has the same form as the function $g$, only with $P$ replaced by the empirical distribution $P_n$. The equality (5.5) can be derived from (3.7) or verified directly.

Now let us consider whether it follows from this theorem that $u_n$ has property $(\beta)$. We say $\mathcal B$ generates $\mathcal B_0$ if the smallest $\sigma$-algebra containing $\mathcal B$ is $\mathcal B_0$. Let $Q_{u_n}$ and $\tilde Q_{u_n}$ denote the distributions of $u_n$ under $P^n$ and $P_1^{n_1} \times P_2^{n_2}$ respectively, and let $Q$ and $\tilde Q$ denote the distributions of $W$ and $W + \int g f^{1/2}\,d\lambda$, respectively.
Theorem 5.2. If the indexing class $\mathcal B$ generates $\mathcal B_0$, then for each sequence of alternatives satisfying (C3)
$$d(Q_{u_n}, \tilde Q_{u_n}) \to d(Q, \tilde Q) = 2\Phi(\tfrac12 \|h\|) - 1 \quad (n \to \infty).$$
Hence, Theorems 4.1 and 5.2 show that if $\mathcal B$ is a VC class generating $\mathcal B_0$ and (C4) holds, then $u_n$ is a strong $\mathcal P$-solution of the two-sample problem. Remark that the process $M_n$ also possesses property $(\beta)$; it only lacks property $(\alpha)$.

Let us now consider $w_n$. To find the limiting covariance between $w_n(B)$ and $L_n$ we need to study the limit of the expression
$$\frac1n \sum_{i=1}^n 1_B(X_{(i)})\, g_n(X_{(i)}) \sqrt{n\nu(\Delta(X_{(i)}))}\; \hat p_i(1 - \hat p_i),$$
where the multipliers $\hat p_i(1 - \hat p_i)$ are not essential from the convergence point of view. On the unit interval, i.e. for $m = 1$, it can be proved that
(5.6)
$$\frac1n \sum_{i=1}^n 1_B(X_i)\, g(X_i) \sqrt{n\nu(\Delta(X_i))} \stackrel{\mathrm{IP}}{\to} k \int_B g f^{-1/2}\,dP = k \int_B g f^{1/2}\,d\lambda,$$
with $k = \frac34\sqrt{\pi/2}$, using, e.g., the general method presented in Borovikov (1987). It follows, heuristically speaking, from the fact that the $n\nu(\Delta(X_i))$ behave `almost' as independent random variables, each with a Gamma(2) distribution with scale parameter $2f(X_i)$, and so $k$ stands for the moment of order $\frac12$ of a Gamma(2) distribution with scale parameter 2. However, in the unit cube $[0,1]^m$ we will need to keep (5.6), for some $k < 1$, as an assumption.

Let $(W', Z')$ be again jointly Gaussian with the same marginal distributions as those of $W$ and $Z$, but with different covariance:
$$\mathrm{Cov}(W'(B), Z') = k \int_B g f^{1/2}\,d\lambda.$$

Theorem 5.3. If the class $\mathcal B \subseteq \mathcal B_0$ is such that $w_n \stackrel{D}{\to} W$ in $\ell^\infty(\mathcal B)$ and if (5.6) is true, then
(5.7)
$$w_n \stackrel{\tilde D}{\to} W + k \int g f^{1/2}\,d\lambda \quad (n \to \infty),$$
and if $\mathcal B$ generates $\mathcal B_0$,
(5.8)
$$d(Q_{w_n}, \tilde Q_{w_n}) \to 2\Phi(\tfrac12 k\|h\|) - 1 \quad (n \to \infty).$$
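The heuristic behind the constant $k$ in (5.6) can be checked numerically. Here we read `Gamma(2) with scale parameter 2' as the density $4xe^{-2x}$, i.e. rate 2, which gives $k = \frac34\sqrt{\pi/2} \approx 0.94 < 1$, consistent with the requirement $k < 1$; this reading is our assumption.

```python
import math
import random

# Exact half-moment of a Gamma(2) variable with density 4 x e^{-2x}:
# E[X^{1/2}] = Gamma(5/2) / (Gamma(2) * 2^{1/2}) = (3/4) * sqrt(pi/2).
k_exact = math.gamma(2.5) / (math.gamma(2.0) * math.sqrt(2.0))

# Monte Carlo confirmation; random.gammavariate takes (shape, scale),
# so rate 2 corresponds to scale 1/2.
rng = random.Random(7)
reps = 200_000
mc = sum(math.sqrt(rng.gammavariate(2.0, 0.5)) for _ in range(reps)) / reps
```

Both numbers agree with $\frac34\sqrt{\pi/2} \approx 0.9399$; in particular $k < 1$, which is what makes $\Phi(\frac12 k\|h\|) < \Phi(\frac12\|h\|)$ below.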
From (5.8) it follows that under the conditions of Theorem 5.3 the process $w_n$ certainly possesses property $(\tilde\beta)$, although not property $(\beta)$, because $\Phi(\frac12 k\|h\|) < \Phi(\frac12\|h\|)$. So, $w_n$ is a weak $\mathcal P$-solution of the two-sample problem.

Finally, we present the postponed proofs of Theorems 5.1 and 5.2. The proof of Theorem 5.3 is much the same and will therefore be omitted.

Proof of Theorem 5.1. Since the sequence of alternative distributions $\{P_1^{n_1} \times P_2^{n_2}\}_{n\ge1}$ is contiguous to the sequence $\{P^n\}_{n\ge1}$, the CAEC$_{ud}$ property of $M_n$ and/or $u_n$ will be true under the alternative distributions as well. Hence (5.3) and (5.4) will follow if we show the convergence of the finite-dimensional distributions of $M_n$ and/or $u_n$ to the proper limits. Let us focus on $u_n$; the proof for $M_n$ is similar and simpler. The convergence
$$\{u_n(B_j)\}_{j=1}^k \stackrel{\tilde D}{\to} \Big\{W(B_j) + \int_{B_j} g f^{1/2}\,d\lambda\Big\}_{j=1}^k$$
will follow from the Cram\'er-Wold device, the convergence
(5.9)
$$\sum_{j=1}^k \beta_j u_n(B_j) + \mu Z_n \stackrel{D}{\to} \sum_{j=1}^k \beta_j W(B_j) + \mu Z,$$
where $\{\beta_j\}_{j=1}^k$ and $\mu$ are any constants and
$$Z_n = \Big(\frac{n}{n_1 n_2}\Big)^{1/2} \sum_{i=1}^n h(X_{(i)})\Big(\delta_i - \frac{n_1}{n}\Big),$$
and from LeCam's Third Lemma. To see that (5.9) is true, observe that, given $P_n$, the left hand side is the value of the martingale in $t$,
$$\Big(\frac{n}{n_1 n_2}\Big)^{1/2} \sum_{i=1}^{t} \Big[\sum_{j=1}^k \beta_j \frac{1}{f_n^{1/2}(X_{(i)})}\, 1_{B_j}(X_{(i)}) + \mu\, g_n(X_{(i)})\Big] (\delta_i - \hat p_i)$$
(cf. (4.1)) at the last point $t = n$. Hence if we verify that
(5.10)
$$\frac1n \sum_{i=1}^n \Big[\sum_{j=1}^k \beta_j \frac{1}{f_n^{1/2}(X_{(i)})}\, 1_{B_j}(X_{(i)}) + \mu\, g_n(X_{(i)})\Big]^2 \frac{n^2}{n_1 n_2}\, \hat p_i(1 - \hat p_i) \to \int_{[0,1]^m} \Big[\sum_{j=1}^k \beta_j \frac{1}{f^{1/2}}\, 1_{B_j} + \mu g\Big]^2 dP \quad \text{a.s.}\ (n \to \infty)$$
for a.a. $\{P_n\}_{n\ge1}$, then actually `$\stackrel{D(P_n)}{\to}$ a.s.' will be proved, and hence `$\stackrel{D}{\to}$' as well. However, (5.10) will follow from the SLLN if we show that the functions $f_n$ and $g_n$ can be replaced by $f$ and $g$ respectively, and use the truncation applied in the proof of Lemma 3.4 (see (3.18) and
the subsequent proof). We have
$$\sup_{0 \le t \le 1} \Big|\int_{A_t^c} h\,dP_n - \int_{A_t^c} h\,dP\Big| \to 0 \quad\text{and}\quad \sup_{0 \le t \le 1} |P_n(A_t^c) - P(A_t^c)| \to 0 \quad\text{a.s.},$$
and hence
$$\sup_{x \in A_{1-\varepsilon}} |g_n(x) - g(x)| \to 0 \quad \text{a.s.}\ (n \to \infty),$$
while on $A_{1-\varepsilon}^c$ we have, according to (5.5),
$$\frac{n}{n_1 n_2} \sum_{i=1}^n g_n^2(X_{(i)})\, 1_{A_{1-\varepsilon}^c}(X_{(i)})\, \hat p_i(1 - \hat p_i) \le \frac{1}{n-1} \sum_{i=1}^n h^2(X_{(i)})\, 1_{A_{1-\varepsilon}^c}(X_{(i)}) \to \int_{A_{1-\varepsilon}^c} h^2\,dP \quad \text{a.s.},$$
which can be made arbitrarily small by taking $\varepsilon$ small.
The proof of
$$\int \Big|\frac{1}{f_n^{1/2}} - \frac{1}{f^{1/2}}\Big|^2 dP \to 0 \quad \text{a.s.}$$
is similar to the one used in Lemma 3.4 and is omitted here. $\square$

Proof of Theorem 5.2. If $\mathcal B$ generates $\mathcal B_0$, then $L_n$ is a linear functional of $u_n$ and, hence, the following two distances in variation are equal:
$$d(P_1^{n_1} \times P_2^{n_2},\, P^n) = d(\tilde Q_{u_n}, Q_{u_n}).$$
Hence
$$d(\tilde Q_{u_n}, Q_{u_n}) \to 2\Phi(\tfrac12 \|h\|) - 1.$$
Again, since $\mathcal B$ generates $\mathcal B_0$, the distance in variation between the distributions of $W$ and $W + \int g f^{1/2}\,d\lambda$ on $\mathcal B$ coincides with the one on the whole of $\mathcal B_0$. Therefore the log-likelihood statistics of these two Gaussian processes on $\mathcal B$ and $\mathcal B_0$ coincide and are equal to
$$\int g f^{1/2}\,dW - \frac12 \int g^2\,dP,$$
which, because of (5.2), is $N(-\frac12\|h\|^2, \|h\|^2)$ under the distribution of $W$ and $N(\frac12\|h\|^2, \|h\|^2)$ under the distribution of the shifted $W$. The distance in variation between these two normal distributions is, obviously, $2\Phi(\tfrac12\|h\|) - 1$. $\square$
REFERENCES

Bickel, P. J. (1969). A distribution free version of the Smirnov two sample test in the p-variate case. Ann. Math. Statist. 40 1-23.
Bickel, P. J. and Breiman, L. (1983). Sums of functions of nearest neighbor distances, moment bounds, limit theorems and a goodness of fit test. Ann. Probab. 11 185-214.
Borovikov, V. P. (1987). Limit theorems for statistics that are partial sums of functions of spacings. Theory Probab. Appl. 32 86-97.
Brémaud, P. (1981). Point Processes and Queues. Martingale Dynamics. Springer, New York.
Cabaña, A. and Cabaña, E. M. (1994). Goodness of fit and comparison tests of the Kolmogorov-Smirnov type for bivariate populations. Ann. Statist. 22 1447-1459.
Dudley, R. M. (1978). Central limit theorems for empirical measures. Ann. Probab. 6 899-929.
Dudley, R. M. (1984). A course on empirical processes. École d'été de probabilités de Saint-Flour XII-1982. Lecture Notes in Math. 1097 1-142. Springer, New York.
Fisher, R. A. (1936). Statistical Methods for Research Workers. Oliver and Boyd, Edinburgh.
Freedman, D. A. (1975). On tail probabilities for martingales. Ann. Probab. 3 100-118.
Friedman, J. H. and Rafsky, L. C. (1979). Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests. Ann. Statist. 7 697-717.
Gaenssler, P. (1983). Empirical Processes. IMS, Hayward, Calif.
Henze, N. (1988). A multivariate two-sample test based on the number of nearest neighbor type coincidences. Ann. Statist. 16 772-783.
Jacod, J. and Shiryayev, A. N. (1987). Limit Theorems for Stochastic Processes. Springer, New York.
Karr, A. F. (1991). Point Processes and their Statistical Inference. Marcel Dekker, New York.
Khmaladze, E. V. (1988). An innovation approach to goodness-of-fit tests in IR^m. Ann. Statist. 16 1503-1516.
Khmaladze, E. V. (1993). Goodness of fit problem and scanning innovation martingales. Ann. Statist. 21 798-829.
Kim, K.-K. and Foutz, R. V. (1987). Tests for the multivariate two-sample problem based on empirical probability measures. Canad. J. Statist. 15 41-51.
Liptser, R. Sh. and Shiryayev, A. N. (1981). On necessary and sufficient conditions in the functional central limit theorem for semimartingales. Theory Probab. Appl. 26 130-135.
Oosterhoff, J. and van Zwet, W. R. (1979). A note on contiguity and Hellinger distance. In Contributions to Statistics (J. Jurečková, ed.) 157-166. Reidel, Dordrecht.
Pollard, D. (1990). Empirical Processes: Theory and Applications. IMS, Hayward, Calif.
Puri, M. L. and Sen, P. K. (1971). Nonparametric Methods in Multivariate Analysis. Wiley, New York.
Scott, D. W. (1992). Multivariate Density Estimation: Theory, Practice, and Visualization. Wiley, New York.
Sheehy, A. and Wellner, J. A. (1992). Uniform Donsker classes of functions. Ann. Probab. 20
1983-2030.
Shorack, G. R. and Wellner, J. A. (1986). Empirical Processes with Applications to Statistics.
Wiley, New York.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall,
London.
Smirnov, N. V. (1939). Estimate of deviation between empirical distribution functions in two independent samples (in Russian). Bull. Moscow Univ. 2 3-16.
Urinov, I. K. (1992). Test of homogeneity under small grouping. Martingale limit theorems. Theory Probab. Appl. 37 658-672.
van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, New York.
Wald, A. and Wolfowitz, J. (1944). Statistical tests based on permutations of the observations. Ann. Math. Statist. 15 358-372.

Department of Mathematics and Computing Science
Eindhoven University of Technology
P.O. Box 513
5600 MB Eindhoven
The Netherlands
School of Mathematics
Department of Statistics
The University of NSW
Sydney 2052
Australia