Bernoulli 6(4), 2000, 729–760
An Edgeworth expansion for finite-population U-statistics

MINDAUGAS BLOZNELIS^1 and FRIEDRICH GÖTZE^2

^1 Department of Mathematics, Vilnius University, Naugarduko 24, Vilnius 2006, Lithuania. E-mail: [email protected]
^2 Fakultät für Mathematik, Universität Bielefeld, Postfach 100131, 33501 Bielefeld, Germany. E-mail: [email protected]

Suppose that U is a U-statistic of degree 2 based on N random observations drawn without replacement from a finite population. For the distribution of a standardized version of U we construct an Edgeworth expansion with remainder O(N^{-1}) provided that the linear part of the statistic satisfies a Cramér-type condition.

Keywords: central limit theorem; Edgeworth expansion; finite population; U-statistic; sampling without replacement
1. Introduction and results

Let A = {a_1, ..., a_n} denote a population of size n and let H : A × A → R denote a symmetric function of its two arguments. By X_1, ..., X_N, N < n, we denote random variables with values in A such that X = {X_1, ..., X_N} represents a random sample from A of size N drawn without replacement, i.e. P{X = B} = \binom{n}{N}^{-1} for any subset B ⊂ A of size N. We shall investigate the second-order asymptotics of the distribution of the statistic

U = Σ_{1 ≤ i < j ≤ N} H(X_i, X_j).
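For readers who want to experiment with this sampling model, the following minimal simulation sketch (not from the paper; the population values, the kernel H and the sizes n, N are arbitrary illustrative choices) draws a sample of size N without replacement and evaluates the statistic U just defined.

import itertools
import random

def u_statistic(population, N, H, seed=0):
    # Draw N observations without replacement and return U = sum over pairs i < j of H(X_i, X_j).
    rng = random.Random(seed)
    sample = rng.sample(population, N)
    return sum(H(x, y) for x, y in itertools.combinations(sample, 2))

# Hypothetical illustrative choices (not taken from the paper): population 1, ..., n
# and a centred product kernel, symmetric in its two arguments.
n, N = 20, 8
population = list(range(1, n + 1))
mean = sum(population) / n
H = lambda x, y: (x - mean) * (y - mean)
print(u_statistic(population, N, H))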
... ≪ P{|Λ_m| ≥ N^{-1}δ^{-3/2}} + δ^{-3/2} N^{-1} max_x |G′(x)|.
By Chebyshev's inequality and the inequality E|Λ_m|^3 ≪ m^3 E|g_2(X_1, X_2)|^3,

P{|Λ_m| ≥ N^{-1}δ^{-3/2}} ≤ δ^{9/2} N^3 E|Λ_m|^3 ≪ δ^{-3/2} γ_3 N^{-3/2} ln^6 N.

Finally, using the bound |G′(x)| ≪ β_4/q + γ_2, we obtain

Δ ≪ Δ_1 + N^{-1}δ^{-3/2}(β_4/q + γ_2 + γ_3).    (3.5)
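To spell out the Chebyshev step used above (a routine application of Markov's inequality to third absolute moments, recorded here only for convenience):

P{|Λ_m| ≥ N^{-1}δ^{-3/2}} = P{|Λ_m|^3 ≥ N^{-3}δ^{-9/2}} ≤ (N^{-1}δ^{-3/2})^{-3} E|Λ_m|^3 = δ^{9/2} N^3 E|Λ_m|^3.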
Therefore, in order to prove the theorem it suffices to bound Δ_1. Let k be an integer approximately equal to (N + m)/2. Put I_0 = {m + 1, ..., N}, J_0 = Ω_n \ I_0, J_1 = J_0 ∪ {m + 1, ..., k} and J_2 = J_0 ∪ {k + 1, ..., N}. Given A, define (random) subpopulations 𝒜_i = {A_k, k ∈ J_i}, i = 0, 1, 2, and let A_i be random variables uniformly distributed in 𝒜_i, i = 0, 1, 2, independent of ν. Write

v_1(a) = Σ_{j=k+1}^{N} g_2(a, A_j),    v_2(a) = Σ_{j=m+1}^{k} g_2(a, A_j),    (3.6)

H = Nδ / (32 q^{-1} N (Θ_1 + Θ_2) + 1),    Θ_i = E|v_i(A_i)|,    i = 1, 2.
Notice that Θ_1 is a function of the random variables A_{k+1}, ..., A_N, and that Θ_2 is a function of A_{m+1}, ..., A_k.

Step 2. Split the sample as follows. Put X_j = A_j, for m < j ≤ N. The rest of the sample, X_1, ..., X_m, is obtained by simple random sampling without replacement from the (random) subpopulation 𝒜_0. An application of Prawitz's (1972) smoothing lemma conditionally, given X_{m+1}, ..., X_N, or equivalently, given A_{m+1}, ..., A_N, gives
F_1(x) ≤ 1/2 + E VP ∫_R e{−xt} (1/H) K(t/H) f_1(t) dt,    (3.7)

F_1(x−) ≥ 1/2 + E VP ∫_R e{−xt} (1/H) K(−t/H) f_1(t) dt,    (3.8)
where 2K(s) = K_1(s) + iK_2(s)/(πs); see, for example, Bentkus et al. (1997). Here

K_1(s) = I{|s| ≤ 1}(1 − |s|)

and

K_2(s) = I{|s| ≤ 1}(1 − |s|)πs cot(πs).
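A minimal numerical sketch of the smoothing kernel just defined (illustrative only, not part of the original argument) may help fix ideas; note that K(s) has a principal-value singularity of order i/(2πs) at s = 0, which is why the integrals in (3.7) and (3.8) are taken as principal values (VP).

import numpy as np

def K1(s):
    s = np.asarray(s, dtype=float)
    return np.where(np.abs(s) <= 1, 1 - np.abs(s), 0.0)

def K2(s):
    s = np.asarray(s, dtype=float)
    core = np.ones_like(s)                      # limit of pi*s*cot(pi*s) as s -> 0 is 1
    nz = s != 0
    core[nz] = np.pi * s[nz] / np.tan(np.pi * s[nz])
    return np.where(np.abs(s) <= 1, (1 - np.abs(s)) * core, 0.0)

def K(s):
    # 2K(s) = K1(s) + i*K2(s)/(pi*s); K(s) is singular at s = 0 (principal-value kernel).
    s = np.asarray(s, dtype=float)
    return 0.5 * (K1(s) + 1j * K2(s) / (np.pi * s))

print(K(np.array([-0.99, -0.5, -0.1, 0.1, 0.5, 0.99])))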
Combining (3.7) and the inversion formula,

G(x) = 1/2 + (i/(2π)) lim_{M→∞} VP ∫_{|t| ≤ M} e{−tx} Ĝ(t) dt/t,    (3.9)
we obtain (see, for example, Bentkus et al. 1997)

F_1(x) − G(x) ≤ E I_1 + E I_2 + E I_3,    (3.10)

where

I_1 = (1/2) ∫_R e{−xt} K_1(t/H) H^{-1} f_1(t) dt,

I_2 = (i/(2π)) VP ∫_R e{−xt} K_2(t/H) (f_1(t) − Ĝ(t)) dt/t,

I_3 = (i/(2π)) VP ∫_R e{−xt} (K_2(t/H) − 1) G̃(t) dt/t,
where VP means also that one should take lim_{M→∞} if necessary. Combining (3.8) and (3.9), we obtain a bound for G(x) − F_1(x−) similar to (3.10). We shall bound F_1(x) − G(x) only. To this end, we prove that

|E I_1| + |E(I_2 + I_3)| ≪ N^{-1}(β_4/q + δ^{-1}(δ^{-1} + q^{-1}) + δ^{-2} q^{-2}(γ_2 + γ_2^{1/2}) + γ_4).    (3.11)
The analogous bound for G(x) − F_1(x−) can be derived in the same way. Using these bounds, (3.5) and the inequality δ ≥ r (see Lemma 2.1), we obtain the estimate of the theorem. In the remaining part of the proof we verify (3.11).

Step 3: Estimate for |E I_1|. We shall replace the random bound H in the integral I_1 by a non-random one, and K_1(t/H) by 1. We have |E I_1| ≤ |E I_4| + E I_5, where
I_4 = ∫_Z e{−tx} K_1(t/H) H^{-1} f_1(t) dt,    Z = {t ∈ R : |t| ≤ H_1},

I_5 = ∫_{|t| > H_1} H^{-1} K_1(t/H) |f_1(t)| dt ≤ H^{-1} ∫_{|t| > H_1} |f_1(t)| dt.
... where Z_1, ..., Z_j are independent and centred random variables. We apply this inequality to the sum ζ_m(A_k) – cf. (4.15) – conditionally, given A:

E_{m+1,...,n} |ζ_m(A_k)|^3 ≪ pq Σ_{l=m+1}^{n} |g_2(A_k, A_l)|^3 + (pq Σ_{l=m+1}^{n} g_2^2(A_k, A_l))^{3/2}.

Finally, using Hölder's inequality, we bound the second sum above by

(Σ_{l=m+1}^{n} g_2^2(a, A_l))^{3/2} ≤ (n − m)^{1/2} Σ_{l=m+1}^{n} |g_2(a, A_l)|^3,    a ∈ A,

thus arriving at (5.4).
□
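The Hölder step used just above can be written out as follows (a routine computation with exponents 3/2 and 3, included only for the reader's convenience): for non-negative numbers c_{m+1}, ..., c_n,

Σ_{l=m+1}^{n} c_l^2 ≤ (Σ_{l=m+1}^{n} c_l^3)^{2/3} (n − m)^{1/3},

and raising both sides to the power 3/2 gives (Σ_l c_l^2)^{3/2} ≤ (n − m)^{1/2} Σ_l c_l^3, which with c_l = |g_2(a, A_l)| is exactly the bound displayed above.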
Lemma 5.2. For each 0 < d < π and x, y ∈ R, and β(x) defined in Section 3.1, we have

|β(x + y)|^2 ≤ u_[d](x) v_[d](y),    where v_[d](y) = 1 + 2πd^{-1} pq (4/Θ(d) + 1) y^2,

and where the function u_[d] is defined in (4.8).

Proof. In the case where |x| ≥ π − d, we have u_[d](x) = 1 and the desired inequality follows from the simple bound |β(x + y)| ≤ 1. In the case where |x| < π − d, we apply the mean value theorem to obtain

|cos(x + y) − cos(x)| ≤ |E sin(x + θ_1 y) y| ≤ (|x| + |y|)|y| ≤ cx^2 + (c^{-1} + 1)y^2.    (5.6)

In the last step we applied the inequality |xy| ≤ cx^2 + c^{-1}y^2, with c > 0. Combining (5.6) and the identity |β(x + y)|^2 = 1 − 2pq(1 − cos(x + y)), we obtain
|β(x + y)|^2 ≤ 1 − 2pq(1 − cos(x) − cx^2 − (c^{-1} + 1)y^2).
Now invoking (5.15), we obtain

|β(x + y)|^2 ≤ w_1 + w_2,    w_1 = 1 − pq(Θ(d) − 2c)x^2,    w_2 = (c^{-1} + 1)2pq y^2.
But 1 − pqx^2 Θ(d) ≥ d/π, for |x| ≤ π − d. Hence, w_1 > d/π and therefore w_1 + w_2 ≤ w_1(1 + πd^{-1} w_2). Choosing c = Θ(d)/4 completes the proof of the lemma. □

Lemma 5.3. Assume that β_2 = 1. For every s, t ∈ R and 0 < d < π, we have

E Z^2(A_1) I_[d](A_1) ≥ (t^2 N^{-1} + s^2)(1 − 2c_[d]),    c_[d] := max{b_1^3/d^3, b_1^2/d^2}.

Here Z(a) = tN^{-1/2}g_1(a) + s and I_[d](a) = I{H_1|g_1(a)| < d}, for a ∈ A.

Remark. A similar inequality was used by Höglund (1978), where the constant (corresponding to c_[d]) was not specified. For our purposes the dependence of c_[d] on the parameters b_1 and d is important and thus we include the proof.

Proof. Denote K_[d] = {k : I_[d](a_k) = 0}. Clearly, for r > 0,

|K_[d]| ≤ Σ_{k∈K_[d]} |g_1(a_k)H_1/d|^r ≤ n β_r β_3^{-r/3} b_1^r d^{-r}.    (5.7)
Furthermore, since E Z^2(A_1) = t^2 N^{-1} + s^2, we have

E Z^2(A_1) I_[d](A_1) = t^2 N^{-1} + s^2 − W n^{-1},    W = Σ_{k∈K_[d]} Z^2(a_k).    (5.8)
The inequality (a + b)^2 ≤ 2a^2 + 2b^2 implies W ≤ 2W_1 + 2W_2, where

W_1 = s^2 |K_[d]|,    W_2 = (t^2/N) Σ_{k∈K_[d]} g_1^2(a_k) ≤ (t^2/N) n^{2/3} β_3^{2/3} |K_[d]|^{1/3}.

In the last step we applied Hölder's inequality to obtain

Σ_{k∈K_[d]} g_1^2(a_k) ≤ (Σ_{k∈K_[d]} |g_1^3(a_k)|)^{2/3} |K_[d]|^{1/3}.
Now, (5.7) (with r = 2) implies W_1 ≤ s^2 n c_[d]. Furthermore, (5.7) (with r = 3) implies W_2 ≤ t^2 N^{-1} n c_[d]. These inequalities, combined with (5.8), complete the proof. □

Lemma 5.4. Assume that β_2 = 1 and that (3.3) holds. For |t| ≤ H_1 and |s| ≤ πτ, the inequalities (4.9) hold true.

Proof. Throughout this proof we use the notation introduced in Section 4. Fix B ⊂ Ω_m. By Lemma 5.2,
Z_B = Π_{k∈B} |β(z_k + tζ_m(A_k))| ≤ η_1 η_2,    η_1^2 = Π_{k∈B} u_[1](z_k),    η_2^2 = Π_{k∈B} v_[1](tζ_m(A_k)).
Using the inequality 1 + x ≤ exp{x}, we obtain η_2^2 ≤ exp{pq t^2 k_B |B|/N} and therefore (1 − I_B)η_2 ≤ g_B(t). Finally, combining the inequalities Z_B ≤ 1 and Z_B ≤ η_1 η_2, we obtain

Z_B = I_B Z_B + (1 − I_B) Z_B ≤ I_B + (1 − I_B)η_1 η_2 ≤ I_B + Ψ_B,

thus proving the first inequality in (4.9). Here the random variables Ψ_B = η_1 g_B(t), k_B and I_B are defined in (4.7) and the function g_B(t) is given by (4.8). Clearly, Lemma 5.2 (with x = z_k and y = 0) implies the second inequality of (4.9). To prove the third and last one, observe that by Hölder's inequality we have E_{i_1,...,i_4} Ψ_B ≤ (E_{i_1,...,i_4} Ψ_B^2)^{1/2} and thus it suffices to show

E_{i_1,...,i_4} Ψ_B^2 ≤ F_B^2,    for every i_1, ..., i_4 ∈ Ω_n \ B.    (5.9)
To prove (5.9) note that the inequalities |t| ≤ H_1 and |s| ≤ πτ imply

u_[1](z_k) ≤ w(A_k),    w(A_k) = 1 − (pq/2) Θ(1) z_k^2 I_[1](A_k),    k ∈ Ω_n,

where we denote I_[1](a) = I{H_1|g_1(a)| < 1}. Therefore,

Ψ_B^2 = g_B^2(t) η_1^2 ≤ g_B^2(t) η,
where η = Π_{k∈B} w(A_k).    (5.10)
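The next step invokes Theorem 4 of Hoeffding (1963); a brief reminder of the statement, together with the way it applies here (our reading, stated with hedging since the relevant definitions are in Section 4): if X_1, ..., X_k is a sample drawn without replacement from real numbers {c_1, ..., c_n}, Y_1, ..., Y_k a sample drawn with replacement from the same set, and f is continuous and convex, then

E f(X_1 + ... + X_k) ≤ E f(Y_1 + ... + Y_k).

Applied, roughly speaking, with f = exp to the sum Σ_{k∈B} log w(A_k) (assuming the factors w(·) are positive, which the truncation and the restrictions on t and s are meant to ensure), it bounds the expected product E Π_{k∈B} w(A_k) by the |B|-th power of a population average of w; this is the shape of the bound (5.11) below.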
Denote D_1 = {i_1, ..., i_4} and D_2 = Ω_n \ D_1. By Theorem 4 of Hoeffding (1963),

E_{i_1,...,i_4} η ≤ w^{|B|},    where w = 1 − (pq/2) Θ(1) Γ,    Γ = (1/|D_2|) Σ_{k∈D_2} z_k^2 I_[1](A_k).    (5.11)

Below we construct the following lower bound for Γ:

Γ ≥ (9/10) N^{-1} (t^2 + s^2/q).    (5.12)

Combining (5.11), (5.12) and the inequality 1 + x ≤ exp{x}, we obtain

E_{i_1,...,i_4} η ≤ exp{−0.45 pq Θ(1)(t^2 + s^2/q)|B| N^{-1}}.

Now (5.9) follows from (5.10). Let us prove (5.12). We have

Γ = (n/(n − 4)) E z^2(A_1) I_[1](A_1) − M/(n − 4),    M = Σ_{k∈D_1} z_k^2 I_[1](A_k).
... ≥ v^2 Θ(u)/2,    0 ≤ u ≤ π,    (5.15)

π^{1/2}/2 ≤ \binom{n}{N} s^N (1 − s)^{n−N} (2πs(1 − s)n)^{1/2} ≤ 1,    with s = N/n,    (5.16)

where 1 ≤ N < n.
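As a quick numerical sanity check of (5.16) as reconstructed above, the following snippet (illustrative only, not part of the paper) evaluates the bounded quantity for a few population and sample sizes; the values stay between π^{1/2}/2 ≈ 0.886 and 1.

from math import comb, pi, sqrt

def stirling_ratio(n, N):
    # The quantity bounded in (5.16): C(n, N) s^N (1 - s)^(n - N) (2*pi*s*(1 - s)*n)^(1/2), s = N/n.
    s = N / n
    return comb(n, N) * s**N * (1 - s)**(n - N) * sqrt(2 * pi * s * (1 - s) * n)

for n, N in [(2, 1), (10, 3), (50, 25), (200, 7)]:
    r = stirling_ratio(n, N)
    assert sqrt(pi) / 2 - 1e-12 <= r <= 1 + 1e-12, (n, N, r)
    print(n, N, round(r, 6))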
Acknowledgements

The research for this paper was supported by Sonderforschungsbereich 343 in Bielefeld. The authors would like to thank V. Bentkus for discussions and comments.
References

Albers, W., Bickel, P.J. and van Zwet, W.R. (1976) Asymptotic expansions for the power of distribution free tests in the one-sample problem. Ann. Statist., 4, 108–156.
Babu, G.J. and Singh, K. (1985) Edgeworth expansions for sampling without replacement from finite populations. J. Multivariate Anal., 17, 261–278.
Bentkus, V., Götze, F. and van Zwet, W.R. (1997) An Edgeworth expansion for symmetric statistics. Ann. Statist., 25, 851–896.
Bickel, P.J. (1974) Edgeworth expansions in nonparametric statistics. Ann. Statist., 2, 1–20.
Bickel, P.J. and Robinson, J. (1982) Edgeworth expansions and smoothness. Ann. Probab., 10, 500–503.
Bickel, P.J. and van Zwet, W.R. (1978) Asymptotic expansions for the power of distribution free tests in the two-sample problem. Ann. Statist., 6, 937–1004.
Bickel, P.J., Götze, F. and van Zwet, W.R. (1986) The Edgeworth expansion of U-statistics of degree two. Ann. Statist., 14, 1463–1484.
Bikelis, A. (1969) On the estimation of the remainder term in the central limit theorem for samples from finite populations. Studia Sci. Math. Hungar., 4, 345–354 [in Russian].
Bloznelis, M. and Götze, F. (1997) An Edgeworth expansion for finite population U-statistics. Preprint 97-012, SFB 343, Universität Bielefeld. Available on the Internet at http://www.mathematik.uni-bielefeld.de/sfb343/Welcome.html.
Bolthausen, E. and Götze, F. (1993) The rate of convergence for multivariate sampling statistics. Ann. Statist., 21, 1692–1710.
Callaert, H., Janssen, P. and Veraverbeke, N. (1980) An Edgeworth expansion for U-statistics. Ann. Statist., 8, 299–312.
Does, R.J.M.M. (1983) An Edgeworth expansion for simple linear rank statistics under the null hypothesis. Ann. Statist., 11, 607–624.
Erdős, P. and Rényi, A. (1959) On the central limit theorem for samples from a finite population. Publ. Math. Inst. Hungar. Acad. Sci., 4, 49–61.
Götze, F. (1979) Asymptotic expansions for bivariate von Mises functionals. Z. Wahrscheinlichkeitstheorie verw. Geb., 50, 333–355.
Götze, F. and van Zwet, W.R. (1991) Edgeworth expansions for asymptotically linear statistics. Preprint 91-034, SFB 343, Universität Bielefeld.
Helmers, R. and van Zwet, W.R. (1982) The Berry–Esseen bound for U-statistics. In S.S. Gupta and J.O. Berger (eds), Statistical Decision Theory and Related Topics III, Vol. 1, pp. 497–512. New York: Academic Press.
Hoeffding, W. (1963) Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58, 13–30.
Höglund, T. (1978) Sampling from a finite population: a remainder term estimate. Scand. J. Statist., 5, 69–71.
Kokic, P.N. and Weber, N.C. (1990) An Edgeworth expansion for U-statistics based on samples from finite populations. Ann. Probab., 18, 390–404.
Kokic, P.N. and Weber, N.C. (1991) Rates of strong convergence for U-statistics in finite populations. J. Austral. Math. Soc. Ser. A, 50, 468–480.
Nandi, H.K. and Sen, P.K. (1963) On the properties of U-statistics when the observations are not independent II. Unbiased estimation of the parameters of a finite population. Calcutta Statist. Assoc. Bull., 12, 124–148.
Prawitz, H. (1972) Limits for a distribution, if the characteristic function is given in a finite domain. Skand. Aktuar. Tidskr., 5, 138–154.
Robinson, J. (1978) An asymptotic expansion for samples from a finite population. Ann. Statist., 6, 1005–1011.
Serfling, R.J. (1980) Approximation Theorems of Mathematical Statistics. New York: Wiley.
Schneller, W. (1989) Edgeworth expansions for linear rank statistics. Ann. Statist., 17, 1103–1123.
van Zwet, W.R. (1982) On the Edgeworth expansion for the simple linear rank statistic. In B.V. Gnedenko, M.L. Puri and I. Vincze (eds), Nonparametric Statistical Inference, Budapest 1980, Vol. II, Colloq. Math. Soc. János Bolyai 32, pp. 889–909. Amsterdam: North-Holland.
van Zwet, W.R. (1984) A Berry–Esseen bound for symmetric statistics. Z. Wahrscheinlichkeitstheorie verw. Geb., 66, 425–440.
Zhao, L.C. and Chen, X.R. (1987) Berry–Esseen bounds for finite-population U-statistics. Sci. Sinica Ser. A, 30, 113–127.
Zhao, L.C. and Chen, X.R. (1990) Normal approximation for finite-population U-statistics. Acta Math. Appl. Sinica, 6, 263–272.

Received November 1997 and revised June 1999