Teor. Imovir. ta Matem. Statyst. (Theory of Probability and Mathematical Statistics), No. 77, 2007, pp. 145–153
SOME FINITE SAMPLE PROPERTIES OF NEGATIVELY DEPENDENT RANDOM VARIABLES

UDC 519.21
ALESSIO FARCOMENI

Abstract. We discuss some finite sample properties of vectors of negatively dependent random variables. We extend to the case of negatively dependent random variables some inequalities widely used for independent random variables, as well as some basic tools such as the symmetrization lemma.

2000 Mathematics Subject Classification. Primary 60E15, 47N30.
Key words and phrases. Negative dependence, association, Hoeffding inequality, exponential tail inequality, bounded difference inequality, empirical distribution, symmetrization lemma.
A sequence of random variables is said to be negatively (positively) dependent if
\[
P\Big(\bigcap_{i=1}^{n} \{X_i \le z_i\}\Big) \le (\ge) \prod_{i=1}^{n} P(X_i \le z_i)
\]
and
\[
P\Big(\bigcap_{i=1}^{n} \{X_i > z_i\}\Big) \le (\ge) \prod_{i=1}^{n} P(X_i > z_i),
\]
for $z_i \in \mathbb{R}$, $i = 1, \dots, n$. Negative dependence is implied, for instance, by negative association. A sequence of random variables is said to be negatively (positively) associated if, for all coordinate-wise non-decreasing functions $g_1$ and $g_2$ (with $g_1$ and $g_2$ acting on disjoint subsets of the variables in the negatively associated case),
\[
\operatorname{Cov}[g_1(X_1, \dots, X_n), g_2(X_1, \dots, X_n)] \le (\ge)\, 0,
\]
whenever the covariance exists. Positive association was introduced in [3], and negative association in [4]. Negative association implies negative dependence. Moreover, it is straightforward to prove that under either condition $E[\prod_i X_i] \le (\ge) \prod_i E[X_i]$. Any subset of a set of negatively associated or negatively dependent random variables is still negatively associated or negatively dependent, and any non-decreasing function of negatively (positively) associated random variables is still negatively (positively) associated. Some further basic properties are given, for instance, in [7], and [2] reviews concepts of negative dependence. We now provide a brief list of the most important cases of negatively associated random variables: multivariate normal random variables with non-positive (non-negative) correlations are negatively (positively) associated; independent random variables are both positively and negatively associated; multinomial, multivariate hypergeometric, and Dirichlet random variables are always negatively associated. For other examples, refer for instance to [4].

In this paper we show some properties of negatively dependent random variables. The main goal is to extend tools and inequalities widely used for independent random variables. In Section 1 we provide the symmetrization lemma under arbitrary dependence. In Section 2 we provide exponential tail inequalities and a symmetrization argument for negatively dependent random variables. First, the well-known Hoeffding inequality is proved for negatively dependent random variables; then we provide an extension of the bounded difference inequality. Finally, we show a symmetrization argument.
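For illustration, the defining inequality can be checked by simulation in the multinomial case listed above; the following minimal Monte Carlo sketch (with arbitrarily chosen trial number, cell probabilities, and thresholds) compares the joint lower-orthant probability of two multinomial components with the product of their marginals.

```python
import numpy as np

# Monte Carlo check of the lower-orthant inequality for two multinomial components,
# which are negatively associated and hence negatively dependent.  All parameter
# values below are arbitrary illustrative choices.
rng = np.random.default_rng(0)
n_trials, p = 10, np.array([0.3, 0.3, 0.4])
X = rng.multinomial(n_trials, p, size=200_000)   # rows are realizations of (X1, X2, X3)

z1, z2 = 2, 2                                     # thresholds for the orthant event
joint = np.mean((X[:, 0] <= z1) & (X[:, 1] <= z2))
prod = np.mean(X[:, 0] <= z1) * np.mean(X[:, 1] <= z2)
print(f"P(X1<=z1, X2<=z2) ~ {joint:.4f}  <=  {prod:.4f} ~ P(X1<=z1) P(X2<=z2)")
```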
1. Tools

We begin by proving an extension of the symmetrization lemma under arbitrary dependence.

Definition 1 (Separability). Let $(Y(u), u \in U)$ be a family of random variables on a probability space $(\Omega, \mathcal{F}, P)$. The family is called separable if there exists a countable set $U_0 \subseteq U$ and a set $E \in \mathcal{F}$ such that
(1) $P(E) = 1$,
(2) for any $\omega \in E$ and for any $u \in U$ there exists a sequence $(u_j, j \ge 1)$ in $U_0$ such that $Y(u_j, \omega) \to Y(u, \omega)$ as $j \to \infty$.

Lemma 1 (Symmetrization Lemma). Let $(Y(u), u \in U)$ be a family of separable random variables, and let $(Y'(u), u \in U)$ be an independent copy of $(Y(u), u \in U)$ with the same joint distribution for any $u_1, \dots, u_n$ (that is, with the same dependency structure). Suppose $P(|Y'(u)| > \varepsilon/2) \le 1/2$ for any $u \in U$. Then
\[
P\Big(\sup_{u} |Y(u)| > \varepsilon\Big) \le 2\, P\Big(\sup_{u} |Y(u) - Y'(u)| > \varepsilon/2\Big)
\]
for any $\varepsilon > 0$.

Proof. If $(Y(u), u \in U)$ is separable, then so is $(Y'(u), u \in U)$. Moreover, there exists a countable set $U_0 \subseteq U$ such that $\sup_{u \in U} |Y(u)| = \sup_{u \in U_0} |Y(u)|$. Let $u_i$ be the $i$-th element of $U_0$. Let $A_1 = \{|Y(u_1)| > \varepsilon\}$ and
\[
A_i = \big\{|Y(u_1)| \le \varepsilon, \dots, |Y(u_{i-1})| \le \varepsilon, |Y(u_i)| > \varepsilon\big\}, \qquad i \ge 2.
\]
Note that if $|Y(u_i)| > \varepsilon$ and $|Y'(u_i)| \le \varepsilon/2$, then $|Y(u_i) - Y'(u_i)| > \varepsilon/2$. Since the events $A_i$ are disjoint, we have
\[
\frac{1}{2}\, P\Big(\sup_{u \in U} |Y(u)| > \varepsilon\Big) = \frac{1}{2} \sum_{i} P(A_i)
\le \sum_{i} P(A_i)\, P(|Y'(u_i)| \le \varepsilon/2)
= \sum_{i} P(A_i,\, |Y'(u_i)| \le \varepsilon/2)
\le \sum_{i} P(A_i,\, |Y(u_i) - Y'(u_i)| > \varepsilon/2)
\le \sum_{i} P\Big(A_i,\, \sup_{u \in U_0} |Y(u) - Y'(u)| > \varepsilon/2\Big)
\le P\Big(\sup_{u \in U} |Y(u) - Y'(u)| > \varepsilon/2\Big).
\]
2. Exponential tail inequalities

2.1. Hoeffding inequality. The key step in proving the Hoeffding inequality for negatively dependent random variables is the thesis of Lemma 2. Note that it can hold also under different assumptions. It is straightforward to see, for instance, that Lemma 2 is true for a vector of binary random variables whenever the covariance between any two of them is non-positive.

Lemma 2. Suppose $X_1, \dots, X_n$ is a vector of negatively associated random variables. Then
\[
E\Big(\exp\Big\{t \sum_i X_i\Big\}\Big) \le \prod_i E\big(\exp\{t X_i\}\big)
\]
for any $t > 0$.
Proof. This is a straightforward generalization of Lemma 1 in [6], stemming from the fact that if $X_1, \dots, X_n$ is a vector of negatively dependent random variables, then $e^{tX_1}, \dots, e^{tX_n}$ is also negatively dependent.

Theorem 1 (Hoeffding inequality). Let $X_1, \dots, X_n$ be a sequence of negatively dependent random variables with $P(a_i < X_i < b_i) = 1$ and $E(X_i) = 0$. Let $S_n = \sum_i (b_i - a_i)^2/8$ and let $\varepsilon > 0$. Then, for any $t > 0$,
\[
P\Big(\sum_i X_i \ge \varepsilon\Big) \le e^{-t\varepsilon + t^2 S_n}. \tag{1}
\]

Proof. By the Markov inequality and by Lemma 2,
\[
P\Big(\sum_i X_i \ge \varepsilon\Big) = P\Big(t \sum_i X_i \ge t\varepsilon\Big)
= P\big(e^{t \sum_i X_i} \ge e^{t\varepsilon}\big)
\le e^{-t\varepsilon}\, E\big(e^{t \sum_i X_i}\big)
\le e^{-t\varepsilon} \prod_i E\big(e^{t X_i}\big).
\]
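Although (1) holds for every $t > 0$, the familiar optimized form follows by minimizing the right-hand side over $t$; the computation is standard and is reported here only for completeness:
\[
\min_{t > 0} e^{-t\varepsilon + t^2 S_n} = \exp\Big\{-\frac{\varepsilon^2}{4 S_n}\Big\} = \exp\Big\{-\frac{2\varepsilon^2}{\sum_i (b_i - a_i)^2}\Big\}, \qquad \text{attained at } t = \frac{\varepsilon}{2 S_n}.
\]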
The key difference between this proof and the one for independent random variables is at the last step, where the equality between the moment generating function of the sum and the product of the individual moment generating functions is replaced by an inequality. The rest of the proof is analogous to the proof of the Hoeffding inequality for independent random variables.

It is interesting to note that, defining $S_n = \sum_i E[X_i^2]$, [1] shows an inequality (their eq. 2.8) analogous to (1), with $t$ close enough to zero, for unbounded negatively dependent random variables, under some additional assumptions on the higher-order moments.

2.2. Bounded difference inequality. A generalization of the Hoeffding inequality is given by the bounded difference inequality, often used to convert bounds for the expected value into exponential tail inequalities. The bounded difference inequality for independent random variables was first derived in [5].

Theorem 2 (Bounded difference inequality). Suppose $g(\cdot)$ satisfies the bounded difference assumption
\[
\sup_{x_1, \dots, x_n;\; x_i' \in A} \big|g(x_1, \dots, x_n) - g(x_1, \dots, x_{i-1}, x_i', x_{i+1}, \dots, x_n)\big| \le c_i \tag{2}
\]
for $1 \le i \le n$ and any set $A$. Suppose $X_1, \dots, X_n$ is a vector of negatively dependent random variables. Then, for all $t > 0$,
\[
P\big(|g(X_1, \dots, X_n) - E[g(X_1, \dots, X_n)]| \ge t\big) \le 2 \exp\Big\{-2t^2 \Big/ \sum_{i=1}^n c_i^2\Big\}.
\]
Proof. We will prove that
\[
P\big(g(X_1, \dots, X_n) - E[g(X_1, \dots, X_n)] \ge t\big) \le \exp\Big\{-2t^2 \Big/ \sum_{i=1}^n c_i^2\Big\}. \tag{3}
\]
Similarly it can be proved that
\[
P\big(E[g(X_1, \dots, X_n)] - g(X_1, \dots, X_n) \ge t\big) \le \exp\Big\{-2t^2 \Big/ \sum_{i=1}^n c_i^2\Big\}.
\]
Combination of these two results yields the thesis.
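For completeness, the combination is the elementary union bound
\[
P\big(|g(X_1, \dots, X_n) - E[g(X_1, \dots, X_n)]| \ge t\big) \le P\big(g - E[g] \ge t\big) + P\big(E[g] - g \ge t\big) \le 2 \exp\Big\{-2t^2 \Big/ \sum_{i=1}^n c_i^2\Big\},
\]
where $g$ is shorthand for $g(X_1, \dots, X_n)$.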
The hypothesis of negative dependence is needed for the application of Theorem 1 in this straightforward extension: let $V$ and $Z$ be such that $E[V \mid Z] = 0$, and for some $h(\cdot)$ and $c > 0$, $h(Z) \le V \le h(Z) + c$. Then, for all $s > 0$,
\[
E\big[e^{sV} \mid Z\big] \le e^{s^2 c^2/8}. \tag{4}
\]
Denote now $V = g(X_1, \dots, X_n) - E[g(X_1, \dots, X_n)]$ and
\[
H_i(X_1, \dots, X_i) = E[g(X_1, \dots, X_n) \mid X_1, \dots, X_i],
\]
and for any $i$ define $V_i = H_i(X_1, \dots, X_i) - H_{i-1}(X_1, \dots, X_{i-1})$. Let $F_i(x) = P(X_i < x \mid X_1, \dots, X_{i-1})$. Clearly, $V = \sum_i V_i$ and
\[
H_{i-1}(X_1, \dots, X_{i-1}) = \int H_i(X_1, \dots, X_{i-1}, x)\, F_i(dx). \tag{5}
\]
Define moreover
\[
W_i = \sup_u H_i(X_1, \dots, X_{i-1}, u) - H_{i-1}(X_1, \dots, X_{i-1})
\]
and
\[
Z_i = \inf_u H_i(X_1, \dots, X_{i-1}, u) - H_{i-1}(X_1, \dots, X_{i-1}).
\]
Clearly, $P(Z_i \le V_i \le W_i) = 1$ and
\[
W_i - Z_i = \sup_u \sup_v \big( H_i(X_1, \dots, X_{i-1}, u) - H_i(X_1, \dots, X_{i-1}, v) \big) \le c_i \tag{6}
\]
by the bounded difference assumption. Therefore, by (4), for any $i$,
\[
E\big[e^{sV_i} \mid X_1, \dots, X_{i-1}\big] \le e^{s^2 c_i^2/8}. \tag{7}
\]
Finally, by the Chernoff bound, for any $s > 0$,
\[
P\big(g(X_1, \dots, X_n) - E[g(X_1, \dots, X_n)] \ge t\big)
\le \frac{E\big[\exp\{s \sum_{i=1}^n V_i\}\big]}{e^{st}}
= \frac{E\big[\exp\{s \sum_{i=1}^{n-1} V_i\}\, E\big[e^{sV_n} \mid X_1, \dots, X_{n-1}\big]\big]}{e^{st}}
\le e^{s^2 c_n^2/8}\, \frac{E\big[\exp\{s \sum_{i=1}^{n-1} V_i\}\big]}{e^{st}}
\le e^{-st} \exp\Big\{s^2 \sum_i c_i^2/8\Big\},
\]
by repeating the same argument $n$ times. Choosing $s = 4t/\sum_i c_i^2$ yields inequality (3).

2.3. Inequalities for the empirical measure. We now provide some inequalities for the empirical measure of negatively dependent random variables. In what follows we define the empirical measure of a vector of random variables $X_1, \dots, X_n$ as
\[
\mu_n(A) = \frac{1}{n} \sum_i 1_{X_i \in A},
\]
while the empirical distribution corresponds to the restriction to the class of sets $A = (-\infty, z]$ and will be denoted by $\hat{F}(z) = n^{-1} \sum_{i=1}^n 1_{X_i \le z}$ for $z \in \mathbb{R}$. We will now restrict attention to a class of sets such that $P(X_i \in A, X_j \in A) \le P(X_i \in A)\, P(X_j \in A)$ for any $A$ in the class, and do not explicitly require negative dependence. A class of this kind is, for instance, the class of sets of the form $(-\infty, z]$ for $z$ real, under (pairwise) negative dependence of the random variables.
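To illustrate how Theorem 2 applies to the empirical distribution, consider sampling without replacement from a finite population, which yields negatively associated (hence negatively dependent) observations; for $g = \sup_z |\hat F(z) - F(z)|$ each $c_i = 1/n$, so the theorem gives $P(|g - E[g]| \ge t) \le 2 e^{-2nt^2}$. The following simulation sketch (population, sample size, and threshold are arbitrary illustrative choices) checks this bound numerically.

```python
import numpy as np

# Simulation sketch: draws without replacement from a finite population are
# negatively associated, hence negatively dependent.  For g = sup_z |F_hat(z) - F(z)|
# each bounded difference constant is c_i = 1/n, so Theorem 2 gives
# P(|g - E g| >= t) <= 2 exp(-2 n t^2).  All parameters below are arbitrary choices.
rng = np.random.default_rng(1)
population = np.arange(100.0)
n, reps, t = 20, 20_000, 0.15

grid = np.sort(population)
F = np.arange(1, grid.size + 1) / grid.size      # distribution function of one draw

def sup_deviation(sample):
    # Kolmogorov distance between the empirical and the population distribution;
    # both only jump at population points, so the supremum is attained on the grid.
    F_hat = np.searchsorted(np.sort(sample), grid, side="right") / sample.size
    return np.max(np.abs(F_hat - F))

g = np.array([sup_deviation(rng.choice(population, size=n, replace=False))
              for _ in range(reps)])
tail = np.mean(np.abs(g - g.mean()) >= t)
print(f"empirical tail {tail:.4f}  <=  bound {2 * np.exp(-2 * n * t**2):.4f}")
```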
Theorem 3. Let $X_1, \dots, X_n$ be a set of identically distributed random variables, and let $X_1', \dots, X_n'$ be an independent copy of $X_1, \dots, X_n$. Let $\mu(A) = P(X_i \in A)$,
\[
\mu_n(A) = \frac{1}{n} \sum_i 1_{X_i \in A} \quad \text{and} \quad \mu_n'(A) = n^{-1} \sum_i 1_{X_i' \in A}.
\]
Suppose there exists $\mathcal{A}$, a class of sets such that $P(X_i \in A_i, X_j \in A_j) \le P(X_i \in A_i)\, P(X_j \in A_j)$ for $A_i, A_j \in \mathcal{A}$, and that $(\mu_n(A), A \in \mathcal{A})$ is separable. We have that for any $\varepsilon > 0$ and $n \ge 2/\varepsilon^2$
\[
P\Big(\sup_{A \in \mathcal{A}} |\mu_n(A) - \mu(A)| > \varepsilon\Big) \le 2\, P\Big(\sup_{A \in \mathcal{A}} |\mu_n(A) - \mu_n'(A)| > \varepsilon/2\Big). \tag{8}
\]
Furthermore,
\[
E\Big[\sup_{A \in \mathcal{A}} |\mu_n(A) - \mu(A)|\Big] \le E\Big[\sup_{A \in \mathcal{A}} |\mu_n(A) - \mu_n'(A)|\Big]. \tag{9}
\]
Proof. Let $X_1', \dots, X_n'$ be an independent copy of $X_1, \dots, X_n$: a vector of random variables independent of the first one but with the same dependence structure. Let $\mu_n'(A) = n^{-1} \sum_i 1_{X_i' \in A}$ and recall that under (pairwise) negative dependence
\[
P(X_i \le x_i \cap X_j \le x_j) \le P(X_i \le x_i)\, P(X_j \le x_j). \tag{10}
\]
Our hypothesis is slightly more general, since we do not assume negative dependence but only that
\[
P(X_i \in A_i \cap X_j \in A_j) \le P(X_i \in A_i)\, P(X_j \in A_j). \tag{11}
\]
Negative dependence and sets of the form $(-\infty, z]$ suffice for assumption (11). The main steps of the proof are as follows: it is easy to see that under the assumptions
\[
V[\mu_n(A) - \mu(A)] \le \frac{\mu(A)(1 - \mu(A))}{n}. \tag{12}
\]
In fact, we have that $E[\mu_n(A) - \mu(A)] = 0$, which implies $V[\mu_n(A) - \mu(A)] = E[(\mu_n(A) - \mu(A))^2]$. We have
\[
E\big[(\mu_n(A) - \mu(A))^2\big] = E\Big[\frac{1}{n^2} \sum_{ij} 1_{X_i \in A} 1_{X_j \in A}\Big] - \mu^2(A)
= \frac{1}{n^2} \sum_{ij} E[1_{X_i \in A} 1_{X_j \in A}] - \mu^2(A)
= \frac{1}{n^2} \sum_{ij} P(X_i \in A, X_j \in A) - \mu^2(A)
= \frac{1}{n^2}\Big(\sum_{i \ne j} P(X_i \in A, X_j \in A) + \sum_{i=1}^n P(X_i \in A) - n^2 \mu^2(A)\Big)
\le \frac{1}{n^2}\Big(\sum_{i \ne j} P(X_i \in A)\, P(X_j \in A) + n\mu(A) - n^2 \mu^2(A)\Big),
\]
where we used inequality (11) in the last step. The last expression is easily seen to be equal to $\mu(A)(1 - \mu(A))/n$, as desired; hence (12) is true.
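Explicitly, since the $X_i$ are identically distributed, $\sum_{i \ne j} P(X_i \in A)\, P(X_j \in A) = (n^2 - n)\,\mu^2(A)$, and therefore
\[
\frac{1}{n^2}\Big[(n^2 - n)\,\mu^2(A) + n\,\mu(A) - n^2 \mu^2(A)\Big] = \frac{n\,\mu(A) - n\,\mu^2(A)}{n^2} = \frac{\mu(A)(1 - \mu(A))}{n}.
\]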
We can now apply the Chebyshev inequality to the random variable $\mu_n'(A) - \mu(A)$, together with inequality (12), to obtain
\[
P\Big(|\mu_n'(A) - \mu(A)| > \frac{\varepsilon}{2}\Big) \le \frac{4}{\varepsilon^2}\, V[\mu_n'(A) - \mu(A)]
\le \frac{4}{\varepsilon^2}\, \frac{\mu(A)(1 - \mu(A))}{n}
\le \frac{1}{n\varepsilon^2} \le \frac{1}{2} \quad \text{for all } n \ge \frac{2}{\varepsilon^2}.
\]
We can then apply Lemma 1, since we have separability. Hence, for $n \ge 2/\varepsilon^2$,
\[
P\Big(\sup_{A \in \mathcal{A}} |\mu_n(A) - \mu(A)| > \varepsilon\Big) \le 2\, P\Big(\sup_{A \in \mathcal{A}} |\mu_n(A) - \mu_n'(A)| > \varepsilon/2\Big). \tag{13}
\]
To see the second inequality, by the Jensen inequality and the law of iterated expectation we get
\[
E\Big[\sup_{A \in \mathcal{A}} |\mu_n(A) - \mu(A)|\Big] \le E\Big[\sup_{A \in \mathcal{A}} E\big[\,|\mu_n(A) - \mu_n'(A)|\ \big|\ X_1, \dots, X_n\big]\Big]
\le E\Big[\sup_{A \in \mathcal{A}} |\mu_n(A) - \mu_n'(A)|\Big].
\]

If we restrict to $n = 2$, we can provide a further randomization argument.

Theorem 4. Let $X_1$ and $X_2$ be negatively dependent random variables, and let $X_1'$ and $X_2'$ be an independent copy. Let $\sigma_1$ and $\sigma_2$ be independent sign variables, such that
\[
P(\sigma_i = 1) = P(\sigma_i = -1) = \frac{1}{2}.
\]
Let $\mu(A) = P(X_i \in A)$ and $\mu_2(A) = \frac{1}{2} \sum_i 1_{X_i \in A}$. Suppose there exists $\mathcal{A}$, a class of sets such that $P(X_1 \in A, X_2 \in A) \le P(X_1 \in A)\, P(X_2 \in A)$ for $A \in \mathcal{A}$, and that $(\mu_2(A), A \in \mathcal{A})$ is separable. We have that for any $A \in \mathcal{A}$ and any $\varepsilon > 0$
\[
P\Big(|\mu_2(A) - \mu_2'(A)| > \frac{\varepsilon}{2}\Big) \le P\Big(\Big|\frac{1}{2} \sum_i \sigma_i \big(1_{X_i \in A} - 1_{X_i' \in A}\big)\Big| > \frac{\varepsilon}{2}\Big).
\]
To prove Theorem 4 we need a preparatory lemma, whose results may be of interest per se when working with negatively dependent random variables.

Lemma 3. Suppose for any $A \in \mathcal{A}$ that $P(X_2 \in A, X_1 \in A) \le P(X_2 \in A)\, P(X_1 \in A)$. Then
\[
P(X_2 \in A \mid X_1 \in A) \le P(X_2 \in A),
\]
while if at least one of the two marginal probabilities is smaller than 1,
\[
P(X_2 \in A \mid X_1 \notin A) \ge P(X_2 \in A).
\]
Moreover,
\[
P(X_2 \in A, X_1 \notin A) \ge P(X_2 \in A)\, P(X_1 \notin A),
\]
and finally
\[
P(X_2 \notin A, X_1 \notin A) \le P(X_2 \notin A)\, P(X_1 \notin A).
\]
Proof. The first inequality follows from the definition of conditional probability. To see the second one, we can apply the Bayes theorem and the first inequality to get
\[
P(X_2 \in A \mid X_1 \notin A) = P(X_1 \notin A \mid X_2 \in A)\, P(X_2 \in A) / P(X_1 \notin A)
= \big(1 - P(X_1 \in A \mid X_2 \in A)\big)\, P(X_2 \in A) / P(X_1 \notin A)
\ge \big(1 - P(X_1 \in A)\big)\, P(X_2 \in A) / P(X_1 \notin A) = P(X_2 \in A),
\]
assuming without loss of generality that $P(X_1 \in A) < 1$. The third inequality follows from
\[
P(X_2 \in A, X_1 \notin A) = P(X_1 \notin A \mid X_2 \in A)\, P(X_2 \in A)
= \big(1 - P(X_1 \in A \mid X_2 \in A)\big)\, P(X_2 \in A)
\ge \big(1 - P(X_1 \in A)\big)\, P(X_2 \in A),
\]
while the last inequality follows from
\[
P(X_2 \notin A, X_1 \notin A) = \big(1 - P(X_2 \in A \mid X_1 \notin A)\big)\, P(X_1 \notin A)
\le \big(1 - P(X_2 \in A)\big)\, P(X_1 \notin A).
\]
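As a concrete illustration of Lemma 3 (an arbitrarily chosen urn model): draw $X_1, X_2$ without replacement from $\{1, \dots, N\}$ and let $A = \{x \le k\}$ with $0 < k < N$. Then
\[
P(X_2 \in A \mid X_1 \in A) = \frac{k-1}{N-1} \le \frac{k}{N} = P(X_2 \in A), \qquad
P(X_2 \in A \mid X_1 \notin A) = \frac{k}{N-1} \ge \frac{k}{N} = P(X_2 \in A),
\]
in agreement with the first two inequalities of the lemma.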
We can now prove Theorem 4.

Proof. Let now $X_1'$ and $X_2'$ be an independent copy of $X_1$ and $X_2$. By Theorem 3 we claim that
\[
P(|\mu_2(A) - \mu(A)| > \varepsilon) \le 2\, P(|\mu_2(A) - \mu_2'(A)| > \varepsilon/2),
\]
since that is true if one takes the supremum over $A \in \mathcal{A}$ inside. Let now $\sigma_1$ and $\sigma_2$ be sign variables. We will see that a randomization argument can be used here. A possibility for further work is to try and extend the randomization argument to arbitrary $n$. We have now that
\[
P\Big(|\mu_2(A) - \mu_2'(A)| > \frac{\varepsilon}{2}\Big) \le P\Big(\Big|\frac{1}{2} \sum_i \sigma_i \big(1_{X_i \in A} - 1_{X_i' \in A}\big)\Big| > \frac{\varepsilon}{2}\Big). \tag{14}
\]
If the random variables are independent, equality follows from identical distribution after randomization. Under dependence, as in this case, identical distribution is no longer guaranteed. To see (14), let $Y_i = 1_{X_i \in A} - 1_{X_i' \in A}$.
We now show that $P(Y_2 = 1, Y_1 = 1) = P(Y_2 = -1, Y_1 = -1)$:
\[
P(Y_2 = 1, Y_1 = 1) = P(Y_2 = -1, Y_1 = -1)
\iff P(Y_2 = 1 \mid Y_1 = 1) = P(Y_2 = -1 \mid Y_1 = -1)
\]
\[
\iff P(X_2 \in A, X_2' \notin A \mid X_1 \in A, X_1' \notin A) = P(X_2 \notin A, X_2' \in A \mid X_1 \notin A, X_1' \in A)
\]
\[
\iff P(X_2 \in A \mid X_1 \in A)\, P(X_2' \notin A \mid X_1' \notin A) = P(X_2 \notin A \mid X_1 \notin A)\, P(X_2' \in A \mid X_1' \in A),
\]
and the last equality is true since we took $X_1'$ and $X_2'$ to be a copy of $X_1$ and $X_2$ with the same dependency structure. With the same strategy it can be seen that, for $k \in \{-1, 0, 1\}$,
\[
P(Y_2 = k, Y_1 = k) = P(Y_2 = -k, Y_1 = -k), \qquad P(Y_2 = k, Y_1 = -k) = P(Y_2 = -k, Y_1 = k),
\]
and finally that $P(Y_2 = k, Y_1 = 0) = P(Y_2 = -k, Y_1 = 0)$. On the other hand,
\[
P(Y_2 = 1, Y_1 = 1) \le P(Y_2 = -1, Y_1 = 1). \tag{15}
\]
We can in fact apply the results of Lemma 3 to show that
\[
P(Y_2 = 1, Y_1 = 1) = P(X_2 \in A, X_2' \notin A, X_1 \in A, X_1' \notin A)
= P(X_2 \in A, X_1 \in A)\, P(X_2' \notin A, X_1' \notin A)
\le P(X_2 \in A)\, P(X_2' \notin A)\, P(X_1 \in A)\, P(X_1' \notin A)
\]
and
\[
P(Y_2 = 1, Y_1 = -1) = P(X_2 \in A, X_2' \notin A, X_1 \notin A, X_1' \in A)
= P(X_2 \in A, X_1 \notin A)\, P(X_2' \notin A, X_1' \in A)
\ge P(X_2 \in A)\, P(X_2' \notin A)\, P(X_1 \notin A)\, P(X_1' \in A).
\]
By identical distribution of $X_i$ and $X_i'$, (15) follows. With a few explicit calculations, the results on the joint distribution of $(Y_1, Y_2)$ can be used to compute the distribution of the empirical measure:
\[
P(|\mu_2(A) - \mu_2'(A)| = 1) = P(Y_1 + Y_2 = 2) + P(Y_1 + Y_2 = -2) = 2\, P(Y_1 = 1, Y_2 = 1).
\]
On the other hand,
\[
P\Big(\Big|\sum_i \sigma_i \big(1_{X_i \in A} - 1_{X_i' \in A}\big)\Big| = 2\Big) = 2\, P(\sigma_1 Y_1 = 1, \sigma_2 Y_2 = 1)
= 2\big[P(Y_1 = 1, Y_2 = 1)\, P(\sigma_1 = \sigma_2) + P(\sigma_1 \ne \sigma_2)\, P(Y_1 = 1, Y_2 = -1)\big]
\ge 2\, P(Y_1 = 1, Y_2 = 1).
\]
Moreover,
\[
P(|\mu_2(A) - \mu_2'(A)| = 0) = P(Y_1 = 0, Y_2 = 0) + 2\, P(Y_1 = 1, Y_2 = -1),
\]
while
\[
P(|\sigma_1 Y_1 + \sigma_2 Y_2| = 0) = P(Y_1 = 0, Y_2 = 0) + 2\, P(\sigma_1 = \sigma_2)\, P(Y_1 = 1, Y_2 = -1) + 2\, P(\sigma_1 \ne \sigma_2)\, P(Y_1 = 1, Y_2 = 1)
\le P(Y_1 = 0, Y_2 = 0) + 2\, P(Y_1 = 1, Y_2 = -1).
\]
The results are combined to see that
\[
P\Big(|\mu_2(A) - \mu_2'(A)| > \frac{\varepsilon}{2}\Big) \le P\Big(\Big|\frac{1}{2} \sum_i \sigma_i \big(1_{X_i \in A} - 1_{X_i' \in A}\big)\Big| > \frac{\varepsilon}{2}\Big),
\]
that is, inequality (14).
3. Discussion

We proved that negatively dependent random variables enjoy certain special properties of independent random variables, in particular in terms of the Hoeffding and bounded difference inequalities and of the possibility to apply symmetrization. These tools pave the way to inequalities for the empirical distribution of negatively dependent random variables.

Acknowledgements. The author is grateful to Prof. Enzo Orsingher for advice and encouragement, and to a referee for a clarifying review and for pointing out reference [1].

References

1. M. D. Amini and A. Bozorgnia, Complete convergence for negatively dependent random sequences, Journal of Applied Mathematics and Stochastic Analysis 16 (2003), 121–126.
2. H. W. Block, T. H. Savits, and M. Shaked, Some concepts of negative dependence, The Annals of Probability 10 (1982), 765–772.
3. J. D. Esary, F. Proschan, and D. W. Walkup, Association of random variables, with applications, The Annals of Mathematical Statistics 38 (1967), 1466–1474.
4. K. Joag-Dev and F. Proschan, Negative association of random variables with applications, The Annals of Statistics 11 (1983), 286–295.
5. C. McDiarmid, On the method of bounded differences, Surveys in Combinatorics, Cambridge University Press, 1989, pp. 148–188.
6. A. Volodin, On the Kolmogorov exponential inequality for negatively dependent random variables, Pakistan Journal of Statistics 18 (2002), 249–253.
7. Y. L. Tong, Probability Inequalities in Multivariate Distributions, Academic Press, 1980.
University of Rome “La Sapienza”, Piazzale Aldo Moro 5, 00185 Roma, Italy

E-mail address:
[email protected]
Received 10/08/2006