February 23, 2005
Master Review Vol. 9in x 6in – (for Lecture Note Series, IMS, NUS)
Stein’s method for normal approximation
Louis H. Y. Chen
Institute for Mathematical Sciences, National University of Singapore, 3 Prince George's Park, Singapore 118402
E-mail: [email protected]

Qi-Man Shao
Department of Statistics and Applied Probability, National University of Singapore, 6 Science Drive 2, Singapore 117543, and Department of Mathematics, University of Oregon, Eugene, OR 97403, USA
E-mail: [email protected]
Stein’s method originated in 1972 in a paper in the Proceedings of the Sixth Berkeley Symposium. In that paper, he introduced the method in order to determine the accuracy of the normal approximation to the distribution of a sum of dependent random variables satisfying a mixing condition. Since then, many developments have taken place, both in extending the method beyond normal approximation and in applying the method to problems in other areas. In these lecture notes, we focus on univariate normal approximation, with our main emphasis on his approach exploiting an a priori estimate of the concentration function. We begin with a general explanation of Stein’s method as applied to the normal distribution. We then go on to consider expectations of smooth functions, first for sums of independent and locally dependent random variables, and then in the more general setting of exchangeable pairs. The later sections are devoted to the use of concentration inequalities, in obtaining both uniform and non-uniform Berry–Esseen bounds for independent random variables. A number of applications are also discussed.
Contents

1 Introduction
2 Fundamentals of Stein's method
  2.1 Characterization
  2.2 Properties of the solutions
  2.3 Construction of the Stein identities
3 Normal approximation for smooth functions
  3.1 Independent random variables
  3.2 Locally dependent random variables
  3.3 Exchangeable pairs
4 Uniform Berry–Esseen bounds: the bounded case
  4.1 Independent random variables
  4.2 Binary expansion of a random integer
5 Uniform Berry–Esseen bounds: the independent case
  5.1 The concentration inequality approach
  5.2 Proving the Berry–Esseen theorem
  5.3 A lower bound
6 Non-uniform Berry–Esseen bounds: the independent case
  6.1 A non-uniform concentration inequality
  6.2 The final result
7 Uniform and non-uniform bounds under local dependence
8 Appendix
References
1. Introduction

Stein's method is a way of deriving explicit estimates of the accuracy of the approximation of one probability distribution by another. This is accomplished by comparing expectations, as indicated in the title of Stein's (1986) monograph, Approximate computation of expectations. An upper bound is computed for the difference between the expectations of any one of a (large) family of test functions under the two distributions, each family of test functions determining an associated metric. Any such bound in turn implies a corresponding upper bound for the distance between the two distributions, measured with respect to the associated metric. Thus, if the family of test functions consists of the indicators of all (measurable) subsets, then accuracy is expressed in terms of the total variation
distance d_TV between the two distributions:

    d_TV(P, Q) := sup_{h∈H} |∫ h dP − ∫ h dQ| = sup_A |P(A) − Q(A)|,

where H = {1_A ; A measurable}. If the distributions are on IR, and the test functions are the indicators of all half-lines, then accuracy is expressed using the Kolmogorov distance, as is customary in Berry–Esseen theorems:

    d_K(P, Q) := sup_{h∈H} |∫ h dP − ∫ h dQ| = sup_{z∈IR} |P(−∞, z] − Q(−∞, z]|,

where H = {1_(−∞,z] ; z ∈ IR}. If the test functions consist of all uniformly Lipschitz functions h with constant bounded by 1, then the Wasserstein distance results:

    d_W(P, Q) := sup_{h∈H} |∫ h dP − ∫ h dQ|,

where H = {h : IR → IR ; ‖h′‖ ≤ 1} =: Lip(1), and where, for a function g : IR → IR, ‖g‖ denotes sup_{x∈IR} |g(x)|. If the test functions are uniformly bounded and uniformly Lipschitz, bounded Wasserstein distances result:

    d_BW(k)(P, Q) := sup_{h∈H} |∫ h dP − ∫ h dQ|,

where H = {h : IR → IR ; ‖h‖ ≤ 1, ‖h′‖ ≤ k}. Stein's method applies to all of these distances, and to many more. For normal approximation on IR, Stein began with the observation that

    IE{f′(Z) − Z f(Z)} = 0    (1.1)
for any bounded function f with bounded derivative, if Z has the standard normal distribution N(0, 1). This can be verified by partial integration:

    (1/√(2π)) ∫_{−∞}^{∞} f′(x) e^{−x²/2} dx
      = (1/√(2π)) [f(x) e^{−x²/2}]_{−∞}^{∞} + (1/√(2π)) ∫_{−∞}^{∞} x f(x) e^{−x²/2} dx
      = (1/√(2π)) ∫_{−∞}^{∞} x f(x) e^{−x²/2} dx.

However, the same partial integration can also be used to solve the differential equation

    f′(x) − x f(x) = g(x),    lim_{x↓−∞} f(x) e^{−x²/2} = 0,    (1.2)
for g any bounded function, giving

    ∫_{−∞}^{y} g(x) e^{−x²/2} dx = ∫_{−∞}^{y} {f′(x) − x f(x)} e^{−x²/2} dx
                                 = ∫_{−∞}^{y} (d/dx){f(x) e^{−x²/2}} dx
                                 = f(y) e^{−y²/2},

and hence

    f(y) = e^{y²/2} ∫_{−∞}^{y} g(x) e^{−x²/2} dx.
Note that this f actually satisfies lim_{y↓−∞} f(y) = 0, because

    Φ(y) := (1/√(2π)) ∫_{−∞}^{y} e^{−x²/2} dx ∼ (|y|√(2π))^{−1} e^{−y²/2}

as y ↓ −∞. By the same argument, f is bounded (and then has, in addition, lim_{y↑∞} f(y) = 0) if and only if ∫_{−∞}^{∞} g(x) e^{−x²/2} dx = 0, or, equivalently, if IEg(Z) = 0. Hence, taking any bounded function h, we observe that the function f_h, defined by

    f_h(x) := e^{x²/2} ∫_{−∞}^{x} {h(t) − IEh(Z)} e^{−t²/2} dt,    (1.3)
satisfies (1.2) for g(x) = h(x) − IEh(Z); substituting any random variable W for x in (1.2) and taking expectations then yields

    IE{f_h′(W) − W f_h(W)} = IEh(W) − IEh(Z).    (1.4)
Thus the characterization (1.1) of the standard normal distribution also delivers an upper bound for normal approximation with respect to any of the distances introduced above: for any class H of (bounded) test functions h,

    sup_{h∈H} |IEh(W) − IEh(Z)| = sup_{h∈H} |IE{f_h′(W) − W f_h(W)}|.    (1.5)
The curious fact is that the supremum on the right hand side of (1.5), which contains only the random variable W , is frequently much simpler to bound than that on the left hand side. The differential characterization (1.1) of the normal distribution is reflected in the fact that the quantity IE{f 0 (W ) − W f (W )} is often relatively easily shown to be small, when the structure of W is such as to make normal approximation seem plausible; for instance, when W is a sum of individually small and only weakly dependent
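The characterizing identity (1.1) is easy to check numerically. The following sketch (an illustration, not part of the original notes) approximates IE{f′(Z) − Zf(Z)} by trapezoidal quadrature for the bounded test function f(x) = sin x, which has bounded derivative:

```python
import math

def normal_pdf(x):
    # standard normal density
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def stein_expectation(f, fprime, lo=-10.0, hi=10.0, n=100001):
    # trapezoidal approximation to IE{f'(Z) - Z f(Z)} for Z ~ N(0,1)
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * (fprime(x) - x * f(x)) * normal_pdf(x)
    return total * h

# f(x) = sin(x) is bounded with bounded derivative, so (1.1) applies
val = stein_expectation(math.sin, math.cos)
print(val)  # numerically indistinguishable from zero
```

Here IE{cos Z} and IE{Z sin Z} both equal e^{−1/2}, so the two halves of (1.1) cancel, which the quadrature confirms to within discretization error.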
random variables. This is illustrated in the following elementary argument for the classical central limit theorem.

Suppose that X_1, X_2, …, X_n are independent and identically distributed random variables, with mean zero and unit variance, and such that IE|X_1|³ < ∞; let W = n^{−1/2} Σ_{j=1}^n X_j. We evaluate the quantity IE{f′(W) − W f(W)} for a smooth and very well-behaved function f. Since W is a sum of identically distributed components, we have

    IE{W f(W)} = n IE{n^{−1/2} X_1 f(W)},

and W = n^{−1/2} X_1 + W_1, where W_1 = n^{−1/2} Σ_{j=2}^n X_j is independent of X_1. Hence, by Taylor's expansion,

    IE{W f(W)} = n^{1/2} IE{X_1 f(W_1 + n^{−1/2} X_1)}
               = n^{1/2} IE{X_1 (f(W_1) + n^{−1/2} X_1 f′(W_1))} + η_1,    (1.6)

where |η_1| ≤ (1/2) n^{−1/2} IE|X_1|³ ‖f″‖. On the other hand, again by Taylor's expansion,

    IEf′(W) = IEf′(W_1 + n^{−1/2} X_1) = IEf′(W_1) + η_2,    (1.7)

where |η_2| ≤ n^{−1/2} IE|X_1| ‖f″‖. But now, since IEX_1 = 0 and IEX_1² = 1, and since X_1 and W_1 are independent, it follows that

    |IE{f′(W) − W f(W)}| ≤ n^{−1/2} {1 + (1/2) IE|X_1|³} ‖f″‖,

for any f with bounded second derivative. Hence, if H is any class of test functions and

    C_H := sup_{h∈H} ‖f_h″‖,

then it follows that

    sup_{h∈H} |IEh(W) − IEh(Z)| ≤ C_H n^{−1/2} {1 + (1/2) IE|X_1|³}.    (1.8)

The inequality (1.8) thus establishes an accuracy of order O(n^{−1/2}) for the normal approximation of W, with respect to the distance d_H on probability measures on IR defined by

    d_H(P, Q) = sup_{h∈H} |∫ h dP − ∫ h dQ|,

provided only that C_H < ∞. For example, as observed by Erickson (1974), the class of test functions h for the Wasserstein distance d_W is just the class Lip(1) of Lipschitz functions with constant no greater than 1, and, for this class, C_H = 2: see (2.13) of Lemma 2.3. The distinction between the bounds
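The O(n^{−1/2}) rate in (1.8) can be seen concretely. In the sketch below (an illustrative computation, not from the notes), the X_j are Rademacher ±1 signs, so IE|X_1|³ = 1, and h(x) = |x| is a Lip(1) test function; with C_H = 2 for the Wasserstein class, (1.8) gives |IEh(W) − IEh(Z)| ≤ 3 n^{−1/2}:

```python
import math
from math import comb

def abs_moment_rademacher(n):
    # exact IE|W| for W = n^{-1/2} * (sum of n independent +-1 signs)
    total = 0.0
    for k in range(n + 1):
        w = (2 * k - n) / math.sqrt(n)
        total += comb(n, k) * 0.5 ** n * abs(w)
    return total

ez = math.sqrt(2.0 / math.pi)  # IE|Z| for Z ~ N(0,1)
for n in (10, 100, 1000):
    err = abs(abs_moment_rademacher(n) - ez)
    print(n, err, err <= 3.0 / math.sqrt(n))  # the bound (1.8)
```

The error multiplied by √n stays bounded, in line with the O(n^{−1/2}) accuracy claimed.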
for different distances is then reflected in the differences between the values of C_H. Computing good estimates of C_H involves studying the properties of the solutions f_h to the Stein equation given in (1.3) for h ∈ H. Fortunately, for normal approximation, a lot is known. Sadly, it turns out that this simple argument is not enough to prove the Berry–Esseen theorem, which states that

    sup_{z∈IR} |IP[W ≤ z] − Φ(z)| ≤ C n^{−1/2} IE|X_1|³,

for a universal constant C. This is because, for H the set of indicators of half-lines, C_H = ∞. However, it is possible to refine the argument, by making less crude estimates of |η_1| and |η_2|. Broadly speaking, if h = 1_(−∞,z], and writing f_z for the corresponding solution f_h, then f_z′ is smooth, except for a jump discontinuity at z, as can be seen from (1.2). Hence, taking |η_2| for illustration,

    |f_z′(W_1 + n^{−1/2} X_1) − f_z′(W_1)| ≤ n^{−1/2} |X_1| M(z, f_z)(|W_1| + n^{−1/2}|X_1| + 1),    (1.9)

where M(z, f) := sup_{x≠z} {|f″(x)|/(|x| + 1)}, provided that z does not lie between W_1 + n^{−1/2} X_1 and W_1. If it does, the difference is still bounded by 2‖f_z′‖. Hence, since both sup_{z∈IR} ‖f_z′‖ and sup_{z∈IR} M(z, f_z) are finite, explicit bounds of order O(n^{−1/2}) can still be deduced for the Berry–Esseen theorem, provided that it can be shown that

    sup_{z∈IR} IP(W_1 ∈ [z, z + h]) ≤ C(h + n^{−1/2} IE|X_1|³)    (1.10)

for some constant C (the corresponding estimates for |η_1| are a little more involved, but the argument can be carried through analogously). Inequalities of this form are precisely the concentration inequalities that play a central part in these notes, and the above discussion illustrates their importance.

If the random variables X_i are dependent, these simple arguments are no longer valid. However, they can be adapted to work effectively, if less cleanly, for many weak dependence structures. For instance, if individual summands have little influence on W, as is typical when a normal approximation is expected to be good, then it may be possible to decompose W in the form W = n^{−1/2}(X_i + U_i) + W_i, where W_i is 'almost' independent of X_i, and |U_i| is not too 'large', enabling Taylor expansions analogous to (1.6) and (1.7) to be attempted.
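The Kolmogorov-distance rate promised by the Berry–Esseen theorem can also be checked exactly in a simple case. The sketch below (illustrative only; Rademacher summands are chosen for tractability, and the comparison constant 0.5 is merely a generous margin, not a claim about the optimal Berry–Esseen constant) computes sup_z |IP[W ≤ z] − Φ(z)| for a standardized sum of ±1 signs, for which IE|X_1|³ = 1:

```python
import math
from math import comb

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kolmogorov_distance(n):
    # exact sup_z |IP(W <= z) - Phi(z)| for W = n^{-1/2} * sum of n signs;
    # the supremum is attained at the atoms of W (from the left or right)
    cdf, best = 0.0, 0.0
    for k in range(n + 1):
        w = (2 * k - n) / math.sqrt(n)
        best = max(best, abs(cdf - Phi(w)))  # just below the atom
        cdf += comb(n, k) * 0.5 ** n
        best = max(best, abs(cdf - Phi(w)))  # at the atom
    return best

for n in (10, 100, 1000):
    d = kolmogorov_distance(n)
    print(n, d, d <= 0.5 / math.sqrt(n))
```

The distance decays like n^{−1/2}, the rate that the concentration-inequality machinery of Sections 5 and 6 is designed to capture.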
However, especially if Kolmogorov distance is to be considered, it pays to be rather less brutal, making direct use of (1.4), and rewriting the errors in exact, integral form; this will appear time and again in what follows. Better bounds can be achieved as a result, and concentration arguments also become feasible: note that finding sharp bounds for a probability such as IP(a ≤ Wi ≤ a + n−1/2 (Xi + Ui )) is not an easy task when Wi , Xi and Ui are dependent. In these lecture notes, we follow Stein’s original theme, by presenting his ideas in the context of normal approximation. We shall focus on the approach using concentration inequalities. This approach dates back to Stein’s lectures around 1970 (see Ho & Chen (1978)). It was later extended by Chen (1986) to dependent and non-identically distributed random variables with arbitrary index set. A proof of the Berry–Esseen theorem for independent and non-identically distributed random variables using the concentration inequality approach is given in Section 2 of Chen (1998). The approach has further been shown to be effective in obtaining not only uniform but also non-uniform Berry–Esseen bounds for the accuracy of normal approximation for independent random variables (Chen & Shao, 2001) and for locally dependent random fields as well (Chen & Shao, 2004a). As an extension, a randomized concentration inequality was used to establish uniform and non-uniform Berry–Esseen bounds for non-linear statistics in Chen & Shao (2004b). In view of its current successes, the technique seems well worth further exploration. We also briefly discuss the exchangeable pairs coupling, and use it to obtain a Berry–Esseen bound for the number of 1’s in the binary expansion of a random integer. The use of exchangeable pairs was discussed in Stein’s monograph (Stein, 1986), and is now widely known (see, for example, Diaconis, Holmes & Reinert (2004)). 
Other proofs of the Berry–Esseen theorem which will not be included in these lectures are the inductive argument of Barbour & Hall (1984), Stroock (1993) and Bolthausen (1984), the size bias coupling used by Goldstein & Rinott (1996) and the zero bias transformation introduced by Goldstein & Reinert (1997). The inductive argument works well for uniform bounds when the conditional distribution of the sum under consideration, given any one of the variables in the sum, has a form similar to that of the corresponding unconditional distribution of the sum. The size biasing method works well for many combinatorial problems, such as counting the number of vertices in a random graph; theoretically speaking, the zero bias transformation approach works for arbitrary random variables W with mean zero and finite variance. More on these approaches
can be found in Sections 4.5, 5.2 and 8 of Chapter 4 of this volume. We also refer to Goldstein & Reinert (1997) for some of the important features of the zero bias transformation, and to Goldstein (2005) for Berry–Esseen bounds for combinatorial central limit theorems and pattern occurrences using the zero and size biasing approaches. Short surveys of normal approximation by Stein's method were given in Rinott & Rotar (2000) and in Chen (1998). Stein's method has also been used to establish bounds on the approximation error in the multivariate central limit theorem, in particular by Götze (1991) and by Rinott & Rotar (1996). A collection of expository lectures on a variety of aspects of Stein's method and its applications has recently been edited by Diaconis & Holmes (2004).

These notes are organized as follows. We begin with a more detailed account of Stein's method than the sketch provided in this section. In Section 3, we consider the expectation of smooth functions under independence and local dependence, and also in the setting of an exchangeable pair having the linear regression property. Some applications are given. Section 4 discusses sums of uniformly bounded random variables. Here, it is shown that Berry–Esseen bounds for distribution functions can be obtained without the use of concentration inequalities. Both the independent case and the problem of counting the number of ones in the binary expansion of a random integer are studied, where in the latter case the use of an exchangeable pair is demonstrated. In Sections 5 and 6, concentration inequalities are invoked to obtain both uniform and non-uniform Berry–Esseen bounds for sums of independent random variables. The last section is devoted to obtaining a uniform Berry–Esseen bound under local dependence.

2. Fundamentals of Stein's method

In this section, we follow Stein (1986), and give a more detailed account of the basic method sketched in the introduction.

2.1. Characterization

Let Z be a standard normally distributed random variable, and let C_bd be the set of continuous and piecewise continuously differentiable functions f : IR → IR with IE|f′(Z)| < ∞. Stein's method rests on the following characterization, which is a slight strengthening of (1.1).
Lemma 2.1: Let W be a real valued random variable. Then W has a standard normal distribution if and only if

    IEf′(W) = IE{W f(W)}    (2.1)

for all f ∈ C_bd.

Proof: Necessity. If W has a standard normal distribution, then for f ∈ C_bd,

    IEf′(W) = (1/√(2π)) ∫_{−∞}^{∞} f′(w) e^{−w²/2} dw
            = (1/√(2π)) ∫_{−∞}^{0} f′(w) ∫_{−∞}^{w} (−x) e^{−x²/2} dx dw
              + (1/√(2π)) ∫_{0}^{∞} f′(w) ∫_{w}^{∞} x e^{−x²/2} dx dw.

By Fubini's theorem, it thus follows that

    IEf′(W) = (1/√(2π)) ∫_{−∞}^{0} {∫_{x}^{0} f′(w) dw} (−x) e^{−x²/2} dx
              + (1/√(2π)) ∫_{0}^{∞} {∫_{0}^{x} f′(w) dw} x e^{−x²/2} dx
            = (1/√(2π)) ∫_{−∞}^{∞} [f(x) − f(0)] x e^{−x²/2} dx
            = IE{W f(W)}.

Sufficiency. For fixed z ∈ IR, let f(w) := f_z(w) denote the solution of the equation

    f′(w) − w f(w) = 1_(−∞,z](w) − Φ(z).    (2.2)
Multiplying both sides of (2.2) by e^{−w²/2} yields

    {e^{−w²/2} f(w)}′ = e^{−w²/2} {1_(−∞,z](w) − Φ(z)}.

Thus f_z is given by

    f_z(w) = e^{w²/2} ∫_{−∞}^{w} [1_(−∞,z](x) − Φ(z)] e^{−x²/2} dx
           = −e^{w²/2} ∫_{w}^{∞} [1_(−∞,z](x) − Φ(z)] e^{−x²/2} dx
           = { √(2π) e^{w²/2} Φ(w)[1 − Φ(z)]   if w ≤ z,
             { √(2π) e^{w²/2} Φ(z)[1 − Φ(w)]   if w ≥ z.    (2.3)
By Lemma 2.2 below, f_z is a bounded, continuous and piecewise continuously differentiable function. Suppose that (2.1) holds for all f ∈ C_bd. Then it holds for f_z, and hence, by (2.2),

    0 = IE[f_z′(W) − W f_z(W)] = IE[1_(−∞,z](W) − Φ(z)] = IP(W ≤ z) − Φ(z).

Thus W has a standard normal distribution.

Equation (2.2) is a particular case of the more general Stein equation

    f′(w) − w f(w) = h(w) − IEh(Z),    (2.4)

which is to be solved for f, given a real valued measurable function h with IE|h(Z)| < ∞: c.f. (1.2). As for (2.3), the solution f = f_h is given by

    f_h(w) = e^{w²/2} ∫_{−∞}^{w} [h(x) − IEh(Z)] e^{−x²/2} dx
           = −e^{w²/2} ∫_{w}^{∞} [h(x) − IEh(Z)] e^{−x²/2} dx.    (2.5)
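The closed form (2.3) can be checked directly against the equation (2.2) it is supposed to solve. A small numerical sketch (illustrative only, not part of the notes), with Φ computed via the error function:

```python
import math

def Phi(w):
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))

def f_z(w, z):
    # the solution (2.3): sqrt(2*pi) e^{w^2/2} Phi(min(w,z)) (1 - Phi(max(w,z)))
    c = math.sqrt(2.0 * math.pi) * math.exp(w * w / 2.0)
    return c * Phi(min(w, z)) * (1.0 - Phi(max(w, z)))

def stein_residual(w, z, eps=1e-5):
    # |f_z'(w) - w f_z(w) - (1{w<=z} - Phi(z))|, with f_z' by central difference
    fp = (f_z(w + eps, z) - f_z(w - eps, z)) / (2.0 * eps)
    rhs = (1.0 if w <= z else 0.0) - Phi(z)
    return abs(fp - w * f_z(w, z) - rhs)

# away from the kink at w = z the residual is tiny
for (w, z) in ((0.3, 1.0), (-1.2, 0.5), (2.0, -1.0)):
    print(w, z, stein_residual(w, z))
```

The residual vanishes up to the accuracy of the finite difference, at every point where the indicator in (2.2) is continuous.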
2.2. Properties of the solutions

We now list some basic properties of the solutions (2.3) and (2.5) to the Stein equations (2.2) and (2.4). The reasons why we need them have already been indicated in (1.8) and (1.9), where estimates of sup_{h∈H} ‖f_h″‖, sup_{z∈IR} ‖f_z′‖ and sup_{z∈IR} M(z, f_z) were required, in order to determine our error bounds for the various approximations. For the more detailed arguments to come, further properties are also needed. We defer the proofs to the appendix, since they only involve careful real analysis.

We begin by considering the solution f_z to (2.2), given in (2.3).

Lemma 2.2: The function f_z defined by (2.3) is such that

    w f_z(w) is an increasing function of w.    (2.6)

Moreover, for all real w, u and v,

    |w f_z(w)| ≤ 1,    |w f_z(w) − u f_z(u)| ≤ 1,    (2.7)

    |f_z′(w)| ≤ 1,    |f_z′(w) − f_z′(v)| ≤ 1,    (2.8)

    0 < f_z(w) ≤ min(√(2π)/4, 1/|z|),    (2.9)

and

    |(w + u) f_z(w + u) − (w + v) f_z(w + v)| ≤ (|w| + √(2π)/4)(|u| + |v|).    (2.10)
We mostly use (2.8) and (2.9) for our approximations. If one does not care about the constants, one can easily obtain

    |f_z′(w)| ≤ 2  and  0 < f_z(w) ≤ √(π/2)

by using the well-known inequality

    1 − Φ(w) ≤ min(1/2, 1/(w√(2π))) e^{−w²/2},    w > 0.
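The two bounds (2.8) and (2.9) that we use most can be verified numerically on a grid. Note that f_z′ can be computed exactly from the Stein equation (2.2) itself, so no numerical differentiation is needed (an illustrative sketch, not from the notes):

```python
import math

def Phi(w):
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))

def f_z(w, z):
    # the solution (2.3)
    c = math.sqrt(2.0 * math.pi) * math.exp(w * w / 2.0)
    return c * Phi(min(w, z)) * (1.0 - Phi(max(w, z)))

def f_z_prime(w, z):
    # exact, via the Stein equation (2.2): f' = w f + 1{w<=z} - Phi(z)
    return w * f_z(w, z) + (1.0 if w <= z else 0.0) - Phi(z)

ok = True
for zi in range(-30, 31):
    z = zi / 10.0
    cap = math.sqrt(2.0 * math.pi) / 4.0
    if z != 0.0:
        cap = min(cap, 1.0 / abs(z))
    for wi in range(-60, 61):
        w = wi / 10.0
        ok = ok and 0.0 < f_z(w, z) <= cap + 1e-12        # (2.9)
        ok = ok and abs(f_z_prime(w, z)) <= 1.0 + 1e-12   # (2.8)
print(ok)
```

The bound (2.9) is attained at z = 0, w = 0, where f_0(0) = √(2π)/4 exactly, which is why a small tolerance is added.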
Next, we consider the solution f_h to the Stein equation (2.4), as given in (2.5), for any bounded absolutely continuous function h.

Lemma 2.3: For any absolutely continuous function h : IR → IR, the solution f_h given in (2.5) satisfies

    ‖f_h‖ ≤ min(√(π/2) ‖h(·) − IEh(Z)‖, 2‖h′‖),    (2.11)

    ‖f_h′‖ ≤ min(2‖h(·) − IEh(Z)‖, 4‖h′‖),    (2.12)

and

    ‖f_h″‖ ≤ 2‖h′‖.    (2.13)
2.3. Construction of the Stein identities

In this section, we return to the elementary use of Stein's method which led to the simple bound (1.8), but express the remainders η_1 and η_2 in a form better suited to deriving more advanced results. We now take ξ_1, ξ_2, …, ξ_n to be independent random variables satisfying IEξ_i = 0, 1 ≤ i ≤ n, and such that Σ_{i=1}^n IEξ_i² = 1. Thus ξ_i corresponds to the normalized random variable n^{−1/2} X_i of the introduction; here, however, we no longer require the ξ_i to be identically distributed. Much as before, we set

    W = Σ_{i=1}^n ξ_i  and  W^(i) = W − ξ_i,    (2.14)

and define

    K_i(t) = IE{ξ_i (I_{0≤t≤ξ_i} − I_{ξ_i≤t<0})}.

Now

    x ∫_{−∞}^{∞} (|t| ∧ 1)(I_{0≤t≤x} − I_{x≤t<0}) dt = { |x|(|x| − 1) + |x|/2   if |x| > 1,
                                                       { |x|³/2                if |x| ≤ 1,
so that

    ∫_{−∞}^{∞} IE(|t| ∧ 1 + |ξ_i| ∧ 1) K_i(t) dt
      = IE{|ξ_i|(|ξ_i| − 1) I_{|ξ_i|>1}} + (1/2) IE{|ξ_i|(|ξ_i| ∧ 1)²} + IE{ξ_i² IE(|ξ_i| ∧ 1)}
      = IE{ξ_i² I_{|ξ_i|>1}} − (1/2) IE{|ξ_i| I_{|ξ_i|>1}} + (1/2) IE{|ξ_i|³ I_{|ξ_i|≤1}} + IE{ξ_i² IE(|ξ_i| ∧ 1)}.
Adding over i, it follows that

    |IEh(W) − IEh(Z)| ≤ 8‖h′‖ { β_2 + (1/2)β_3 + Σ_{i=1}^n IEξ_i² IE(|ξ_i| ∧ 1) }.    (3.12)

However, since both x² and (x ∧ 1) are increasing functions of x ≥ 0, it follows that, for any random variable ξ,

    IEξ² IE(|ξ| ∧ 1) ≤ IE{ξ²(|ξ| ∧ 1)} = IE|ξ|³ I_{|ξ|≤1} + IEξ² I_{|ξ|>1},    (3.13)
so that the final sum is no greater than β_3 + β_2, completing the bound.

Theorem 3.1 and Theorem 3.3 together yield the Lindeberg central limit theorem, as follows. Let X_1, X_2, …, X_n be independent random variables with IEX_i = 0 and IEX_i² < ∞ for each 1 ≤ i ≤ n. Put

    S_n = Σ_{i=1}^n X_i  and  B_n² = Σ_{i=1}^n IEX_i².

To apply Theorems 3.1 and 3.3, let

    ξ_i = X_i / B_n  and  W = S_n / B_n.    (3.14)
Define β_2 and β_3 as in (3.10), and observe that, for any 0 < ε < 1,

    β_2 + β_3 = (1/B_n²) Σ_{i=1}^n IE{X_i² I_{|X_i|>B_n}} + (1/B_n³) Σ_{i=1}^n IE{|X_i|³ I_{|X_i|≤B_n}}
      ≤ (1/B_n²) Σ_{i=1}^n IE{X_i² I_{|X_i|>B_n}} + (1/B_n³) Σ_{i=1}^n B_n IE{X_i² I_{εB_n≤|X_i|≤B_n}}
        + (1/B_n³) Σ_{i=1}^n εB_n IE{X_i² I_{|X_i|<εB_n}}
      ≤ (1/B_n²) Σ_{i=1}^n IE{X_i² I_{|X_i|>εB_n}} + ε.    (3.15)

If the Lindeberg condition holds, that is, if

    (1/B_n²) Σ_{i=1}^n IE{X_i² I_{|X_i|>εB_n}} → 0  as  n → ∞    (3.16)

for all ε > 0, then (3.15) implies β_2 + β_3 → 0 as n → ∞, since ε is arbitrary. Hence, by Theorems 3.1 and 3.3,

    sup_z |IP(S_n/B_n ≤ z) − Φ(z)| ≤ 8(β_2 + β_3)^{1/2} → 0  as  n → ∞.

This proves the Lindeberg central limit theorem.
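The quantities appearing in (3.14)–(3.16) are easy to compute in closed form for a concrete choice of distribution. The sketch below (an illustration with assumed X_i i.i.d. uniform on [−1, 1], which is not a case treated in the notes) shows β_2 vanishing once B_n ≥ 1 and β_2 + β_3 decaying at the O(n^{−1/2}) rate:

```python
import math

def betas_uniform(n):
    # xi_i = X_i / B_n with X_i ~ U[-1,1]: Var X_1 = 1/3, so B_n = sqrt(n/3)
    Bn = math.sqrt(n / 3.0)
    # |xi_i| <= 1/B_n <= 1 for n >= 3, so the truncation at level 1 is inactive
    beta2 = 0.0                     # sum of IE{xi_i^2 1(|xi_i| > 1)}
    beta3 = n * 0.25 / Bn ** 3      # IE|X_1|^3 = 1/4 for U[-1,1]
    return beta2, beta3

for n in (10, 100, 1000, 10000):
    b2, b3 = betas_uniform(n)
    # (b2 + b3) * sqrt(n) is constant: the O(n^{-1/2}) rate of beta2 + beta3
    print(n, b2 + b3, (b2 + b3) * math.sqrt(n))
```

In particular, the Lindeberg sum in (3.16) is identically zero here for εB_n > 1, so the condition holds trivially for every fixed ε once n is large.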
3.2. Locally dependent random variables

An m-dependent sequence of random variables ξ_i, i ∈ Z, is one with the property that, for each i, the sets of random variables {ξ_j, j ≤ i} and {ξ_j, j > i + m} are independent. As a special case, sequences of independent random variables are 0-dependent. Local dependence generalizes the notion of m-dependence to random variables with arbitrary index set. It is applicable, for instance, to random variables indexed by the vertices of a graph, and such that the collections {ξ_i, i ∈ I} and {ξ_j, j ∈ J} are independent whenever I ∩ J = ∅ and the graph contains no edges {i, j} with i ∈ I and j ∈ J.

Let J be a finite index set of cardinality n, and let {ξ_i, i ∈ J} be a random field with zero means and finite variances. Define W = Σ_{i∈J} ξ_i, and assume that Var(W) = 1. For A ⊂ J, let ξ_A denote {ξ_i, i ∈ A} and A^c = {j ∈ J : j ∉ A}. We introduce the following two assumptions, defining different strengths of local dependence.

(LD1) For each i ∈ J there exists A_i ⊂ J such that ξ_i and ξ_{A_i^c} are independent.

(LD2) For each i ∈ J there exist A_i ⊂ B_i ⊂ J such that ξ_i is independent of ξ_{A_i^c} and ξ_{A_i} is independent of ξ_{B_i^c}.

We then define η_i = Σ_{j∈A_i} ξ_j and τ_i = Σ_{j∈B_i} ξ_j. Note that, for independent random variables ξ_i, we can take A_i = B_i = {i}, for which then η_i = τ_i = ξ_i.

Theorem 3.4: Theorem 3.1 can be applied with

    δ = 4 IE|Σ_{i∈J} {ξ_i η_i − IE(ξ_i η_i)}| + Σ_{i∈J} IE|ξ_i η_i²|    (3.17)

under (LD1), and with

    δ = 2 Σ_{i∈J} (IE|ξ_i η_i τ_i| + |IE(ξ_i η_i)| IE|τ_i|) + Σ_{i∈J} IE|ξ_i η_i²|    (3.18)

under (LD2).

Remark: For independent random variables, the value of δ in (3.18) is 5 Σ_{i∈J} IE|ξ_i|³, somewhat larger than the direct bound given in Theorem 3.2.
Proof: We first derive Stein identities similar to (2.17) and (2.19). Let f = f_h be the solution of the Stein equation (2.4). Then

    IE{W f(W)} = Σ_{i∈J} IE{ξ_i f(W)} = Σ_{i∈J} IE{ξ_i [f(W) − f(W − η_i)]},

by the independence of ξ_i and W − η_i. Hence

    IE{W f(W)} = Σ_{i∈J} IE{ξ_i [f(W) − f(W − η_i) − η_i f′(W)]} + IE{(Σ_{i∈J} ξ_i η_i) f′(W)}.    (3.19)

Now, because IEξ_i = 0 for all i, and from (LD1), it follows that

    1 = IEW² = Σ_{i∈J} Σ_{j∈J} IE{ξ_i ξ_j} = Σ_{i∈J} IE{ξ_i η_i},

giving

    IE{f′(W) − W f(W)} = −IE[Σ_{i∈J} {ξ_i η_i − IE(ξ_i η_i)} f′(W)]
                         − Σ_{i∈J} IE{ξ_i [f(W) − f(W − η_i) − η_i f′(W)]}.    (3.20)

By (2.12) and (2.13), ‖f′‖ ≤ 4‖h′‖ and ‖f″‖ ≤ 2‖h′‖. Therefore it follows from (3.20) and the Taylor expansion that

    |IEh(W) − IEh(Z)| ≤ ‖h′‖ { 4 IE|Σ_{i∈J} {ξ_i η_i − IE(ξ_i η_i)}| + Σ_{i∈J} IE|ξ_i η_i²| }.

This proves (3.17). When (LD2) is satisfied, f′(W − τ_i) and ξ_i η_i are independent for each i ∈ J. Hence, using (3.20), we can write

    |IEh(W) − IEh(Z)|
      ≤ |IE Σ_{i∈J} {ξ_i η_i − IE(ξ_i η_i)}(f′(W) − f′(W − τ_i))| + ‖h′‖ Σ_{i∈J} IE|ξ_i η_i²|
      ≤ ‖h′‖ { 2 Σ_{i∈J} (IE|ξ_i η_i τ_i| + |IE(ξ_i η_i)| IE|τ_i|) + Σ_{i∈J} IE|ξ_i η_i²| },

as desired.
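A concrete locally dependent W to keep in mind is the number of local maxima of i.i.d. continuous values on a cycle, anticipating Example 2 below: (LD2) holds with neighbourhoods of graph-radius 2 and 4. The simulation sketch that follows is illustrative only; the exact values IEW = n/3 and Var W = 2n/45 for the cycle are standard computations, not results quoted from the notes:

```python
import math
import random
import statistics

random.seed(1)

def local_maxima_count(n):
    # W = number of i with Y_i larger than both cycle-neighbours, Y_i i.i.d. uniform
    y = [random.random() for _ in range(n)]
    return sum(1 for i in range(n)
               if y[i] > y[i - 1] and y[i] > y[(i + 1) % n])

n, reps = 300, 10000
samples = [local_maxima_count(n) for _ in range(reps)]
print(statistics.mean(samples), n / 3.0)             # IE W = n/3 on the cycle
print(statistics.variance(samples), 2.0 * n / 45.0)  # Var W = 2n/45 on the cycle
```

After standardization, such a W is exactly the kind of sum to which the bound (3.18) applies.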
Here are two examples of locally dependent random fields. We refer to Baldi & Rinott (1989), Rinott (1994), Baldi, Rinott & Stein (1989), Dembo & Rinott (1996), and Chen & Shao (2004) for more details.

Example 1. Graphical dependence. Consider a set of random variables {X_i, i ∈ V} indexed by the vertices of a graph G = (V, E). G is said to be a dependency graph if, for any pair of disjoint sets Γ_1 and Γ_2 in V such that no edge in E has one endpoint in Γ_1 and the other in Γ_2, the sets of random variables {X_i, i ∈ Γ_1} and {X_i, i ∈ Γ_2} are independent. Let D denote the maximal degree of G, that is, the maximal number of edges incident to a single vertex. Let A_i = {i} ∪ {j ∈ V : there is an edge connecting j and i} and B_i = ∪_{j∈A_i} A_j. Then {X_i, i ∈ V} satisfies (LD2). Hence (3.18) holds.

Example 2. The number of local maxima on a graph. Consider a graph G = (V, E) (which is not necessarily a dependency graph) and independent and identically distributed continuous random variables {Y_i, i ∈ V}. For i ∈ V, define the 0-1 indicator variable

    X_i = 1 if Y_i > Y_j for all j ∈ N_i,  and  X_i = 0 otherwise,

where N_i = {j ∈ V : {i, j} ∈ E}, so that X_i = 1 indicates that Y_i is a local maximum. Let W = Σ_{i∈V} X_i be the number of local maxima. Let

    A_i = {i} ∪ (∪_{j∈N_i} N_j)  and  B_i = ∪_{j∈A_i} A_j.

Then {X_i, i ∈ V} satisfies (LD2), and (3.18) holds. One can refer to Dembo & Rinott (1996) and Rinott & Rotar (1996) for more examples and results under local dependence.

3.3. Exchangeable pairs

Let W be a random variable that is not necessarily the partial sum of independent random variables. Suppose that W is approximately normal, and that we want to find how accurate the approximation is. Another basic approach to Stein's method is to introduce a second random variable W′ on the same probability space, in such a way that (W, W′) is an exchangeable pair; that is, such that (W, W′) and (W′, W) have the same distribution.
The approach makes essential use of the elementary fact that, if (W, W′) is an exchangeable pair, then

    IEg(W, W′) = 0    (3.21)

for all antisymmetric measurable functions g(x, y) such that the expected value exists. A key identity is the following lemma (see Stein 1986).

Lemma 3.5: Let (W, W′) be an exchangeable pair of real random variables with finite variance, having the linear regression property

    IE(W′ | W) = (1 − λ)W    (3.22)

for some 0 < λ < 1. Then IEW = 0 and

    IE(W′ − W)² = 2λ IEW²,    (3.23)

and, for every piecewise continuous function f satisfying the growth condition |f(w)| ≤ C(1 + |w|), we have

    IE{W f(W)} = (1/(2λ)) IE{(W − W′)(f(W) − f(W′))}.    (3.24)

Proof: The proof exploits (3.21) with g(x, y) = (x − y)(f(y) + f(x)), for which IEg(W, W′) exists, because of the assumption on f. Then (3.21) gives

    0 = IE{(W − W′)(f(W′) + f(W))}
      = IE{(W − W′)(f(W′) − f(W))} + 2 IE{f(W)(W − W′)}    (3.25)
      = IE{(W − W′)(f(W′) − f(W))} + 2 IE{f(W) IE(W − W′ | W)}
      = IE{(W − W′)(f(W′) − f(W))} + 2λ IE{W f(W)},

this last by (3.22); and this is just (3.24).

Using this lemma, we can deduce the following theorem.

Theorem 3.6: If (W, W′) is exchangeable and (3.22) holds, it follows that Theorem 3.1 can be applied with

    δ = 4 IE|1 − (1/(2λ)) IE{(W′ − W)² | W}| + (1/(2λ)) IE|W − W′|³.    (3.26)

Remark: In the first term, note that

    IE{1 − (1/(2λ)) IE((W′ − W)² | W)} = 1 − (1/(2λ)) IE(W′ − W)² = 1 − IEW²,

so that the bound is unlikely to be useful unless IEW² is close to 1.
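A standard way to build such a pair when W = n^{−1/2}(ξ_1 + ⋯ + ξ_n) is a sum of i.i.d. signs is to pick a uniformly random coordinate and resample it; then (3.22) holds with λ = 1/n. The sketch below (illustrative, not from the notes; the choice f = sin is arbitrary) checks the antisymmetry identity (3.21) and the moment identity (3.23) by simulation:

```python
import math
import random

random.seed(0)
n, reps = 20, 40000
lam = 1.0 / n
s_anti = s_sq = 0.0
for _ in range(reps):
    xi = [random.choice((-1.0, 1.0)) for _ in range(n)]
    w = sum(xi) / math.sqrt(n)
    i = random.randrange(n)                       # coordinate to resample
    wp = w + (random.choice((-1.0, 1.0)) - xi[i]) / math.sqrt(n)
    # g(x, y) = (x - y)(f(y) + f(x)) is antisymmetric; here f = sin
    s_anti += (w - wp) * (math.sin(wp) + math.sin(w))
    s_sq += (wp - w) ** 2
print(s_anti / reps)          # ~ 0, by exchangeability (3.21)
print(s_sq / reps, 2 * lam)   # IE(W' - W)^2 = 2*lambda*IE W^2 = 2/n, cf. (3.23)
```

Since IEW² = 1 here, (3.23) predicts IE(W′ − W)² = 2/n exactly, which the simulation reproduces to within Monte Carlo error.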
Proof: Let f = f_h be the solution (2.5) to the Stein equation, and define

    K̂(t) = (W − W′)(I_{−(W−W′)≤t≤0} − I_{0<t≤−(W−W′)})

w > b + δ, so that f′ = 1_[a−δ, b+δ] and ‖f‖ = (1/2)(b − a) + δ. Set

    M̂_j(t) = ξ_j (I_{−ξ_j≤t≤0} − I_{0<t≤−ξ_j})

x > 0 and y > 0, implying that

    Σ_{j=1}^n IE|ξ_j| min(δ, |ξ_j|) ≥ Σ_{j=1}^n { IEξ_j² − IE|ξ_j|³/(4δ) } = 1/2,    (5.9)

by choice of δ, and hence that

    H_{1,1} ≥ (1/2) IP(a ≤ W^(i) ≤ b).    (5.10)

By the Hölder inequality,

    H_{1,2} ≤ {Var(Σ_{j=1}^n |ξ_j| min(δ, |ξ_j|))}^{1/2}
            ≤ {Σ_{j=1}^n IEξ_j² min(δ, |ξ_j|)²}^{1/2}
            ≤ δ {Σ_{j=1}^n IEξ_j²}^{1/2} = δ.    (5.11)

Hence

    ∫_{−∞}^{∞} IE f(W^(i) + t) M̂(t) dt ≥ (1/2) IP(a ≤ W^(i) ≤ b) − δ,    (5.12)
to be compared with the equation 'IEf′(W) = IP(a ≤ W ≤ b)' of the heuristic. On the other hand, recalling that ‖f‖ ≤ (1/2)(b − a) + δ, we have

    |IE{W^(i) f(W^(i))} − IE{ξ_i f(W^(i) − ξ_i)}|
      ≤ {(1/2)(b − a) + δ}(IE|W^(i)| + IE|ξ_i|)
      ≤ (1/√2) {(IE|W^(i)|)² + (IE|ξ_i|)²}^{1/2} (b − a + 2δ)
      ≤ (1/√2) {IE|W^(i)|² + IE|ξ_i|²}^{1/2} (b − a + 2δ)
      = (1/√2) (b − a + 2δ),    (5.13)

the inequality to be compared with '|IE{W f(W)}| ≤ (1/2)(b − a)' from the heuristic. Combining (5.12) and (5.13) thus gives

    IP(a ≤ W^(i) ≤ b) ≤ √2 (b − a) + 2(1 + √2)δ = √2 (b − a) + (1 + √2)γ,

as desired.

5.2. Proving the Berry–Esseen theorem

We are now ready to prove the classical Berry–Esseen theorem, with a constant of 7.

Theorem 5.2: We have

    sup_z |IP(W ≤ z) − Φ(z)| ≤ 7γ.    (5.14)
Proof: As discussed in the previous section, we need to find a way to replace IP(W^(i) + t ≤ z) by IP(W ≤ z) in (4.3). However, it follows from (5.4) that

    |Σ_{i=1}^n ∫_{−∞}^{∞} IP(W^(i) + t ≤ z) K_i(t) dt − IP(W ≤ z)|
      ≤ Σ_{i=1}^n ∫_{−∞}^{∞} |IP(W^(i) + t ≤ z) − IP(W ≤ z)| K_i(t) dt
      = Σ_{i=1}^n ∫_{−∞}^{∞} IE{IP(z − max(t, ξ_i) ≤ W^(i) ≤ z − min(t, ξ_i) | ξ_i)} K_i(t) dt.
Proposition 5.1 thus gives the bound

    |Σ_{i=1}^n ∫_{−∞}^{∞} IP(W^(i) + t ≤ z) K_i(t) dt − IP(W ≤ z)|
      ≤ Σ_{i=1}^n ∫_{−∞}^{∞} IE{√2 (|t| + |ξ_i|) + (1 + √2)γ} K_i(t) dt
      = (1 + √2)γ + √2 Σ_{i=1}^n ((1/2) IE|ξ_i|³ + IE|ξ_i| IEξ_i²)
      ≤ (1 + 2.5√2)γ.    (5.15)

Now by (4.2),

    |IP(W ≤ z) − Φ(z)| ≤ (1 + 2.5√2 + 1.5(1 + √(2π)/4))γ ≤ 7γ,

which is (4.1). We remark that, following the above lines of proof, one can prove that

    sup_z |IP(W ≤ z) − Φ(z)| ≤ 7 Σ_{i=1}^n (IEξ_i² I_{|ξ_i|>1} + IE|ξ_i|³ I_{|ξ_i|≤1}),

dispensing with the third moment assumption. We leave the proof to the reader. With a more refined concentration inequality, the constant 7 can be reduced to 4.1 (see Chen & Shao (2001)).

5.3. A lower bound

Let X_i, i ≥ 1, be independent random variables with zero means and finite variances, and define B_n² = Σ_{i=1}^n Var X_i. It is known that if the Feller condition

    max_{1≤i≤n} IEX_i²/B_n² → 0    (5.16)

is satisfied, then the Lindeberg condition is necessary for the central limit theorem. Barbour & Hall (1984) used Stein's method to provide not only a nice proof of the necessity, but also a lower bound for the Kolmogorov distance between the distribution of W and the normal, which is as close as can be expected to being a multiple of one of Lindeberg's sums B_n^{−2} Σ_{i=1}^n IE{X_i² I_{B_n^{−1}|X_i|>ε}}. Note that no lower bound could simply be a multiple of a Lindeberg sum, since a sum of n identically distributed normal random variables is itself normally distributed, but the corresponding Lindeberg sum is not zero. The next theorem and its proof are due to Barbour & Hall (1984).
Theorem 5.3: Let ξ_1, ξ_2, ..., ξ_n be independent random variables which have zero means and finite variances IEξ_i² = σ_i², 1 ≤ i ≤ n, and satisfy Σ_{i=1}^n σ_i² = 1. Then there exists an absolute constant C such that, for all ε > 0,

  (1 − e^{−ε²/4}) Σ_{i=1}^n IE{ξ_i² I{|ξ_i|>ε}} ≤ C ( sup_z |IP(W ≤ z) − Φ(z)| + Σ_{i=1}^n σ_i⁴ ).        (5.17)

Remark: For the sequence of independent random variables {X_i, i ≥ 1}, we take ξ_i = X_i/B_n. Clearly, Feller's negligibility condition (5.16) implies that Σ_{i=1}^n σ_i⁴ ≤ max_{1≤i≤n} σ_i² → 0 as n → ∞. Therefore, if S_n/B_n is asymptotically normal, then

  Σ_{i=1}^n IE{ξ_i² I{|ξ_i|>ε}} → 0

as n → ∞ for every ε > 0, and the Lindeberg condition is satisfied.

Proof: Once again, the argument starts with the Stein identity

  IE{f_h′(W) − W f_h(W)} = IEh(W) − IEh(Z),

for h yet to be chosen. Integrating by parts, the right hand side is bounded by

  |IEh(W) − IEh(Z)| = | ∫_{−∞}^{∞} h′(w) {IP(W ≤ w) − Φ(w)} dw |
    ≤ Δ ∫_{−∞}^{∞} |h′(w)| dw,        (5.18)

where Δ = sup_z |IP(W ≤ z) − Φ(z)|. For the left hand side, in the usual way, because ξ_i and W^(i) are independent and IEξ_i = 0, we have

  IE{W f_h(W)} = Σ_{i=1}^n IE{ξ_i² f_h′(W^(i))}
    + Σ_{i=1}^n IE{ξ_i ( f_h(W^(i) + ξ_i) − f_h(W^(i)) − ξ_i f_h′(W^(i)) )},

and, because Σ_{i=1}^n σ_i² = 1,

  IEf_h′(W) = Σ_{i=1}^n σ_i² IEf_h′(W^(i) + ξ_i)
    = Σ_{i=1}^n σ_i² IEf_h′(W^(i)) + Σ_{i=1}^n σ_i² IE{f_h′(W) − f_h′(W^(i))},

with the last term easily bounded by ½ ‖f_h‴‖ Σ_{i=1}^n σ_i⁴. Hence

  | IE{f_h′(W) − W f_h(W)} − Σ_{i=1}^n IE{ξ_i² g(W^(i), ξ_i)} | ≤ ½ ‖f_h‴‖ Σ_{i=1}^n σ_i⁴,        (5.19)

where g(w, y) = g_h(w, y) = −y^{−1} { f_h(w + y) − f_h(w) − y f_h′(w) }. Intuitively, if W is close to being normally distributed, then

  R_1 := Σ_{i=1}^n IE{ξ_i² g(W^(i), ξ_i)}  and  R := Σ_{i=1}^n IE{ξ_i² g(Z, ξ_i)},
for Z ∼ N(0,1) independent of the ξ_i's, should be close to one another. Taking (5.18) and (5.19) together, it follows that we have a lower bound for Δ, provided that a function h can be found with ∫_{−∞}^{∞} |h′(w)| dw < ∞ for which IEg_h(Z, y) is of constant sign, and provided also that ‖f_h‴‖ < ∞. In practice, it is easier to look for a suitable f, and then define h by setting h(w) = f′(w) − w f(w). The function g is zero for any linear function f and, for even functions f, IEg(Z, y) is antisymmetric in y about zero; but odd functions are possibilities. For instance, f(x) = x³ has IEg(Z, y) = −y², of constant sign. Unfortunately, this f fails to have ∫_{−∞}^{∞} |h′(w)| dw finite, but the choice f(w) = w e^{−w²/2} is a good one; it behaves much like the sum of a linear and a cubic function for those values of w where N(0,1) puts most of its mass, yet dies away to zero fast when |w| is large. Making the computations,

  IEg(Z, y) = −(1/(y√(2π))) ∫_{−∞}^{∞} { (w + y) e^{−(w+y)²/2} − w e^{−w²/2} − y (1 − w²) e^{−w²/2} } e^{−w²/2} dw
    = −(1/y) { (y/2) ∫_{−∞}^{∞} (1/√(2π)) exp{−½ (w² + (w + y)²)} dw − y/(2√2) }
    = (1/(2√2)) (1 − e^{−y²/4}) ≥ 0,        (5.20)

and IEg(Z, y) ≥ (1/(2√2)) (1 − e^{−ε²/4}) whenever |y| ≥ ε. Hence, for this choice of f, we have

  R ≥ (1/(2√2)) (1 − e^{−ε²/4}) Σ_{i=1}^n IE{ξ_i² I{|ξ_i|>ε}}        (5.21)

for any ε > 0. It thus remains to show that R and R_1 are close enough, after which (5.18), (5.19) and (5.21) complete the proof. To show this, note first that, taking this choice of f, and setting h(w) = f′(w) − w f(w), we have

  c_1 := ∫_{−∞}^{∞} |h′(w)| dw ≤ 5;   c_2 := ∫_{−∞}^{∞} |f″(w)| dw ≤ 4;   c_3 := sup_w |f‴(w)| = 3.
Now define an intermediate R_2 between R_1 and R, by

  R_2 := Σ_{i=1}^n IE{ξ_i² g(W*, ξ_i)},

where W* has the same distribution as W, but is independent of the ξ_i's. Then

  R_1 = − Σ_{i=1}^n IE{ ξ_i² ∫_0^1 [f′(W^(i) + tξ_i) − f′(W^(i))] dt }
    = R_2 + Σ_{i=1}^n IE{ ξ_i² ∫_0^1 [f′(W* + tξ_i) − f′(W^(i) + tξ_i)] dt }
      − Σ_{i=1}^n IE{ ξ_i² ∫_0^1 [f′(W*) − f′(W^(i))] dt }.        (5.22)

Now, for any θ, and because ξ_i and W^(i) are independent, with IEξ_i = 0,

  |IE{ f′(W* + θ) − f′(W^(i) + θ) }| = |IE{ f′(W^(i) + ξ_i + θ) − f′(W^(i) + θ) }|
    = |IE{ f′(W^(i) + ξ_i + θ) − f′(W^(i) + θ) − ξ_i f″(W^(i) + θ) }|
    ≤ ½ c_3 σ_i²,

by Taylor's theorem. Hence, from (5.22) and because Σ_{i=1}^n σ_i² = 1,

  R_1 ≥ R_2 − c_3 Σ_{i=1}^n σ_i⁴.

Similarly,

  R_2 = R + Σ_{i=1}^n IE{ ξ_i² ∫_0^1 [f′(Z + tξ_i) − f′(W* + tξ_i)] dt }
    − Σ_{i=1}^n IE{ ξ_i² ∫_0^1 [f′(Z) − f′(W*)] dt },        (5.23)
and, for any θ,

  |IEf′(W* + θ) − IEf′(Z + θ)| = | ∫_{−∞}^{∞} f″(w) { IP(W* ≤ w − θ) − Φ(w − θ) } dw | ≤ c_2 Δ,

so that

  R_2 ≥ R − 2 c_2 Δ.        (5.24)

Combining (5.18) and (5.19) with (5.23) and (5.24), it follows that

  c_1 Δ ≥ R_1 − ½ c_3 Σ_{i=1}^n σ_i⁴ ≥ R − (3/2) c_3 Σ_{i=1}^n σ_i⁴ − 2 c_2 Δ.

In view of (5.21), collecting terms, it follows that

  Δ (c_1 + 2 c_2) + (3/2) c_3 Σ_{i=1}^n σ_i⁴ ≥ (1/(2√2)) (1 − e^{−ε²/4}) Σ_{i=1}^n IE{ξ_i² I{|ξ_i|>ε}}        (5.25)

for any ε > 0. This proves (5.17), with C ≤ 40.

6. Non-uniform Berry–Esseen bounds: the independent case

In this section, we prove a non-uniform Berry–Esseen bound similar to (5.3), following the proof in Chen & Shao (2001). To do this, as in the previous section, we first need a concentration inequality; it must itself now be non-uniform.

6.1. A non-uniform concentration inequality

Let ξ_1, ξ_2, ..., ξ_n be independent random variables satisfying IEξ_i = 0 for every 1 ≤ i ≤ n and Σ_{i=1}^n IEξ_i² = 1. Let

  ξ̄_i = ξ_i I{ξ_i ≤ 1},   W̄ = Σ_{i=1}^n ξ̄_i,   W̄^(i) = W̄ − ξ̄_i.
Proposition 6.1: We have

  IP(a ≤ W̄^(i) ≤ b) ≤ e^{−a/2} (5 (b − a) + 7γ)        (6.1)

for all real b > a and for every 1 ≤ i ≤ n, where γ = Σ_{i=1}^n IE|ξ_i|³.
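Before turning to the proof, (6.1) can be sanity-checked by simulation. The sketch below is ours, with illustrative choices of a, b and i.i.d. ξ_j = ±n^{−1/2}, for which |ξ_j| ≤ 1, so the truncation is inactive and W̄^(i) = W − ξ_i.

```python
import math
import random

def lhs_estimate(a, b, n=100, reps=5000, seed=11):
    # estimate P(a <= Wbar^{(i)} <= b) for xi_j = +-1/sqrt(n);
    # Wbar^{(i)} is a sum over the n-1 remaining summands
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        w = sum(rng.choice((-1.0, 1.0)) for _ in range(n - 1)) / math.sqrt(n)
        hits += a <= w <= b
    return hits / reps

n = 100
gamma = n ** -0.5
a, b = 1.0, 1.5
rhs = math.exp(-a / 2.0) * (5.0 * (b - a) + 7.0 * gamma)
assert lhs_estimate(a, b) <= rhs
```

The true probability here is roughly Φ(1.5) − Φ(1) ≈ 0.09, comfortably below the bound.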
To prove it, we first need the following Bennett–Hoeffding inequality.

Lemma 6.2: Let η_1, η_2, ..., η_n be independent random variables satisfying IEη_i ≤ 0, η_i ≤ α for 1 ≤ i ≤ n, and Σ_{i=1}^n IEη_i² ≤ B_n². Put S_n = Σ_{i=1}^n η_i. Then

  IEe^{t S_n} ≤ exp{ α^{−2} (e^{tα} − 1 − tα) B_n² }   for t > 0,        (6.2)

  IP(S_n ≥ x) ≤ exp{ −(B_n²/α²) [ (1 + αx/B_n²) ln(1 + αx/B_n²) − αx/B_n² ] }        (6.3)

and

  IP(S_n ≥ x) ≤ exp{ −x² / (2 (B_n² + αx)) }        (6.4)

for x > 0.

Proof: It is easy to see that (e^s − 1 − s)/s² is an increasing function of s ∈ IR, from which it follows that

  e^{ts} ≤ 1 + ts + (ts)² (e^{tα} − 1 − tα)/(tα)²        (6.5)

for s ≤ α, if t > 0. Using the properties of the η_i's, we thus have

  IEe^{t S_n} = Π_{i=1}^n IEe^{t η_i}
    ≤ Π_{i=1}^n { 1 + t IEη_i + α^{−2} (e^{tα} − 1 − tα) IEη_i² }
    ≤ Π_{i=1}^n { 1 + α^{−2} (e^{tα} − 1 − tα) IEη_i² }
    ≤ exp{ α^{−2} (e^{tα} − 1 − tα) B_n² }.

This proves (6.2). To prove inequality (6.3), let

  t = (1/α) ln(1 + αx/B_n²).
Then, by (6.2),

  IP(S_n ≥ x) ≤ e^{−tx} IEe^{t S_n} ≤ exp{ −tx + α^{−2} (e^{tα} − 1 − tα) B_n² }
    = exp{ −(B_n²/α²) [ (1 + αx/B_n²) ln(1 + αx/B_n²) − αx/B_n² ] }.

In view of the fact that

  (1 + s) ln(1 + s) − s ≥ s² / (2 (1 + s))

for s > 0, (6.4) follows from (6.3).

Proof of Proposition 6.1: It follows from (6.2) with α = 1 and B_n² = 1 that

  IP(a ≤ W̄^(i) ≤ b) ≤ e^{−a/2} IEe^{W̄^(i)/2}
    ≤ e^{−a/2} exp(e^{1/2} − 3/2) ≤ 1.19 e^{−a/2}.

Thus, (6.1) holds if 7γ ≥ 1.19. We now assume that γ ≤ 1.19/7 = 0.17. Much as in the proof of Proposition 5.1, define δ = γ/2 (≤ 0.085), and set

  f(w) = 0                          if w < a − δ,
         e^{w/2} (w − a + δ)        if a − δ ≤ w ≤ b + δ,        (6.6)
         e^{w/2} (b − a + 2δ)       if w > b + δ.

Put

  M̄_i(t) = ξ̄_i ( I{−ξ̄_i ≤ t ≤ 0} − I{0 < t ≤ −ξ̄_i} ),

so that M̄_i(t) ≥ 0, and, arguing as for (5.9)–(5.11), split the resulting lower bound for IE{W̄^(i) f(W̄^(i))} into a main term H_{2,1} and an error term H_{2,2}:

  IE{W̄^(i) f(W̄^(i))} ≥ H_{2,1} − H_{2,2}.        (6.7)

For the main term, since

  IE Σ_{j≠i} |ξ̄_j| min(δ, |ξ̄_j|) ≥ −δ IE|ξ_i| + Σ_{j=1}^n IE{|ξ_j| min(δ, |ξ_j|)} − δγ
    ≥ −δ γ^{1/3} + 0.5 − δγ
    ≥ −0.085 (0.17)^{1/3} + 0.5 − 0.085 (0.17) ≥ 0.43,        (6.8)

we have

  H_{2,1} ≥ 0.43 e^{a/2} IP(a ≤ W̄^(i) ≤ b).        (6.9)

By the Bennett inequality (6.2) again, we have

  IEe^{W̄^(i)} ≤ exp(e − 2),

and hence, as for (5.11),

  H_{2,2} ≤ ( IEe^{W̄^(i)} )^{1/2} ( Var( Σ_{j≠i} |ξ_j| min(δ, |ξ̄_j|) ) )^{1/2}
    ≤ exp(e/2 − 1) δ ≤ 1.44 δ.        (6.10)
As to the left hand side of (6.7), we have

  IE{W̄^(i) f(W̄^(i))} ≤ (b − a + 2δ) IE{ |W̄^(i)| e^{W̄^(i)/2} }
    ≤ (b − a + 2δ) ( IE|W̄^(i)|² )^{1/2} ( IEe^{W̄^(i)} )^{1/2}
    ≤ (b − a + 2δ) exp(e − 2) ≤ 2.06 (b − a + 2δ).

Combining the above inequalities yields

  IP(a ≤ W̄^(i) ≤ b) ≤ (e^{−a/2}/0.43) e^{δ/2} ( 2.06 (b − a + 2δ) + 1.44 δ )
    ≤ (e^{−a/2}/0.43) e^{0.0425} ( 2.06 (b − a + 2δ) + 1.44 δ )
    ≤ e^{−a/2} ( 5 (b − a) + 13.4 δ )
    ≤ e^{−a/2} ( 5 (b − a) + 7γ ).

This proves (6.1).

Before proving the main theorem, we need the following moment inequality.

Lemma 6.3: Let 2 < p ≤ 3, and let {η_i, 1 ≤ i ≤ n} be independent random variables with IEη_i = 0 and IE|η_i|^p < ∞. Put S_n = Σ_{i=1}^n η_i and B_n² = Σ_{i=1}^n IEη_i². Then

  IE|S_n|^p ≤ (p − 1) B_n^p + Σ_{i=1}^n IE|η_i|^p.        (6.11)

Remark: Moment inequalities of this kind were first proved by Rosenthal (1970), in a more general martingale setting.
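Before giving the proof, (6.11) can be sanity-checked by simulation. The sketch below is ours, not part of the original text: it uses i.i.d. Uniform(−1, 1) summands with p = 3 and estimates the left-hand side by Monte Carlo.

```python
import random

def rosenthal_check(n=50, p=3.0, reps=10000, seed=7):
    # eta_i ~ Uniform(-1, 1): Var = 1/3, E|eta|^3 = 1/4
    rng = random.Random(seed)
    var = 1.0 / 3.0
    third = 0.25
    Bn2 = n * var
    rhs = (p - 1.0) * Bn2 ** (p / 2.0) + n * third
    acc = 0.0
    for _ in range(reps):
        s = sum(rng.uniform(-1.0, 1.0) for _ in range(n))
        acc += abs(s) ** p
    lhs = acc / reps  # Monte Carlo estimate of E|S_n|^p
    return lhs, rhs

lhs, rhs = rosenthal_check()
assert lhs < rhs
```

For these summands E|S_n|³ is close to the Gaussian value B_n³ · 2√(2/π) ≈ 1.6 B_n³, below the (p − 1) B_n^p = 2 B_n³ leading term of the bound.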
Proof: Let S_n^(i) = S_n − η_i. Then

  IE|S_n|^p = Σ_{i=1}^n IE{η_i S_n |S_n|^{p−2}}
    = Σ_{i=1}^n IE{η_i ( S_n |S_n|^{p−2} − S_n^(i) |S_n|^{p−2} )}
      + Σ_{i=1}^n IE{η_i ( S_n^(i) |S_n|^{p−2} − S_n^(i) |S_n^(i)|^{p−2} )},
once again because η_i and S_n^(i) are independent, and IEη_i = 0. Thus we have

  IE|S_n|^p ≤ Σ_{i=1}^n IE{η_i² |S_n|^{p−2}}
      + Σ_{i=1}^n IE{ |η_i| |S_n^(i)| ( (|S_n^(i)| + |η_i|)^{p−2} − |S_n^(i)|^{p−2} ) }
    ≤ Σ_{i=1}^n IE{η_i² ( |η_i|^{p−2} + |S_n^(i)|^{p−2} )}
      + Σ_{i=1}^n IE{ |η_i| |S_n^(i)|^{p−1} ( (1 + |η_i|/|S_n^(i)|)^{p−2} − 1 ) }.

Since (1 + x)^{p−2} − 1 ≤ (p − 2) x for x ≥ 0, we thus have

  IE|S_n|^p ≤ Σ_{i=1}^n IE|η_i|^p + Σ_{i=1}^n IEη_i² IE|S_n^(i)|^{p−2}
      + Σ_{i=1}^n IE{ |η_i| |S_n^(i)|^{p−1} (p − 2) |η_i| / |S_n^(i)| }
    = Σ_{i=1}^n IE|η_i|^p + (p − 1) Σ_{i=1}^n IEη_i² IE|S_n^(i)|^{p−2},

and Hölder's inequality now gives

  IE|S_n|^p ≤ Σ_{i=1}^n IE|η_i|^p + (p − 1) Σ_{i=1}^n IEη_i² ( IE|S_n^(i)|² )^{(p−2)/2}
    ≤ Σ_{i=1}^n IE|η_i|^p + (p − 1) B_n^p,

as desired.

6.2. The final result

We are now ready to prove the non-uniform Berry–Esseen inequality.

Theorem 6.4: There exists an absolute constant C such that, for every real number z,

  |IP(W ≤ z) − Φ(z)| ≤ Cγ / (1 + |z|³).        (6.12)

Proof: Without loss of generality, assume z ≥ 0. By (6.11),

  IP(W ≥ z) ≤ (1 + IE|W|³) / (1 + z³) ≤ (1 + 2 + γ) / (1 + z³).
Thus (6.12) holds if γ ≥ 1, and we can now assume γ < 1. Let

  ξ̄_i = ξ_i I{ξ_i ≤ 1},   W̄ = Σ_{i=1}^n ξ̄_i,   W̄^(i) = W̄ − ξ̄_i.

Observing that

  {W ≥ z} = {W ≥ z, max_{1≤i≤n} ξ_i > 1} ∪ {W ≥ z, max_{1≤i≤n} ξ_i ≤ 1}
    ⊂ {W ≥ z, max_{1≤i≤n} ξ_i > 1} ∪ {W̄ ≥ z},

we have

  IP(W > z) ≤ IP(W̄ > z) + IP(W > z, max_{1≤i≤n} ξ_i > 1),        (6.13)

and, since clearly W ≥ W̄,

  IP(W̄ > z) ≤ IP(W > z).        (6.14)

Note that

  IP(W > z, max_{1≤i≤n} ξ_i > 1) ≤ Σ_{i=1}^n IP(W > z, ξ_i > 1)
    ≤ Σ_{i=1}^n IP(ξ_i > max(1, z/2)) + Σ_{i=1}^n IP(W^(i) > z/2, ξ_i > 1)
    = Σ_{i=1}^n IP(ξ_i > max(1, z/2)) + Σ_{i=1}^n IP(W^(i) > z/2) IP(ξ_i > 1)
    ≤ γ / max(1, z/2)³ + Σ_{i=1}^n ( (1 + IE|W^(i)|³) / (1 + (z/2)³) ) IE|ξ_i|³
    ≤ Cγ / (1 + z³);

here, and in what follows, C denotes a generic absolute constant, whose value may be different at each appearance. Thus, to prove (6.12), it suffices to show that

  |IP(W̄ ≤ z) − Φ(z)| ≤ C e^{−z/2} γ.        (6.15)
Let f_z be the solution to the Stein equation (2.2), and define

  K̄_i(t) = IE{ ξ̄_i ( I{0 ≤ t ≤ ξ̄_i} − I{ξ̄_i ≤ t < 0} ) }.

Then, since Σ_{i=1}^n ∫_{−∞}^1 K̄_i(t) dt = Σ_{i=1}^n IEξ̄_i² = 1 − Σ_{i=1}^n IE{ξ_i² I{ξ_i > 1}}, we obtain that

  IP(W̄ ≤ z) − Φ(z) = IEf_z′(W̄) − IE{W̄ f_z(W̄)}
    = Σ_{i=1}^n IE{ξ_i² I{ξ_i > 1}} IEf_z′(W̄)
      + Σ_{i=1}^n ∫_{−∞}^1 IE[ f_z′(W̄^(i) + ξ̄_i) − f_z′(W̄^(i) + t) ] K̄_i(t) dt
      + Σ_{i=1}^n IE{ξ_i I{ξ_i > 1}} IEf_z(W̄^(i))
    := R_1 + R_2 + R_3.        (6.16)

By (8.3), (2.8) and (6.2),

  IE|f_z′(W̄)| = IE{ |f_z′(W̄)| I{W̄ ≤ z/2} } + IE{ |f_z′(W̄)| I{W̄ > z/2} }
    ≤ (1 + √(2π) (z/2) e^{z²/8}) (1 − Φ(z)) + IP(W̄ > z/2)
    ≤ (1 + √(2π) (z/2) e^{z²/8}) (1 − Φ(z)) + e^{−z/2} IEe^{W̄}
    ≤ C e^{−z/2},

and hence

  |R_1| ≤ Cγ e^{−z/2}.        (6.17)

Similarly, we have IEf_z(W̄^(i)) ≤ C e^{−z/2} and

  |R_3| ≤ Cγ e^{−z/2}.        (6.18)

To estimate R_2, write R_2 = R_{2,1} + R_{2,2}, where

  R_{2,1} = Σ_{i=1}^n ∫_{−∞}^1 IE[ I{W̄^(i) + ξ̄_i ≤ z} − I{W̄^(i) + t ≤ z} ] K̄_i(t) dt,

  R_{2,2} = Σ_{i=1}^n ∫_{−∞}^1 IE[ (W̄^(i) + ξ̄_i) f_z(W̄^(i) + ξ̄_i) − (W̄^(i) + t) f_z(W̄^(i) + t) ] K̄_i(t) dt.
By Proposition 6.1,

  R_{2,1} ≤ Σ_{i=1}^n ∫_{−∞}^1 IE{ I{ξ̄_i ≤ t} IP(z − t < W̄^(i) ≤ z − ξ̄_i | ξ̄_i) } K̄_i(t) dt
    ≤ C Σ_{i=1}^n ∫_{−∞}^1 e^{−(z−t)/2} IE( |ξ̄_i| + |t| + γ ) K̄_i(t) dt
    ≤ C e^{−z/2} γ.        (6.19)

From Lemma 6.5, proved below, it follows that

  R_{2,2} ≤ Σ_{i=1}^n ∫_{−∞}^1 IE{ I{t ≤ ξ̄_i} [ IE( (W̄^(i) + ξ̄_i) f_z(W̄^(i) + ξ̄_i) | ξ̄_i ) − IE( (W̄^(i) + t) f_z(W̄^(i) + t) ) ] } K̄_i(t) dt
    ≤ C e^{−z/2} Σ_{i=1}^n ∫_{−∞}^1 IE( |ξ̄_i| + |t| ) K̄_i(t) dt
    ≤ C e^{−z/2} γ.        (6.20)

Therefore

  R_2 ≤ C e^{−z/2} γ.        (6.21)

Similarly, we have

  R_2 ≥ −C e^{−z/2} γ.        (6.22)

This proves the theorem, and it remains only to prove the following lemma.

Lemma 6.5: For s < t ≤ 1, we have

  IE{(W̄^(i) + t) f_z(W̄^(i) + t)} − IE{(W̄^(i) + s) f_z(W̄^(i) + s)} ≤ C e^{−z/2} (|s| + |t|).        (6.23)

Proof: Let g(w) = (w f_z(w))′. Then

  IE{(W̄^(i) + t) f_z(W̄^(i) + t)} − IE{(W̄^(i) + s) f_z(W̄^(i) + s)} = ∫_s^t IEg(W̄^(i) + u) du.

From the definition of g and f_z, we get

  g(w) = { √(2π) (1 + w²) e^{w²/2} (1 − Φ(w)) − w } Φ(z)        for w ≥ z,
         { √(2π) (1 + w²) e^{w²/2} Φ(w) + w } (1 − Φ(z))        for w < z.
By (2.6), g(w) ≥ 0 for all real w. A direct calculation shows that

  √(2π) (1 + w²) e^{w²/2} Φ(w) + w ≤ 2   for w ≤ 0.        (6.24)

Thus, we have

  g(w) ≤ 4 (1 + z²) e^{z²/8} (1 − Φ(z))   if w ≤ z/2,
         4 (1 + z²) e^{z²/2} (1 − Φ(z))   if z/2 < w ≤ z,

and this latter bound holds also for w > z, as can be seen by applying (6.24), with −w in place of w, to the formula for g in w ≥ z. Hence, for any u ∈ [s, t], we have

  IEg(W̄^(i) + u) = IE{ g(W̄^(i) + u) I{W̄^(i) + u ≤ z/2} } + IE{ g(W̄^(i) + u) I{W̄^(i) + u > z/2} }
    ≤ 4 (1 + z²) e^{z²/8} (1 − Φ(z)) + 4 (1 + z²) e^{z²/2} (1 − Φ(z)) IP(W̄^(i) + u > z/2)
    ≤ C e^{−z/2} + C (1 + z) e^{−z + 2u} IEe^{2W̄^(i)}.

But u ≤ t ≤ 1, and so

  IEg(W̄^(i) + u) ≤ C e^{−z/2} + C (1 + z) e^{−z} IEe^{2W̄^(i)} ≤ C e^{−z/2},

by (6.2). This gives

  IE{(W̄^(i) + t) f_z(W̄^(i) + t)} − IE{(W̄^(i) + s) f_z(W̄^(i) + s)} ≤ C e^{−z/2} (|s| + |t|),

proving (6.23).

7. Uniform and non-uniform bounds under local dependence

In this section, we extend the discussion of normal approximation under local dependence using Stein's method, which was begun in Section 3.2. Our aim is to establish optimal uniform and non-uniform Berry–Esseen bounds under local dependence.

Throughout this section, let J be an index set of cardinality n and let {ξ_i, i ∈ J} be a random field with zero means and finite variances. Define W = Σ_{i∈J} ξ_i and assume that Var(W) = 1. For A ⊂ J, let ξ_A denote {ξ_i, i ∈ A}, A^c = {j ∈ J : j ∉ A}, and |A| the cardinality of A. We introduce four dependence assumptions, the first two of which appeared in Section 3.2.

(LD1) For each i ∈ J there exists A_i ⊂ J such that ξ_i and ξ_{A_i^c} are independent.
(LD2) For each i ∈ J there exist A_i ⊂ B_i ⊂ J such that ξ_i is independent of ξ_{A_i^c} and ξ_{A_i} is independent of ξ_{B_i^c}.

(LD3) For each i ∈ J there exist A_i ⊂ B_i ⊂ C_i ⊂ J such that ξ_i is independent of ξ_{A_i^c}, ξ_{A_i} is independent of ξ_{B_i^c}, and ξ_{B_i} is independent of ξ_{C_i^c}.

(LD4*) For each i ∈ J there exist A_i ⊂ B_i ⊂ B_i* ⊂ C_i* ⊂ D_i* ⊂ J such that ξ_i is independent of ξ_{A_i^c}, ξ_{A_i} is independent of ξ_{B_i^c}, and then ξ_{A_i} is independent of {ξ_{A_j}, j ∈ B_i*^c}, {ξ_{A_l}, l ∈ B_i*} is independent of {ξ_{A_j}, j ∈ C_i*^c}, and {ξ_{A_l}, l ∈ C_i*} is independent of {ξ_{A_j}, j ∈ D_i*^c}.

It is clear that (LD4*) implies (LD3), (LD3) yields (LD2), and (LD1) is the weakest assumption. Roughly speaking, (LD4*) is a version of (LD3) for {ξ_{A_i}, i ∈ J}. On the other hand, (LD1) in many cases actually implies (LD2), (LD3) and (LD4*), and B_i, C_i, B_i*, C_i* and D_i* can be chosen as

  B_i = ∪_{j∈A_i} A_j,  C_i = ∪_{j∈B_i} A_j,  B_i* = ∪_{j∈A_i} B_j,  C_i* = ∪_{j∈B_i*} B_j  and  D_i* = ∪_{j∈C_i*} B_j.

We first present a general uniform Berry–Esseen bound under assumption (LD2).

Theorem 7.1: Let N(B_i) = {j ∈ J : B_j ∩ B_i ≠ ∅} and 2 < p ≤ 4. Assume that (LD2) is satisfied with |N(B_i)| ≤ κ. Then

  sup_z |IP(W ≤ z) − Φ(z)| ≤ (13 + 11κ) Σ_{i∈J} ( IE|ξ_i|^{3∧p} + IE|Y_i|^{3∧p} )
    + 2.5 ( κ Σ_{i∈J} ( IE|ξ_i|^p + IE|Y_i|^p ) )^{1/2},        (7.1)

where Y_i = Σ_{j∈A_i} ξ_j. In particular, if IE|ξ_i|^p + IE|Y_i|^p ≤ θ^p for some θ > 0 and for each i ∈ J, then

  sup_z |IP(W ≤ z) − Φ(z)| ≤ (13 + 11κ) n θ^{3∧p} + 2.5 θ^{p/2} √(κn),        (7.2)

where n = |J|.

Note that in many cases κ is bounded and θ is of order n^{−1/2}. In those cases, κ n θ^{3∧p} + θ^{p/2} √(κn) = O(n^{−(p−2)/4}), which is of the best possible order n^{−1/2} when p = 4. However, the cost is the existence of fourth moments. To reduce the assumption on moments, we need the stronger condition (LD3).
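For intuition about the neighborhoods B_i and the count |N(B_i)|, the following sketch (ours, not part of the original text) assumes a one-dimensional m-dependent field with A_i = {j : |j − i| ≤ m}, builds B_i = ∪_{j∈A_i} A_j and N(B_i) explicitly, and confirms that for interior indices |N(B_i)| = 8m + 1.

```python
def neighborhoods(n, m):
    # A_i = {j : |j - i| <= m}, B_i = union of A_j over j in A_i,
    # N(B_i) = {j : B_j meets B_i}; everything inside J = {0, ..., n-1}
    J = range(n)
    A = {i: {j for j in J if abs(j - i) <= m} for i in J}
    B = {i: set().union(*(A[j] for j in A[i])) for i in J}
    NB = {i: {j for j in J if B[j] & B[i]} for i in J}
    return A, B, NB

n, m = 101, 2
A, B, NB = neighborhoods(n, m)
# interior i: B_i = {j : |j - i| <= 2m}, and B_j meets B_i iff |j - i| <= 4m
kappa = max(len(NB[i]) for i in range(n))
assert kappa == 8 * m + 1
```

So for bounded m the quantity κ in Theorem 7.1 stays bounded as n grows, which is the regime discussed above.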
Theorem 7.2: Let 2 < p ≤ 3. Assume that (LD3) is satisfied with |N(C_i)| ≤ κ, where N(C_i) = {j ∈ J : C_i ∩ B_j ≠ ∅}. Then

  sup_z |IP(W ≤ z) − Φ(z)| ≤ 75 κ^{p−1} Σ_{i∈J} IE|ξ_i|^p.        (7.3)

We now present a general non-uniform bound for locally dependent random fields {ξ_i, i ∈ J} under (LD4*).

Theorem 7.3: Assume that IE|ξ_i|^p < ∞ for 2 < p ≤ 3 and that (LD4*) is satisfied. Let κ = max_{i∈J} max(|D_i*|, |{j : i ∈ D_j*}|). Then

  |IP(W ≤ z) − Φ(z)| ≤ C κ^p (1 + |z|)^{−p} Σ_{i∈J} IE|ξ_i|^p,        (7.4)

where C is an absolute constant.

The above results can immediately be applied to m-dependent random fields. Let d ≥ 1 and let Z^d denote the d-dimensional space of positive integers. The distance between two points i = (i_1, ..., i_d) and j = (j_1, ..., j_d) in Z^d is defined by |i − j| = max_{1≤l≤d} |i_l − j_l|, and the distance between two subsets A and B of Z^d by ρ(A, B) = inf{|i − j| : i ∈ A, j ∈ B}. For a given subset J of Z^d, a set of random variables {ξ_i, i ∈ J} is said to be an m-dependent random field if {ξ_i, i ∈ A} and {ξ_j, j ∈ B} are independent whenever ρ(A, B) > m, for any subsets A and B of J. Thus, choosing

  A_i = {j : |j − i| ≤ m} ∩ J,    B_i = {j : |j − i| ≤ 2m} ∩ J,
  C_i = {j : |j − i| ≤ 3m} ∩ J,   B_i* = {j : |j − i| ≤ 3m} ∩ J,
  C_i* = {j : |j − i| ≤ 4m} ∩ J   and   D_i* = {j : |j − i| ≤ 5m} ∩ J

in Theorems 7.2 and 7.3 yields a uniform and a non-uniform bound.

Theorem 7.4: Let {ξ_i, i ∈ J} be an m-dependent random field with zero means and IE|ξ_i|^p < ∞ for 2 < p ≤ 3. Then

  sup_z |IP(W ≤ z) − Φ(z)| ≤ 75 (10m + 1)^{(p−1)d} Σ_{i∈J} IE|ξ_i|^p        (7.5)

and

  |IP(W ≤ z) − Φ(z)| ≤ C (1 + |z|)^{−p} 11^{pd} (m + 1)^{(p−1)d} Σ_{i∈J} IE|ξ_i|^p,        (7.6)

where C is an absolute constant.
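Theorem 7.4 can be illustrated by simulation. The sketch below is ours, with illustrative choices: a 1-dependent moving-average field X_i = (ε_i + ε_{i+1})/√2 with uniform innovations, normalized so that Var(W) = 1; the estimated Kolmogorov distance should be small for a sum of this size.

```python
import bisect
import math
import random

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kolmogorov_distance(n=200, reps=8000, seed=3):
    # 1-dependent field X_i = (eps_i + eps_{i+1}) / sqrt(2) with
    # eps_j ~ Uniform(-sqrt(3), sqrt(3)); Var(sum_i X_i) = 2n - 1
    rng = random.Random(seed)
    s3 = math.sqrt(3.0)
    norm = math.sqrt(2.0 * n - 1.0)
    samples = []
    for _ in range(reps):
        eps = [rng.uniform(-s3, s3) for _ in range(n + 1)]
        w = sum((eps[i] + eps[i + 1]) / math.sqrt(2.0) for i in range(n))
        samples.append(w / norm)
    samples.sort()
    return max(
        abs(bisect.bisect_right(samples, z / 10.0) / reps - Phi(z / 10.0))
        for z in range(-35, 36)
    )

# small distance, consistent with the O(n^{-1/2}) rate behind (7.5)
assert kolmogorov_distance() < 0.08
```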
The main idea of the proof is similar to that in Sections 3 and 4: first derive a Stein identity, and then uniform and non-uniform concentration inequalities. We outline some main steps in the proof and refer to Chen & Shao (2004a) for details. Define

  K̂_i(t) = ξ_i { I(−Y_i ≤ t < 0) − I(0 ≤ t ≤ −Y_i) },   K_i(t) = IEK̂_i(t),        (7.7)
  K̂(t) = Σ_{i∈J} K̂_i(t),   K(t) = IEK̂(t) = Σ_{i∈J} K_i(t).

We first derive a Stein identity for W. Let f be a bounded absolutely continuous function. Then

  IE{W f(W)} = Σ_{i∈J} IE{ξ_i ( f(W) − f(W − Y_i) )}
    = Σ_{i∈J} IE{ ξ_i ∫_{−Y_i}^0 f′(W + t) dt }
    = Σ_{i∈J} IE{ ∫_{−∞}^{∞} f′(W + t) K̂_i(t) dt }
    = IE{ ∫_{−∞}^{∞} f′(W + t) K̂(t) dt },        (7.8)

and hence, by the fact that ∫_{−∞}^{∞} K(t) dt = IEW² = 1,

  IE{f′(W) − W f(W)} = IE ∫_{−∞}^{∞} f′(W) K(t) dt − IE ∫_{−∞}^{∞} f′(W + t) K̂(t) dt
    = IE ∫_{−∞}^{∞} ( f′(W) − f′(W + t) ) K(t) dt
      + IE{ f′(W) ∫_{−∞}^{∞} ( K(t) − K̂(t) ) dt }
      + IE ∫_{−∞}^{∞} ( f′(W + t) − f′(W) ) ( K(t) − K̂(t) ) dt
    := R_1 + R_2 + R_3.

Now let f = f_z be the Stein solution (2.3). Then

  |R_1| ≤ IE ∫_{−∞}^{∞} (|W| + 1) |t K(t)| dt + | IE ∫_{−∞}^{∞} ( I{W ≤ z} − I{W + t ≤ z} ) K(t) dt |
    ≤ ½ IE(|W| + 1) Σ_{i∈J} IE{|ξ_i| Y_i²}
      + ∫_{−∞}^{∞} IP( z − max(t, 0) ≤ W ≤ z − min(t, 0) ) K(t) dt
    := R_{1,1} + R_{1,2}.

Estimating R_{1,1} is not so difficult, while R_{1,2} can be estimated by using a concentration inequality given below.
Observe that

  R_2 = IE{ f′(W) Σ_{i∈J} ( IE(ξ_i Y_i) − ξ_i Y_i ) },

which can also be estimated easily. The main difficulty arises in estimating R_3; the reader may refer to Chen & Shao (2004a) for details.

To conclude this section, we give the simplest concentration inequality in the paper of Chen & Shao (2004a), and provide a detailed proof to illustrate the difficulty caused by dependence.

Proposition 7.5: Assume (LD1). Then for any real numbers a < b,

  IP(a ≤ W ≤ b) ≤ 0.625 (b − a) + 4 r_1 + 4 r_2,        (7.9)

where r_1 = Σ_{i∈J} IE|ξ_i| Y_i² and r_2 = ∫_{−∞}^{∞} Var(K̂(t)) dt.

Proof: Let α = r_1, and define

  f(w) = −(b − a + α)/2                              for w ≤ a − α,
         (1/(2α)) (w − a + α)² − (b − a + α)/2       for a − α < w ≤ a,
         w − (a + b)/2                               for a < w ≤ b,
         −(1/(2α)) (w − b − α)² + (b − a + α)/2      for b < w ≤ b + α,
         (b − a + α)/2                               for w > b + α.

8. Appendix

… ≥ 0, by (8.1). This proves (2.6). In view of the fact that

  lim_{w→−∞} w f_z(w) = Φ(z) − 1   and   lim_{w→∞} w f_z(w) = Φ(z),        (8.2)
(2.7) follows by (2.6). By (2.2), we have

  f_z′(w) = w f_z(w) + I{w ≤ z} − Φ(z)
    = w f_z(w) + 1 − Φ(z)   for w < z,    w f_z(w) − Φ(z)   for w > z;
    = ( √(2π) w e^{w²/2} Φ(w) + 1 ) (1 − Φ(z))        for w < z,        (8.3)
      ( √(2π) w e^{w²/2} (1 − Φ(w)) − 1 ) Φ(z)        for w > z.

Since w f_z(w) is an increasing function of w, by (8.1) and (8.2),

  0 < f_z′(w) ≤ z f_z(z) + 1 − Φ(z) < 1   for w < z        (8.4)

and

  −1 < z f_z(z) − Φ(z) ≤ f_z′(w) < 0   for w > z.        (8.5)

Hence, for any w and v,

  |f_z′(w) − f_z′(v)| ≤ max( 1, z f_z(z) + 1 − Φ(z) − (z f_z(z) − Φ(z)) ) = 1.
This proves (2.8). Observe that, by (8.4) and (8.5), f_z attains its maximum at z. Thus

  0 < f_z(w) ≤ f_z(z) = √(2π) e^{z²/2} Φ(z) (1 − Φ(z)).        (8.6)

By (8.1), f_z(z) ≤ 1/z. To finish the proof of (2.9), let

  g(z) = Φ(z) (1 − Φ(z)) − (1/4) e^{−z²/2}   and   g_1(z) = z/4 + 1/√(2π) − 2 Φ(z)/√(2π).

Observe that g′(z) = e^{−z²/2} g_1(z) and that

  g_1′(z) = 1/4 − (1/π) e^{−z²/2}   { < 0 if 0 ≤ z < z_0,  = 0 if z = z_0,  > 0 if z > z_0 },

where z_0 = (2 ln(4/π))^{1/2}. Thus, g_1(z) is decreasing on [0, z_0) and increasing on (z_0, ∞). Since g_1(0) = 0 and g_1(∞) = ∞, there exists z_1 > 0 such that g_1(z) < 0 for 0 < z < z_1 and g_1(z) > 0 for z > z_1. Therefore, g(z) attains its maximum at either z = 0 or z = ∞; that is,

  g(z) ≤ max(g(0), g(∞)) = 0,

which is equivalent to f_z(z) ≤ √(2π)/4. This completes the proof of (2.9). The last inequality (2.10) is a consequence of (2.8) and (2.9), obtained by rewriting

  (w + u) f_z(w + u) − (w + v) f_z(w + v) = w ( f_z(w + u) − f_z(w + v) ) + u f_z(w + u) − v f_z(w + v)

and using the Taylor expansion.

Proof of Lemma 2.3: We define h̃(w) = h(w) − IEh(Z), and then put c_0 = sup_w |h̃(w)|, c_1 = sup_w |h′(w)|. Since h̃ and f_h are unchanged when h is replaced by h − h(0), we may assume that h(0) = 0. Therefore |h(t)| ≤ c_1 |t| and |IEh(Z)| ≤ c_1 IE|Z| = c_1 √(2/π).

First we verify (2.11). From the definition (2.5) of f_h, it follows that

  |f_h(w)| ≤ e^{w²/2} ∫_{−∞}^{w} |h̃(x)| e^{−x²/2} dx   if w ≤ 0,    e^{w²/2} ∫_{w}^{∞} |h̃(x)| e^{−x²/2} dx   if w ≥ 0;
    ≤ e^{w²/2} min( c_0 ∫_{|w|}^{∞} e^{−x²/2} dx,  c_1 ∫_{|w|}^{∞} (|x| + √(2/π)) e^{−x²/2} dx )
    ≤ min( √(π/2) c_0, 2 c_1 ),
where in the last inequality we used the fact that

  e^{w²/2} ∫_{|w|}^{∞} e^{−x²/2} dx ≤ √(π/2).

Next we prove (2.12). By (2.4), for w ≥ 0,

  |f_h′(w)| ≤ |h(w) − IEh(Z)| + w e^{w²/2} ∫_{w}^{∞} |h(x) − IEh(Z)| e^{−x²/2} dx
    ≤ |h(w) − IEh(Z)| + c_0 w e^{w²/2} ∫_{w}^{∞} e^{−x²/2} dx ≤ 2 c_0,

by (8.1). It follows from (2.5) again that f″(w) − w f′(w) − f(w) = h′(w), or equivalently that

  ( e^{−w²/2} f′(w) )′ = e^{−w²/2} ( f(w) + h′(w) ).

Therefore

  f′(w) = −e^{w²/2} ∫_{w}^{∞} ( f(x) + h′(x) ) e^{−x²/2} dx

and, by (2.11),

  |f′(w)| ≤ 3 c_1 e^{w²/2} ∫_{w}^{∞} e^{−x²/2} dx ≤ 3 c_1 √(π/2) ≤ 4 c_1.

Thus we have

  sup_{w≥0} |f′(w)| ≤ min(2 c_0, 4 c_1).

Similarly, the same bound holds for sup_{w≤0} |f′(w)|. This proves (2.12).

Now we prove (2.13). Differentiating (2.4) gives

  f_h″(w) = w f_h′(w) + f_h(w) + h′(w)
    = (1 + w²) f_h(w) + w ( h(w) − IEh(Z) ) + h′(w).        (8.7)

From

  h(x) − IEh(Z) = (1/√(2π)) ∫_{−∞}^{∞} [h(x) − h(s)] e^{−s²/2} ds
    = (1/√(2π)) ∫_{−∞}^{x} ∫_{s}^{x} h′(t) dt e^{−s²/2} ds − (1/√(2π)) ∫_{x}^{∞} ∫_{x}^{s} h′(t) dt e^{−s²/2} ds
    = ∫_{−∞}^{x} h′(t) Φ(t) dt − ∫_{x}^{∞} h′(t) (1 − Φ(t)) dt,        (8.8)
it follows that

  f_h(w) = e^{w²/2} ∫_{−∞}^{w} [h(x) − IEh(Z)] e^{−x²/2} dx
    = e^{w²/2} ∫_{−∞}^{w} { ∫_{−∞}^{x} h′(t) Φ(t) dt − ∫_{x}^{∞} h′(t) (1 − Φ(t)) dt } e^{−x²/2} dx
    = −√(2π) e^{w²/2} (1 − Φ(w)) ∫_{−∞}^{w} h′(t) Φ(t) dt
      − √(2π) e^{w²/2} Φ(w) ∫_{w}^{∞} h′(t) [1 − Φ(t)] dt.        (8.9)

From (8.7)–(8.9) and (8.1) we now obtain

  |f_h″(w)| ≤ |h′(w)| + | (1 + w²) f_h(w) + w ( h(w) − IEh(Z) ) |
    ≤ |h′(w)| + | w − √(2π) (1 + w²) e^{w²/2} (1 − Φ(w)) | ∫_{−∞}^{w} |h′(t)| Φ(t) dt
      + | −w − √(2π) (1 + w²) e^{w²/2} Φ(w) | ∫_{w}^{∞} |h′(t)| (1 − Φ(t)) dt
    ≤ |h′(w)| + c_1 ( −w + √(2π) (1 + w²) e^{w²/2} (1 − Φ(w)) ) ∫_{−∞}^{w} Φ(t) dt
      + c_1 ( w + √(2π) (1 + w²) e^{w²/2} Φ(w) ) ∫_{w}^{∞} (1 − Φ(t)) dt.        (8.10)

Hence it follows that

  |f_h″(w)| ≤ |h′(w)|
    + c_1 ( −w + √(2π) (1 + w²) e^{w²/2} (1 − Φ(w)) ) ( w Φ(w) + e^{−w²/2}/√(2π) )
    + c_1 ( w + √(2π) (1 + w²) e^{w²/2} Φ(w) ) ( −w (1 − Φ(w)) + e^{−w²/2}/√(2π) )
    = |h′(w)| + c_1 ≤ 2 c_1,        (8.11)

as desired.

Acknowledgement: We are grateful to Andrew Barbour for his careful editing of the paper and, in particular, for substantially improving the exposition in the Introduction.
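The extremal bound f_z(z) ≤ √(2π)/4 established via (8.6) admits a quick numerical check. The sketch below is ours, not part of the original text: it scans √(2π) e^{z²/2} Φ(z)(1 − Φ(z)) over a grid and confirms that the maximum is attained at z = 0, where the value is exactly √(2π)/4.

```python
import math

def fz_at_z(z):
    # f_z(z) = sqrt(2*pi) * exp(z^2/2) * Phi(z) * (1 - Phi(z)), from (8.6)
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return math.sqrt(2.0 * math.pi) * math.exp(z * z / 2.0) * Phi * (1.0 - Phi)

vals = [fz_at_z(k / 100.0) for k in range(-500, 501)]
bound = math.sqrt(2.0 * math.pi) / 4.0
assert max(vals) <= bound + 1e-12
assert abs(fz_at_z(0.0) - bound) < 1e-12
```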
References

1. P. Baldi & Y. Rinott (1989) On normal approximation of distributions in terms of dependency graphs. Ann. Probab. 17, 1646–1650.
2. P. Baldi, Y. Rinott & C. Stein (1989) A normal approximation for the number of local maxima of a random function on a graph. In: Probability, Statistics, and Mathematics. Papers in Honor of Samuel Karlin, Eds: T. W. Anderson, K. B. Athreya & D. L. Iglehart, pp. 59–81. Academic Press, Boston.
3. A. D. Barbour & L. H. Y. Chen (1992) On the binary expansion of a random integer. Statist. Probab. Lett. 14, 235–241.
4. A. D. Barbour & P. Hall (1984) Stein's method and the Berry–Esseen theorem. Austral. J. Statist. 26, 8–15.
5. P. van Beeck (1972) An application of Fourier methods to the problem of sharpening the Berry–Esseen inequality. Z. Wahrsch. verw. Geb. 23, 187–196.
6. A. Bikelis (1966) Estimates of the remainder in the central limit theorem. Litovsk. Mat. Sb. 6, 323–346 (in Russian).
7. E. Bolthausen (1984) An estimate of the remainder in a combinatorial central limit theorem. Z. Wahrsch. verw. Geb. 66, 379–386.
8. E. Bolthausen & F. Götze (1993) The rate of convergence for multivariate sampling statistics. Ann. Statist. 21, 1692–1710.
9. L. H. Y. Chen (1986) The rate of convergence in a central limit theorem for dependent random variables with arbitrary index set. IMA Preprint Series #243, Univ. Minnesota.
10. L. H. Y. Chen (1998) Stein's method: some perspectives with applications. In: Probability Towards 2000, Eds: L. Accardi & C. C. Heyde, Lecture Notes in Statistics 128, pp. 97–122. Springer-Verlag, Berlin.
11. L. H. Y. Chen (2000) Non-uniform bounds in probability approximations using Stein's method. In: Probability and Statistical Models with Applications: A Volume in Honor of Theophilos Cacoullos, Eds: N. Balakrishnan, Ch. A. Charalambides & M. V. Koutras, pp. 3–14. Chapman & Hall/CRC Press, Boca Raton, Florida.
12. L. H. Y. Chen & Q. M. Shao (2001) A non-uniform Berry–Esseen bound via Stein's method. Probab. Theory Rel. Fields 120, 236–254.
13. L. H. Y. Chen & Q. M. Shao (2004a) Normal approximation under local dependence. Ann. Probab. 32, 1985–2028.
14. L. H. Y. Chen & Q. M. Shao (2004b) Normal approximation for non-linear statistics using a concentration inequality approach. Preprint.
15. A. Dembo & Y. Rinott (1996) Some examples of normal approximations by Stein's method. In: Random Discrete Structures, Eds: D. Aldous & R. Pemantle, pp. 25–44. Springer, New York.
16. P. Diaconis (1977) The distribution of leading digits and uniform distribution mod 1. Ann. Probab. 5, 72–81.
17. P. Diaconis & S. Holmes (2004) Stein's Method: Expository Lectures and Applications. IMS Lecture Notes Vol. 46, Hayward, CA.
18. P. Diaconis, S. Holmes & G. Reinert (2004) Use of exchangeable pairs in the analysis of simulations. In: Stein's Method: Expository Lectures and Applications, Eds: P. Diaconis & S. Holmes, pp. 1–26. IMS Lecture Notes Vol. 46, Hayward, CA.
19. R. V. Erickson (1974) L1 bounds for asymptotic normality of m-dependent sums using Stein's technique. Ann. Probab. 2, 522–529.
20. C. G. Esseen (1945) Fourier analysis of distribution functions: a mathematical study of the Laplace–Gaussian law. Acta Math. 77, 1–125.
21. W. Feller (1968) On the Berry–Esseen theorem. Z. Wahrsch. verw. Geb. 10, 261–268.
22. L. Goldstein (2005) Berry–Esseen bounds for combinatorial central limit theorems and pattern occurrences, using zero and size biasing. J. Appl. Probab. 42 (to appear).
23. L. Goldstein & G. Reinert (1997) Stein's method and the zero bias transformation with application to simple random sampling. Ann. Appl. Probab. 7, 935–952.
24. L. Goldstein & Y. Rinott (1996) Multivariate normal approximations by Stein's method and size bias couplings. Adv. Appl. Prob. 33, 1–17.
25. F. Götze (1991) On the rate of convergence in the multivariate CLT. Ann. Probab. 19, 724–739.
26. S. T. Ho & L. H. Y. Chen (1978) An Lp bound for the remainder in a combinatorial central limit theorem. Ann. Probab. 6, 231–249.
27. S. V. Nagaev (1965) Some limit theorems for large deviations. Theory Probab. Appl. 10, 214–235.
28. L. Paditz (1989) On the analytical structure of the constant in the nonuniform version of the Esseen inequality. Statistics 20, 453–464.
29. V. V. Petrov (1995) Limit Theorems of Probability Theory: Sequences of Independent Random Variables. Oxford Studies in Probability 4, Clarendon Press, Oxford.
30. Y. Rinott (1994) On normal approximation rates for certain sums of dependent random variables. J. Comput. Appl. Math. 55, 135–143.
31. Y. Rinott & V. Rotar (1996) A multivariate CLT for local dependence with n^{−1/2} log n rate and applications to multivariate graph related statistics. J. Multiv. Anal. 56, 333–350.
32. Y. Rinott & V. Rotar (2000) Normal approximations by Stein's method. Decis. Econ. Finance 23, 15–29.
33. H. P. Rosenthal (1970) On the subspaces of Lp (p > 2) spanned by sequences of independent random variables. Israel J. Math. 8, 273–303.
34. C. Stein (1972) A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. Proc. Sixth Berkeley Symp. Math. Statist. Prob. 2, 583–602. Univ. California Press, Berkeley, CA.
35. C. Stein (1986) Approximate Computation of Expectations. IMS Lecture Notes Vol. 7, Hayward, CA.
36. D. W. Stroock (1993) Probability Theory: An Analytic View. Cambridge Univ. Press, Cambridge, U.K.