arXiv:1404.5671v1 [stat.ME] 22 Apr 2014
Inference from Small and Big Data Sets with Error Rates
Miklós Csörgő∗ and Masoud M. Nasari†
School of Mathematics and Statistics, Carleton University, Ottawa, ON, Canada
Abstract In this paper we introduce randomized t-type statistics that will be referred to as randomized pivots. We show that these randomized pivots yield central limit theorems with a significantly smaller magnitude of error as compared to that of their classical counterparts under the same conditions. This constitutes a desirable result when a relatively small number of data is available. When a data set is too big to be processed, we use our randomized pivots to make inference about the mean based on significantly smaller sub-samples. The approach taken is shown to relate naturally to estimating distributions of both small and big data sets.
1
Introduction
In this paper we address the problem of making inference about the population mean when the available sample is either small or big. In the case of a small sample, we develop a randomization technique that yields central limit theorems (CLT's) with a significantly smaller magnitude of error, which compensates for the lack of information that results from having a small sample. Our technique works even when the sample is so small that the classical CLT cannot be used to make a valid inference. In the case of
[email protected] [email protected]
having a big sample, we also develop a technique to make inference about the mean based on a smaller sub-sample that can be drawn without having to process the entire original data set, which may not even be processable.

Unless stated otherwise, $X, X_1, \ldots$ throughout are assumed to be independent random variables with a common distribution function $F$ (i.i.d. random variables), mean $\mu := E_X X$ and variance $0 < \sigma^2 := E_X(X-\mu)^2 < +\infty$. Based on $X_1, \ldots, X_n$, a random sample on $X$, for each integer $n \ge 1$, define

$$\bar{X}_n := \sum_{i=1}^n X_i / n \quad \text{and} \quad S_n^2 := \sum_{i=1}^n (X_i - \bar{X}_n)^2 / n,$$

the sample mean and sample variance, respectively, and consider the classical Student $t$-statistic

$$T_n(X) := \frac{\bar{X}_n}{S_n/\sqrt{n}} = \frac{\sum_{i=1}^n X_i}{S_n\sqrt{n}} \qquad (1.1)$$

that, in turn, on replacing $X_i$ by $X_i - \mu$, $1 \le i \le n$, yields

$$T_n(X-\mu) := \frac{\bar{X}_n - \mu}{S_n/\sqrt{n}} = \frac{\sum_{i=1}^n (X_i - \mu)}{S_n\sqrt{n}}, \qquad (1.2)$$
the classical Student $t$-pivot for the population mean $\mu$.

Define now $T^{(1)}_{m_n,n}$ and $G^{(1)}_{m_n,n}$, randomized versions of $T_n(X)$ and $T_n(X-\mu)$ respectively, as follows:

$$T^{(1)}_{m_n,n} := \frac{\bar{X}_{m_n,n} - \bar{X}_n}{S_n\sqrt{\sum_{i=1}^n \big(\frac{w_i^{(n)}}{m_n} - \frac{1}{n}\big)^2}} = \frac{\sum_{i=1}^n \big(\frac{w_i^{(n)}}{m_n} - \frac{1}{n}\big) X_i}{S_n\sqrt{\sum_{i=1}^n \big(\frac{w_i^{(n)}}{m_n} - \frac{1}{n}\big)^2}}, \qquad (1.3)$$

$$G^{(1)}_{m_n,n} := \frac{\sum_{i=1}^n \big|\frac{w_i^{(n)}}{m_n} - \frac{1}{n}\big| (X_i - \mu)}{S_n\sqrt{\sum_{i=1}^n \big(\frac{w_i^{(n)}}{m_n} - \frac{1}{n}\big)^2}}, \qquad (1.4)$$

where

$$\bar{X}_{m_n,n} := \sum_{i=1}^n w_i^{(n)} X_i / m_n \qquad (1.5)$$

is the randomized sample mean and the weights $(w_1^{(n)}, \ldots, w_n^{(n)})$ have a multinomial distribution of size $m_n := \sum_{i=1}^n w_i^{(n)}$ with respective probabilities $1/n$, i.e.,

$$(w_1^{(n)}, \ldots, w_n^{(n)}) \stackrel{d}{=} \text{multinomial}\big(m_n; \tfrac{1}{n}, \ldots, \tfrac{1}{n}\big).$$
The just introduced respective randomized versions $T^{(1)}_{m_n,n}$ and $G^{(1)}_{m_n,n}$ of $T_n(X)$ and $T_n(X-\mu)$ can be computed via re-sampling from the set of indices $\{1, \ldots, n\}$ of $X_1, \ldots, X_n$ with replacement $m_n$ times so that, for each $1 \le i \le n$, $w_i^{(n)}$ is the count of the number of times the index $i$ of $X_i$ is chosen in this re-sampling process.
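The re-sampling recipe above is easy to put into code. The following is a minimal numpy sketch of one draw of the weights and of the resulting statistics (1.3)-(1.4); the function name, sample size and sampling distribution are our illustrative choices, and the absolute values in $G^{(1)}_{m_n,n}$ follow the partial sums described in Appendix 2:

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_pivots_1(x, m_n, mu, rng):
    """One draw of T^(1)_{m_n,n} of (1.3) and G^(1)_{m_n,n} of (1.4)."""
    n = len(x)
    # w ~ multinomial(m_n; 1/n, ..., 1/n), generated independently of the data
    w = rng.multinomial(m_n, np.full(n, 1.0 / n))
    d = w / m_n - 1.0 / n                          # the coefficients w_i/m_n - 1/n
    s_n = np.sqrt(np.mean((x - x.mean()) ** 2))    # S_n with divisor n, as in the text
    denom = s_n * np.sqrt(np.sum(d ** 2))
    t1 = np.sum(d * x) / denom                     # equals (Xbar_{m_n,n} - Xbar_n)/denom
    g1 = np.sum(np.abs(d) * (x - mu)) / denom
    return t1, g1

x = rng.normal(loc=5.0, scale=2.0, size=30)        # a small sample; sizes are ours
t1, g1 = randomized_pivots_1(x, m_n=30, mu=5.0, rng=rng)
```

In practice one would pair many such draws with the normal approximations of Section 2 to form confidence intervals for $\mu$.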
Remark 1.1. In view of the preceding definition of $w_i^{(n)}$, $1 \le i \le n$, they form a row-wise independent triangular array of random variables such that $\sum_{i=1}^n w_i^{(n)} = m_n$ and, for each $n \ge 1$,

$$(w_1^{(n)}, \ldots, w_n^{(n)}) \stackrel{d}{=} \text{multinomial}\big(m_n; \tfrac{1}{n}, \ldots, \tfrac{1}{n}\big),$$

i.e., the weights have a multinomial distribution of size $m_n$ with respective probabilities $1/n$. Clearly, for each $n$, the $w_i^{(n)}$ are independent of the random sample $X_i$, $1 \le i \le n$. Weights denoted by $w_i^{(n)}$ will stand for triangular multinomial random variables in this context throughout.
Thus, $T^{(1)}_{m_n,n}$ and $G^{(1)}_{m_n,n}$ can simply be computed by generating, independently from the data, a realization of the random multinomial weights $(w_1^{(n)}, \ldots, w_n^{(n)})$ as in Remark 1.1.

Define the similarly computable further randomized versions $T^{(2)}_{m_n,n}$ and $G^{(2)}_{m_n,n}$ of $T_n(X)$ and $T_n(X-\mu)$ respectively, as follows:

$$T^{(2)}_{m_n,n} := \frac{\bar{X}_{m_n,n} - \bar{X}_n}{S_{m_n,n}\sqrt{\sum_{i=1}^n \big(\frac{w_i^{(n)}}{m_n} - \frac{1}{n}\big)^2}} = \frac{\sum_{i=1}^n \big(\frac{w_i^{(n)}}{m_n} - \frac{1}{n}\big) X_i}{S_{m_n,n}\sqrt{\sum_{i=1}^n \big(\frac{w_i^{(n)}}{m_n} - \frac{1}{n}\big)^2}}, \qquad (1.6)$$

$$G^{(2)}_{m_n,n} := \frac{\sum_{i=1}^n \big|\frac{w_i^{(n)}}{m_n} - \frac{1}{n}\big| (X_i - \mu)}{S_{m_n,n}\sqrt{\sum_{i=1}^n \big(\frac{w_i^{(n)}}{m_n} - \frac{1}{n}\big)^2}}, \qquad (1.7)$$

where $S^2_{m_n,n}$ is the randomized sample variance, defined as

$$S^2_{m_n,n} := \sum_{i=1}^n w_i^{(n)} \big(X_i - \bar{X}_{m_n,n}\big)^2 / m_n. \qquad (1.8)$$
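A companion sketch for the second pair of pivots, Studentized by the randomized sample variance (1.8) rather than by $S_n$ (again an illustrative implementation, with our own names and parameters):

```python
import numpy as np

rng = np.random.default_rng(1)

def randomized_pivots_2(x, m_n, mu, rng):
    """One draw of T^(2)_{m_n,n} of (1.6) and G^(2)_{m_n,n} of (1.7)."""
    n = len(x)
    w = rng.multinomial(m_n, np.full(n, 1.0 / n))
    d = w / m_n - 1.0 / n
    xbar_m = np.sum(w * x) / m_n                    # randomized sample mean (1.5)
    s2_m = np.sum(w * (x - xbar_m) ** 2) / m_n      # randomized sample variance (1.8)
    denom = np.sqrt(s2_m) * np.sqrt(np.sum(d ** 2))
    t2 = np.sum(d * x) / denom
    g2 = np.sum(np.abs(d) * (x - mu)) / denom
    return t2, g2

x = rng.normal(loc=0.0, scale=1.0, size=50)
t2, g2 = randomized_pivots_2(x, m_n=50, mu=0.0, rng=rng)
```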
Unlike $T_n(X)$, which can be transformed into $T_n(X-\mu)$, the Student pivot for $\mu$ as in (1.2) (cf. Giné et al. [13] for the asymptotic equivalence of the two), its randomized versions $T^{(1)}_{m_n,n}$ and $T^{(2)}_{m_n,n}$ do not have this straightforward property, i.e., they do not yield a pivotal quantity for the population mean $\mu = E_X X$ by simply replacing each $X_i$ by $X_i - \mu$ in their definitions. We introduced $G^{(1)}_{m_n,n}$ and $G^{(2)}_{m_n,n}$ in this paper to serve as direct randomized pivots for the population mean $\mu$, while $T^{(1)}_{m_n,n}$ and $T^{(2)}_{m_n,n}$ will now be viewed on their own as randomized pivots for the sample mean $\bar{X}_n$ in case of a big data set. Our Theorem 2.1 and its corollaries will explain the higher order accuracy these randomized pivots provide for inference about the mean $\mu$, as compared to that provided by $T_n(X-\mu)$.

Among the many outstanding contributions in the literature studying the asymptotic behavior of $T_n(X)$ and $T_n(X-\mu)$, our main tool in this paper, Theorem 2.1 below, relates mostly to Bentkus et al. [3], Bentkus and Götze [4], Pinelis [15] and Shao [17].

A short outline of the contributions of this paper reads as follows. In Section 2 we derive the rates of convergence for $G^{(i)}_{m_n,n}$ and $T^{(i)}_{m_n,n}$, $i = 1, 2$, via establishing Berry-Esséen type results in Theorem 2.1 and its Corollaries 2.1-2.3. In Corollary 2.3 we show that, on taking $m_n = n$, $G^{(i)}_{m_n,n}$ and $T^{(i)}_{m_n,n}$, $i = 1, 2$, converge in distribution to the standard normal at the rate of $O(1/n)$. This rate is significantly better than the best possible $O(1/\sqrt{n})$ rate of convergence under similar moment conditions for the classical $t$-statistic $T_n(X)$ and its Student pivot $T_n(X-\mu)$, based on a random sample of size $n$. The latter $O(1/\sqrt{n})$ rate is best possible in the sense that it cannot be improved without restricting the class of distribution functions of the data, for example, to normal or symmetric distributions. In Section 2 we also present numerical studies that support our conclusion that, on taking $m_n = n$, $G^{(i)}_{m_n,n}$ and $T^{(i)}_{m_n,n}$, $i = 1, 2$, converge to the standard normal at a significantly faster rate than that of the classical CLT.
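The kind of comparison such numerical studies involve can be sketched as follows: simulate both the classical pivot $T_n(X-\mu)$ and $G^{(1)}_{n,n}$ (i.e., $m_n = n$) many times and compare the sup-distance of their empirical distributions from $\Phi$. This is an illustrative Monte Carlo of our own design (parameters and seed ours); we do not hard-code the outcome, which varies with the law of $X$ and the seed:

```python
import math
import numpy as np

rng = np.random.default_rng(3)

def phi(t):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def sup_dist(sample):
    """sup_t |ECDF(t) - Phi(t)|, evaluated at the jump points of the ECDF."""
    s = np.sort(np.asarray(sample))
    n = len(s)
    cdf = np.array([phi(t) for t in s])
    return max(np.max(np.abs(np.arange(1, n + 1) / n - cdf)),
               np.max(np.abs(np.arange(0, n) / n - cdf)))

n, reps, mu = 20, 2000, 1.0
classical, randomized = [], []
for _ in range(reps):
    x = rng.exponential(scale=mu, size=n)          # a skewed law with mean mu
    w = rng.multinomial(n, np.full(n, 1.0 / n))    # m_n = n
    d = w / n - 1.0 / n
    if np.sum(d ** 2) == 0.0:                      # degenerate draw: all w_i = 1
        continue
    s_n = x.std()                                  # S_n, divisor n
    classical.append((x.mean() - mu) / (s_n / math.sqrt(n)))
    randomized.append(np.sum(np.abs(d) * (x - mu)) / (s_n * math.sqrt(np.sum(d ** 2))))

d_classical, d_randomized = sup_dist(classical), sup_dist(randomized)
```

Per the rates quoted above, for skewed data and small $n$ the randomized pivot's empirical law is expected to track $\Phi$ more closely than the classical one.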
In Sections 4 and 5, the respective rates of convergence of the CLT's in Section 2 will be put to significant use. In Section 4, $G^{(i)}_{m_n,n}$, $i = 1, 2$, are studied as natural asymptotic pivots for the population mean $\mu = E_X X$. In Section 5, $T^{(i)}_{m_n,n}$, $i = 1, 2$, are studied as natural asymptotic pivots for the sample mean $\bar{X}_n$ that closely shadows $\mu$, when dealing with big data sets of univariate observations on $n$ labeled units $\{X_1, \ldots, X_n\}$. In this case, instead of trying to process the entire data set, which may even be impossible, sampling it indirectly by generating random weights independently of the data as in Remark 1.1 makes it possible to use $T^{(2)}_{m_n,n}$ to construct an interval estimate for the sample mean $\bar{X}_n$ based on significantly smaller sub-samples, which can be obtained without dealing directly with the entire data set (cf. Remark 5.3). The latter confidence set for $\bar{X}_n$ will in turn be seen to contain the population mean $\mu$ as well, with the same rates of convergence, in terms of $m_n$ and $n$, as those established for covering $\bar{X}_n$. In Section 6 the sample and population distribution functions are studied along the lines of Sections 2-5. The proofs are given in Section 7 and Appendices 1 and 2.

For use throughout, we let $(\Omega_X, \mathcal{F}_X, P_X)$ denote the probability space of the random variables $X, X_1, \ldots$, and $(\Omega_w, \mathcal{F}_w, P_w)$ the probability space on which the weights

$$w_1^{(1)},\ (w_1^{(2)}, w_2^{(2)}),\ \ldots,\ (w_1^{(n)}, \ldots, w_n^{(n)}),\ \ldots$$

are defined. In view of the independence of these two sets of random variables, jointly they live on the direct product probability space $(\Omega_X \times \Omega_w, \mathcal{F}_X \otimes \mathcal{F}_w, P_{X,w} = P_X \cdot P_w)$. For each $n \ge 1$, we also let $P_{\cdot|w}(\cdot)$ stand for the conditional probability given $\mathcal{F}_w^{(n)} := \sigma(w_1^{(n)}, \ldots, w_n^{(n)})$, with corresponding conditional expected value $E_{\cdot|w}(\cdot)$.
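The big-data sub-sampling scheme just described can be sketched in code. The version below draws only $m$ indices with replacement (which amounts to one multinomial realization of the weights), and reads off a normal-quantile interval for $\bar{X}_n$ from the $T^{(2)}$-type pivot; substituting the expected value $(1-\frac{1}{n})/m$ for the realized $\sum_i (w_i^{(n)}/m - 1/n)^2$, the 1.96 critical value, and all names are our illustrative simplifications:

```python
import numpy as np

rng = np.random.default_rng(2)

def subsample_ci(x_big, m, rng, z=1.96):
    """Interval for the full-data mean Xbar_n from an m-out-of-n with-replacement
    sub-sample, without touching more than m entries of the big data set."""
    n = len(x_big)
    idx = rng.integers(0, n, size=m)      # one multinomial draw of the weights
    sub = x_big[idx]
    s_m = sub.std()                        # randomized S_{m,n}, divisor m as in (1.8)
    half = z * s_m * np.sqrt((1.0 - 1.0 / n) / m)
    return sub.mean() - half, sub.mean() + half

x_big = rng.normal(loc=3.0, size=100_000)  # stand-in for a data set too big to process
lo, hi = subsample_ci(x_big, m=2_000, rng=rng)
```

Only the $m$ touched entries are ever loaded, which is the point of the construction when the full data set is not processable.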
2  The rate of convergence of the CLT's for $G^{(i)}_{m_n,n}$ and $T^{(i)}_{m_n,n}$, $i = 1, 2$
One of the efficient tools for controlling the error when approximating the distribution function of a statistic by that of a standard normal random variable is provided by Berry-Esséen type inequalities (cf., e.g., Serfling [16]), which give upper bounds on the error of approximation for any finite number of observations in hand. It is well known that, on assuming $E_X|X-\mu|^3 < +\infty$, as the sample size $n$ increases to infinity, the Berry-Esséen upper bound for $\sup_{-\infty<t<+\infty}|P(T_n(X-\mu) \le t) - \Phi(t)|$ approaches zero at the rate $O(1/\sqrt{n})$.

Theorem 2.1. Assume $E_X X^2 < +\infty$. Let $\varepsilon, \varepsilon_1, \varepsilon_2 > 0$ and $\delta > (\varepsilon_1/\varepsilon)^2 + P_X(|S_n^2 - \sigma^2| > \varepsilon_1^2) + \varepsilon_2 > 0$, where, for $t \in \mathbb{R}$, $\Phi(t-\varepsilon) - \Phi(t) > -\varepsilon_2$ and $\Phi(t+\varepsilon) - \Phi(t) < \varepsilon_2$. Then, for all $n, m_n$, we have

(A) $\displaystyle P_w\Big(\sup_{-\infty<t<+\infty}\big|P_{X|w}(G^{(1)}_{m_n,n} \le t) - \Phi(t)\big| > \delta\Big) = O\Big(\max\Big\{\frac{m_n}{n^2}, \frac{1}{m_n}\Big\}\Big).$

7  Proofs

Proof of part (A) of Theorem 2.1. Put

$$Z_{m_n} := \frac{\sum_{i=1}^n \big|\frac{w_i^{(n)}}{m_n} - \frac{1}{n}\big|(X_i - \mu)}{\sigma\sqrt{\sum_{i=1}^n \big(\frac{w_i^{(n)}}{m_n} - \frac{1}{n}\big)^2}},$$

i.e., $G^{(1)}_{m_n,n}$ with $S_n$ replaced by $\sigma$. For $\varepsilon, \varepsilon_1 > 0$ we have

$$P_{X|w}\big(|G^{(1)}_{m_n,n} - Z_{m_n}| > \varepsilon\big) \le P_{X|w}\Big(|Z_{m_n}| > \frac{\varepsilon}{\varepsilon_1}\Big) + P_X\big(|S_n^2 - \sigma^2| > \varepsilon_1^2\big). \qquad (7.2)$$

One can readily see that, by Chebyshev's inequality,

$$P_{X|w}\Big(|Z_{m_n}| > \frac{\varepsilon}{\varepsilon_1}\Big) \le \Big(\frac{\varepsilon_1}{\varepsilon}\Big)^2 \frac{\sum_{i=1}^n \big(\frac{w_i^{(n)}}{m_n} - \frac{1}{n}\big)^2 E_X(X_1-\mu)^2}{\sigma^2 \sum_{i=1}^n \big(\frac{w_i^{(n)}}{m_n} - \frac{1}{n}\big)^2} = \Big(\frac{\varepsilon_1}{\varepsilon}\Big)^2. \qquad (7.3)$$
Combining now the preceding conclusion with (7.3), (7.2) can be replaced by

$$-\Big(\frac{\varepsilon_1}{\varepsilon}\Big)^2 - P_X\big(|S_n^2 - \sigma^2| > \varepsilon_1^2\big) + P_{X|w}(Z_{m_n} \le t - \varepsilon) \le P_{X|w}\big(G^{(1)}_{m_n,n} \le t\big) \le \Big(\frac{\varepsilon_1}{\varepsilon}\Big)^2 + P_X\big(|S_n^2 - \sigma^2| > \varepsilon_1^2\big) + P_{X|w}(Z_{m_n} \le t + \varepsilon). \qquad (7.4)$$

Now, the continuity of the normal distribution $\Phi$ allows us to choose $\varepsilon_2 > 0$ so that $\Phi(t+\varepsilon) - \Phi(t) < \varepsilon_2$ and $\Phi(t-\varepsilon) - \Phi(t) > -\varepsilon_2$. This combined with (7.4) yields

$$-\Big(\frac{\varepsilon_1}{\varepsilon}\Big)^2 - P_X\big(|S_n^2 - \sigma^2| > \varepsilon_1^2\big) + P_{X|w}(Z_{m_n} \le t - \varepsilon) - \Phi(t-\varepsilon) - \varepsilon_2 \le P_{X|w}\big(G^{(1)}_{m_n,n} \le t\big) - \Phi(t) \le \Big(\frac{\varepsilon_1}{\varepsilon}\Big)^2 + P_X\big(|S_n^2 - \sigma^2| > \varepsilon_1^2\big) + P_{X|w}(Z_{m_n} \le t + \varepsilon) - \Phi(t+\varepsilon) + \varepsilon_2. \qquad (7.5)$$
We now use the Berry-Esséen inequality for independent and not necessarily identically distributed random variables (cf., e.g., Serfling [16]) to write

$$P_{X|w}(Z_{m_n} \le t + \varepsilon) - \Phi(t+\varepsilon) \le \frac{C\, E_X|X-\mu|^3 \sum_{i=1}^n \big|\frac{w_i^{(n)}}{m_n} - \frac{1}{n}\big|^3}{\sigma^3 \big(\sum_{i=1}^n \big(\frac{w_i^{(n)}}{m_n} - \frac{1}{n}\big)^2\big)^{3/2}}$$

and

$$P_{X|w}(Z_{m_n} \le t - \varepsilon) - \Phi(t-\varepsilon) \ge \frac{-C\, E_X|X-\mu|^3 \sum_{i=1}^n \big|\frac{w_i^{(n)}}{m_n} - \frac{1}{n}\big|^3}{\sigma^3 \big(\sum_{i=1}^n \big(\frac{w_i^{(n)}}{m_n} - \frac{1}{n}\big)^2\big)^{3/2}},$$

where $C$ is a universal constant as in the Berry-Esséen inequality in this context (cf. page 33 of Serfling [16]). Incorporating these approximations into (7.5), we arrive at

$$P_w\Big(\sup_{-\infty<t<+\infty}\big|P_{X|w}(G^{(1)}_{m_n,n} \le t) - \Phi(t)\big| > \delta\Big) \le P_w\Big(\sum_{i=1}^n \Big|\frac{w_i^{(n)}}{m_n} - \frac{1}{n}\Big|^3 > \frac{\delta_n (1-\varepsilon)^{3/2}\big(1-\frac{1}{n}\big)^{3/2}}{m_n^{3/2}}\Big) + P_w\Big(\Big|\frac{m_n}{1-\frac{1}{n}} \sum_{i=1}^n \Big(\frac{w_i^{(n)}}{m_n} - \frac{1}{n}\Big)^2 - 1\Big| > \varepsilon\Big) =: \Pi_1(n) + \Pi_2(n), \qquad (7.6)$$

where $\delta_n > 0$ is proportional to $\delta - (\varepsilon_1/\varepsilon)^2 - P_X(|S_n^2 - \sigma^2| > \varepsilon_1^2) - \varepsilon_2 > 0$, with a factor depending only on $C E_X|X-\mu|^3/\sigma^3$.
We bound $\Pi_1(n)$ above by

$$\delta_n^{-2}(1-\varepsilon)^{-3}\Big(1-\frac{1}{n}\Big)^{-3} m_n^{-3}(n + n^2)\, E_w\Big(w_1^{(n)} - \frac{m_n}{n}\Big)^6 \le \delta_n^{-2}(1-\varepsilon)^{-3}\Big(1-\frac{1}{n}\Big)^{-3} m_n^{-3}(n + n^2)\Big\{\frac{15 m_n^3}{n^3} + \frac{25 m_n^2}{n^2} + \frac{m_n}{n}\Big\}. \qquad (7.7)$$

As for $\Pi_2(n)$, recalling that $E_w \sum_{i=1}^n \big(\frac{w_i^{(n)}}{m_n} - \frac{1}{n}\big)^2 = \frac{1-\frac{1}{n}}{m_n}$, an application of Chebyshev's inequality yields

$$\Pi_2(n) \le \frac{m_n^2}{\varepsilon^2 (1-\frac{1}{n})^2}\, E_w\Big(\sum_{i=1}^n \Big(\frac{w_i^{(n)}}{m_n} - \frac{1}{n}\Big)^2 - \frac{1-\frac{1}{n}}{m_n}\Big)^2 = \frac{m_n^2}{\varepsilon^2 (1-\frac{1}{n})^2}\Big\{ n\, E_w\Big(\frac{w_1^{(n)}}{m_n} - \frac{1}{n}\Big)^4 + n(n-1)\, E_w\Big[\Big(\frac{w_1^{(n)}}{m_n} - \frac{1}{n}\Big)^2\Big(\frac{w_2^{(n)}}{m_n} - \frac{1}{n}\Big)^2\Big] - \frac{(1-\frac{1}{n})^2}{m_n^2}\Big\}. \qquad (7.8)$$
We now use the fact that the $w^{(n)}$'s are multinomially distributed to compute the preceding relation. After some algebra it turns out that it can be bounded above by

$$\frac{m_n^2}{\varepsilon^2(1-\frac{1}{n})^2}\Big\{\frac{\frac{1}{n}-\frac{1}{n^2}}{m_n^3} + \frac{(1-\frac{1}{n})^4}{n m_n^3} + \frac{(m_n-1)(1-\frac{1}{n})^2}{n m_n^3} + \frac{4(n-1)}{n^3 m_n^3} + \frac{1}{n m_n^2} + \frac{n-1}{n^3 m_n^3} + \frac{4(n-1)}{n^2 m_n^3} - \frac{1}{n^2 m_n^3} - \frac{(1-\frac{1}{n})^2}{m_n^2}\Big\}. \qquad (7.9)$$

Incorporating (7.7) and (7.9) into (7.6) completes the proof of part (A) of Theorem 2.1.

Proof of Corollary 2.1. The proofs of parts (A) and (B) of this corollary are immediate consequences of Theorem 2.1. To prove parts (C) and (D), in view of Theorem 2.1 it suffices to show that, for arbitrary $\varepsilon_1, \varepsilon_2 > 0$, as $n, m_n \to +\infty$,

$$P_w\Big(P_{X|w}\big(|S^2_{m_n,n} - S_n^2| > \varepsilon_1\big) > \varepsilon_2\Big) = O\Big(\frac{n}{m_n^2}\Big). \qquad (7.10)$$
To prove the preceding result we first note that

$$S^2_{m_n,n} - S_n^2 = \sum_{1 \le i \ne j \le n} \Big(\frac{w_i^{(n)} w_j^{(n)}}{m_n(m_n-1)} - \frac{1}{n(n-1)}\Big)\frac{(X_i - X_j)^2}{2} = \sum_{1 \le i \ne j \le n} \Big(\frac{w_i^{(n)} w_j^{(n)}}{m_n(m_n-1)} - \frac{1}{n(n-1)}\Big)\Big(\frac{(X_i - X_j)^2}{2} - \sigma^2\Big).$$
By virtue of the preceding observation, we proceed with the proof of (7.10) by first letting

$$d_{i,j}^{(n)} := \frac{w_i^{(n)} w_j^{(n)}}{m_n(m_n-1)} - \frac{1}{n(n-1)}$$

and writing

$$P_w\Big(P_{X|w}\Big(\Big|\sum_{1 \le i \ne j \le n} d_{i,j}^{(n)}\Big(\frac{(X_i - X_j)^2}{2} - \sigma^2\Big)\Big| > \varepsilon_1\Big) > \varepsilon_2\Big) \le P_w\Big(E_{X|w}\Big(\sum_{1 \le i \ne j \le n} d_{i,j}^{(n)}\Big(\frac{(X_i - X_j)^2}{2} - \sigma^2\Big)\Big)^2 > \varepsilon_1^2\, \varepsilon_2\Big). \qquad (7.11)$$
Observe now that

$$E_{X|w}\Big(\sum_{1 \le i \ne j \le n} d_{i,j}^{(n)}\Big(\frac{(X_i - X_j)^2}{2} - \sigma^2\Big)\Big)^2 = \sum_{1 \le i \ne j \le n}\big(d_{i,j}^{(n)}\big)^2\, E_X\Big(\frac{(X_1 - X_2)^2}{2} - \sigma^2\Big)^2 + \sum_{\substack{1 \le i,j,k \le n \\ i,j,k\ \text{distinct}}} d_{i,j}^{(n)} d_{i,k}^{(n)}\, E_X\Big[\Big(\frac{(X_i - X_j)^2}{2} - \sigma^2\Big)\Big(\frac{(X_i - X_k)^2}{2} - \sigma^2\Big)\Big] + \sum_{\substack{1 \le i,j,k,l \le n \\ i,j,k,l\ \text{distinct}}} d_{i,j}^{(n)} d_{k,l}^{(n)}\, E_X\Big[\Big(\frac{(X_i - X_j)^2}{2} - \sigma^2\Big)\Big(\frac{(X_k - X_l)^2}{2} - \sigma^2\Big)\Big]. \qquad (7.12)$$

We note that in the preceding relation, since $i, j, k$ are distinct, we have

$$E_X\Big[\Big(\frac{(X_i - X_j)^2}{2} - \sigma^2\Big)\Big(\frac{(X_i - X_k)^2}{2} - \sigma^2\Big)\Big] = E\Big[E\Big(\frac{(X_i - X_j)^2}{2} - \sigma^2 \,\Big|\, X_i\Big)\, E\Big(\frac{(X_i - X_k)^2}{2} - \sigma^2 \,\Big|\, X_i\Big)\Big] = \frac{E_X\big((X_1-\mu)^2 - \sigma^2\big)^2}{4}.$$

Also, since $i, j, k, l$ are distinct, we have

$$E_X\Big[\Big(\frac{(X_i - X_j)^2}{2} - \sigma^2\Big)\Big(\frac{(X_k - X_l)^2}{2} - \sigma^2\Big)\Big] = \Big(E_X\Big(\frac{(X_1 - X_2)^2}{2} - \sigma^2\Big)\Big)^2 = 0.$$
Therefore, in view of (7.12) and (7.11), the proof of (7.10) follows if we show that

$$\sum_{1 \le i \ne j \le n} \big(d_{i,j}^{(n)}\big)^2 = O_{P_w}\Big(\frac{1}{m_n^2}\Big) \qquad (7.13)$$

and

$$\sum_{\substack{1 \le i,j,k \le n \\ i,j,k\ \text{distinct}}} d_{i,j}^{(n)} d_{i,k}^{(n)} = O_{P_w}\Big(\frac{n}{m_n^2}\Big). \qquad (7.14)$$

Noting that, as $n, m_n \to +\infty$,

$$E_w \sum_{1 \le i \ne j \le n} \big(d_{i,j}^{(n)}\big)^2 \sim \frac{1}{m_n^2}$$

and

$$E_w\Big|\sum_{\substack{1 \le i,j,k \le n \\ i,j,k\ \text{distinct}}} d_{i,j}^{(n)} d_{i,k}^{(n)}\Big| \le n^3\, E_w\big(d_{1,2}^{(n)}\big)^2 \sim \frac{n}{m_n^2},$$

the preceding two conclusions imply (7.13) and (7.14), respectively. Now the proof of Corollary 2.1 is complete.
Proof of Corollary 2.2. The proof of this result is relatively easy. Due to their similarity, we only give the proof for part (A), which, for arbitrary positive $\delta$, proceeds by bounding $\sup_{-\infty<t<+\infty}\big|P_{X|w}(G^{(1)}_{m_n,n} \le t) - \Phi(t)\big|$ along the lines of the proof of Theorem 2.1.

8  Appendix 1

We first show that, for each fixed $n$, $\hat{X}_m \to \bar{X}_n$ in probability as $m \to +\infty$. For $\varepsilon_1, \varepsilon_2 > 0$, we write

$$P_w\Big(P_{X|w}\big(|\hat{X}_m - \bar{X}_n| > \varepsilon_1\big) > \varepsilon_2\Big) \le P_w\Big(E_{X|w}\Big(\sum_{i=1}^n \Big(\frac{w_i^{(n)}}{m} - \frac{1}{n}\Big)X_i\Big)^2 > \varepsilon_1^2\, \varepsilon_2\Big) = P_w\Big(\sum_{i=1}^n \Big(\frac{w_i^{(n)}}{m} - \frac{1}{n}\Big)^2 > \sigma^{-2}\varepsilon_1^2\, \varepsilon_2\Big) \le \sigma^2 \varepsilon_1^{-2}\varepsilon_2^{-1}\, n\, E_w\Big(\frac{w_1^{(n)}}{m} - \frac{1}{n}\Big)^2 = \sigma^2 \varepsilon_1^{-2}\varepsilon_2^{-1}\, \frac{1-\frac{1}{n}}{m} \to 0, \quad \text{as } m \to \infty. \qquad (8.1)$$
The preceding conclusion means that $P_{X|w}(|\hat{X}_m - \bar{X}_n| > \varepsilon_1) \to 0$ in probability-$P_w$. Hence, by the dominated convergence theorem, we conclude that $\hat{X}_m \to \bar{X}_n$ in probability-$P_{X,w}$.

We now show that the randomized sample variance $S^2_{m_n,n}$ is an in probability consistent estimator of the ordinary sample variance $S_n^2$ for each fixed $n$, when $m \to +\infty$. Employing the u-statistic representation of the sample variance enables us to rewrite $S^2_{m_n,n}$, as in (1.8), as follows:

$$S^2_{m_n,n} = \frac{\sum_{1 \le i \ne j \le n} w_i^{(n)} w_j^{(n)} (X_i - X_j)^2}{2m(m-1)}.$$

In view of the preceding formula, we have

$$S^2_{m_n,n} - S_n^2 = \sum_{1 \le i \ne j \le n} \Big(\frac{w_i^{(n)} w_j^{(n)}}{2m(m-1)} - \frac{1}{2n(n-1)}\Big)(X_i - X_j)^2.$$
Now, for $\varepsilon_1, \varepsilon_2 > 0$, we write

$$P_w\Big(P_{X|w}\big(|S^2_{m_n,n} - S_n^2| > \varepsilon_1\big) > \varepsilon_2\Big) = P_w\Big(P_{X|w}\Big(\Big|\sum_{1 \le i \ne j \le n}\Big(\frac{w_i^{(n)} w_j^{(n)}}{m(m-1)} - \frac{1}{n(n-1)}\Big)(X_i - X_j)^2\Big| > 2\varepsilon_1\Big) > \varepsilon_2\Big) \le P_w\Big(\sum_{1 \le i \ne j \le n}\Big|\frac{w_i^{(n)} w_j^{(n)}}{m_n(m_n-1)} - \frac{1}{n(n-1)}\Big|\, E_X(X_i - X_j)^2 > 2\varepsilon_1\, \varepsilon_2\Big) \le P_w\Big(\sum_{1 \le i \ne j \le n}\Big|\frac{w_i^{(n)} w_j^{(n)}}{m_n(m_n-1)} - \frac{1}{n(n-1)}\Big| > \varepsilon_1\, \varepsilon_2\, \sigma^{-2}\Big). \qquad (8.2)$$

The preceding relation can be bounded above by

$$\varepsilon_1^{-2}\varepsilon_2^{-2}\sigma^4\Big\{ n(n-1)\, E_w\Big(\frac{w_1^{(n)} w_2^{(n)}}{m(m-1)} - \frac{1}{n(n-1)}\Big)^2 + n(n-1)(n-2)\, E_w\Big|\Big(\frac{w_1^{(n)} w_2^{(n)}}{m(m-1)} - \frac{1}{n(n-1)}\Big)\Big(\frac{w_1^{(n)} w_3^{(n)}}{m(m-1)} - \frac{1}{n(n-1)}\Big)\Big| + n(n-1)(n-2)(n-3)\, E_w\Big|\Big(\frac{w_1^{(n)} w_2^{(n)}}{m(m-1)} - \frac{1}{n(n-1)}\Big)\Big(\frac{w_3^{(n)} w_4^{(n)}}{m(m-1)} - \frac{1}{n(n-1)}\Big)\Big|\Big\}$$

$$\le \varepsilon_1^{-2}\varepsilon_2^{-2}\sigma^4\Big\{\frac{n(n-1)}{n^4 m^2} + \frac{n\, n(n-1)(n-2)}{n^4 m^2} + \frac{n^2\, n(n-1)(n-2)(n-3)}{n^4 m^2}\Big\}.$$
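The fixed-$n$, $m \to +\infty$ consistency being proved here is easy to observe numerically; a small illustration of $S^2_{m,n}$ of (1.8) concentrating around $S_n^2$ as $m$ grows (data, seed and sizes are ours):

```python
import numpy as np

rng = np.random.default_rng(4)

n = 20
x = rng.normal(size=n)
s2_n = np.mean((x - x.mean()) ** 2)          # ordinary S_n^2, divisor n

def s2_randomized(x, m, rng):
    """Randomized sample variance S^2_{m,n} of (1.8) for one multinomial draw."""
    k = len(x)
    w = rng.multinomial(m, np.full(k, 1.0 / k))
    xbar_m = np.sum(w * x) / m
    return np.sum(w * (x - xbar_m) ** 2) / m

# absolute errors for increasing m at fixed n
errs = [abs(s2_randomized(x, m, rng) - s2_n) for m in (100, 10_000, 1_000_000)]
```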
Clearly, the latter term approaches zero when $m \to +\infty$, for each fixed $n$. By this we have shown that $S^2_{m_n,n} \to S_n^2$ in probability-$P_{X,w}$, when $n$ is fixed and only $m \to +\infty$.

Consistency of $\hat{X}_{m_n,n}$ in (4.4). We give the proof of (4.4) for $m_n = n$, noting that the proof below remains the same for $m_n \le n$ and can be adjusted for the case $m_n = kn$, where $k$ is a positive integer. In order to establish (4.4) when $m_n = n$, we first note that

$$\sum_{i=1}^n E_w\Big(\Big|\frac{w_i^{(n)}}{n} - \frac{1}{n}\Big|\Big) = 2\Big(1 - \frac{1}{n}\Big)^n$$
and, with $\varepsilon_1, \varepsilon_2, \varepsilon_3 > 0$, we proceed as follows:

$$P_w\Big(P_{X|w}\big(|\hat{X}_{m_n,n} - \mu| > \varepsilon_1\big) > \varepsilon_2\Big) \le P_w\Big(P_{X|w}\big(|\hat{X}_{m_n,n} - \mu| > \varepsilon_1\big) > \varepsilon_2,\ \Big|\sum_{j=1}^n \Big|\frac{w_j^{(n)}}{n} - \frac{1}{n}\Big| - 2\Big(1-\frac{1}{n}\Big)^n\Big| \le \varepsilon_3\Big) + P_w\Big(\Big|\sum_{j=1}^n \Big|\frac{w_j^{(n)}}{n} - \frac{1}{n}\Big| - 2\Big(1-\frac{1}{n}\Big)^n\Big| > \varepsilon_3\Big)$$

$$\le P_w\Big(P_{X|w}\Big(\Big|\sum_{i=1}^n \Big|\frac{w_i^{(n)}}{n} - \frac{1}{n}\Big|(X_i - \mu)\Big| > \varepsilon_1\Big(2\Big(1-\frac{1}{n}\Big)^n - \varepsilon_3\Big)\Big) > \varepsilon_2\Big) + \varepsilon_3^{-2}\, E_w\Big(\sum_{j=1}^n \Big|\frac{w_j^{(n)}}{n} - \frac{1}{n}\Big| - 2\Big(1-\frac{1}{n}\Big)^n\Big)^2$$

$$\le P_w\Big(\sum_{i=1}^n \Big(\frac{w_i^{(n)}}{n} - \frac{1}{n}\Big)^2 > \sigma^{-2}\varepsilon_1^2\, \varepsilon_2\Big(2\Big(1-\frac{1}{n}\Big)^n - \varepsilon_3\Big)^2\Big) + \varepsilon_3^{-2}\Big\{ n\, E_w\Big(\frac{w_1^{(n)}}{n} - \frac{1}{n}\Big)^2 + n(n-1)\, E_w\Big(\Big|\frac{w_1^{(n)}}{n} - \frac{1}{n}\Big|\Big|\frac{w_2^{(n)}}{n} - \frac{1}{n}\Big|\Big) - 4\Big(1-\frac{1}{n}\Big)^{2n}\Big\} =: K_1(n) + K_2(n).$$

A similar argument to that in (8.1) implies that, as $n \to +\infty$ and then $\varepsilon_3 \to 0$, we have $K_1(n) \to 0$. As to $K_2(n)$, we note that

$$E_w\Big(\frac{w_1^{(n)}}{n} - \frac{1}{n}\Big)^2 = n^{-2}\Big(1 - \frac{1}{n}\Big)$$

and

$$E_w\Big(\Big|\frac{w_1^{(n)}}{n} - \frac{1}{n}\Big|\Big|\frac{w_2^{(n)}}{n} - \frac{1}{n}\Big|\Big) = -n^{-3} + 4n^{-3}\Big(1-\frac{1}{n}\Big)^{n-1} + 4n^{-2}\Big(1-\frac{2}{n}\Big)^n.$$

Observing now that, as $n \to +\infty$,

$$n(n-1)\, E_w\Big(\Big|\frac{w_1^{(n)}}{n} - \frac{1}{n}\Big|\Big|\frac{w_2^{(n)}}{n} - \frac{1}{n}\Big|\Big) - 4\Big(1-\frac{1}{n}\Big)^{2n} \to 0,$$

we conclude that, as $n \to +\infty$, $K_2(n) \to 0$. By this we have concluded the consistency of $\hat{X}_{m_n,n}$ for the population mean $\mu$, when $m_n = n$.
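The exact multinomial moment used at the start of this argument, $\sum_{i=1}^n E_w|w_i^{(n)}/n - 1/n| = 2(1-\frac{1}{n})^n$ (equivalently $E|w-1| = 2P(w=0)$ for $w \sim \text{Binomial}(n, 1/n)$, since $Ew = 1$), can be checked directly from the binomial pmf:

```python
from math import comb

def abs_moment(n):
    """E|w - 1| for w ~ Binomial(n, 1/n), summed exactly over the pmf."""
    p = 1.0 / n
    return sum(abs(k - 1) * comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1))

# the identity from the text: sum_i E|w_i/n - 1/n| = E|w_1 - 1| = 2(1 - 1/n)^n
checks = {n: (abs_moment(n), 2 * (1 - 1 / n) ** n) for n in (2, 5, 20, 100)}
```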
9  Appendix 2

The convergence in distribution of partial sums of the form $\sum_{i=1}^n w_i^{(n)} X_i$ associated with $T^{(i)}_{m_n,n}$, $i = 1, 2$, was also studied in the context of the bootstrap by Csörgő et al. [5] via conditioning on the weights (cf. Theorem 2.1 and Corollary 2.2 therein). We note that the latter results cover only randomly weighted statistics similar to $T^{(i)}_{m_n,n}$, $i = 1, 2$, which are natural pivots for the sample mean $\bar{X}_n$. In view of the fact that $G^{(i)}_{m_n,n}$, $i = 1, 2$, as defined by (1.4) and (1.7), are natural pivots for the population mean $\mu := E_X X$, in a similar fashion to Theorem 2.1 and Corollary 2.2 of Csörgő et al. [5], here we state conditional CLT's, given the weights $w_i^{(n)}$, where $(w_1^{(n)}, \ldots, w_n^{(n)}) \stackrel{d}{=} \text{multinomial}(m_n; 1/n, \ldots, 1/n)$, for the partial sums $\sum_{i=1}^n |\frac{w_i^{(n)}}{m_n} - \frac{1}{n}|(X_i - \mu)$. The proofs of these results are essentially identical to that of Corollary 2.2 of Csörgő et al. [5], in view of the more general setup in terms of notations in the latter paper.

Theorem 9.1. Let $X, X_1, \ldots$ be real valued i.i.d. random variables with mean $\mu$ and variance $\sigma^2$, where $0 < \sigma^2 < +\infty$.

(a) If $m_n, n \to \infty$ in such a way that $m_n = o(n^2)$, then

$$P_{X|w}\big(G^{(1)}_{m_n,n} \le t\big) \to \Phi(t) \quad \text{in probability-}P_w, \ \text{for all } t \in \mathbb{R}.$$

(b) If $m_n, n \to \infty$ in such a way that $m_n = o(n^2)$ and $n = o(m_n)$, then

$$P_{X|w}\big(G^{(2)}_{m_n,n} \le t\big) \to \Phi(t) \quad \text{in probability-}P_w, \ \text{for all } t \in \mathbb{R}.$$
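Part (a) can be visualized by fixing one realization of the weights and simulating over the data only; the sketch below (our own parameters, with $m_n = \lfloor n^{3/2} \rfloor = o(n^2)$ and normal data) checks the conditional distribution of $G^{(1)}_{m_n,n}$ at $t = 0$, where by symmetry the limit $\Phi(0) = 1/2$:

```python
import math
import numpy as np

rng = np.random.default_rng(5)

n = 40
m_n = int(n ** 1.5)                              # m_n -> infty with m_n = o(n^2)
w = rng.multinomial(m_n, np.full(n, 1.0 / n))    # weights drawn once and held fixed
d = w / m_n - 1.0 / n
mu, reps = 0.0, 4000
vals = []
for _ in range(reps):
    x = rng.standard_normal(n) + mu              # fresh data, same fixed weights
    s_n = x.std()                                # S_n, divisor n
    vals.append(np.sum(np.abs(d) * (x - mu)) / (s_n * math.sqrt(np.sum(d ** 2))))

# conditional empirical CDF at t = 0; should approach Phi(0) = 0.5
p0 = float(np.mean(np.array(vals) <= 0.0))
```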
References

[1] Billingsley, P. (1986). Probability and Measure, Second Edition. John Wiley & Sons.
[2] Baum, L. E. and Katz, M. (1965). Convergence rates in the law of large numbers. Transactions of the American Mathematical Society 120, 108-123.
[3] Bentkus, V., Bloznelis, M. and Götze, F. (1996). A Berry-Esséen bound for Student's statistic in the non-i.i.d. case. Journal of Theoretical Probability 9, 765-796.
[4] Bentkus, V. and Götze, F. (1996). The Berry-Esséen bound for Student's statistic. Annals of Probability 24, 491-503.
[5] Csörgő, M., Martsynyuk, Yu. V. and Nasari, M. M. (2012). Another look at bootstrapping the Student t-statistic. arXiv:1209.4089v4 [math.ST].
[6] Csörgő, M. and Nasari, M. M. (2013). Asymptotics of randomly weighted u- and v-statistics: application to bootstrap. Journal of Multivariate Analysis 121, 176-192.
[7] Csörgő, S. and Rosalsky, A. (2003). A survey of limit laws for bootstrapped sums. International Journal of Mathematics and Mathematical Sciences 45, 2835-2861.
[8] DasGupta, A. (2008). Asymptotic Theory of Statistics and Probability. Springer.
[9] Dvoretzky, A., Kiefer, J. and Wolfowitz, J. (1956). Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. Annals of Mathematical Statistics 27, 642-669.
[10] Hsu, P. L. and Robbins, H. (1947). Complete convergence and the law of large numbers. Proceedings of the National Academy of Sciences 33, 25-31.
[11] Erdős, P. (1949). On a theorem of Hsu and Robbins. Annals of Mathematical Statistics 20, 286-291.
[12] Erdős, P. (1950). Remark on my paper "On a theorem of Hsu and Robbins". Annals of Mathematical Statistics 21, 138.
[13] Giné, E., Götze, F. and Mason, D. M. (1997). When is the Student t-statistic asymptotically standard normal? Annals of Probability 25, 1514-1531.
[14] Massart, P. (1990). The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Annals of Probability 18, 1269-1283.
[15] Pinelis, I. (2012). On the Berry-Esséen bound for the Student statistic. arXiv:1101.3286 [math.ST].
[16] Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.
[17] Shao, Q. M. (2005). An explicit Berry-Esséen bound for Student's t-statistic via Stein's method. In Stein's Method and Applications, Lecture Notes Series, Institute for Mathematical Sciences, National University of Singapore 5, 143-155. Singapore University Press, Singapore.