ASYMPTOTIC EXPANSIONS FOR THE DISTRIBUTION FUNCTION OF THE SAMPLE MEDIAN CONSTRUCTED FROM A SAMPLE WITH RANDOM SIZE

Vladimir E. Bening, Victor Yu. Korolev
Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University; IPI FRC CSC RAS

Alexander I. Zeifman
Vologda State University, IPI FRC CSC RAS; ISEDT RAS

Proceedings 30th European Conference on Modelling and Simulation ©ECMS Thorsten Claus, Frank Herrmann, Michael Manitz, Oliver Rose (Editors) ISBN: 978-0-9932440-2-5 / ISBN: 978-0-9932440-3-2 (CD)

KEYWORDS
Sample median; sample with random size; asymptotic expansion; Student distribution; Cauchy distribution; Laplace distribution.

ABSTRACT
Statistical regularities of the information flows in contemporary communication, computational and other information systems are characterized by the presence of the so-called "heavy tails". The outlying observations make the traditional moment-type location estimators inaccurate. In this case the robust median-type location estimators are preferable. On the other hand, the random character of the intensity of the flow of informative events means that the available sample size (traditionally, the number of observations registered within a certain time interval) is random. The randomness of the sample size crucially changes the asymptotic properties of the estimators. In the paper, asymptotic expansions are obtained for the distribution function of the sample median constructed from a sample with random size. A general theorem on the asymptotic expansion is proved for this case. The cases of the Laplace, Student and Cauchy distributions are considered. Special attention is paid to the situations in which the heavy-tailed distributions (Cauchy, Laplace) are inherent in both the original sample and the asymptotic regularities of the sample median (Student, Laplace) due to the randomness of the sample size. This approach can be successfully used for big data mining and analysis of information flows in high-performance computing.

INTRODUCTION
Statistical regularities of the information flows in contemporary communication, computational and other information systems are characterized by the presence of the so-called "heavy tails". The outlying observations make the traditional moment-type location estimators inaccurate. As is known, in this case the robust median-type estimators are preferable. On
the other hand, the random character of the intensity of the flow of informative events means that the available sample size (traditionally, the number of observations registered within a certain time interval) is random. The randomness of the sample size crucially changes the asymptotic properties of the estimators, see, e. g., [12], [3].

In the paper, asymptotic expansions (a. e.) are obtained for the distribution function (d. f.) of the sample median constructed from a sample with random size. The cases of the Laplace, Student and Cauchy distributions are considered. These results further develop the research presented in [6], [1], [12], [15], [16], [7], [8], [4], [5]. Special attention is paid to the situations in which the heavy-tailed distributions (Cauchy, Laplace) are inherent in both the original sample and the asymptotic regularities of the sample median (Student, Laplace) due to the randomness of the sample size.

We use the following notation: $\mathbb{R}$ and $\mathbb{N}$ are the sets of real and natural numbers, respectively, $\Phi(x)$ and $\varphi(x)$ are the d. f. of the standard normal law and its density. Let $X_1, X_2, \ldots, X_n$ be independent identically distributed random variables with the common d. f. $F(x-\theta)$ and probability density $p(x-\theta)$, where $\theta$ is the unknown location parameter to be estimated from the sample $X_1, X_2, \ldots, X_n$. By $X_{(1)} \le X_{(2)} \le \ldots \le X_{(n)}$ we denote the order statistics constructed from the original observations $X_1, X_2, \ldots, X_n$. Let $M_n$ be the sample median (see, e. g., [10], [18], [9]), that is,

$$M_n = \begin{cases} X_{(m+1)}, & n = 2m+1, \\ \tfrac12\bigl(X_{(m)} + X_{(m+1)}\bigr), & n = 2m, \end{cases} \qquad m \in \mathbb{N}. \tag{1.1}$$

The first-order asymptotic properties of the sample median $M_n$ are well known (see, e. g., the book [18], Theorem 5.3.2 on p. 313, or the book [17], p. 81). Namely, if $F(0) = \tfrac12$ and $p(0) > 0$, then, as $n \to \infty$,

$$\sup_{x\in\mathbb{R}} \Bigl| P_\theta\bigl(\sqrt{n}(M_n-\theta) < x\bigr) - \Phi\bigl(2xp(0)\bigr) \Bigr| \longrightarrow 0, \tag{1.2}$$

$$E_\theta (M_n-\theta)^2 = \bigl(2p(0)\bigr)^{-2} n^{-1} + o(n^{-1}). \tag{1.3}$$
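The leading term in (1.3) can be compared with the exact second moment of the median for a fixed odd sample size. The following is a minimal sketch, not part of the original derivation: it assumes $\theta = 0$ and $n = 2m+1$, and uses the standard fact that in this case $M_n$ is distributed as $Q(U)$ with $Q$ the quantile function of the sample law and $U$ a Beta$(m+1, m+1)$ variable. The function names and the midpoint quadrature are illustrative choices.

```python
import math

def median_second_moment(quantile, m, steps=200_000):
    """E[M_n^2] for n = 2m + 1 and theta = 0, by midpoint quadrature.

    For an odd sample size the median equals quantile(U),
    where U has the Beta(m + 1, m + 1) distribution.
    """
    log_norm = math.lgamma(2 * m + 2) - 2.0 * math.lgamma(m + 1)
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        log_density = log_norm + m * (math.log(u) + math.log1p(-u))
        total += quantile(u) ** 2 * math.exp(log_density)
    return total * h

def laplace_quantile(u):
    # inverse d.f. of the density p(x) = exp(-|x|) / 2, so p(0) = 1/2
    return math.log(2.0 * u) if u < 0.5 else -math.log(2.0 * (1.0 - u))

def cauchy_quantile(u):
    # inverse d.f. of the density p(x) = 1 / (pi (1 + x^2)), so p(0) = 1/pi
    return math.tan(math.pi * (u - 0.5))

def first_order_variance(p0, n):
    # leading term of (1.3): (2 p(0))^(-2) / n
    return (2.0 * p0) ** -2 / n

m = 100
n = 2 * m + 1
laplace_ratio = median_second_moment(laplace_quantile, m) / first_order_variance(0.5, n)
cauchy_ratio = median_second_moment(cauchy_quantile, m) / first_order_variance(1.0 / math.pi, n)
print("exact / first-order, Laplace:", laplace_ratio)
print("exact / first-order, Cauchy: ", cauchy_ratio)
```

Both ratios tend to one as $n$ grows. The Laplace ratio approaches one more slowly because its density has $p'(0+) \ne 0$, which produces a correction of order $n^{-3/2}$ in the second moment, whereas for the Cauchy law the first correction is of order $n^{-2}$.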
The second-order asymptotic properties of the sample median were considered in [9]. Recall the main results of that paper. For this purpose, we first formulate the regularity conditions imposed in [9] on the density $p(x)$.

Condition 1.1. The density $p(x)$ is symmetric around zero, i. e., $p(-x) = p(x)$, $x \in \mathbb{R}$, and $p(0) > 0$.

Condition 1.2. The density $p(x)$ has three continuous bounded derivatives in some neighborhood of zero of the form $(0, \delta)$, $\delta > 0$.

Condition 1.3. There exist constants $C > 0$ and $\alpha > 0$ such that the d. f. $F(x)$ satisfies the inequality

$$1 - F(x) \le C x^{-\alpha}, \qquad x > 0.$$
For example, these regularity conditions are satisfied by the Cauchy distribution with the density

$$p(x) = \bigl[\pi(1+x^2)\bigr]^{-1}, \qquad x \in \mathbb{R}, \tag{1.4}$$

and the Laplace distribution with the density

$$p(x) = \tfrac12 e^{-|x|}, \qquad x \in \mathbb{R}. \tag{1.5}$$
For the Laplace distribution the sample median $M_n$ coincides with the maximum likelihood estimator of the parameter $\theta$ (see, e. g., [9]).

In what follows we will use the notation $k = [n/2]$, $p_0 = p(0) > 0$, $p_1 = p'(0+)$, $p_2 = p''(0+)$, where $[\,\cdot\,]$ denotes the integer part of a number.

Theorem 1.1 [9]. 1. Let the density $p(x)$ satisfy the regularity conditions 1.1 and 1.2. Then

$$P_\theta\bigl(2p_0\sqrt{2k}\,(M_n-\theta) < x\bigr) = \Phi(x) + \varphi(x)\,\frac{p_1 x|x|\sqrt{2}}{8p_0^2\sqrt{k}} + \varphi(x)\,\frac{x}{8k}\Bigl(3 + x^2 + \frac{x^2 p_2}{6p_0^3} - \frac{x^4 p_1^2}{8p_0^4}\Bigr) + O(n^{-3/2})$$

uniformly in $x \in \mathbb{R}$.

2. If the regularity conditions 1.1–1.3 hold, then

$$E_\theta(M_n-\theta)^2 = \frac{1}{8p_0^2 k} - \frac{p_1}{8p_0^4\sqrt{\pi}\,k^{3/2}} - \frac{1}{16p_0^2k^2}\Bigl(3 + \frac{p_2}{4p_0^3} - \frac{15p_1^2}{16p_0^4}\Bigr) + O(n^{-5/2}).$$

Corollary 1.1. 1. For the Laplace distribution (1.5) we have

$$P_\theta\bigl(\sqrt{2k}\,(M_n-\theta) < x\bigr) = \Phi(x) - \varphi(x)\,\frac{x|x|\sqrt{2}}{4\sqrt{k}} + \varphi(x)\,\frac{x(18 + 10x^2 - 3x^4)}{48k} + O(n^{-3/2}),$$

$$E_\theta(M_n-\theta)^2 = \frac{1}{2k} + \frac{1}{\sqrt{\pi}\,k^{3/2}} - \frac{1}{16k^2} + O(n^{-5/2}).$$

2. For the Cauchy distribution (1.4) we have

$$P_\theta\bigl(2\sqrt{2k}\,(M_n-\theta) < \pi x\bigr) = \Phi(x) + \varphi(x)\,\frac{x\bigl[9 + x^2(3-\pi^2)\bigr]}{24k} + O(n^{-3/2}),$$

$$E_\theta(M_n-\theta)^2 = \frac{\pi^2}{8k} + \frac{\pi^2(\pi^2-6)}{32k^2} + O(n^{-5/2}).$$
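It is easy to see that

$$\frac{1}{\sqrt{k}} = \frac{\sqrt{2}}{\sqrt{n}} + O(n^{-3/2}), \qquad \frac{1}{k} = \frac{2}{n} + \frac{1 + (-1)^{n+1}}{n^2} + O(n^{-3}),$$

$$\frac{1}{k^{3/2}} = \frac{2^{3/2}}{n^{3/2}} + O(n^{-5/2}), \qquad \frac{1}{k^2} = \frac{4}{n^2} + O(n^{-3}).$$

Hence, the assertions of Theorem 1.1 and Corollary 1.1 can be rewritten as

$$P_\theta\bigl(2p_0\sqrt{2k}\,(M_n-\theta) < x\bigr) = \Phi(x) + \varphi(x)\,\frac{p_1 x|x|}{4p_0^2\sqrt{n}} + \varphi(x)\,\frac{x}{4n}\Bigl(3 + x^2 + \frac{x^2 p_2}{6p_0^3} - \frac{x^4 p_1^2}{8p_0^4}\Bigr) + O(n^{-3/2}),$$

$$E_\theta(M_n-\theta)^2 = \frac{1}{4p_0^2 n} - \frac{p_1\sqrt{2}}{4p_0^4\sqrt{\pi}\,n^{3/2}} + \frac{1}{4p_0^2n^2}\Bigl(\frac{(-1)^{n+1} - 5}{2} - \frac{p_2}{4p_0^3} + \frac{15p_1^2}{16p_0^4}\Bigr) + O(n^{-5/2}).$$

For the Laplace distribution (1.5) we obtain the asymptotic relations

$$P_\theta\bigl(\sqrt{2k}\,(M_n-\theta) < x\bigr) = \Phi(x) - \varphi(x)\,\frac{x|x|}{2\sqrt{n}} + \varphi(x)\,\frac{x(18 + 10x^2 - 3x^4)}{24n} + O(n^{-3/2}),$$

$$E_\theta(M_n-\theta)^2 = \frac{1}{n} + \frac{2\sqrt{2}}{\sqrt{\pi}\,n^{3/2}} + \frac{1}{2n^2}\Bigl[\frac12 + (-1)^{n+1}\Bigr] + O\Bigl(\frac{1}{n^{5/2}}\Bigr),$$

whereas for the Cauchy distribution (1.4) we have

$$P_\theta\bigl(2\sqrt{2k}\,(M_n-\theta) < \pi x\bigr) = \Phi(x) + \varphi(x)\,\frac{x\bigl[9 + x^2(3-\pi^2)\bigr]}{12n} + O(n^{-3/2}),$$

$$E_\theta(M_n-\theta)^2 = \frac{\pi^2}{4n} + \frac{\pi^2\bigl(\pi^2 + (-1)^{n+1} - 5\bigr)}{8n^2} + O(n^{-5/2}).$$

ASYMPTOTIC EXPANSIONS FOR THE DISTRIBUTION FUNCTION OF THE SAMPLE MEDIAN CONSTRUCTED FROM A SAMPLE WITH RANDOM SIZE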
In classical problems of mathematical statistics, the size of the available sample, i. e., the number of available observations, is traditionally assumed to be deterministic. In the asymptotic settings it plays the role of infinitely increasing known parameter. At the same time, in practice very often the data to be analyzed is collected or registered during a certain period of time and the flow of informative events each of which brings a next observation forms a random point process. Therefore, the number of available observations is unknown till the end of the process of their registration and also must be treated as a (random) observation. For example, this is so in high-frequency financial statistics where the number of events in a limit order book during a time unit essentially depends on
the intensity of order flows [15]. In these cases the number of available observations as well as the observations themselves are unknown beforehand and should be treated as random to avoid underestimation of risks or error probabilities.

Therefore it is quite reasonable to study the asymptotic behavior of general statistics constructed from samples with random sizes in order to construct suitable asymptotic approximations. As this is so, an appropriate non-random centering and normalization of the statistics under consideration must be used to obtain a reasonable approximation to the distribution of the basic statistics. Otherwise the approximate distribution becomes random itself and, for example, the problem of evaluation of quantiles or significance levels becomes senseless.

In asymptotic settings, statistics constructed from samples with random sizes are special cases of random sequences with random indices. The randomness of indices usually makes the limit distributions for the corresponding random sequences heavy-tailed even in the situations where the distributions of non-randomly indexed random sequences are asymptotically normal, see, e. g., [2], [3], [13]. For example, if a statistic which is asymptotically normal in the traditional sense is constructed on the basis of a sample with random size having negative binomial distribution, then instead of the expected normal law, the Student distribution with power-type decreasing heavy tails appears as an asymptotic law for this statistic.

As regards the sample median constructed from a sample with random size, in [12] it was shown that, if the sample size has the geometric distribution, then, instead of the normal law expected in the classical situation (see Theorem 1.1), the actual asymptotic distribution of the sample median is the Student law with two degrees of freedom defined by the d. f.

$$S_2(x) = \frac12\Bigl(1 + \frac{x}{\sqrt{2+x^2}}\Bigr), \qquad x \in \mathbb{R}.$$

This distribution has tails so heavy that the moments of orders $\delta \ge 2$ do not exist.

Consider a problem setting that is traditional for mathematical statistics. Let random variables $N_1, N_2, \ldots, X_1, X_2, \ldots$ be defined on one and the same probability space $(\Omega, \mathcal{A}, P)$. Assume that for each $n \ge 1$ the random variable $N_n$ takes only natural values and is independent of the sequence $X_1, X_2, \ldots$ of independent identically distributed random variables. Let $T_n = T_n(X_1, \ldots, X_n)$ be a statistic, that is, a measurable function of $X_1, \ldots, X_n$. For every $n \ge 1$ define the random variable $T_{N_n}$ as

$$T_{N_n}(\omega) = T_{N_n(\omega)}\bigl(X_1(\omega), \ldots, X_{N_n(\omega)}(\omega)\bigr)$$

for each $\omega \in \Omega$. The random variable $T_{N_n}$ so defined is referred to as a statistic constructed from the sample with random size $N_n$. As this is so, $n \in \mathbb{N}$ is the "infinitely large" parameter required to make the asymptotic settings reasonable. For better understanding, $n$ may be treated as the "mean" or "expected" or "most probable" value of $N_n$.
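The effect of a geometric sample size on the sample median can be illustrated by simulation. The sketch below is an illustration only, under assumptions not fixed by the text: the observations are standard Laplace with $\theta = 0$ (so $2p(0) = 1$), $N_n$ is geometric with parameter $1/n$, and the values of $n$, the number of replications and the seed are arbitrary. It compares the empirical d. f. of $\sqrt{n}\,M_{N_n}$ with $S_2(x)$.

```python
import math
import random

def geometric_size(n, rng):
    """N_n with P(N_n = k) = (1/n) (1 - 1/n)^(k-1), k = 1, 2, ..."""
    u = 1.0 - rng.random()                      # uniform on (0, 1]
    return 1 + int(math.log(u) / math.log(1.0 - 1.0 / n))

def laplace_variate(rng):
    # standard Laplace law with density exp(-|x|) / 2
    e = rng.expovariate(1.0)
    return e if rng.random() < 0.5 else -e

def sample_median(xs):
    xs = sorted(xs)
    k, odd = divmod(len(xs), 2)
    return xs[k] if odd else 0.5 * (xs[k - 1] + xs[k])

def student2_cdf(x):
    # d.f. of the Student law with two degrees of freedom
    return 0.5 * (1.0 + x / math.sqrt(2.0 + x * x))

def simulate_normalized_medians(n, reps, seed=12345):
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        size = geometric_size(n, rng)
        sample = [laplace_variate(rng) for _ in range(size)]
        # non-random normalization sqrt(n); sigma = 2 p(0) = 1, theta = 0
        values.append(math.sqrt(n) * sample_median(sample))
    return values

def empirical_cdf(values, x):
    return sum(v < x for v in values) / len(values)

medians = simulate_normalized_medians(n=200, reps=4000)
for x in (-1.0, 0.5, 1.0, 2.0):
    print(x, empirical_cdf(medians, x), student2_cdf(x))
```

Note that the normalization uses the non-random parameter $n$ rather than the realized sample size, in line with the requirement of non-random centering and normalization discussed above.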
Now we formulate the condition that determines the a. e. for the statistic $T_n$ under the non-random sample size $n$.

Condition 2.1. There exist constants $l \in \mathbb{N}$, $\mu \in \mathbb{R}$, $\sigma > 0$, $\alpha > l/2$, $\gamma > 0$, $C_1 > 0$, a differentiable d. f. $F(x)$ and bounded differentiable functions $f_j(x)$, $j = 1, \ldots, l$, such that

$$\sup_x \Bigl| P\bigl(\sigma n^{\gamma}(T_n-\mu) < x\bigr) - F(x) - \sum_{j=1}^{l} \frac{f_j(x)}{n^{j/2}} \Bigr| \le \frac{C_1}{n^{\alpha}}, \qquad n \in \mathbb{N}.$$

The following condition determines the a. e. for the d. f. of the normalized random sample size $N_n$.

Condition 2.2. There exist constants $m \in \mathbb{N}$, $\beta > m/2$, $C_2 > 0$, a function $0 < g(n) \uparrow \infty$ $(n \to \infty)$, a d. f. $H(x)$, $H(0+) = 0$, and functions $h_i(x)$, $i = 1, \ldots, m$, with bounded variation such that

$$\sup_{x>0} \Bigl| P\Bigl(\frac{N_n}{g(n)} < x\Bigr) - H(x) - \sum_{i=1}^{m} \frac{h_i(x)}{n^{i/2}} \Bigr| \le \frac{C_2}{n^{\beta}}, \qquad n \in \mathbb{N}.$$
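In [7] the following statement was proved.

Theorem 2.1. Let the statistic $T_n = T_n(X_1, \ldots, X_n)$ and the random sample size $N_n$ satisfy Conditions 2.1 and 2.2, respectively. Then there exists a constant $C_3 > 0$ such that

$$\sup_x \Bigl| P\bigl(\sigma g^{\gamma}(n)(T_{N_n}-\mu) < x\bigr) - G_n(x) \Bigr| \le C_1 E N_n^{-\alpha} + \frac{C_3 + C_2 D_n}{n^{\beta}},$$

where

$$D_n = \sup_x \int_{1/g(n)}^{\infty} \Bigl| \frac{\partial}{\partial y}\Bigl( F(xy^{\gamma}) + \sum_{j=1}^{l} \frac{f_j(xy^{\gamma})}{(y g(n))^{j/2}} \Bigr) \Bigr|\, dy$$

and the a. e. $G_n(x)$ is defined by the formula

$$G_n(x) = \int_{1/g(n)}^{\infty} F(xy^{\gamma})\,dH(y) + \sum_{i=1}^{m} \frac{1}{n^{i/2}} \int_{1/g(n)}^{\infty} F(xy^{\gamma})\,dh_i(y) + \sum_{j=1}^{l} \frac{1}{g^{j/2}(n)} \int_{1/g(n)}^{\infty} \frac{f_j(xy^{\gamma})}{y^{j/2}}\,dH(y) + \sum_{j=1}^{l}\sum_{i=1}^{m} \frac{1}{n^{i/2} g^{j/2}(n)} \int_{1/g(n)}^{\infty} \frac{f_j(xy^{\gamma})}{y^{j/2}}\,dh_i(y).$$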
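Taking Theorem 1.1 into account, it is easy to see that the sample median $M_n$ satisfies Condition 2.1 with

$$\gamma = \tfrac12, \quad \alpha = \tfrac32, \quad l = 2, \quad \mu = \theta, \quad \sigma = 2p_0\sqrt{2}, \quad g(n) = k, \tag{2.1}$$

$$F(x) = \Phi(x), \qquad f_1(x) = \varphi(x)\,\frac{p_1 x|x|\sqrt{2}}{4p_0^2}, \qquad f_2(x) = \varphi(x)\,\frac{x}{4}\Bigl(3 + x^2 + \frac{x^2 p_2}{6p_0^3} - \frac{x^4 p_1^2}{8p_0^4}\Bigr). \tag{2.2}$$

In the same way as Lemma 5.1 was proved in [7], it can be shown that there exists a constant $D > 0$ such that

$$D_n \le D, \qquad n \in \mathbb{N}. \tag{2.3}$$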
In [8] a similar theorem was obtained for a non-normalized statistic under the following regularity condition.

Condition 2.3. There exist constants $l \in \mathbb{N}$, $\alpha > l/2$, $C_1 > 0$, a differentiable d. f. $G(x)$ and bounded differentiable functions $g_i(x)$, $i = 1, \ldots, l$, such that

$$\sup_x \Bigl| P(T_n < x) - G(x) - \sum_{i=1}^{l} \frac{g_i(x)}{n^{i/2}} \Bigr| \le \frac{C_1}{n^{\alpha}}, \qquad n \in \mathbb{N}.$$
Theorem 2.2. Let Conditions 2 ([10]) and 2.3 hold. Then

$$\sup_x \bigl| P(T_{N_n} < x) - G_n(x) \bigr| \le C_1 E N_n^{-\alpha} + \frac{2C_2}{n^{\beta}} \sum_{i=1}^{l} \sup_x |g_i(x)|,$$

where the function $G_n(x)$ has the form

$$G_n(x) = G(x) + \sum_{i=1}^{l} \frac{g_i(x)}{v(n)^{i/2}} \int_{1/v(n)}^{\infty} \frac{1}{y^{i/2}}\, d\Bigl[ H(y-u(n)) + \sum_{j=1}^{m} \frac{h_j(y-u(n))}{n^{j/2}} \Bigr] = G(x) + \sum_{i=1}^{l} g_i(x) \int_{1}^{\infty} z^{-i/2}\, d\Bigl[ H(z/v(n)-u(n)) + \sum_{j=1}^{m} n^{-j/2} h_j(z/v(n)-u(n)) \Bigr].$$

THE STUDENT ASYMPTOTIC DISTRIBUTION

In [3] it was shown that if the random sample size $N_n$ has the negative binomial distribution with parameters $p = 1/n$ and $r > 0$, that is,

$$P(N_n = k) = \frac{(k+r-2)\cdot\ldots\cdot r}{(k-1)!}\,\frac{1}{n^r}\Bigl(1 - \frac1n\Bigr)^{k-1}, \qquad k \in \mathbb{N} \tag{3.1}$$

(with $r = 1$ this is the geometric distribution), then for an asymptotically normal statistic $T_n$ we have

$$\lim_{n\to\infty} \sup_{x\in\mathbb{R}} \Bigl| P\bigl(\sigma\sqrt{n}(T_{N_n}-\mu) < x\bigr) - S_{2r}(x\sqrt{r}) \Bigr| = 0$$

(see [3], Corollary 2.1), where $S_f(x)$ is the d. f. of the Student law with the parameter $f = 2r$ corresponding to the density

$$p_f(x) = \frac{\Gamma\bigl((f+1)/2\bigr)}{\sqrt{\pi f}\,\Gamma(f/2)} \Bigl(1 + \frac{x^2}{f}\Bigr)^{-(f+1)/2}, \qquad x \in \mathbb{R},$$

$\Gamma(\,\cdot\,)$ is Euler's gamma-function, and $f > 0$ is the shape parameter (if $f$ is natural-valued, then it is referred to as "the number of degrees of freedom"). In general, this parameter can be arbitrarily small, corresponding to the case of heavy tails. If $f = 2$, that is, $r = 1$, then the d. f. $S_2(x)$ can be expressed explicitly (see the preceding section). With $r = 1/2$ we have the Cauchy distribution.

In the book [4] (see formula (6.112) there on p. 233) the following convergence rate estimate was obtained:

$$\sup_{x>0} \Bigl| P\Bigl(\frac{N_n}{E N_n} < x\Bigr) - H_r(x) \Bigr| \le \begin{cases} C_r n^{-1}, & r \ge 1, \\ C_r n^{-r}, & r \in (0,1), \end{cases} \qquad C_r > 0, \quad n \in \mathbb{N}, \tag{3.2}$$

where $H_r(x)$ is the gamma-distribution function with parameter $r > 0$,

$$H_r(x) = \frac{r^r}{\Gamma(r)} \int_0^x e^{-ry} y^{r-1}\,dy, \qquad x > 0. \tag{3.3}$$

Furthermore,

$$E N_n = r(n-1) + 1. \tag{3.4}$$

Thus, from (3.2) – (3.4) it follows that the random sample size $N_n$ satisfies Condition 2.2 with

$$g(n) = r(n-1) + 1, \quad H(x) = H_r(x), \quad m = 1, \quad h_1(x) \equiv 0, \quad C_2 = C_r > 0, \tag{3.5}$$

$$\beta = \begin{cases} 1, & r \ge 1, \\ r, & r \in (0,1). \end{cases} \tag{3.6}$$

Using the equality

$$(1+x)^{\gamma} = \sum_{k=0}^{\infty} \frac{\gamma(\gamma-1)\cdot\ldots\cdot(\gamma-k+1)}{k!}\,x^k, \qquad |x| < 1, \quad \gamma \in \mathbb{R},$$

it is easy to verify that for $r > 0$, $r \ne 1$, $n \in \mathbb{N}$,

$$E N_n^{-1} = \frac{1 - n^{r-1}}{(n-1)(1-r)n^{r-1}} = O\bigl(n^{-\min(1,\,r)}\bigr). \tag{3.7}$$

For $r = 1$ we have

$$E N_n^{-1} = \frac{1}{n-1}\log n, \qquad n > 1. \tag{3.8}$$

So, taking Theorem 2.1 into account, we have

$$\int_{1/(r(n-1)+1)}^{\infty} \Phi(x\sqrt{y})\,dH_r(y) = \int_0^{\infty} \Phi(x\sqrt{y})\,dH_r(y) + O\Bigl(\frac1n\Bigr) = S_{2r}(x) + O\Bigl(\frac1n\Bigr), \tag{3.9}$$

$$x|x| \int_{1/(r(n-1)+1)}^{\infty} \varphi(x\sqrt{y})\sqrt{y}\,dH_r(y) = x|x| \int_0^{\infty} \varphi(x\sqrt{y})\sqrt{y}\,dH_r(y) + O\Bigl(\frac1n\Bigr) \equiv \frac{x|x|\,r^r\,\Gamma(r+1/2)}{\sqrt{2\pi}\,(r + x^2/2)^{r+1/2}\,\Gamma(r)} + O\Bigl(\frac1n\Bigr). \tag{3.10}$$
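The two integrals over the gamma d. f. $H_r$ from (3.3) that give the limit terms in (3.9) and (3.10), taken over the whole half-line, can be checked numerically. The following sketch is only a numerical sanity check under stated assumptions: it uses a midpoint rule after the substitution $y = t^2$ (an implementation choice, adequate for $r \ge 1/2$), and compares (3.9) with the two cases where the Student d. f. is explicit, namely $S_2$ for $r = 1$ and the Cauchy d. f. for $r = 1/2$. The function names are illustrative.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def gamma_mixture(func, r, steps=40_000):
    """Integral of func(y) dH_r(y) over (0, inf), H_r as in (3.3).

    With y = t^2 the measure dH_r(y) becomes
    2 r^r / Gamma(r) * exp(-r t^2) * t^(2r - 1) dt.
    """
    upper = math.sqrt(60.0 / r)          # exp(-r t^2) <= exp(-60) beyond this
    h = upper / steps
    const = 2.0 * r ** r / math.gamma(r)
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += func(t * t) * math.exp(-r * t * t) * t ** (2.0 * r - 1.0)
    return const * total * h

def mixture_39(x, r):
    # integral of Phi(x sqrt(y)) dH_r(y)
    return gamma_mixture(lambda y: norm_cdf(x * math.sqrt(y)), r)

def mixture_310(x, r):
    # x|x| times the integral of phi(x sqrt(y)) sqrt(y) dH_r(y)
    inner = gamma_mixture(lambda y: norm_pdf(x * math.sqrt(y)) * math.sqrt(y), r)
    return x * abs(x) * inner

def closed_form_310(x, r):
    return (x * abs(x) * r ** r * math.gamma(r + 0.5)
            / (math.sqrt(2.0 * math.pi) * (r + 0.5 * x * x) ** (r + 0.5) * math.gamma(r)))

def student2_cdf(x):
    return 0.5 * (1.0 + x / math.sqrt(2.0 + x * x))

def cauchy_cdf(x):
    return 0.5 + math.atan(x) / math.pi

print(mixture_39(1.0, 1.0), student2_cdf(1.0))
print(mixture_39(-0.7, 0.5), cauchy_cdf(-0.7))
print(mixture_310(1.3, 2.5), closed_form_310(1.3, 2.5))
```

Each printed pair should agree up to the quadrature error, which illustrates that the normal scale mixture with gamma mixing law reproduces the Student d. f. $S_{2r}$.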
Hence, we obtain the following statement.

Theorem 3.1. Let Conditions 1.1 and 1.2 hold and for some $r > \tfrac12$ let the random variable $N_n$ have the negative binomial distribution (3.1). Then, as $n \to \infty$, we have

$$\sup_x \Bigl| P_\theta\bigl(2p_0\sqrt{2m}\,(M_{N_n}-\theta) < x\bigr) - S_{2r}(x) - \frac{p_1\,\Gamma(r+1/2)\,x|x|\,r^r}{2p_0^2\,(r + x^2/2)^{r+1/2}\,\Gamma(r)\,2\sqrt{\pi}\,\sqrt{r(n-1)+1}} \Bigr| = \begin{cases} O(n^{-1}\log n), & r = 1, \\ O(n^{-1}), & r > \tfrac12,\ r \ne 1, \end{cases}$$

where the function $S_{2r}(x)$ was defined in (3.9) and $m = [(r(n-1)+1)/2]$.

Corollary 3.1. Let Conditions 1.1 and 1.2 hold and for some $r > \tfrac12$ let the random variable $N_n$ have the negative binomial distribution (3.1).

1. In the case of the Laplace distribution (1.5), for the d. f. of the sample median $M_{N_n}$ we have the a. e. of the form

$$\sup_x \Bigl| P_\theta\bigl(\sqrt{2m}\,(M_{N_n}-\theta) < x\bigr) - S_{2r}(x) + \frac{\Gamma(r+1/2)\,x|x|\,r^r}{2\Gamma(r)\sqrt{\pi}\,(r + x^2/2)^{r+1/2}\,\sqrt{r(n-1)+1}} \Bigr| = \begin{cases} O(n^{-1}\log n), & r = 1, \\ O(n^{-1}), & r > \tfrac12,\ r \ne 1, \end{cases}$$

where the function $S_{2r}(x)$ was defined in (3.9).

2. In the case of the Cauchy distribution (1.4) we have the a. e.

$$\sup_x \Bigl| P_\theta\bigl(2\sqrt{2m}\,(M_{N_n}-\theta) < \pi x\bigr) - S_{2r}(x) \Bigr| = \begin{cases} O(n^{-1}\log n), & r = 1, \\ O(n^{-1}), & r > \tfrac12,\ r \ne 1. \end{cases}$$

Now define the normalized sample median as

$$\widetilde{M}_n = \begin{cases} \sqrt{n-1}\,X_{(m+1)}, & n = 2m+1, \\ \tfrac12\sqrt{n}\,\bigl(X_{(m)} + X_{(m+1)}\bigr), & n = 2m, \end{cases} \qquad m \in \mathbb{N}. \tag{3.11}$$

Taking into account the formula

$$\int_{1/(r(n-1)+1)}^{\infty} \frac{\sqrt{2}\,p_1 x|x|\,\varphi(x)}{4p_0^2\sqrt{n}}\,\frac{1}{\sqrt{(r(n-1)+1)\,y}}\,dH_r(y) \equiv \frac{\varphi(x)\,p_1 x|x|\,\sqrt{2r}\;\Gamma(r-1/2)}{\Gamma(r)\,4p_0^2\,\sqrt{rn(n-1)+n}} + O\Bigl(\frac1n\Bigr) \tag{3.12}$$

and Theorem 2.2, we obtain the following statement.

Theorem 3.2. Let Conditions 1.1 and 1.2 hold and for some $r > \tfrac12$ let the random variable $N_n$ have the negative binomial distribution (3.1). Then, as $n \to \infty$, for the d. f. of the normalized sample median we have the a. e.

$$\sup_x \Bigl| P_\theta\bigl(2p_0(\widetilde{M}_{N_n} - \sqrt{2m}\,\theta) < x\bigr) - \Phi(x) - \frac{\varphi(x)\,p_1 x|x|\,\sqrt{2r}\;\Gamma(r-1/2)}{\Gamma(r)\,4p_0^2\,\sqrt{rn(n-1)+n}} \Bigr| = \begin{cases} O(n^{-1}\log n), & r = 1, \\ O(n^{-1}), & r > \tfrac12,\ r \ne 1. \end{cases}$$

Corollary 3.2. Let Conditions 1.1 and 1.2 hold and for some $r > \tfrac12$ let the random variable $N_n$ have the negative binomial distribution (3.1).

1. In the case of the Laplace distribution (1.5) we have the following a. e. for the d. f. of the normalized sample median $\widetilde{M}_n$:

$$\sup_x \Bigl| P_\theta\bigl(\widetilde{M}_{N_n} - \sqrt{2m}\,\theta < x\bigr) - \Phi(x) + \frac{\varphi(x)\,x|x|\,\sqrt{r}\;\Gamma(r-1/2)}{\Gamma(r)\,\sqrt{2}\,\sqrt{rn(n-1)+n}} \Bigr| = \begin{cases} O(n^{-1}\log n), & r = 1, \\ O(n^{-1}), & r > \tfrac12,\ r \ne 1. \end{cases}$$

2. In the case of the Cauchy distribution (1.4) we have the following a. e. for the d. f. of the normalized sample median $\widetilde{M}_n$:

$$\sup_x \Bigl| P_\theta\bigl(2(\widetilde{M}_{N_n} - \sqrt{2m}\,\theta) < \pi x\bigr) - \Phi(x) \Bigr| = \begin{cases} O(n^{-1}\log n), & r = 1, \\ O(n^{-1}), & r > \tfrac12,\ r \ne 1. \end{cases}$$
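The closed forms (3.7) and (3.8) for $E N_n^{-1}$ under the negative binomial distribution (3.1) can be verified by direct summation of the series. The sketch below is a check under simple assumptions: the probabilities are generated by the recursion $P(N_n = k+1) = P(N_n = k)\,\frac{k+r-1}{k}\bigl(1 - \frac1n\bigr)$, which follows from (3.1), and the series is truncated at a point where the geometric factor is negligible (the truncation rule is an arbitrary choice).

```python
import math

def inverse_moment_by_summation(n, r, terms=None):
    """E[1 / N_n] for the negative binomial law (3.1), summed term by term."""
    p = 1.0 / n
    q = 1.0 - p
    if terms is None:
        terms = 200 * n + 2000          # q**terms is below exp(-200)
    prob = p ** r                        # P(N_n = 1)
    total = 0.0
    for k in range(1, terms + 1):
        total += prob / k
        prob *= (k + r - 1.0) / k * q    # P(N_n = k + 1) from P(N_n = k)
    return total

def inverse_moment_closed_form(n, r):
    if r == 1:
        return math.log(n) / (n - 1)                                        # (3.8)
    return (1.0 - n ** (r - 1.0)) / ((n - 1) * (1.0 - r) * n ** (r - 1.0))  # (3.7)

for n, r in [(50, 1.0), (50, 2.0), (50, 0.5), (20, 3.5)]:
    print(n, r, inverse_moment_by_summation(n, r), inverse_moment_closed_form(n, r))
```

For $r > 1$ the closed form behaves like $1/((r-1)n)$, for $r = 1$ like $n^{-1}\log n$, and for $r \in (0,1)$ like $n^{-r}/(1-r)$.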
THE LAPLACE ASYMPTOTIC DISTRIBUTION

Let $\theta > 0$. Consider the Laplace distribution with the d. f. $\Lambda_\theta(x)$ and density

$$\lambda_\theta(x) = \frac{1}{\theta\sqrt{2}} \exp\Bigl\{ -\frac{\sqrt{2}\,|x|}{\theta} \Bigr\}, \qquad x \in \mathbb{R}.$$

In [5] the following example was presented of a sequence of random variables $N_n(s)$ depending on the parameter $s \in \mathbb{N}$. Let $Y_1, Y_2, \ldots$ be independent identically distributed random variables with a continuous d. f. For $s \in \mathbb{N}$ define the random variable

$$N(s) = \min\Bigl\{ i \ge 1 : \max_{1\le j\le s} Y_j < \max_{s+1\le k\le s+i} Y_k \Bigr\}.$$

It is well known that the random variables so defined have the so-called discrete Pareto distribution

$$P\bigl(N(s) \ge k\bigr) = \frac{s}{s+k-1}, \qquad k \ge 1 \tag{4.1}$$

(see, e. g., [21] or [20]). Now let $N^{(1)}(s), N^{(2)}(s), \ldots$ be independent identically distributed random variables with distribution (4.1). Define the random variable

$$N_n(s) = \max_{1\le j\le n} N^{(j)}(s).$$

Then, as it was shown in [5],

$$\lim_{n\to\infty} \sup_{x>0} \bigl| P\bigl(N_n(s) < nx\bigr) - e^{-s/x} \bigr| = 0, \tag{4.2}$$

and for an asymptotically normal statistic $T_n$ we have the relation

$$\lim_{n\to\infty} \sup_{x\in\mathbb{R}} \bigl| P\bigl(\sigma\sqrt{n}(T_{N_n(s)}-\mu) < x\bigr) - \Lambda_{1/s}(x) \bigr| = 0,$$

where $\Lambda_{1/s}(x)$ is the Laplace d. f. with $\theta = 1/s$.

In [19] the following estimate of the rate of convergence in (4.2) was obtained: there exists a constant $C_s \in (0, \infty)$ such that

$$\sup_{x>0} \bigl| P\bigl(N_n(s) < nx\bigr) - e^{-s/x} \bigr| \le C_s n^{-1}, \qquad C_s > 0, \quad n \in \mathbb{N}. \tag{4.3}$$

So, from (4.3) it follows that the random variable $N_n(s)$ satisfies Condition 2.2 with

$$g(n) = n, \quad H(x) = e^{-s/x}, \quad m = 1, \tag{4.4}$$

$$h_1(x) \equiv 0, \quad C_2 = C_s > 0, \quad \beta = 1. \tag{4.5}$$

Consider $E N_n^{-1}(s)$ in more detail. From the definition of $N_n(s)$ and (4.1) we obtain

$$P\bigl(N_n(s) = k\bigr) = \Bigl(\frac{k}{s+k}\Bigr)^n - \Bigl(\frac{k-1}{s+k-1}\Bigr)^n = sn \int_{k-1}^{k} \frac{x^{n-1}}{(s+x)^{n+1}}\,dx.$$

Therefore

$$E N_n^{-1}(s) = \sum_{k=1}^{\infty} \frac{P\bigl(N_n(s) = k\bigr)}{k} = sn \sum_{k=1}^{\infty} \frac1k \int_{k-1}^{k} \frac{x^{n-1}}{(s+x)^{n+1}}\,dx \le sn \sum_{k=1}^{\infty} \int_{k-1}^{k} \frac{x^{n-2}}{(s+x)^{n+1}}\,dx = sn \int_0^{\infty} \frac{x^{n-2}}{(s+x)^{n+1}}\,dx.$$

To calculate the last integral, use the relation

$$\int_0^{\infty} \frac{x^{s-1}}{(a+bx)^{s+n}}\,dx = \frac{\Gamma(s)\Gamma(n)}{a^n b^s\,\Gamma(s+n)}, \qquad a, b, s, n > 0,$$

see [11], formula 856.12. We obtain

$$E N_n^{-1}(s) \le sn\,\frac{\Gamma(n-1)\Gamma(2)}{s^2\,\Gamma(n+1)} = \frac{1}{s(n-1)} = O(n^{-1}).$$

So, taking into account Theorem 1.1 and the formulas

$$\int_{n^{-1}}^{\infty} \Phi(x\sqrt{y})\,d_y e^{-s/y} = \int_0^{\infty} \Phi(x\sqrt{y})\,d_y e^{-s/y} + O\Bigl(\frac1n\Bigr) = \Lambda_{1/s}(x) + O\Bigl(\frac1n\Bigr), \tag{4.5}$$

$$x|x| \int_{n^{-1}}^{\infty} \varphi(x\sqrt{y})\sqrt{y}\,d_y e^{-s/y} = x|x| \int_0^{\infty} \varphi(x\sqrt{y})\sqrt{y}\,d_y e^{-s/y} + O\Bigl(\frac1n\Bigr) \equiv l_s(x) + O\Bigl(\frac1n\Bigr), \tag{4.6}$$

we directly obtain the following theorem.

Theorem 4.1. Let Conditions 1.1 and 1.2 hold. Assume that for some $s \in \mathbb{N}$ the random variable $N_n(s)$ has the distribution

$$P\bigl(N_n(s) = k\bigr) = \Bigl(\frac{k}{s+k}\Bigr)^n - \Bigl(\frac{k-1}{s+k-1}\Bigr)^n, \qquad k \in \mathbb{N}. \tag{4.7}$$

Then, as $n \to \infty$, for the d. f. of the sample median $M_{N_n(s)}$ we have the a. e.

$$P_\theta\bigl(2p_0\sqrt{2k}\,(M_{N_n(s)}-\theta) < x\bigr) - \Lambda_{1/s}(x) - \frac{p_1\,l_s(x)}{2p_0^2\sqrt{2n}} = O\Bigl(\frac1n\Bigr)$$

uniformly in $x \in \mathbb{R}$, where the functions $\Lambda_{1/s}(x)$ and $l_s(x)$ were defined in (4.5) and (4.6), respectively.

Corollary 4.1. Let Conditions 1.1 and 1.2 hold. Assume that for some $s \in \mathbb{N}$ the random variable $N_n(s)$ has the distribution (4.7).

1. As $n \to \infty$, for the Laplace distribution (1.5) we have

$$\sup_x \Bigl| P_\theta\bigl(\sqrt{2k}\,(M_{N_n(s)}-\theta) < x\bigr) - \Lambda_{1/s}(x) + \frac{l_s(x)}{\sqrt{2n}} \Bigr| = O\Bigl(\frac1n\Bigr),$$

where the functions $\Lambda_{1/s}(x)$ and $l_s(x)$ were defined in (4.5) and (4.6), respectively.

2. As $n \to \infty$, for the Cauchy distribution (1.4) we have

$$\sup_x \Bigl| P_\theta\bigl(2\sqrt{2k}\,(M_{N_n(s)}-\theta) < \pi x\bigr) - \Lambda_{1/s}(x) \Bigr| = O(n^{-1}).$$

For the normalized sample median $\widetilde{M}_n$ (see (3.11)), using the formula

$$\int_{n^{-1}}^{\infty} \frac{x|x|}{\sqrt{y}}\,d_y e^{-s/y} \equiv l_s(x) + O\Bigl(\frac1n\Bigr),$$

we obtain the following theorem.

Theorem 4.2. Let Conditions 1.1 and 1.2 hold. Assume that for some $s \in \mathbb{N}$ the random variable $N_n(s)$ has the distribution (4.7). Then, as $n \to \infty$, for the d. f. of the normalized sample median $\widetilde{M}_{N_n(s)}$ we have the a. e.

$$P_\theta\bigl(2p_0(\widetilde{M}_{N_n(s)} - \sqrt{2m}\,\theta) < x\bigr) - \Phi(x) - \frac{\sqrt{2}\,p_1\,l_s(x)}{4p_0^2\sqrt{n}} = O\Bigl(\frac1n\Bigr)$$

uniformly in $x \in \mathbb{R}$, where the function $l_s(x)$ was defined in (4.6).

Corollary 4.2. Let Conditions 1.1 and 1.2 hold. Assume that for some $s \in \mathbb{N}$ the random variable $N_n(s)$ has the distribution (4.7).

1. For the Laplace distribution (1.5) we have

$$\sup_x \Bigl| P_\theta\bigl(\widetilde{M}_{N_n(s)} - \sqrt{2m}\,\theta < x\bigr) - \Phi(x) + \frac{l_s(x)}{\sqrt{2n}} \Bigr| = O\Bigl(\frac1n\Bigr).$$

2. For the Cauchy distribution (1.4) we have

$$\sup_x \Bigl| P_\theta\bigl(2(\widetilde{M}_{N_n(s)} - \sqrt{2m}\,\theta) < \pi x\bigr) - \Phi(x) \Bigr| = O(n^{-1}).$$
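The bound $E N_n^{-1}(s) \le 1/(s(n-1))$ obtained in this section from the distribution (4.7) can be checked numerically. The sketch below sums the series directly. The truncation point is an assumption of the example, not part of the paper: since $P(N_n(s) > K)$ is roughly $ns/K$ for large $K$, the neglected tail of the series is of order $ns/K^2$, and truncation can only decrease the computed sum.

```python
def pmf(k, n, s):
    """P(N_n(s) = k) from (4.7)."""
    return (k / (s + k)) ** n - ((k - 1) / (s + k - 1)) ** n

def inverse_moment(n, s, terms=200_000):
    """Truncated series for E[1 / N_n(s)]."""
    return sum(pmf(k, n, s) / k for k in range(1, terms + 1))

def bound(n, s):
    # right-hand side of the estimate derived in the text
    return 1.0 / (s * (n - 1))

for n, s in [(5, 1), (20, 2), (50, 3)]:
    print(n, s, inverse_moment(n, s), bound(n, s))
```

For moderate $n$ the computed moment is already close to the bound, which is consistent with replacing $1/k$ by $1/x$ on each interval $[k-1, k]$ being a mild overestimate when $N_n(s)$ is typically large.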
CONCLUSION

In the paper a general transfer theorem was presented for the asymptotic expansions of the distribution of the sample median constructed from a sample with random size. This theorem gives an algorithm for the construction of these asymptotic expansions from the given asymptotic expansion for the distribution of the sample median in a sample with a non-random size and the given asymptotic expansion for the distribution of the random sample size. The bounds for the corresponding residuals were also presented in terms of O- and o-symbols. As examples of the application of the general theorem, two special cases were considered where the asymptotic distributions of the sample median in a sample with random size are normal scale mixtures such as the Laplace and Student laws. Moreover, the examples related to samples from the Cauchy, Laplace and Student laws were considered as well. This approach can be successfully used for big data mining and analysis of information flows in high-performance computing.

ACKNOWLEDGEMENTS

Research supported by the Russian Foundation for Basic Research (project 15-07-02652).

REFERENCES
[1] V. E. Bening. Asymptotic Theory of Testing Statistical Hypotheses: Efficient Statistics, Optimality, Power Loss, and Deficiency. – Utrecht: VSP, 2000.
[2] V. E. Bening, V. Yu. Korolev. Generalized Poisson Models and their Applications in Insurance and Finance. – Utrecht: VSP, 2002.
[3] V. E. Bening, V. Yu. Korolev. On an application of the Student distribution in the theory of probability and mathematical statistics // Theory of Probability and its Applications, 2005. Vol. 49. No. 3. P. 377–391.
[4] V. E. Bening, V. Yu. Korolev, I. A. Sokolov, S. Ya. Shorgin. Randomized Models and Methods of the Theory of Reliability of Information and Technical Systems. – Moscow: Torus Press, 2007.
[5] V. E. Bening, V. Yu. Korolev. Some statistical problems related to the Laplace distribution // Informatics and Its Applications, 2008. Vol. 2. No. 2. P. 19–34.
[6] V. E. Bening. On the deficiency of some estimators based on samples with random sizes // Bulletin of Tver State University, Series "Applied Mathematics", 2015. No. 1. P. 5–14.
[7] V. E. Bening, N. K. Galieva, V. Yu. Korolev. Asymptotic expansions for the distribution functions of statistics constructed from samples with random sizes // Informatics and Its Applications, 2013. Vol. 7. No. 2. P. 75–91.
[8] V. E. Bening, V. A. Savushkin. On approximations to the distributions of statistics constructed from samples with random sizes // Bulletin of Tver State University, Series "Applied Mathematics", 2014. No. 1. P. 91–112.
[9] M. V. Burnashev. Asymptotic expansions for the median estimator of the parameter // Theory Probab. Appl., 1996. Vol. 41. No. 4. P. 738–753.
[10] H. Cramer. Mathematical Methods of Statistics. – Princeton: Princeton University Press, 1946.
[11] H. B. Dwight. Tables of Integrals and Other Mathematical Data, 4th ed. – New York: Macmillan, 1961.
[12] B. V. Gnedenko. On estimation of unknown parameters from a random number of independent observations // Transactions of Razmadze Tbilisi Mathematical Institute, 1989. Vol. 92. P. 146–150 (in Russian).
[13] B. V. Gnedenko, V. Yu. Korolev. Random Summation: Limit Theorems and Applications. – Boca Raton: CRC Press, 1996.
[14] J. L. Hodges, E. L. Lehmann. Deficiency // Ann. Math. Statist., 1970. Vol. 41. No. 5. P. 783–801.
[15] V. Yu. Korolev, A. V. Chertok, A. Yu. Korchagin, A. I. Zeifman. Modeling high-frequency order flow imbalance by functional limit theorems for two-sided risk processes // Applied Mathematics and Computation, 2015. Vol. 253. P. 224–241.
[16] V. Yu. Korolev, A. Yu. Korchagin, A. I. Zeifman. Convergence of nonhomogeneous random walks generated by compound Cox processes to generalized variance-gamma Levy processes // Doklady Mathematics, 2015. Vol. 92. No. 1. P. 408–411.
[17] E. L. Lehmann. Elements of Large-Sample Theory. – Berlin–New York: Springer, 1999.
[18] E. L. Lehmann, G. Casella. Theory of Point Estimation. 2nd Edition. – Berlin–New York: Springer, 2003.
[19] O. O. Lyamin. On the rate of convergence of the distributions of some statistics to the Laplace and Student distributions // Bulletin of Moscow State University, Series 15 Computational Mathematics and Cybernetics, 2011. No. 1. P. 39–47.
[20] V. B. Nevzorov. Records. Mathematical Theory. – Moscow: Fazis, 2000.
[21] S. S. Wilks. Recurrence of extreme observations // Journal of American Mathematical Society, 1959. Vol. 1, No. 1, P. 106–112.
AUTHOR BIOGRAPHIES VLADIMIR BENING is Doctor of Science in physics and mathematics; professor, Department of Mathematical Statistics, Faculty of Computational Mathematics and Cybernetics, M. V. Lomonosov Moscow State University; senior scientist, Institute of Informatics Problems, Federal Research Center ”Computer Science and Control” of Russian Academy of Sciences. His email is
[email protected]. VICTOR KOROLEV is Doctor of Science in physics and mathematics, professor, Head of Department of Mathematical Statistics, Faculty of Computational Mathematics and Cybernetics, M.V. Lomonosov Moscow State University; leading scientist, Institute of Informatics Problems, Federal Research Center ”Computer Science and Control” of the Russian Academy of Sciences. His email is
[email protected]. ALEXANDER ZEIFMAN is Doctor of Science in physics and mathematics; professor, Head of Department of Applied Mathematics, Vologda State University; senior scientist, Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences; principal scientist, Institute of Socio-Economic Development of Territories, Russian Academy of Sciences. His email is
[email protected].