Asymptotic Intermediate Efficiency of the Chi-square and Likelihood Ratio Goodness of Fit Tests
Sherzod M. Mirakhmedov
Institute of Mathematics, National University of the Republic of Uzbekistan, 100125, Tashkent, Durmon yuli st., 29
e-mail:
[email protected]
Abstract. The asymptotic relative intermediate efficiency of the chi-square and log-likelihood ratio goodness-of-fit tests is studied. The tests are also compared through the asymptotic behavior of their type I error probabilities, subject to the tests having the same fixed asymptotic power. It is assumed that the number of groups increases together with the number of observations.
Keywords. Asymptotic efficiency, asymptotic slope, chi-square statistic, log-likelihood ratio statistic, multinomial distribution.
MSC (2000): 60F10, 62G10, 62G20
1. Introduction
Goodness-of-fit testing from grouped data is a classical problem in statistical inference. The original problem is to test whether a sample has come from a given distribution. It is transformed into a goodness-of-fit problem for a multinomial distribution by grouping the data: the support of the given distribution is divided into $N$ intervals, and one observes the number of observations, say $\eta_k$, falling into the $k$th interval. Test statistics in this approach are constructed by comparing $\eta_k$ with its expected value $np_k$, $k=1,\dots,N$, where $n$ denotes the sample size. The most notable test statistics of this kind are Pearson's chi-square statistic
$$\chi_N^2 = \sum_{m=1}^{N}\frac{(\eta_m-np_m)^2}{np_m}$$
and the log-likelihood ratio (LR) statistic
$$\Lambda_N = 2\sum_{m=1}^{N}\eta_m\ln\frac{\eta_m}{np_m},$$
where $p_m$, $m=1,\dots,N$, are the cell probabilities under the hypothesis being tested. In the present paper we study the asymptotic properties of these statistics in goodness-of-fit testing when $N=N(n)\to\infty$ as $n\to\infty$. This situation is of interest: for instance, Mann and Wald (1942) obtained the relation $N = Cn^{2/5}$, where $C$ depends on the test size, for the optimal choice of the number of groups in the chi-square goodness-of-fit test. We should note, however, that there are other, radically different recommendations for the choice of the number of intervals $N$; see, for instance, Gvanceladze and Chibisov (1979).
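As a small illustration of the two statistics introduced above, the following Python sketch evaluates Pearson's chi-square and the LR statistic for one vector of cell counts. The function names and the sample counts are illustrative assumptions, not part of the paper; empty cells are given zero contribution to the LR sum, using the convention 0 ln 0 = 0.

```python
import math


def pearson_chi_square(counts, probs):
    """Sum over cells of (count - n*p)^2 / (n*p), with n the total count."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def log_likelihood_ratio(counts, probs):
    """Twice the sum over cells of count * ln(count / (n*p)).

    Empty cells are skipped (convention 0 * ln 0 = 0).
    """
    n = sum(counts)
    return 2.0 * sum(
        c * math.log(c / (n * p)) for c, p in zip(counts, probs) if c > 0
    )


# Hypothetical data: n = 10 observations grouped into N = 3 equiprobable cells.
counts = [3, 5, 2]
probs = [1.0 / 3.0] * 3

chi2 = pearson_chi_square(counts, probs)
lr = log_likelihood_ratio(counts, probs)
print("chi-square:", chi2)
print("LR:", lr)
```

For counts close to their expected values the two statistics are close to each other, which is in line with their common Pitman efficiency discussed in the text; they differ more when some cells deviate strongly.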
Specifically, we consider the problem of testing the goodness of fit of an absolutely continuous distribution $F$ to a set of $n$ observations grouped into $N$ equal-probability intervals. Through the probability integral transformation the original problem can be reduced to testing the null hypothesis of uniformity, i.e. $H_0: F'(x) = f(x) = 1$, $0\le x\le 1$. We shall test $H_0$ against a family of sequences of alternatives
$$H_{1n}: f_n(x) = 1+\delta(n)\,l_n(x),$$
where
$$\int_0^1 l_n(x)\,dx = 0,\qquad \|l_n\|_2 = 1,\qquad \sup_n\|l_n\|_\infty < \infty, \qquad (1.1)$$
and $\delta(n)\to 0$ as $n\to\infty$.
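To make the alternatives (1.1) concrete, the sketch below computes the cell probabilities of N equal-width intervals on [0, 1] under a density of the form 1 + delta * l(x). The direction l(x) = sqrt(12) (x - 1/2), which integrates to zero and has unit L2 norm, and the values N = 4, delta = 0.1 are hypothetical choices made only for illustration.

```python
import math


def cell_probabilities(num_cells, delta):
    """Cell probabilities of the density 1 + delta * l(x) on [0, 1].

    Assumed direction (illustration only): l(x) = sqrt(12) * (x - 1/2),
    so that the integral of l is 0 and the integral of l^2 is 1.
    """
    def antiderivative(x):
        # Antiderivative of l(x), up to an additive constant.
        return math.sqrt(12.0) * (x - 0.5) ** 2 / 2.0

    return [
        1.0 / num_cells
        + delta * (antiderivative(m / num_cells) - antiderivative((m - 1) / num_cells))
        for m in range(1, num_cells + 1)
    ]


probs = cell_probabilities(num_cells=4, delta=0.1)
print(probs)
print("sum:", sum(probs))
```

Under the hypothesis (delta = 0) every cell has probability 1/N; a small positive delta tilts mass from the left cells to the right ones while keeping the total equal to one.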
The chi-square and LR statistics are special cases of the symmetric statistics $S_N^h = h(\eta_1)+\dots+h(\eta_N)$, where $h$ is a function defined on the non-negative axis and $\eta_1,\dots,\eta_N$ are the numbers of observations in the intervals. A test based on the statistic $S_N^h$ is called an h-test for brevity. The asymptotic properties of h-tests depend essentially on the behavior of $\delta(n)$, hence the alternatives (1.1) need some classification. Due to Holst (1972), Ivchenko and Medvedev (1978) and Mirakhmedov (1987), the h-tests have no power against the alternatives (1.1) with $\delta(n) = o((n\lambda_n)^{-1/4})$, where $\lambda_n = n/N$, and for the alternatives (1.1) with $\delta(n) = (n\lambda_n)^{-1/4}$ the chi-square test is asymptotically most powerful within the class of h-tests (satisfying a Lyapunov-type condition). The rate $\delta(n) = (n\lambda_n)^{-1/4}$ keeps the asymptotic power of the h-tests bounded away from the level and from 1, and hence (1.1) in this case forms a family of Pitman alternatives. Another family of "extreme" alternatives arises when $\delta(n)$ remains fixed, i.e. the alternatives do not approach the hypothesis; this case arises in the Bahadur and Hodges-Lehmann settings. Quine and Robinson (1985) proved that when $\lambda_n\to\infty$ the chi-square and LR tests have the same Pitman asymptotic efficiency (AE), but the chi-square test is inferior to the LR test in terms of the Bahadur AE. Further, the sequences of alternatives (1.1) with $\delta(n)\to 0$ and $\delta(n)(n\lambda_n)^{1/4}\to\infty$ provide yet another family, that of intermediate alternatives; we shall denote it by $\mathcal{K}_{alt}$. To the best of our knowledge there is only the paper by Ivchenko and Mirakhmedov (1995), who considered the intermediate properties of h-tests in terms of asymptotic $\alpha$-slopes (see Section 2) in testing the hypothesis of uniformity of the probabilities of a multinomial distribution. In the case $\lambda_n\to\lambda\in(0,\infty)$ they proved that the chi-square test is optimal in terms of the $\alpha$-slope within the class of h-tests for the subfamily of $\mathcal{K}_{alt}$ such that $\delta(n) = O\big(((n\lambda_n)^{-1}\log n)^{1/4}\big)$, but for the subfamily of $\mathcal{K}_{alt}$ satisfying the condition $\delta(n)\,n^{1/6}\log^{-1/3}N\to\infty$ the chi-square test is inferior to tests satisfying the Cramér condition; see Appendix, condition (A.1). Note that the $\Lambda_N$ statistic satisfies the Cramér condition, whereas the $\chi_N^2$ statistic does not.
Thus the existing results do not cover the properties of the chi-square and LR tests for the whole family $\mathcal{K}_{alt}$.
Further, for the Pitman family of alternatives (those closest to the hypothesis that are still distinguishable by h-tests) the chi-square test is more efficient than the LR test when $\lambda_n$ is bounded away from zero and infinity, and equally efficient if $\lambda_n\to\infty$, whereas it loses its leadership for the family of fixed alternatives. Hence a very interesting problem is to find the maximum "distance" between the alternatives and the hypothesis at which the chi-square test still retains its leadership. In this work we address these problems both in terms of $\alpha$-slopes and in terms of the traditional definition of asymptotic relative efficiency (ARE), viz. a limiting value of the ratio of sample sizes; for the definitions see Section 2 and the Appendix. Let $\mathcal{K}_\gamma$ stand for the subfamily of $\mathcal{K}_{alt}$ such that $\delta(n) = (n\lambda_n^2)^{-\gamma}$, $\gamma > 0$. In particular, we show, in terms of $\alpha$-slopes as well as in terms of the asymptotic intermediate efficiency due to Inglot (1999), that for the family $\mathcal{K}_\gamma$ with $\gamma > 1/8$ the chi-square and LR tests are asymptotically equally efficient, but for the family $\mathcal{K}_\gamma$ with $0 < \gamma\le 1/8$ the $\chi^2$ test is inferior to the LR test. These results complement those of Quine and Robinson (1985), Mirakhmedov (1987) and Ivchenko and Mirakhmedov (1995).
The rest of the paper is organized as follows. The main results are presented in Section 2; the proofs are given in Section 3; for the reader's convenience, the auxiliary assertions are collected in the Appendix. In what follows $C_k$ is a universal positive constant; all asymptotic statements are considered as $n\to\infty$; $\xi\sim F$ stands for "the r.v. $\xi$ has the distribution $F$"; $a_n\ll b_n$ stands for $a_n = o(b_n)$.
2. The results
We still use the notation of Section 1. It turns out that for the chi-square test we should consider the following subfamilies of the family of intermediate alternatives $\mathcal{K}_{alt}$:
$\mathcal{K}_o$, the subfamily such that $\delta(n) = o\big((n\max(1,\lambda_n^2))^{-1/6}\big)$,
$\mathcal{K}_\gamma$, the subfamily such that $\delta(n) = (n\lambda_n^2)^{-\gamma}$, $1/8 < \gamma\le 1/6$,
$\mathcal{K}_{1/8}$, the subfamily such that $\delta(n) = (n\lambda_n^2)^{-1/8}$.
Before moving on to the specific AE results for the chi-square and LR tests we need some notation for general h-tests. Let $\alpha_n$ and $\beta_n$ stand for the size and power, respectively, of the h-test, and let $P_i$, $E_i$, $Var_i$ denote the probability, expectation and variance computed under $H_i$, $i=0,1$; here and everywhere in the sequel $H_1$ means the family of alternatives under consideration. We suppose that large values of $S_N^h$ reject the hypothesis and that $E_1S_N^h > E_0S_N^h$. Everywhere in what follows $h$ is not a linear function.
Assuming that $\beta_n\to\beta\in(0,1)$ we shall measure the performance of the test by the asymptotic value of the $\alpha$-slope, viz.,
$$e_n^\alpha(S_N^h) = -\log P_0\{S_N^h\ge E_1S_N^h\}. \qquad (2.1)$$
Similarly, when $\alpha_n\to\alpha\in(0,1)$ we measure the performance of the test by the asymptotic value of the $\beta$-slope, viz.,
$$e_n^\beta(S_N^h) = -\log P_1\{S_N^h\le E_0S_N^h\}. \qquad (2.2)$$
We shall focus in the sequel on the $\alpha$-slope of h-tests. For a given family of alternatives the asymptotic relative efficiency (ARE) of one test with respect to another, subject to their having the same asymptotic power bounded away from zero and 1, is defined as the ratio of their asymptotic $\alpha$-slopes. This corresponds to comparing two tests through the asymptotic behavior of their type I error probabilities when they have the same fixed asymptotic power; in our opinion such a concept is reasonable and corresponds to the spirit of the ARE of two tests.
Theorem 2.1. The following hold:
a) in the family $\mathcal{K}_o$ for arbitrary $\lambda_n$, and
b) for each $\gamma\in(1/8,1/6]$ in the family $\mathcal{K}_\gamma$, if
$$n^{(1-6\gamma)/(1-4\gamma)} \ll N \ll n^{3(1-4\gamma)/4(1-2\gamma)}, \qquad (2.3)$$
one has
$$\frac{e_n^\alpha(\chi_N^2)}{n\lambda_n\delta^4(n)} = \frac14(1+o(1)); \qquad (2.4)$$
c) in the family $\mathcal{K}_{1/8}$ one has
$$\frac{e_n^\alpha(\chi_N^2)}{n\lambda_n\delta^4(n)} = o(1).$$
Remark 2.1. Note that the case $\gamma > 1/6$ is covered by part a). For $\gamma > 1/8$ the condition (2.3) implies $N = o(\sqrt n)$. In particular, if $\gamma = 1/6$ then $N = o(n^{3/8})$, and if $\gamma = 1/8+\varepsilon$, $\varepsilon\in(0,1/24]$, then condition (2.3) implies $n^{(1-24\varepsilon)/2(1-8\varepsilon)} \ll N \ll n^{(3-24\varepsilon)/2(3-8\varepsilon)}$. When $\gamma$ varies in the interval $(1/8,1/6]$ the strips (2.3) cover the interval $(1, o(\sqrt n))$.
Theorem 2.2. Let $\lambda_n\to\lambda\in(0,\infty]$. For the subfamily of $\mathcal{K}_{alt}$ such that $\delta(n) = o(\lambda_n^{-1/2})$ one has
$$\frac{e_n^\alpha(\Lambda_N)}{n\lambda_n\delta^4(n)} = \frac14\rho^2(\Lambda_N,\lambda_n)(1+o(1)),$$
where $\rho(\Lambda_N,\lambda_n) = corr\big(\xi\log\xi-\tau\xi,\ \xi^2-(2\lambda_n+1)\xi\big)$, $\tau = \lambda_n^{-1}cov(\xi\log\xi,\xi)$, $\xi\sim Poi(\lambda_n)$. Note that $\rho(\Lambda_N,\lambda_n) = 1-(6\lambda_n)^{-1}(1+o(1))$ if $\lambda_n\to\infty$.
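The exponents in condition (2.3) and the special cases quoted in Remark 2.1 can be checked with exact rational arithmetic. The sketch below assumes that the strip (2.3) reads n^((1-6g)/(1-4g)) << N << n^(3(1-4g)/(4(1-2g))) for g in (1/8, 1/6]; the function name `strip_exponents` is a hypothetical helper introduced only for this illustration.

```python
from fractions import Fraction


def strip_exponents(gamma):
    """Return (lower, upper) exponents of n in the strip (2.3).

    Assumed form of (2.3): n**lower << N << n**upper with
    lower = (1 - 6*gamma) / (1 - 4*gamma),
    upper = 3*(1 - 4*gamma) / (4*(1 - 2*gamma)).
    """
    gamma = Fraction(gamma)
    if not (Fraction(1, 8) < gamma <= Fraction(1, 6)):
        raise ValueError("gamma must lie in (1/8, 1/6]")
    lower = (1 - 6 * gamma) / (1 - 4 * gamma)
    upper = 3 * (1 - 4 * gamma) / (4 * (1 - 2 * gamma))
    return lower, upper


# gamma = 1/6 and gamma = 1/8 + eps with eps = 1/48 (illustrative values).
for g in (Fraction(1, 6), Fraction(1, 8) + Fraction(1, 48)):
    low, up = strip_exponents(g)
    print(f"gamma = {g}: n^({low}) << N << n^({up})")
```

For gamma = 1/6 the upper exponent equals 3/8, matching N = o(n^(3/8)) in Remark 2.1, and both exponents approach 1/2 as gamma decreases to 1/8.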
Corollary 2.1. a) In the family $\mathcal{K}_o$ for arbitrary $\lambda_n$ one has
$$ARE(\chi_N^2,\Lambda_N) = \rho^{-2}(\Lambda_N,\lambda_n)(1+o(1));$$
b) for each $\gamma\in(1/8,1/6]$ in the family $\mathcal{K}_\gamma$, if the condition (2.3) is fulfilled then
$$ARE(\chi_N^2,\Lambda_N) = 1+o(1);$$
c) in the family $\mathcal{K}_{1/8}$, if $\delta(n) = o(\lambda_n^{-1/2})$ then
$$ARE(\chi_N^2,\Lambda_N) = o(1).$$
Theorem 2.1 allows one to obtain the asymptotic intermediate efficiency (AIE) due to Inglot (1999), see Definition A3 in the Appendix, of the $\chi_N^2$ test w.r.t. the $\Lambda_N$ test.
Theorem 2.3. The following assertions hold:
a) for the family $\mathcal{K}_o$ and arbitrary $\lambda_n$ one has $AIE(\chi_N^2,\Lambda_N) = \rho^{-2}(\Lambda_N,\lambda_n)(1+o(1))$;
b) for the family $\mathcal{K}_\gamma$, where $\gamma\in(1/8,1/6]$ and $N$ satisfies (2.3), one has $AIE(\chi_N^2,\Lambda_N) = 1$;
c) for the family $\mathcal{K}_{1/8}$, if $\lambda_n\to\infty$ and $\delta(n) = o(\lambda_n^{-1/2})$ then $AIE(\chi_N^2,\Lambda_N) = 0$.
Theorem 2.4. Let $\lambda_n\to\lambda\in(0,\infty)$. Then in the family $\mathcal{K}_o$, for an arbitrary h-test satisfying the Cramér condition (A.1), in particular for the LR test, one has $AIE(\chi_N^2,S_N^h) = \rho^{-2}(S_N^h,\lambda)(1+o(1))\ge 1$.
Remark 2.2. Ivchenko and Mirakhmedov (1995) considered h-tests in the problem of goodness of fit for a multinomial distribution $M(n,p_1,\dots,p_N)$, namely testing the hypothesis $H_0: p_m = 1/N$, $m=1,\dots,N$, against sequences of alternatives $H_1: (p_1,\dots,p_N)\ne(1/N,\dots,1/N)$ which approach $H_0$ so that $\Delta(N) = N^{-1}\sum_{m=1}^N(Np_m-1)^2\to 0$, $\sqrt{n\lambda_n}\,\Delta(N)\to\infty$. The results presented above can be reformulated and proved (by rewriting their proofs line by line) for this goodness-of-fit problem with $\Delta(N)$ in place of $\delta^2(n)$. Note that the case $\lambda_n\to\lambda\in[0,\infty)$ is of interest for multinomial distributions with "small samples"; see, for instance, Ivchenko and Medvedev (1978). Thus, if $\lambda_n\to\infty$ then the family of alternatives (1.1) with $\delta(n) = (n\lambda_n^2)^{-1/8}$ is a threshold for the asymptotic efficiency of the chi-square test: for alternatives at a "distance" equal to or greater than $(n\lambda_n^2)^{-1/8}$ from the hypothesis the $\chi^2$ test is inferior to the $\Lambda_N$ test, whereas for alternatives at a distance less than $(n\lambda_n^2)^{-1/8}$ the $\chi_N^2$ and $\Lambda_N$ tests have the same AIE. These results complement those of Quine and Robinson (1985). If $\lambda_n\to\lambda$, $\lambda > 0$, then for the
family of alternatives with $\delta(n) = o(n^{-1/6})$ the $\chi^2$ test is asymptotically more efficient than all h-tests satisfying the Cramér condition (A.1), in particular than the $\Lambda_N$ test. This complements the results of Holst (1972), Ivchenko and Medvedev (1978), Mirakhmedov (1987) and Ivchenko and Mirakhmedov (1995).
Remark 2.3. The intermediate approach considered above, based on the $\alpha$-slope, lies somewhere between the Pitman and Bahadur approaches. Likewise, one can consider an intermediate approach based on the $\beta$-slope, the case intermediate between the Pitman and Hodges-Lehmann settings, in which the performance of an h-test is based on the asymptotics of the second-kind error and is measured by the asymptotic value of $e_n^\beta(S_N^h)$, see (2.2). We state that under the Pitman family of alternatives and under $\mathcal{K}_{alt}$ the slopes $e_n^\alpha(S_N^h)$ and $e_n^\beta(S_N^h)$ are asymptotically equivalent. In Inglot and Ledwina (2004, Remark 4) it was remarked that an approach based on the asymptotics of the second-kind error is completely uninformative for the Cramér-von Mises test, but that a situation may arise in which it is informative under intermediate alternatives. The above statement confirms the latter.
3. Proofs
We still use the notation of Sections 1 and 2. First we consider the symmetric statistics $S_N^h$. Let $\xi\sim Poi(\lambda_n)$, $g(\xi) = h(\xi)-Eh(\xi)-\tau(\xi-\lambda_n)$, $\tau = \lambda_n^{-1}cov(h(\xi),\xi)$,
$\sigma^2(h) = Var\,g(\xi) = Var\,h(\xi)\big(1-corr^2(h(\xi),\xi)\big)$, $L_{3,N} = E|g(\xi)|^3/(\sigma^3(h)\sqrt N)$,
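$\rho(S_N^h,\lambda_n) = corr\big(h(\xi)-\tau\xi,\ \xi^2-(2\lambda_n+1)\xi\big)$.
Lemma 3.1. If $L_{3,N}\to 0$, then both under $H_0$ and for any $f_n\in\mathcal{K}_{alt}$ one has
$$P_i\Big\{S_N^h < u\sqrt{Var_iS_N^h}+E_iS_N^h\Big\} = \Phi(u)+o(1),\quad i=0,1; \qquad (3.1)$$
$$E_0S_N^h = NEh(\xi)(1+o(1)),\qquad Var_1S_N^h = Var_0S_N^h(1+o(1)) = N\sigma^2(h)(1+o(1)), \qquad (3.2)$$
and
$$x_N(h) \overset{def}{=} \frac{E_1S_N^h-E_0S_N^h}{\sqrt{Var_0S_N^h}} = \sqrt{\frac{n\lambda_n}{2}}\,\delta^2(n)\,\rho(S_N^h,\lambda_n)\,d^2\,(1+o(1)). \qquad (3.3)$$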
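The correlation coefficient entering (3.3) can be evaluated numerically for a given h. The sketch below assumes the reading rho(S, lam) = corr(h(xi) - tau * xi, xi^2 - (2 lam + 1) xi) with tau = cov(h(xi), xi) / lam and xi Poisson with mean lam; the symbol tau, the function names and the truncation point of the Poisson sums are choices made for this illustration only.

```python
import math


def poisson_pmf(lam, kmax):
    """Poisson(lam) probabilities for k = 0..kmax (moderate lam only,
    since exp(-lam) underflows for very large lam)."""
    pmf = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf


def rho(h, lam):
    """corr(h(xi) - tau*xi, xi^2 - (2*lam + 1)*xi), xi ~ Poisson(lam),
    computed by summing over a truncated range of k."""
    kmax = int(lam + 12.0 * math.sqrt(lam) + 40)
    pmf = poisson_pmf(lam, kmax)
    ks = range(kmax + 1)

    def mean(f):
        return sum(p * f(k) for k, p in zip(ks, pmf))

    def cov(f, g):
        mf, mg = mean(f), mean(g)
        return sum(p * (f(k) - mf) * (g(k) - mg) for k, p in zip(ks, pmf))

    tau = cov(h, lambda k: k) / lam
    u = lambda k: h(k) - tau * k
    v = lambda k: k * k - (2.0 * lam + 1.0) * k
    return cov(u, v) / math.sqrt(cov(u, u) * cov(v, v))


def h_lr(k):
    """Kernel of the LR statistic up to a linear term: k * log k."""
    return k * math.log(k) if k > 0 else 0.0


def h_chi(lam):
    """Kernel of the chi-square statistic: (k - lam)^2 / lam."""
    return lambda k: (k - lam) ** 2 / lam


for lam in (1.0, 5.0, 20.0, 50.0):
    print(lam, rho(h_chi(lam), lam), rho(h_lr, lam))
```

For the chi-square kernel the coefficient equals 1 up to rounding, while for the LR kernel it stays below 1 and moves towards 1 as lam grows, which is the behavior described in Theorem 2.2.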
Proof. From the Theorem of Mirakhmedov (2007) it follows that $S_N^h$ has an asymptotically normal distribution with expectation $A_{iN} = \sum_{m=1}^N E_ih(\xi_m)$ and variance $\sigma_{iN}^2 = \sum_{m=1}^N Var_i\,g(\xi_m)$, where $\xi_m\sim Poi(np_{im})$, $m=1,\dots,N$, and $i=0,1$. Hence by the well-known theorem on convergence of moments (see, for example, Theorem 6.14 of Moran (1984))
$$E_iS_N^h = A_{iN}+o(1)\quad\text{and}\quad Var_iS_N^h = \sigma_{iN}^2(1+o(1)). \qquad (3.4)$$
Therefore (3.1) follows. Note that in our situation the random vector of group frequencies $(\eta_1,\dots,\eta_N)\sim M(n,p_{i1},\dots,p_{iN})$, where $p_{0m} = 1/N$ under $H_0$, and under the alternatives (1.1)
$$p_{1m} = \int_{(m-1)/N}^{m/N}\big(1+\delta(n)l_n(x)\big)\,dx = \frac1N\big(1+\delta(n)l_{mN}\big),\quad m=1,\dots,N.$$
Because of this, applying a Taylor expansion to the right-hand sides of (3.4), after some computation we derive (3.2) and (3.3); see also Holst (1972) and Ivchenko and Mirakhmedov (1995). The proof is complete.
Remark 3.1. Note that the condition $L_{3,N}\to 0$ is fulfilled for the chi-square and LR statistics iff
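$n\lambda_n\to\infty$.
Lemma 3.2. For every $f_n\in\mathcal{K}_{alt}$ and for an arbitrary symmetric statistic $\hat S_N^h = (S_N^h-E_0S_N^h)/\sqrt{Var_0S_N^h}$ such that $L_{3,N}\to 0$, condition (i) of Definition A2 is satisfied with $\sqrt n\,b_n(f_n) = x_n(h)$.
Proof. Due to Lemma 3.1, for arbitrary $\varepsilon > 0$ we have
$$P_1\Big\{\Big|\frac{\hat S_N^h}{x_n(h)}-1\Big|\le\varepsilon\Big\} = P_1\Big\{\frac{|S_N^h-E_1S_N^h|}{\sqrt{Var_1S_N^h}}\le\varepsilon x_n(h)\sqrt{\frac{Var_0S_N^h}{Var_1S_N^h}}\Big\} = 2\Phi\big(\varepsilon x_n(h)(1+o(1))\big)-1+o(1) = 1+o(1),$$
by (3.2) and the fact that $x_n(h)\to\infty$ for the family $\mathcal{K}_{alt}$. Lemma 3.2 follows.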
By (3.3) we have
$$e_n^\alpha(S_N^h) = -\log P_0\Big\{\frac{S_N^h-NEh(\xi)}{\sigma(h)\sqrt N}\ge\sqrt{\frac{n\lambda_n}{2}}\,\delta^2(n)\,\rho(S_N^h,\lambda_n)\,d^2\,(1+o(1))\Big\}. \qquad (3.5)$$
Proof of Theorem 2.1. For the chi-square statistic $h(u) = (u-\lambda_n)^2/\lambda_n$, $Eh(\xi) = 1$, $\sigma^2(h) = 2$ and $\rho(\chi_N^2,\lambda_n) = 1$, hence by (3.5)
$$e_n^\alpha(\chi_N^2) = -\log P_0\Big\{\chi_N^2\ge x_n\sqrt{2N}+N\Big\}, \qquad (3.6)$$
where $x_n = d^2\sqrt{n\lambda_n/2}\,\delta^2(n)(1+o(1))$. Note that $x_n = o\big((\sqrt N\min(1,\lambda_n^2))^{1/3}\big)$ within the family $\mathcal{K}_o$. Next, for each $\gamma\in(1/8,1/6]$ the first and second relations of the condition (2.3) imply $x_n = o(\sqrt N)$ and $N^{3/2}/(\sqrt n\,x_n)\to 0$, respectively. Therefore cases a) and b) follow by applying Assertions A2 and A4, respectively, in (3.6).
Proof of part c). Remark that in the family $\mathcal{K}_\gamma$, $\gamma\in(0,1/8]$, we have $x_n\ge 2^{-1/2}n^{1/4}$; therefore $x_n = o(\sqrt N)$ implies $\sqrt n = o(N)$. Hence Assertion A4 cannot be used here. Instead we will
prove the following
Lemma 3.3. Let $\sqrt n = o(N)$. In the families $\mathcal{K}_\gamma$ with $\gamma\le 1/8$ one has $e_n^\alpha(\chi_N^2) = o(n\lambda_n\delta^4(n))$.
Proof. Let (n) n n d 2 nn 2 (n) 1 . Note that E0 N2 N (1 o(1)) , Var0 N2 2 N (1 o(1)) and xN ( f ) nn 2 2 (n)d 2 1 o(1) . Use these to get
P0 N2 E1 N2 P0 N2 E0 N2 xN ( f ) Var0 N2
N P0 (m n )2 n d 2 nn 2 (n) 1 o(1) m1
N P0 (m n )2 n 0 1 v(n) P0 1 v(n) m2
N 1 P0 (ˆm n )2 n 0 P0 1 v(n) . m1
Here ˆm
(3.9)
Bi n v(n),( N 1)1 . Put ˆn (n v(n)) / ( N 1) . It is easy to see that
v(n)) / n N 1 d (n) N 1/2 1 o(1) and ˆn n 1 O N 1 d (n) N 1/2 . We have
N 1 N 1 P0 (ˆm n )2 n 0 P0 (ˆm ˆn )2 ( N 1)n m1 m1
N 1 P0 (ˆm ˆn )2 ˆn v(n) n m1
N 1 ˆ 2 ˆ (ˆm n ) n P0 m1 d (n) o(1) c 0 , 2 2(n v(n)) / ( N 1)
(3.10)
N 1
because (n v(n))ˆn nn 1 o(1) , and hence the CLT for the statistic (ˆm ˆn )2 is enable to m 1
use, see Mirakhmedov (1990, Corollary 3 ). Set g ( x, p) x log x / p (1 x) log (1 x) / (1 p) , x (0,1) and p (0,1) . Let
Bi k , p . Due to
Lemma 1 of Quine and Robinson (1985): for an integer kx P kx 0.8 2 kx(1 x)
Note that under H 0 the r.v. 1
1/2
exp kg ( x, p) .
(3.11)
B(n, N 1 ) , therefore applying (3.11) we obtain
P0 1 v(n)
c v(n) 1 v(n)n 1
1/2
1 v(n)n1 exp v(n) log(n1v(n)) n(1 n 1v(n)) log 1 N 1
c v(n)
1/2
exp v(n) log(n1v(n)) .
Hence
log P0 1 v(n) nn 4 (n)
n (n) nn log v(n) v(n) log(n1v(n)) v ( n) c c log 4 4 nn (n) nn (n) n
1 1 c 4 3 n (n) (n) n n
n 1/3 n 1/8 N2 v ( n) c 2 2 log log o(1) , N N n n
(3.12)
since $n = o(N^2)$. Lemma 3.3 follows from (3.9), (3.10) and (3.12). For $\sqrt n = o(N)$ part c) follows from Lemma 3.3. Let now $N = o(\sqrt n)$. Lemma 3.2 allows us to
conclude that en ( N2 ) asymptotically coincides with intermediate slope, see Definition A2; the constant c and the sequence n are such that n c en ( N2 ) / nn 4 (n) / 4(1 o(1)) , since (3.3). On the other hand, when testing uniformity against family
alt
the Neyman-Pearson (NP) test can be applied.
As it was shown by Inglot and Ledvina (1996, Sec 5) the intermediate slope of NP test has the form
n 2 (n) / 2 . Hence, if n 1 under the family
1/8
then en ( N2 ) / en ( NP)
2cn 2 (n) 2c(n / N 2 )1/4 last is impossible because the ratio can’t be greater than 1. Proof of
Theorem 2.1 is completed.
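Proof of Theorem 2.2. By (3.2), (3.3) and (3.5), $e_n^\alpha(\Lambda_N) = -\log P_0\big\{\Lambda_N\ge x_n\sqrt{Var_0\Lambda_N}+E_0\Lambda_N\big\}$, where $x_n = d^2\sqrt{n\lambda_n/2}\,\delta^2(n)\,\rho(\Lambda_N,\lambda_n)(1+o(1))$. Note that $x_n = o(N^{1/2})$ iff $\delta(n) = o(\lambda_n^{-1/2})$. The proof of Theorem 2.2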
is concluded by applying Assertions A3 and A5. Proof of Corollary 3.1. Let c be an arbitrary constant and tnh ( f n ) nn / 2 2 (n) (S Nh , n )d 2 c ,
fn
alt
. The test statistics is SˆNh ( S Nh E0 S Nh ) / Var0 S Nh . For the power of the h-test with critical
region
Sˆ
h n
tnh ( f n ) c
(3.13)
using Lemma 3.1 we obtain
S h E S h E S h E0 Snh Var0 Snh 1 n P1 Sˆnh tnh ( f n ) c P1 n tnh ( f n ) 1 n c (c) o(1) , h h h Var S Var S Var S 1 n 1 n 0 n
since (3.2) and (3.3). Hence asymptotic power of this h-test is bounded away from zero and 1. The related significance level of (3.13) is defined via
n P0 Sˆnh tnh ( f n ) c P0 Sˆnh nn / 2 2 (n) (S Nh , n )(1 o(1)) n (Snh ) .
(3.14)
The proof of Corollary 2.1 is concluded by applying these facts for chi-square and LR tests and using Theorems 2.1 and 2.2.
Proof of Theorem 2.3 and 2.4. The proof is concluded by checking the conditions of Assertion A.6
ˆ , the standardized version of the chi-square and LR statistics, with Vn(2) ( f n ) ˆ N2 and Vn(1) ( f n ) n which are considered as the test statistics. The condition (i) of the Definition A2 follows from Lemma 3.2. It follows from concerning large deviation results given in Assertions A1-A5 and Theorem 2.1 that for Sˆnf statistic and its special cases ˆ N2 and ˆ n the condition (ii) of Definition A2 is satisfied for appropriate families of alternatives. Therefore we obtain: (a) The ˆ n is
alt
, qn(1) ,1 -regular for the n (0, ] , where qn(1) n1/2 and asymptotic slope (AS)
is bn ( f n ) 41 n 4 (n) 2 ( N , n ) , note that bn ( f n ) 41 n 4 (n) if n . (b) The ˆ N2 statistic is
and
, qn(2) ,1 -regular, where
o
, qn(2) n1/3 min(n1/6 , n1/2 ) for arbitrary n ,
, qn(2) n1/2 for each (1/ 8,1/ 6] and N from the strip (2.3). The AS of the chi-square
statistic is bn ( f n ) 41 n 4 (n) for every f n
. Also the ˆ N2 statistic is
1/8
, n1/2 , n -regular with
n o(1) and AS bn ( f n ) n n 4 (n) . (c) If n 0 (0, ) then according to Assertion A1 the test statistic S nh satisfying the Cramer condition (A.1) is
alt
,1,1 -regular and its asymptotic slope (AS) is [bn ( f n )]2 / 2 , where
bn ( f n ) 21 n 2 (n) S Nh , n , f n
Further, observe that family
o
and
alt
alt
.
is renumerable, log n ( N2 ) nn 4 (n) / 4(1 o(1)) and e 1 for the
, (1/ 8,1/ 6] , whereas log n ( N2 ) o(nn 4 (n)) and e 0 for the family
1/8
,
by (3.14) and Theorem 2.1. Owing to these facts and to (a), (b) and (c) above, all conditions of Assertion A6 are satisfied. Theorems 2.3 and 2.4 follow.
In order to confirm the statement of Remark 2.3 we note that
$$e_n^\beta(S_N^h) = -\log P_1\Big\{\frac{S_N^h-E_1S_N^h}{\sqrt{Var_1S_N^h}}\le-\sqrt{\frac{n\lambda_n}{2}}\,\delta^2(n)\,\rho(S_N^h,\lambda_n)\,d^2\,(1+o(1))\Big\},$$
by (3.2) and (3.3). This implies that the $\beta$-slope is also determined by the large deviation probabilities for the left tails of $S_N^h$ under the alternatives $\mathcal{K}_{alt}$. Results of this kind are given in Mirakhmedov (2016) for
the $\chi_N^2$ and $\Lambda_N$ statistics and by Ivchenko and Mirakhmedov (1995) for a certain class of symmetric statistics.
Appendix.
We still use the notation of the previous sections.
Assertion A1. Let the function $h$ be non-linear, $\lambda_n\to\lambda\in(0,\infty)$, $Np_{max}\le C_1$ for some $C_1 > 0$, and
$$\max_{1\le m\le N}E\exp\{H|g(\xi_m)|\}\le C_2\ \text{ for some } H > 0 \text{ and } C_2 > 0,\quad \xi_m\sim Poi(np_m). \qquad (A.1)$$
Then for $x_n$ with $0\le x_n = o(N^{1/2})$ it holds
$$\log P\Big\{S_N^h\ge x_n\sqrt{Var\,S_N^h}+ES_N^h\Big\} = -\frac12x_n^2+O\Big(\frac{x_n^3}{\sqrt N}+\log x_n\Big).$$
Assertion A1 follows from Theorem 2 of Ivchenko and Mirakhmedov (1995) and the fact that
$$1-\Phi(x_n) = \big(x_n\sqrt{2\pi}\big)^{-1}\exp\{-x_n^2/2\}(1+o(1)),\quad x_n\to\infty. \qquad (A.2)$$
Note that the Cramér condition (A.1) is satisfied for the statistic $\Lambda_N$ but not for $\chi_N^2$.
Assertion A2. Let $p_m = N^{-1}\big(1+\delta(n)d_{m,n}\big)$, $m=1,2,\dots,N$, where $\delta(n)\to 0$ and
$$\sum_{m=1}^N d_{m,n} = 0,\qquad \frac1N\sum_{m=1}^N d_{m,n}^2 = d^2.$$
Then for arbitrary $\lambda_n$ and $x_n$ such that $0\le x_n = o\big((\sqrt N\min(1,\lambda_n^2))^{1/3}\big)$ one has
$$\log P\Big\{\chi_N^2\ge x_n\sqrt{2N}+N\Big\} = -\frac12x_n^2+O(\log x_n).$$
Assertion A3. Let $p_m = N^{-1}(1+o(1))$, $m=1,2,\dots,N$. If $\lambda_n\to\infty$, $x_n\to\infty$, $x_n = o(N^{1/6})$ then
$$\log P\Big\{\Lambda_N\ge x_n\sqrt{2N}+N\Big\} = -\frac12x_n^2+O(\log x_n).$$
Assertions A2 and A3 follow from Corollary 2.2 and Corollary 2.3 of Mirakhmedov (2016), respectively, and (A.2).
Assertion A4. Let $N = o(\sqrt n)$, $Np_{min}\ge c$. If $x_n\to\infty$, $x_n = o(\sqrt N)$ and $N^{3/2}n^{-1/2}\ll x_n$ then
xn3 xn N 3/2 1 2 log P xn N N xn O log N . 2 n N 2 N
Assertion A5. Let n and Npmin c for some c 0 . If xn and xn o( N ) then
log P N xn
xn3 1 2 N 3/2 2 N N xn O log N . 2 n N
Assertions A4 and A5 are Eq. (2.17) and Eq. (2.13) of Kallenberg (1985), respectively.
The notion of Kallenberg's intermediate efficiency (Kallenberg, 1983) was extended to nonparametric sequences of alternatives by Inglot (1999). We need a slightly more general definition than his. The corresponding notions, adapted to the problem of testing uniformity against sequences of alternatives (1.1), are presented below. So, we consider testing the hypothesis $H_0: f(x) = 1$, $0\le x\le 1$. In what follows the symbol $\mathcal{K}$ denotes the family of all sequences of alternative distributions whose densities have the form (1.1), where $l_n(x)$ is as in (1.1), but
$$\delta(n)\to 0\quad\text{and}\quad\sqrt n\,\delta(n)\to\infty. \qquad (A.1)$$
Definition A1. We say that a family
of alternatives (1.1) satisfying (A.1) is renumarable if for every
and all sequences of positive integers n j and k j , n j , n j O(k j ) the sequence f n f n if
fn
n k j else f n f n j if n k j , j 1, 2,... also belong to
Note that the families
and
alt
.
are renumerable, whereas subfamilies
o
,
and
1/8
, see Section 3,
are not. Let q (qn ) and ( n ) be a sequence of positive numbers such that
lim qn 0 , lim nqn2 , or qn 1 for all n ,
(A.2)
lim n 0 , or n 1 for all n. Also let
be a subfamily of the family
(A.3)
, and Vn ( f n ) , f n
, a test statistic rejecting the hypothesis
for large values of Vn ( f n ) . Definition A2. The test statistic Vn ( f n ) is called ( , q, ) -regular for testing H 0 against H1 if Vn ( f n ) is defined for every f n
and the following two conditions hold: and arbitrary small 0
(i) there exists a positive function bn ( f n ) such that for every f n
V (f ) lim P1 n n 1 1, nbn ( f n ) (ii) there exists a constant c 0 such that for every sequence xn , xn o(qn ) and n nxn2 lim
1 log P0 Vn ( pn ) nxn c . n n xn2
The function c n bn ( f n ) is called the intermediate slope of Vn ( f n ) . 2
For two sequences of test statistics Vn(1) ( f n ) and Vn(2) ( f n ) with right critical regions let ntn(1), and
ntn(2), be corresponding critical values of the same level , i.e. P0 Vn(i ) ntn(i,) , and P0 Vn(i ) d for every d ntn(i,) , i 1, 2 . Let Vn( j ) ( f n ) be ( ( j)
( j)
, q( j ) , ( j ) ) -regular, where
, the sequences q( j ) (q jn ) , ( j ) ( jn ) satisfy (A.2) and (A.3) respectively, j 1, 2 . Let n
be a sequence of levels such that lim n lim( n n)1 n 0 , with n min(1n , 2n ) ,
(A.4)
and the power of Vn(2) ( f n ) test of the level n is bounded away from 0 and 1:
0 liminf P1 Vn(2) ( f n ) n tn(2),n limsup P1 Vn(2) ( f n ) n tn(2),n 1 . Put
nV ( 2) ,V (1) nV ( 2) ,V (1) (n, f n , n ) inf m : P1 Vm(1)k ( f n ) m k tm(1)k ,n
(A.5)
P1 Vn(2) ( f n ) n tn(2),n for all k 0 . satisfies (A.1) and there exists n satisfying (A.4) and (A.5). If there
Definition A3. Let a family exists the limit lim
nV ( 2) ,V (1) n
eV ( 2) ,V (1) [0, ] ,
which does not depend on the particular choice of n , then eV ( 2)V (1) is called asymptotic intermediate efficiency of Vn(2) ( f n ) with respect to Vn(1) ( f n ) , shortly AIE Vn(2) ( f n ),Vn(1) ( f n ) . Assertion A6. Let (i) aforementioned statistics Vn( j ) ( f n ) be ( corresponding functions bn( j ) ( f n ) and constants c ( j ) , j 1, 2 ; (ii) sufficiently large ; (iii)
fn
(2)
(1)
( j)
, q( j ) , ( j ) ) -regular with
(1)
(2)
and q2 n q1n 1n , for n
be renumerable , q1n1 and n n q1n are non-decreasing; (iv) for every
there exists the limit
lim (v) for each f n
(2)
c (2) 2 n bn(2) ( pn ) c (1) 1n bn(1) ( pn )
2
2
e [0, ] ;
there exists n satisfying (A.4) and (A.5) and such that log n o(nq22n ) . Then
eV ( 2) ,V (1) e . The proof of Assertion A6 consists of rewriting line by line the proof of Theorem 2.7 of Inglot (1999) with quite obvious changes, and is here omitted.
References.
1. Gvanceladze L.G. and Chibisov D.M. (1979). On tests of fit based on grouped data. In: Contribution to Statistics, J. Hajek Memorial Volume, J. Jurechkova, ed., 79-89. Academia, Prague.
2. Holst L. (1972). Asymptotic normality and efficiency for certain goodness-of-fit tests. Biometrika, 59, 137-145.
3. Inglot T. and Ledwina T. (1996). Asymptotic optimality of data-driven Neyman's tests for uniformity. Ann. Statist., 24, 1982-2019.
4. Inglot T. (1999). Generalized intermediate efficiency of goodness-of-fit tests. Math. Methods Statist., 8, 487-509.
5. Inglot T. and Ledwina T. (2004). On consistent minimax distinguishability and intermediate efficiency of Cramer-von Mises test. J. Stat. Planning Infer., 124, 453-474.
6. Ivchenko G.I. and Medvedev Y.I. (1978). Decomposable statistics and verifying of tests. Small sample case. Theory Probab. Appl., 23, 796-806.
7. Ivchenko G.I. and Mirakhmedov Sh.A. (1995). Large deviations and intermediate efficiency of the decomposable statistics in multinomial scheme. Math. Methods Statist., 4, 294-311.
8. Kallenberg W.C.M. (1983). Intermediate efficiency, theory and examples. Ann. Statist., 11, 170-182.
9. Kallenberg W.C.M. (1985). On moderate and large deviations in multinomial distributions. Ann. Statist., 13, 1554-1580.
10. Mann H.B. and Wald A. (1942). On the choice of the number of intervals in the application of the chi-square test. Ann. Math. Statist., 13, 306-317.
11. Mirakhmedov Sh.A. (1987). Approximation of the distribution of multi-dimensional randomized divisible statistics by normal distribution (multinomial case). Theory Probab. Appl., 32, 696-707.
12. Mirakhmedov Sh.A. (1992). Randomized decomposable statistics in the scheme of independent allocating particles into boxes. Discrete Math. Appl., 2, 91-108. DOI: 10.1515/dma.1992.2.1.91.
13. Mirakhmedov S.M.¹ (2007). Asymptotic normality associated with generalized occupancy problem. Statistics & Probability Letters, 77, 1549-1558.
14. Mirakhmedov S.M. (2016). The probabilities of large deviations for chi-square and log-likelihood ratio statistics. http://arxiv.org/abs/1606.00250.
15. Quine M.P. and Robinson J. (1985). Efficiencies of chi-square and likelihood ratio goodness-of-fit tests. Ann. Statist., 13, 727-742.
¹ Mirakhmedov S.M. was formerly Mirakhmedov Sh.A.