Asymptotic Intermediate Efficiency of the Chi-square and ... - arXiv


Asymptotic Intermediate Efficiency of the Chi-square and Likelihood Ratio Goodness of Fit Tests

Sherzod M. Mirakhmedov
Institute of Mathematics, National University of the Republic of Uzbekistan
100125, Tashkent, Durmon yuli st., 29
e-mail: [email protected]

Abstract. The asymptotic relative intermediate efficiency of the chi-square and log-likelihood ratio goodness-of-fit tests is studied. The tests are also compared through the asymptotic behavior of their type I error probabilities, subject to the tests having the same fixed asymptotic power. It is assumed that the number of groups increases together with the number of observations.

Keywords. Asymptotic efficiency, asymptotic slope, chi-square statistic, log-likelihood ratio statistic, multinomial distribution.

MSC (2000): 60F10, 62G10, 62G20

1. Introduction

Goodness-of-fit tests for grouped data constitute a classical problem in statistical inference. The original problem is to test whether a sample has come from a given distribution. It is transformed into a problem of fit for a multinomial distribution by grouping the data: the support of the given distribution is divided into $N$ intervals, and one observes the number of observations, say $\eta_k$, falling in the $k$th interval. Test statistics in this approach are constructed by comparing $\eta_k$ with its expected value $np_k$, $k = 1,\dots,N$, where $n$ denotes the sample size. The most notable test statistics of this kind are Pearson's chi-square statistic

$$\chi_N^2 = \sum_{m=1}^{N} \frac{(\eta_m - np_m)^2}{np_m}$$

and the log-likelihood ratio (LR) statistic

$$\Lambda_N = 2\sum_{m=1}^{N} \eta_m \ln\frac{\eta_m}{np_m},$$

where $p_m$, $m = 1,\dots,N$, are the cell probabilities under the hypothesis being tested. In the present paper we study the asymptotic properties of these statistics in testing goodness of fit in the situation when $N = N(n) \to \infty$ as $n \to \infty$. This is of interest: for instance, Mann and Wald (1942) obtained the relation $N \sim C n^{2/5}$, where $C$ depends on the test size, for the optimal choice of the number of groups in the chi-square goodness-of-fit test. We should note that there are other, radically different recommendations for the choice of the number of intervals $N$, see for instance Gvanceladze and Chibisov (1979).
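As a small illustration of the two statistics defined above, the following sketch computes Pearson's chi-square statistic and the LR statistic from a vector of cell counts and hypothesized cell probabilities. The function names and the toy data are assumptions made for this example only; the convention $0 \cdot \ln 0 = 0$ is used for empty cells.

```python
import math

def chi_square_stat(counts, probs):
    """Pearson chi-square: sum over cells of (count - n*p)^2 / (n*p)."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))

def lr_stat(counts, probs):
    """Log-likelihood ratio: 2 * sum of count * ln(count / (n*p)).

    Empty cells contribute zero (convention 0 * ln 0 = 0).
    """
    n = sum(counts)
    return 2.0 * sum(c * math.log(c / (n * p))
                     for c, p in zip(counts, probs) if c > 0)

# Hypothetical toy data: n = 20 observations in N = 2 equiprobable cells.
counts = [5, 15]
probs = [0.5, 0.5]
chi2 = chi_square_stat(counts, probs)
lr = lr_stat(counts, probs)
print(chi2, lr)
```

Both statistics vanish when every count equals its expected value, and both grow as the counts move away from $np_m$.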


Specifically, we consider the problem of testing the goodness of fit of an absolutely continuous distribution $F$ to a set of $n$ observations grouped into $N$ equal-probability intervals. Through the probability integral transformation the original problem can be reduced to testing the null hypothesis of uniformity, i.e. $H_0: F'(x) = f(x) = 1$, $0 \le x \le 1$. We shall test $H_0$ against a family of sequences of alternatives

$$H_{1n}: f_n(x) = 1 + \delta(n)\, l_n(x), \qquad (1.1)$$

where

$$\int_0^1 l_n(x)\,dx = 0, \quad \|l_n\|_2 = 1, \quad \sup_n \|l_n\|_\infty < \infty,$$

and $\delta(n) \to 0$ as $n \to \infty$.

The chi-square and LR statistics are special variants of the symmetric statistics

$$S_N^h = h(\eta_1) + \dots + h(\eta_N),$$

where $h$ is a function defined on the non-negative axis and $\eta_1,\dots,\eta_N$ are the numbers of observations in the intervals. A test based on the statistic $S_N^h$ is called an h-test for brevity. The asymptotic properties of h-tests depend essentially on the behavior of $\delta(n)$, hence the alternatives (1.1) need some classification. Due to Holst (1972), Ivchenko and Medvedev (1978) and Mirakhmedov (1987), the h-tests have no power against the alternatives (1.1) with $\delta(n) = o\big((n\lambda_n)^{-1/4}\big)$, where $\lambda_n = n/N$, while for the alternatives (1.1) with $\delta(n) = (n\lambda_n)^{-1/4}$ the chi-square test is asymptotically most powerful within the class of h-tests (satisfying a Lyapunov-type condition). The rate $\delta(n) = (n\lambda_n)^{-1/4}$ keeps the asymptotic power of the h-tests bounded away from the level and from 1, and hence the alternatives (1.1) in this case form a family of Pitman alternatives. Another family of "extreme" alternatives arises when $\delta(n)$ remains fixed, i.e. the alternatives do not approach the hypothesis; this case arises in the Bahadur and Hodges-Lehmann settings. Quine and Robinson (1985) proved that when $\lambda_n \to \infty$ the chi-square and LR tests have the same Pitman asymptotic efficiency (AE), but the chi-square test is inferior to the LR test in terms of the Bahadur AE. Further, the sequences of alternatives (1.1) with $\delta(n) \to 0$ and $\delta(n)(n\lambda_n)^{1/4} \to \infty$ form a family of intermediate alternatives, which we denote by $\mathcal{K}_{alt}$. To the best of our knowledge, only the paper by Ivchenko and Mirakhmedov (1995) has considered the intermediate properties of h-tests, in terms of asymptotic $\alpha$-slopes (see Section 2), for testing the hypothesis of uniformity of the probabilities of a multinomial distribution. In the case $\lambda_n \to \lambda \in (0,\infty)$ they proved that the chi-square test is optimal in terms of the $\alpha$-slope within the class of h-tests for the subfamily of $\mathcal{K}_{alt}$ such that $\delta(n) = O\big(((n\lambda_n)^{-1}\log n)^{1/4}\big)$, but for the subfamily of $\mathcal{K}_{alt}$ satisfying the condition $\delta(n)\, n^{1/6} \log^{-1/3} N \to \infty$ the chi-square test is inferior to tests satisfying the Cramér condition, see Appendix, condition (A.1). Note that the $\Lambda_N$ statistic satisfies the Cramér condition, whereas the $\chi_N^2$ statistic does not.


Thus the existing results do not cover the properties of the chi-square and LR tests for the whole family $\mathcal{K}_{alt}$.

Further, the chi-square test is more efficient than the LR test (when $\lambda_n$ is bounded away from zero and infinity), or as efficient as the LR test (if $\lambda_n \to \infty$), for the Pitman family of alternatives, i.e. the alternatives closest to the hypothesis that are still distinguishable by h-tests, whereas it loses its leadership for the family of fixed alternatives. Hence a very interesting problem is to find the maximal "distance" between the alternatives and the hypothesis at which the chi-square test still retains its "leadership". In this work we address these problems in terms both of $\alpha$-slopes and of the traditional definition of asymptotic relative efficiency (ARE), viz. the limiting value of the ratio of the sample sizes; for the definitions see Section 2 and the Appendix. Let $\mathcal{K}_\nu$ stand for the subfamily of $\mathcal{K}_{alt}$ such that $\delta(n) = (n\lambda_n^2)^{-\nu}$, $\nu > 0$. In particular, we show, in terms of $\alpha$-slopes as well as in terms of the asymptotic intermediate efficiency due to Inglot (1999), that for the family $\mathcal{K}_\nu$ with $\nu > 1/8$ the chi-square and LR tests are asymptotically equally efficient, but for the family $\mathcal{K}_\nu$ with $0 < \nu \le 1/8$ the $\chi^2$ test is inferior to the LR test. These results complement those of Quine and Robinson (1985), Mirakhmedov (1987) and Ivchenko and Mirakhmedov (1995).

The rest of the paper is organized as follows. The main results are presented in Section 2; the proofs are given in Section 3; for the reader's convenience, the auxiliary Assertions are collected in the Appendix. In what follows $C_k$ is a universal positive constant; all asymptotic statements are considered as $n \to \infty$; $\xi \sim F$ stands for "the r.v. $\xi$ has the distribution $F$"; $a_n \ll b_n$ stands for $a_n = o(b_n)$.

2. The results

We still use the notation of Section 1. It turns out that for the chi-square test we should consider the following subfamilies of the family $\mathcal{K}_{alt}$ of intermediate alternatives:

$\mathcal{K}_o$, the subfamily such that $\delta(n) = o\big((n\max(1,\lambda_n^2))^{-1/6}\big)$,

$\mathcal{K}_\nu$, the subfamily such that $\delta(n) = (n\lambda_n^2)^{-\nu}$, $1/8 < \nu \le 1/6$,

$\mathcal{K}_{1/8}$, the subfamily such that $\delta(n) \ge (n\lambda_n^2)^{-1/8}$.

Before moving on to the specific AE results for the chi-square and LR tests we need some notation for general h-tests. Let $\alpha_n$ and $\beta_n$ stand for the size and power, respectively, of the h-test, and let $P_i$, $E_i$, $Var_i$ denote the probability, expectation and variance computed under $H_i$, $i = 0,1$; here and everywhere in the sequel $H_1$ means the family of alternatives under consideration. We suppose that large values of $S_N^h$ reject the hypothesis and that $E_1 S_N^h > E_0 S_N^h$. Everywhere in what follows $h$ is not a linear function.


Assuming that $\beta_n \to \beta \in (0,1)$ we shall measure the performance of the test by the asymptotic value of the $\alpha$-slope, viz.,

$$e_n^\alpha(S_N^h) = -\log P_0\{S_N^h \ge E_1 S_N^h\}. \qquad (2.1)$$

Similarly, when $\alpha_n \to \alpha \in (0,1)$ we measure the performance of the test by the asymptotic value of the $\beta$-slope, viz.,

$$e_n^\beta(S_N^h) = -\log P_1\{S_N^h \le E_0 S_N^h\}. \qquad (2.2)$$

We shall focus in the sequel on the $\alpha$-slope of h-tests. For a given family of alternatives the $\alpha$-asymptotic relative efficiency ($\alpha$ARE) of one test with respect to another, subject to their having the same asymptotic power bounded away from zero and 1, is defined as the ratio of their asymptotic $\alpha$-slopes. This corresponds to comparing two tests through the asymptotic behavior of their type I error probabilities when they have the same fixed asymptotic power; in our opinion such a concept is reasonable and corresponds to the spirit of the ARE of two tests.

Theorem 2.1. The following hold:

a) in the family $\mathcal{K}_o$ for arbitrary $\lambda_n$, and

b) for each $\nu \in (1/8, 1/6]$ in the family $\mathcal{K}_\nu$, if

$$n^{(1-6\nu)/(1-4\nu)} \ll N \ll n^{3(1-4\nu)/(4(1-2\nu))}, \qquad (2.3)$$

one has

$$\frac{e_n^\alpha(\chi_N^2)}{n\lambda_n\delta^4(n)} = \frac{1}{4}(1+o(1)); \qquad (2.4)$$

c) in the family $\mathcal{K}_{1/8}$ one has

$$\frac{e_n^\alpha(\chi_N^2)}{n\lambda_n\delta^4(n)} = o(1).$$

Remark 2.1. Note that the case $\nu > 1/6$ is covered by part a). For $\nu > 1/8$ condition (2.3) implies $N = o(\sqrt{n})$. In particular, if $\nu = 1/6$ then $N = o(n^{3/8})$, and if $\nu = 1/8 + \varepsilon$, $\varepsilon \in (0, 1/24]$, then condition (2.3) reads $n^{(1-24\varepsilon)/(2(1-8\varepsilon))} \ll N \ll n^{(3-24\varepsilon)/(2(3-8\varepsilon))}$. When $\nu$ varies in the interval $(1/8, 1/6]$ the strips (2.3) cover the interval $(1, o(\sqrt{n}))$.

Theorem 2.2. Let $\lambda_n \to \lambda \in (0,\infty]$. For the subfamily of $\mathcal{K}_{alt}$ such that $\delta(n) = o(\lambda_n^{-1/2})$ one has

$$\frac{e_n^\alpha(\Lambda_N)}{n\lambda_n\delta^4(n)} = \frac{1}{4}\rho^2(\Lambda_N,\lambda_n)(1+o(1)),$$

where $\rho(\Lambda_N,\lambda_n) = \mathrm{corr}\big(\xi\log\xi - \gamma\xi,\ \xi^2 - (2\lambda_n+1)\xi\big)$, $\gamma = \lambda_n^{-1}\mathrm{cov}(\xi\log\xi, \xi)$, $\xi \sim Poi(\lambda_n)$. Note that $\rho(\Lambda_N,\lambda_n) = 1 - (6\lambda_n)^{-1}(1+o(1))$ if $\lambda_n \to \infty$.
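The correlation coefficient in Theorem 2.2, $\rho = \mathrm{corr}(\xi\log\xi - \gamma\xi,\ \xi^2 - (2\lambda+1)\xi)$ with $\xi \sim Poi(\lambda)$ and $\gamma = \lambda^{-1}\mathrm{cov}(\xi\log\xi,\xi)$, can be evaluated numerically. The sketch below does so by summing over a truncated Poisson support; the truncation point `kmax` and the function names are assumptions of this illustration, not part of the paper. It can be used to compare the numerical value with the large-$\lambda$ approximation $1 - (6\lambda)^{-1}$ stated in the theorem.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability P(xi = k), computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def rho_lr(lam, kmax=None):
    """corr(xi*log(xi) - gamma*xi, xi^2 - (2*lam+1)*xi) for xi ~ Poi(lam)."""
    if kmax is None:
        # Assumed truncation: far enough in the tail that the
        # neglected Poisson mass is negligible.
        kmax = int(lam + 20 * math.sqrt(lam) + 50)
    ks = range(kmax + 1)
    w = [poisson_pmf(k, lam) for k in ks]

    def mean(f):
        return sum(wi * f(k) for k, wi in zip(ks, w))

    def cov(f, g):
        mf, mg = mean(f), mean(g)
        return mean(lambda k: (f(k) - mf) * (g(k) - mg))

    h = lambda k: k * math.log(k) if k > 0 else 0.0  # 0*log(0) = 0
    ident = lambda k: float(k)
    gamma = cov(h, ident) / lam
    a = lambda k: h(k) - gamma * k
    b = lambda k: k * k - (2 * lam + 1) * k
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))

lam = 50.0
print(rho_lr(lam), 1 - 1 / (6 * lam))
```

For moderate and large $\lambda$ the two printed numbers should be close, which is a quick sanity check on the stated expansion.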


Corollary 2.1. a) In the family $\mathcal{K}_o$ for arbitrary $\lambda_n$ one has

$$\alpha ARE(\chi_N^2, \Lambda_N) = \rho^{-2}(\Lambda_N,\lambda_n)(1+o(1));$$

b) for each $\nu \in (1/8, 1/6]$ in the family $\mathcal{K}_\nu$, if condition (2.3) is fulfilled then

$$\alpha ARE(\chi_N^2, \Lambda_N) = 1 + o(1);$$

c) in the family $\mathcal{K}_{1/8}$, if $\delta(n) = o(\lambda_n^{-1/2})$ then

$$\alpha ARE(\chi_N^2, \Lambda_N) = o(1).$$

Theorem 2.1 allows us to obtain the asymptotic intermediate efficiency (AIE) due to Inglot (1999), see Definition A3 in the Appendix, of the $\chi_N^2$ test w.r.t. the $\Lambda_N$ test.

Theorem 2.3. The following assertions hold true:

a) for the family $\mathcal{K}_o$ and arbitrary $\lambda_n$ one has

$$AIE(\chi_N^2, \Lambda_N) = \rho^{-2}(\Lambda_N,\lambda_n)(1+o(1));$$

b) for the family $\mathcal{K}_\nu$, where $\nu \in (1/8, 1/6]$ and $N$ satisfies (2.3), one has $AIE(\chi_N^2, \Lambda_N) = 1$;

c) for the family $\mathcal{K}_{1/8}$, if $\lambda_n \to \infty$ and $\delta(n) = o(\lambda_n^{-1/2})$ then $AIE(\chi_N^2, \Lambda_N) = 0$.

Theorem 2.4. Let $\lambda_n \to \lambda \in (0,\infty)$. Then in the family $\mathcal{K}_o$, for an arbitrary h-test satisfying the Cramér condition (A.1), in particular for the LR test, one has

$$AIE(\chi_N^2, S_N^h) = \rho^{-2}(S_N^h, \lambda)(1+o(1)) \ge 1.$$

Remark 2.2. Ivchenko and Mirakhmedov (1995) considered h-tests in the problem of goodness of fit for a multinomial distribution $M(n, p_1,\dots,p_N)$, namely testing the hypothesis $H_0: p_m = 1/N$, $m = 1,\dots,N$, against sequences of alternatives $H_1: (p_1,\dots,p_N) \ne (1/N,\dots,1/N)$ which approach $H_0$ so that $\varepsilon(N) = N^{-1}\sum_{m=1}^{N}(Np_m - 1)^2 \to 0$, $\sqrt{n\lambda_n}\,\varepsilon(N) \to \infty$. The results presented above can be reformulated and proved (by rewriting their proofs line by line) for this goodness-of-fit problem, with $\varepsilon(N)$ in place of $\delta^2(n)$. Note that the case $\lambda_n \to \lambda \in [0,\infty)$ is of interest for multinomial distributions with a "small sample", see for instance Ivchenko and Medvedev (1978).

Thus, if $\lambda_n \to \infty$ then the family of alternatives (1.1) with $\delta(n) = (n\lambda_n^2)^{-1/8}$ is a threshold for the asymptotic efficiency of the chi-square test: for alternatives at a "distance" from the hypothesis equal to or greater than $(n\lambda_n^2)^{-1/8}$ the $\chi^2$ test is inferior to the $\Lambda_N$ test, whereas for alternatives at a distance less than $(n\lambda_n^2)^{-1/8}$ the $\chi_N^2$ and $\Lambda_N$ tests have the same AIE. These results complement those of Quine and Robinson (1985). If $\lambda_n \to \lambda$, $0 < \lambda < \infty$, then for the family of alternatives with $\delta(n) = o(n^{-1/6})$ the $\chi^2$ test is asymptotically more efficient than all h-tests satisfying the Cramér condition (A.1), in particular than the $\Lambda_N$ test. This complements the results of Holst (1972), Ivchenko and Medvedev (1978), Mirakhmedov (1987) and Ivchenko and Mirakhmedov (1995).

Remark 2.3. The intermediate approach considered above, based on the $\alpha$-slope, lies between the Pitman and Bahadur approaches. Likewise, one can consider an intermediate approach based on the $\beta$-slope, which lies between the Pitman and Hodges-Lehmann settings; here the performance of an h-test is judged by the asymptotics of the type II error and is measured by the asymptotic value of $e_n^\beta(S_N^h)$, see (2.2). We state that under the Pitman family of alternatives and under $\mathcal{K}_{alt}$ the slopes $e_n^\alpha(S_N^h)$ and $e_n^\beta(S_N^h)$ are asymptotically equivalent. In Inglot and Ledwina (2004, Remark 4) it was remarked that an approach based on the type II error asymptotics is completely uninformative for the Cramér-von Mises test, but that a situation may arise in which it is informative under intermediate alternatives. The above statement confirms the latter.

3. Proofs

We still use the notation of Sections 1 and 2. First we consider the symmetric statistics $S_N^h$. Let $\xi \sim Poi(\lambda_n)$,

$$g(\xi) = h(\xi) - Eh(\xi) - \gamma(\xi - \lambda_n), \quad \gamma = \lambda_n^{-1}\mathrm{cov}(h(\xi),\xi),$$

$$\sigma^2(h) = Var\, g(\xi) = Var\, h(\xi)\big(1 - \mathrm{corr}^2(h(\xi),\xi)\big), \quad L_{3,N} = \frac{E|g(\xi)|^3}{\sigma^3(h)\sqrt{N}},$$

$$\rho(S_N^h,\lambda_n) = \mathrm{corr}\big(h(\xi) - \gamma\xi,\ \xi^2 - (2\lambda_n+1)\xi\big).$$

Lemma 3.1. If $L_{3,N} \to 0$, then both under $H_0$ and for any $f_n \in \mathcal{K}_{alt}$ one has

$$P_i\Big\{S_N^h < u\sqrt{Var_i S_N^h} + E_i S_N^h\Big\} = \Phi(u) + o(1), \quad i = 0,1; \qquad (3.1)$$

$$E_0 S_N^h = N\,Eh(\xi)(1+o(1)), \quad Var_1 S_N^h = Var_0 S_N^h(1+o(1)) = N\sigma^2(h)(1+o(1)), \qquad (3.2)$$

and

$$x_N(h) \stackrel{def}{=} \frac{E_1 S_N^h - E_0 S_N^h}{\sqrt{Var_0 S_N^h}} = \sqrt{\frac{n\lambda_n}{2}}\,\delta^2(n)\,\rho(S_N^h,\lambda_n)\,d^2\,(1+o(1)). \qquad (3.3)$$

Proof. From the Theorem of Mirakhmedov (2007) it follows that $S_N^h$ has an asymptotically normal distribution with expectation $A_{iN} = \sum_{m=1}^{N} E_i h(\xi_m)$ and variance $\sigma_{iN}^2 = \sum_{m=1}^{N} Var_i\, g(\xi_m)$, where $\xi_m \sim Poi(np_{im})$, $m = 1,\dots,N$, and $i = 0,1$. Hence by the well-known theorem on convergence of moments (see, for example, Theorem 6.14 of Moran (1984))

$$E_i S_N^h = A_{iN} + o(1) \quad \text{and} \quad Var_i S_N^h = \sigma_{iN}^2(1+o(1)). \qquad (3.4)$$

Therefore (3.1) follows. Note that in our situation the random vector of group frequencies $(\eta_1,\dots,\eta_N) \sim M(n, p_{i1},\dots,p_{iN})$, where $p_{0m} = 1/N$ under $H_0$, and under the alternatives (1.1)

$$p_{1m} = \int_{(m-1)/N}^{m/N} \big(1 + \delta(n)\,l_n(x)\big)\,dx = \frac{1}{N}\big(1 + \delta(n)\,l_{mN}\big), \quad m = 1,\dots,N.$$

Because of this, applying a Taylor expansion in the right-hand sides of (3.4), after some computation we derive (3.2) and (3.3); see also Holst (1972) and Ivchenko and Mirakhmedov (1995). The proof is completed.

Remark 3.1. Note that the condition $L_{3,N} \to 0$ is fulfilled for the chi-square and LR statistics iff $n\lambda_n \to \infty$.

Lemma 3.2. For every $f_n \in \mathcal{K}_{alt}$ and for an arbitrary symmetric statistic $\hat S_N^h = (S_N^h - E_0 S_N^h)/\sqrt{Var_0 S_N^h}$ such that $L_{3,N} \to 0$, condition (i) of Definition A2 is satisfied with $\sqrt{n}\,b_n(f_n) = x_N(h)$.

Proof. Due to Lemma 3.1, for every $\varepsilon > 0$ we have

$$P_1\left\{\left|\frac{\hat S_N^h}{x_N(h)} - 1\right| \le \varepsilon\right\} = P_1\left\{\left|\frac{S_N^h - E_1 S_N^h}{\sqrt{Var_1 S_N^h}}\right| \le \varepsilon\, x_N(h)\sqrt{\frac{Var_0 S_N^h}{Var_1 S_N^h}}\right\} = 2\Phi\left(\varepsilon\, x_N(h)\sqrt{\frac{Var_0 S_N^h}{Var_1 S_N^h}}\right) - 1 + o(1) = 1 + o(1),$$

by (3.2) and the fact that $x_N(h) \to \infty$ for the family $\mathcal{K}_{alt}$. Lemma 3.2 follows.

By (3.3) we have

$$e_n^\alpha(S_N^h) = -\log P_0\left\{\frac{S_N^h - N\,Eh(\xi)}{\sigma(h)\sqrt{N}} \ge \sqrt{\frac{n\lambda_n}{2}}\,\delta^2(n)\,\rho(S_N^h,\lambda_n)\,d^2\,(1+o(1))\right\}. \qquad (3.5)$$

Proof of Theorem 2.1. For the chi-square statistic $h(u) = (u - \lambda_n)^2/\lambda_n$, $Eh(\xi) = 1$, $\sigma^2(h) = 2$ and $\rho(\chi_N^2,\lambda_n) = 1$, hence by (3.5)

$$e_n^\alpha(\chi_N^2) = -\log P_0\big\{\chi_N^2 \ge x_n\sqrt{2N} + N\big\}, \quad \text{where } x_n = d^2\sqrt{n\lambda_n/2}\;\delta^2(n)(1+o(1)). \qquad (3.6)$$

Note that $x_n = o\big((\sqrt{N}\min(1,\lambda_n^2))^{1/3}\big)$ within the family $\mathcal{K}_o$. Next, for each $\nu \in (1/8, 1/6]$ the first and second relations of condition (2.3) imply $x_n = o(\sqrt{N})$ and $N^{3/2}/(\sqrt{n}\,x_n) \to 0$, respectively. Therefore parts a) and b) follow by applying Assertions A2 and A4, respectively, in (3.6).

Proof of part c). Note that in the family $\mathcal{K}_\nu$, $\nu \in (0, 1/8]$, we have $x_n \ge n^{1/4}/\sqrt{2}$; therefore $x_n = o(\sqrt{N})$ implies $\sqrt{n} = o(N)$, whereas Assertion A4 requires $N = o(\sqrt{n})$. Hence Assertion A4 cannot be used here. Instead we will prove the following

Lemma 3.3. Let $\sqrt{n} = o(N)$. In the families $\mathcal{K}_\nu$ with $\nu \le 1/8$ one has $e_n^\alpha(\chi_N^2) = o(n\lambda_n\delta^4(n))$.



Proof. Let  (n)  n  n  d 2 nn 2 (n)   1 . Note that E0  N2  N (1  o(1)) ,   Var0  N2  2 N (1  o(1)) and xN ( f )  nn 2 2 (n)d 2 1  o(1)  . Use these to get







P0  N2  E1 N2  P0  N2  E0  N2  xN ( f ) Var0  N2



N   P0  (m  n )2  n  d 2 nn 2 (n) 1  o(1)   m1 





N   P0   (m  n )2  n  0 1  v(n)  P0 1  v(n) m2 





 N 1   P0  (ˆm  n )2  n  0 P0 1  v(n) .  m1 



Here ˆm





(3.9)



Bi n  v(n),( N  1)1 . Put ˆn  (n  v(n)) / ( N  1) . It is easy to see that





v(n)) / n   N 1  d (n) N 1/2  1  o(1)  and ˆn  n 1  O  N 1  d (n) N 1/2  . We have

 N 1   N 1  P0  (ˆm  n )2  n  0  P0  (ˆm  ˆn )2  ( N  1)n   m1   m1 









 N 1   P0  (ˆm  ˆn )2  ˆn   v(n)  n   m1 





 N 1  ˆ 2 ˆ   (ˆm  n )  n   P0  m1  d (n)  o(1)   c  0 , 2  2(n  v(n)) / ( N  1)   

(3.10)

because $(n - v(n))\hat\lambda_n = n\lambda_n(1+o(1)) \to \infty$, and hence the CLT for the statistic $\sum_{m=1}^{N-1}(\hat\eta_m - \hat\lambda_n)^2$ can be used, see Mirakhmedov (1990, Corollary 3). Set $g(x,p) = x\log(x/p) + (1-x)\log\big((1-x)/(1-p)\big)$, $x \in (0,1)$ and $p \in (0,1)$. Let $\zeta \sim Bi(k,p)$. Due to Lemma 1 of Quine and Robinson (1985), for an integer $kx$

$$P\{\zeta = kx\} \ge 0.8\,\big(2\pi kx(1-x)\big)^{-1/2}\exp\{-k\,g(x,p)\}. \qquad (3.11)$$
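The lower bound (3.11) on a binomial point probability, $P\{\zeta = kx\} \ge 0.8\,(2\pi kx(1-x))^{-1/2}\exp\{-k\,g(x,p)\}$ for $\zeta \sim Bi(k,p)$ and integer $kx$ with $0 < x < 1$, is easy to check numerically. The sketch below compares the exact binomial probability with the bound over a small grid; the function names and the grid are assumptions of this illustration.

```python
import math

def g(x, p):
    """g(x, p) = x*log(x/p) + (1-x)*log((1-x)/(1-p)), for 0 < x, p < 1."""
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

def binom_pmf(k, j, p):
    """Exact probability P(zeta = j) for zeta ~ Bi(k, p)."""
    return math.comb(k, j) * p ** j * (1 - p) ** (k - j)

def lower_bound(k, j, p):
    """Right-hand side of (3.11) with x = j / k, for 1 <= j <= k - 1."""
    x = j / k
    return 0.8 * (2 * math.pi * k * x * (1 - x)) ** -0.5 * math.exp(-k * g(x, p))

# Hypothetical grid of parameters for the check.
ok = all(
    binom_pmf(k, j, p) >= lower_bound(k, j, p)
    for k in (10, 50)
    for p in (0.1, 0.5)
    for j in range(1, k)
)
print(ok)
```

The bound is a consequence of Stirling's formula with explicit error terms, which is why the constant 0.8 works uniformly in $k$, $x$ and $p$.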

B(n, N 1 ) , therefore applying (3.11) we obtain

P0 1  v(n)





 c v(n) 1  v(n)n 1



1/2

 1  v(n)n1  exp v(n) log(n1v(n))  n(1  n 1v(n)) log  1  N 1  


 c  v(n) 

1/2





exp v(n) log(n1v(n)) .

Hence



log P0 1  v(n) nn 4 (n)

n   (n) nn log v(n)  v(n) log(n1v(n)) v ( n) c  c log 4 4 nn (n) nn (n) n

 1 1  c 4  3  n (n)  (n) n n 

  n 1/3  n 1/8   N2  v ( n)  c  2    2   log   log   o(1) ,   N N n         n   

(3.12)

since $n = o(N^2)$. Lemma 3.3 follows from (3.9), (3.10) and (3.12).

For $\sqrt{n} = o(N)$ part c) follows from Lemma 3.3. Let now $N = o(\sqrt{n})$. Lemma 3.2 allows us to conclude that $e_n^\alpha(\chi_N^2)$ asymptotically coincides with the intermediate slope, see Definition A2; the constant $c$ and the sequence $\tau_n$ are such that $\tau_n c = e_n^\alpha(\chi_N^2)\big/\big(n\lambda_n\delta^4(n)/4\big)\,(1+o(1))$, by (3.3). On the other hand, when testing uniformity against the family $\mathcal{K}_{alt}$ the Neyman-Pearson (NP) test can be applied. As was shown by Inglot and Ledwina (1996, Sec. 5), the intermediate slope of the NP test has the form $n\delta^2(n)/2$. Hence, if $\tau_n = 1$ under the family $\mathcal{K}_{1/8}$ then

$$e_n^\alpha(\chi_N^2)/e_n^\alpha(NP) = 2c\lambda_n\delta^2(n) \ge 2c\,(n/N^2)^{1/4} \to \infty;$$

the latter is impossible because the ratio cannot be greater than 1. The proof of Theorem 2.1 is completed.





Proof of Theorem 2.2. By (3.2), (3.3) and (3.5),

$$e_n^\alpha(\Lambda_N) = -\log P_0\Big\{\Lambda_N \ge x_n\sqrt{Var_0\Lambda_N} + E_0\Lambda_N\Big\}, \quad \text{where } x_n = d^2\sqrt{n\lambda_n/2}\;\delta^2(n)\,\rho(\Lambda_N,\lambda_n)(1+o(1)).$$

Note that $x_n = o(N^{1/2})$ iff $\delta(n) = o(\lambda_n^{-1/2})$. The proof of Theorem 2.2 is concluded by applying Assertions A3 and A5.

Proof of Corollary 2.1. Let $c$ be an arbitrary constant and $t_n^h(f_n) = \sqrt{n\lambda_n/2}\;\delta^2(n)\,\rho(S_N^h,\lambda_n)\,d^2$, $f_n \in \mathcal{K}_{alt}$. The test statistic is $\hat S_N^h = (S_N^h - E_0 S_N^h)/\sqrt{Var_0 S_N^h}$. For the power of the h-test with critical region

$$\big\{\hat S_N^h > t_n^h(f_n) - c\big\} \qquad (3.13)$$

using Lemma 3.1 we obtain

$$P_1\big\{\hat S_N^h \ge t_n^h(f_n) - c\big\} = P_1\left\{\frac{S_N^h - E_1 S_N^h}{\sqrt{Var_1 S_N^h}} \ge \left(t_n^h(f_n) - \frac{E_1 S_N^h - E_0 S_N^h}{\sqrt{Var_0 S_N^h}} - c\right)\sqrt{\frac{Var_0 S_N^h}{Var_1 S_N^h}}\right\} = \Phi(c) + o(1),$$

by (3.2) and (3.3). Hence the asymptotic power of this h-test is bounded away from zero and 1. The corresponding significance level of (3.13) is defined via

$$\alpha_n = P_0\big\{\hat S_N^h \ge t_n^h(f_n) - c\big\} = P_0\Big\{\hat S_N^h \ge \sqrt{n\lambda_n/2}\;\delta^2(n)\,\rho(S_N^h,\lambda_n)(1+o(1))\Big\} =: \alpha_n(S_N^h). \qquad (3.14)$$

The proof of Corollary 2.1 is concluded by applying these facts to the chi-square and LR tests and using Theorems 2.1 and 2.2.


Proof of Theorems 2.3 and 2.4. The proof is concluded by checking the conditions of Assertion A6 with $V_n^{(2)}(f_n) = \hat\chi_N^2$ and $V_n^{(1)}(f_n) = \hat\Lambda_N$, the standardized versions of the chi-square and LR statistics, considered as the test statistics. Condition (i) of Definition A2 follows from Lemma 3.2. It follows from the large deviation results given in Assertions A1-A5 and from Theorem 2.1 that for the statistic $\hat S_N^h$ and its special cases $\hat\chi_N^2$ and $\hat\Lambda_N$ condition (ii) of Definition A2 is satisfied for appropriate families of alternatives. Therefore we obtain:

(a) The statistic $\hat\Lambda_N$ is $(\mathcal{K}_{alt}, q_n^{(1)}, 1)$-regular for $\lambda_n \to \lambda \in (0,\infty]$, where $q_n^{(1)} = \lambda_n^{-1/2}$, and its asymptotic slope (AS) is $b_n(f_n) = 4^{-1}\lambda_n\delta^4(n)\rho^2(\Lambda_N,\lambda_n)$; note that $b_n(f_n) = 4^{-1}\lambda_n\delta^4(n)$ if $\lambda_n \to \infty$.

(b) The statistic $\hat\chi_N^2$ is $(\mathcal{K}_o, q_n^{(2)}, 1)$-regular and $(\mathcal{K}_\nu, q_n^{(2)}, 1)$-regular, where for $\mathcal{K}_o$ one takes $q_n^{(2)} = n^{-1/3}\min(\lambda_n^{-1/6}, \lambda_n^{1/2})$ for arbitrary $\lambda_n$, and for $\mathcal{K}_\nu$ one takes $q_n^{(2)} = \lambda_n^{-1/2}$ for each $\nu \in (1/8, 1/6]$ and $N$ from the strip (2.3). The AS of the chi-square statistic is $b_n(f_n) = 4^{-1}\lambda_n\delta^4(n)$ for every $f_n$ in these families. Also the statistic $\hat\chi_N^2$ is $(\mathcal{K}_{1/8}, \lambda_n^{-1/2}, \tau_n)$-regular with $\tau_n = o(1)$ and AS $b_n(f_n) = \tau_n\lambda_n\delta^4(n)$.

(c) If $\lambda_n \to \lambda_0 \in (0,\infty)$ then, according to Assertion A1, a test statistic $S_N^h$ satisfying the Cramér condition (A.1) is $(\mathcal{K}_{alt}, 1, 1)$-regular and its asymptotic slope (AS) is $[b_n(f_n)]^2/2$, where $b_n(f_n) = \sqrt{\lambda_n/2}\;\delta^2(n)\,\rho(S_N^h,\lambda_n)$, $f_n \in \mathcal{K}_{alt}$.

Further, observe that the family $\mathcal{K}_{alt}$ is renumerable, that $\log\alpha_n(\chi_N^2) = -n\lambda_n\delta^4(n)/4\,(1+o(1))$ and $e = 1$ for the families $\mathcal{K}_o$ and $\mathcal{K}_\nu$, $\nu \in (1/8, 1/6]$, whereas $\log\alpha_n(\chi_N^2) = o(n\lambda_n\delta^4(n))$ and $e = 0$ for the family $\mathcal{K}_{1/8}$, by (3.14) and Theorem 2.1. Due to these facts and (a), (b) and (c) above, all conditions of Assertion A6 are satisfied. Theorems 2.3 and 2.4 follow.

In order to confirm the statement of Remark 2.3 we note that

$$e_n^\beta(S_N^h) = -\log P_1\left\{\frac{S_N^h - E_1 S_N^h}{\sqrt{Var_1 S_N^h}} \le -\sqrt{\frac{n\lambda_n}{2}}\,\delta^2(n)\,\rho(S_N^h,\lambda_n)\,d^2\,(1+o(1))\right\},$$

by (3.2) and (3.3). This implies that the $\beta$-slope is also determined by the large deviation probabilities for the left tails of $S_N^h$ under the alternatives $\mathcal{K}_{alt}$. Results of this kind are given in Mirakhmedov (2016) for

the $\chi_N^2$ and $\Lambda_N$ statistics and by Ivchenko and Mirakhmedov (1995) for a certain class of symmetric statistics.

Appendix. We still use the notation of the previous sections.

Assertion A1. Let the function $h$ be non-linear, $\lambda_n \to \lambda \in (0,\infty)$, $Np_{\max} \le C_1$ for some $C_1 > 0$, and

$$\max_{1\le m\le N} E\exp\{H|g(\xi_m)|\} \le C_2 \ \text{ for some } H > 0 \text{ and } C_2 > 0, \quad \xi_m \sim Poi(np_m). \qquad (A.1)$$

Then for $x_n \to \infty$, $x_n = o(N^{1/2})$, it holds that

$$\log P\Big\{S_N^h \ge x_n\sqrt{Var\,S_N^h} + ES_N^h\Big\} = -\frac{1}{2}x_n^2 + O\left(\frac{x_n^3}{\sqrt{N}} + \log x_n\right).$$

Assertion A1 follows from Theorem 2 of Ivchenko and Mirakhmedov (1995) and the fact that

$$1 - \Phi(x_n) = \big(x_n\sqrt{2\pi}\big)^{-1}\exp\{-x_n^2/2\}(1+o(1)), \quad x_n \to \infty. \qquad (A.2)$$
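The normal tail asymptotics (A.2), $1 - \Phi(x) \approx (x\sqrt{2\pi})^{-1}e^{-x^2/2}$ for large $x$, can be checked numerically with the standard library. The sketch below (function names are assumptions of this illustration) prints the ratio of the exact tail to the approximation; the ratio approaches 1 from below as $x$ grows.

```python
import math

def normal_tail(x):
    """Exact standard normal tail 1 - Phi(x), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def tail_approx(x):
    """Leading-order approximation exp(-x^2/2) / (x * sqrt(2*pi))."""
    return math.exp(-x * x / 2.0) / (x * math.sqrt(2.0 * math.pi))

ratios = {x: normal_tail(x) / tail_approx(x) for x in (2.0, 5.0, 10.0)}
for x, r in ratios.items():
    print(x, r)
```

Taking logarithms, this is what turns a normal-approximation statement into the leading term $-x_n^2/2$ with an $O(\log x_n)$ remainder in the assertions.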

Note that the Cramér condition (A.1) is satisfied for the statistic $\Lambda_N$ but not for $\chi_N^2$.

Assertion A2. Let $p_m = N^{-1}\big(1 + \delta(n)d_{m,n}\big)$, $m = 1,2,\dots,N$, where $\delta(n) \to 0$ and

$$\sum_{m=1}^{N} d_{m,n} = 0, \quad \frac{1}{N}\sum_{m=1}^{N} d_{m,n}^2 = d^2 < \infty.$$

Then for arbitrary $\lambda_n$ and $x_n$ such that $0 \le x_n = o\big((\sqrt{N}\min(1,\lambda_n^2))^{1/3}\big)$ one has

$$\log P\big\{\chi_N^2 \ge x_n\sqrt{2N} + N\big\} = -\frac{1}{2}x_n^2 + O(\log x_n).$$

Assertion A3. Let $p_m = N^{-1}(1+o(1))$, $m = 1,2,\dots,N$. If $\lambda_n \to \infty$, $x_n \to \infty$, $x_n = o(N^{1/6})$ then

$$\log P\big\{\Lambda_N \ge x_n\sqrt{2N} + N\big\} = -\frac{1}{2}x_n^2 + O(\log x_n).$$

Assertions A2 and A3 follow, respectively, from Corollary 2.2 and Corollary 2.3 of Mirakhmedov (2016) and (A.2).

Assertion A4. Let $N = o(\sqrt{n})$, $Np_{\min} \ge c$. If $x_n \to \infty$, $x_n = o(\sqrt{N})$ and $N^{-3/2}n^{1/2}x_n \to \infty$ then

$$\log P\big\{\chi_N^2 \ge x_n\sqrt{2N} + N\big\} = -\frac{1}{2}x_n^2 + O\left(\frac{x_n^3}{\sqrt{N}} + \frac{x_n N^{3/2}}{\sqrt{n}} + \log N\right).$$

Assertion A5. Let $\lambda_n \to \infty$ and $Np_{\min} \ge c$ for some $c > 0$. If $x_n \to \infty$ and $x_n = o(\sqrt{N})$ then



log P  N  xn

 xn3 1 2 N 3/2  2 N  N   xn  O   log N  . 2 n   N



Assertions A4 and A5 are, respectively, Eq. (2.17) and Eq. (2.13) of Kallenberg (1985).

The notion of Kallenberg's intermediate efficiency, Kallenberg (1983), was extended to nonparametric sequences of alternatives by Inglot (1999). We need a slightly more general definition than his. The corresponding notions, in a variant adapted to the problem of testing uniformity against the sequences of alternatives (1.1), are presented below. So, we consider testing the hypothesis $H_0: f(x) = 1$, $0 \le x \le 1$. In what follows the symbol $\mathcal{F}$ denotes the family of all sequences of alternative distributions whose densities have the form (1.1), where $l_n(x)$ is as in (1.1), but

$$\delta(n) \to 0 \ \text{ and } \ \sqrt{n}\,\delta(n) \to \infty. \qquad (A.1)$$


Definition A1. We say that a family $\mathcal{F}^*$ of alternatives (1.1) satisfying (A.1) is renumerable if for every $\{f_n\} \in \mathcal{F}^*$ and all sequences of positive integers $n_j$ and $k_j$, $n_j \to \infty$, $n_j = O(k_j)$, the sequence $\tilde f_n = f_n$ if $n \ne k_j$, and $\tilde f_n = f_{n_j}$ if $n = k_j$, $j = 1,2,\dots$, also belongs to $\mathcal{F}^*$.

Note that the families $\mathcal{F}$ and $\mathcal{K}_{alt}$ are renumerable, whereas the subfamilies $\mathcal{K}_o$, $\mathcal{K}_\nu$ and $\mathcal{K}_{1/8}$, see Section 2,

are not. Let $q = (q_n)$ and $\tau = (\tau_n)$ be sequences of positive numbers such that

$$\lim q_n = 0, \ \lim nq_n^2 = \infty, \ \text{ or } q_n = 1 \text{ for all } n, \qquad (A.2)$$

$$\lim \tau_n = 0, \ \text{ or } \tau_n = 1 \text{ for all } n. \qquad (A.3)$$

Also let $\mathcal{F}^*$ be a subfamily of the family $\mathcal{F}$, and $V_n(f_n)$, $f_n \in \mathcal{F}^*$, a test statistic rejecting the hypothesis for large values of $V_n(f_n)$.

Definition A2. The test statistic $V_n(f_n)$ is called $(\mathcal{F}^*, q, \tau)$-regular for testing $H_0$ against $H_1$ if $V_n(f_n)$ is defined for every $f_n \in \mathcal{F}^*$ and the following two conditions hold:

(i) there exists a positive function $b_n(f_n)$ such that for every $f_n \in \mathcal{F}^*$ and arbitrarily small $\varepsilon > 0$

$$\lim P_1\left\{\left|\frac{V_n(f_n)}{\sqrt{n}\,b_n(f_n)} - 1\right| \le \varepsilon\right\} = 1;$$

(ii) there exists a constant $c > 0$ such that for every sequence $x_n$ with $x_n = o(q_n)$ and $\tau_n n x_n^2 \to \infty$

$$\lim \left(-\frac{1}{\tau_n n x_n^2}\right)\log P_0\big\{V_n(f_n) \ge \sqrt{n}\,x_n\big\} = c.$$

The function $c\,\tau_n b_n^2(f_n)$ is called the intermediate slope of $V_n(f_n)$.

For two sequences of test statistics Vn(1) ( f n ) and Vn(2) ( f n ) with right critical regions let ntn(1), and





ntn(2), be corresponding critical values of the same level  , i.e. P0 Vn(i )  ntn(i,)   , and P0 Vn(i )  d    for every d  ntn(i,) , i  1, 2 . Let Vn( j ) ( f n ) be ( ( j)





( j)

, q( j ) , ( j ) ) -regular, where

, the sequences q( j )  (q jn ) ,  ( j )  ( jn ) satisfy (A.2) and (A.3) respectively, j  1, 2 . Let  n

be a sequence of levels such that lim  n  lim( n n)1 n  0 , with  n  min(1n , 2n ) ,

(A.4)

and the power of Vn(2) ( f n ) test of the level  n is bounded away from 0 and 1:









0  liminf P1 Vn(2) ( f n )  n tn(2),n  limsup P1 Vn(2) ( f n )  n tn(2),n  1 . Put





nV ( 2) ,V (1)  nV ( 2) ,V (1) (n, f n ,  n )  inf m : P1 Vm(1)k ( f n )  m  k tm(1)k ,n



(A.5)








 P1 Vn(2) ( f n )  n tn(2),n for all k  0 . satisfies (A.1) and there exists  n satisfying (A.4) and (A.5). If there

Definition A3. Let a family exists the limit lim

nV ( 2) ,V (1) n

 eV ( 2) ,V (1)  [0, ] ,

which does not depend on the particular choice of  n , then eV ( 2)V (1) is called asymptotic intermediate efficiency of Vn(2) ( f n ) with respect to Vn(1) ( f n ) , shortly AIE Vn(2) ( f n ),Vn(1) ( f n )  . Assertion A6. Let (i) aforementioned statistics Vn( j ) ( f n ) be ( corresponding functions bn( j ) ( f n ) and constants c ( j ) , j  1, 2 ; (ii) sufficiently large ; (iii)

fn 

(2)

(1)

( j)

, q( j ) , ( j ) ) -regular with

(1)



(2)

and q2 n  q1n 1n , for n

be renumerable , q1n1 and  n n q1n are non-decreasing; (iv) for every

there exists the limit

lim (v) for each f n 

(2)

c (2) 2 n bn(2) ( pn )  c (1) 1n bn(1) ( pn ) 

2

2

 e  [0, ] ;

there exists  n satisfying (A.4) and (A.5) and such that log  n  o(nq22n ) . Then

eV ( 2) ,V (1)  e . The proof of Assertion A6 consists of rewriting line by line the proof of Theorem 2.7 of Inglot (1999) with quite obvious changes, and is here omitted.

References.

1. Gvanceladze L.G. and Chibisov D.M. (1979). On tests of fit based on grouped data. In Contributions to Statistics, J. Hájek Memorial Volume, J. Jurečková, ed., 79-89. Academia, Prague.
2. Holst L. (1972). Asymptotic normality and efficiency for certain goodness-of-fit tests. Biometrika, 59, p.137-145.
3. Inglot T. and Ledwina T. (1996). Asymptotic optimality of data-driven Neyman's tests for uniformity. Ann. Statist., 24, p.1982-2019.
4. Inglot T. (1999). Generalized intermediate efficiency of goodness-of-fit tests. Math. Methods Statist., 8, p.487-509.
5. Inglot T. and Ledwina T. (2004). On consistent minimax distinguishability and intermediate efficiency of Cramér-von Mises test. J. Stat. Planning Infer., 124, p.453-474.
6. Ivchenko G.I. and Medvedev Y.I. (1978). Decomposable statistics and verifying of tests. Small sample case. Theory of Probability Appl., 23, p.796-806.
7. Ivchenko G.I. and Mirakhmedov Sh.A. (1995). Large deviations and intermediate efficiency of the decomposable statistics in multinomial scheme. Math. Methods in Statist., 4, p.294-311.
8. Kallenberg W.C.M. (1983). Intermediate efficiency, theory and examples. Ann. Statist., 11, p.170-182.
9. Kallenberg W.C.M. (1985). On moderate and large deviations in multinomial distributions. Ann. Statist., 13, p.1554-1580.
10. Mann H.B. and Wald A. (1942). On the choice of the number of intervals in the application of the chi-square test. Ann. Math. Statist., 13, p.306-317.
11. Mirakhmedov Sh.A. (1987). Approximation of the distribution of multi-dimensional randomized divisible statistics by normal distribution (Multinomial case). Theory Probab. Appl., 32, p.696-707.
12. Mirakhmedov Sh.A. (1992). Randomized decomposable statistics in the scheme of independent allocating particles into boxes. Discrete Math. Appl., 2, p.91-108. DOI: 10.1515/dma.1992.2.1.91, October 2009.
13. Mirakhmedov S.M.¹ (2007). Asymptotic normality associated with generalized occupancy problem. Statistics & Probability Letters, v.77, p.1549-1558.
14. Mirakhmedov S.M. (2016). The Probabilities of Large Deviations for Chi-square and Log-likelihood Ratio Statistics. http://arxiv.org/abs/1606.00250.
15. Quine M.P. and Robinson J. (1985). Efficiencies of chi-square and likelihood ratio goodness-of-fit tests. The Annals of Statistics, 13, p.727-742.

¹ Mirakhmedov S.M. was formerly Mirakhmedov Sh.A.
