Sankhyā: The Indian Journal of Statistics, 2001, Volume 63, Series A, Pt. 1, pp. 72-92

MINIMUM ϕ-DIVERGENCE ESTIMATOR FOR HOMOGENEITY IN MULTINOMIAL POPULATIONS∗

By L. PARDO and M.C. PARDO
Complutense University of Madrid, Spain

and K. ZOGRAFOS
University of Ioannina, Greece

SUMMARY. In this paper we present a new family of statistics for the problem of homogeneity when the estimation of some parameters is necessary. We consider the minimum ϕ-divergence estimator of the parameters instead of the maximum likelihood estimator, and a family of statistics based on ϕ-divergences as a generalization of the usual chi-square statistic.

Paper received October 1999; revised September 2000.
AMS (1991) subject classification: 62H15, 62F12.
Key words and phrases: Asymptotic distributions, ϕ-divergence, minimum ϕ-divergence estimator, testing statistical hypotheses, homogeneity.
∗ This work was partially supported by Grant DGES PB96-0635.

1. Introduction

A problem which is frequently encountered in practice is that of deciding whether several sets of quantitative data are all derived from the same distribution. In this context consider $\upsilon$ independent random samples $X^{(1)} = \left(X_1^{(1)}, \ldots, X_{n_{1*}}^{(1)}\right)^t, \ldots, X^{(\upsilon)} = \left(X_1^{(\upsilon)}, \ldots, X_{n_{\upsilon*}}^{(\upsilon)}\right)^t$, of sizes $n_{1*}, \ldots, n_{\upsilon*}$ respectively. The question is to decide whether the samples $X^{(1)}, \ldots, X^{(\upsilon)}$ are all derived from the same distribution function $F(x) = Q(X \leq x)$, $x \in \mathbb{R}$, with $Q$ a probability measure on the real line. A way to approach this problem is to define a partition of the real line into $m$ mutually exclusive and exhaustive intervals, say $I_1, \ldots, I_m$, where $\Pr\left(X_k^{(i)} \in I_j\right) = p_{ij}$ for $i = 1, \ldots, \upsilon$, $j = 1, \ldots, m$ and $k = 1, \ldots, n_{i*}$. If we denote by $\Delta_m = \left\{P = (p_1, \ldots, p_m)^t : \sum_{i=1}^m p_i = 1,\ p_i \geq 0,\ i = 1, \ldots, m\right\}$ the probability simplex, we have $P_i = (p_{i1}, \ldots, p_{im})^t \in \Delta_m$, $i = 1, \ldots, \upsilon$. If $X^{(1)}, \ldots, X^{(\upsilon)}$ are all drawn, at random, from the same distribution function $F$, then it is expected that $p_{ij} = Q(I_j)$ for every $i = 1, \ldots, \upsilon$ and $j = 1, \ldots, m$, and therefore the problem reduces to one of testing homogeneity in multinomial populations, i.e., to testing the null hypothesis

$$H_0 : p_{1j} = \cdots = p_{\upsilon j} = Q(I_j) = q_j, \quad j = 1, \ldots, m, \qquad (1.1)$$

where $Q = (q_1, \ldots, q_m)^t \in T \subseteq \Delta_m$.

The vector $Q$ can be completely specified. In this case $T$ is a point set, $Q = Q_0 = (q_{10}, \ldots, q_{m0})^t$ is known, and the usual statistic for testing (1.1) is

$$X^2(Q_0) = \sum_{i=1}^{\upsilon}\sum_{j=1}^{m} \frac{(n_{ij} - n_{i*}q_{j0})^2}{n_{i*}q_{j0}}, \qquad (1.2)$$

whose asymptotic distribution under the null hypothesis $H_0$ given by (1.1) is chi-square with $(m-1)\upsilon$ degrees of freedom. In the expression (1.2), $n_{ij}$ represents the observed number of components of $X^{(i)}$, $i = 1, \ldots, \upsilon$, belonging to the interval $I_j$, $j = 1, \ldots, m$, and $n_{i*} = \sum_{j=1}^{m} n_{ij}$.

If $T = \Delta_m$, that is to say, if $Q$ is completely unknown, the usual statistic for testing (1.1) is

$$X^2(\widehat{Q}) = \sum_{i=1}^{\upsilon}\sum_{j=1}^{m} \frac{(n_{ij} - n_{i*}n_{*j}/n)^2}{n_{i*}n_{*j}/n}, \qquad (1.3)$$

and its asymptotic distribution under the null hypothesis $H_0$ given by (1.1) is chi-square with $(m-1)(\upsilon-1)$ degrees of freedom. In the expression (1.3), $n_{*j}$, $j = 1, \ldots, m$, is given by $n_{*j} = \sum_{i=1}^{\upsilon} n_{ij}$.
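As a point of reference for what follows, both classical statistics are straightforward to compute from the $\upsilon\times m$ table of counts. The following sketch (in Python with NumPy/SciPy; this is our illustration only, the paper gives no code) computes (1.2) when $Q_0$ is supplied and (1.3) otherwise.

```python
import numpy as np
from scipy.stats import chi2

def chi_square_homogeneity(counts, q0=None):
    """Classical statistics (1.2)/(1.3) for a (nu x m) table of counts.

    If q0 (a fully specified common distribution Q0) is given, returns
    X^2(Q0) with (m-1)*nu degrees of freedom; otherwise Q is estimated
    by the pooled frequencies n_{*j}/n and the statistic is X^2(Q-hat)
    with (m-1)*(nu-1) degrees of freedom.
    """
    counts = np.asarray(counts, dtype=float)
    nu, m = counts.shape
    ni = counts.sum(axis=1)                     # row totals n_{i*}
    if q0 is None:
        q = counts.sum(axis=0) / counts.sum()   # pooled estimate n_{*j}/n
        df = (m - 1) * (nu - 1)
    else:
        q = np.asarray(q0, dtype=float)
        df = (m - 1) * nu
    expected = np.outer(ni, q)                  # n_{i*} q_j
    stat = ((counts - expected) ** 2 / expected).sum()
    return stat, df, chi2.sf(stat, df)          # statistic, dof, p-value

# Example: three samples classified into four intervals
table = np.array([[10, 20, 15, 5],
                  [12, 18, 14, 6],
                  [ 8, 25, 12, 5]])
print(chi_square_homogeneity(table))
```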

Two general families of statistics, including the statistics given in (1.2) and (1.3), for testing the null hypothesis of homogeneity stated in (1.1), can be obtained in terms of the φ-divergence measures. The φ-divergence between two discrete probability distributions $P = (p_1, \ldots, p_m)^t$ and $Q = (q_1, \ldots, q_m)^t$ was introduced, for a real convex function $\phi(t)$, $t > 0$, independently by Csiszár (1963) and Ali and Silvey (1966), by the formula

$$D_\phi(P, Q) = \sum_{j=1}^{m} q_j\,\phi\!\left(\frac{p_j}{q_j}\right), \qquad (1.4)$$

where $0\,\phi(0/0) = 0$ and $0\,\phi(u/0) = \lim_{u\to\infty}\phi(u)/u$. In Zografos et al. (1990), Pardo et al. (1993) and Morales et al. (1994) it is established, as a generalization of the statistic (1.2), that the family of statistics

$$T_n^\phi(Q_0) = 2\sum_{i=1}^{\upsilon} n_{i*}\,D_\phi\!\left(\widehat{P}_i, Q_0\right), \qquad (1.5)$$

with $\widehat{P}_i = (n_{i1}/n_{i*}, \ldots, n_{im}/n_{i*})^t$, $\phi(1) = \phi'(1) = 0$ and $\phi''(1) = 1$, is asymptotically distributed as a chi-square random variable with $\upsilon(m-1)$ degrees of freedom under the null hypothesis given in (1.1). If we consider the function $\phi(x) = \frac{1}{2}(x-1)^2$, we get $T_n^\phi(Q_0) = X^2(Q_0)$.
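The definitions (1.4)-(1.5) translate almost verbatim into code. A minimal sketch, with the zero-cell conventions made explicit (the helper names are ours, not the paper's):

```python
import numpy as np

def phi_divergence(p, q, phi):
    """D_phi(P, Q) = sum_j q_j * phi(p_j / q_j), formula (1.4).

    Conventions: 0 * phi(0/0) = 0; if some q_j = 0 < p_j the divergence
    is taken as +inf (the case lim phi(u)/u = inf, as for the Pearson phi).
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    pos = q > 0
    if np.any(p[~pos] > 0):
        return np.inf
    return float(np.sum(q[pos] * phi(p[pos] / q[pos])))

def T_phi_Q0(counts, q0, phi):
    """Statistic (1.5): 2 * sum_i n_{i*} D_phi(P_i-hat, Q0)."""
    counts = np.asarray(counts, float)
    ni = counts.sum(axis=1)
    return 2.0 * sum(ni[i] * phi_divergence(counts[i] / ni[i], q0, phi)
                     for i in range(counts.shape[0]))

# With phi(x) = (x - 1)^2 / 2 the statistic reduces to Pearson's X^2(Q0):
pearson_phi = lambda x: 0.5 * (x - 1.0) ** 2
```

For example, `T_phi_Q0(table, q0, pearson_phi)` reproduces $X^2(Q_0)$ from (1.2) for the table of the previous sketch.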

In Pardo et al. (1999) it has been established, as a generalization of the statistic (1.3), that the family of statistics

$$T_n^\phi(\widehat{Q}) = 2\sum_{i=1}^{\upsilon} n_{i*}\,D_\phi\!\left(\widehat{P}_i, \widehat{Q}\right), \qquad (1.6)$$

with $\widehat{Q} = \left(\frac{n_{*1}}{n}, \ldots, \frac{n_{*m}}{n}\right)^t$, $\phi(1) = \phi'(1) = 0$ and $\phi''(1) = 1$, is asymptotically distributed as a chi-square random variable with $(\upsilon-1)(m-1)$ degrees of freedom under the null hypothesis given in (1.1). If we consider the function $\phi(x) = \frac{1}{2}(x-1)^2$ we get $T_n^\phi(\widehat{Q}) = X^2(\widehat{Q})$.

Another important problem appears if $Q$ is specified as a function of unknown parameters (i.e., $Q$ lies in a subset $T$ of $\Delta_m$) which need to be estimated from the experimental data. In this case $Q = Q(\theta)$, $\theta \in \Theta \subseteq \mathbb{R}^{M_0}$, and the usual statistic for testing (1.1) is

$$X^2\!\left(Q(\widehat{\theta}_n)\right) = \sum_{i=1}^{\upsilon}\sum_{j=1}^{m} \frac{\left(n_{ij} - n_{i*}q_j(\widehat{\theta}_n)\right)^2}{n_{i*}q_j(\widehat{\theta}_n)}, \qquad (1.7)$$

with $\widehat{\theta}_n$ the multinomial maximum likelihood estimate of $\theta$ under the hypothesis $H_0$ (i.e., treating all the data as one sample with grouping frequencies $n_{*1}, \ldots, n_{*m}$). The limiting distribution of the statistic $X^2(Q(\widehat{\theta}_n))$ under the null hypothesis given in (1.1) is again a chi-square random variable, but with $\upsilon(m-1) - M_0$ degrees of freedom, see for instance Ivchenko and Medvedev (1990), where $M_0$ is the number of parameters which define the hypothetical distribution (the dimension of the vector $\theta$).

In this paper we present a double generalization of the statistic given in (1.7): instead of the multinomial maximum likelihood estimate of $\theta$ under the hypothesis (1.1) we shall consider the minimum ϕ-divergence estimator, and instead of the statistic $X^2$ we shall consider a family of statistics based on the φ-divergence measures that contains the statistic (1.7) as a particular case.

2. Minimum ϕ-divergence Estimator: Properties

To obtain the multinomial maximum likelihood estimator (MLE), $\widehat{\theta}_n$, used in the test statistic (1.7), we must maximize the likelihood function for a sample of size $n$ under the null hypothesis (1.1), i.e., $\prod_{j=1}^{m} q_j(\theta)^{n_{*j}}$. But this is equivalent to minimizing

$$D_\varphi\!\left(\widehat{P}, Q^*(\theta)\right) = c - \frac{1}{n}\log\prod_{j=1}^{m} q_j(\theta)^{n_{*j}}, \quad \text{with } \varphi(x) = x\log x,$$

where $c$ is a constant which does not depend on $\theta$, and $Q^*(\theta)$ is the weighted stacked version of $Q(\theta)$ defined explicitly in Section 3. The MLE is known to be efficient in regular models, but it is also known to be nonrobust. Both the efficiency and the nonrobustness result from specific properties of the logarithmic function appearing in this definition. Replacing the logarithmic function by other functions with appropriate properties, for instance the functions $\varphi(x)$ associated with the ϕ-divergence, one obtains a new class of estimators, called minimum ϕ-divergence estimators, given by

$$\widehat{\theta}_\varphi = \arg\min_{\theta\in\Theta} D_\varphi\!\left(\widehat{P}, Q^*(\theta)\right).$$

Therefore the MLE in the multinomial model, $\widehat{\theta}_n$, coincides with the minimum ϕ-divergence estimator of $\theta$ with $\varphi(x) = x\log x$. Cressie and Read (1984) considered, in goodness-of-fit tests of $H_0 : P = P(\theta)$, as a generalization of the MLE, the minimum power-divergence estimator

$$\arg\min_{\theta\in\Theta} D_{\varphi_{(\lambda)}}\!\left(\widehat{P}, P(\theta)\right),$$

where

$$\varphi_{(\lambda)}(x) = \frac{x^{\lambda+1} - x}{\lambda(\lambda+1)}, \quad \lambda \neq 0, -1; \qquad \varphi_{(0)}(x) = \lim_{\lambda\to 0}\varphi_{(\lambda)}(x), \quad \varphi_{(-1)}(x) = \lim_{\lambda\to -1}\varphi_{(\lambda)}(x). \qquad (2.1)$$
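At the level of the divergence, the two limiting members of (2.1) are the two Kullback-Leibler directions, $D_{\varphi_{(0)}}(P,Q) = \sum_j p_j\log(p_j/q_j)$ and $D_{\varphi_{(-1)}}(P,Q) = \sum_j q_j\log(q_j/p_j)$. A hedged sketch (the function name is ours), compatible with the `phi_divergence` helper above:

```python
import numpy as np

def power_divergence(p, q, lam):
    """Cressie-Read power divergence D_{phi_(lam)}(P, Q) built on (2.1).

    lam = 0 and lam = -1 are the continuity limits KL(P, Q) and KL(Q, P);
    zero cells follow the convention 0 log 0 = 0.  For lam not in {0, -1}:
    D = sum_j p_j * ((p_j / q_j)**lam - 1) / (lam * (lam + 1)).
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    if lam == 0:
        pos = p > 0
        return float(np.sum(p[pos] * np.log(p[pos] / q[pos])))
    if lam == -1:
        pos = q > 0
        return float(np.sum(q[pos] * np.log(q[pos] / p[pos])))
    pos = p > 0                 # cells with p_j = 0 contribute 0 when lam > -1
    return float(np.sum(p[pos] * ((p[pos] / q[pos]) ** lam - 1.0))
                 / (lam * (lam + 1.0)))

# lam = 1 gives half the Pearson distance: sum_j (p_j - q_j)^2 / (2 q_j)
```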

Later, Morales et al. (1995), also in the problem of goodness-of-fit, considered the minimum ϕ-divergence estimator

$$\arg\min_{\theta\in\Theta} D_\varphi\!\left(\widehat{P}, P(\theta)\right).$$

In this paper the minimum ϕ-divergence estimator, $\widehat{\theta}_\varphi$, is considered for the first time in the problem of homogeneity. Minimum distance estimation was introduced by Wolfowitz (1953), and it provides a convenient method of consistently estimating unknown parameters. An extensive bibliography for minimum distance estimates can be found in Parr (1981), with some additions in Read and Cressie (1988) and Morales et al. (1995). Wolfowitz was motivated by the desire to provide consistent parameter estimators in cases where other methods had not proved successful. Other desirable features of minimum distance estimators are their natural robustness properties, a concrete interpretation of the value to which the estimator converges even when the model is wrong, ease of application to problems not involving symmetries or invariance properties, and extremely competitive small-sample behavior in the several situations explored so far by Monte Carlo methods; see, for example, the simulation results given by Parr (1985).
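Computationally, $\widehat{\theta}_\varphi$ is obtained by numerical minimization of $\theta \mapsto D_\varphi(\widehat{P}, Q^*(\theta))$. A minimal sketch with SciPy (our illustration; the paper's own computations in Section 4 use FORTRAN-90 and NAG), written for a scalar parameter as in the simulation study, where `q_model` is a hypothetical user-supplied map $\theta \mapsto (q_1(\theta), \ldots, q_m(\theta))$ with all cells positive on the search interval:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def min_phi_divergence_estimate(counts, q_model, varphi, bounds=(-10.0, 10.0)):
    """Minimum phi-divergence estimator for the homogeneity model,
    theta-hat = argmin_theta D_varphi(P-hat, Q*(theta)), written for a
    scalar theta (M0 = 1), as in the simulation study of Section 4.

    For the stacked vectors of the paper this objective equals
    sum_i (n_{i*}/n) sum_j q_j(theta) * varphi(p_ij-hat / q_j(theta)).
    """
    counts = np.asarray(counts, float)
    ni = counts.sum(axis=1)
    n = ni.sum()
    phat = counts / ni[:, None]

    def objective(theta):
        q = np.asarray(q_model(theta), float)
        return float(np.sum((ni / n)[:, None] * q[None, :]
                            * varphi(phat / q[None, :])))

    res = minimize_scalar(objective, bounds=bounds, method="bounded")
    return res.x

def xlogx(x):
    """varphi(x) = x log x (the MLE case), with the convention 0 log 0 = 0."""
    x = np.asarray(x, float)
    out = np.zeros_like(x)
    pos = x > 0
    out[pos] = x[pos] * np.log(x[pos])
    return out
```

With `varphi = xlogx` this returns the multinomial MLE $\widehat{\theta}_n$; other convex $\varphi$, e.g. the power-divergence members above, give the corresponding minimum ϕ-divergence estimators.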


In the case where the model is discrete, or where the initial information about the data and the hypothetical parametrized model is reduced by partitioning the observation space, the minimum ϕ-divergence estimator is efficient (first order, in the sense of Rao (1961, 1973)) and robust in the sense of Lindsay (1994). For more details see Lindsay (1994) and Basu and Sarkar (1994a,b).

Assuming in our model the standard regularity assumptions given by Birch (1964), as well as that the mapping $Q : \Theta \to \Delta_m$ has continuous second partial derivatives in a neighborhood of the true value of the parameter $\theta_0$, we can use the same method as the one given by Cox (1984) in Theorem 2 to establish

$$\widehat{\theta}_\varphi = \theta_0 + \left(A^\star(\theta_0)^t A^\star(\theta_0)\right)^{-1} A^\star(\theta_0)^t\,\mathrm{diag}\!\left(Q^\star(\theta_0)\right)^{-1/2}\left(\widehat{P} - Q^\star(\theta_0)\right) + o\!\left(\left\|\widehat{P} - Q^\star(\theta_0)\right\|\right),$$

with

$$A^\star(\theta_0) = \mathrm{diag}\!\left(Q^\star(\theta_0)\right)^{-1/2}\left(\frac{\partial Q^\star(\theta_0)}{\partial\theta}\right)_{\upsilon m\times M_0}.$$

If we denote $\lambda_i = \lim_{n\to\infty} n_{i*}/n$, $i = 1, \ldots, \upsilon$, and $\lambda = (\lambda_1, \ldots, \lambda_\upsilon)^t$, it is clear that

$$\sqrt{n}\left(\widehat{P} - Q^\star(\theta_0)\right) \xrightarrow[n\to\infty]{L} N\!\left(0_{\upsilon m\times 1},\ \mathrm{diag}(\lambda)\otimes\Sigma_{Q(\theta_0)}\right),$$

where $0_{a\times b}$ is the $a\times b$ null matrix, $\Sigma_{Q(\theta_0)} = \mathrm{diag}(Q(\theta_0)) - Q(\theta_0)Q(\theta_0)^t$, $\mathrm{diag}(\lambda)$ is the diagonal matrix with the vector $\lambda$ on its diagonal, and $\otimes$ is the Kronecker product between the respective matrices, given by

$$\mathrm{diag}(\lambda)\otimes\Sigma_{Q(\theta_0)} =
\begin{pmatrix}
\lambda_1\Sigma_{Q(\theta_0)} & 0 & \cdots & 0\\
0 & \lambda_2\Sigma_{Q(\theta_0)} & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & \lambda_\upsilon\Sigma_{Q(\theta_0)}
\end{pmatrix}.$$

If we denote

$$Q_\lambda(\theta_0) = \left(\lambda_1 q_1(\theta_0), \ldots, \lambda_1 q_m(\theta_0), \ldots, \lambda_\upsilon q_1(\theta_0), \ldots, \lambda_\upsilon q_m(\theta_0)\right)^t,$$

we can see that $Q_\lambda(\theta_0) = \lim_{n\to\infty} Q^*(\theta_0)$. It can also be seen that

$$\mathrm{diag}(\lambda)\otimes\Sigma_{Q(\theta_0)} = \mathrm{diag}\!\left(Q_\lambda(\theta_0)\right)\left(I - A_0\right),$$

where

$$A_0 = X_0\left(X_0^t\,\mathrm{diag}(Q_\lambda(\theta_0))\,X_0\right)^{-1} X_0^t\,\mathrm{diag}(Q_\lambda(\theta_0))$$

and

$$X_0 =
\begin{pmatrix}
1_m & 0 & \cdots & 0\\
0 & 1_m & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & 1_m
\end{pmatrix}_{(\upsilon m)\times\upsilon},$$


where $1_m$ is the $m\times 1$ unit vector.

In the following proposition we present some results which will be used in the next theorem.

Proposition 2.1. If we denote by

$$A_\lambda(\theta_0) = \mathrm{diag}\!\left(Q_\lambda(\theta_0)\right)^{-1/2}\left(\frac{\partial Q_\lambda(\theta_0)}{\partial\theta}\right)_{\upsilon m\times M_0}$$

and we consider the matrices

$$B_0 = \mathrm{diag}\!\left(Q_\lambda(\theta_0)\right)^{1/2} A_0\,\mathrm{diag}\!\left(Q_\lambda(\theta_0)\right)^{-1/2}$$

and

$$A = A_\lambda(\theta_0)\left(A_\lambda(\theta_0)^t A_\lambda(\theta_0)\right)^{-1} A_\lambda(\theta_0)^t,$$

then we have that: (i) $A_0 A_0 = A_0$; (ii) $AA = A$; (iii) $AB_0 = 0_{\upsilon m\times\upsilon m}$; (iv) $B_0 B_0 = B_0$.

Proof. We know that

$$\left(X_0^t\,\mathrm{diag}(Q_\lambda(\theta_0))\,X_0\right)\left(X_0^t\,\mathrm{diag}(Q_\lambda(\theta_0))\,X_0\right)^{-1} = I_{\upsilon\times\upsilon},$$

where $I_{a\times a}$ is the identity matrix of order $a$. Then

$$A_0 A_0 = X_0\left(X_0^t\,\mathrm{diag}(Q_\lambda(\theta_0))\,X_0\right)^{-1} X_0^t\,\mathrm{diag}(Q_\lambda(\theta_0))\,X_0\left(X_0^t\,\mathrm{diag}(Q_\lambda(\theta_0))\,X_0\right)^{-1} X_0^t\,\mathrm{diag}(Q_\lambda(\theta_0)) = A_0,$$

and we have the first equality. For (ii) we have

$$AA = A_\lambda(\theta_0)\left(A_\lambda(\theta_0)^t A_\lambda(\theta_0)\right)^{-1} A_\lambda(\theta_0)^t A_\lambda(\theta_0)\left(A_\lambda(\theta_0)^t A_\lambda(\theta_0)\right)^{-1} A_\lambda(\theta_0)^t = A.$$

Next we establish the equality (iii). We have

$$B_0 = \mathrm{diag}\!\left(Q_\lambda(\theta_0)\right)^{1/2} A_0\,\mathrm{diag}\!\left(Q_\lambda(\theta_0)\right)^{-1/2} = I_{\upsilon\times\upsilon}\otimes Q(\theta_0)^{1/2}\left(Q(\theta_0)^{1/2}\right)^t$$

and $A_\lambda(\theta_0) = \lambda^{1/2}\otimes A(\theta_0)$, where

$$A(\theta_0) = \mathrm{diag}\!\left(Q(\theta_0)\right)^{-1/2}\left(\frac{\partial Q(\theta_0)}{\partial\theta}\right)_{m\times M_0}.$$

Then

$$AB_0 = A_\lambda(\theta_0)\left(A_\lambda(\theta_0)^t A_\lambda(\theta_0)\right)^{-1} A_\lambda(\theta_0)^t B_0.$$

But if we denote $V = A_\lambda(\theta_0)^t B_0$, we get

$$V = \left(\left(\lambda^{1/2}\right)^t\otimes A(\theta_0)^t\right)\left(I_{\upsilon\times\upsilon}\otimes Q(\theta_0)^{1/2}\left(Q(\theta_0)^{1/2}\right)^t\right) = \left(\lambda^{1/2}\right)^t\otimes A(\theta_0)^t\,Q(\theta_0)^{1/2}\left(Q(\theta_0)^{1/2}\right)^t = \left(\lambda^{1/2}\right)^t\otimes 0_{M_0\times m} = 0_{M_0\times(\upsilon m)}.$$

The last equality (iv) follows in a similar way as the previous one. □
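The identities (i)-(iv) are easy to check numerically on a toy configuration; the following sketch (the dimensions, $\lambda$ and the model are arbitrary choices of ours) builds $A_0$, $B_0$ and $A$ exactly as defined above:

```python
import numpy as np

nu, m = 2, 3                                     # toy sizes: 2 populations, 3 cells
lam = np.array([0.4, 0.6])                       # limiting sampling proportions
theta0 = 0.3
q = np.array([theta0, 0.5 * (1 - theta0), 0.5 * (1 - theta0)])  # Q(theta0)
dq = np.array([[1.0], [-0.5], [-0.5]])           # dQ/dtheta (entries sum to 0)

Q_lam = np.kron(lam, q)                          # Q_lambda(theta0), length nu*m
D = np.diag(Q_lam)
Dh = np.diag(np.sqrt(Q_lam))                     # diag(Q_lambda)^{1/2}
X0 = np.kron(np.eye(nu), np.ones((m, 1)))        # (nu*m) x nu matrix of 1_m blocks

A0 = X0 @ np.linalg.inv(X0.T @ D @ X0) @ X0.T @ D
B0 = Dh @ A0 @ np.linalg.inv(Dh)
A_lam = np.linalg.inv(Dh) @ np.kron(lam.reshape(-1, 1), dq)   # A_lambda(theta0)
A = A_lam @ np.linalg.inv(A_lam.T @ A_lam) @ A_lam.T

checks = {"(i)  A0 A0 = A0": np.allclose(A0 @ A0, A0),
          "(ii)  A A  = A ": np.allclose(A @ A, A),
          "(iii) A B0 = 0 ": np.allclose(A @ B0, 0.0),
          "(iv) B0 B0 = B0": np.allclose(B0 @ B0, B0)}
print(checks)   # all True
```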

Now we shall obtain the asymptotic distribution of the minimum ϕ-divergence estimator.

Theorem 2.1. Let $\varphi$ be convex and twice continuously differentiable in an open neighborhood of 1, with $\varphi(1) = 0$ and $\varphi''(1) = 1$. If the model verifies Birch's conditions, then the asymptotic distribution of the minimum ϕ-divergence estimator $\widehat{\theta}_\varphi$ is given by

$$\sqrt{n}\left(\widehat{\theta}_\varphi - \theta_0\right) \xrightarrow[n\to\infty]{L} N\!\left(0_{M_0\times 1},\ I_F(\theta_0)^{-1}\right),$$

where $I_F(\theta_0)$ is the Fisher information matrix in the multinomial model.

Proof. We know that

$$\widehat{\theta}_\varphi = \theta_0 + \left(A^\star(\theta_0)^t A^\star(\theta_0)\right)^{-1} A^\star(\theta_0)^t\,\mathrm{diag}\!\left(Q^\star(\theta_0)\right)^{-1/2}\left(\widehat{P} - Q^\star(\theta_0)\right) + o\!\left(\left\|\widehat{P} - Q^\star(\theta_0)\right\|\right)$$

and

$$\sqrt{n}\left(\widehat{P} - Q^\star(\theta_0)\right) \xrightarrow[n\to\infty]{L} N\!\left(0_{\upsilon m\times 1},\ \mathrm{diag}(Q_\lambda(\theta_0))\left(I_{\upsilon m\times\upsilon m} - A_0\right)\right).$$

Then

$$\sqrt{n}\left(\widehat{\theta}_\varphi - \theta_0\right) \xrightarrow[n\to\infty]{L} N\!\left(0_{M_0\times 1},\ W\right),$$

where, writing $D_\lambda = \mathrm{diag}\!\left(Q_\lambda(\theta_0)\right)$ for brevity,

$$W = \left(A_\lambda(\theta_0)^t A_\lambda(\theta_0)\right)^{-1} A_\lambda(\theta_0)^t\,D_\lambda^{-1/2}\,D_\lambda\left(I_{\upsilon m\times\upsilon m} - A_0\right)D_\lambda^{-1/2} A_\lambda(\theta_0)\left(A_\lambda(\theta_0)^t A_\lambda(\theta_0)\right)^{-1}.$$

It is easy to see that

$$W = \left(A_\lambda(\theta_0)^t A_\lambda(\theta_0)\right)^{-1} - \left(A_\lambda(\theta_0)^t A_\lambda(\theta_0)\right)^{-1} V A_\lambda(\theta_0)\left(A_\lambda(\theta_0)^t A_\lambda(\theta_0)\right)^{-1},$$

where

$$V = A_\lambda(\theta_0)^t\,D_\lambda^{1/2} A_0\,D_\lambda^{-1/2},$$

and by Proposition 2.1 we have $V = 0$. Then

$$W = \left(A_\lambda(\theta_0)^t A_\lambda(\theta_0)\right)^{-1}.$$

But

$$A_\lambda(\theta_0)^t A_\lambda(\theta_0) = \left(\left(\lambda^{1/2}\right)^t\otimes A(\theta_0)^t\right)\left(\lambda^{1/2}\otimes A(\theta_0)\right) = \left(\lambda^{1/2}\right)^t\lambda^{1/2}\otimes A(\theta_0)^t A(\theta_0) = 1\otimes A(\theta_0)^t A(\theta_0) = A(\theta_0)^t A(\theta_0). \qquad\Box$$

In the previous theorem it has been proved that the minimum ϕ-divergence estimator is $\sqrt{n}$-consistent, asymptotically normal, and efficient in the classical Fisher information sense. On the other hand, maximum likelihood estimators are likewise justified mainly through their large sample properties of consistency and efficiency. Hence the question which naturally arises is: why should we consider alternatives to the maximum likelihood estimator? Berkson (1980) evaluated the exact mean square error of the minimum chi-square estimator, which is the minimum ϕ-divergence estimator in the special case where $\varphi(x) = (x-1)^2$. He found that the mean square error of the minimum chi-square estimator is smaller than that of the maximum likelihood estimator in all cases considered. Hence, although the maximum likelihood estimator and the minimum ϕ-divergence estimator have the same asymptotic covariance matrix, to order $n^{-1}$, and share analogous large sample properties, there are cases where the minimum ϕ-divergence estimator performs better, and it is also characterized by nice robustness properties. Motivated by this remark, we present in this paper a new family of statistics for the problem of homogeneity of multinomial populations when the unknown parameters are estimated by the minimum ϕ-divergence estimator instead of the MLE.

The following corollary will help to establish the asymptotic distribution of the φ-divergence family of statistics for the problem of homogeneity of multinomial populations.

Corollary 2.1. Under the hypotheses of the previous theorem we have

$$\sqrt{n}\left(Q^\star(\widehat{\theta}_\varphi) - Q^\star(\theta_0)\right) \xrightarrow[n\to\infty]{L} N\!\left(0_{\upsilon m\times 1},\ \mathrm{diag}\!\left(Q_\lambda(\theta_0)\right)^{1/2} A\,\mathrm{diag}\!\left(Q_\lambda(\theta_0)\right)^{1/2}\right),$$

where $A$ is given in Proposition 2.1.

Proof. By a first order Taylor expansion we have

$$\sqrt{n}\left(Q^\star(\widehat{\theta}_\varphi) - Q^\star(\theta_0)\right) = \frac{\partial Q^\star(\theta_0)}{\partial\theta}\sqrt{n}\left(\widehat{\theta}_\varphi - \theta_0\right) + o_P(1) = \mathrm{diag}\!\left(Q^\star(\theta_0)\right)^{1/2} A^\star(\theta_0)\left(A^\star(\theta_0)^t A^\star(\theta_0)\right)^{-1} A^\star(\theta_0)^t\,\mathrm{diag}\!\left(Q^\star(\theta_0)\right)^{-1/2}\sqrt{n}\left(\widehat{P} - Q^\star(\theta_0)\right) + o_P(1).$$

As $\lim_{n\to\infty} A^\star(\theta_0)\left(A^\star(\theta_0)^t A^\star(\theta_0)\right)^{-1} A^\star(\theta_0)^t = A$, it follows that

$$\sqrt{n}\left(Q^\star(\widehat{\theta}_\varphi) - Q^\star(\theta_0)\right) \xrightarrow[n\to\infty]{L} N\!\left(0_{\upsilon m\times 1},\ \Sigma_1\right)$$

with, again writing $D_\lambda = \mathrm{diag}\!\left(Q_\lambda(\theta_0)\right)$,

$$\Sigma_1 = D_\lambda^{1/2} A\,D_\lambda^{-1/2}\,D_\lambda\left(I_{\upsilon m\times\upsilon m} - A_0\right)D_\lambda^{-1/2} A^t\,D_\lambda^{1/2} = D_\lambda^{1/2} A\left(I_{\upsilon m\times\upsilon m} - B_0\right)A\,D_\lambda^{1/2} = D_\lambda^{1/2}\left(A - AB_0 A\right)D_\lambda^{1/2} = D_\lambda^{1/2} A\,D_\lambda^{1/2}.$$

The last equality follows from Proposition 2.1 (iii). □

3. Test of Homogeneity based on φ-divergences and the Minimum ϕ-divergence Estimator

The observed numbers $n_{ij}$ of the components of $X^{(i)}$, $i = 1, \ldots, \upsilon$, belonging to the interval $I_j$, $j = 1, \ldots, m$, constitute independent multinomial random vectors $(n_{i1}, \ldots, n_{im})^t$ with parameters $n_{i*} = \sum_{j=1}^m n_{ij}$ and $P_i = (p_{i1}, \ldots, p_{im})^t$, for $i = 1, \ldots, \upsilon$. The MLEs of the $p_{ij}$ under $H_0$ given by (1.1) are denoted by $\widehat{p}_{ij} = n_{ij}/n_{i*}$, with $n = \sum_{i=1}^{\upsilon} n_{i*} = \sum_{i=1}^{\upsilon}\sum_{j=1}^{m} n_{ij}$. In the sequel we will use the notation $\widehat{P}_i = (\widehat{p}_{i1}, \ldots, \widehat{p}_{im})^t$ and $\widehat{P} = \left((n_{1*}/n)\widehat{P}_1^t, \ldots, (n_{\upsilon*}/n)\widehat{P}_\upsilon^t\right)^t$. In relation with the vector $Q(\theta) = (q_1(\theta), \ldots, q_m(\theta))^t$, $\theta\in\Theta\subseteq\mathbb{R}^{M_0}$, we define the vector

$$Q^\star(\theta) = \left((n_{1*}/n)q_1(\theta), \ldots, (n_{1*}/n)q_m(\theta), \ldots, (n_{\upsilon*}/n)q_1(\theta), \ldots, (n_{\upsilon*}/n)q_m(\theta)\right)^t = \left(q_1^\star(\theta), \ldots, q_{m\upsilon}^\star(\theta)\right)^t.$$


As a generalization of the statistic given in (1.7), and in the same sense as the generalizations given in (1.5) and (1.6), we consider in this paper the statistic based on the φ-divergence measures given by

$$T_n^\phi\!\left(Q(\widehat{\theta}_\varphi)\right) = 2n\,D_\phi\!\left(\widehat{P}, Q^\star(\widehat{\theta}_\varphi)\right) = 2\sum_{i=1}^{\upsilon} n_{i*}\,D_\phi\!\left(\widehat{P}_i, Q(\widehat{\theta}_\varphi)\right), \qquad (3.1)$$

where $\widehat{\theta}_\varphi = \arg\min_{\theta\in\Theta} D_\varphi(\widehat{P}, Q^\star(\theta))$ is the minimum ϕ-divergence estimator of $\theta$. If we consider the functions $\phi(x) = \frac{1}{2}(x-1)^2$ and $\varphi(x) = x\log x$, we get

$$T_n^\phi\!\left(Q(\widehat{\theta}_n)\right) = X^2\!\left(Q(\widehat{\theta}_n)\right).$$

On the basis of the statistic (3.1), we must reject the null hypothesis (1.1) iff

$$T_n^\phi\!\left(Q(\widehat{\theta}_\varphi)\right) > k,$$

where $k$ must be chosen to obtain a level $\alpha$ test. In some situations it will be possible to obtain the exact distribution of the statistic $T_n^\phi(Q(\widehat{\theta}_\varphi))$, and hence the value $k$. But in general this is not possible, and we must use the asymptotic distribution of the statistic $T_n^\phi(Q(\widehat{\theta}_\varphi))$, which we present in the following theorem.

Theorem 3.1. Under the assumptions of Theorem 2.1, we have

$$T_n^\phi\!\left(Q(\widehat{\theta}_\varphi)\right) \xrightarrow[n\to\infty]{L} \chi^2_{(m-1)\upsilon - M_0}.$$

Proof. It is clear that

$$T_n^\phi\!\left(Q(\widehat{\theta}_\varphi)\right) = \left(\sqrt{n}\,B^{1/2}\left(\widehat{P} - Q^\star(\widehat{\theta}_\varphi)\right)\right)^t\left(\sqrt{n}\,B^{1/2}\left(\widehat{P} - Q^\star(\widehat{\theta}_\varphi)\right)\right) + o_P(1),$$

where $B = \mathrm{diag}\!\left(Q^\star(\theta_0)\right)^{-1}$. But

$$\sqrt{n}\left(Q^\star(\widehat{\theta}_\varphi) - Q^\star(\theta_0)\right) = \mathrm{diag}\!\left(Q^*(\theta_0)\right)^{1/2} A\,\mathrm{diag}\!\left(Q^*(\theta_0)\right)^{-1/2}\sqrt{n}\left(\widehat{P} - Q^\star(\theta_0)\right) + o_P(1).$$

Then, writing again $D_\lambda = \mathrm{diag}\!\left(Q_\lambda(\theta_0)\right)$ and $I$ for $I_{\upsilon m\times\upsilon m}$,

$$\sqrt{n}\begin{pmatrix}\widehat{P} - Q^\star(\theta_0)\\ Q^\star(\widehat{\theta}_\varphi) - Q^\star(\theta_0)\end{pmatrix} \xrightarrow[n\to\infty]{L} N\!\left(0_{2\upsilon m\times 1},\ \Sigma\right),$$

where

$$\Sigma = \begin{pmatrix}I\\ L\end{pmatrix} D_\lambda\left(I - A_0\right)\begin{pmatrix}I & L^t\end{pmatrix}, \qquad L = D_\lambda^{1/2} A\,D_\lambda^{-1/2}.$$

Written out in blocks,

$$\Sigma^\star = \begin{pmatrix} D_\lambda\left(I - A_0\right) & D_\lambda\left(I - A_0\right)L^t\\ L\,D_\lambda\left(I - A_0\right) & L\,D_\lambda\left(I - A_0\right)L^t \end{pmatrix}.$$

It is then clear that

$$\sqrt{n}\left(\widehat{P} - Q^\star(\widehat{\theta}_\varphi)\right) \xrightarrow[n\to\infty]{L} N\!\left(0_{\upsilon m\times 1},\ \Sigma_2\right)$$

with

$$\Sigma_2 = D_\lambda\left(I - A_0\right) - L\,D_\lambda\left(I - A_0\right) - D_\lambda\left(I - A_0\right)L^t + L\,D_\lambda\left(I - A_0\right)L^t.$$

Therefore

$$T_n^\phi\!\left(Q(\widehat{\theta}_\varphi)\right) \xrightarrow[n\to\infty]{L} \sum_{i=1}^{m\upsilon}\mu_i Z_i^2,$$

where the $Z_i$ are independent normal variables with mean 0 and variance 1, and the $\mu_i$, $i = 1, \ldots, m\upsilon$, are the eigenvalues of the matrix $D_\lambda^{-1}\Sigma_2$. But the eigenvalues of this matrix are the same as those of the matrix

$$T^* = D_\lambda^{-1/2}\Sigma_2\,D_\lambda^{-1/2},$$

which can be written in the form

$$T^* = D_\lambda^{-1/2}D_\lambda\left(I - A_0\right)D_\lambda^{-1/2} - D_\lambda^{-1/2}L\,D_\lambda\left(I - A_0\right)D_\lambda^{-1/2} - D_\lambda^{-1/2}D_\lambda\left(I - A_0\right)L^t D_\lambda^{-1/2} + D_\lambda^{-1/2}L\,D_\lambda\left(I - A_0\right)L^t D_\lambda^{-1/2}.$$

But:

(a) $D_\lambda^{-1/2} L\,D_\lambda^{1/2} = A$.

(b) If we denote $C = D_\lambda^{-1/2} L\,D_\lambda A_0 D_\lambda^{-1/2}$, we have

$$C = D_\lambda^{-1/2}D_\lambda^{1/2} A\,D_\lambda^{-1/2}\,D_\lambda A_0 D_\lambda^{-1/2} = A\,D_\lambda^{1/2} A_0 D_\lambda^{-1/2} = AB_0 = 0.$$

(c) In the same way, $D_\lambda^{1/2} A_0 D_\lambda^{-1/2} A = B_0 A = 0$.

Then, in view of (a), (b) and (c), the matrix

$$T^* = I - B_0 - A - A + A = I - B_0 - A$$

is idempotent, since

$$T^* T^* = \left(I - B_0 - A\right)\left(I - B_0 - A\right) = T^*.$$

Hence the eigenvalues of $T^*$ are only 1 and 0. We take the trace of $T^*$ to find the number of unit eigenvalues:

$$\mathrm{Trace}(T^*) = \mathrm{Trace}(I) - \mathrm{Trace}(B_0) - \mathrm{Trace}(A).$$

It is clear that $\mathrm{Trace}(I_{(\upsilon m)\times(\upsilon m)}) = \upsilon m$, $\mathrm{Trace}(A) = M_0$, and

$$\mathrm{Trace}(B_0) = \mathrm{Trace}\!\left(D_\lambda^{1/2} A_0 D_\lambda^{-1/2}\right) = \mathrm{Trace}(A_0) = \mathrm{Trace}\!\left(X_0\left(X_0^t D_\lambda X_0\right)^{-1} X_0^t D_\lambda\right) = \mathrm{Trace}\!\left(\left(X_0^t D_\lambda X_0\right)^{-1} X_0^t D_\lambda X_0\right) = \mathrm{Trace}(I_{\upsilon\times\upsilon}) = \upsilon,$$

therefore

$$T_n^\phi\!\left(Q(\widehat{\theta}_\varphi)\right) \xrightarrow[n\to\infty]{L} \chi^2_{(m-1)\upsilon - M_0}. \qquad\Box$$

On the basis of this theorem we must reject the null hypothesis given in (1.1) at level $\alpha$ iff

$$T_n^\phi\!\left(Q(\widehat{\theta}_\varphi)\right) > \chi^2_{(m-1)\upsilon-M_0,\alpha}, \qquad (3.2)$$

where $P\!\left(\chi^2_{(m-1)\upsilon-M_0} > \chi^2_{(m-1)\upsilon-M_0,\alpha}\right) = \alpha$.
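As a sketch of the complete decision rule (3.2), reusing the hypothetical helpers `phi_divergence` and `min_phi_divergence_estimate` from the earlier sketches (our illustration, not the authors' implementation):

```python
import numpy as np
from scipy.stats import chi2

def phi_test_homogeneity(counts, q_model, phi, varphi, alpha=0.05, M0=1):
    """Reject H0 iff T_n^phi(Q(theta-hat_varphi)) > chi2_{(m-1)*nu - M0, alpha}.

    phi defines the test statistic (3.1), varphi the estimator; both are
    convex with the normalizations assumed in the paper.
    """
    counts = np.asarray(counts, float)
    nu, m = counts.shape
    ni = counts.sum(axis=1)

    theta_hat = min_phi_divergence_estimate(counts, q_model, varphi)
    q = np.asarray(q_model(theta_hat), float)

    stat = 2.0 * sum(ni[i] * phi_divergence(counts[i] / ni[i], q, phi)
                     for i in range(nu))
    df = (m - 1) * nu - M0
    crit = chi2.ppf(1.0 - alpha, df)
    return stat, crit, bool(stat > crit)   # statistic, critical value, reject?
```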

We will present, in the sequel, an approximation for the power function of the previous test. Let $Q = (\lambda_1 q_{11}, \ldots, \lambda_1 q_{1m}, \ldots, \lambda_\upsilon q_{\upsilon 1}, \ldots, \lambda_\upsilon q_{\upsilon m})^t$ be a point in the alternative hypothesis. If the alternative hypothesis $Q$ is true, we have that $\widehat{P}$ tends in probability to $Q$. We denote by $\theta_a$ the point of $\Theta$ verifying

$$\theta_a = \arg\min_{\theta\in\Theta} D_\varphi\!\left(Q, Q^*(\theta)\right).$$

Then we have that $Q^*(\widehat{\theta}_\varphi)$ tends in probability to $Q_\lambda(\theta_a)$. In the next theorem we use the following assumption:

$$\sqrt{n}\left(\left(\widehat{P}, Q^*(\widehat{\theta}_\varphi)\right) - \left(Q, Q_\lambda(\theta_a)\right)\right) \xrightarrow[n\to\infty]{L} N(0, \Sigma_3) \qquad (3.3)$$

under the alternative hypothesis $Q$, where

$$\Sigma_3 = \begin{pmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{pmatrix}, \qquad \Sigma_{11} = \mathrm{diag}(Q) - QQ^t, \qquad \Sigma_{12} = \Sigma_{21}^t.$$

Theorem 3.2. The asymptotic power of the asymptotically $\alpha$-level test given by

$$T_n^\phi\!\left(Q(\widehat{\theta}_\varphi)\right) > \chi^2_{(m-1)\upsilon-M_0;\alpha},$$

at the point $Q$ of the alternative hypothesis, assuming the condition (3.3), is given by

$$\beta_{T_n^\phi(Q(\widehat{\theta}_\varphi))}(Q) = 1 - \Phi\!\left(\frac{1}{\sigma}\left(\frac{1}{2\sqrt{n}}\,\chi^2_{(m-1)\upsilon-M_0;\alpha} - \sqrt{n}\,D_\phi\!\left(Q, Q_\lambda(\theta_a)\right)\right)\right),$$

where $\sigma^2 = T^t\Sigma_{11}T + 2T^t\Sigma_{12}S + S^t\Sigma_{22}S$,

$$T^t = \left(\nabla_1 D_\phi\!\left(P, Q_\lambda(\theta_a)\right)\right)_{P=Q} \quad\text{with}\quad \nabla_1 = \left(\frac{\partial}{\partial p_{11}}, \ldots, \frac{\partial}{\partial p_{\upsilon m}}\right),$$

and

$$S^t = \left(\nabla_2 D_\phi\!\left(Q, W\right)\right)_{W=Q_\lambda(\theta_a)} \quad\text{with}\quad \nabla_2 = \left(\frac{\partial}{\partial w_{11}}, \ldots, \frac{\partial}{\partial w_{\upsilon m}}\right).$$

Proof. A first order Taylor expansion of the divergence measure gives

$$D_\phi\!\left(\widehat{P}, Q^*(\widehat{\theta}_\varphi)\right) = D_\phi\!\left(Q, Q_\lambda(\theta_a)\right) + T^t\left(\widehat{P} - Q\right) + S^t\left(Q^*(\widehat{\theta}_\varphi) - Q_\lambda(\theta_a)\right) + o\!\left(\left\|\widehat{P} - Q\right\| + \left\|Q^*(\widehat{\theta}_\varphi) - Q_\lambda(\theta_a)\right\|\right),$$

and the result follows from here. □


Corollary 3.1. The test given in the previous theorem is consistent in the sense of Fraser (1957), i.e., for every alternative $Q$ it follows that

$$\lim_{n\to\infty}\beta_{T_n^\phi(Q(\widehat{\theta}_\varphi))}(Q) = 1 \quad\text{for all } 0 < \alpha < 1. \qquad\Box$$

The result obtained in Corollary 3.1 can be established without using the assumption (3.3), because for every alternative $Q$ we have

$$D_\phi\!\left(\widehat{P}, Q^*(\widehat{\theta}_\varphi)\right) \xrightarrow[n\to\infty]{P} D_\phi\!\left(Q, Q_\lambda(\theta_a)\right) > 0.$$

Then under the alternative hypothesis $Q$ it follows that

$$P\!\left(T_n^\phi\!\left(Q(\widehat{\theta}_\varphi)\right) > \chi^2_{(m-1)\upsilon-M_0,\alpha}\right) = P\!\left(D_\phi\!\left(\widehat{P}, Q^*(\widehat{\theta}_\varphi)\right) > \frac{\chi^2_{(m-1)\upsilon-M_0,\alpha}}{2n}\right) \longrightarrow 1$$

as $n\to\infty$. Hence the power function $\beta_{T_n^\phi(Q(\widehat{\theta}_\varphi))}(Q)$ verifies

$$\beta_{T_n^\phi(Q(\widehat{\theta}_\varphi))}(Q) = P\!\left(D_\phi\!\left(\widehat{P}, Q^*(\widehat{\theta}_\varphi)\right) > \frac{\chi^2_{(m-1)\upsilon-M_0,\alpha}}{2n}\right) \xrightarrow[n\to\infty]{} 1.$$

Now we present another approximation of the power function of the test given in (3.1). We consider local alternatives of the form

$$H_{1,n} : P_i^{(n)} = Q(\theta_0) + \frac{1}{\sqrt{n_{i*}}}\,c_i, \quad i = 1, \ldots, \upsilon, \qquad (3.4)$$

with $c_i = (c_{i1}, \ldots, c_{im})^t$ and $\sum_{j=1}^{m} c_{ij} = 0$ for $i = 1, \ldots, \upsilon$.

Theorem 3.3. If the assumptions of Theorem 2.1 are satisfied, then under the local alternatives $H_{1,n}$ given in (3.4), the asymptotic distribution of the statistic $T_n^\phi(Q(\widehat{\theta}_\varphi))$ is a noncentral chi-square distribution with $(m-1)\upsilon - M_0$ degrees of freedom and noncentrality parameter $\delta^t\delta$, with

$$\delta = B_\lambda\left(I_{\upsilon m\times\upsilon m} - L\right)d,$$

where $d = \left(\lambda_1^{1/2}c_1^t, \ldots, \lambda_\upsilon^{1/2}c_\upsilon^t\right)^t$, $B_\lambda = \mathrm{diag}\!\left(Q_\lambda(\theta_0)\right)^{-1/2}$, and the matrix $L$ is given in Theorem 3.1.

Proof. If we denote

$$P^{(n)} = \left(\frac{n_{1*}}{n}P_1^{(n)t}, \ldots, \frac{n_{\upsilon*}}{n}P_\upsilon^{(n)t}\right)^t,$$

we have

$$\sqrt{n}\left(\widehat{P} - Q^*(\theta_0)\right) = \sqrt{n}\left(\widehat{P} - P^{(n)}\right) + \sqrt{n}\left(P^{(n)} - Q^*(\theta_0)\right).$$

But

$$\sqrt{n}\left(P^{(n)} - Q^*(\theta_0)\right) = \left(\sqrt{\frac{n_{1*}}{n}}\,c_1^t, \ldots, \sqrt{\frac{n_{\upsilon*}}{n}}\,c_\upsilon^t\right)^t$$

with

$$\left(\sqrt{\frac{n_{1*}}{n}}\,c_1^t, \ldots, \sqrt{\frac{n_{\upsilon*}}{n}}\,c_\upsilon^t\right)^t \xrightarrow[n\to\infty]{} \left(\lambda_1^{1/2}c_1^t, \ldots, \lambda_\upsilon^{1/2}c_\upsilon^t\right)^t \equiv d.$$

Then

$$\sqrt{n}\left(\widehat{P} - Q^*(\theta_0)\right) \xrightarrow[n\to\infty]{L} N\!\left(d,\ \mathrm{diag}(Q_\lambda(\theta_0))\left(I_{\upsilon m\times\upsilon m} - A_0\right)\right).$$

On the other hand,

$$T_n^\phi\!\left(Q(\widehat{\theta}_\varphi)\right) = X^t X + o_P(1), \qquad X = \sqrt{n}\,B_\lambda\left(\widehat{P} - Q^\star(\widehat{\theta}_\varphi)\right),$$

and

$$X \xrightarrow[n\to\infty]{L} N\!\left(B_\lambda\left(I_{\upsilon m\times\upsilon m} - L\right)d,\ B_\lambda\Sigma_2 B_\lambda\right),$$

where $\Sigma_2$ is the matrix appearing in Theorem 3.1. Now the result follows from the Lemma on p. 63 of Ferguson (1996), provided $B_\lambda\Sigma_2 B_\lambda$ is a projection of rank $(m-1)\upsilon - M_0$ and $B_\lambda\Sigma_2 B_\lambda\delta = \delta$ with $\delta = B_\lambda(I_{\upsilon m\times\upsilon m} - L)d$. The first of these properties was proved in Theorem 3.1. Now we shall prove that $B_\lambda\Sigma_2 B_\lambda\delta = \delta$. We have

$$\begin{aligned}
B_\lambda\Sigma_2 B_\lambda\delta &= \left(I_{\upsilon m\times\upsilon m} - B_0 - A\right)\delta\\
&= \delta - B_0 B_\lambda\left(I_{\upsilon m\times\upsilon m} - L\right)d - AB_\lambda\left(I_{\upsilon m\times\upsilon m} - L\right)d\\
&= \delta - B_0 B_\lambda d + B_0 B_\lambda L d - AB_\lambda d + AB_\lambda L d\\
&= \delta - B_0 B_\lambda d + B_0 B_\lambda B_\lambda^{-1}AB_\lambda d - AB_\lambda d + AB_\lambda B_\lambda^{-1}AB_\lambda d.
\end{aligned}$$

We know that $AA = A$ by Proposition 2.1; then

$$B_\lambda\Sigma_2 B_\lambda\delta = \delta - B_0 B_\lambda d + B_0 A B_\lambda d.$$

But

$$\begin{aligned}
B_0 B_\lambda d &= \left(I_{\upsilon\times\upsilon}\otimes Q(\theta_0)^{1/2}\left(Q(\theta_0)^{1/2}\right)^t\right)\left(\mathrm{diag}\!\left(\lambda^{-1/2}\right)\otimes\mathrm{diag}\!\left(Q(\theta_0)\right)^{-1/2}\right)d\\
&= \left(\mathrm{diag}\!\left(\lambda^{-1/2}\right)\otimes Q(\theta_0)^{1/2}\left(Q(\theta_0)^{1/2}\right)^t\mathrm{diag}\!\left(Q(\theta_0)\right)^{-1/2}\right)d\\
&= \left(q_1(\theta_0)^{1/2}\sum_{j=1}^{m} c_{1j}, \ldots, q_m(\theta_0)^{1/2}\sum_{j=1}^{m} c_{\upsilon j}\right)^t = 0_{\upsilon m\times 1},
\end{aligned}$$

and by Proposition 2.1 (iii) we have $B_\lambda\Sigma_2 B_\lambda\delta = \delta$. □
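Once the noncentrality parameter $\delta^t\delta$ has been computed for a given model (it requires the matrices $B_\lambda$ and $L$ above, so it is model-specific), the approximate local power of the test is a noncentral chi-square tail probability. A one-function sketch:

```python
from scipy.stats import chi2, ncx2

def local_power(noncentrality, m, nu, M0, alpha=0.05):
    """Approximate power under H_{1,n} of (3.4):
    P( chi2_df(delta' delta) > chi2_{df, alpha} ), with df = (m-1)*nu - M0."""
    df = (m - 1) * nu - M0
    return ncx2.sf(chi2.ppf(1.0 - alpha, df), df, noncentrality)
```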

4. A Simulation Study

A new family of tests of homogeneity based on φ-divergences and the minimum ϕ-divergence estimator, (3.1), has been introduced in this paper. In this section we are interested in answering the following question: which is the best test statistic of this family for testing homogeneity? Of course, it is not possible to present a joint study of all the members of the family (3.1); the study must be made for each parametric family of functions φ and ϕ defining the test statistic and the estimator, respectively. We consider the family of functions $\phi(x) = \varphi(x) = \varphi_{(\lambda)}(x)$ defined in (2.1). To investigate the efficiency of the family of statistics

$$T_n^{a_1}\!\left(Q(\widehat{\theta}_{a_2})\right) \equiv T_n^{\varphi_{(a_1)}}\!\left(Q(\widehat{\theta}_{\varphi_{(a_2)}})\right)$$

we simulate its exact power for several parameter values and different alternatives. The statistic for testing homogeneity is then given by

$$T_n^{a_1}\!\left(Q(\widehat{\theta}_{a_2})\right) = \frac{2}{a_1(a_1+1)}\sum_{i=1}^{\upsilon}\sum_{j=1}^{m} n_{i*}\,\widehat{p}_{ij}\left[\left(\frac{\widehat{p}_{ij}}{q_j(\widehat{\theta}_{a_2})}\right)^{a_1} - 1\right],$$

with limiting cases for $a_1 = 0$ and $a_1 = -1$. For $a_1 = 0$ we have

$$T_n^{0}\!\left(Q(\widehat{\theta}_{a_2})\right) = 2\sum_{i=1}^{\upsilon}\sum_{j=1}^{m} n_{i*}\,\widehat{p}_{ij}\log\frac{\widehat{p}_{ij}}{q_j(\widehat{\theta}_{a_2})},$$

and for $a_1 = -1$,

$$T_n^{-1}\!\left(Q(\widehat{\theta}_{a_2})\right) = 2\sum_{i=1}^{\upsilon}\sum_{j=1}^{m} n_{i*}\,q_j(\widehat{\theta}_{a_2})\log\frac{q_j(\widehat{\theta}_{a_2})}{\widehat{p}_{ij}}.$$

We can also observe that $\widehat{\theta}_{a_2}$ verifies

$$D_{\varphi_{(a_2)}}\!\left(\widehat{P}, Q^*(\widehat{\theta}_{a_2})\right) = \min_{\theta\in\Theta} D_{\varphi_{(a_2)}}\!\left(\widehat{P}, Q^*(\theta)\right),$$

where

$$D_{\varphi_{(a_2)}}\!\left(\widehat{P}, Q^*(\theta)\right) = \frac{1}{a_2(a_2+1)}\sum_{i=1}^{\upsilon}\sum_{j=1}^{m}\frac{n_{i*}}{n}\,\widehat{p}_{ij}\left[\left(\frac{\widehat{p}_{ij}}{q_j(\theta)}\right)^{a_2} - 1\right].$$

In order to produce power calculations, it is necessary to choose a test size $\alpha$ and to calculate the associated rejection region. We calculate it using the approximation obtained in Theorem 3.1. The general scheme for calculating the exact power of $T_n^{a_1}(Q(\widehat{\theta}_{a_2}))$ is as follows:


Step 0: We fix
(a) the number of populations ($\upsilon$);
(b) the sample sizes ($n_{1*}, \ldots, n_{\upsilon*}$);
(c) the number of unknown parameters ($M_0$);
(d) the number of classes in the partition ($m$);
(e) a partition of $\mathbb{R}$ ($I_1, \ldots, I_m$);
(f) the number of simulations ($N$);
(g) the test size ($\alpha$);
(h) power = 0.

Step 1: Given $a_1$ and $a_2$ fixed, do for $i = 1$ until $N$:
(a) Generate $\upsilon$ independent random samples of sizes $n_{1*}, \ldots, n_{\upsilon*}$, respectively.
(b) Calculate $\widehat{P}$ from the data of Step 1(a).
(c) Obtain $\widehat{\theta}_{a_2} \equiv \widehat{\theta}_{\varphi_{(a_2)}}$ by minimizing over $\theta$ the function $D_{\varphi_{(a_2)}}(\widehat{P}, Q^*(\theta))$.
(d) If $T_n^{a_1}(Q(\widehat{\theta}_{a_2}))$ is greater than $\chi^2_{(m-1)\upsilon-M_0,\alpha}$, where $P(\chi^2_{(m-1)\upsilon-M_0} > \chi^2_{(m-1)\upsilon-M_0,\alpha}) = \alpha$, then power = power + 1.

Step 2: power = power/$N$.

The simulation study was performed using FORTRAN-90, and to carry out Step 1(c) we used the nag_nlp module of the NAG F-90 Numerical Libraries. For our study we consider three populations ($\upsilon = 3$) and as null hypothesis the logistic distribution with $F(x, \theta) = \left(1 + e^{-(x-\theta)}\right)^{-1}$. We consider the partition of $\mathbb{R}$ into $m = 5$ quantization intervals determined by the null-hypothesis quantiles

$$a_j = \ln\frac{\lambda_j}{1 - \lambda_j} \quad\text{of orders}\quad \lambda_j = j/5, \quad 1 \leq j \leq 4.$$

So $M_0 = 1$, and the vector $Q(\theta) = (q_1(\theta), \ldots, q_5(\theta))^t$ is given by

$$\begin{aligned}
q_1(\theta) &= F(-1.3863, \theta),\\
q_2(\theta) &= F(-0.4055, \theta) - F(-1.3863, \theta),\\
q_3(\theta) &= F(0.4055, \theta) - F(-0.4055, \theta),\\
q_4(\theta) &= F(1.3863, \theta) - F(0.4055, \theta),\\
q_5(\theta) &= 1 - F(1.3863, \theta),
\end{aligned}$$

and

$$Q^*(\theta) = \left((n_{1*}/n)\,Q(\theta)^t,\ (n_{2*}/n)\,Q(\theta)^t,\ (n_{3*}/n)\,Q(\theta)^t\right)^t.$$
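A condensed re-implementation of Steps 0-2 for this logistic model might look as follows; this is a Python sketch (the authors used FORTRAN-90 and NAG), it reuses the hypothetical helpers from the earlier sketches, and the samplers and sizes are supplied by the caller:

```python
import numpy as np

CUTS = np.log((np.arange(1, 5) / 5) / (1 - np.arange(1, 5) / 5))
# = [-1.3863, -0.4055, 0.4055, 1.3863], the logistic(0) quintile cuts a_j

def q_logistic(theta):
    """Cell probabilities q_1(theta), ..., q_5(theta) under the logistic null."""
    F = 1.0 / (1.0 + np.exp(-(CUTS - theta)))
    return np.diff(np.concatenate(([0.0], F, [1.0])))

def simulated_power(samplers, sizes, phi, varphi, N=5000, alpha=0.05, seed=0):
    """Steps 0-2: the fraction of N replications in which test (3.2) rejects.
    `samplers` holds one function (size, rng) -> sample per population."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(N):
        counts = np.array([np.bincount(np.searchsorted(CUTS, s(size, rng)),
                                       minlength=5)
                           for s, size in zip(samplers, sizes)])
        rejections += phi_test_homogeneity(counts, q_logistic, phi, varphi,
                                           alpha)[2]
    return rejections / N

# Example (hypothetical): Alternative 2 below with a1 = 1 and a2 = -0.5, i.e.
# the Pearson statistic plus the minimum Matusita-distance estimator:
#   pearson = lambda x: 0.5 * (x - 1.0) ** 2
#   matusita = lambda x: 4.0 * (x - np.sqrt(x))   # varphi_(-0.5), up to affine terms
#   samplers = [lambda n, r, mu=mu: r.logistic(mu, 1.0, n)
#               for mu in (-0.5, 0.0, 0.5)]
#   print(simulated_power(samplers, (35, 60, 50), pearson, matusita))
```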

The exact powers are calculated against the two following alternative hypotheses:

Alternative 1: Population 1 sampled from a Normal(0, 2), Population 2 sampled from a Logistic(1), Population 3 sampled from a Cauchy(0, 2).

Alternative 2: Population 1 sampled from a Logistic(-0.5), Population 2 sampled from a Logistic(0), Population 3 sampled from a Logistic(0.5).

All the computations presented here are based on $N = 5000$ replications. We generated samples of sizes $n_{1*}$, $n_{2*}$ and $n_{3*}$ for different combinations of $(n_{1*}, n_{2*}, n_{3*})$; samples of equal sizes and of unequal sizes were chosen. For each of these cases we computed the empirical levels of $T_n^{a_1,a_2} \equiv T_n^{a_1}(Q(\widehat{\theta}_{a_2}))$ as well as the empirical powers. We used the value 0.05 for the nominal level $\alpha$. We chose $a_1 = -2, -1, -0.5, 0, 2/3, 1$, since the corresponding statistics are the well-known Neyman modified $X^2$, the modified loglikelihood ratio, the Freeman-Tukey, the loglikelihood ratio, the power divergence, and the Pearson $X^2$ statistics, respectively. We chose $a_2 = -0.5, 0, 2/3, 1$; we do not consider the values $-2$ and $-1$, since the corresponding estimators do not exist for samples with empty cells. Note that for $a_2 = 0$ the corresponding minimum power-divergence estimator coincides with the multinomial maximum likelihood estimator.

The empirical levels and powers of $T_n^{a_1}(Q(\widehat{\theta}_{a_2}))$ are presented in Tables 1, 2 and 3 for small equal sample sizes ($n_{1*} = n_{2*} = n_{3*} = 35$), moderate unequal sample sizes ($n_{1*} = 45$, $n_{2*} = 50$, $n_{3*} = 40$) and very unequal sample sizes ($n_{1*} = 35$, $n_{2*} = 60$, $n_{3*} = 50$), respectively. The performance of $T_n^{a_1}(Q(\widehat{\theta}_{a_2}))$ in maintaining the nominal significance level is very poor for $a_1 \leq 0$, for all considered combinations of sample sizes, particularly for $a_1 = -2$ and $-1$. So we focus only on the last two columns of the tables, corresponding to $T_n^{2/3,a_2}$ and $T_n^{1,a_2}$, since we do not allow the empirical level of a test statistic to deviate from the nominal level by more than 0.01 in absolute value (20% of the nominal level). By comparing Tables 1, 2 and 3 one can see that $T_n^{1,-0.5}$ is the best test statistic, since it has higher power than the others for both alternatives and for the three combinations of sample sizes. The second best is $T_n^{2/3,1}$, and the third one, for all cases except Alternative 2 with sample sizes $(n_{1*} = 35, n_{2*} = 60, n_{3*} = 50)$, is the classical $X^2(Q(\widehat{\theta}_n))$ defined in the Introduction. Table 3 shows that there are five test statistics better than the classical $X^2(Q(\widehat{\theta}_n))$ for Alternative 2 and very unequal sample sizes.

Table 1. Empirical levels and powers of $T_n^{a_1}(Q(\widehat{\theta}_{a_2}))$ for given values of the parameters $a_1$ and $a_2$ and the sample sizes $n_{1*} = n_{2*} = n_{3*} = 35$.

Null hypothesis
  a_2    T_n^{-2,a_2}  T_n^{-1,a_2}  T_n^{-0.5,a_2}  T_n^{0,a_2}  T_n^{2/3,a_2}  T_n^{1,a_2}
  -0.5   0.2344        0.127         0.0906          0.0688       0.054          0.0532
  0      0.2384        0.1302        0.0922          0.0684       0.05           0.049
  2/3    0.2458        0.1324        0.0938          0.0692       0.0486         0.0478
  1      0.2484        0.1348        0.096           0.07         0.0488         0.0482

Alternative hypothesis 1
  a_2    T_n^{-2,a_2}  T_n^{-1,a_2}  T_n^{-0.5,a_2}  T_n^{0,a_2}  T_n^{2/3,a_2}  T_n^{1,a_2}
  -0.5   0.849         0.779         0.753           0.7342       0.7364         0.7496
  0      0.856         0.7862        0.7562          0.7294       0.7254         0.731
  2/3    0.8658        0.7976        0.7654          0.7342       0.7196         0.7226
  1      0.8728        0.8034        0.7702          0.7378       0.7206         0.722

Alternative hypothesis 2
  a_2    T_n^{-2,a_2}  T_n^{-1,a_2}  T_n^{-0.5,a_2}  T_n^{0,a_2}  T_n^{2/3,a_2}  T_n^{1,a_2}
  -0.5   0.5454        0.4082        0.3552          0.31         0.2852         0.2858
  0      0.551         0.4112        0.3562          0.3086       0.2772         0.2784
  2/3    0.5602        0.4172        0.36            0.3104       0.275          0.273
  1      0.5628        0.4208        0.364           0.3124       0.2754         0.273

Table 2. Empirical levels and powers of $T_n^{a_1}(Q(\widehat{\theta}_{a_2}))$ for given values of the parameters $a_1$ and $a_2$ and the sample sizes $n_{1*} = 45$, $n_{2*} = 50$, $n_{3*} = 40$.

Null hypothesis
  a_2    T_n^{-2,a_2}  T_n^{-1,a_2}  T_n^{-0.5,a_2}  T_n^{0,a_2}  T_n^{2/3,a_2}  T_n^{1,a_2}
  -0.5   0.1876        0.1118        0.0824          0.0636       0.0548         0.055
  0      0.1904        0.1142        0.0828          0.0634       0.054          0.0532
  2/3    0.195         0.116         0.0836          0.0636       0.0532         0.052
  1      0.1968        0.1168        0.085           0.0646       0.0538         0.0514

Alternative hypothesis 1
  a_2    T_n^{-2,a_2}  T_n^{-1,a_2}  T_n^{-0.5,a_2}  T_n^{0,a_2}  T_n^{2/3,a_2}  T_n^{1,a_2}
  -0.5   0.8844        0.8466        0.8352          0.8312       0.8366         0.8458
  0      0.8924        0.8506        0.8366          0.8296       0.8302         0.8356
  2/3    0.9022        0.8608        0.842           0.8316       0.8256         0.83
  1      0.9062        0.8646        0.847           0.8342       0.8274         0.83

Alternative hypothesis 2
  a_2    T_n^{-2,a_2}  T_n^{-1,a_2}  T_n^{-0.5,a_2}  T_n^{0,a_2}  T_n^{2/3,a_2}  T_n^{1,a_2}
  -0.5   0.5598        0.4518        0.4018          0.3708       0.3528         0.3574
  0      0.5656        0.4556        0.403           0.3694       0.3484         0.3514
  2/3    0.5724        0.461         0.4058          0.371        0.3464         0.347
  1      0.5758        0.4638        0.4086          0.3724       0.3464         0.3468

So the conclusion seems clear: we have found two previously unstudied test statistics which are better, in all considered cases, for testing homogeneity when some unknown parameters must be estimated, than the most used test statistic for this problem, $X^2(Q(\widehat{\theta}_n))$. Two new interesting test statistics thus emerge as good alternatives to the classical one. The best is obtained by combining the minimum power-divergence estimator with $a_2 = -0.5$, i.e., the minimum Matusita distance estimator (Matusita (1954)), with the power-divergence statistic with $a_1 = 1$, i.e., the chi-square statistic. The second is obtained by combining the minimum power-divergence estimator with $a_2 = 1$, i.e., the minimum chi-square estimator, with the power-divergence statistic with $a_1 = 2/3$, i.e., the statistic well known as an alternative to the classical ones in goodness-of-fit (Read and Cressie (1988)).

Table 3. Empirical levels and powers of $T_n^{a_1}(Q(\widehat{\theta}_{a_2}))$ for given values of the parameters $a_1$ and $a_2$ and the sample sizes $n_{1*} = 35$, $n_{2*} = 60$, $n_{3*} = 50$.

Null hypothesis
  a_2    T_n^{-2,a_2}  T_n^{-1,a_2}  T_n^{-0.5,a_2}  T_n^{0,a_2}  T_n^{2/3,a_2}  T_n^{1,a_2}
  -0.5   0.1768        0.1016        0.0774          0.0618       0.0548         0.0548
  0      0.1796        0.1032        0.078           0.0612       0.0536         0.0536
  2/3    0.1834        0.106         0.0798          0.062        0.0536         0.052
  1      0.187         0.1076        0.0818          0.0628       0.0536         0.052

Alternative hypothesis 1
  a_2    T_n^{-2,a_2}  T_n^{-1,a_2}  T_n^{-0.5,a_2}  T_n^{0,a_2}  T_n^{2/3,a_2}  T_n^{1,a_2}
  -0.5   0.912         0.8906        0.887           0.8862       0.897          0.9024
  0      0.9194        0.8952        0.888           0.8848       0.8904         0.8962
  2/3    0.9294        0.9028        0.8932          0.8872       0.8876         0.8928
  1      0.9324        0.908         0.8982          0.8902       0.889          0.8924

Alternative hypothesis 2
  a_2    T_n^{-2,a_2}  T_n^{-1,a_2}  T_n^{-0.5,a_2}  T_n^{0,a_2}  T_n^{2/3,a_2}  T_n^{1,a_2}
  -0.5   0.5436        0.4346        0.3876          0.3582       0.3354         0.3368
  0      0.5486        0.437         0.3882          0.357        0.3316         0.3292
  2/3    0.5562        0.4416        0.3924          0.3586       0.3304         0.3272
  1      0.5584        0.4454        0.394           0.3614       0.3314         0.3262

Note that if we allow the empirical level of a test statistic to deviate from the nominal level by up to 0.02 in absolute value (40% of the nominal level), then the acceptable test statistics are those in the last three columns of the tables, i.e., $T_n^{0,a_2}$, $T_n^{2/3,a_2}$ and $T_n^{1,a_2}$. In this case, the conclusion for Alternative 1 is the same as before. In relation to Alternative 2, the conclusion changes, since the best statistic for the three combinations of sample sizes is $T_n^{1,0}$ instead of $T_n^{1,-0.5}$. Moreover, if we order the considered test statistics from best to worst according to their powers, the classical $X^2(Q(\widehat{\theta}_n))$ is sixth for small equal sample sizes and seventh for the other cases.

References

Ali, S.M. and Silvey, S.D. (1966). A general class of coefficients of divergence of one distribution from another. Journal of the Royal Statistical Society, Series B, 28, 131-142.

Basu, A. and Sarkar, S. (1994a). Minimum disparity estimation in the errors-in-variables model. Statistics and Probability Letters, 20, 69-73.

Basu, A. and Sarkar, S. (1994b). The trade-off between robustness and efficiency and the effect of model smoothing. Journal of Statistical Computation and Simulation, 50, 173-185.

Berkson, J. (1980). Minimum chi-square, not maximum likelihood! Annals of Statistics, 8, 457-487.

Birch, M.W. (1964). A new proof of the Pearson-Fisher theorem. Annals of Mathematical Statistics, 35, 817-824.

Cox, C. (1984). An elementary introduction to maximum likelihood estimation for multinomial models: Birch's theorem and the delta method. American Statistician, 38, 283-287.

Cressie, N. and Read, T.R.C. (1984). Multinomial goodness-of-fit tests. Journal of the Royal Statistical Society, Series B, 46, 440-464.

Csiszár, I. (1963). Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Publications of the Mathematical Institute of the Hungarian Academy of Sciences, Series A, 8, 85-108.

Ferguson, T.S. (1996). A Course in Large Sample Theory. Chapman & Hall, London.

Fraser, D.A.S. (1957). Nonparametric Methods in Statistics. Wiley, New York.

Ivchenko, G. and Medvedev, Y. (1990). Mathematical Statistics. Mir, Moscow.


Lindsay, B.G. (1994). Efficiency versus robustness: the case for minimum Hellinger distance and other methods. Annals of Statistics, 22, 1081-1114.

Matusita, K. (1954). On the estimation by the minimum distance method. Annals of the Institute of Statistical Mathematics, 5, 59-65.

Morales, D., Pardo, L., Salicrú, M. and Menéndez, M.L. (1994). Asymptotic properties of divergence statistics in a stratified random sampling and its applications to test statistical hypotheses. Journal of Statistical Planning and Inference, 38, 201-224.

Morales, D., Pardo, L. and Vajda, I. (1995). Asymptotic divergence of estimates of discrete distributions. Journal of Statistical Planning and Inference, 48, 347-369.

Pardo, L., Morales, D., Salicrú, M. and Menéndez, M.L. (1993). The φ-divergence statistics in bivariate multinomial applications including stratification. Metrika, 40, 223-235.

Pardo, L., Pardo, M.C. and Zografos, K. (1999). Homogeneity for multinomial populations based on φ-divergences. Journal of the Japan Statistical Society, 29, 213-228.

Parr, W.C. (1981). Minimum distance estimation: a bibliography. Communications in Statistics, Theory and Methods, 10, 1205-1224.

Parr, W.C. (1985). Minimum distance estimation. In Encyclopedia of Statistical Sciences, 5, S. Kotz and N.L. Johnson, eds., Wiley, New York, 529-532.

Rao, C.R. (1961). Asymptotic efficiency and limiting information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1, University of California Press, Berkeley, 531-545.

Rao, C.R. (1973). Linear Statistical Inference and its Applications. Wiley, New York.

Read, T.R.C. and Cressie, N.A.C. (1988). Goodness-of-Fit Statistics for Discrete Multivariate Data. Springer-Verlag, New York.

Wolfowitz, J. (1953). Estimation by the minimum distance method. Annals of the Institute of Statistical Mathematics, 5, 9-23.

Zografos, K., Ferentinos, K. and Papaioannou, T. (1990). φ-divergence statistics: sampling properties and multinomial goodness of fit and divergence tests. Communications in Statistics, Theory and Methods, 19, 1785-1802.

K. Zografos
Department of Mathematics
Probability, Statistics and Operational Research Section
University of Ioannina
451 10 Ioannina, Greece
E-mail: [email protected]

L. Pardo and M.C. Pardo
Department of Statistics, Faculty of Mathematics
Complutense University of Madrid
28040 Madrid, Spain
E-mail: [email protected], [email protected]
