Symmetry 2010, 2, 1108-1120; doi:10.3390/sym2021108
symmetry, ISSN 2073-8994, www.mdpi.com/journal/symmetry
OPEN ACCESS / Review

Minimum Phi-Divergence Estimators and Phi-Divergence Test Statistics in Contingency Tables with Symmetry Structure: An Overview

Leandro Pardo 1,* and Nirian Martín 2

1 Department of Statistics and O.R., Complutense University of Madrid, 28040 Madrid, Spain
2 Department of Statistics, Carlos III University of Madrid, 28903 Getafe (Madrid), Spain
* Author to whom correspondence should be addressed; E-Mail: [email protected].
Received: 12 March 2010; in revised form: 7 May 2010 / Accepted: 10 June 2010 / Published: 11 June 2010
Abstract: In recent years, minimum phi-divergence estimators (MϕE) and phi-divergence test statistics (ϕTS) have been introduced as attractive alternatives to the classical maximum likelihood estimator and likelihood ratio test for a variety of statistical problems. The main purpose of this paper is to give an overview of the results obtained so far for contingency tables with symmetry structure on the basis of MϕE and ϕTS.

Keywords: minimum phi-divergence estimator; phi-divergence statistics; symmetry model

Classification: MSC 62B10, 62H15
1. Introduction

An interesting problem in a two-way contingency table is to investigate whether there are symmetric patterns in the data: cell probabilities on one side of the main diagonal are a mirror image of those on the other side. This problem was first discussed by Bowker [1], who gave the maximum likelihood estimator as well as a large-sample chi-square type test for the null hypothesis of symmetry. The minimum discrimination information estimator was proposed in [2] and the minimum chi-squared estimator in [3]. In [4–7] new families of test statistics, based on ϕ-divergence measures, were introduced. These families contain as particular cases the test statistic given by [1] as well as the likelihood ratio test.
Let X and Y denote two ordinal response variables, each having I levels. When we classify subjects on both variables, there are I² possible combinations of classifications. The responses (X, Y) of a subject randomly chosen from some population have a probability distribution. Let p_ij = Pr(X = i, Y = j), with p_ij > 0, i, j = 1, ..., I. We display this distribution in a square table having I rows for the categories of X and I columns for the categories of Y. Consider a random sample of size n on (X, Y) and denote by n_ij the observed frequency in the (i, j)-th cell, (i, j) ∈ I × I, with ∑_{i=1}^{I} ∑_{j=1}^{I} n_ij = n. The classical problem of testing for symmetry is

H_0 : p_ij = p_ji,   (i, j) ∈ I × I                    (1)

versus

H_1^* : p_ij ≠ p_ji,   for at least one (i, j) pair.                    (2)

This problem was considered for the first time by Bowker [1] using the Pearson-type test statistic

X² = ∑_{i=1}^{I} ∑_{j=1, i<j}^{I} (n_ij − n_ji)² / (n_ij + n_ji).                    (3)
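As a point of reference for the generalizations discussed below, (3) is straightforward to evaluate from the table of observed frequencies. The following Python sketch is purely illustrative (the function name and the representation of the counts as a NumPy array are our own choices, not part of the original presentation):

```python
import numpy as np

def bowker_statistic(counts):
    """Bowker's statistic (3) for the symmetry hypothesis H0: p_ij = p_ji.

    counts: (I, I) array of observed frequencies n_ij.
    Pairs with n_ij + n_ji = 0 contribute nothing and are skipped.
    """
    counts = np.asarray(counts, dtype=float)
    I = counts.shape[0]
    x2 = 0.0
    for i in range(I):
        for j in range(i + 1, I):                  # each unordered pair (i < j) once
            denom = counts[i, j] + counts[j, i]
            if denom > 0:
                x2 += (counts[i, j] - counts[j, i]) ** 2 / denom
    return x2
```

For the data of Table 1 below this gives X² ≈ 9.33, the value quoted in the example; under H_0 it is compared with a chi-squared distribution with I(I − 1)/2 degrees of freedom.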
In what follows we denote by θ the vector of all cell probabilities except p_II and consider the parameter space

Θ = { θ = (p_11, ..., p_{I I−1})^T ∈ R_+^{I²−1} : p_ij > 0 and ∑_{i=1}^{I} ∑_{j=1, (i,j) ≠ (I,I)}^{I} p_ij < 1 }                    (6)

and we denote p(θ) = p = (p_11, ..., p_II)^T, with p_II = 1 − ∑_{i=1}^{I} ∑_{j=1, (i,j) ≠ (I,I)}^{I} p_ij, or equivalently the I × I matrix p(θ) = [p_ij]. The ϕ-divergence between two probability distributions p = [p_ij], q = [q_ij] was introduced independently by [9] and [10]. It is defined as follows:

D_ϕ(p, q) = ∑_{i=1}^{I} ∑_{j=1}^{I} q_ij ϕ( p_ij / q_ij ),   ϕ ∈ Φ*                    (7)
where Φ* is the class of all convex functions ϕ : [0, ∞) → R ∪ {∞} such that ϕ(1) = 0 and ϕ''(1) > 0, and where we adopt the conventions 0·ϕ(0/0) = 0 and 0·ϕ(p/0) = p·lim_{u→∞} ϕ(u)/u. For every ϕ ∈ Φ* that is differentiable at x = 1, the function ψ given by ψ(x) = ϕ(x) − ϕ'(1)(x − 1) also belongs to Φ*. Then we have D_ψ(p, q) = D_ϕ(p, q), and ψ has the additional property that ψ'(1) = 0. Because the two divergence measures are equivalent, we can consider the set Φ* to be equivalent to the set

Φ ≡ Φ* ∩ {ϕ : ϕ'(1) = 0}.

An important family of ϕ-divergences in statistical problems is the power-divergence family

ϕ_(λ)(x) = (x^{λ+1} − x + λ(1 − x)) / (λ(λ + 1)),   λ ≠ 0, λ ≠ −1                    (8)

ϕ_(0)(x) = lim_{λ→0} ϕ_(λ)(x) = x ln x − x + 1

ϕ_(−1)(x) = lim_{λ→−1} ϕ_(λ)(x) = − ln x + x − 1
which was introduced and studied by [11]. Notice that ϕ_(λ) ∈ Φ. In the following we shall denote the power-divergence measures by D_{ϕ_(λ)}(p, q), λ ∈ R. For more details about ϕ-divergence measures see [12].

3. Hypothesis Testing: H_0 versus H_1^*

We define

B = { (a_11, ..., a_1I, a_22, ..., a_2I, ..., a_{I−1 I−1}, a_{I−1 I})^T ∈ R_+^{I(I+1)/2 − 1} : ∑_{i≤j} a_ij < 1, i, j = 1, ..., I }.

Then the hypothesis (1) can be written as

H_0 : θ = g(β),   β = (p_11, ..., p_1I, p_22, ..., p_2I, ..., p_{I−1 I−1}, p_{I−1 I})^T ∈ B                    (9)
where the function g is defined by g = (g_ij ; i, j = 1, ..., I, (i, j) ≠ (I, I)) with

g_ij(β) = p_ij if i ≤ j and g_ij(β) = p_ji if i > j,   i, j = 1, ..., I − 1,

g_Ij(β) = p_jI, j = 1, ..., I − 1,   and   g_iI(β) = p_iI, i = 1, ..., I − 1.

Note that p(g(β)) = (g_ij(β); i, j = 1, ..., I)^T, where g_II(β) = 1 − ∑_{i,j=1, (i,j) ≠ (I,I)}^{I} g_ij(β).
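In words, g simply fills an I × I table symmetrically from the probabilities collected in β and recovers the (I, I) cell from the normalization. A small illustrative sketch (the helper name and the use of 0-based indices are our own conventions):

```python
import numpy as np

def p_of_g(beta, I):
    """Build the I x I table p(g(beta)) from the symmetry parameters.

    beta: dict {(i, j): p_ij} for 0 <= i <= j < I, with the (I-1, I-1)
          entry omitted; it is recovered so that the whole table sums to one.
    """
    table = np.zeros((I, I))
    for (i, j), pij in beta.items():
        table[i, j] = pij              # g_ij(beta) = p_ij for i <= j
        table[j, i] = pij              # g_ji(beta) = p_ij as well (symmetry)
    table[I - 1, I - 1] = 1.0 - table.sum()    # g_II(beta) from normalization
    return table
```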
The maximum likelihood estimator (MLE) of β can be defined as

β̂ = arg min_{β ∈ B} D_KL(p̂, p(g(β)))   a.s.

where p̂ = (p̂_11, ..., p̂_II)^T, with p̂_ij = n_ij / n, is the vector of observed relative frequencies and D_KL(p̂, p(g(β))) is the Kullback–Leibler divergence measure (see [13,14]) defined by

D_KL(p̂, p(g(β))) = ∑_{i=1}^{I} ∑_{j=1}^{I} p̂_ij log( p̂_ij / g_ij(β) ).
We denote by θ̂ = g(β̂) the MLE of θ under symmetry and by p(θ̂) = (p_11(θ̂), ..., p_II(θ̂))^T. It is well known that p_ij(θ̂) = (p̂_ij + p̂_ji)/2, i, j = 1, ..., I. Using the ideas developed in [15], we can consider the minimum ϕ_2-divergence estimator (Mϕ_2E), replacing the Kullback–Leibler divergence by a ϕ_2-divergence measure, in the following way:

β̂^{ϕ_2} = arg min_{β ∈ B} D_{ϕ_2}(p̂, p(g(β))),   ϕ_2 ∈ Φ*                    (10)

where

D_{ϕ_2}(p̂, p(g(β))) = ∑_{i=1}^{I} ∑_{j=1}^{I} g_ij(β) ϕ_2( p̂_ij / g_ij(β) ).
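For a general ϕ_2 the minimizer in (10) has no closed form, but it can be approximated numerically. The sketch below is only an illustration of the definition, not the authors' algorithm; the function names, the use of SciPy's generic constrained optimizer, and the assumption that every observed cell is positive are ours:

```python
import numpy as np
from scipy.optimize import minimize

def min_phi2_divergence_fit(p_hat, phi2):
    """Numerical approximation to the minimum phi_2-divergence fit p(g(beta_hat)) of (10).

    p_hat: (I, I) array of observed relative frequencies n_ij / n (all assumed > 0).
    phi2:  a vectorized convex function from Phi*, e.g. lambda x: x*np.log(x) - x + 1.
    """
    I = p_hat.shape[0]
    idx = [(i, j) for i in range(I) for j in range(i, I) if (i, j) != (I - 1, I - 1)]

    def table_from(beta):
        t = np.zeros((I, I))
        for b, (i, j) in zip(beta, idx):
            t[i, j] = t[j, i] = b
        t[I - 1, I - 1] = 1.0 - t.sum()            # recover p_II from normalization
        return t

    def objective(beta):                            # D_phi2(p_hat, p(g(beta)))
        q = table_from(beta)
        return float(np.sum(q * phi2(p_hat / q)))

    beta0 = np.array([(p_hat[i, j] + p_hat[j, i]) / 2 for i, j in idx])   # MLE as start
    bounds = [(1e-9, 1.0)] * len(idx)
    cons = [{"type": "ineq", "fun": lambda b: table_from(b)[I - 1, I - 1] - 1e-9}]
    res = minimize(objective, beta0, method="SLSQP", bounds=bounds, constraints=cons)
    return table_from(res.x)
```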
We denote θ̂^{S,ϕ_2} = g(β̂^{ϕ_2}), and we have (see [7,16])

√n ( g(β̂^{ϕ_2}) − g(β) ) →^L N( 0, I_SF(β) )   as n → ∞,

where

I_SF(θ) = Σ_θ − Σ_θ B(θ)^T ( B(θ) Σ_θ B(θ)^T )^{−1} B(θ) Σ_θ,

with Σ_θ = diag(θ) − θθ^T and B(θ) = ( ∂h_ij(θ)/∂θ )_{I(I−1)/2 × (I²−1)}. The functions h_ij are given by

h_ij(θ) = p_ij − p_ji,   i < j, i = 1, ..., I − 1, j = 1, ..., I.

It is not difficult to establish that the matrix I_SF(θ) can be written as

I_SF(θ) = M_β^T I_F(β)^{−1} M_β
where I_F(β) is the Fisher information matrix corresponding to β ∈ B.

If we consider the family of power divergences we get the minimum power-divergence estimator θ̂^{S,λ} of θ under the hypothesis of symmetry, whose expression is given by

θ̂^{S,λ}_ij = ( (p̂_ij^{λ+1} + p̂_ji^{λ+1}) / 2 )^{1/(λ+1)} / ∑_{i=1}^{I} ∑_{j=1}^{I} ( (p̂_ij^{λ+1} + p̂_ji^{λ+1}) / 2 )^{1/(λ+1)},   i, j = 1, ..., I.                    (11)

For λ = 0 we get

θ̂^{S,0}_ij = (p̂_ij + p̂_ji) / 2,   i, j = 1, ..., I,

hence we obtain the maximum likelihood estimator for symmetry introduced by [1]. For λ = −1 we obtain, as a limit case,

θ̂^{S,−1}_ij = (p̂_ij p̂_ji)^{1/2} / ∑_{i=1}^{I} ∑_{j=1}^{I} (p̂_ij p̂_ji)^{1/2},   i, j = 1, ..., I,

i.e., the minimum discrimination information estimator for symmetry introduced and studied in [2]. For λ = 1 we get the minimum chi-squared estimator for symmetry introduced in [3],

θ̂^{S,1}_ij = ( (p̂_ij² + p̂_ji²) / 2 )^{1/2} / ∑_{i=1}^{I} ∑_{j=1}^{I} ( (p̂_ij² + p̂_ji²) / 2 )^{1/2}.
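For the power-divergence family the minimizer is thus explicit, so (11) and its special cases can be computed directly. A minimal sketch (function and argument names are our own):

```python
import numpy as np

def min_power_divergence_symmetry(p_hat, lam):
    """Minimum power-divergence estimator (11) of theta under symmetry.

    p_hat: (I, I) array of observed relative frequencies n_ij / n.
    lam:   power-divergence parameter; lam = 0 gives the MLE (the symmetrized
           estimate of [1]) and lam = -1 the minimum discrimination information
           estimator, both treated as limiting cases.
    """
    p_hat = np.asarray(p_hat, dtype=float)
    if lam == 0:
        num = (p_hat + p_hat.T) / 2
    elif lam == -1:
        num = np.sqrt(p_hat * p_hat.T)
    else:
        num = ((p_hat ** (lam + 1) + p_hat.T ** (lam + 1)) / 2) ** (1.0 / (lam + 1))
    return num / num.sum()
```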
We denote θ̂^{ϕ_2} = g(β̂^{ϕ_2}) and

p(θ̂^{ϕ_2}) = ( p_11(θ̂^{ϕ_2}), ..., p_II(θ̂^{ϕ_2}) )^T.                    (12)
Based on p(θ̂^{ϕ_2}), the Mϕ_2E of the probability vector that characterizes the symmetry model, it is possible to define a new family of statistics for testing (1) that contains as particular cases the Pearson test statistic as well as the likelihood ratio test. This family of statistics is given by

T_n^{ϕ_1}(θ̂^{ϕ_2}) ≡ (2n / ϕ_1''(1)) D_{ϕ_1}(p̂, p(θ̂^{ϕ_2})) = (2n / ϕ_1''(1)) ∑_{i=1}^{I} ∑_{j=1}^{I} p_ij(θ̂^{ϕ_2}) ϕ_1( p̂_ij / p_ij(θ̂^{ϕ_2}) ).                    (13)

We can observe that the family (13) involves two functions ϕ_1 and ϕ_2, both belonging to Φ*. We use the function ϕ_2 to obtain the Mϕ_2E and ϕ_1 to obtain the family of statistics. If we consider ϕ_1(x) = (1/2)(x − 1)² and ϕ_2(x) = x log x − x + 1 we get the Pearson test statistic whose expression was given in (3), and for ϕ_1(x) = ϕ_2(x) = x log x − x + 1 we get the likelihood ratio test given by

G² = 2 ∑_{i=1}^{I} ∑_{j=1}^{I} n_ij ln( 2 n_ij / (n_ij + n_ji) ).

Following [8], for i > j let θ_ij = p_ij / (p_ij + p_ji), so that the hypothesis of symmetry and the ordered alternative become H_0 : θ_ij = 1/2 (for i > j) and H_1 : θ_ij ≥ 1/2 (for i > j), and

θ̂_ij = n_ij / (n_ij + n_ji),   θ̂^{(0)}_ij = 1/2   and   θ̂^{(1)}_ij = max( n_ij / (n_ij + n_ji), 1/2 ),   for i > j.

It follows that p̂^{(0)} and p̂^{(1)} are given by

p̂^{(0)}_ij = (n_ij + n_ji) / (2n)   and   p̂^{(1)}_ij = ( (n_ij + n_ji) / n ) max( n_ij / (n_ij + n_ji), 1/2 ).

Then we have
D_ϕ(p̂, p̂^{(0)}) = ∑_{i=1}^{I} ∑_{j=1, i>j}^{I} ( (n_ij + n_ji) / (2n) ) ( ϕ(2θ̂_ij) + ϕ(2(1 − θ̂_ij)) )

and

D_ϕ(p̂, p̂^{(1)}) = ∑_{i=1}^{I} ∑_{j=1, i>j}^{I} ( (n_ij + n_ji) / n ) ( θ̂^{(1)}_ij ϕ( θ̂_ij / θ̂^{(1)}_ij ) + (1 − θ̂^{(1)}_ij) ϕ( (1 − θ̂_ij) / (1 − θ̂^{(1)}_ij) ) ).
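The two displayed divergences involve only the off-diagonal pairs, so they are easy to evaluate. The sketch below is our own illustration of these two expressions (it assumes that every off-diagonal pair used has positive counts, so that the convention 0·ϕ(0/0) = 0 never needs to be invoked; the function name is ours):

```python
import numpy as np

def order_restricted_divergences(counts, phi):
    """D_phi(p_hat, p_hat^(0)) and D_phi(p_hat, p_hat^(1)) via the pairwise formulas above.

    counts: (I, I) array of observed frequencies n_ij (off-diagonal pairs assumed > 0).
    phi:    a function from Phi*, e.g. lambda x: x*np.log(x) - x + 1 for Kullback-Leibler.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    I = counts.shape[0]
    d0 = d1 = 0.0
    for i in range(I):
        for j in range(i):                         # pairs with i > j
            nij, nji = counts[i, j], counts[j, i]
            m = nij + nji
            theta = nij / m                        # unrestricted estimate of theta_ij
            theta1 = max(theta, 0.5)               # estimate under H1: theta_ij >= 1/2
            d0 += m / (2 * n) * (phi(2 * theta) + phi(2 * (1 - theta)))
            d1 += m / n * (theta1 * phi(theta / theta1)
                           + (1 - theta1) * phi((1 - theta) / (1 - theta1)))
    return d0, d1
```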
To solve the problem of testing H_1 against H_2, [8] consider the likelihood ratio test statistic

T_12 = 2 ∑_{i=1}^{I} ∑_{j=1, i>j}^{I} ( n_ij ln θ̂_ij + n_ji ln(1 − θ̂_ij) − n_ij ln θ̂^{(1)}_ij − n_ji ln(1 − θ̂^{(1)}_ij) ).

This statistic is such that

T_12 = 2n D_KL(p̂, p̂^{(1)}).                    (20)
where D_KL(p̂, p̂^{(1)}) is the Kullback–Leibler divergence, obtained from D_ϕ(p̂, p̂^{(1)}) above with ϕ(x) = ϕ_(0)(x) as defined earlier. Then the likelihood ratio test statistic is based on the closeness, in terms of the Kullback–Leibler divergence measure, between the probability distributions p̂ and p̂^{(1)}. Thus, one could measure the closeness between the two probability distributions using a more general divergence measure if we are able to obtain its asymptotic distribution. One appropriate family of divergence measures for that purpose is the ϕ-divergence family. As a generalization of the test statistic given in (20) for testing H_1 against H_2 we introduce the family of test statistics

T_12^ϕ = (2n / ϕ''(1)) D_ϕ(p̂, p̂^{(1)}).                    (21)

To test H_0 against H_1, El Barmi and Kochar [8] consider the likelihood ratio test statistic

T_01 = 2 ∑_{i=1}^{I} ∑_{j=1, i>j}^{I} ( n_ij ln θ̂^{(1)}_ij + n_ji ln(1 − θ̂^{(1)}_ij) − n_ij ln(1/2) − n_ji ln(1/2) ).

It is clear that

T_01 = 2n ( D_KL(p̂, p̂^{(0)}) − D_KL(p̂, p̂^{(1)}) ).

As a generalization of this test statistic we consider in this paper the family of test statistics

T_01^ϕ = (2n / ϕ''(1)) ( D_ϕ(p̂, p̂^{(0)}) − D_ϕ(p̂, p̂^{(1)}) ).                    (22)
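Combining the two divergences, the families (21) and (22) are immediate to compute. The following sketch builds on `order_restricted_divergences` from the previous snippet and on the power-divergence functions of (8); since ϕ_(λ)''(1) = 1 for every λ, the factor 2n/ϕ''(1) reduces to 2n (again, the function names are our own):

```python
import numpy as np

def phi_power(lam):
    """Power-divergence function phi_(lambda) of (8), including its two limiting cases."""
    if lam == 0:
        return lambda x: x * np.log(x) - x + 1
    if lam == -1:
        return lambda x: -np.log(x) + x - 1
    return lambda x: (x ** (lam + 1) - x + lam * (1 - x)) / (lam * (lam + 1))

def t01_t12(counts, lam):
    """Test statistics T_01^phi of (22) and T_12^phi of (21) for phi = phi_(lambda)."""
    n = np.asarray(counts, dtype=float).sum()
    d0, d1 = order_restricted_divergences(counts, phi_power(lam))
    return 2 * n * (d0 - d1), 2 * n * d1
```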
If ϕ = ϕ_(0) then T_12^ϕ = T_12 and T_01^ϕ = T_01, and hence the families of test statistics T_12^ϕ and T_01^ϕ can be considered as generalizations of the test statistics T_12 and T_01, respectively. In order to get the asymptotic distribution of the test statistics given in (21) and (22), we first define the so-called chi-bar squared distribution with n degrees of freedom, denoted by χ̄²_n.
Definition 4. Let U = max(0, Z), where Z ∼ N(0, 1), so that the c.d.f. of U is given by F_U(u) = Φ(u) for u ≥ 0 and F_U(u) = 0 for u < 0. If U_1, ..., U_n are independent random variables distributed as U, the distribution of ∑_{i=1}^{n} U_i² is called the chi-bar squared distribution with n degrees of freedom, χ̄²_n. Its tail probabilities are given by

Pr(χ̄²_n > v) = ∑_{l=0}^{n} \binom{n}{l} (1/2)^n Pr(χ²_l > v),

where χ²_0 denotes the distribution degenerate at zero, as can be seen by conditioning on L, the number of non-zero U_i's. Furthermore, like the χ²_n distribution, the χ̄²_n distribution is stochastically increasing with n. If V ∼ χ̄²_n and V' ∼ χ̄²_{n'}, where n < n', then V is stochastically smaller than V': Pr(V > t) ≤ Pr(V' > t). This follows since V' ∼ V + W, where W ∼ χ̄²_{n'−n} with V and W independent. For more details about the chi-bar squared distribution see [17].
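Tail probabilities of χ̄²_n are thus finite mixtures of chi-squared tails, so critical values and p-values are easy to obtain. A small sketch (ours, using SciPy for the chi-squared survival function):

```python
from math import comb
from scipy.stats import chi2

def chibar_sf(v, n):
    """Pr(chi-bar^2_n > v) = sum_{l=0}^{n} C(n, l) (1/2)^n Pr(chi^2_l > v).

    The l = 0 component is a point mass at zero, so its tail probability is
    zero for v > 0 and one otherwise.
    """
    total = 0.0
    for l in range(n + 1):
        tail = chi2.sf(v, df=l) if l > 0 else float(v <= 0)
        total += comb(n, l) * 0.5 ** n * tail
    return total
```

For instance, chibar_sf(9.96, 3) ≈ 0.006, which is the p-value reported for λ = 0 in Table 2 below.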
The following theorem presents the asymptotic distribution of T_01^ϕ.

Theorem 5. Under H_0, T_01^ϕ →^L χ̄²_K as n → ∞, where K = I(I − 1)/2.
Proof. See [7].

If we consider the family of power divergences given in (8), we have the power divergence family of test statistics defined as T_01^λ = T_01^{ϕ_(λ)}, which can be used for testing H_0 against H_1. Some important statistics can now be expressed as members of the power divergence family of test statistics T_01^λ: T_01^1 is the Pearson test statistic (X²_01), T_01^{−1/2} is the Freeman–Tukey test statistic (F²_01), T_01^{−2} is the Neyman-modified test statistic (NM²_01), T_01^{−1} is the modified loglikelihood ratio test statistic (NG²_01), T_01^0 is the loglikelihood ratio test statistic (G²_01) introduced by [8], and T_01^{2/3} is the Cressie–Read test statistic (see [11]).
Theorem 6. Under H_1, T_12^ϕ →^L χ̄²_M as n → ∞, where M is the number of elements in the set {(i, j) : i > j, p_ij = p_ji} (so that M ≤ K = I(I − 1)/2), and

lim_{n→∞} Pr(T_12^ϕ ≥ t) ≤ ∑_{l=0}^{K} \binom{K}{l} (1/2)^K Pr(χ²_l ≥ t).
If we consider the family of power divergences given in (8), we have the power divergence family of test statistics defined as T_12^λ = T_12^{ϕ_(λ)}, which we can use for testing H_1 against H_2.

Remark 7. In the same way as previously we can obtain the test statistics T_12^0 = G²_12, T_12^{−1} = NG²_12, T_12^1 = X²_12, T_12^{−1/2} = F²_12, T_12^{−2} = NM²_12 and T_12^{2/3}.
We refer here to the example of [18, Section 9.5], where the test proposed by Bowker [1] is applied. The tests proposed in this paper may be used in a situation where it is hoped that a new formulation of a drug will reduce side-effects.

Example. We consider 158 patients who have been treated with the old formulation, for whom records of any side-effects are available. We now treat each patient with the new formulation and note the incidence of side-effects. Table 1 shows a possible outcome for such an experiment. Do the data in Table 1 provide any evidence that side-effects are less severe with the new formulation of the drug? The two test statistics given in (21) and (22) are appropriate for this problem.

Table 1. Side-effect levels for old and new formulation.

                              Side-effect levels (new formulation)
Side-effect levels
(old formulation)        None    Slight    Severe    Total
None                       83         4         3       90
Slight                     17        22         5       44
Severe                      4         9        11       24
Total                     104        35        19      158
For the test statistic T_01^ϕ given in (22), the null hypothesis is that, for all off-diagonal counts in the table, the associated probabilities are such that p_ij = p_ji. The alternative is that p_ij ≥ p_ji for all i > j. We have computed the members of the family {T_01^λ} given in Remark 7 and the corresponding asymptotic p-values P_01^λ = Pr(χ̄²_3 > T_01^λ), which are given in the following table.

Table 2. Asymptotic p-values for T_01^λ.

λ           −2       −1     −1/2        0      2/3        1
T_01^λ    14.43    11.48    10.58     9.96     9.46     9.33
P_01^λ    0.001    0.003    0.004    0.006    0.007    0.008
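The entries of Table 2 can be reproduced with the sketches given earlier. The driver below is hypothetical illustration code of ours; it reuses `t01_t12` and `chibar_sf` from the previous snippets and takes the counts from Table 1, with rows indexing the old formulation:

```python
import numpy as np

counts = np.array([[83,  4,  3],
                   [17, 22,  5],
                   [ 4,  9, 11]])        # Table 1: rows = old, columns = new formulation

for lam in (-2, -1, -0.5, 0, 2/3, 1):
    t01, _ = t01_t12(counts, lam)        # T_01^lambda of (22)
    p01 = chibar_sf(t01, 3)              # asymptotic p-value, chi-bar squared with K = 3
    print(f"lambda = {lam:5.2f}   T01 = {t01:6.2f}   p-value = {p01:.3f}")
```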
On the other hand, if we consider the usual Pearson test statistic X², its value for these data is 9.33. In this case, using the chi-squared distribution with 3 degrees of freedom (the asymptotic distribution found by Bowker [1]), we obtain Pr(χ²_3 > X²) = 0.025. Then for all
the considered statistics there is evidence of a differing incidence rate of side-effects under the two formulations; moreover, this difference is towards less severe side-effects under the new formulation. Therefore, the two considered tests lead to the same conclusion: there is strong evidence of a higher incidence rate of side-effects under the old formulation. The conclusion obtained in [18] is in accordance with ours.

Acknowledgements

This work was supported by Grants MTM 2009-10072 and BSCH-UCM 2008-910707.

References

1. Bowker, A. A test for symmetry in contingency tables. J. Am. Statist. Assoc. 1948, 43, 572–574.
2. Ireland, C.T.; Ku, H.H.; Koch, G.G. Symmetry and marginal homogeneity of an r × r contingency table. J. Am. Statist. Assoc. 1969, 64, 1323–1341.
3. Quade, D.; Salama, I.A. A note on minimum chi-square statistics in contingency tables. Biometrics 1975, 31, 953–956.
4. Menéndez, M.L.; Pardo, J.A.; Pardo, L. Tests based on ϕ-divergences for bivariate symmetry. Metrika 2001, 53, 15–29.
5. Menéndez, M.L.; Pardo, J.A.; Pardo, L. Tests for bivariate symmetry against ordered alternatives in square contingency tables. Aust. N. Z. J. Stat. 2003, 45, 115–124.
6. Menéndez, M.L.; Pardo, J.A.; Pardo, L. Tests of symmetry in three-dimensional contingency tables based on phi-divergence statistics. J. Appl. Stat. 2004, 31, 1095–1114.
7. Menéndez, M.L.; Pardo, J.A.; Pardo, L.; Zografos, K. On tests of symmetry, marginal homogeneity and quasi-symmetry in two contingency tables based on minimum ϕ-divergence estimator with constraints. J. Stat. Comput. Sim. 2005, 75, 555–580.
8. El Barmi, H.; Kochar, S.C. Likelihood ratio tests for symmetry against ordered alternatives in a square contingency table. Stat. Probab. Lett. 1995, 2, 167–173.
9. Csiszár, I. Eine Informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. The Mathematical Institute of Hungarian Academy of Sciences: Budapest, Hungary, 1963; Volume 8, pp. 84–108.
10. Ali, S.M.; Silvey, S.D. A general class of coefficients of divergence of one distribution from another. J. Roy. Stat. Soc. Ser. B Stat. Met. 1966, 28, 131–142.
11. Cressie, N.; Read, T.R.C. Multinomial goodness-of-fit tests. J. Roy. Stat. Soc. Ser. B Stat. Met. 1984, 46, 440–464.
12. Pardo, L. Statistical Inference Based on Divergence Measures; Chapman & Hall/CRC: New York, NY, USA, 2006.
13. Kullback, S. Information Theory and Statistics; John Wiley: New York, NY, USA, 1959.
14. Kullback, S. Marginal homogeneity of multidimensional contingency tables. Ann. Math. Stat. 1971, 42, 594–606.
15. Morales, D.; Pardo, L.; Vajda, I. Asymptotic divergences of estimates of discrete distributions. J. Stat. Plan. Infer. 1995, 48, 347–369.
16. Pardo, J.A.; Pardo, L.; Zografos, K. Minimum ϕ-divergence estimators with constraints in multinomial populations. J. Stat. Plan. Infer. 2002, 104, 221–237.
17. Robertson, T.; Wright, F.T.; Dykstra, R.L. Order Restricted Statistical Inference; Wiley: New York, NY, USA, 1988.
18. Sprent, P. Applied Nonparametric Statistical Methods; Chapman & Hall: London, UK, 1993.

© 2010 by the authors; licensee MDPI, Basel, Switzerland. This article is an Open Access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).