The Index of Dispersion Test for the Bivariate Poisson Distribution

Biometrics 42, 941-948
December 1986

S. Loukas
Department of Mathematics, University of Ioannina, Ioannina 45332, Greece

and

C. D. Kemp
Department of Statistics, University of St Andrews, North Haugh, St Andrews, Fife KY16 9SS, Scotland

Summary

The dispersion test for a univariate Poisson distribution is extended in a natural way to the bivariate case. The asymptotic distribution of the test statistic is established and is shown by simulation to be a satisfactory approximation even for small values of the bivariate Poisson parameters. Empirical power results are obtained which show that the proposed test compares favourably with both the classical Pearson chi-squared goodness-of-fit test and the T-test recently developed by Crockett (1979, in Interactive Statistics, D. McNeil (ed.), North-Holland, Amsterdam).

Key words: Bivariate Hermite distribution; Bivariate Poisson distribution; Chi-squared goodness-of-fit; Crockett's T-test; Index of dispersion test; Power of a test; Simulation.

Part of this work was carried out at the University of Bradford.

1. Introduction

Over the past two decades there has been an increasing interest in bivariate discrete distributions. The bivariate Poisson distribution (BPD) plays a central role here (much as the Poisson distribution does in the univariate case) and forms the least complex model for a wide variety of processes (see, e.g., Johnson and Kotz, 1969; Cox and Isham, 1980). Very briefly, the probability generating function of the BPD, which was first discussed by Campbell (1934) and Aitken (1936), is

$$g_{X_1,X_2}(u, v) = \exp[(\theta_1 - \theta_{12})(u - 1) + (\theta_2 - \theta_{12})(v - 1) + \theta_{12}(uv - 1)], \qquad \theta_1, \theta_2 > \theta_{12} > 0.$$

The marginal distributions are Poisson($\theta_1$) and Poisson($\theta_2$), and the correlation coefficient is limited to the range $0 \le \rho \le \min\{(\theta_1/\theta_2)^{1/2}, (\theta_2/\theta_1)^{1/2}\}$.

In order to decide whether the BPD provides an adequate description of sample data or whether a more complicated distribution is indicated, some kind of goodness-of-fit test is needed. The application of the classical Pearson chi-squared procedure to a sample of $N$ bivariate observations raises problems. Compared with fitting one of the marginal distributions, there are far more classes to be considered, expected frequencies per class are generally much smaller, and it is not obvious how best to group classes in order to avoid excessively low expected frequencies.

In the univariate situation the index of dispersion, due to Fisher, Thornton, and Mackenzie (1922), is commonly used to test the hypothesis that observations have come from a Poisson population against alternatives with variance greater than mean. Thus, if there are $N$ independent random variables $X_1, \ldots, X_N$ each following the same Poisson law, the statistic

$$I = \sum_{i=1}^{N} (X_i - \bar{X})^2 / \bar{X}$$

is approximately distributed as chi-squared with $(N - 1)$ degrees of freedom. Subsequent research (see, e.g., Cochran, 1936; Sukhatme, 1938; Hoel, 1943; Lancaster, 1952; Kathirgamatamby, 1953) showed that the chi-squared distribution is a reasonably good approximation when the Poisson parameter is not less than unity. Gbur (1981) discussed different types of alternative hypothesis. Perry and Mead (1979) considered $I$ in the context of spatial pattern.

The main object of the present paper is to propose a specific test for the BPD based on a natural extension of the univariate index of dispersion. Crockett (1979) recently proposed the test statistic (in our notation)

$$T = \frac{N(\bar{X}_2^2 Z_1^2 - 2 S_{12}^2 Z_1 Z_2 + \bar{X}_1^2 Z_2^2)}{2(\bar{X}_1^2 \bar{X}_2^2 - S_{12}^4)},$$

where $Z_1 = S_1^2 - \bar{X}_1$, $Z_2 = S_2^2 - \bar{X}_2$; $\bar{X}_1, \bar{X}_2$ are the sample means; $S_1^2, S_2^2$, the sample variances; and $S_{12}$ the sample covariance. He carried out a simulation study to confirm that $T$ is asymptotically distributed as $\chi^2_{[2]}$. Note that there is a misprint in Crockett's formula (9): his expression for $T$ has $S_{12}$ instead of $S_{12}^2$ in the numerator.

The statistic $I_B$, which we call the bivariate index of dispersion, and its asymptotic distribution are developed in Section 2. The approximate distribution of the test is empirically investigated in Section 3, and Monte Carlo results for the power of the test under certain simple alternatives are given in Section 4. The chi-squared goodness-of-fit and Crockett's $T$ statistics are compared with $I_B$ and the superiority of the proposed test is clearly indicated.

2. The Bivariate Dispersion Test

Teicher (1954) proved that the limiting distribution of the multivariate Poisson distribution is, under certain conditions, the multivariate normal. In the bivariate case this becomes the following theorem.

Theorem 1. If $(X_1, X_2)$ has the BPD and $\theta_1 \to \infty$, $\theta_2 \to \infty$, $\theta_{12} \to \infty$ such that $\theta_{12}/\sqrt{\theta_1\theta_2} \to \rho$ = constant, then the limiting distribution of

$$(Y_1, Y_2) = [(X_1 - \theta_1)/\sqrt{\theta_1},\; (X_2 - \theta_2)/\sqrt{\theta_2}]$$

is the standardized bivariate normal distribution with zero marginal means, unit marginal variances, and correlation coefficient $\rho \ge 0$.

It is well known that if $(Z_1, Z_2)$ is distributed as the standardized bivariate normal distribution then

$$X = (Z_1^2 - 2\rho Z_1 Z_2 + Z_2^2)/(1 - \rho^2)$$

follows the chi-squared distribution with 2 degrees of freedom (Johnson and Kotz, 1972, p. 87). Now consider $N$ pairs $(X_{1i}, X_{2i})$, $i = 1, \ldots, N$, of independent random variables following the BPD and let

$$I_B = \sum_{i=1}^{N} (Y_{1i}^2 - 2\rho Y_{1i} Y_{2i} + Y_{2i}^2)/(1 - \rho^2),$$


where $Y_{ji} = (X_{ji} - \theta_j)/\sqrt{\theta_j}$, $j = 1, 2$; $i = 1, \ldots, N$, and $\rho = \theta_{12}/\sqrt{\theta_1\theta_2}$. Then, under the limiting conditions of Theorem 1, $I_B$ is approximately distributed as chi-squared with $2N$ degrees of freedom. $I_B$ forms the basis for our test for departures from the BPD against alternatives which involve an increase in the generalized variance, $\sigma_1^2\sigma_2^2 - \sigma_{12}^2$. A more detailed discussion of these alternatives is given in Section 4.

In the usual practical situation where the BPD parameters are unknown, we define

$$I_B = \sum_{i=1}^{N}\left[\frac{(X_{1i} - \bar{X}_1)^2}{\bar{X}_1} - 2 S_{12}\frac{(X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2)}{\bar{X}_1\bar{X}_2} + \frac{(X_{2i} - \bar{X}_2)^2}{\bar{X}_2}\right] \Big/ \left(1 - \frac{S_{12}^2}{\bar{X}_1\bar{X}_2}\right) = \frac{N(\bar{X}_2 S_1^2 - 2 S_{12}^2 + \bar{X}_1 S_2^2)}{\bar{X}_1\bar{X}_2 - S_{12}^2}$$

and reduce the degrees of freedom of the approximating $\chi^2$ accordingly. It is this version of $I_B$ which we shall consider from here onwards.

Replacing all the parameters by their moment estimates makes both $I_B$ and $T$ very simple to compute. It is well known that $\bar{X}_1$ and $\bar{X}_2$ are also the maximum likelihood estimators of $\theta_1$ and $\theta_2$, but the efficiency of $S_{12}$ as an estimator of $\theta_{12}$ decreases as $\rho$ increases (Holgate, 1964). However, iteration is required in order to obtain the maximum likelihood estimate and thus the tests would lose their essential simplicity if we required the use of maximum likelihood.

Note that in practice, when the degrees of freedom $\nu$ of the calculated $I_B$ are large, we found it convenient to use the Wilson-Hilferty normal approximation

$$Z = [(I_B/\nu)^{1/3} - 1 + 2/(9\nu)](9\nu/2)^{1/2}.$$
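The computation described in this section can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the sample variances and covariance are assumed to use divisor N, and the degrees of freedom are taken as 2N - 3 (2N less the three estimated parameters), which is an inference from the degrees of freedom quoted in the examples of Section 5 rather than a rule stated explicitly here.

```python
from math import sqrt


def bivariate_dispersion_index(pairs):
    """Return (I_B, df) for a sample of (x1, x2) count pairs.

    Assumptions: variances and covariance use divisor N, and the degrees
    of freedom are taken as 2N - 3 (three parameters estimated by moments).
    """
    n = len(pairs)
    m1 = sum(x1 for x1, _ in pairs) / n
    m2 = sum(x2 for _, x2 in pairs) / n
    s11 = sum((x1 - m1) ** 2 for x1, _ in pairs) / n
    s22 = sum((x2 - m2) ** 2 for _, x2 in pairs) / n
    s12 = sum((x1 - m1) * (x2 - m2) for x1, x2 in pairs) / n
    denom = m1 * m2 - s12 ** 2
    if denom <= 0:
        raise ValueError("moment estimates fall outside the BPD parameter space")
    # Closed form of the moment-estimate version of I_B
    i_b = n * (m2 * s11 - 2 * s12 ** 2 + m1 * s22) / denom
    return i_b, 2 * n - 3


def wilson_hilferty_z(statistic, df):
    """Approximate normal deviate for a chi-squared statistic (Wilson-Hilferty)."""
    return ((statistic / df) ** (1 / 3) - 1 + 2 / (9 * df)) * sqrt(9 * df / 2)


if __name__ == "__main__":
    sample = [(0, 1), (2, 1), (1, 3), (3, 2), (1, 0), (2, 2)]  # toy data, not from the paper
    i_b, df = bivariate_dispersion_index(sample)
    print(i_b, df, wilson_hilferty_z(i_b, df))
```

Applied to the values reported for the first data set in Section 5 (269.4 on 197 degrees of freedom), `wilson_hilferty_z` returns a deviate of about 3.31, in line with the quoted z = 3.306.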

3. The Accuracy of the Chi-Squared Approximation

We have examined the accuracy of the chi-squared approximation to the distribution of $I_B$ by a random sampling experiment. This approach was used inter alia by Sukhatme (1938), Lancaster (1952), and Bennett (1959) for the univariate index of dispersion. Other possible approaches include (i) comparing moment ratios, the moments themselves being infinite series (Hoel, 1943); and (ii) calculating the exact moments and hence, by curve fitting, obtaining probability integrals which may be compared with those of chi-squared (Kathirgamatamby, 1953). Approach (i) is exceedingly laborious and of questionable accuracy even for the much simpler univariate index of dispersion case, whilst (ii) proved too tedious for this bivariate study.

Following Kemp and Loukas (1978), we drew 200 sets of $N$ ($N$ = 50, 100, 500) random samples from the BPD for each of three sets of relatively small parameters. For each set of samples, the parameters were estimated using the method of moments and the bivariate expected and observed frequency table was constructed. The index $I_B$ was then calculated. The usual chi-squared goodness-of-fit statistic

$$X^2 = \sum_{i=1}^{k} (O_i - E_i)^2 / E_i,$$

where $O_i$, $E_i$ are the observed and expected frequencies of the $i$th group, $i = 1, \ldots, k$, was also computed for comparison. As indicated earlier, computing $X^2$ in the bivariate situation is not so straightforward. We have adopted the empirical procedure (suggested to one of us by Professor J. B. Douglas some years ago) of first ordering the expected frequencies and then, where necessary, grouping to ensure that the expected number in any group does not fall below unity.
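One possible reading of the grouping procedure just described is sketched below in Python. The expected frequencies come from the BPD probability function obtained by expanding the generating function of Section 1. The details of the grouping rule are assumptions made for illustration: cells are sorted in ascending order of expectation and pooled successively until the accumulated expectation reaches unity, and all probability beyond the observed range is placed in one extra cell. The function names are hypothetical.

```python
from math import exp, factorial


def bpd_pmf(x1, x2, t1, t2, t12):
    """P(X1 = x1, X2 = x2) for the BPD with means t1, t2 and covariance t12."""
    a, b = t1 - t12, t2 - t12
    total = 0.0
    for k in range(min(x1, x2) + 1):
        total += (a ** (x1 - k) * b ** (x2 - k) * t12 ** k
                  / (factorial(x1 - k) * factorial(x2 - k) * factorial(k)))
    return exp(-(t1 + t2 - t12)) * total


def grouped_pearson_x2(observed, t1, t2, t12, minimum=1.0):
    """Return (X2, groups); groups is a list of (observed, expected) pairs.

    `observed` maps (x1, x2) to a count.  Cells are ordered by expected
    frequency and pooled until each group's expectation is >= `minimum`
    (an assumed interpretation of the grouping rule in the text).
    """
    n = sum(observed.values())
    top1 = max(x1 for x1, _ in observed)
    top2 = max(x2 for _, x2 in observed)
    cells, mass = [], 0.0
    for x1 in range(top1 + 1):
        for x2 in range(top2 + 1):
            p = bpd_pmf(x1, x2, t1, t2, t12)
            mass += p
            cells.append((n * p, observed.get((x1, x2), 0)))
    cells.append((n * max(0.0, 1.0 - mass), 0))  # tail beyond the observed range
    cells.sort()  # ascending expected frequency
    groups, e_acc, o_acc = [], 0.0, 0
    for e, o in cells:
        e_acc += e
        o_acc += o
        if e_acc >= minimum:
            groups.append((o_acc, e_acc))
            e_acc, o_acc = 0.0, 0
    if e_acc > 0 and groups:  # fold any remainder into the last group
        o_last, e_last = groups.pop()
        groups.append((o_last + o_acc, e_last + e_acc))
    x2_stat = sum((o - e) ** 2 / e for o, e in groups)
    return x2_stat, groups
```

Different but equally reasonable pooling rules can give different values of the statistic, which is part of the difficulty with the classical test noted in the Introduction.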

Table 1
Observed and expected frequencies of $F_{\chi^2}(\cdot)$ calculated from the index $I_B$ and the classical $X^2$ statistic for samples of N from three bivariate Poisson distributions

                                          Observed (parameters)
                              (1.35, .9, .1)    (3.5, 3.25, 1)      (6, 6, 1)
N     F_χ²(·)                  I_B      X²       I_B      X²       I_B      X²     Expected
50    0-0.1                    20       9        21       12       15       7        20
      0.1-0.3                  29       38       42       31       44       32       40
      0.3-0.5                  45       42       46       38       37       47       40
      0.5-0.7                  43       39       34       49       39       41       40
      0.7-0.9                  46       46       36       41       43       51       40
      0.9-1.0                  17       26       21       29       22       22       20
      Uniform fit χ²[5]        5.22     8.98     2.40     11.42    2.32     14.52
100   0-0.1                    20       14       19       13       17       21       20
      0.1-0.3                  39       32       33       36       41       29       40
      0.3-0.5                  41       37       42       44       39       30       40
      0.5-0.7                  34       47       39       43       38       46       40
      0.7-0.9                  38       48       36       43       40       48       40
      0.9-1.0                  28       22       31       21       25       26       20
      Uniform fit χ²[5]        4.25     6.65     7.85     3.75     1.85     9.88
500   0-0.1                    13       20       21       18       20       22       20
      0.1-0.3                  41       35       36       52       33       34       40
      0.3-0.5                  35       38       33       32       39       31       40
      0.5-0.7                  50       37       35       37       48       42       40
      0.7-0.9                  43       46       49       41       40       51       40
      0.9-1.0                  18       24       26       20       20       20       20
      Uniform fit χ²[5]        6.02     2.65     6.12     5.65     2.85     6.25
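The "Uniform fit" entries of Table 1 can be checked directly from the tabulated counts. The short Python sketch below (the function name is hypothetical) computes the Pearson statistic for the six classes against the expected counts 20, 40, 40, 40, 40, 20, which correspond to 200 samples spread over intervals of width .1, .2, .2, .2, .2, .1.

```python
EXPECTED = (20, 40, 40, 40, 40, 20)  # 200 samples x interval widths .1, .2, .2, .2, .2, .1


def uniform_fit_statistic(observed, expected=EXPECTED):
    """Pearson chi-squared statistic for the uniform fit (6 classes, 5 df)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    # N = 50, first parameter set: I_B column, then X2 column of Table 1
    print(uniform_fit_statistic([20, 29, 45, 43, 46, 17]))
    print(uniform_fit_statistic([9, 38, 42, 39, 46, 26]))
```

The two printed values should be close to the tabulated 5.22 and 8.98.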

If the $\chi^2$ distribution is a good approximation to the distribution of a random variable $Y$, then the distribution of

$$F_{\chi^2}(Y) = \int_0^Y f(\chi^2)\, d(\chi^2)$$

should be approximately uniform. In Table 1 we give expected and observed values for $F_{\chi^2}(I_B)$ and $F_{\chi^2}(X^2)$, together with the usual chi-squared statistics for goodness-of-fit of the uniform distribution to $F_{\chi^2}(\cdot)$. These results suggest that $I_B$ is reasonably satisfactory even when the parameters are small. They also suggest that the chi-squared distribution may be a better approximation to the $I_B$ distribution than to the distribution of the classical chi-squared goodness-of-fit test statistic.

4. Alternative Hypotheses and Some Empirical Power Results

In order to examine the power of the $I_B$ test to detect departures from the BPD we need to specify an alternative distribution. In the univariate case a practically important alternative to the Poisson distribution is the class of compound (mixed) Poisson distributions (CPDs) obtained by allowing the Poisson parameter itself to be distributed. The common CPDs are also generalized Poisson distributions (in the sense of Gurland, 1957) and these can be given a clustering interpretation. A characteristic property of CPDs is that the variance exceeds the mean; i.e., for fixed mean, compounding increases the variance. The index of dispersion $I$ is well suited to detect this condition.

In the bivariate situation we can construct bivariate compound Poisson distributions (BCPDs) and now the characteristic feature is an increase in the generalized variance for


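fixed marginal means. The classical (and simplest) method of compounding is to replace $\theta_1$, $\theta_2$, and $\theta_{12}$ by $\lambda\theta_1$, $\lambda\theta_2$, and $\lambda\theta_{12}$, and then to allow $\lambda$ to be distributed. It is straightforward to show that the BCPD has first- and second-order moments

$$\mu_1 = \mu\theta_1, \quad \mu_2 = \mu\theta_2, \quad \sigma_1^2 = \sigma^2\theta_1^2 + \mu\theta_1, \quad \sigma_2^2 = \sigma^2\theta_2^2 + \mu\theta_2, \quad \sigma_{12} = \sigma^2\theta_1\theta_2 + \mu\theta_{12},$$

where $\mu$ and $\sigma^2$ are the mean and variance of the distribution of $\lambda$. In the present context of deciding whether the BPD is a good fit, we require $\mu = 1$ so that the marginal means are unchanged. Then the ratio of the generalized variance (GVR) of the BCPD to that of the BPD is

$$\mathrm{GVR} = 1 + \frac{\sigma^2(\theta_1 + \theta_2 - 2\theta_{12})}{1 - \theta_{12}^2/\theta_1\theta_2}.$$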
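As a numerical check on the GVR expression, the following Python sketch computes the generalized variance of the BCPD directly from its second-order moments (with $\mu = 1$) and compares the ratio with the closed form. The function names and the chosen value of $\sigma^2$ are illustrative assumptions; the BPD parameters are one of the sets used in Table 1.

```python
def bcpd_moments(t1, t2, t12, mu, var):
    """Means, variances and covariance of the BCPD when lambda has mean mu, variance var."""
    return (mu * t1,
            mu * t2,
            var * t1 ** 2 + mu * t1,
            var * t2 ** 2 + mu * t2,
            var * t1 * t2 + mu * t12)


def gvr_from_moments(t1, t2, t12, var):
    """Generalized variance of the BCPD (mu = 1) divided by that of the BPD."""
    _, _, v1, v2, c12 = bcpd_moments(t1, t2, t12, 1.0, var)
    return (v1 * v2 - c12 ** 2) / (t1 * t2 - t12 ** 2)


def gvr_closed_form(t1, t2, t12, var):
    """Closed-form GVR = 1 + var (t1 + t2 - 2 t12) / (1 - t12^2 / (t1 t2))."""
    return 1 + var * (t1 + t2 - 2 * t12) / (1 - t12 ** 2 / (t1 * t2))


if __name__ == "__main__":
    # BPD parameters (3.5, 3.25, 1) with an illustrative mixing variance of 0.2
    print(gvr_from_moments(3.5, 3.25, 1.0, 0.2), gvr_closed_form(3.5, 3.25, 1.0, 0.2))
```

The two values agree, and both exceed 1 whenever the mixing variance is positive, since $\theta_1 + \theta_2 - 2\theta_{12} > 0$ under the parameter constraints.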

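So, provided the distribution of $\lambda$ is not degenerate, compounding increases the generalized variance. Substituting the theoretical values for $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\sigma_{12}$ into $I_B$ gives

$$2N\left[1 + \frac{\sigma^2}{2}\,\frac{\theta_1 + \theta_2}{1 - (\sigma^2\theta_1\theta_2 + \theta_{12})^2/\theta_1\theta_2}\right].$$

Similarly, for $T$ we obtain

$$\frac{N\sigma^4}{2}\,\frac{(\theta_1^2 + \theta_2^2) - 2(\sigma^2\theta_1\theta_2 + \theta_{12})^2}{1 - (\sigma^2\theta_1\theta_2 + \theta_{12})^4/\theta_1^2\theta_2^2}.$$

[...]

$$\mu_1 = a_1 + 2a_2 + a_5, \quad \mu_2 = a_3 + 2a_4 + a_5,$$

$$\sigma_1^2 = a_1 + 4a_2 + a_5, \quad \sigma_2^2 = a_3 + 4a_4 + a_5, \quad \sigma_{12} = a_5.$$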

Models for this distribution include both mixing and generalizing the BPD and it may be regarded as a limiting case of a number of BCP distributions (see Kemp and Papageorgiou, 1982).

We have estimated the power of the $I_B$ test when the bivariate Hermite distribution is the alternative hypothesis by a sampling experiment. Berkson (1940) used this type of analysis in estimating the power of the univariate dispersion test and the approach has become a popular alternative to theoretical power studies (see, e.g., Thomas, 1951; Gartside, 1972; and Levy, 1978). Following Kemp and Loukas (1978), we drew 200 sets of $N$ ($N$ = 50, 100, 500) bivariate Hermite random samples for each of three sets of parameters. For the bivariate index $I_B$, the chi-squared goodness-of-fit, and Crockett's $T$, we estimated the power of each test under each alternative by the proportion of times that the null hypothesis was rejected at each of two levels of significance ($\alpha$ = .05, .01). The results are provided in Table 2.

Our results clearly suggest that the bivariate index $I_B$ has higher power than both the classical chi-squared goodness-of-fit and Crockett's $T$ tests when the underlying population is a bivariate distribution with GVR > 1. In particular, when the number of observations is relatively small, a considerable difference in power exists. We therefore propose the bivariate index $I_B$ as an appropriate test statistic for detecting departures from the bivariate Poisson distribution.

Table 2
Estimated power of $I_B$, $T$, and $X^2$ when the actual distribution is Hermite $(a_1, a_2, a_3, a_4, a_5)$

                                    (a_1, a_2, a_3, a_4, a_5)
              (.75, .25, .5, .15, .1)    (1, .75, 1.25, .5, 1)     (2, 1.5, 2, 1.5, 1)
N      α        I_B     T      X²         I_B     T      X²         I_B     T      X²
50     .05      .70    .54    .38         .78    .58    .32         .91    .78    .45
50     .01      .48    .41    .28         .56    .43    .20         .76    .66    .28
100    .05      .91    .79    .57         .92    .84    .56         .99    .98    .75
100    .01      .80    .64    .38         .82    .74    .39         .98    .90    .56

When N = 500, the estimated power was at least .99 in all cases.

Apart from the structural difference between the two tests (as shown by the theoretical expressions above), one might expect $I_B$ to perform better, particularly for small $N$, because of the way it approximates $\chi^2$. $I_B$ approximates the joint distribution of $(X_1, X_2)$ by a bivariate normal distribution, whilst $T$ approximates the joint distribution of $(S_1^2, S_2^2)$ by a bivariate normal distribution.

5. Examples

Two sets of bivariate data were tested by Crockett; the original data are reproduced in his paper. The first set comprises numbers of plants of the species Lacistema aggregatum and Protium guianense in each of 100 contiguous quadrats. These data were first given in detail and analysed by Holgate (1966) in an examination of bivariate Neyman type A distributions. Holgate obtained the data from P. Greig-Smith who collected (but did not publish) them during a much larger study of secondary rain forest in Trinidad (Greig-Smith, 1952). For this set, $I_B$ = 269.4 with 197 df, giving z = 3.306 to be compared with z(.01) = 2.33, z(.001) = 2.88; $T$ = 9.936 (in disagreement with Crockett's own figure) to be compared with $\chi^2_{[2]}(.01)$ = 9.21, $\chi^2_{[2]}(.001)$ = 13.8. Both tests indicate that the data are not well fitted by the BPD, $I_B$ being rather more decisive than $T$. Holgate fitted three different bivariate generalizations of the Neyman type A which allow for clustering of the plants in various ways (one is very directly related to the BPD). He found that in all cases the conventional chi-squared statistic fell below the upper 5% point of $\chi^2$ but pointed out that equally plausible groupings of the bivariate table lead to significant values of the goodness-of-fit statistic.
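The $\chi^2_{[2]}$ reference points quoted for $T$ in this example can be reproduced without tables, because the upper tail probability of a chi-squared variable with 2 degrees of freedom is exactly $\exp(-x/2)$. The Python sketch below uses that standard closed form; the function names are hypothetical.

```python
from math import exp, log


def chi2_2df_upper_tail(x):
    """P(chi-squared with 2 df > x); for 2 df this is exactly exp(-x/2)."""
    return exp(-x / 2)


def chi2_2df_critical(alpha):
    """Upper 100*alpha% point of chi-squared with 2 df."""
    return -2 * log(alpha)


if __name__ == "__main__":
    print(chi2_2df_critical(0.01), chi2_2df_critical(0.001))
    print(chi2_2df_upper_tail(9.936))  # tail probability for the reported T
```

The critical values agree with the 9.21 and 13.8 quoted above, and the tail probability for $T$ = 9.936 falls between .001 and .01, consistent with $T$ rejecting at the 1% level but not at the 0.1% level.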
The second data set of 621 observations comprises the number of children suffering medically-attended injuries in each of two different time periods. These data were obtained as part of a large study of childhood accidents conducted by the State of California Department of Public Health, and were detailed and used by Mellinger et al. (1965), who were seeking to establish the existence of differences in accident liability by fitting the Bates-Neyman (bivariate negative binomial) model. For these data, $I_B$ = 1787 with 1239 df, giving z = 9.71; $T$ = 103.7. Both tests clearly indicate that the data are most unlikely to have come from a BPD. Using the conventional chi-squared test with grouping "by eye," Mellinger et al. found this data set to be well fitted by the Bates-Neyman model (which is obtainable by gamma-mixing an uncorrelated bivariate Poisson distribution).

Acknowledgement

The authors are very grateful to the referee for his helpful comments and suggestions.

References

Aitken, A. C. (1936). A further note on multivariate selection. Proceedings of the Edinburgh Mathematical Society 5, 37-40.
Bennett, B. M. (1959). A sampling study on the power function of the $\chi^2$ "index of dispersion" test. Journal of Hygiene 57, 360-365.
Berkson, J. (1940). A note on the chi-square test, the Poisson and the binomial. Journal of the American Statistical Association 35, 362-367.
Campbell, J. T. (1934). The Poisson correlation function. Proceedings of the Edinburgh Mathematical Society 4, 18-26.
Cochran, W. G. (1936). The $\chi^2$ distribution for the binomial and Poisson series, with small expectations. Annals of Eugenics, London 7, 207-217.
Cox, D. R. and Isham, V. (1980). Point Processes. London: Chapman and Hall.
Crockett, N. G. (1979). A quick test of fit of a bivariate distribution. In Interactive Statistics, D. McNeil (ed.), 185-191. Amsterdam: North-Holland.
Fisher, R. A., Thornton, H. G., and Mackenzie, W. A. (1922). The accuracy of the plating method of estimating bacterial populations. Annals of Applied Biology 9, 325-359.
Gartside, P. S. (1972). A study of methods for comparing several variances. Journal of the American Statistical Association 67, 342-346.
Gbur, E. E. (1981). On the Poisson index of dispersion. Communications in Statistics - Simulation and Computation 10, 531-535.
Greig-Smith, P. (1952). Ecological observations on degraded and secondary forest in Trinidad, British West Indies. Journal of Ecology 40, 283-330.
Gurland, J. (1957). Some interrelations among compound and generalized distributions. Biometrika 44, 265-268.
Hoel, P. G. (1943). On indices of dispersion. Annals of Mathematical Statistics 14, 155-162.
Holgate, P. (1964). Estimation for the bivariate Poisson distribution. Biometrika 51, 241-245.
Holgate, P. (1966). Bivariate generalizations of Neyman's type A distribution. Biometrika 53, 241-245.
Johnson, N. L. and Kotz, S. (1969). Distributions in Statistics: Discrete Distributions. Boston: Houghton Mifflin.
Johnson, N. L. and Kotz, S. (1972). Distributions in Statistics: Continuous Multivariate Distributions. New York: Wiley.
Kathirgamatamby, N. (1953). Note on the Poisson index of dispersion. Biometrika 40, 225-228.
Kemp, C. D. and Loukas, S. (1978). The computer generation of bivariate discrete random variables. Journal of the Royal Statistical Society, Series A 141, 513-519.
Kemp, C. D. and Papageorgiou, H. (1982). Bivariate Hermite distributions. Sankhya, Series A 44, 269-280.
Lancaster, H. O. (1952). Statistical control of counting experiments. Biometrika 39, 419-422.
Levy, K. J. (1978). An empirical study of the cube-root test for homogeneity of variance with respect to the effects of nonnormality and power. Journal of Statistical Computation and Simulation 7, 71-78.
Mellinger, G. D., Sylwester, D. L., Gaffey, W. R., and Manheimer, D. I. (1965). A mathematical model with applications to a study of accident repeatedness among children. Journal of the American Statistical Association 60, 1046-1059.
Perry, J. N. and Mead, R. (1979). On the power of the index of dispersion test to detect spatial pattern. Biometrics 35, 613-622.
Sukhatme, P. V. (1938). On the distribution of $\chi^2$ in samples of the Poisson series. Journal of the Royal Statistical Society, Supplement 5, 75-79.
Teicher, H. (1954). On the multivariate Poisson distribution. Skandinavisk Aktuarietidskrift 37, 1-9.
Thomas, M. (1951). Some tests for randomness in plant populations. Biometrika 38, 102-111.

Received May 1985; revised April 1986.
