ISSN (Print): 0974-6846 ISSN (Online): 0974-5645
Indian Journal of Science and Technology, Vol 8(4), 349–357, February 2015
DOI: 10.17485/ijst/2015/v8i1/59610
Comparison of Test Statistic for Zero-Inflated Negative Binomial against Zero-Inflated Poisson Model
B. Muniswamy1*, Dejen Tesfaw Molla2 and N. Konda Reddy3
1 Department of Statistics, Andhra University, Visakhapatnam, India; [email protected]
2 Department of Statistics, Addis Ababa University, Addis Ababa, Ethiopia
3 Department of Mathematics, KL University, Guntur, Andhra Pradesh, India
Abstract
In this study, the score test and two alternative tests, the likelihood ratio test (LRT) and the Wald test, were studied for testing the overdispersion parameter in ZINB against ZIP models after including covariates. The power of the three tests to detect overdispersion in the ZINB regression model was obtained through Monte Carlo simulation for different values of the overdispersion parameter and various sample sizes. The simulation results show that the score test is more effective than the LRT and Wald test for the ZINB regression model because of its higher empirical power. They also show that the ZIP model is more appropriate when the overdispersion is small, while the ZINB regression model is more appropriate for datasets with high overdispersion. In addition, the AIC and BIC were used to choose between these two models.
Keywords: Likelihood Ratio Test, Score Test, Wald Test, Zero-Inflated Poisson Model, Zero-Inflated Negative Binomial model
1. Introduction
For a random variable y representing counts whose sample mean and sample variance are equal, the Poisson regression model is the standard model for analysing count data. Quite often, however, count data exhibit substantial variation, with the sample variance either smaller or larger than the sample mean; these cases are called underdispersion and overdispersion, respectively, and they indicate that Poisson regression is not adequate. There are two common causes of overdispersion. The first is additional variation in the mean, or heterogeneity, for which a negative binomial model is often used. The second is an excess of zero counts (zero inflation): because the excess zeros make the conditional mean smaller than its true value, such data can be modeled with the zero-inflated Poisson (ZIP) or zero-inflated negative binomial (ZINB) model. In the literature on statistical modeling of counts, a number of models and associated estimation methods have been proposed to handle overdispersed and zero-inflated count data, for example, negative binomial and mixed Poisson regression models (Cameron and Trivedi1; Dean and Lawless4). In many applications a large proportion of zero values is observed17. In such a case, the ZIP regression model is an appropriate approach for analysing an outcome variable with an excess number of zero observations8,9,11,15,16. Moreover, overdispersion and zero inflation can occur together in the same data set, and in such cases the ZINB regression model is an alternative method8,14,13,17. The Poisson hurdle model and negative binomial hurdle model7,16 and the zero-inflated generalised Poisson model2,5,3 are also widely used in the analysis of zero-inflated data.
This study concentrates on the use of ZINB models for the analysis of count data, including maximum likelihood estimation of the overdispersion parameter by the Fisher scoring method. The Wald, LRT and score tests can be used to compare the ZIP and ZINB models for the same count dataset. The goal is to help researchers understand the ideas behind the power of these test procedures and choose the best test for a count dataset in terms of its power.
*Author for correspondence
2. Zero-Inflated Negative Binomial Regression Model
Let $y_i$ be a nonnegative response count variable. The ZINB distribution can be defined as:
$$P(Y_i = y_i) = \begin{cases} \omega_i + (1-\omega_i)(1+\delta\mu_i)^{-1/\delta}, & y_i = 0 \\[4pt] (1-\omega_i)\dfrac{\Gamma(y_i + 1/\delta)}{y_i!\,\Gamma(1/\delta)}\,(1+\delta\mu_i)^{-1/\delta}\left(1+\dfrac{1}{\delta\mu_i}\right)^{-y_i}, & y_i > 0 \end{cases} \tag{1}$$
where δ > 0 is a dispersion parameter that is assumed not to depend on covariates; δ > 0 corresponds to the overdispersed case. The mean and variance of the ZINB regression model are
$$E(Y_i) = (1-\omega_i)\mu_i, \qquad \mathrm{Var}(Y_i) = (1-\omega_i)(1+\omega_i\mu_i+\delta\mu_i)\mu_i \tag{2}$$
The parameters $\mu_i$ and $\omega_i$ depend on vectors of covariates $x_i$ and $z_i$, respectively. More specifically, we can write the models as:
$$\log(\mu_i) = x_i^T\beta, \qquad \log\left(\frac{\omega_i}{1-\omega_i}\right) = z_i^T\gamma \tag{3}$$
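As a quick numerical illustration of (1) and (2), the following sketch evaluates the ZINB probability function and compares the mean and variance obtained by direct summation with the closed forms in (2). The function name `zinb_pmf` and the parameter values are illustrative assumptions, not part of the original study.

```python
import math

def zinb_pmf(y, mu, omega, delta):
    """P(Y = y) under equation (1); requires delta > 0 and mu > 0."""
    r = 1.0 / delta
    # log of the negative binomial part of (1), computed on the log scale for stability
    log_nb = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
              - r * math.log1p(delta * mu)
              - y * math.log1p(1.0 / (delta * mu)))
    nb = math.exp(log_nb)
    return omega + (1 - omega) * nb if y == 0 else (1 - omega) * nb

# Illustrative (assumed) parameter values
mu, omega, delta = 3.0, 0.2, 0.5
probs = [zinb_pmf(y, mu, omega, delta) for y in range(400)]  # tail beyond 400 is negligible here
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))

# Closed forms from equation (2)
mean_formula = (1 - omega) * mu
var_formula = (1 - omega) * (1 + omega * mu + delta * mu) * mu
print(mean, mean_formula, var, var_formula)
```

The summed moments should agree with (2) up to truncation and rounding error.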
3. Maximum Likelihood Estimation of ZINB Model
The ZINB distribution is not a standard GLM-type exponential family distribution, even when the overdispersion parameter δ is known, so standard GLM fitting methods do not apply. To obtain the parameter estimates $\hat\delta$, $\hat\beta$ and $\hat\gamma$ of the ZINB regression model, the Newton-Raphson method or the method of Fisher scoring can be used. The method of scoring is more appropriate for ZINB regression because the second derivatives of $l = l(\delta, \mu_i, \omega_i; y_i)$ can be simplified by taking expectations. Based on the ZINB model (1) and the models for μ and ω displayed in (3), the log-likelihood function $l = l(\delta, \mu_i, \omega_i; y_i)$ for the ZINB model is:
$$l = \sum_{i=1}^{n}\left\{ I_{(y_i=0)}\log\left[\omega_i+(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}\right] + I_{(y_i>0)}\log\left[(1-\omega_i)\frac{\Gamma(y_i+1/\delta)}{y_i!\,\Gamma(1/\delta)}(1+\delta\mu_i)^{-1/\delta}\left(1+\frac{1}{\delta\mu_i}\right)^{-y_i}\right]\right\}$$
Since
$$\frac{\Gamma(y_i+1/\delta)}{\Gamma(1/\delta)} = \prod_{k=1}^{y_i}\left(y_i+\delta^{-1}-k\right) = \delta^{-y_i}\prod_{k=1}^{y_i}\left(\delta y_i-\delta k+1\right),$$
l can furthermore be written as
$$l = \sum_{i=1}^{n}\left\{ I_{(y_i=0)}\log\left[\omega_i+(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}\right] + I_{(y_i>0)}\left[\log(1-\omega_i)-\log y_i! + \sum_{k=1}^{y_i}\log(\delta y_i-\delta k+1) - \left(y_i+\frac{1}{\delta}\right)\log(1+\delta\mu_i) + y_i\log(\mu_i)\right]\right\} \tag{4}$$
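The product form of the log-likelihood in (4) avoids gamma functions altogether. A minimal sketch of (4) is given below; the function name `zinb_loglik` and the small data vectors are hypothetical and serve only to illustrate the formula.

```python
import math

def zinb_loglik(ys, mus, omegas, delta):
    """Log-likelihood (4) for counts ys with per-observation mu_i and omega_i."""
    total = 0.0
    for y, mu, w in zip(ys, mus, omegas):
        if y == 0:
            # zero counts: mixture of structural zeros and NB zeros
            total += math.log(w + (1 - w) * (1 + delta * mu) ** (-1 / delta))
        else:
            # positive counts: product form of Gamma(y + 1/delta) / Gamma(1/delta)
            total += (math.log(1 - w) - math.lgamma(y + 1)
                      + sum(math.log(delta * y - delta * k + 1) for k in range(1, y + 1))
                      - (y + 1 / delta) * math.log(1 + delta * mu)
                      + y * math.log(mu))
    return total

# Hypothetical data, for illustration only
ys = [0, 1, 4]
mus = [2.0, 1.5, 3.0]
omegas = [0.3, 0.2, 0.1]
value = zinb_loglik(ys, mus, omegas, delta=0.4)
print(value)
```

The result should coincide with the sum of the logarithms of the probabilities in (1) evaluated with gamma functions.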
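In the limit δ → 0, the ZINB distribution reduces to the zero-inflated Poisson distribution. Using (3), we have the following relationships:
$$\frac{\partial\mu_i}{\partial\beta_j} = \exp(x_i^T\beta)\,x_{ij} = \mu_i x_{ij} \tag{5}$$
$$\frac{\partial\omega_i}{\partial\gamma_r} = \frac{\exp(z_i^T\gamma)\,z_{ir}}{(1+\exp(z_i^T\gamma))^2} = \frac{\exp(z_i^T\gamma)}{1+\exp(z_i^T\gamma)}\cdot\frac{1}{1+\exp(z_i^T\gamma)}\,z_{ir} = \omega_i(1-\omega_i)z_{ir} \tag{6}$$
The first derivatives of the log-likelihood function with respect to the underlying parameters are:
$$l'_{\beta_j} = \frac{\partial l}{\partial\beta_j} = \sum_{i=1}^{n}\frac{\partial l}{\partial\mu_i}\frac{\partial\mu_i}{\partial\beta_j} = \sum_{i=1}^{n}\left\{-I_{(y_i=0)}\frac{(1-\omega_i)\mu_i(1+\delta\mu_i)^{-(1+\delta)/\delta}}{\omega_i+(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}} + I_{(y_i>0)}\frac{y_i-\mu_i}{1+\delta\mu_i}\right\}x_{ij} \tag{7}$$
$$l'_{\gamma_r} = \frac{\partial l}{\partial\gamma_r} = \sum_{i=1}^{n}\frac{\partial l}{\partial\omega_i}\frac{\partial\omega_i}{\partial\gamma_r} = \sum_{i=1}^{n}\left\{I_{(y_i=0)}\frac{\omega_i(1-\omega_i)\left(1-(1+\delta\mu_i)^{-1/\delta}\right)}{\omega_i+(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}} - I_{(y_i>0)}\,\omega_i\right\}z_{ir} \tag{8}$$
$$l'_{\delta} = \frac{\partial l}{\partial\delta} = \sum_{i=1}^{n}\left\{I_{(y_i=0)}\frac{(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}}{\omega_i+(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}}\left[\frac{\log(1+\delta\mu_i)}{\delta^2}-\frac{\mu_i}{\delta(1+\delta\mu_i)}\right] + I_{(y_i>0)}\left[\sum_{k=1}^{y_i}\frac{y_i-k}{\delta y_i-\delta k+1} + \frac{\log(1+\delta\mu_i)}{\delta^2} - \frac{(y_i+\frac{1}{\delta})\mu_i}{1+\delta\mu_i}\right]\right\} \tag{9}$$
and the second derivatives of the log-likelihood are given by: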
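$$\frac{\partial^2 l}{\partial\beta_j\partial\beta_k} = \sum_{i=1}^{n}\left\{I_{(y_i=0)}\frac{(1-\omega_i)\mu_i(1+\delta\mu_i)^{-(1+\delta)/\delta}}{\omega_i+(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}}\left[\frac{\mu_i-1}{1+\delta\mu_i} - \frac{(1-\omega_i)\mu_i(1+\delta\mu_i)^{-(1+\delta)/\delta}}{\omega_i+(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}}\right] - I_{(y_i>0)}\frac{(1+\delta y_i)\mu_i}{(1+\delta\mu_i)^2}\right\}x_{ij}x_{ik} \tag{10}$$
$$\frac{\partial^2 l}{\partial\beta_j\partial\gamma_r} = \sum_{i=1}^{n} I_{(y_i=0)}\frac{\omega_i(1-\omega_i)\mu_i(1+\delta\mu_i)^{-(1+\delta)/\delta}}{\left[\omega_i+(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}\right]^2}\,x_{ij}z_{ir} \tag{11}$$
$$\frac{\partial^2 l}{\partial\gamma_r\partial\gamma_s} = \sum_{i=1}^{n}\left\{I_{(y_i=0)}\frac{\omega_i(1-\omega_i)\left(1-(1+\delta\mu_i)^{-1/\delta}\right)}{\left[\omega_i+(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}\right]^2}\left[(1-\omega_i)\left(\omega_i+(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}\right)-\omega_i\right] - I_{(y_i>0)}\,\omega_i(1-\omega_i)\right\}z_{ir}z_{is} \tag{12}$$
$$\frac{\partial^2 l}{\partial\beta_j\partial\delta} = \sum_{i=1}^{n}\left\{-I_{(y_i=0)}\frac{(1-\omega_i)\mu_i(1+\delta\mu_i)^{-(1+\delta)/\delta}}{\left[\omega_i+(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}\right]^2}\left(\left[\frac{\log(1+\delta\mu_i)}{\delta^2}-\frac{\mu_i(1+\delta)}{\delta(1+\delta\mu_i)}\right]\left[\omega_i+(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}\right] - (1-\omega_i)(1+\delta\mu_i)^{-1/\delta}\left[\frac{\log(1+\delta\mu_i)}{\delta^2}-\frac{\mu_i}{\delta(1+\delta\mu_i)}\right]\right) - I_{(y_i>0)}\frac{(y_i-\mu_i)\mu_i}{(1+\delta\mu_i)^2}\right\}x_{ij} \tag{13}$$
$$\frac{\partial^2 l}{\partial\gamma_r\partial\delta} = \sum_{i=1}^{n} I_{(y_i=0)}\,\frac{-\omega_i(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}\left[\frac{\log(1+\delta\mu_i)}{\delta^2}-\frac{\mu_i}{\delta(1+\delta\mu_i)}\right]}{\left[\omega_i+(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}\right]^2}\,z_{ir} \tag{14}$$
$$\frac{\partial^2 l}{\partial\delta^2} = \sum_{i=1}^{n}\left\{I_{(y_i=0)}\frac{(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}}{\left[\omega_i+(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}\right]^2}\left(-(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}\left[\frac{\log(1+\delta\mu_i)}{\delta^2}-\frac{\mu_i}{\delta(1+\delta\mu_i)}\right]^2 + \left[\omega_i+(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}\right]\left(\left[\frac{\log(1+\delta\mu_i)}{\delta^2}-\frac{\mu_i}{\delta(1+\delta\mu_i)}\right]^2 - \frac{2\log(1+\delta\mu_i)}{\delta^3} + \frac{\mu_i(2+3\delta\mu_i)}{\delta^2(1+\delta\mu_i)^2}\right)\right) + I_{(y_i>0)}\left[-\sum_{k=1}^{y_i}\left(\frac{y_i-k}{\delta y_i-\delta k+1}\right)^2 - \frac{2\log(1+\delta\mu_i)}{\delta^3} + \frac{2\mu_i}{\delta^2(1+\delta\mu_i)} + \frac{(y_i+\frac{1}{\delta})\mu_i^2}{(1+\delta\mu_i)^2}\right]\right\} \tag{15}$$
Note that $E(I_{(y_i=0)}) = \omega_i+(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}$ and $E(I_{(y_i>0)}) = (1-\omega_i)\left(1-(1+\delta\mu_i)^{-1/\delta}\right)$. It may be shown, based on equations (10), (11), (12), (13), (14) and (15) and after some tedious but straightforward algebra, that the elements of the expected information matrix are:
$$I_{\gamma\delta} = I_{\delta\gamma}^T = -E\left[\frac{\partial^2 l}{\partial\gamma_r\partial\delta}\right] = \sum_{i=1}^{n}\frac{\omega_i(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}\left[\frac{\log(1+\delta\mu_i)}{\delta^2}-\frac{\mu_i}{\delta(1+\delta\mu_i)}\right]}{\omega_i+(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}}\,z_{ir} \tag{16}$$
$$I_{\beta\gamma} = -E\left[\frac{\partial^2 l}{\partial\beta_j\partial\gamma_r}\right] = \sum_{i=1}^{n}\frac{-\mu_i\,\omega_i(1-\omega_i)(1+\delta\mu_i)^{-(1+\delta)/\delta}}{\omega_i+(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}}\,x_{ij}z_{ir} \tag{17}$$
$$I_{\gamma\gamma} = -E\left[\frac{\partial^2 l}{\partial\gamma_r\partial\gamma_s}\right] = \sum_{i=1}^{n}\frac{\omega_i^2(1-\omega_i)\left(1-(1+\delta\mu_i)^{-1/\delta}\right)}{\omega_i+(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}}\,z_{ir}z_{is} \tag{18}$$
$$I_{\beta\delta} = I_{\delta\beta}^T = -E\left[\frac{\partial^2 l}{\partial\beta_j\partial\delta}\right] = \sum_{i=1}^{n}\frac{\omega_i(1-\omega_i)\mu_i(1+\delta\mu_i)^{-(1+\delta)/\delta}}{\omega_i+(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}}\left[\frac{\log(1+\delta\mu_i)}{\delta^2}-\frac{\mu_i}{\delta(1+\delta\mu_i)}\right]x_{ij} \tag{19}$$
$$I_{\delta\delta} = -E\left[\frac{\partial^2 l}{\partial\delta^2}\right] = -\sum_{i=1}^{n}\left\{\frac{(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}}{\omega_i+(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}}\left(-(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}\left[\frac{\log(1+\delta\mu_i)}{\delta^2}-\frac{\mu_i}{\delta(1+\delta\mu_i)}\right]^2 + \left[\omega_i+(1-\omega_i)(1+\delta\mu_i)^{-1/\delta}\right]\left(\left[\frac{\log(1+\delta\mu_i)}{\delta^2}-\frac{\mu_i}{\delta(1+\delta\mu_i)}\right]^2 - \frac{2\log(1+\delta\mu_i)}{\delta^3} + \frac{\mu_i(2+3\delta\mu_i)}{\delta^2(1+\delta\mu_i)^2}\right)\right) - E\left[I_{(y_i>0)}\sum_{k=1}^{y_i}\left(\frac{y_i-k}{\delta y_i-\delta k+1}\right)^2\right] + (1-\omega_i)\left(1-(1+\delta\mu_i)^{-1/\delta}\right)\left[-\frac{2\log(1+\delta\mu_i)}{\delta^3} + \frac{2\mu_i}{\delta^2(1+\delta\mu_i)} + \frac{\mu_i^2}{\delta(1+\delta\mu_i)^2}\right] + \frac{(1-\omega_i)\mu_i\,\mu_i^2}{(1+\delta\mu_i)^2}\right\} \tag{20}$$
Let S(δ, β, γ) and I(δ, β, γ) be the score vector and the expected information matrix, respectively, evaluated at the current estimates δ = δ^(m), β = β^(m) and γ = γ^(m):
$$S(\delta,\beta,\gamma) = \begin{pmatrix} S_\delta(\delta,\beta,\gamma) \\ S_\beta(\delta,\beta,\gamma) \\ S_\gamma(\delta,\beta,\gamma)\end{pmatrix} = \begin{pmatrix} \partial l(\delta,\beta,\gamma)/\partial\delta \\ \partial l(\delta,\beta,\gamma)/\partial\beta \\ \partial l(\delta,\beta,\gamma)/\partial\gamma \end{pmatrix} \tag{21}$$
and the expected information matrix, I(δ, β, γ), can be partitioned as
$$I(\delta,\beta,\gamma) = \begin{pmatrix} I_{\delta\delta} & I_{\delta\beta} & I_{\delta\gamma} \\ I_{\beta\delta} & I_{\beta\beta} & I_{\beta\gamma} \\ I_{\gamma\delta} & I_{\gamma\beta} & I_{\gamma\gamma}\end{pmatrix} \tag{22}$$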
where $I_{\delta\delta}$ is a scalar and the other elements are, in general, matrices with dimensions determined by the dimensions of the parameter vectors β and γ. The standard Fisher scoring iterative scheme then gives the estimates of δ, β and γ at the (m+1)th iteration, denoted by δ^(m+1), β^(m+1) and γ^(m+1), as:
$$\begin{pmatrix}\delta^{(m+1)} \\ \beta^{(m+1)} \\ \gamma^{(m+1)}\end{pmatrix} = \begin{pmatrix}\delta^{(m)} \\ \beta^{(m)} \\ \gamma^{(m)}\end{pmatrix} + \left[I^{(m)}(\delta,\beta,\gamma)\right]^{-1}S^{(m)}(\delta,\beta,\gamma)$$
With good starting values δ^(0), β^(0) and γ^(0), the iterative scheme converges in a few steps. Convergence is declared with a stopping rule such as
$$\left|l^{(m+1)} - l^{(m)}\right| \le \epsilon$$
where $l^{(m+1)}$ and $l^{(m)}$ are the log-likelihood l(δ, β, γ; y) evaluated using the estimates of δ, β and γ from iterations m+1 and m, respectively. The asymptotic variance-covariance matrix for $(\hat\delta, \hat\beta, \hat\gamma)$ is then automatically provided at the final iteration. Let $\xi = (\delta, \beta^T, \gamma^T)^T$. Under the usual regularity conditions for maximum likelihood estimation, when the sample size is large, $\hat\xi \sim N_p(\xi, I^{-1}(\delta,\beta,\gamma))$ approximately.
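Because the scoring iteration relies on the analytic score, it can be useful to check the derivative with respect to δ numerically before using it. The sketch below compares the per-observation contribution to $\partial l/\partial\delta$, written from the log-likelihood (4), with a central finite difference. The function names and parameter values are illustrative assumptions.

```python
import math

def loglik_i(y, mu, w, d):
    """Contribution of one observation to the log-likelihood (4)."""
    if y == 0:
        return math.log(w + (1 - w) * (1 + d * mu) ** (-1 / d))
    return (math.log(1 - w) - math.lgamma(y + 1)
            + sum(math.log(d * y - d * k + 1) for k in range(1, y + 1))
            - (y + 1 / d) * math.log(1 + d * mu) + y * math.log(mu))

def score_delta_i(y, mu, w, d):
    """Analytic derivative of loglik_i with respect to delta."""
    g = math.log(1 + d * mu) / d ** 2 - mu / (d * (1 + d * mu))
    if y == 0:
        p = (1 + d * mu) ** (-1 / d)
        return (1 - w) * p * g / (w + (1 - w) * p)
    return (sum((y - k) / (d * y - d * k + 1) for k in range(1, y + 1))
            + math.log(1 + d * mu) / d ** 2
            - (y + 1 / d) * mu / (1 + d * mu))

def numeric_score_delta_i(y, mu, w, d, h=1e-6):
    """Central finite-difference approximation to the same derivative."""
    return (loglik_i(y, mu, w, d + h) - loglik_i(y, mu, w, d - h)) / (2 * h)

# Illustrative (assumed) values
mu, w, d = 2.0, 0.3, 0.5
for y in (0, 1, 5):
    print(y, score_delta_i(y, mu, w, d), numeric_score_delta_i(y, mu, w, d))
```

The same finite-difference idea can be applied to the derivatives with respect to β and γ through the chain rule and the link functions in (3).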
4. Test Statistic for Comparing ZIP and ZINB Models
The ZIP model is a special case of the ZINB regression model. Within the family of ZINB models, testing whether a ZIP model is adequate corresponds to testing $H_{01}: \delta = 0$ vs. $H_A: \delta > 0$, where one possible test statistic is the likelihood ratio test. For a general ZINB regression model, the LRT for δ is given by
$$LRT_\delta = -2\left[l(\hat\beta,\hat\gamma) - l(\hat\delta,\hat\beta,\hat\gamma)\right]$$
where $l(\hat\beta,\hat\gamma)$ and $l(\hat\delta,\hat\beta,\hat\gamma)$ are the maximized log-likelihoods under the ZIP regression and ZINB regression models, respectively. The Wald test is also given by:
$$W_\delta = \frac{\hat\delta^2}{\mathrm{Var}(\hat\delta)}$$
where $\mathrm{Var}(\hat\delta)$ is the relevant diagonal element of $I^{-1}(\hat\beta,\hat\gamma,\hat\delta)$, and $I(\hat\beta,\hat\gamma,\hat\delta)$ is the information matrix of the corresponding ZINB model evaluated at $\beta=\hat\beta$, $\gamma=\hat\gamma$ and $\delta=\hat\delta$. Detailed discussions of the distributions of $LRT_\delta$ and $W_\delta$ can be found in the work by Ridout13. The score test for testing $H_{01}$ is given in the following general form:
$$S_\delta^T(\beta,\gamma,\delta)\;I^{\delta\delta}\;S_\delta(\beta,\gamma,\delta)\Big|_{\hat\xi_1} \tag{23}$$
where $I^{\delta\delta}$ is the element of the inverse of the information matrix I that corresponds to δ. Let $\hat\xi_1 = (0, \hat\beta^T, \hat\gamma^T)^T$ be the restricted maximum likelihood (REML) estimate of the parameter ξ under the null hypothesis $H_{01}$, and let $P_{0i} = P(y_i = 0 \mid H_{01}) = \omega_i + (1-\omega_i)e^{-\mu_i}$. Note that
$$\lim_{\delta\to 0}(1+\delta\mu_i)^{-1/\delta} = e^{-\mu_i} \quad\text{and}\quad \lim_{\delta\to 0}\left[\frac{\log(1+\delta\mu_i)}{\delta^2}-\frac{\mu_i}{\delta(1+\delta\mu_i)}\right] = \frac{\mu_i^2}{2}$$
using Taylor's theorem. It follows that under the null hypothesis the score is given by the following:
$$S_\delta(\delta,\beta,\gamma)\Big|_{\delta=0} = \lim_{\delta\to 0}\frac{\partial l}{\partial\delta} = \frac{1}{2}\sum_{i=1}^{n}\left\{\left[(y_i-\mu_i)^2-y_i\right] - I_{(y_i=0)}\frac{\mu_i^2\,\omega_i}{P_{0i}}\right\} \tag{24}$$
Since we are only interested in the element $I^{\delta\delta}$ of the inverse of the matrix I, the partitioned form (22) gives
$$\left(I^{\delta\delta}\right)^{-1} = I_{\delta\delta} - I_{\delta\beta}\left(I_{\beta\beta}-I_{\beta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\beta}\right)^{-1}I_{\beta\delta} + I_{\delta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\beta}\left(I_{\beta\beta}-I_{\beta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\beta}\right)^{-1}I_{\beta\delta} + I_{\delta\beta}I_{\beta\beta}^{-1}I_{\beta\gamma}\left(I_{\gamma\gamma}-I_{\gamma\beta}I_{\beta\beta}^{-1}I_{\beta\gamma}\right)^{-1}I_{\gamma\delta} - I_{\delta\gamma}\left(I_{\gamma\gamma}-I_{\gamma\beta}I_{\beta\beta}^{-1}I_{\beta\gamma}\right)^{-1}I_{\gamma\delta} \tag{25}$$
Then, using the derived elements of the Fisher information matrix, we have the further derivation for $I^{\delta\delta}$:
$$\left(I^{\delta\delta}\right)^{-1} = A - [\lambda_i]_{1\times n}XB^{-1}X^T[\lambda_i]_{n\times 1} + [e_i]_{1\times n}CX^T[\lambda_i]_{n\times 1} + [\lambda_i]_{1\times n}XDZ^T[e_i]_{n\times 1} - [e_i]_{1\times n}ZE^{-1}Z^T[e_i]_{n\times 1} \tag{26}$$
where
$$\lambda_i = \frac{\mu_i^3}{2}\,\omega_i\left(1-\frac{\omega_i}{P_{0i}}\right), \qquad e_i = \frac{\mu_i^2}{2}\,\omega_i\left(1-\frac{\omega_i}{P_{0i}}\right),$$
$$\nu_i = (1-\omega_i)\mu_i\left(1-\frac{\omega_i\mu_i e^{-\mu_i}}{P_{0i}}\right), \qquad k_i = \omega_i^2\,\frac{1-P_{0i}}{P_{0i}},$$
$$A = \sum_{i=1}^{n}\left[\frac{1}{2}(1-\hat\omega_i)\hat\mu_i^2 - \frac{\hat\mu_i^4}{4}\,\hat\omega_i\left(1-\frac{\hat\omega_i}{\hat P_{0i}}\right)\right]$$
$$B = X^T\mathrm{diag}(\nu_i)X - X^T\mathrm{diag}(\nu_i)Z\left[Z^T\mathrm{diag}(k_i)Z\right]^{-1}Z^T\mathrm{diag}(\nu_i)X$$
$$C = Z\left[Z^T\mathrm{diag}(k_i)Z\right]^{-1}Z^T\mathrm{diag}(\nu_i)X\,B^{-1}$$
$$D = \left(X^T\mathrm{diag}(\nu_i)X\right)^{-1}X^T\mathrm{diag}(\nu_i)Z\,E^{-1}$$
$$E = Z^T\mathrm{diag}(k_i)Z - Z^T\mathrm{diag}(\nu_i)X\left(X^T\mathrm{diag}(\nu_i)X\right)^{-1}X^T\mathrm{diag}(\nu_i)Z$$
Using (24) and (26), according to the definition of the score test, we can then write the score statistic as
$$S_\delta = \frac{\left\{\dfrac{1}{2}\displaystyle\sum_{i=1}^{n}\left[\left((y_i-\hat\mu_i)^2-y_i\right) - I_{(y_i=0)}\dfrac{\hat\mu_i^2\,\hat\omega_i}{\hat P_{0i}}\right]\right\}^2}{A - [\lambda_i]_{1\times n}XB^{-1}X^T[\lambda_i]_{n\times 1} + [e_i]_{1\times n}CX^T[\lambda_i]_{n\times 1} + [\lambda_i]_{1\times n}XDZ^T[e_i]_{n\times 1} - [e_i]_{1\times n}ZE^{-1}Z^T[e_i]_{n\times 1}}$$
where A, B, C, D and E, and $\lambda_i$, $\nu_i$ and $k_i$ are defined as above.
Standard asymptotic theory suggests that the score statistic is asymptotically distributed as chi-squared with 1 degree of freedom, i.e. $\chi^2_1$. More information about the overdispersion test for ZINB models can be found in the work by Ridout13. If there are no covariates for μ and ω, the above score statistic can be simplified to
$$S_{\delta 1} = \frac{\displaystyle\sum_{i=1}^{n}\left[(y_i-\hat\mu)^2-y_i\right] - n\hat\mu^2\hat\omega}{\hat\mu\sqrt{n(1-\hat\omega)\left(2-\dfrac{\hat\mu^2}{e^{\hat\mu}-1-\hat\mu}\right)}} \tag{27}$$
The simplified score statistic (27) is asymptotically distributed as standard normal.
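The simplified statistic (27) is easy to compute once the ZIP estimates without covariates are available. The sketch below is one possible implementation under stated assumptions: the ZIP maximum likelihood estimates are obtained by a simple fixed-point iteration on the mean of the positive counts (an illustrative choice, not a method described in this paper), and the data vector is made up for demonstration.

```python
import math

def zip_mle(ys, iters=200):
    """ZIP maximum likelihood estimates (mu, omega) with no covariates.

    Solves ybar_pos = mu / (1 - exp(-mu)) by fixed-point iteration, then
    sets omega = 1 - ybar / mu. Assumes at least one count exceeds 1.
    """
    n = len(ys)
    n_pos = sum(1 for y in ys if y > 0)
    ybar_pos = sum(ys) / n_pos          # mean of the positive counts
    mu = ybar_pos                       # starting value
    for _ in range(iters):
        mu = ybar_pos * (1 - math.exp(-mu))
    omega = 1 - (sum(ys) / n) / mu
    return mu, omega

def simplified_score(ys):
    """Score statistic (27) for delta = 0 when mu and omega have no covariates."""
    n = len(ys)
    mu, omega = zip_mle(ys)
    num = sum((y - mu) ** 2 - y for y in ys) - n * mu ** 2 * omega
    den = mu * math.sqrt(n * (1 - omega) * (2 - mu ** 2 / (math.exp(mu) - 1 - mu)))
    return num / den

# Made-up counts with many zeros, for illustration only
ys = [0] * 30 + [1] * 10 + [2] * 15 + [3] * 12 + [4] * 8 + [5] * 5 + [7] * 3 + [10] * 2
mu_hat, omega_hat = zip_mle(ys)
print(mu_hat, omega_hat, simplified_score(ys))
```

Large positive values of the statistic point towards overdispersion relative to the ZIP model, and the value can be referred to the standard normal distribution as stated above.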
5. Simulation Study
In this simulation study, we compared the score, LRT and Wald tests in terms of their power under different situations. The simulation examined the effect of sample size on the power of the Wald, LRT and score tests for testing the overdispersion parameter (δ). The outcome variable was generated from a ZINB regression model with one covariate, specified as follows:
$$\log(\mu_i) = 2.5 - x_i, \qquad \log\left(\frac{\omega_i}{1-\omega_i}\right) = -1 + 0.5x_i$$
where $x_i$ was generated from a uniform distribution on the interval [0, 1]. Samples of size n=40, 60, 80, 100, 200 and 300 were taken from the ZINB model with dispersion parameter δ = 0.01, 0.025, 0.05, 0.075, 0.1 and 0.25. Each estimate of size or power was based on 1,000 replications. The results shown in Table 1 indicate that the three tests hold the nominal level reasonably well for all values of the dispersion parameter δ considered. Generally, the power of the three tests increases as the value of the overdispersion parameter δ increases, and for the large sample size n=300 the power increases very fast and approaches 1 at δ=0.1. The power of the score, LRT and Wald test statistics shows different patterns: for the small sample size n=40, the power increases slowly at first and then increases fast once δ>0.075; while
Table 1. Power of the LRT, Wald and score test statistics with one covariate when the data are simulated from the ZINB regression model for the overdispersion parameter, with log(μi) = 2.5 − xi and log(ωi/(1 − ωi)) = −1 + 0.5xi

n   | Method | δ = 0.01 | 0.025 | 0.05  | 0.075 | 0.1   | 0.25
40  | Wald   | 0.031    | 0.067 | 0.114 | 0.127 | 0.143 | 0.522
40  | LRT    | 0.034    | 0.081 | 0.174 | 0.382 | 0.551 | 0.923
40  | Score  | 0.039    | 0.101 | 0.237 | 0.409 | 0.557 | 0.936
60  | Wald   | 0.035    | 0.094 | 0.157 | 0.213 | 0.353 | 0.912
60  | LRT    | 0.045    | 0.114 | 0.324 | 0.543 | 0.707 | 0.984
60  | Score  | 0.049    | 0.141 | 0.346 | 0.563 | 0.743 | 0.987
80  | Wald   | 0.043    | 0.121 | 0.219 | 0.342 | 0.530 | 0.987
80  | LRT    | 0.056    | 0.144 | 0.389 | 0.658 | 0.812 | 0.997
80  | Score  | 0.067    | 0.165 | 0.428 | 0.668 | 0.833 | 0.999
100 | Wald   | 0.053    | 0.130 | 0.302 | 0.491 | 0.721 | 0.997
100 | LRT    | 0.065    | 0.165 | 0.550 | 0.762 | 0.885 | 0.999
100 | Score  | 0.097    | 0.206 | 0.524 | 0.782 | 0.910 | 1.000
200 | Wald   | 0.123    | 0.311 | 0.323 | 0.910 | 0.982 | 1.000
200 | LRT    | 0.160    | 0.323 | 0.815 | 0.957 | 0.991 | 1.000
200 | Score  | 0.199    | 0.392 | 0.826 | 0.969 | 0.994 | 1.000
300 | Wald   | 0.174    | 0.381 | 0.854 | 0.983 | 1.000 | 1.000
300 | LRT    | 0.189    | 0.471 | 0.915 | 0.996 | 1.000 | 1.000
300 | Score  | 0.234    | 0.527 | 0.925 | 0.997 | 1.000 | 1.000
Figure 1. Flow chart of the dynamics of transmission of leptospirosis.
for n=100 the power increases slowly at first and then increases fast once δ > 0.1; for n=200 the power increases slowly at first, increases fast once δ > 0.05, and approaches 1.0 if δ > 0.25; while for n=300 the power increases very fast and is nearly 1 when δ=0.1. It is interesting to note that for the small sample size n=40 there are large differences among the three tests across the values of δ; for sample size n=100 the difference in power among the three tests is small for δ > 0.25; and for the large sample size n=300 the difference among the power of the Wald, LRT and score tests becomes small if δ ≥ 0.05. In general, when the overdispersion parameter δ approaches zero, the difference between the ZINB and ZIP models becomes small, whereas when the value of the overdispersion parameter increases, the power of the three tests also increases for a fixed value of n. Because the score test has the highest empirical power in most of the settings in Table 1, it is more appropriate for general use in applications than the LRT and Wald tests.
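To make the simulation design concrete, the following sketch generates one ZINB sample under the covariate model used in this section. It uses the gamma-Poisson mixture representation of the negative binomial; the helper names (`rpois`, `rzinb`, `simulate`) and the seed are assumptions made for illustration and are not taken from the original study's code.

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for moderate means)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def rzinb(mu, omega, delta, rng):
    """One ZINB draw: structural zero with probability omega, otherwise NB(mu, delta)."""
    if rng.random() < omega:
        return 0
    # Gamma with shape 1/delta and scale delta*mu has mean mu and variance delta*mu^2
    lam = rng.gammavariate(1.0 / delta, delta * mu)
    return rpois(lam, rng)

def simulate(n, delta, seed=0):
    """Simulate (x_i, y_i) with log(mu_i) = 2.5 - x_i and logit(omega_i) = -1 + 0.5 x_i."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()                                   # uniform covariate on [0, 1)
        mu = math.exp(2.5 - x)
        omega = 1.0 / (1.0 + math.exp(-(-1.0 + 0.5 * x)))
        data.append((x, rzinb(mu, omega, delta, rng)))
    return data

sample = simulate(n=100, delta=0.1, seed=1)
print(sample[:5])
```

Repeating such draws 1,000 times for each combination of n and δ, and recording how often each test rejects δ = 0, would give empirical power estimates of the kind reported in Table 1.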
5.1 Example
Gupta et al. (1996) use the numbers of death notices of women 80 years of age and over appearing in the “London Times” on each day for three consecutive years, with frequency fk for each count k=0,1,2,…,9. Here, we use these data to illustrate the proposed score statistic for testing overdispersion. The dataset is shown in Table 2. The observed and predicted probabilities for each count (from the fitted models) for the Poisson and NB models are also presented in Table 2. There are 1096 observations; the mean count is 2.157 with standard deviation 1.615, the median is 2.0, and the counts range from 0 to 9. The dispersion index is D = 1.615²/2.157 = 1.209, which indicates overdispersion, so the standard Poisson model is not suitable for these data. The zero fraction, 162/1096 = 0.1478, leads us to try the zero-inflated models. The observed percent for ZIP and ZINB are shown in Figure 2. The predicted percent for each count (from the fitted models) for the ZIP and ZINB models are also presented in Table 2. In the ZINB model, we get the estimate for δ as δ=0.077 with standard error 0.0297, and the Wald statistic is 2.593² = 6.725 with p-value as p < 0.0001. From Table 3, we can calculate that the LRT statistic for testing δ = 0 is 3988.104 − 3980.774 = 7.33 with p-value as p