Journal of Applied Statistics Vol. 37, No. 6, June 2010, 969–981

On the bivariate negative binomial regression model

Felix Famoye∗


Department of Mathematics, Central Michigan University, Mount Pleasant, MI 48859, USA

(Received 6 February 2008; final version received 18 April 2009)

In this paper, a new bivariate negative binomial regression (BNBR) model allowing any type of correlation is defined and studied. The marginal means of the bivariate model are functions of the explanatory variables. The parameters of the bivariate regression model are estimated by using the maximum likelihood method. Some test statistics, including goodness-of-fit, are discussed. Two numerical data sets are used to illustrate the techniques. The BNBR model tends to perform better than the bivariate Poisson regression model, but compares well with the bivariate Poisson log-normal regression model.

Keywords: correlated count data; over-dispersion; goodness-of-fit; estimation

1. Introduction

Poisson regression techniques have been used to describe univariate count data in which the sample mean and sample variance are almost equal [12,20]. Lawless [23] discussed negative binomial regression (NBR) techniques. The NBR model can be used to describe univariate count data that exhibit over-dispersion. Over-dispersion relative to the Poisson occurs when the sample variance is substantially in excess of the sample mean. Other regression models, such as generalized Poisson regression, have been proposed in the literature by Consul and Famoye [8] and Famoye [11]. See the book by Cameron and Trivedi [4] for other techniques.

Many empirical studies have examined the market entry of firms across industries. These studies [26] used net entry (the difference between the number of firms in one year and another year) as a measure of new entrants. Mayer and Chappell [26] analyzed data for US industries between 1972 and 1977. They considered entry and exit data as bivariate Poisson random variables. By specifying two sets of explanatory variables, one for entry and the other for exit, Mayer and Chappell [26] applied the bivariate Poisson regression (BPR) model to fit data from 330 four-digit Standard Industrial Classification industries for the 1972–1977 period. Mayer and Chappell [26, p. 777] concluded that the previous studies were “subject to distortions created by failure to separate entry effects from exit effects” and “found that controlling for separate entry and exit effects alters several conclusions”.

∗ Email: [email protected]

ISSN 0266-4763 print/ISSN 1360-0532 online © 2010 Taylor & Francis DOI: 10.1080/02664760902984618 http://www.informaworld.com


Ho and Singer [16] applied the bivariate Poisson and bivariate Poisson log-normal regression (BPLR) models to fit bivariate counts observed under a stratified sampling plan. An advantage of the BPLR model over the bivariate Poisson model is its ability to admit both positive and negative correlation between the two response variables. The model allows for more than two response variables and it also allows differing levels of over-dispersion on each response variable.

The bivariate Poisson distribution described in Johnson et al. [17, Chapter 37] is based on the trivariate reduction method. The random variables Y_1 and Y_2 have a bivariate Poisson distribution with Poisson marginals by setting Y_1 = Z_1 + Z_3 and Y_2 = Z_2 + Z_3, where Z_1, Z_2, and Z_3 are independent Poisson random variables with respective means λ_1, λ_2, and λ_3. This model has been applied in the regression context by various authors, including Gourieroux et al. [13], King [19], Jung and Winkelmann [18] and Kocherlakota and Kocherlakota [21]. A disadvantage of this method is that the correlation between Y_1 and Y_2 has to be positive, as the short simulation sketch later in this section illustrates. The correlation between Y_1 and Y_2 is λ_3/√[(λ_1 + λ_3)(λ_2 + λ_3)], and according to Johnson et al. [17, p. 126], this correlation cannot exceed λ_3/[λ_3 + min(λ_1, λ_2)].

Kocherlakota and Kocherlakota [21] studied the BPR model. They estimated the regression coefficients when the linear model is unrestricted, when the regression planes are parallel, and when the regression planes are coincident. They illustrated the techniques by using simulated data that followed the bivariate Poisson distribution. In their analysis, Kocherlakota and Kocherlakota [21] noted that the real data they studied are not bivariate Poisson since the data exhibit an excessive amount of dispersion.

Some studies in the literature have recognized the limitation of bivariate models based on trivariate reduction and suggested alternative bivariate count models that allow a more flexible correlation structure. One approach is to model dependence among counts through correlated random effects. Berkhout and Plug [1] cited Chib and Winkelmann [7] and Munkin and Trivedi [28] as examples. Cameron et al. [3] stated that this approach is based on unobserved heterogeneity. Gurmu and Elder [14] defined a generalized bivariate negative binomial regression model based on a first-order series expansion of an unknown density function of an unobserved heterogeneity component. The drawback of that model is that it can only describe response variables with non-negative correlation. Another approach is to model dependence among count variables through copula functions. Examples of work using this approach include van Ophem [30], Lee [24], and Cameron et al. [3]. Berkhout and Plug [1] derived a bivariate Poisson count regression model by using conditional probabilities. Berkhout and Plug decomposed an “unknown” bivariate density into a marginal distribution and a conditional distribution. They assumed that both the marginal and the conditional distributions are Poisson. Their bivariate distribution allows for a flexible correlation structure.

Lakshminarayana et al. [22] developed a bivariate Poisson distribution as a product of Poisson marginals with a multiplicative factor. The correlation between the two random variables can be positive, zero, or negative depending on the value of the multiplicative factor parameter. This bivariate Poisson distribution is not as limited as the one based on trivariate reduction, which can only allow positive correlation.

In this paper, we apply the work of Lakshminarayana et al. [22] to define a bivariate negative binomial distribution with a multiplicative factor. We define and study a bivariate negative binomial regression (BNBR) model which can be used to describe bivariate correlated count data that exhibit over-dispersion and an unrestricted correlation. We assume that the marginal means of the bivariate model are functions of the explanatory variables. Further, we assume that the relationship between the marginal means and the covariates is log-linear. This relationship is usually referred to as the link function in the univariate case. Other link functions can be considered, but we restrict our discussion in this paper to the log-linear relationship. The BNBR model is based on the bivariate negative binomial distribution.
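To make the restriction of the trivariate-reduction construction concrete, the following sketch (hypothetical Python, not part of the original paper; the rate values are arbitrary) simulates Y_1 = Z_1 + Z_3 and Y_2 = Z_2 + Z_3 and compares the empirical correlation with λ_3/√[(λ_1 + λ_3)(λ_2 + λ_3)]; by construction the correlation can never be negative.

```python
import numpy as np

# Hypothetical illustration of the trivariate-reduction bivariate Poisson:
# Y1 = Z1 + Z3, Y2 = Z2 + Z3 with independent Poisson Z's.
rng = np.random.default_rng(0)
lam1, lam2, lam3 = 2.0, 3.0, 1.5   # example rates (not from the paper)
n = 200_000

z1 = rng.poisson(lam1, n)
z2 = rng.poisson(lam2, n)
z3 = rng.poisson(lam3, n)
y1, y2 = z1 + z3, z2 + z3

empirical = np.corrcoef(y1, y2)[0, 1]
theoretical = lam3 / np.sqrt((lam1 + lam3) * (lam2 + lam3))
print(f"empirical corr   = {empirical:.4f}")
print(f"theoretical corr = {theoretical:.4f}  (always >= 0 by construction)")
```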


A BNBR model based on the trivariate reduction can be defined along the lines of the model in Kocherlakota and Kocherlakota [21]. However, such a model will suffer from the same drawback as the BPR model based on trivariate reduction. The bivariate model defined in this paper is an alternative bivariate count model that allows a more flexible correlation structure. The bivariate model does not have as complicated a form as the models based on copula functions. In Section 2, we define the bivariate negative binomial distribution as a product of negative binomial marginals with a multiplicative factor. The correlation between the two negative binomial variates can be positive, zero, or negative. In the same section, we develop a bivariate count regression model based on the bivariate negative binomial distribution. We discuss estimation and a goodness-of-fit statistic for the bivariate count regression model in Section 3. We discuss some tests in Section 4, and in Section 5 two applications illustrating the bivariate count model are presented.

2. The bivariate regression model


Lakshminarayana et al. [22] defined a bivariate Poisson distribution as a product of Poisson marginals with a multiplicative factor. The probability function is given as

P(y_1, y_2) = \frac{\theta_1^{y_1}\theta_2^{y_2} e^{-\theta_1-\theta_2}}{y_1!\,y_2!}\left[1 + \lambda\left(e^{-y_1} - e^{-d\theta_1}\right)\left(e^{-y_2} - e^{-d\theta_2}\right)\right], \qquad y_1, y_2 = 0, 1, 2, \ldots,

where d = 1 − e^{−1}. The marginal distribution of Y_t (t = 1, 2) is Poisson with mean θ_t. The covariance between Y_1 and Y_2 is λθ_1θ_2 d^2 e^{−d(θ_1+θ_2)} and so the correlation coefficient is ρ = λ√(θ_1θ_2) d^2 e^{−d(θ_1+θ_2)}. Thus, the correlation coefficient can be positive, zero, or negative depending on the value of λ, the multiplicative factor parameter.

By using a similar approach, we define a bivariate negative binomial distribution as a product of negative binomial marginals. The probability function of the bivariate negative binomial distribution is given as

P(y_1, y_2) = \binom{m_1^{-1} + y_1 - 1}{y_1}\theta_1^{y_1}(1 - \theta_1)^{m_1^{-1}} \binom{m_2^{-1} + y_2 - 1}{y_2}\theta_2^{y_2}(1 - \theta_2)^{m_2^{-1}} \left[1 + \lambda\left(e^{-y_1} - c_1\right)\left(e^{-y_2} - c_2\right)\right],    (1)

where c_t = E(e^{−Y_t}) = [(1 − θ_t)/(1 − θ_t e^{−1})]^{m_t^{-1}} (t = 1, 2) and y_1, y_2 = 0, 1, 2, . . .. The marginal distribution of Y_t (t = 1, 2) is a negative binomial with mean μ_t = m_t^{-1}θ_t/(1 − θ_t) and variance σ_t^2 = m_t^{-1}θ_t/(1 − θ_t)^2. The covariance between Y_1 and Y_2 is λc_1c_2A_1A_2, where A_t = m_t^{-1}θ_t e^{−1}/(1 − θ_t e^{−1}) − m_t^{-1}θ_t/(1 − θ_t) (t = 1, 2). The correlation coefficient is given by ρ = λc_1c_2A_1A_2/(σ_1σ_2). The correlation coefficient can be positive, zero, or negative depending on the value of λ, the multiplicative factor parameter.

Let the sample mean and sample variance be ȳ_t = Σ_{i=1}^{n} y_it/n and s_t^2 = Σ_{i=1}^{n} (y_it − ȳ_t)^2/(n − 1) (t = 1, 2), respectively. We denote the sample covariance by s_12 = Σ_{i=1}^{n} (y_i1 − ȳ_1)(y_i2 − ȳ_2)/(n − 1). On equating the sample moments to the population moments, the moment estimates for the bivariate negative binomial distribution are given by θ̃_t = 1 − ȳ_t/s_t^2, m̃_t = θ̃_t/[ȳ_t(1 − θ̃_t)] (t = 1, 2) and λ̃ = s_12/(c̃_1c̃_2Ã_1Ã_2), where c̃_t and Ã_t are the moment estimates of c_t and A_t, respectively.

Let Y_it (t = 1, 2; i = 1, 2, . . . , n; n is the sample size) be a count response variable, and let x_it = (x_it0 = 1, x_it1, x_it2, . . . , x_itk) be a vector of covariates. A BNBR model would specify that the joint distribution of (Y_i1, Y_i2) for any given (x_i1, x_i2) is that of the bivariate negative binomial distribution in Equation (1) with means

E(Y_{i1} \mid x_{i1}) = \mu_1(x_{i1}) = e_{i1} f(x_{i1}, \beta_1) \quad\text{and}\quad E(Y_{i2} \mid x_{i2}) = \mu_2(x_{i2}) = e_{i2} f(x_{i2}, \beta_2),    (2)

where f(x_it, β_t) > 0 (t = 1, 2) is a known function of x_it and a vector β_t^T = (β_t0, β_t1, β_t2, . . . , β_tk) of regression parameters, and e_it a measure of exposure. The function f(x_it, β_t) is differentiable with respect to β_t. It may be difficult to know which covariates affect only Y_i1, only Y_i2, or both. To simplify our analysis, we will assume that the same covariates affect Y_i1 and Y_i2. Under this assumption, x_i1 = x_i2 = x_i; however, the parameter vectors β_1 and β_2 are not assumed to be equal. By using the marginal means in Equation (2) and writing them as functions of covariates, we have

\frac{m_t^{-1}\theta_t}{1 - \theta_t} = \mu_{it}(x_i) \quad\text{or}\quad \theta_t = \frac{\mu_{it}}{m_t^{-1} + \mu_{it}}, \qquad t = 1, 2.    (3)


This is an extension of the procedure for the univariate negative binomial distribution. We will keep our analysis simple, as in the univariate NBR, by modeling θ_1 and θ_2 through a log-linear model for μ_it. In the analysis, λ and m_t will not be considered as functions of covariates. By using Equation (3) in Equation (1), a new BNBR model can be written as

P(y_{i1}, y_{i2}) = \prod_{t=1}^{2}\binom{m_t^{-1} + y_{it} - 1}{y_{it}}\left(\frac{\mu_{it}}{m_t^{-1} + \mu_{it}}\right)^{y_{it}}\left(\frac{m_t^{-1}}{m_t^{-1} + \mu_{it}}\right)^{m_t^{-1}} \times \left[1 + \lambda\left(e^{-y_{i1}} - c_1\right)\left(e^{-y_{i2}} - c_2\right)\right],    (4)

where c_t = [(1 − θ_t)/(1 − θ_t e^{−1})]^{m_t^{-1}} with θ_t = μ_it/(m_t^{-1} + μ_it). When λ = 0, the pair Y_i1 and Y_i2 are independent. When λ > 0, the BNBR model in Equation (4) allows positive correlation, and when λ < 0, the BNBR model allows negative correlation.

The parameter m_t measures the dispersion. If m_t → 0, the negative binomial distribution reduces to the Poisson distribution and there is no over-dispersion. When m_t > 0, there is over-dispersion. The correlation coefficient, in terms of μ_it, can be written as

\rho_i = \lambda d^2 \sqrt{\mu_{i1}\mu_{i2}(1 + m_1\mu_{i1})(1 + m_2\mu_{i2})}\,(1 + d m_1\mu_{i1})^{-1-1/m_1}(1 + d m_2\mu_{i2})^{-1-1/m_2}, \qquad i = 1, 2, \ldots, n,    (5)

where d = 1 − e^{−1}. The average of ρ_i in Equation (5) can be used to measure the degree of association between the response variables.

Examples of the vector (y_i1, y_i2) are observations made on the same experimental unit at times 1 and 2. For this case, the random variables y_i1 and y_i2 are correlated. It is also possible to have correlated counts which do not form a time series. Examples include the entry and exit data [26] and presidential vetoes [19]. Of course, one question of interest is to test for the independence of y_i1 and y_i2. This question will be addressed later in the paper.
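As a numerical illustration of Equations (4) and (5), the following sketch (Python with NumPy/SciPy; the helper names and parameter values are ours, not the paper's) evaluates the joint probability function and the correlation coefficient, and checks that the probabilities sum to one over a large grid.

```python
import numpy as np
from scipy.special import gammaln

D = 1.0 - np.exp(-1.0)  # d = 1 - e^{-1}

def nb_logpmf(y, mu, m):
    """Univariate negative binomial log-pmf with mean mu and dispersion m (the Poisson limit is m -> 0)."""
    r = 1.0 / m                      # r = m^{-1}
    theta = mu / (r + mu)            # theta as in Equation (3)
    return (gammaln(r + y) - gammaln(y + 1.0) - gammaln(r)
            + y * np.log(theta) + r * np.log(1.0 - theta))

def c_factor(mu, m):
    """c_t = (1 + d*mu*m)^(-1/m), i.e. E[exp(-Y_t)] under the marginal negative binomial."""
    return (1.0 + D * mu * m) ** (-1.0 / m)

def bnbr_pmf(y1, y2, mu1, mu2, m1, m2, lam):
    """Joint pmf of Equation (4): product of NB marginals times the multiplicative factor."""
    c1, c2 = c_factor(mu1, m1), c_factor(mu2, m2)
    base = np.exp(nb_logpmf(y1, mu1, m1) + nb_logpmf(y2, mu2, m2))
    return base * (1.0 + lam * (np.exp(-y1) - c1) * (np.exp(-y2) - c2))

def bnbr_corr(mu1, mu2, m1, m2, lam):
    """Correlation coefficient of Equation (5)."""
    return (lam * D**2
            * np.sqrt(mu1 * mu2 * (1 + m1 * mu1) * (1 + m2 * mu2))
            * (1 + D * m1 * mu1) ** (-1 - 1 / m1)
            * (1 + D * m2 * mu2) ** (-1 - 1 / m2))

# Sanity check with illustrative parameter values (not taken from the paper):
mu1, mu2, m1, m2, lam = 1.2, 0.8, 0.5, 0.3, -0.6
grid = np.arange(0, 200)
Y1, Y2 = np.meshgrid(grid, grid, indexing="ij")
total = bnbr_pmf(Y1, Y2, mu1, mu2, m1, m2, lam).sum()
print(f"pmf sums to {total:.6f}; model correlation = {bnbr_corr(mu1, mu2, m1, m2, lam):.4f}")
```

Because c_t equals E(e^{−Y_t}), the multiplicative factor integrates out and the probabilities sum to one; the printed correlation here is negative, illustrating that λ < 0 is admissible.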

3. Estimation and goodness-of-fit statistic

Consider the n independent vectors (y_i1, y_i2), where the ith vector has the BNBR model in Equation (4). The log-likelihood function, log L = log L(μ; y), for the bivariate regression model is given by

\log L = \sum_{i=1}^{n}\left\{\sum_{t=1}^{2}\left[y_{it}\log\mu_{it} - m_t^{-1}\log m_t - (y_{it} + m_t^{-1})\log(\mu_{it} + m_t^{-1}) - \log(y_{it}!) + \sum_{j=0}^{y_{it}-1}\log(m_t^{-1} + j)\right] + \log\left[1 + \lambda\left(e^{-y_{i1}} - c_1\right)\left(e^{-y_{i2}} - c_2\right)\right]\right\},    (6)

where c_t = (1 + dμ_it m_t)^{−1/m_t} with d = 1 − e^{−1}, and t = 1, 2.
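The log-likelihood in Equation (6) translates directly into code. The sketch below (Python/NumPy with SciPy's gammaln; the function signature and argument layout are our own assumptions, not the paper's notation) evaluates log L for given regression coefficients, dispersion parameters and λ under the log-linear link μ_it = exp(x_i′β_t). It uses gammaln(m_t^{-1} + y) − gammaln(m_t^{-1}) for the sum Σ_{j=0}^{y−1} log(m_t^{-1} + j).

```python
import numpy as np
from scipy.special import gammaln

D = 1.0 - np.exp(-1.0)  # d = 1 - e^{-1}

def bnbr_loglik(beta1, beta2, m1, m2, lam, y1, y2, X):
    """Log-likelihood of Equation (6) with mu_it = exp(x_i' beta_t).

    Sketch only: y1, y2 are the two count vectors and X is the n x (k+1)
    design matrix with a leading column of ones.
    """
    mu1, mu2 = np.exp(X @ beta1), np.exp(X @ beta2)
    ll = 0.0
    for yt, mu, m in ((y1, mu1, m1), (y2, mu2, m2)):
        r = 1.0 / m  # r = m_t^{-1}
        ll += np.sum(gammaln(r + yt) - gammaln(r) - gammaln(yt + 1.0)  # sum_j log(m^-1 + j) - log(y!)
                     + yt * np.log(mu) + r * np.log(r)                 # y log(mu) - m^-1 log(m), since r log r = -m^-1 log m
                     - (yt + r) * np.log(mu + r))                      # -(y + m^-1) log(mu + m^-1)
    # multiplicative-factor part, with c_t = (1 + d*mu_it*m_t)^(-1/m_t)
    c1 = (1.0 + D * mu1 * m1) ** (-1.0 / m1)
    c2 = (1.0 + D * mu2 * m2) ** (-1.0 / m2)
    ll += np.sum(np.log(1.0 + lam * (np.exp(-y1) - c1) * (np.exp(-y2) - c2)))
    return ll
```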


The log-likelihood in Equation (6) is maximized over the parameters β_tj (j = 0, 1, 2, . . . , k and t = 1, 2), m_t and λ. It is straightforward to take the first and second derivatives with respect to the parameters, and these are provided in the Appendix. The Fisher information matrix can be obtained by taking the expectations of minus the second derivatives. The Newton–Raphson iterative technique can be used to obtain the maximum likelihood estimates of the BNBR model parameters. The initial estimates for m_t and β_tj (j = 0, 1, 2, . . . , k and t = 1, 2) are the corresponding estimates from the univariate NBR models. The initial estimate of λ can be taken as the moment estimate obtained by equating the sample correlation coefficient to the population correlation coefficient. By using the parameter estimates and their standard errors, we obtain the asymptotic Wald z statistics for testing the significance of each independent variable and the nuisance parameters m_t and λ.

A measure of goodness-of-fit for the BNBR model may be based on the deviance statistic D. The deviance statistic is defined as

D = 2[\log L(y; y) - \log L(\hat{\mu}; y)] = 2\sum_{i=1}^{n}\left\{\sum_{t=1}^{2}\left[y_{it}\log\frac{y_{it}(m_t^{-1} + \hat{\mu}_{it})}{\hat{\mu}_{it}(m_t^{-1} + y_{it})} + m_t^{-1}\log\frac{m_t^{-1} + \hat{\mu}_{it}}{m_t^{-1} + y_{it}}\right] + \log\frac{1 + \lambda\prod_{t=1}^{2}(e^{-y_{it}} - \bar{c}_t)}{1 + \lambda\prod_{t=1}^{2}(e^{-y_{it}} - \hat{c}_t)}\right\},    (7)

where c̄_t is the value of c_t evaluated at μ_it = y_it and ĉ_t is the value of c_t evaluated at μ_it = μ̂_it. The deviance statistic D can be approximated by a chi-square distribution with n − p degrees of freedom (df) when the μ̂_it's (t = 1, 2) are large. In the df, n is the sample size and p the total number of estimated parameters. A model with the smallest value of the deviance D, among all models, is usually taken as the best model for describing a given data set. A model provides a good fit when the ratio D/df is very close to one.

The BNBR model in Equation (4) reduces to the BPR model when the parameters m_t → 0. To assess the adequacy of the BNBR model over the BPR model, one can test the hypothesis that m_t is zero. This test will be done in the next section. To choose among models, one can use the Akaike Information Criterion (AIC), which has the form AIC = −2 log L + 2ν, where ν is the number of estimated parameters in the model. A model with a smaller AIC is generally preferred.
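A minimal fitting sketch is given below. It is not the paper's Newton–Raphson implementation: it reuses the hypothetical bnbr_loglik function sketched after Equation (6), obtains starting values from a fit with λ fixed at zero (in the spirit of starting from univariate NBR estimates), lets SciPy's BFGS routine maximize the likelihood numerically, and reports log L and AIC = −2 log L + 2ν. All variable names and the parameter packing are our own assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def fit_bnbr(y1, y2, X):
    """Quasi-Newton maximum likelihood fit of the BNBR model (sketch, not the paper's code).

    Relies on the bnbr_loglik sketch defined after Equation (6) being in scope.
    """
    k1 = X.shape[1]

    def unpack(p):
        # m_1, m_2 are kept positive by optimizing their logarithms
        return p[:k1], p[k1:2 * k1], np.exp(p[-3]), np.exp(p[-2]), p[-1]

    def negll(p):
        b1, b2, m1, m2, lam = unpack(p)
        return -bnbr_loglik(b1, b2, m1, m2, lam, y1, y2, X)

    # Crude starting values: intercepts at the log sample means, other coefficients at zero,
    # log-dispersions at zero (m_t = 1).
    start = np.zeros(2 * k1 + 2)
    start[0] = np.log(y1.mean() + 0.1)
    start[k1] = np.log(y2.mean() + 0.1)

    # Stage 1: lambda fixed at 0 (two independent NB regressions) to get initial estimates.
    def negll_indep(p2):
        return negll(np.append(p2, 0.0))
    stage1 = minimize(negll_indep, start, method="BFGS")

    # Stage 2: full BNBR model, starting from the independent-fit estimates and lambda = 0.
    full = minimize(negll, np.append(stage1.x, 0.0), method="BFGS")

    b1, b2, m1, m2, lam = unpack(full.x)
    loglik = -full.fun
    aic = -2.0 * loglik + 2.0 * len(full.x)  # AIC = -2 log L + 2*nu
    return {"beta1": b1, "beta2": b2, "m1": m1, "m2": m2, "lambda": lam,
            "loglik": loglik, "AIC": aic}
```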

4. Tests

In this section, we are interested in some hypotheses on the BNBR model. We will compare the BPR model with the BNBR model to determine if the BNBR model is more suitable. We will also test for the independence of the two random variables y_i1 and y_i2.

4.1 Test against BPR model or test for over-dispersion

The BNBR model reduces to the BPR model when m_t → 0. This is the situation in which there is no over-dispersion in the data. To test for over-dispersion, or to test whether the BNBR model should be used in place of the BPR model, we test the hypothesis

H_0: m_1 = m_2 = 0.    (8)

Let L_dis be the likelihood function when H_0 is true and let L_a be the likelihood function when H_0 is false. The test statistic χ_dis = −2 log(L_dis/L_a) is not chi-square with two df because the parameters are on the boundary. One can use the results of Chernoff [6] (see also [29]), which show that the statistic χ_dis is asymptotically distributed as a random variable that has a probability mass of 0.25 at the point zero and is otherwise a mixture of (1/2)χ_1^2 and (1/4)χ_2^2 above zero.

An alternative to using the log-likelihood ratio statistic to test the null hypothesis in Equation (8) is to use a score statistic. The reader is referred to Cox and Hinkley [9] for a discussion of the score test. The score function U(β, λ, m_1 = 0, m_2 = 0) and the expected information matrix I(β, λ, m_1 = 0, m_2 = 0) can be calculated from the log-likelihood function in Equation (6), where β^T = (β_1^T, β_2^T)^T. The score statistic for testing the null hypothesis in Equation (8) is

S(\hat{\beta}, \hat{\lambda}) = S(\hat{\beta}, \hat{\lambda}, 0, 0) = U'(\hat{\beta}, \hat{\lambda}, 0, 0)\,[I(\hat{\beta}, \hat{\lambda}, 0, 0)]^{-1}\,U(\hat{\beta}, \hat{\lambda}, 0, 0).    (9)


The elements of the score vector U(β̂, λ̂, 0, 0) are the limits of Equations (A1)–(A5) (see Appendix) as m_1, m_2 → 0. The elements of I(β̂, λ̂, 0, 0) are obtained by multiplying Equations (A6)–(A20) by −1 and then taking their limits as m_1, m_2 → 0. A score test in the BNBR model has the advantage that one does not need to fit the BNBR model but only the BPR model, which is the model under the null hypothesis in Equation (8).
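For the boundary likelihood-ratio version of this test, the mixture distribution quoted above gives the p-value directly. A small sketch (hypothetical Python helper, assuming the two maximized log-likelihoods are already available) is:

```python
from scipy.stats import chi2

def overdispersion_lr_pvalue(loglik_bpr, loglik_bnbr):
    """p-value for H0: m1 = m2 = 0 using the boundary mixture described in Section 4.1.

    Under H0, chi_dis = -2 log(L_dis / L_a) behaves like a mixture with mass 1/4 at zero,
    (1/2) chi-square(1) and (1/4) chi-square(2).  Sketch only, not the paper's code.
    """
    stat = 2.0 * (loglik_bnbr - loglik_bpr)
    if stat <= 0.0:
        return 1.0
    return 0.5 * chi2.sf(stat, df=1) + 0.25 * chi2.sf(stat, df=2)
```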

4.2 Test for constant dispersion parameter

In the formulation of the BNBR model, we have the dispersion parameters m_1 and m_2. To test for constant dispersion, we test the hypothesis

H_0: m_1 = m_2 = m.    (10)

Let L_con be the likelihood function when H_0 is true and let L_a be the likelihood function when H_0 is false. The test statistic is given by χ_con = −2 log(L_con/L_a), which is approximately chi-square with one df.

4.3 Test for independence

The response variables y_i1 and y_i2 are independent when the parameter λ is zero. For independence, we test the null hypothesis

H_0: λ = 0.    (11)

Let L_ind be the likelihood function when H_0 is true and let L_a be the likelihood function when H_0 is false. The test statistic is given by χ_ind = −2 log(L_ind/L_a), which is approximately chi-square with one df.
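Both of the likelihood-ratio tests in Sections 4.2 and 4.3 compare nested fits that differ by one restricted parameter, so a single helper covers them. The sketch below (hypothetical Python, assuming the maximized log-likelihoods have been computed) returns the statistic and its chi-square p-value with one df.

```python
from scipy.stats import chi2

def lr_test_1df(loglik_restricted, loglik_full):
    """Likelihood-ratio test with one degree of freedom, as used for
    H0: m1 = m2 = m (Equation (10)) and H0: lambda = 0 (Equation (11)).  Sketch only."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, chi2.sf(stat, df=1)

# Example usage (hypothetical inputs):
# chi_ind, p_ind = lr_test_1df(loglik_lambda0, loglik_full)   # test for independence
```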

5. Application

5.1 Health-care utilization

Cameron et al. [5] analyzed various measures of health-care utilization by using a sample of 5190 single-person households from the 1977–1978 Australian health survey. The data are obtained from the Journal of Applied Econometrics 1997 Data Archive. Various authors, including Mullahy [27] and Cameron and Johansson [2], used the data to illustrate univariate regression models. Gurmu and Elder [14] used the data to illustrate a generalized BNBR model. In this paper, we model two response variables that were considered by Gurmu and Elder [14]. The two response variables (y_i1, y_i2) are the number of consultations with a doctor during the 2 weeks preceding the survey (y_i1 = doctor) and the number of consultations with a non-doctor health professional during the 4 weeks preceding the survey (y_i2 = non-doctor). These two response variables are characterized by over-dispersion and a very high frequency of non-users. The mean and standard deviation of the response variable doctor are 0.3017 and 0.7981, respectively. The corresponding values for the response variable non-doctor are 0.2146 and 0.9653. The proportions of zeros (or non-users) in the variables doctor and non-doctor are 79.8% and 90.9%, respectively.

The regressor variables are made up of four socio-economic variables and eight insurance and health status variables. Detailed descriptions of these regressor variables are provided in Gurmu and Elder [14]. Also, the summary statistics for these regressor variables are contained in Cameron et al. [5]. The marginal mean of Y_it is assumed to have a log-linear relationship with the covariates x_i through

\log[E(Y_{it})] = x_{it0}\beta_{t0} + x_{it1}\beta_{t1} + x_{it2}\beta_{t2} + \cdots + x_{it12}\beta_{t12}, \qquad t = 1, 2; \; i = 1, 2, 3, \ldots, 5190.    (12)

The regression function (12) relates the logarithm of the marginal means to the explanatory variables. The correlation between Y_i1 and Y_i2 is specified in terms of the parameter λ, which determines the correlation in the bivariate negative binomial distribution. The regressors x_it1 through x_it12 are listed in Table 1. We fitted the BPR and BNBR models to the data, and the results are presented in Table 1. To test for the adequacy of the BNBR model over the BPR model, we test the null hypothesis in Equation (8) by using the score statistic in Equation (9), since the sample size n = 5190 is large. The observed value of the score statistic is 455.52, with a p-value that is essentially zero.

Others, including Dean [10], Lin [25], and Hall and Berenhaut [15], found that a score test performs poorly when the sample sizes are small unless a bias correction is performed. Future work on a bias correction to the test in Equation (9) will be undertaken. This future work will include the use of simulation to compare the BNBR model in this paper with alternative viable models.

Acknowledgements

The author gratefully acknowledges the support received from FRCE Committee at Central Michigan University under grant #49571. The author is grateful to the anonymous referees for their valuable suggestions that improved the presentation.

References

[1] P. Berkhout and E. Plug, A bivariate Poisson count data model using conditional probabilities, Statist. Neerlandica 58(3) (2004), pp. 349–364.
[2] A.C. Cameron and P. Johansson, Count data regression using series expansion: With applications, J. Appl. Econom. 12 (1997), pp. 203–223.
[3] A.C. Cameron, T. Li, P.K. Trivedi, and D.M. Zimmer, Modelling the differences in counted outcomes using bivariate copula models with application to mismeasured counts, Econom. J. 7 (2004), pp. 566–584.
[4] A.C. Cameron and P.K. Trivedi, Regression Analysis of Count Data, Cambridge University Press, Cambridge, UK, 1998.
[5] A.C. Cameron, P.K. Trivedi, F. Milne, and J. Piggott, A microeconomic model of the demand for health care and health insurance in Australia, Rev. Econom. Stud. LV (1988), pp. 85–106.
[6] H. Chernoff, On the distribution of the likelihood ratio, Ann. Math. Stat. 25 (1954), pp. 573–578.
[7] S. Chib and R. Winkelmann, Markov chain Monte Carlo analysis of correlated count data, J. Bus. Econom. Statist. 19(4) (2001), pp. 428–435.
[8] P.C. Consul and F. Famoye, Generalized Poisson regression model, Commun. Statist. Theory Methods 21(1) (1992), pp. 89–109.
[9] D.R. Cox and D.V. Hinkley, Theoretical Statistics, Chapman & Hall, Boca Raton, FL, 1974.
[10] C.B. Dean, Testing for overdispersion in Poisson and binomial regression models, J. Amer. Statist. Assoc. 87 (1992), pp. 451–457.
[11] F. Famoye, Restricted generalized Poisson regression model, Commun. Statist. Theory Methods 22(5) (1993), pp. 1335–1354.
[12] E.L. Frome, M.H. Kutner, and J.J. Beauchamp, Regression analysis of Poisson-distributed data, J. Amer. Statist. Assoc. 68 (1973), pp. 935–940.
[13] C. Gourieroux, A. Monfort, and A. Trognon, Pseudo maximum likelihood methods: Applications to Poisson models, Econometrica 52 (1984), pp. 701–720.
[14] S. Gurmu and J. Elder, Generalized bivariate count data regression models, Econom. Lett. 68 (2000), pp. 31–36.
[15] D.B. Hall and K.S. Berenhaut, Score tests for heterogeneity and overdispersion in zero-inflated Poisson and binomial regression models, Can. J. Statist. 30 (2002), pp. 415–430.
[16] L.L. Ho and J. da Matta Singer, Regression models for bivariate counts, Braz. J. Probab. Stat. 11 (1997), pp. 175–197.
[17] N.L. Johnson, S. Kotz, and N. Balakrishnan, Discrete Multivariate Distributions, John Wiley and Sons, Inc., New York, 1997.
[18] R.C. Jung and R. Winkelmann, Two aspects of labor mobility: A bivariate Poisson regression approach, Empir. Econom. 18 (1993), pp. 543–556.
[19] G. King, A seemingly unrelated Poisson regression model, Sociol. Methods Res. 17(3) (1989), pp. 235–255.
[20] G.G. Koch, S. Atkinson, and M.E. Stokes, Poisson regression, Encyclopedia Statist. Sci. 7 (1987), pp. 33–41.
[21] S. Kocherlakota and K. Kocherlakota, Regression in the bivariate Poisson distribution, Commun. Statist. Theory Methods 30(5) (2001), pp. 815–825.
[22] J. Lakshminarayana, S.N.N. Pandit, and K.S. Rao, On a bivariate Poisson distribution, Commun. Statist. Theory Methods 28(2) (1999), pp. 267–276.
[23] J.F. Lawless, Negative binomial and mixed Poisson regression, Can. J. Statist. 15(3) (1987), pp. 209–225.
[24] A. Lee, Modelling rugby league data via bivariate negative binomial regression, Aust. N. Z. J. Statist. 41(2) (1999), pp. 141–152.
[25] X. Lin, Variance component testing in generalised linear models with random effects, Biometrika 84 (1997), pp. 309–326.
[26] W.J. Mayer and W.F. Chappell, Determinants of entry and exit: An application of the compound bivariate Poisson distribution to US industries, 1972–1977, South. Econ. J. 58 (1992), pp. 770–778.
[27] J. Mullahy, Heterogeneity, excess zeros, and the structure of count data models, J. Appl. Econom. 12 (1997), pp. 337–350.
[28] M.K. Munkin and P.K. Trivedi, Simulated maximum likelihood estimation of multivariate mixed-Poisson regression models, with application, Econom. J. 2 (1999), pp. 29–48.
[29] S.G. Self and K. Liang, Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions, J. Amer. Statist. Assoc. 82 (1987), pp. 605–610.
[30] H. van Ophem, A general method to estimate correlated discrete random variables, Econometric Theory 15 (1999), pp. 228–237.

Appendix. The first and second partial derivatives of the log-likelihood function in Equation (6)

For all derivatives, t = 1, 2; j, s = 0, 1, 2, . . . , k; and c_t = (1 + dμ_it m_t)^{−1/m_t}, where d = 1 − e^{−1}. The derivatives of c_t are

\frac{\partial c_t}{\partial m_t} = m_t^{-1}\left[m_t^{-1}\log(1 + d\mu_{it}m_t) - \frac{d\mu_{it}}{1 + d\mu_{it}m_t}\right]c_t,

\frac{\partial c_t}{\partial \beta_{tj}} = \frac{-dc_t}{1 + dm_t\mu_{it}}\,\frac{\partial \mu_{it}}{\partial \beta_{tj}} \quad\text{with}\quad \frac{\partial \mu_{it}}{\partial \beta_{tj}} = \mu_{it}x_{tj}.

Alternatively, we can write

\frac{\partial c_t}{\partial \beta_{tj}} = \frac{-dc_t\mu_{it}x_j}{1 + dm_t\mu_{it}},

\frac{\partial^2 c_t}{\partial m_t^2} = \left\{\left[m_t^{-2}\log(1 + d\mu_{it}m_t) - \frac{d\mu_{it}m_t^{-1}}{1 + d\mu_{it}m_t}\right]^2 + \frac{2d\mu_{it}m_t^{-2}}{1 + d\mu_{it}m_t} - 2m_t^{-3}\log(1 + d\mu_{it}m_t) + m_t^{-1}\left(\frac{d\mu_{it}}{1 + d\mu_{it}m_t}\right)^2\right\}c_t,

\frac{\partial^2 c_t}{\partial \beta_{tj}\,\partial \beta_{ts}} = \frac{-d\left[\mu_{it}(1 + dm_t\mu_{it})\,\partial c_t/\partial \beta_{ts} + c_t\mu_{it}x_s\right]x_j}{(1 + dm_t\mu_{it})^2},

\frac{\partial^2 c_t}{\partial m_t\,\partial \beta_{tj}} = \left[m_t^{-2}\log(1 + d\mu_{it}m_t) - \frac{d\mu_{it}m_t^{-1}}{1 + d\mu_{it}m_t}\right]\frac{\partial c_t}{\partial \beta_{tj}} + \left(\frac{d\mu_{it}}{1 + d\mu_{it}m_t}\right)^2 c_t x_j.

The first partial derivatives are given by the following expressions:

\frac{\partial \log L}{\partial \lambda} = \sum_{i=1}^{n}\frac{(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)}{1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)},    (A1)

\frac{\partial \log L}{\partial m_1} = \sum_{i=1}^{n}\left\{m_1^{-2}\log(m_1) + m_1^{-2}\left[\log(\mu_{i1} + m_1^{-1}) - 1\right] + \frac{m_1^{-2}(y_{i1} + m_1^{-1})}{\mu_{i1} + m_1^{-1}} - \sum_{j=0}^{y_{i1}-1}\frac{m_1^{-2}}{m_1^{-1} + j} - \frac{\lambda(e^{-y_{i2}} - c_2)}{1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)}\,\frac{\partial c_1}{\partial m_1}\right\},    (A2)

\frac{\partial \log L}{\partial m_2} = \sum_{i=1}^{n}\left\{m_2^{-2}\log(m_2) + m_2^{-2}\left[\log(\mu_{i2} + m_2^{-1}) - 1\right] + \frac{m_2^{-2}(y_{i2} + m_2^{-1})}{\mu_{i2} + m_2^{-1}} - \sum_{j=0}^{y_{i2}-1}\frac{m_2^{-2}}{m_2^{-1} + j} - \frac{\lambda(e^{-y_{i1}} - c_1)}{1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)}\,\frac{\partial c_2}{\partial m_2}\right\},    (A3)

\frac{\partial \log L}{\partial \beta_{1j}} = \sum_{i=1}^{n}\left[\frac{y_{i1} - \mu_{i1}}{\mu_{i1}(1 + m_1\mu_{i1})}\,\frac{\partial \mu_{i1}}{\partial \beta_{1j}} - \frac{\lambda(e^{-y_{i2}} - c_2)}{1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)}\,\frac{\partial c_1}{\partial \beta_{1j}}\right],    (A4)

\frac{\partial \log L}{\partial \beta_{2j}} = \sum_{i=1}^{n}\left[\frac{y_{i2} - \mu_{i2}}{\mu_{i2}(1 + m_2\mu_{i2})}\,\frac{\partial \mu_{i2}}{\partial \beta_{2j}} - \frac{\lambda(e^{-y_{i1}} - c_1)}{1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)}\,\frac{\partial c_2}{\partial \beta_{2j}}\right].    (A5)

The second partial derivatives are given by the following expressions:

\frac{\partial^2 \log L}{\partial \lambda^2} = -\sum_{i=1}^{n}\left[\frac{(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)}{1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)}\right]^2,    (A6)

\frac{\partial^2 \log L}{\partial \lambda\,\partial m_1} = \sum_{i=1}^{n}\frac{-(e^{-y_{i2}} - c_2)}{[1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)]^2}\,\frac{\partial c_1}{\partial m_1},    (A7)

\frac{\partial^2 \log L}{\partial \lambda\,\partial m_2} = \sum_{i=1}^{n}\frac{-(e^{-y_{i1}} - c_1)}{[1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)]^2}\,\frac{\partial c_2}{\partial m_2},    (A8)

\frac{\partial^2 \log L}{\partial \lambda\,\partial \beta_{1j}} = \sum_{i=1}^{n}\frac{-(e^{-y_{i2}} - c_2)}{[1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)]^2}\,\frac{\partial c_1}{\partial \beta_{1j}},    (A9)

\frac{\partial^2 \log L}{\partial \lambda\,\partial \beta_{2j}} = \sum_{i=1}^{n}\frac{-(e^{-y_{i1}} - c_1)}{[1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)]^2}\,\frac{\partial c_2}{\partial \beta_{2j}},    (A10)

\frac{\partial^2 \log L}{\partial m_1^2} = \sum_{i=1}^{n}\left\{m_1^{-3}\left[3 - 2\log(m_1) - 2\log(\mu_{i1} + m_1^{-1}) - \frac{m_1^{-1}}{\mu_{i1} + m_1^{-1}}\right] - \frac{m_1^{-2}y_{i1} + 2m_1^{-3} + 2\mu_{i1}m_1^{-1}y_{i1} + 3\mu_{i1}m_1^{-2}}{(1 + \mu_{i1}m_1)^2} + \sum_{j=0}^{y_{i1}-1}\frac{m_1^{-4} + 2jm_1^{-3}}{(m_1^{-1} + j)^2} - \frac{\lambda(e^{-y_{i2}} - c_2)}{1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)}\,\frac{\partial^2 c_1}{\partial m_1^2} - \left[\frac{\lambda(e^{-y_{i2}} - c_2)}{1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)}\,\frac{\partial c_1}{\partial m_1}\right]^2\right\},    (A11)

\frac{\partial^2 \log L}{\partial m_1\,\partial m_2} = \sum_{i=1}^{n}\frac{\lambda}{[1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)]^2}\,\frac{\partial c_1}{\partial m_1}\,\frac{\partial c_2}{\partial m_2},    (A12)

\frac{\partial^2 \log L}{\partial m_1\,\partial \beta_{1j}} = \sum_{i=1}^{n}\left\{\frac{m_1^{-2}(\mu_{i1} - y_{i1})}{(\mu_{i1} + m_1^{-1})^2}\,\frac{\partial \mu_{i1}}{\partial \beta_{1j}} - \frac{\lambda(e^{-y_{i2}} - c_2)}{1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)}\,\frac{\partial^2 c_1}{\partial m_1\,\partial \beta_{1j}} - \left[\frac{\lambda(e^{-y_{i2}} - c_2)}{1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)}\right]^2\frac{\partial c_1}{\partial m_1}\,\frac{\partial c_1}{\partial \beta_{1j}}\right\},    (A13)

\frac{\partial^2 \log L}{\partial m_1\,\partial \beta_{2j}} = \sum_{i=1}^{n}\frac{\lambda}{[1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)]^2}\,\frac{\partial c_1}{\partial m_1}\,\frac{\partial c_2}{\partial \beta_{2j}},    (A14)

\frac{\partial^2 \log L}{\partial m_2^2} = \sum_{i=1}^{n}\left\{m_2^{-3}\left[3 - 2\log(m_2) - 2\log(\mu_{i2} + m_2^{-1}) - \frac{m_2^{-1}}{\mu_{i2} + m_2^{-1}}\right] - \frac{m_2^{-2}y_{i2} + 2m_2^{-3} + 2\mu_{i2}m_2^{-1}y_{i2} + 3\mu_{i2}m_2^{-2}}{(1 + \mu_{i2}m_2)^2} + \sum_{j=0}^{y_{i2}-1}\frac{m_2^{-4} + 2jm_2^{-3}}{(m_2^{-1} + j)^2} - \frac{\lambda(e^{-y_{i1}} - c_1)}{1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)}\,\frac{\partial^2 c_2}{\partial m_2^2} - \left[\frac{\lambda(e^{-y_{i1}} - c_1)}{1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)}\,\frac{\partial c_2}{\partial m_2}\right]^2\right\},    (A15)

\frac{\partial^2 \log L}{\partial m_2\,\partial \beta_{1j}} = \sum_{i=1}^{n}\frac{\lambda}{[1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)]^2}\,\frac{\partial c_2}{\partial m_2}\,\frac{\partial c_1}{\partial \beta_{1j}},    (A16)

\frac{\partial^2 \log L}{\partial m_2\,\partial \beta_{2j}} = \sum_{i=1}^{n}\left\{\frac{m_2^{-2}(\mu_{i2} - y_{i2})}{(\mu_{i2} + m_2^{-1})^2}\,\frac{\partial \mu_{i2}}{\partial \beta_{2j}} - \frac{\lambda(e^{-y_{i1}} - c_1)}{1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)}\,\frac{\partial^2 c_2}{\partial m_2\,\partial \beta_{2j}} - \left[\frac{\lambda(e^{-y_{i1}} - c_1)}{1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)}\right]^2\frac{\partial c_2}{\partial m_2}\,\frac{\partial c_2}{\partial \beta_{2j}}\right\},    (A17)

\frac{\partial^2 \log L}{\partial \beta_{1j}\,\partial \beta_{1s}} = \sum_{i=1}^{n}\left\{\frac{-\mu_{i1}(1 + m_1y_{i1})x_jx_s}{(1 + m_1\mu_{i1})^2} - \frac{\lambda(e^{-y_{i2}} - c_2)}{1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)}\,\frac{\partial^2 c_1}{\partial \beta_{1j}\,\partial \beta_{1s}} - \left[\frac{\lambda(e^{-y_{i2}} - c_2)}{1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)}\right]^2\frac{\partial c_1}{\partial \beta_{1j}}\,\frac{\partial c_1}{\partial \beta_{1s}}\right\},    (A18)

\frac{\partial^2 \log L}{\partial \beta_{1j}\,\partial \beta_{2s}} = \sum_{i=1}^{n}\frac{\lambda}{[1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)]^2}\,\frac{\partial c_1}{\partial \beta_{1j}}\,\frac{\partial c_2}{\partial \beta_{2s}},    (A19)

\frac{\partial^2 \log L}{\partial \beta_{2j}\,\partial \beta_{2s}} = \sum_{i=1}^{n}\left\{\frac{-\mu_{i2}(1 + m_2y_{i2})x_jx_s}{(1 + m_2\mu_{i2})^2} - \frac{\lambda(e^{-y_{i1}} - c_1)}{1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)}\,\frac{\partial^2 c_2}{\partial \beta_{2j}\,\partial \beta_{2s}} - \left[\frac{\lambda(e^{-y_{i1}} - c_1)}{1 + \lambda(e^{-y_{i1}} - c_1)(e^{-y_{i2}} - c_2)}\right]^2\frac{\partial c_2}{\partial \beta_{2j}}\,\frac{\partial c_2}{\partial \beta_{2s}}\right\}.    (A20)
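The Appendix formulas are easy to verify numerically. The following sketch (Python/NumPy, not part of the paper; the parameter values are arbitrary) compares the analytic derivative ∂c_t/∂m_t given above with a central finite difference.

```python
import numpy as np

# Numerical check (illustrative only) of the first Appendix derivative:
# dc_t/dm_t = m^{-1} [ m^{-1} log(1 + d*mu*m) - d*mu/(1 + d*mu*m) ] * c_t.
D = 1.0 - np.exp(-1.0)  # d = 1 - e^{-1}

def c_fun(mu, m):
    return (1.0 + D * mu * m) ** (-1.0 / m)

def dc_dm_analytic(mu, m):
    c = c_fun(mu, m)
    return (1.0 / m) * ((1.0 / m) * np.log(1.0 + D * mu * m) - D * mu / (1.0 + D * mu * m)) * c

mu, m, h = 1.7, 0.4, 1e-6
numeric = (c_fun(mu, m + h) - c_fun(mu, m - h)) / (2.0 * h)
print(numeric, dc_dm_analytic(mu, m))   # the two values should agree to about six decimals
```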