
A hyper-Poisson regression model for overdispersed and underdispersed count data

Sáez-Castillo, A. J.*, Conde-Sánchez, A.
Department of Statistics and Operations Research, University of Jaén, Spain

Abstract

The Poisson regression model is the most common framework for modelling count data, but it is constrained by its equidispersion assumption. The hyper-Poisson regression model described in this paper generalizes it and allows for over- and under-dispersion; unlike other models with the same property, it introduces the regressors in the equation of the mean. Regressors may also be introduced in the equation of the dispersion parameter, so that it is possible to fit data that present overdispersion and underdispersion at different levels of the observations. Two applications illustrate that the model can provide more accurate fits than those provided by commonly used alternative models.

Keywords: Regression model; Count data; Hyper-Poisson; Overdispersion; Underdispersion

1. Introduction

The Poisson regression model for count data forces the conditional variance to equal the conditional mean, but this hypothesis is commonly rejected in real applications because of overdispersion (the conditional variance exceeds the conditional mean) or underdispersion (the conditional variance is less than the conditional mean). Overdispersion has been widely studied, whereas models that cope with underdispersion are less common. Most models that cope with both under- and over-dispersion are generalizations of the Poisson model obtained by including new parameters (usually one) in the expression of the Poisson probability function. Thus, the GECk model (Cameron and Trivedi, 1998; Winkelmann, 2008; Winkelmann and Zimmermann, 1991), the generalized Poisson distribution (Consul and Famoye, 1992; Wang and Famoye, 1997; Famoye et al., 2004; Winkelmann, 2008), the double Poisson distribution (Efron, 1986; Johnson et al., 2005) and the COM-Poisson model (Shmueli et al., 2005; Sellers and Shmueli, 2010) introduce a parameter that determines whether the distribution is under- or over-dispersed, which is therefore called the dispersion parameter. The same holds for the Poisson Polynomial model of order p (Cameron and Johansson, 1997; Johnson et al., 2005), which includes new parameters as the coefficients of a polynomial of order p whose square enters the Poisson probability function as a factor.

Probably the most used of these models in recent years has been the one based on the Conway-Maxwell-Poisson (COM-Poisson or CMP) distribution. In this model the normalizing constant is approximated by truncating the series that defines it, and there are no closed expressions for the mean and the variance, although approximations of these moments that are highly accurate over a wide region of the parameter space may be used. Covariates can be introduced in the location and/or the dispersion parameter, but not in the mean (Sellers and Shmueli, 2010; Geedipally and Lord, 2011; Sellers et al., 2012). Although these regression models have shown great versatility in fitting datasets from very different fields (Sellers et al., 2012; Guikema and Coffelt, 2008; Lord et al., 2010), some authors warn about the use of approximations for the normalizing constant, the mean and the variance (Francis et al., 2012).

A different approach to generalizing the Poisson regression model is given by extended Poisson processes. Along these lines, the process based on a Gamma waiting-time distribution (Winkelmann, 1995; Oh et al., 2006; Daniels et al., 2010, 2011) is well known. More recently, McShane et al. (2008) proposed a model based on Weibull interarrival times, which provides simpler expressions for the hazard function. In a slightly different way, other authors extend the Poisson process by allowing the transition rates between the states of the process to be non-constant: they may increase (providing overdispersion) or decrease (providing underdispersion). That is the case of Podlich et al. (2004), who propose a semiparametric model for the transition rates that includes the covariate effects of the regression model. Faddy and Smith (2011), on the other hand, propose a parametric model for the transition rates and use approximations of the mean and the variance of the resulting count process to introduce the covariate effects.

In this paper we use a different two-parameter generalization of the Poisson distribution, the Crow-Bardwell or hyper-Poisson distribution (Johnson et al., 2005), for the response variable of the regression model. This distribution may be considered a member of the family of confluent distributions, because it is generated by the confluent hypergeometric function $_1F_1$, and it admits overdispersion as well as underdispersion. When it is overdispersed, it may be seen as a compound Poisson distribution whose mixing distribution is a confluent hypergeometric distribution.

* Corresponding address: Department of Statistics and Operational Research, University of Jaén, Polytechnic School of Linares, 23700 Linares (Jaén), Spain. Tel. +34 953648578. Fax +34 953648578. E-mail addresses: [email protected] (A. J. Sáez-Castillo), [email protected] (A. Conde-Sánchez).
* http://www.sciencedirect.com/science/article/pii/S0167947312004434
* http://dx.doi.org/10.1016/j.csda.2012.12.009

Preprint submitted to Elsevier

December 11, 2015

The main advantage of the proposed regression model is that the regressors may be introduced in the mean while also influencing the over- or under-dispersed behaviour of the distribution. In fact, it is possible to model data that present both overdispersion and underdispersion at different levels of the observations, determined by different combinations of the covariate values. This requires considerable computational effort, but we have implemented algorithms to estimate the model in R (R Development Core Team, 2011) that work adequately on the analysed datasets.

In the next section the main properties of the hyper-Poisson distribution are summarized, emphasizing those used in the regression model. The model is described in Section 3, where the estimation of its parameters is discussed. Section 4 includes two examples illustrating the utility of the regression model in real applications, one with underdispersed data and the other in an overdispersed context; in both cases we compare the results with those obtained by other models (Poisson, negative binomial, COM-Poisson or Poisson Polynomial). The fits show that the hyper-Poisson model improves the accuracy in both applications. The last section is a discussion with some final remarks.

2. The Crow-Bardwell or Hyper-Poisson distribution

Bardwell and Crow (1964) proposed what is known as the hyper-Poisson distribution (from now on, hP), also called the Crow-Bardwell distribution, with probability mass function

$$f_y = \frac{1}{{}_1F_1(1;\gamma;\lambda)}\,\frac{\lambda^y}{(\gamma)_y}, \qquad y = 0, 1, 2, \ldots, \tag{1}$$

where $\gamma, \lambda > 0$, $(a)_r = a(a+1)\cdots(a+r-1) = \Gamma(a+r)/\Gamma(a)$ for $a > 0$ and $r$ a positive integer (with $(a)_0 = 1$), and

$${}_1F_1(a;c;z) = \sum_{r=0}^{\infty} \frac{(a)_r}{(c)_r}\,\frac{z^r}{r!}$$

is the confluent hypergeometric series (Johnson et al., 2005, p. 23). The main characteristic of this distribution is that it is overdispersed if $\gamma > 1$, is the Poisson distribution if $\gamma = 1$ and is underdispersed if $\gamma < 1$; that is why $\gamma$ is interpreted as a dispersion parameter. On the other hand, $\lambda$ is the mean when $\gamma = 1$, corresponding to the Poisson model. The probability generating function is

$$g(z) = \frac{{}_1F_1(1;\gamma;\lambda z)}{{}_1F_1(1;\gamma;\lambda)}, \tag{2}$$

so the hP distribution may be considered a member of the family of generalized hypergeometric probability distributions (Johnson et al., 2005, pp. 89-96). Furthermore, as the probability generating function is specifically related to the confluent hypergeometric function, it may also be considered a particular case of the family of confluent hypergeometric distributions (Johnson et al., 2005, p. 202). From a different point of view, the hP distribution may be seen as a weighted Poisson distribution (Ridout and Besbeas, 2004), since its probability mass function may be expressed as

$$f_y = \frac{e^{-\lambda}\lambda^y\,\omega_y}{W\,y!}, \qquad \omega_y = \frac{e^{\lambda}\,y!}{(\gamma)_y},$$

where $W = \sum_{y=0}^{\infty} e^{-\lambda}\lambda^y\omega_y / y!$ is the normalizing constant.

It is easy to prove that (1) satisfies the recurrence equation

$$(\gamma + y)\,f_{y+1} = \lambda f_y. \tag{3}$$
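As a numerical check of (1) and (3), the following Python sketch (the authors worked in R; this is only an illustrative stand-in) evaluates the series $_1F_1(1;\gamma;\lambda) = \sum_r \lambda^r/(\gamma)_r$, builds the probabilities from $f_0 = 1/{}_1F_1(1;\gamma;\lambda)$ with recurrence (3), and compares them with the closed form (1). The function names and the truncation rule for the series are assumptions made for the illustration.

```python
import math


def hyp1f1_one(gamma, lam, tol=1e-15, max_terms=100_000):
    """Series 1F1(1; gamma; lam) = sum_{r>=0} lam**r / (gamma)_r, for gamma, lam > 0."""
    term, total, r = 1.0, 1.0, 0
    while r < max_terms:
        term *= lam / (gamma + r)  # (gamma)_{r+1} = (gamma)_r * (gamma + r)
        total += term
        r += 1
        if r > lam and term < tol * total:  # terms now decrease and are negligible
            break
    return total


def hp_pmf_closed(y, gamma, lam):
    """Equation (1): f_y = lam**y / ((gamma)_y * 1F1(1; gamma; lam))."""
    log_poch = math.lgamma(gamma + y) - math.lgamma(gamma)  # log (gamma)_y
    return math.exp(y * math.log(lam) - log_poch) / hyp1f1_one(gamma, lam)


def hp_pmf_recurrence(y_max, gamma, lam):
    """f_0, ..., f_{y_max} from f_0 = 1 / 1F1(1; gamma; lam) and recurrence (3)."""
    probs = [1.0 / hyp1f1_one(gamma, lam)]
    for y in range(y_max):
        probs.append(lam * probs[y] / (gamma + y))  # (gamma + y) f_{y+1} = lam f_y
    return probs


probs = hp_pmf_recurrence(60, gamma=0.5, lam=2.0)
print(sum(probs))  # total mass over y = 0..60, equal to 1 up to rounding
```

With $\gamma = 1$ the same code reproduces Poisson probabilities, since $(1)_y = y!$ and $_1F_1(1;1;\lambda) = e^{\lambda}$.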

From (3), multiplying both sides by $(y+1)^h$ and summing over $y$, we can deduce recurrence relations between the non-central moments $m_h = E[Y^h]$ (with $m_0 = 1$):

$$m_{h+1} + (\gamma - 1)\,m_h = \lambda \sum_{k=0}^{h} \binom{h}{k} m_k, \qquad h \geq 1. \tag{4}$$

Taking $h = 1$ gives a relation between the mean, $\mu$, and the variance, $\sigma^2$: from $m_2 + (\gamma - 1)m_1 = \lambda(m_1 + 1)$ we have that

$$\sigma^2 = \lambda + \left(\lambda - (\gamma - 1)\right)\mu - \mu^2, \tag{5}$$

which makes the variance a second-degree polynomial in the mean. An expression of the mean that does not involve the variance may be deduced by summing (3) over $y$ from $y = 0$ to $\infty$:

$$\mu = \lambda - (\gamma - 1)(1 - f_0) = \lambda - (\gamma - 1)\,\frac{{}_1F_1(1;\gamma;\lambda) - 1}{{}_1F_1(1;\gamma;\lambda)}. \tag{6}$$
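Equations (5) and (6) can be checked against brute-force moments. The Python sketch below (hypothetical helper names, not the authors' code) computes the mean from (6) and the variance from (5), and compares them with the mean and variance obtained by summing $y f_y$ and $y^2 f_y$ directly, truncating the support at an assumed `y_max`.

```python
import math


def hyp1f1_one(gamma, lam, tol=1e-15, max_terms=100_000):
    """Series 1F1(1; gamma; lam) = sum_{r>=0} lam**r / (gamma)_r."""
    term, total, r = 1.0, 1.0, 0
    while r < max_terms:
        term *= lam / (gamma + r)
        total += term
        r += 1
        if r > lam and term < tol * total:
            break
    return total


def mean_eq6(gamma, lam):
    """Equation (6): mu = lam - (gamma - 1) * (1F1 - 1) / 1F1."""
    f = hyp1f1_one(gamma, lam)
    return lam - (gamma - 1.0) * (f - 1.0) / f


def variance_eq5(gamma, lam):
    """Equation (5): sigma^2 = lam + (lam - (gamma - 1)) * mu - mu^2."""
    mu = mean_eq6(gamma, lam)
    return lam + (lam - (gamma - 1.0)) * mu - mu * mu


def moments_by_summation(gamma, lam, y_max=500):
    """Mean and variance summed directly over y = 0..y_max using recurrence (3)."""
    f = 1.0 / hyp1f1_one(gamma, lam)
    m1 = m2 = 0.0
    for y in range(y_max + 1):
        m1 += y * f
        m2 += y * y * f
        f *= lam / (gamma + y)  # move from f_y to f_{y+1}
    return m1, m2 - m1 * m1


for gamma, lam in [(0.5, 1.0), (3.0, 2.0), (0.2, 6.0)]:
    print(gamma, lam, mean_eq6(gamma, lam), moments_by_summation(gamma, lam)[0])
```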

Clearly, when $\gamma = 1$ (the Poisson case) the mean coincides with $\lambda$, but in general $\lambda$ need not be close to the mean. Alternatively, the mean may be obtained from the derivative of the probability generating function (2) evaluated at one:

$$\mu = g'(1) = \frac{\lambda}{\gamma}\,\frac{{}_1F_1(2;\gamma+1;\lambda)}{{}_1F_1(1;\gamma;\lambda)}. \tag{7}$$
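Expressions (6) and (7) must agree. The sketch below, in Python with a generic Kummer series for $_1F_1(a;c;z)$ restricted to positive arguments (an assumption of the sketch), evaluates both for a few parameter values; the function names are illustrative only.

```python
import math


def hyp1f1(a, c, z, tol=1e-15, max_terms=100_000):
    """Kummer series 1F1(a; c; z) = sum_r (a)_r z**r / ((c)_r r!), here for a, c, z > 0."""
    term, total, r = 1.0, 1.0, 0
    while r < max_terms:
        term *= (a + r) * z / ((c + r) * (r + 1))  # ratio of consecutive terms
        total += term
        r += 1
        if r > z and term < tol * total:
            break
    return total


def mean_eq6(gamma, lam):
    """Mean from equation (6)."""
    f = hyp1f1(1.0, gamma, lam)
    return lam - (gamma - 1.0) * (f - 1.0) / f


def mean_eq7(gamma, lam):
    """Mean from equation (7): (lam / gamma) * 1F1(2; gamma+1; lam) / 1F1(1; gamma; lam)."""
    return lam / gamma * hyp1f1(2.0, gamma + 1.0, lam) / hyp1f1(1.0, gamma, lam)


print(mean_eq6(0.5, 1.0), mean_eq7(0.5, 1.0))
```

The derivative step uses the standard identity $\frac{d}{dz}\,{}_1F_1(a;c;z) = \frac{a}{c}\,{}_1F_1(a+1;c+1;z)$.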

In both (6) and (7) the mean is an explicit function of the parameters $\gamma$ and $\lambda$, but neither parameter has an explicit expression in terms of the other parameter and the mean, because both appear as arguments of the $_1F_1$ function, which has no explicit inverse.

[Figure 1: level and contour plot of the variance-mean ratio, with µ on the horizontal axis and γ on the vertical axis; contour labels range from about 0.49 to about 2.0.]

Figure 1: Level and contour plots for the variance-mean ratio in terms of µ and γ

To display the dispersion properties of the distribution, we have analysed the variance-mean ratio in terms of the mean and the parameter $\gamma$. Figure 1 shows a contour plot of the variance-mean ratio for a wide range of values of $\mu$ and $\gamma$. The ratio does not appear to be bounded in terms of $\mu$ and $\gamma$, and, for a fixed value of the mean, it is an increasing function of $\gamma$. This versatility in providing over- and underdispersion makes the hP distribution an appropriate model for the methodology proposed by Xu et al. (2012) for integer-valued pure time series.

When $\gamma > 1$ (that is, in the overdispersion case), the hP distribution may be seen as a compound Poisson distribution with a confluent hypergeometric mixing distribution, which in general has density function (Gupta and Nadarajah, 2004)

$$f(p) = \frac{p^{u-1}(1-p)^{v-1}e^{-\lambda p}}{\mathrm{B}(u,v)\,{}_1F_1(u;u+v;-\lambda)}, \qquad 0 < p < 1.$$

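Specifically, $Y$ follows a Poisson$(\lambda P)$ distribution with individual heterogeneity in the mean, $\lambda P$, where $P$ is a random variable that models this heterogeneity (providing the overdispersion) and follows a confluent hypergeometric distribution with parameters $u = 1$, $v = \gamma - 1$ and $-\lambda$. Indeed, using $\mathrm{B}(1,\gamma-1) = \Gamma(\gamma-1)/\Gamma(\gamma)$, the probability mass function of this mixture is

$$\begin{aligned}
P[Y = y] &= \int_0^1 \frac{e^{-\lambda p}(\lambda p)^y}{y!}\,\frac{(1-p)^{\gamma-1-1}e^{\lambda p}}{\mathrm{B}(1,\gamma-1)\,{}_1F_1(1;\gamma;\lambda)}\,dp \\
&= \frac{\Gamma(\gamma)\,\lambda^y}{\Gamma(\gamma-1)\,{}_1F_1(1;\gamma;\lambda)\,y!}\int_0^1 p^y(1-p)^{\gamma-1-1}\,dp \\
&= \frac{\Gamma(\gamma)\,\lambda^y}{\Gamma(\gamma-1)\,{}_1F_1(1;\gamma;\lambda)\,y!}\,\frac{\Gamma(1+y)\,\Gamma(\gamma-1)}{\Gamma(\gamma+y)} \\
&= \frac{1}{{}_1F_1(1;\gamma;\lambda)}\,\frac{\lambda^y}{(\gamma)_y},
\end{aligned}$$

which is the hP probability mass function with parameters $\lambda$ and $\gamma$.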
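The compound representation can also be checked numerically. The Python sketch below (hypothetical helper names; composite Simpson quadrature is our choice, not the authors') integrates the Poisson$(\lambda p)$ probabilities against the confluent hypergeometric mixing density with $u = 1$, $v = \gamma - 1$ and parameter $-\lambda$, and compares the result with (1). It assumes $\gamma > 2$ so that the mixing density stays bounded at $p = 1$ and the naive quadrature is accurate; the identity itself only needs $\gamma > 1$.

```python
import math


def hyp1f1_one(gamma, lam, tol=1e-15, max_terms=100_000):
    """Series 1F1(1; gamma; lam) = sum_{r>=0} lam**r / (gamma)_r."""
    term, total, r = 1.0, 1.0, 0
    while r < max_terms:
        term *= lam / (gamma + r)
        total += term
        r += 1
        if r > lam and term < tol * total:
            break
    return total


def hp_pmf(y, gamma, lam):
    """hP probability mass function, equation (1)."""
    log_poch = math.lgamma(gamma + y) - math.lgamma(gamma)
    return math.exp(y * math.log(lam) - log_poch) / hyp1f1_one(gamma, lam)


def mixture_pmf(y, gamma, lam, n=4000):
    """P[Y = y] for Y | P ~ Poisson(lam * P), with P having density
    (1 - p)**(gamma - 2) * exp(lam * p) / (B(1, gamma - 1) * 1F1(1; gamma; lam)),
    integrated by the composite Simpson rule on n (even) subintervals of [0, 1]."""
    if gamma <= 2.0:
        raise ValueError("this simple quadrature needs gamma > 2 (bounded density)")
    norm = hyp1f1_one(gamma, lam) / (gamma - 1.0)  # B(1, gamma - 1) = 1 / (gamma - 1)

    def integrand(p):
        poisson = math.exp(-lam * p) * (lam * p) ** y / math.factorial(y)
        mixing = (1.0 - p) ** (gamma - 2.0) * math.exp(lam * p) / norm
        return poisson * mixing

    h = 1.0 / n
    total = integrand(0.0) + integrand(1.0)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    return total * h / 3.0


for y in range(5):
    print(y, mixture_pmf(y, 3.5, 2.0), hp_pmf(y, 3.5, 2.0))
```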

Thus, in this overdispersion context, the hP distribution may be compared with other compound Poisson distributions, such as the negative binomial distribution.

3. Regression model

3.1. Model formulation

Let $y_i$ be the value of the response variable for the $i$-th individual of the sample and $x_i' = (1, x_{i1}, x_{i2}, \ldots, x_{ik})$ the covariates observed for that individual. We consider the model in which $y_i$ follows a hP distribution with mean and dispersion parameter given by

$$\mu_i = e^{x_i'\beta}, \qquad \gamma_i = e^{x_i'\delta},$$

so that the location parameter $\lambda_i$ is determined by (6) from the values of $\mu_i$ and $\gamma_i$; from now on, we call this model hP(µi, γi). It is therefore a model in which we can analyse the effect of the covariates on the mean, as in the common Poisson model, the NegBin I and NegBin II models (Cameron and Trivedi, 1998) based on the negative binomial (NB) distribution, or the GWRM (Rodríguez-Avi et al., 2009) based on the generalized Waring distribution, among others, and also on the dispersion parameter. Values of $\gamma_i$ greater and less than one may even appear within the same dataset, so that over- and under-dispersed conditional distributions arise for different values of the covariate vector.

3.2. Model estimation

The regression coefficients $\beta$ and $\delta$ are estimated by maximizing the log-likelihood function. For a sample $y_1, \ldots, y_n$ with common parameters it is

$$\log L(\gamma, \lambda \mid y) = -\sum_{i=1}^{n} \log\Gamma(\gamma + y_i) + n\bar{y}\log\lambda + n\left(\log\Gamma(\gamma) - \log{}_1F_1(1;\gamma;\lambda)\right);$$

in the regression model, each term $i$ is evaluated at its own $(\gamma_i, \lambda_i)$.
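A direct transcription of this log-likelihood, as a Python sketch with hypothetical function names (the authors' R code is not reproduced here): `hp_loglik` takes observation-specific $\gamma_i$ and $\lambda_i$, as in the regression model, and `hp_loglik_iid` is the common-parameter form written above. The toy counts are made up for illustration.

```python
import math


def hyp1f1_one(gamma, lam, tol=1e-15, max_terms=100_000):
    """Series 1F1(1; gamma; lam) = sum_{r>=0} lam**r / (gamma)_r."""
    term, total, r = 1.0, 1.0, 0
    while r < max_terms:
        term *= lam / (gamma + r)
        total += term
        r += 1
        if r > lam and term < tol * total:
            break
    return total


def hp_loglik(y, gamma, lam):
    """Sum of log f_{y_i} from (1) with observation-specific gamma_i and lam_i."""
    total = 0.0
    for yi, gi, li in zip(y, gamma, lam):
        total += (yi * math.log(li) - math.lgamma(gi + yi) + math.lgamma(gi)
                  - math.log(hyp1f1_one(gi, li)))
    return total


def hp_loglik_iid(y, gamma, lam):
    """Common-parameter form: -sum log Gamma(gamma + y_i) + n*ybar*log(lam)
    + n * (log Gamma(gamma) - log 1F1(1; gamma; lam))."""
    n = len(y)
    return (-sum(math.lgamma(gamma + yi) for yi in y) + math.log(lam) * sum(y)
            + n * (math.lgamma(gamma) - math.log(hyp1f1_one(gamma, lam))))


y = [0, 1, 1, 2, 4, 0, 3]  # hypothetical toy counts
print(hp_loglik(y, [0.6] * len(y), [1.8] * len(y)), hp_loglik_iid(y, 0.6, 1.8))
```

In an actual fit, `gamma` would come from $e^{x_i'\delta}$ and `lam` from solving (6) at $\mu_i = e^{x_i'\beta}$, which is the step discussed next.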

This function depends on $\gamma$ and $\lambda$, whereas we model $\mu$ and $\gamma$, so in each step of the optimization process the location parameter $\lambda$ must be replaced by its expression in terms of the mean $\mu$, deduced from (6). The problem is that there is no closed-form expression of $\lambda$ in terms of $\mu$ and $\gamma$; we overcome this difficulty by solving (6) numerically for $\lambda$, given $\mu$ and $\gamma$, in each evaluation of the log-likelihood function within the optimization process. In short, we apply the maximum-likelihood method in the usual way, optimizing the log-likelihood function with the regression coefficients in the expressions of $\mu$ and $\gamma$, and computing $\lambda$ in each evaluation of the log-likelihood as the solution of (6). We have used the R functions nlm and optim (R Development Core Team, 2011) to maximize the log-likelihood, and optimize or uniroot to solve (6) numerically.

The need to solve (6) in each evaluation of the log-likelihood strongly increases the computational effort and the time necessary to fit a given dataset. To reduce this effort, it is advisable to provide bounds for $\lambda$ as close as possible to its actual value, narrowing the range in which optimize or uniroot search for the solution. Thus we have (see the Appendix for details)

$$\lambda \geq \min\left\{\mu,\ \max\left(\mu + (\gamma - 1),\ \gamma\mu\right)\right\} \tag{8}$$

and

$$\lambda \leq \max\left\{\mu,\ \min\left(\mu + (\gamma - 1),\ \gamma\mu\right)\right\}. \tag{9}$$
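A minimal sketch of this step, in Python rather than the R functions used by the authors: plain bisection (standing in for uniroot) on the interval given by (8) and (9). It relies on the fact that, for fixed $\gamma$, the hP mean increases with $\lambda$ (the hP family is a power series distribution in $\lambda$, so $d\mu/d\lambda = \sigma^2/\lambda > 0$). The tolerance, iteration cap and function names are assumptions. The last helper uses (5) to compute the variance-mean ratio at a fixed mean, the quantity displayed in Figure 1.

```python
import math


def hyp1f1_one(gamma, lam, tol=1e-15, max_terms=100_000):
    """Series 1F1(1; gamma; lam) = sum_{r>=0} lam**r / (gamma)_r."""
    term, total, r = 1.0, 1.0, 0
    while r < max_terms:
        term *= lam / (gamma + r)
        total += term
        r += 1
        if r > lam and term < tol * total:
            break
    return total


def hp_mean(gamma, lam):
    """Mean of the hP distribution, equation (6)."""
    f = hyp1f1_one(gamma, lam)
    return lam - (gamma - 1.0) * (f - 1.0) / f


def lambda_bounds(mu, gamma):
    """Bounds (8) and (9) for lam, given the mean mu and the dispersion gamma."""
    lower = min(mu, max(mu + gamma - 1.0, gamma * mu))
    upper = max(mu, min(mu + gamma - 1.0, gamma * mu))
    return lower, upper


def solve_lambda(mu, gamma, tol=1e-12, max_iter=200):
    """Bisection for lam such that hp_mean(gamma, lam) = mu."""
    lo, hi = lambda_bounds(mu, gamma)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if hi - lo <= tol * max(1.0, mid):  # also covers gamma = 1, where lo == hi == mu
            break
        if hp_mean(gamma, mid) < mu:  # the mean increases with lam for fixed gamma
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def variance_mean_ratio(mu, gamma):
    """sigma^2 / mu from (5), after recovering lam from (mu, gamma)."""
    lam = solve_lambda(mu, gamma)
    return (lam + (lam - (gamma - 1.0)) * mu - mu * mu) / mu


for gamma in (0.5, 1.0, 3.0):
    print(gamma, solve_lambda(2.0, gamma), variance_mean_ratio(2.0, gamma))
```

Bisection is used here only because it is guaranteed to converge once the bracket from (8) and (9) is available; a secant or Brent-type solver would typically need fewer evaluations of $_1F_1$.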

3.3. Model inference

The Wald test and the likelihood-ratio test (LRT) may be used to test the significance of the regression coefficients. In the same way, it is also possible to test whether the model hP(µi, γi) is adequate versus a simpler model hP(µi, γ) in which the dispersion parameter is constant. For non-nested models the goodness of fit may be compared by a wide range of measures. We have considered the Bayesian Information Criterion (BIC), given by BIC = −2 × log L + k × log n, where k is the number of parameters of the model and n the sample size. BIC is a measure for discriminating between candidate models, but it cannot tell us whether the fit is good or bad, because there is no benchmark against which to judge it. Finally, to assess the global adequacy of the model to the data, we have analysed the Pearson residuals. As their distribution is unknown, we have used the simulated envelope (Garay et al., 2011; Atkinson, 1981), which can serve as a diagnostic tool for detecting incorrect specification of the error distribution and of the systematic component, as well as the presence of outlying observations.
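The residuals themselves are easy to compute once $\lambda_i$ is recovered. The Python sketch below assumes the usual definition $r_i = (y_i - \hat\mu_i)/\hat\sigma_i$, with $\hat\sigma_i^2$ taken from (5) at the fitted $\hat\mu_i$ and $\hat\gamma_i$; the function names are hypothetical. The simulated envelope, which requires simulating from and refitting the model, is not reproduced.

```python
import math


def hyp1f1_one(gamma, lam, tol=1e-15, max_terms=100_000):
    """Series 1F1(1; gamma; lam) = sum_{r>=0} lam**r / (gamma)_r."""
    term, total, r = 1.0, 1.0, 0
    while r < max_terms:
        term *= lam / (gamma + r)
        total += term
        r += 1
        if r > lam and term < tol * total:
            break
    return total


def solve_lambda(mu, gamma, tol=1e-12, max_iter=200):
    """Solve (6) for lam by bisection between the bounds (8) and (9)."""
    lo = min(mu, max(mu + gamma - 1.0, gamma * mu))
    hi = max(mu, min(mu + gamma - 1.0, gamma * mu))
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if hi - lo <= tol * max(1.0, mid):
            break
        f = hyp1f1_one(gamma, mid)
        if mid - (gamma - 1.0) * (f - 1.0) / f < mu:  # mean (6) at mid is too small
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pearson_residuals(y, mu, gamma):
    """(y_i - mu_i) / sigma_i, with sigma_i^2 from (5) at the fitted mu_i and gamma_i."""
    out = []
    for yi, mi, gi in zip(y, mu, gamma):
        lam = solve_lambda(mi, gi)
        var = lam + (lam - (gi - 1.0)) * mi - mi * mi
        out.append((yi - mi) / math.sqrt(var))
    return out


print(pearson_residuals([0, 2, 5], [1.2, 2.0, 2.5], [0.4, 1.0, 3.0]))
```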

4. Applications

In this section we illustrate two fits of the described models to real data, comparing them with those provided by other common models for over- or under-dispersed count data. The Poisson model has been fitted with the glm function, and NegBin II models with the function glm.nb in the MASS package of R (Venables and Ripley, 2002). The COM-Poisson model has been fitted both by optimizing its log-likelihood function with the nlm and optim functions and with the COMPoissonReg package of R (Sellers and Lotze, 2011).

4.1. Takeover bids

First, we consider data from Cameron and Johansson (1997) on the number of bids received by 126 US firms that were targets of tender offers during the period 1978-85 and were actually taken over within 52 weeks of the initial offer. The dependent count variable is the number of bids after the initial bid (NUMBIDS) received by the target firm, and the regressors are:

• Defensive actions taken by management of the target firm: indicator variables for legal defence by lawsuit (LEGLREST), proposed changes in asset structure (REALREST), proposed change in ownership structure (FINREST) and management invitation for a friendly third-party bid (WHITEKNT).
• Firm-specific characteristics: bid price divided by price 14 working days before the bid (BIDPREM), percentage of stock held by institutions (INSTHOLD), total book value of assets in billions of dollars (SIZE) and book value squared (SIZESQ).
• Intervention by federal regulators: an indicator variable for Department of Justice intervention (REGULATN).

These data are available in the Ecdat package of R (Croissant, 2011). Cameron and Johansson (1997) show that, ignoring the covariates, the data present only a small amount of overdispersion (the variance-mean ratio is 1.180), which is expected to disappear when regressors are added. In fact, they successfully model the dataset with an underdispersed Poisson Polynomial model of order 1 (PP1).
This model has probability mass function

$$f(y \mid \lambda, a_1) = \frac{(1 + a_1 y)^2}{\eta_1(\lambda, a_1)} \times \frac{e^{-\lambda}\lambda^y}{y!},$$

where $\eta_1(\lambda, a_1)$ is the normalizing constant and $\lambda > 0$ is a location parameter in which the regressors are introduced through $\lambda_i = \exp(x_i'\beta)$. The authors compare the goodness of fit of the PP1 model with that of other models, such as the Poisson hurdle, double Poisson and GECk models, by means of the BIC, and find that the PP1 model is the most accurate.
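The PP1 normalizing constant has a closed form: since $\eta_1(\lambda, a_1) = E[(1 + a_1 Y)^2]$ for $Y \sim$ Poisson$(\lambda)$, and $E[Y] = \lambda$, $E[Y^2] = \lambda + \lambda^2$, it equals $1 + 2a_1\lambda + a_1^2(\lambda + \lambda^2)$. The Python sketch below (hypothetical names) checks that the resulting mass function sums to one; $a_1 = 3.382$ is the PP1 estimate in Table 1, used here only as an illustrative value, and $\lambda = 2.5$ is arbitrary.

```python
import math


def pp1_eta(lam, a1):
    """eta_1(lam, a1) = E[(1 + a1*Y)^2] for Y ~ Poisson(lam),
    using E[Y] = lam and E[Y^2] = lam + lam^2."""
    return 1.0 + 2.0 * a1 * lam + a1 * a1 * (lam + lam * lam)


def pp1_pmf(y, lam, a1):
    """PP1 mass function: (1 + a1*y)^2 / eta_1 times the Poisson(lam) pmf."""
    log_poisson = -lam + y * math.log(lam) - math.lgamma(y + 1)
    return (1.0 + a1 * y) ** 2 * math.exp(log_poisson) / pp1_eta(lam, a1)


total = sum(pp1_pmf(y, 2.5, 3.382) for y in range(200))
print(total)  # truncated at y = 199; the omitted tail is negligible
```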

Table 1: Takeover bids example: parameter estimates for regressor coefficients over the mean (Poisson and hP(µi, γ) models) and over the location parameter (PP1 model)

Variable      Poisson   t-stat      PP1   t-stat       hP   t-stat
ONE             0.986     2.39    0.210     0.28    1.033    2.610
LEGLREST        0.260     2.09    0.522     2.38    0.246    2.229
REALREST       -0.196    -1.08   -0.372    -1.39   -0.276   -1.920
FINREST         0.074     0.28    0.138     0.47    0.107    0.655
WHITEKNT        0.481     4.54    1.013     3.88    0.492    4.433
BIDPREM        -0.678    -2.29   -1.334    -2.46   -0.704   -2.508
INSTHOLD       -0.362    -1.13   -0.757    -1.23   -0.390   -1.248
SIZE            0.179     2.87    0.329     3.99    0.179    4.089
SIZESQ         -0.008    -2.74   -0.014    -3.30   -0.008   -3.332
REGULATN       -0.008    -0.21   -0.081    -0.36   -0.010   -0.081
â1 (PP1)                          3.382     3.02
δ̂0 (hP)                                            -2.624   -5.347
BIC             418.3             398.1             393.5

First, we have fitted a hP(µi, γ) model, with the mean depending on the regressors and γ constant. A summary of the results appears in Table 1, which reports the same outputs that Cameron and Johansson (1997) give in their table of results. The hP model has a lower BIC, so it improves the goodness of fit in relation to the PP1 model and thus also in relation to the rest of the models considered by these authors. The γ estimate (γ̂ = exp(−2.624) ≈ 0.07, less than one) implies underdispersion, as in the PP1 fitted model. Moreover, the hP(µi, γ) model has an additional advantage: the regression coefficients β can be interpreted as in the Poisson model, that is, in terms of the effect of the regressors on the mean of the distribution. The β estimates of the hP and Poisson models are close, but slightly larger in absolute value in the hP model (SIZE and SIZESQ coincide), except for the LEGLREST variable. Both fitted models also identify the same set of regressors as significant for the mean (5% significance level in both the Wald test and the LRT), and these are also the variables that are significant for the location parameter of the PP1 fitted model (LEGLREST, WHITEKNT, BIDPREM, SIZE and SIZESQ).

Next, we have considered a model in which the dispersion parameter also depends on the covariates, i.e., a hP(µi, γi) model. As the number of covariates is high, we have carried out a forward selection in the dispersion parameter, including in each step the most significant covariate for γ. A summary of the resulting model appears in Table 2; its BIC is 388.2666, lower than those obtained by the rest of the models. The LRT of this model against the hP(µi, γ) model is highly significant (χ²₄ = −2(−170.17 + 157.86) = 24.59, p-value < 0.001). The Wald test shows that all the β coefficients are significant except those corresponding to FINREST and REGULATN, which are near significance. Here there is a discrepancy with the LRT, according to which LEGLREST and REALREST are not significant (p-values 0.059 and 0.108, respectively). The QQ-plot with the simulated envelope for the Pearson residuals, based on k = 19 simulated samples, is shown in Figure 2 (left): as the proportion of points falling outside the envelope is very low, there is no evidence against the adequacy of the fitted model.
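The BIC and LRT figures quoted above can be cross-checked with a few lines of arithmetic. The Python sketch below assumes k = 11 parameters for hP(µi, γ) (ten β plus δ0, Table 1) and k = 15 for hP(µi, γi) (ten β plus five δ, Table 2), with n = 126; it uses the closed form $P(\chi^2_4 > x) = e^{-x/2}(1 + x/2)$. With the rounded log-likelihoods the statistic is 24.62; the small difference from the reported 24.59 presumably comes from rounding. Function names are illustrative.

```python
import math


def lrt_statistic(loglik_restricted, loglik_full):
    """Likelihood-ratio statistic -2 * (log L_restricted - log L_full)."""
    return -2.0 * (loglik_restricted - loglik_full)


def chi2_sf_df4(x):
    """Upper tail P(X > x) of a chi-square distribution with 4 degrees of freedom."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


def loglik_from_bic(bic, k, n):
    """Invert BIC = -2 log L + k log n."""
    return -(bic - k * math.log(n)) / 2.0


ll_restricted = loglik_from_bic(393.5, 11, 126)  # hP(mu_i, gamma), Table 1
ll_full = loglik_from_bic(388.2666, 15, 126)     # hP(mu_i, gamma_i), Table 2
stat = lrt_statistic(-170.17, -157.86)           # rounded values quoted in the text
print(round(ll_restricted, 2), round(ll_full, 2), round(stat, 2), chi2_sf_df4(stat))
```

The same two functions apply to the second application, where the quoted log-likelihoods −9968.31 and −9839.79 give a statistic of about 257.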


Table 2: Takeover bids example: parameter estimates for regressor coefficients over the mean (β) and over the dispersion parameter (δ) of a hP(µi, γi) model

Variable           β    t-stat          δ    t-stat
ONE            1.421     4.132     20.400     1.719
LEGLREST       0.224     2.235    No sig.
REALREST      -0.380    -2.335      3.721     2.230
FINREST        0.420     1.951      7.739     1.994
WHITEKNT       0.326     3.753    No sig.
BIDPREM       -0.872    -3.834    -20.149    -2.007
INSTHOLD      -0.957    -4.531    No sig.
SIZE           0.214    16.348    No sig.
SIZESQ        -0.010    -8.829    No sig.
REGULATN       0.220     1.920      4.584     2.310

Furthermore, whereas the previous models make all the conditional distributions underdispersed, whatever the values of the regressors, this new fitted model gives values of the γ parameter greater than one for some observations, showing overdispersion in them. Interestingly, most of the cases are only slightly underdispersed, while a few cases are highly overdispersed. We have also tried to fit two COM-Poisson models. The first, with regressors only in the location parameter and fitted with the COMPoissonReg package of R (Sellers and Lotze, 2011), has a BIC of 413.9092, higher than that of the PP1 fitted model. For the second, a more complex model with regressors in both the location and the dispersion parameters, the optimization algorithm for the log-likelihood function did not converge.

4.2. Number of extra academic years

Now the response variable is the number of extra years that students of short degrees at the University of Jaén (Spain) needed to complete their studies, taking into account gender (male being the reference category) and field of study: social sciences and law, technology, experimental sciences and health, the first being the reference category. For example, if a student needed four years to finish a short degree (three years), the number of extra years is 1; if they finished it in three years, the number of extra years is zero. The dataset includes data on 5960 students. The variance-mean ratios at the observation level show a clearly overdispersed dataset. We have fitted Poisson, NB (specifically, NegBin II) and two hP models. The first is again a hP(µi, γ) model, with the regressors introduced only in the equation of the mean, so it may be compared with the Poisson and NegBin II models. In the second, a hP(µi, γi) model, the regressors enter the mean as well as the dispersion parameter.

[Figure 2: two QQ-plots of Pearson residuals (vertical axis) against standard normal quantiles (horizontal axis), with simulated envelopes; panels "Takeover Bids" (left) and "Extra academic years" (right).]

Figure 2: Plots of the Pearson residuals against the order statistics of the normal distribution from the hP(µi, γi) models fitted to data in both applications

Finally, in order to compare this more complex hP model with a competitive model (in terms of the number of parameters), we have also fitted a COM-Poisson model with regressors in the location and the dispersion parameters (Sellers et al., 2012), which, like our hP model, admits over- and under-dispersion for each observation. Comparing the fitted Poisson and NB models with the hP(µi, γ) fitted model in terms of the BIC (Table 3), the latter is the most accurate; the dispersion parameter estimate (γ̂ = exp(1.758) > 1) indicates overdispersion. In turn, the hP(µi, γi) fitted model strongly improves the fit (Table 4), also with a lower BIC than the COM-Poisson fitted model. The likelihood-ratio test confirms the significance of this more complex model with respect to the model that only includes regressors in the mean (χ²₄ = −2 × (9839.79 − 9968.31) = 257.05, p-value < 0.001). Note that the individual tests in Tables 3 and 4 do not allow the significance of the regressor Branch as a whole to be evaluated, only that of its dummy variables; in this case, the LRT comparing the model with the one that does not include this regressor provides an adequate evaluation of its global significance, and we have checked that it is highly significant (p-value < 0.001). Regarding the goodness of fit of the hP(µi, γi) estimated model, the QQ-plot with the simulated envelope for the Pearson residuals (Figure 2, right) shows no strong evidence of lack of fit. Finally, we have calculated the sample variance-mean ratios, as well as those expected under the NB, hP and COM-Poisson fitted models, at the observation level. The hP(µi, γi) fitted model explains these ratios better, and all of them show overdispersion of the conditional distributions. Results are not included for the sake of brevity.

Table 3: Extra academic years example: BIC and parameter estimates for regressor coefficients over the mean (β) of Poisson, NB and hP models. θ is the dispersion parameter of the NB model. * indicates that the coefficient is statistically significant at the 5% level

                         Poisson              NB                      hP
                         µi = exp(xi'β)       θ, µi = exp(xi'β)       µi = exp(xi'β), γ = exp(δ0)
                         β̂         s.e.       β̂           s.e.       β̂            s.e.
(Intercept)               0.342*   0.021       0.354*     0.027       0.346*      0.027
Gender (woman)           -0.078*   0.022      -0.095*     0.029      -0.083*      0.028
Branch (exp. science)    -0.009    0.124      -0.017      0.150      -0.015       0.155
Branch (health)          -1.465*   0.075      -1.463*     0.079      -1.462*      0.082
Branch (technology)       0.922*   0.022       0.914*     0.030       0.919*      0.028
Dispersion                                     θ̂ = 2.879  0.163       δ̂0 = 1.758  0.068
BIC                      20859.92             20170.85                19988.78

Table 4: Extra academic years example: BIC and parameter estimates for regressor coefficients over the mean or the location parameter (β) and over the dispersion parameter (δ) of the hP and COM-Poisson (CMP) models. * indicates that the coefficient is statistically significant at the 5% level

                         hP: µi = exp(xi'β), γi = exp(xi'δ)      CMP: λi = exp(xi'β), νi = exp(xi'δ)
                         β̂        s.e.     δ̂         s.e.        β̂        s.e.     δ̂        s.e.
(Intercept)               0.327*  0.028     3.503*   0.335       -0.462*  0.025    -2.083*  0.130
Gender (woman)           -0.055   0.028     0.036    0.222       -0.001   0.026    -0.058   0.038
Branch (exp. science)    -0.011   0.189    11.389*   4.665       -0.317*  0.161     0.823*  0.361
Branch (health)          -1.468*  0.086     6.365   14.273       -0.788*  0.071     0.908*  0.304
Branch (technology)       0.933*  0.029    -2.756*   0.331        1.306*  0.053     1.760*  0.137
BIC                      19766.5                                 19836.13

5. Discussion

The proposed regression models, based on the hP distribution, appear to be an attractive alternative to other models that account for over- and underdispersion, such as the Poisson Polynomial of order 1 or COM-Poisson models. We have shown that the variance-mean ratio of this distribution covers a wide range of values in both over- and under-dispersed situations. The existence of explicit expressions of the mean in terms of the parameters facilitates the construction of regression models in which the effect of the covariates can be evaluated directly on the mean, as in the very common Poisson or NB models. We consider this an advantage over other models that introduce the covariates in the equations of parameters without a clear meaning, such as location parameters. It is also possible to include regressors in the dispersion parameter, so that these models can adequately fit datasets that present over- and under-dispersion simultaneously at the observation level. Additionally, we have explored another parametrization of the model in which regressors are included in an expression of the variance, but the equation linking the values of the variance with the parameters of the distribution led to negative values of these parameters in the empirical applications, providing improper probability mass functions. That is why we discarded this approach.

Finally, we have shown that these models fit data in real applications more accurately than other competitive models (such as Poisson, negative binomial, PP1 or COM-Poisson), in both over- and under-dispersion contexts. The main difficulty in using these models lies in the estimation of the coefficients, because the maximum-likelihood method implies a strong computational effort. Nevertheless, we have found no problems in fitting different datasets by means of the R functions and packages mentioned above.

Acknowledgements

The authors are grateful for the constructive suggestions provided by the reviewers, which improved the paper.

Appendix

Regarding the bounds for the parameter λ, from (6),

$$\lambda = \mu + (\gamma - 1)\,\frac{{}_1F_1(1;\gamma;\lambda) - 1}{{}_1F_1(1;\gamma;\lambda)},$$

where $0 < \left({}_1F_1(1;\gamma;\lambda) - 1\right)/{}_1F_1(1;\gamma;\lambda) < 1$. So,

• if $\gamma < 1$, then $\lambda < \mu$ and $\lambda > \mu + (\gamma - 1)$. Moreover, by (7), $\lambda = \gamma\mu\,{}_1F_1(1;\gamma;\lambda)/{}_1F_1(2;\gamma+1;\lambda)$, and $\lambda > \gamma\mu$ because

$$\frac{{}_1F_1(1;\gamma;\lambda)}{{}_1F_1(2;\gamma+1;\lambda)} > 1$$

if the coefficients of $\lambda^r$ ($r \geq 1$) satisfy

$$\frac{1}{(\gamma)_r} > \frac{r+1}{(\gamma+1)_r} \iff \gamma + r > (r+1)\gamma \iff 1 > \gamma,$$

in such a way that if $\gamma < 1$, then $\lambda > \gamma\mu$;

• equivalently, if $\gamma \geq 1$, then $\mu \leq \lambda \leq \mu + (\gamma - 1)$ and

$$\frac{{}_1F_1(1;\gamma;\lambda)}{{}_1F_1(2;\gamma+1;\lambda)} \leq 1,$$

so $\lambda \leq \gamma\mu$.

Thus we obtain the expressions (8) and (9).

References

Atkinson, A., 1981. Two graphical displays for outlying and influential observations in regression. Biometrika 68 (1), 13–20.
Bardwell, G. E., Crow, E. L., 1964. A two-parameter family of hyper-Poisson distributions. Journal of the American Statistical Association 59 (305), 133–141.
Cameron, A. C., Johansson, P., 1997. Count data regression using series expansions: with applications. Journal of Applied Econometrics 12 (3), 203–223.
Cameron, A. C., Trivedi, P. K., 1998. Regression Analysis of Count Data. Cambridge University Press.
Consul, P., Famoye, F., 1992. Generalized Poisson regression model. Communications in Statistics - Theory and Methods 21 (1), 89–109.
Croissant, Y., 2011. Ecdat: Data sets for econometrics. R package version 0.16.1.
Daniels, S., Brijs, T., Nuyts, E., Wets, G., 2010. Explaining variation in safety performance of roundabouts. Accident Analysis & Prevention 42 (2), 393–402.
Daniels, S., Brijs, T., Nuyts, E., Wets, G., 2011. Extended prediction models for crashes at roundabouts. Safety Science.
Dunn, J., 2008. compoisson: Conway-Maxwell-Poisson Distribution. R package version 0.3.
Efron, B., 1986. Double exponential families and their use in generalized linear regression. Journal of the American Statistical Association 81 (395), 709–721.

Faddy, M. J., Smith, D. M., 2011. Analysis of count data with covariate dependence in both mean and variance. Journal of Applied Statistics 38 (12), 2683–2694.

Famoye, F., Wulu, J. T., Singh, K. P., 2004. On the Generalized Poisson regression model with an application to accident data. Journal of Data Science 2, 287–295.

Francis, R. A., Geedipally, S. R., Guikema, S. D., Dhavala, S. S., Lord, D., LaRocca, S., 2012. Characterizing the performance of the Conway-Maxwell Poisson generalized linear model. Risk Analysis 32 (1), 167–183.

Garay, A. M., Hashimoto, E. M., Ortega, E. M., Lachos, V. H., 2011. On estimation and influence diagnostics for zero-inflated negative binomial regression models. Computational Statistics & Data Analysis 55 (3), 1304–1318.

Geedipally, S. R., Lord, D., 2011. Examination of crash variances estimated by Poisson–Gamma and Conway–Maxwell–Poisson models. Transportation Research Record 2241, 59–67.

Guikema, S. D., Coffelt, J. P., 2008. A flexible count data regression model for risk analysis. Risk Analysis 28 (1), 213–223.

Gupta, A. K., Nadarajah, S., 2004. Handbook of Beta Distribution and Its Applications. Marcel Dekker, New York.

Johnson, N. L., Kotz, S., Kemp, A. W., 2005. Univariate Discrete Distributions, third edition. Wiley, New York.

Lord, D., Geedipally, S. R., Guikema, S. D., 2010. Extension of the application of Conway-Maxwell-Poisson models: analyzing traffic crash data exhibiting underdispersion. Risk Analysis 30 (8), 1268–1276.

McShane, B., Adrian, M., Bradlow, E. T., Fader, P. S., 2008. Count models based on Weibull interarrival times. Journal of Business & Economic Statistics 26 (3), 369–378.

Oh, J., Washington, S. P., Nam, D., 2006. Accident prediction model for railway-highway interfaces. Accident Analysis & Prevention 38 (2), 346–356.

Podlich, H. M., Faddy, M. J., Smyth, G. K., 2004. Semi-parametric extended Poisson process models for count data. Statistics and Computing 14, 311–321.

R Development Core Team, 2011. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0.

Ridout, M. S., Besbeas, P., 2004. An empirical model for underdispersed count data. Statistical Modelling 4 (1), 77–89.


Rodríguez-Avi, J., Conde-Sánchez, A., Sáez-Castillo, A. J., Olmo-Jiménez, M. J., Martínez-Rodríguez, A. M., 2009. A generalized Waring regression model for count data. Computational Statistics & Data Analysis 53 (10), 3717–3725.

Sellers, K. F., Borle, S., Shmueli, G., 2012. The COM-Poisson model for count data: a survey of methods and applications. Applied Stochastic Models in Business and Industry 28, 104–116.

Sellers, K. F., Lotze, T., 2011. COMPoissonReg: Conway-Maxwell Poisson (COM-Poisson) Regression. R package version 0.3.2.

Sellers, K. F., Shmueli, G., 2010. A flexible regression model for count data. Annals of Applied Statistics 4 (2), 943–961.

Shmueli, G., Minka, T. P., Kadane, J. B., Borle, S., Boatwright, P., 2005. A useful distribution for fitting discrete data: revival of the Conway-Maxwell-Poisson distribution. Journal of the Royal Statistical Society Series C 54 (1), 127–142.

Venables, W. N., Ripley, B. D., 2002. Modern Applied Statistics with S, 4th Edition. Springer, New York, ISBN 0-387-95457-0.

Wang, W., Famoye, F., 1997. Modeling household fertility decisions with generalized Poisson regression. Journal of Population Economics 10 (3), 273–283.

Winkelmann, R., 1995. Duration dependence and dispersion in count-data models. Journal of Business & Economic Statistics 13 (4), 467–474.

Winkelmann, R., 2008. Econometric Analysis of Count Data. Springer-Verlag.

Winkelmann, R., Zimmermann, K., 1991. A new approach for modeling economic count data. Economics Letters 37 (2), 139–143.

Xu, H., Xie, M., Goh, T. N., Fu, X., 2012. A model for integer-valued time series with conditional overdispersion. Computational Statistics & Data Analysis 56 (12), 4229–4242.
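The bounds on $\lambda$ derived in the Appendix can also be checked numerically. The Python sketch below is our own illustration, not the authors' code (their computations used R). Assuming the standard hyper-Poisson probability function $P(Y=y) = \lambda^y / [(\gamma)_y\,{}_1F_1(1;\gamma;\lambda)]$, it obtains $\mu$ by summing the probability function directly, checks it against equation (6) and against the equivalent ratio form $\lambda = \gamma\mu\,{}_1F_1(1;\gamma;\lambda)/{}_1F_1(1+1;\gamma+1;\lambda)$, and reports whether $\lambda$ exceeds $\gamma\mu$. The helper names `hyp1f1`, `hp_mean` and `bound_relation`, the truncation tolerance and the grid of $(\gamma, \lambda)$ values are hypothetical choices made for illustration only.

```python
import math


def hyp1f1(a, b, x, tol=1e-15, max_terms=100_000):
    """Kummer's function 1F1(a; b; x) by its power series (a, b > 0 and x >= 0 assumed,
    so every term is non-negative and the relative stopping rule is safe)."""
    term, total = 1.0, 1.0
    for r in range(max_terms):
        term *= (a + r) / (b + r) * x / (r + 1)  # ratio of consecutive series terms
        total += term
        if term < tol * total:
            break
    return total


def hp_mean(gamma, lam, tol=1e-15, max_terms=100_000):
    """Mean of the hyper-Poisson law P(Y = y) = lam**y / ((gamma)_y * 1F1(1; gamma; lam)),
    obtained by summing y * P(Y = y) directly (no closed form for the mean is used)."""
    if gamma <= 0 or lam <= 0:
        raise ValueError("gamma and lam must be positive")
    w, total, first_moment = 1.0, 1.0, 0.0  # w = lam**y / (gamma)_y, starting at y = 0
    for y in range(1, max_terms):
        w *= lam / (gamma + y - 1)
        total += w
        first_moment += y * w
        # stop once the weights are decreasing and the current term is negligible
        if gamma + y - 1 > lam and y * w < tol * first_moment:
            break
    return first_moment / total


def bound_relation(gamma, lam):
    """Return '>', '=' or '<' comparing lam with gamma * mu (up to rounding error)."""
    mu = hp_mean(gamma, lam)
    if math.isclose(lam, gamma * mu, rel_tol=1e-9):
        return "="
    return ">" if lam > gamma * mu else "<"


if __name__ == "__main__":
    for gamma in (0.3, 0.7, 1.0, 1.5, 4.0):
        for lam in (0.5, 2.0, 10.0):
            mu = hp_mean(gamma, lam)
            F = hyp1f1(1.0, gamma, lam)        # 1F1(1; gamma; lam)
            G = hyp1f1(2.0, gamma + 1.0, lam)  # 1F1(1+1; gamma+1; lam)
            mu_eq6 = lam - (gamma - 1.0) * (F - 1.0) / F  # equation (6) solved for mu
            mu_ratio = lam * G / (gamma * F)              # from lam = gamma*mu*F/G
            assert math.isclose(mu, mu_eq6, rel_tol=1e-9)
            assert math.isclose(mu, mu_ratio, rel_tol=1e-9)
            print(f"gamma={gamma:<4} lam={lam:<5} mu={mu:.4f}  F/G={F / G:.4f}  "
                  f"lam {bound_relation(gamma, lam)} gamma*mu")
```

For $\gamma = 1$ both series reduce to $e^\lambda$, so the ratio equals 1 and $\lambda = \mu$, as expected since the hyper-Poisson distribution then coincides with the Poisson. If SciPy is available, `scipy.special.hyp1f1` could replace the hand-written series.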
