Alternate Generalized Linear Models for Nonhomogeneous Poisson Regression

Bruce Cooil, Vanderbilt University
Owen Graduate School, Vanderbilt University, 401 21st Ave. S., Nashville, TN 37203

Key Words: Negative Binomial, Link, Hazard

1. INTRODUCTION AND SUMMARY

This paper looks at several alternatives to standard Poisson regression models that are especially suited for insurance and manufacturing applications. The general model includes the Poisson, negative binomial and zero-inflated Poisson regression models as special cases.
2. COMBINING CURRENT APPROACHES

I begin with a description of a model that combines current standard approaches. Assume there are a number of independent systems that generate events of interest. In an automobile insurance application, each system could represent a different driver and the events of interest could be accidents that lead to liability claims. In a quality control example, each system could be an inspection unit or manufacturing system and the events could be defects or calibration shifts. With probability $p_i$, system $i$ generates 0 events, and with probability $1-p_i$, events are generated by a Poisson process with intensity function

$$\lambda_i(t) = \lambda_0(t)\,\xi_i \exp[V_i'\beta], \qquad (1)$$

where $\lambda_0(t)$ is a temporal baseline intensity function that is common across systems, $\xi_i$ represents a random effect (that is typically assumed to have a gamma distribution with unknown mean and variance), $V_i$ is a vector of covariates, and $\beta$ is a vector of covariate coefficients. When the random effect is assumed to have a gamma distribution, the predictive distribution for future claims is a zero-inflated negative binomial distribution (Lambert 1992; Cooil 1991; Lawless 1987). The general model of (1) could be referred to as a zero-inflated negative binomial regression.

In an automobile insurance example, this process could be used to model how accidents occur: with probability $p_i$ the driver is in a "perfect state" where no accidents occur and, with probability $1-p_i$, accidents occur according to a process that depends on an individual random effect and covariates that summarize individual characteristics and driving conditions. Thus, the probability of zero events is higher than it would be if the Poisson process itself always determined whether events occurred. In a general model the "perfect state" probability $p_i$ is also a function of covariates $U_i$ and coefficients $\alpha$,

$$p_i = \exp(U_i'\alpha)/[1 + \exp(U_i'\alpha)]. \qquad (2)$$
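The count distribution implied by (1) and (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's estimation procedure: the function names are hypothetical, the gamma random effect is normalized to mean 1 with variance 1/shape (the text leaves its mean free), and `cum_baseline` stands for the baseline intensity integrated over the observation window.

```python
import math

def perfect_state_prob(u, alpha):
    """Logistic form (2): p = exp(u'alpha) / (1 + exp(u'alpha))."""
    eta = sum(uj * aj for uj, aj in zip(u, alpha))
    return 1.0 / (1.0 + math.exp(-eta))

def neg_binomial_pmf(n, mean, shape):
    """P(N = n) for a Poisson count whose mean is multiplied by a gamma
    random effect with mean 1 and variance 1/shape (an assumption here)."""
    log_p = (math.lgamma(n + shape) - math.lgamma(shape) - math.lgamma(n + 1)
             + shape * math.log(shape / (shape + mean))
             + n * math.log(mean / (shape + mean)))
    return math.exp(log_p)

def zinb_pmf(n, u, alpha, v, beta, cum_baseline, shape):
    """Zero-inflated negative binomial probability of n events.

    With probability p the system is in the "perfect state" (zero events);
    otherwise the count is negative binomial with mean
    cum_baseline * exp(v'beta).
    """
    p = perfect_state_prob(u, alpha)
    mean = cum_baseline * math.exp(sum(vj * bj for vj, bj in zip(v, beta)))
    point_mass = 1.0 if n == 0 else 0.0
    return p * point_mass + (1.0 - p) * neg_binomial_pmf(n, mean, shape)

if __name__ == "__main__":
    # Hypothetical covariates and coefficients, chosen only for illustration.
    u = [1.0, 0.5]
    alpha = [-1.0, 0.4]
    beta = [-2.0, 0.3]
    for n in range(4):
        print(n, zinb_pmf(n, u, alpha, u, beta, cum_baseline=3.0, shape=2.0))
```

Passing the same vector for `u` and `v`, as in the demonstration, corresponds to using one set of covariates for both the zero-inflation probability and the intensity.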
Here are some common simplifying assumptions:

1) $U_i = V_i$;
2) assumption (1) and $\alpha = \tau\beta$ for some unknown scalar $\tau$ (i.e., one set of coefficients is a scalar multiple of the other);
3) temporal homogeneity: assume $\lambda_0(t) = t$, so that $\lambda_i(t) = t\,\xi_i \exp[V_i'\beta]$;
4) assumption (3) and no random effect: this is a simple zero-inflated Poisson process with intensity function $\lambda_i(t) = t \exp[V_i'\beta]$.

3. THE NEED FOR ALTERNATIVE APPROACHES

The rest of this paper outlines possible remedies for two common problems. The first is that in many applications there are simply too many parameters to estimate. For example, in rating models for automobile insurance, it is necessary to consider three large groups of covariates: variables that specify the driver's region, driver class variables (this is an age-sex-marital status classification), and variables that summarize
each driver's accident experience. Even in very large samples, some of these covariates may not be very important statistically. Nevertheless they must be included in the model because they are part of the rating function! A flexible approach would be to fit a model with at least three sets of indicator variables. But the data requirements would be prohibitive for all but the largest companies. And this approach becomes even more difficult when one tries to estimate two- and three-way interactions. A second problem is that the intensity function in (1) is not always the appropriate function of the covariates (when the covariates are not simply indicator variables).

4. WHEN THERE ARE TOO MANY COVARIATES

One possibility is to use order restricted maximum likelihood estimators for indicator coefficients (Robertson, Wright and Dykstra, 1988, ch. 4). The constraints in this case would force the coefficient estimates to reflect the historical risk-order among various driver categories. In addition one might use model selection criteria to choose only the most important interactions among classification criteria. Finally, one can experiment with simplified forms of interaction. For example, in modeling the rate at which automobile accidents occur, it may suffice to model interactions between driver classification (which is a function only of age, sex and marital status) and the point group variable (which is one possible summary of a driver's previous accident record) by using a separate linear function of the number of points within each driver classification group. (The number of points is used to define the point groups.)

5. ALTERNATIVE LINK FUNCTIONS

The intensity function in (1) can be written as

$$\lambda_i(t) = \lambda_0(t)\,\xi_i \cdot h(V_i'\beta),$$

where $h$ is the inverse link function. Of course, the shape of the link function is of no practical consequence when all of the covariates are indicator variables. In this case the canonical logarithmic link function is a natural choice so
that the inverse link function is $h(x) = \exp(x)$ as in (1). The inverse of the power link,

$$h_0(x) = (1+\theta x)^{1/\theta} \quad (= \exp(x) \text{ if } \theta = 0),$$

has also been proposed as a simple way of generalizing (1). But for those cases where it seems reasonable that the intensity would be monotonically related to the covariate, why not take the idea of the inverse power link one step further and consider a flexible 2-parameter inverse link? A conventional way to do this would be to let $h$ be the inverse power link after applying the Box-Cox transformation to the covariates, so that

$$h_1(x) = L^{-1}_{\theta_1}(L_{\theta_2}(x)),$$

where $L_\theta(x)$ is the Box-Cox transformation, $L_\theta(x) = (x^\theta - 1)/\theta$. Thus in this case,

$$h_1(x) = \begin{cases} [1 + \theta_1 \log(x)]^{1/\theta_1}, & \theta_1 > 0,\ \theta_2 = 0, \\ [1 + \theta_1\theta_2^{-1}(x^{\theta_2} - 1)]^{1/\theta_1}, & \theta_1 > 0,\ \theta_2 > 0, \\ \exp[\theta_2^{-1}(x^{\theta_2} - 1)], & \theta_1 = 0,\ \theta_2 > 0. \end{cases}$$
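The three cases of $h_1$ all come from one composition, which a short Python sketch makes concrete. The function names below are hypothetical, and the code assumes a strictly positive scalar covariate; it is meant as an illustration of the formulas above rather than as fitting code.

```python
import math

def box_cox(x, theta):
    """Box-Cox transformation L_theta(x) = (x**theta - 1)/theta; log(x) at theta = 0."""
    if x <= 0:
        raise ValueError("x must be strictly positive")
    return math.log(x) if theta == 0 else (x ** theta - 1.0) / theta

def inv_power_link(y, theta):
    """Inverse power link h_0(y) = (1 + theta*y)**(1/theta); exp(y) at theta = 0."""
    if theta == 0:
        return math.exp(y)
    base = 1.0 + theta * y
    if base <= 0:
        raise ValueError("1 + theta*y must be positive")
    return base ** (1.0 / theta)

def h1(x, theta1, theta2):
    """Two-parameter inverse link h_1(x) = L^{-1}_{theta1}(L_{theta2}(x))."""
    return inv_power_link(box_cox(x, theta2), theta1)

if __name__ == "__main__":
    # theta1 = theta2 = 1 gives the identity; theta1 = 0 or theta2 = 0
    # selects the exponential or logarithmic limiting case.
    for t1, t2 in [(1.0, 1.0), (0.8, 0.5), (0.0, 0.5), (0.5, 0.0)]:
        print(t1, t2, h1(2.0, t1, t2))
```

Because $L_\theta(1) = 0$ and $L^{-1}_\theta(0) = 1$, every member of the family passes through $h_1(1) = 1$, which is a convenient check on any implementation.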
The family $h_1$ is an extraordinarily broad class of transformations. (I assume throughout that the covariates have been defined, or redefined, so that they are strictly positive.) In many cases, one would want the intensity to be a concave function of extreme values of the covariate, so that the marginal increase in the intensity diminishes as the covariate reaches extreme values. To ensure that the intensity function is eventually a concave function of the covariate, it would make sense to estimate the two shape parameters subject to the constraint that $\theta_1 > \theta_2$ (this ensures that $h_1$ is eventually concave). Furthermore, $h_1(x)$ is strictly concave for all $x > 0$ if and only if $\theta_1 > \theta_2$ and $\theta_2 > 1$. It is convex and then concave as $x \to \infty$ whenever $\theta_1 > \theta_2$ and $\theta_2 < 1$. In fact there is an inflection point in $h_1$ at

$$x = (1 - \theta_2)^{1/\theta_2}$$

whenever $\theta_2 < 1$. The coefficient of the covariate would of course determine where this change in curvature occurs.

The inflection point is an interesting feature, although its use in most applications would only be plausible in the case where $h_1$ is first convex and then concave (rather than first concave and then convex, which would happen whenever $\theta_1 < \theta_2$ and $\theta_2 < 1$). If this property is not important, a more "pedestrian" choice would be to use an inverse link function of the form

$$h_2(x) = L_{\theta_1}(L^{-1}_{\theta_2}(x))$$

(where, as before, $L_\theta(x) = (x^\theta - 1)/\theta$). Thus in this case $h_2$ is a composition of the power link with the inverse power link, so that

$$h_2(x) = \begin{cases} \theta_2^{-1}\log(1 + \theta_2 x), & \theta_1 = 0,\ \theta_2 > 0, \\ \theta_1^{-1}[(1 + \theta_2 x)^{\theta_1/\theta_2} - 1], & \theta_1 > 0,\ \theta_2 > 0, \\ \theta_1^{-1}[\exp(\theta_1 x) - 1], & \theta_1 > 0,\ \theta_2 = 0. \end{cases}$$
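The curvature of these links can be checked numerically. The Python sketch below uses hypothetical function names and a central finite difference as a stand-in for the second derivative. It looks at $h_1$ on either side of $x = (1 - \theta_2)^{1/\theta_2}$ and at $h_2$ for both orderings of the shape parameters. One assumption of the example: it keeps $\theta_1 < 1$, because for larger $\theta_1$ the bracketed term of $h_1$ is not positive at that value of $x$, so the power is not defined there.

```python
import math

def h1(x, theta1, theta2):
    """h_1 for theta1 > 0, theta2 > 0."""
    base = 1.0 + (theta1 / theta2) * (x ** theta2 - 1.0)
    if base <= 0:
        raise ValueError("bracketed term of h_1 must be positive")
    return base ** (1.0 / theta1)

def h2(x, theta1, theta2):
    """h_2(x) = L_{theta1}(L^{-1}_{theta2}(x)), including the theta = 0 limits."""
    # Inner step: inverse power link, which is exp(x) when theta2 = 0.
    if theta2 == 0:
        inner = math.exp(x)
    else:
        base = 1.0 + theta2 * x
        if base <= 0:
            raise ValueError("1 + theta2*x must be positive")
        inner = base ** (1.0 / theta2)
    # Outer step: Box-Cox transformation, which is log when theta1 = 0.
    return math.log(inner) if theta1 == 0 else (inner ** theta1 - 1.0) / theta1

def second_difference(f, x, step=1e-4):
    """Central finite-difference approximation to f''(x)."""
    return (f(x + step) - 2.0 * f(x) + f(x - step)) / step ** 2

if __name__ == "__main__":
    theta1, theta2 = 0.8, 0.5                      # theta1 > theta2, theta2 < 1
    x_star = (1.0 - theta2) ** (1.0 / theta2)      # candidate inflection point, 0.25 here
    f = lambda x: h1(x, theta1, theta2)
    print("h1'' left of x*: ", second_difference(f, 0.8 * x_star))
    print("h1'' right of x*:", second_difference(f, 2.0 * x_star))
    print("h2'' with theta1 < theta2:", second_difference(lambda x: h2(x, 0.5, 2.0), 1.0))
    print("h2'' with theta1 > theta2:", second_difference(lambda x: h2(x, 2.0, 0.5), 1.0))
```

With these illustrative values the second difference of $h_1$ is positive to the left of the candidate point and negative to the right, and the sign for $h_2$ follows the sign of $\theta_1 - \theta_2$.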
The function $h_2$ is concave whenever $\theta_1 < \theta_2$.

A third possibility is to model the intensity as a "bathtub" hazard rate (for selected covariates). A general inverse link function of this form is

$$h_3(x) = \theta_1\theta_2 \{1 - [1 - \exp(-x^{\theta_1})]^{\theta_2}\}^{-1} \cdot [1 - \exp(-x^{\theta_1})]^{\theta_2 - 1} \exp(-x^{\theta_1})\, x^{\theta_1 - 1}$$

(Mudholkar and Srivastava 1993). The function $h_3$ is the hazard function of the exponentiated Weibull cumulative distribution function $F(x) = [1 - \exp(-x^{\theta_1})]^{\theta_2}$. In automobile insurance applications, this type of inverse link function would be appropriate for covariates like age, where typically accident rates initially decrease with age, level out in the middle years and then eventually increase with advancing age.

6. CONCLUSION

In a given application it may be appropriate to combine links, so that each is tailored to a different covariate or group of covariates. For example, one might use the natural (or canonical) inverse link together with inverse links that are designed to have an inflection point or to be concave, and to also include flexible hazard functions! In general the intensity function would be of the form

$$\lambda_i(t) = \lambda_0(t)\,\xi_i\, h(V_i'\beta), \qquad (3)$$

where

$$h(V_i'\beta) = \prod_{\ell=1}^{L} h_\ell(V_{i\ell}'\beta),$$

and each inverse link function $h_\ell$ is chosen for the particular subset of covariates $V_{i\ell}$. The model (3) provides a way of modeling and testing for a wide variety of process attributes, including tests for: whether it is a pure process or if the zero-inflation is appropriate, temporal nonhomogeneity, whether the random effect has positive variance, and what the most important covariates are for zero-inflation and for the intensity itself. The general model also provides a framework for comparing various links. In this way one can also directly justify simpler models.

REFERENCES

Andersen, P. K., and Gill, R. (1982), "Cox's Regression Model for Counting Processes: A Large Sample Study," The Annals of Statistics, 10, 1100-1120.
Cooil, B. (1991), "Using Medical Malpractice Data to Predict the Frequency of Claims: A Study of Poisson Process Models with Random Effects," Journal of the American Statistical Association, 86, 285-295.
Hastie, T.J. and Tibshirani, R.J. (1990), Generalized Additive Models, New York: Chapman and Hall.
Lambert, Diane (1992), "Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing," Technometrics, 34, 1-14.
Lawless, J. F. (1987), "Regression Methods for Poisson Process Data," Journal of the American Statistical Association, 82, 808-815.
McCullagh, P. and Nelder, J.A. (1989), Generalized Linear Models, 2nd Edition, New York: Chapman and Hall.
Mudholkar, Govind S. and Srivastava, Deo Kumar (1993), "Exponentiated Weibull Family for Analyzing Bathtub Failure-Rate Data," IEEE Transactions on Reliability, 42, 299-302.
Robertson, Tim, Wright, F.T., and Dykstra, R.L. (1988), Order Restricted Statistical Inference, New York: John Wiley.