A Disaggregate Negative Binomial Regression Procedure for Count Data Analysis
Author(s): Venkatram Ramaswamy, Eugene W. Anderson, Wayne S. DeSarbo
Source: Management Science, Vol. 40, No. 3 (Mar., 1994), pp. 405-417
Published by: INFORMS
Stable URL: http://www.jstor.org/stable/2632807


A Disaggregate Negative Binomial Regression Procedure for Count Data Analysis

Venkatram Ramaswamy * Eugene W. Anderson * Wayne S. DeSarbo School of Business Administration, The University of Michigan, Ann Arbor, Michigan 48109

Various research areas face the methodological problems presented by nonnegative integer count data drawn from heterogeneous populations. We present a disaggregate negative binomial regression procedure for the analysis of count data observed for a heterogeneous sample of cross-sections, possibly over some fixed time periods. This procedure simultaneously pools or groups cross-sections while estimating a separate negative binomial regression model for each group. An E-M algorithm is described within a maximum likelihood framework to estimate the group proportions, the group-specific regression coefficients, and the degree of overdispersion in event rates within each derived group. The proposed procedure is illustrated with count data entailing nonnegative integer counts of purchases (events) for a frequently bought consumer good.
(Negative Binomial Regression; Count Data; Stochastic Models; Maximum Likelihood; E-M Algorithm)

1. Introduction

There are several instances in various research areas where nonnegative integer counts of particular events are observed for a number of cross-sections, possibly over different fixed time periods. These include operations management (e.g., counts of nonconformities in quality control (Drury and Fox 1975); failure counts of industrial components (Moore and Beckman 1988)), corporate strategy (e.g., analysis of firm patents (Hausman et al. 1984)), industrial policy (e.g., analysis of interregional industrial movements (Twomey 1986)), health care management (e.g., analysis of patient visits (Cameron and Trivedi 1986); heart disease mortality (Lovett et al. 1986)), international planning (e.g., counts of international political events (King 1989)), direct marketing (e.g., analysis of book club purchases (Roberts and Berger 1989)), and several other applications of purchase counts in financial services (e.g., number of customer transactions), retailing (department store purchases), grocery products (Ehrenberg 1988), etc.

Researchers often wish to conduct regression analysis of such observed count data as a function of particular explanatory variables of interest. The classical regression model assumes that the dependent variable is continuous, ranges from −∞ to +∞, and is generated by a random variable that follows a normal distribution. These assumptions are clearly violated with count data, since the dependent variable is discrete and nonnegative, often with a preponderance of small values (Greene 1990, Maddala 1983). One alternative, which is consistent with the nature of observed counts, is to assume that a Poisson process generates the observed counts (Jorgenson 1961, Patil 1970), which results in the Poisson regression model (cf. Frome et al. 1973, Aickin 1985, Lawless 1987a). However, the Poisson regression model assumes that the variance of the counts is equal to the mean, which may not hold in certain situations. For instance, in analyzing counts of grocery product purchases, the variance is typically larger than the mean (Ehrenberg 1988). This situation is termed "overdispersion" (see Cox 1983, McCullagh and Nelder 1989).

0025-1909/94/4003/0405$01.25
Copyright © 1994, The Institute of Management Sciences
MANAGEMENT SCIENCE/Vol. 40, No. 3, March 1994

RAMASWAMY, ANDERSON, AND DESARBO: A Disaggregate Negative Binomial Regression Procedure

As Cameron and Trivedi (1990) note, the presence of overdispersion has consequences similar to those of heteroscedasticity in the classical linear regression model: the variances of the parameter estimates are inconsistently estimated, which invalidates the usual hypothesis tests. Cameron and Trivedi (1990) discuss several regression-based tests for overdispersion in the Poisson model, which can be used to test the appropriateness of the Poisson regression model. If the Poisson regression model is deemed inappropriate for the count data, alternative stochastic models may be used to accommodate overdispersion (Cameron and Trivedi 1986, Hausman et al. 1984). The negative binomial regression model is perhaps the most popular, in part due to its simplicity (e.g., Bain and Wright 1982, Dhavale 1989, Lawless 1987b, Morrison and Schmittlein 1988, Schmittlein et al. 1985), although researchers have also suggested other modified count data models (e.g., Mullahy 1986). The negative binomial regression model has an additional dispersion parameter that captures the extent of overdispersion in the mean event rates. However, in applications of the negative binomial regression model that typically involve purely cross-sectional count data (i.e., one observation for each cross-section), a single set of regression coefficients is estimated for the entire sample of cross-sections. While this may be justified if one is interested only in aggregate-level estimates, it may be inadequate and potentially misleading if there is considerable cross-sectional heterogeneity in the data. For example, in analyzing count data on purchases of a nondurable consumer good for a cross-section of consumers, the sample may comprise distinct groups of consumers with different purchase patterns.
In these circumstances, the mean event rates or, alternatively, the impact of the explanatory variables may vary across groups of consumers. This requires some form of disaggregation of the sample into groups, and the estimation of distinct negative binomial regression models for each group. One alternative for performing such disaggregate analysis is to pool or group cross-sections on some a priori basis: cross-sections that one believes in advance to be very similar with regard to their respective regression coefficients and degree of overdispersion are grouped together, and separate negative binomial regression models are then estimated for each group. For instance, in the case of grocery products, consumers may be grouped according to characteristics such as age, geographical region of residence, or level of consumption. However, choosing the characteristics that are associated with differences in the parameters of the negative binomial regression model is, in practice, difficult and requires strong theoretical guidance. Further, the relevant descriptive data for each cross-section must be available to implement the a priori pooling or grouping. Another alternative, if sufficient time-series data are available for each cross-section, is to estimate a separate negative binomial regression model for each individual cross-section. These individual-level estimates can then be subjected to traditional cluster analysis to form clusters (groups) of cross-sections and obtain group-level disaggregate estimates. A major limitation of this approach is that different clustering methods (Hartigan 1975) may produce different cluster (group) results; further, the regression and the cluster analysis optimize unrelated objective functions. More importantly, this two-step approach becomes unreliable if sufficient time-series observations are unavailable for each cross-section, and it is obviously infeasible if the count data are purely cross-sectional.

In this paper, we present a maximum likelihood procedure for empirically deriving groups of cross-sections based directly on the distributional parameters of the negative binomial regression model. This "disaggregate" negative binomial regression procedure simultaneously estimates the group sizes and the parameters of the negative binomial regression model for each of the derived groups, all to optimize a common objective function.
Our procedure may be gainfully used for analyzing heterogeneous cross-sectional time-series count data, as well as purely cross-sectional count data. It requires neither the formation of a priori groups nor the use of ancillary group analyses. The proposed procedure assumes that the observed count data are drawn from a finite mixture of negative binomial distributions that differ in their mean event rates (or, alternatively, the regression component) and dispersion parameters. These different distributions are unobserved, however, and an E-M algorithm is described within a maximum likelihood framework to estimate the "latent" group proportions, the group-specific regression coefficients, and the degree of overdispersion in event rates within each derived group. We illustrate the proposed procedure using count data on purchases (events) of a frequently bought consumer good, observed for a cross-sectional panel of consumers over some fixed time periods.

2. The Procedure

Let
i = 1, ..., I index cross-sectional units;
t = 1, ..., T index fixed time periods;
j = 1, ..., J index explanatory variables;
$X_{ijt}$ = the value of explanatory variable j for cross-section i in time period t; and
$Y_{it}$ = a nonnegative integer count of an event for cross-section i in time period t.

The classical linear regression model is specified as:

$$ Y_i = X_i \beta + \varepsilon_i, \tag{1} $$

where $Y_i$ is a T × 1 column vector containing the discrete values of the dependent variable, $X_i$ is a T × J matrix of explanatory variables, $\beta$ is a J × 1 column vector of regression coefficients, and $\varepsilon_i$ is a T × 1 normal random disturbance vector. However, since $Y_i$ consists of nonnegative integer counts, the traditional assumption of a normal distribution is inappropriate (cf. Maddala 1983). While the normal distribution may give a reasonable approximation to the true underlying discrete distribution when the counts are large, the classical regression model becomes inappropriate if one observes a preponderance of zero values and small counts (cf. Greene 1990). A type of regression analysis that is more appropriate for counts is Poisson regression (cf. Frome et al. 1973, Jorgenson 1961). The Poisson distribution characterizes the probability of observing $Y_{it}$ as:

$$ P[Y_{it}] = \frac{\exp(-\lambda_{it})\,\lambda_{it}^{Y_{it}}}{\Gamma(1 + Y_{it})}, \tag{2} $$

where $\lambda_{it}$ is the mean event rate for cross-section i in time period t. The Poisson regression equivalent of the normal regression in (1) is obtained by reparameterizing $\lambda_{it}$ as a function of appropriate explanatory variables. For a nonnegative mean rate $\lambda_{it}$, the regression component is specified as:

$$ \lambda_{it} = \exp\Big(\sum_j X_{ijt}\beta_j\Big). \tag{3} $$

This Poisson regression specification is restrictive in capturing the variance of the observed counts, because the conditional mean and variance of $Y_{it}$, given the explanatory variables, are necessarily equal. However, count data sets in many applications exhibit overdispersion, wherein the variance of the counts is greater than the mean. This overdispersion cannot be accounted for by the traditional Poisson regression specification. To assess the appropriateness of the Poisson regression model for a particular count data set, Cameron and Trivedi (1990) have recently proposed an optimal regression-based test for overdispersion in the Poisson model. Under their testing framework, the null hypothesis is that the variance and mean are equal, i.e., $\mathrm{Var}[Y_{it}] = \lambda_{it}$. An alternative hypothesis is

$$ \mathrm{Var}[Y_{it}] = \lambda_{it}\,[1 + \alpha\lambda_{it}]. $$

The optimal test involves the computation of the test statistic

$$ T_{opt} = [\hat g' \hat\Sigma^{-1} \hat g]^{-1/2}\; \hat g' \hat\Sigma^{-1} \hat z, $$

where $\hat g$ is an IT × 1 vector with each element equal to $\hat\lambda_{it}^2$, $\hat\Sigma$ is an IT × IT diagonal matrix with each diagonal element equal to $2\hat\lambda_{it}^2$, and $\hat z$ is an IT × 1 vector with each element equal to $(Y_{it} - \hat\lambda_{it})^2 - Y_{it}$. Under the null hypothesis this statistic is asymptotically distributed as N(0, 1), and it is asymptotically equivalent to the t-test of the coefficient obtained via a weighted least squares regression of $\hat z$ on $\hat g$ (see Cameron and Trivedi 1990). The magnitude of this coefficient may be interpreted as a measure of the extent of overdispersion.

If the above test suggests that the Poisson regression model is inappropriate, overdispersion can be modeled within the framework of the Poisson regression model by introducing a stochastic component $\nu_{it}$ into expression (3). This stochastic component introduces intrinsic randomness into the mean event rate for each cross-section i in each time period t, such that

$$ \lambda_{it} = \exp\Big(\sum_j X_{ijt}\beta_j + \nu_{it}\Big). \tag{4} $$
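As a rough illustration of the optimal overdispersion test described above, the sketch below computes $T_{opt}$ and the weighted least squares coefficient from a vector of counts and fitted Poisson means. It is a minimal pure-Python sketch, not the authors' code: the function name `cameron_trivedi_test` and the toy inputs are hypothetical, and the fitted means are assumed to come from a previously estimated Poisson regression (3). Because $\hat\Sigma$ is diagonal, the matrix products reduce to simple sums.

```python
import math


def cameron_trivedi_test(y, lam_hat):
    """Optimal regression-based test of Var[Y] = lam against Var[Y] = lam * (1 + alpha * lam).

    y       : observed counts Y_it, stacked into one list of length IT
    lam_hat : fitted Poisson means for the same observations (all > 0)
    Returns (T_opt, alpha_hat).
    """
    g = [lam * lam for lam in lam_hat]                            # g_it = lam_it^2
    sigma = [2.0 * lam * lam for lam in lam_hat]                  # diagonal of Sigma
    z = [(yi - li) ** 2 - yi for yi, li in zip(y, lam_hat)]       # z_it = (Y_it - lam_it)^2 - Y_it
    g_s_g = sum(gi * gi / si for gi, si in zip(g, sigma))         # g' Sigma^{-1} g
    g_s_z = sum(gi * zi / si for gi, zi, si in zip(g, z, sigma))  # g' Sigma^{-1} z
    t_opt = g_s_z / math.sqrt(g_s_g)
    alpha_hat = g_s_z / g_s_g  # WLS slope of z on g (no intercept), weights Sigma^{-1}
    return t_opt, alpha_hat


# Hypothetical toy inputs; in practice lam_hat comes from a fitted Poisson regression (3).
t, a = cameron_trivedi_test([0, 2, 1, 5], [1.0, 1.0, 1.0, 1.0])
print(round(t, 4), round(a, 4))  # prints: 3.5355 2.5
```

With these choices of $\hat g$ and $\hat\Sigma$, the statistic simplifies to $\sum \hat z_{it} / \sqrt{2 \sum \hat\lambda_{it}^2}$. A large positive value points toward overdispersion.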


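Denoting the probability density function for $\nu_{it}$ as $h(\nu_{it})$, the marginal density of $Y_{it}$ can be expressed as:

$$ P[Y_{it} \mid X_i, \beta] = \int P[Y_{it} \mid X_i, \beta, \nu_{it}]\, h(\nu_{it})\, d\nu_{it} = \int \left\{ \frac{\exp\!\big(-\exp(\sum_j X_{ijt}\beta_j + \nu_{it})\big)\, \exp\!\big(\sum_j X_{ijt}\beta_j + \nu_{it}\big)^{Y_{it}}}{\Gamma(1 + Y_{it})}\, h(\nu_{it}) \right\} d\nu_{it}. \tag{5} $$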

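Depending upon the parametric form imposed via $h(\nu_{it})$, various compound Poisson regression models can be derived. The negative binomial regression model arises if we assume that $h(\nu_{it})$, or equivalently the distribution of $\lambda_{it}$, $f(\lambda_{it})$, follows a Gamma distribution.¹ In particular, we use the "index" parameterization of the Gamma distribution (McCullagh and Nelder 1983, p. 150) and assume that $\lambda_{it} \sim \mathrm{Gamma}(\phi_{it}, \delta)$ with density:

$$ f(\lambda_{it}) = \frac{1}{\lambda_{it}\,\Gamma(\delta)} \left(\frac{\delta\lambda_{it}}{\phi_{it}}\right)^{\delta} \exp\!\left(-\frac{\delta\lambda_{it}}{\phi_{it}}\right), \qquad \lambda_{it} > 0, \tag{6} $$

where $\phi_{it} = \exp(\sum_j X_{ijt}\beta_j)$ and $\delta$ is the index or precision parameter. The expected mean event rate is $E[\lambda_{it}] = \phi_{it}$, and $\mathrm{Var}[\lambda_{it}] = \phi_{it}^2/\delta$. Hence, the marginal density of $Y_{it}$, given the parameters $\beta$ and $\delta$ and the explanatory variables $X_i$, is given by:

$$ P[Y_{it} \mid X_i, \beta, \delta] = \int_0^\infty P[Y_{it} \mid \lambda_{it}]\, f(\lambda_{it})\, d\lambda_{it} = \frac{\Gamma(\delta + Y_{it})}{\Gamma(1 + Y_{it})\,\Gamma(\delta)} \left(\frac{\delta}{\delta + \phi_{it}}\right)^{\delta} \left(\frac{\phi_{it}}{\delta + \phi_{it}}\right)^{Y_{it}}. \tag{7} $$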
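The sketch below evaluates expression (7) on the log scale with `math.lgamma` and checks it numerically against the Gamma-Poisson integral it was derived from, i.e., the Poisson probability (2) averaged over the Gamma density (6). It is illustrative only. The function names are hypothetical, `phi` stands for a precomputed $\phi_{it}$, and the quadrature is Simpson's rule on a truncated range [0, `upper`]. That range is adequate only when `upper` is far above `phi`, and the shortcut at λ = 0 assumes δ + Y > 1.

```python
import math


def nb_pmf(y, phi, delta):
    """Expression (7): P[Y = y] for a negative binomial with mean phi and dispersion delta."""
    log_p = (math.lgamma(delta + y) - math.lgamma(1 + y) - math.lgamma(delta)
             - delta * math.log1p(phi / delta)      # = delta * ln(delta / (delta + phi))
             + y * math.log(phi / (delta + phi)))
    return math.exp(log_p)


def poisson_gamma_pmf(y, phi, delta, upper=60.0, n=20000):
    """Integrate the Poisson pmf (2) against the Gamma density (6) with Simpson's rule.

    Assumes delta + y > 1 (so the integrand vanishes at lam = 0) and an even n.
    """
    def integrand(lam):
        if lam <= 0.0:
            return 0.0
        log_pois = -lam + y * math.log(lam) - math.lgamma(1 + y)
        log_gam = (delta * math.log(delta * lam / phi) - delta * lam / phi
                   - math.log(lam) - math.lgamma(delta))
        return math.exp(log_pois + log_gam)

    h = upper / n
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0


print(round(nb_pmf(0, 2.0, 0.5), 6))  # (0.5 / 2.5) ** 0.5, prints 0.447214
```

Summing `nb_pmf` over a long range of counts also recovers the moments stated in (8) and (9). For φ = 2 and δ = 0.5, the mean is 2 and the variance is 2(1 + 2/0.5) = 10.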

Here,

$$ E[Y_{it}] = \exp\Big(\sum_j X_{ijt}\beta_j\Big) \tag{8} $$

and

$$ \mathrm{Var}[Y_{it}] = \exp\Big(\sum_j X_{ijt}\beta_j\Big)\Big(1 + \exp\Big(\sum_j X_{ijt}\beta_j\Big)\big/\delta\Big). \tag{9} $$

From expression (9), the ratio of variance to mean for the observed counts is linear in the mean, which accommodates overdispersion in the data. As $\delta \to \infty$, the formulation in expression (7) reduces to a Poisson regression model with $\lambda_{it} = \phi_{it} = \exp(\sum_j X_{ijt}\beta_j)$. Since $\delta$ varies with the amount of overdispersion in the data, we refer to $\delta$ as a dispersion parameter (the degree of overdispersion is high when $\delta$ is small). Note that expression (9) is the basis for formulating the alternative hypothesis for the overdispersion test discussed earlier, with $\alpha = 1/\delta$.

The negative binomial regression model in (7), however, implies a common parameter vector $\beta$ and dispersion parameter $\delta$ for all the cross-sections. To capture unobserved heterogeneity in these parameters across cross-sections, one can specify a negative binomial regression model at a disaggregate level via multiple groups of cross-sections, and estimate separate parameters for each group. One approach would be to form these groups a priori, although this has limitations, as indicated in the introduction. Alternatively, we form these groups empirically while simultaneously estimating regression coefficients for each group, according to an objective (data-driven) criterion. Hence, we posit the existence of K "latent" groups such that the structural relation within each group is described by group-specific regression coefficients $\beta_k$ and a dispersion parameter $\delta_k$. The regression coefficients, as well as the dispersion parameter, are allowed to vary across the K groups so as to capture the unobserved cross-sectional heterogeneity in the data. The potential space of parameter variation is characterized by these "latent" groups, along with the membership probabilities of each cross-section in these groups. Formally, we postulate that the discrete dependent vector $Y = [Y_1, \ldots, Y_I]'$ is distributed as a finite mixture of negative binomial densities, conditional on K groups of cross-sections:

$$ P[Y_i] = \sum_k w_k\, P[Y_i \mid X_i, \beta_k, \delta_k], \tag{10} $$

where

$$ P[Y_i \mid X_i, \beta_k, \delta_k] = \prod_t \frac{\Gamma(\delta_k + Y_{it})}{\Gamma(1 + Y_{it})\,\Gamma(\delta_k)} \left(\frac{\delta_k}{\delta_k + \phi_{itk}}\right)^{\delta_k} \left(\frac{\phi_{itk}}{\delta_k + \phi_{itk}}\right)^{Y_{it}}, \tag{11} $$

$$ \phi_{itk} = \exp\Big[\sum_j X_{ijt}\beta_{jk}\Big], \tag{12} $$

and $w_k$ is a mixing proportion such that

$$ w_K = 1 - \sum_{k=1}^{K-1} w_k \tag{13} $$

and

$$ 0 \le w_k \le 1 \quad (k = 1, \ldots, K-1). \tag{14} $$

¹ The negative binomial model can be motivated and derived in different ways (cf. Greenwood and Yule 1920, Anscombe 1950, Boswell and Patil 1970).

Let $B = (\beta_1, \ldots, \beta_K)'$, where $\beta_k$ is a J × 1 vector of regression coefficients for group k, and $\Delta = (\delta_1, \ldots, \delta_K)'$, where $\delta_k$ is the dispersion parameter for group k. Note that the parameters are posited to vary across groups. The mixing proportion $w_k$ can be construed as the prior probability of any cross-section belonging to group k. The conditional specification in (11) implies that if a cross-section belongs to group k, then the structure of the relationship between the explanatory variables and the discrete dependent variable is represented by the parameter vector $\beta_k$ and the dispersion parameter $\delta_k$.² The empirical derivation of the number of groups, the group membership of cross-sections, and the parameter vectors for each group is discussed in the next section.

3. Estimation

Given a sample of I cross-sections, we can form the likelihood expression:

$$ L = \prod_i \sum_k w_k\, P[Y_i \mid X_i, \beta_k, \delta_k] = \prod_i \sum_k w_k \prod_t \frac{\Gamma(\delta_k + Y_{it})}{\Gamma(1 + Y_{it})\,\Gamma(\delta_k)} \left(\frac{\delta_k}{\delta_k + \phi_{itk}}\right)^{\delta_k} \left(\frac{\phi_{itk}}{\delta_k + \phi_{itk}}\right)^{Y_{it}}. \tag{15} $$

Using Bayes' rule, the posterior probabilities of membership in the groups (for each cross-section) can be computed as:

$$ p_{ik} = \mathrm{Prob}(k;\, B, \Delta \mid Y_i) = \frac{w_k\, P[Y_i \mid X_i, \beta_k, \delta_k]}{\sum_k w_k\, P[Y_i \mid X_i, \beta_k, \delta_k]}. \tag{16} $$

Hence, upon estimation of the w, B, and Δ parameters, the posterior probabilities provide a "fuzzy" grouping of the I cross-sections into K groups. Each cross-section can therefore be characterized by more than one negative binomial regression function, depending upon the estimated posterior probabilities $(p_{i1}, \ldots, p_{iK})$ of belonging to the K groups. Given Y, X, and a value of K, we need to estimate the following free parameters: the K − 1 independent mixing proportions $w = (w_1, \ldots, w_{K-1})$, the group-specific regression coefficients $B = (\beta_1, \ldots, \beta_K)'$, and the dispersion parameters $\Delta = (\delta_1, \ldots, \delta_K)'$, so as to maximize the likelihood L (or, equivalently, ln L) subject to the conditions specified in (13) and (14). Note that our formulation accommodates latent class Poisson regression models (Wedel et al. 1993) as a special case ($\delta_k \to \infty$). We now describe a maximum likelihood estimation procedure for solving this constrained optimization problem.

² The identifiability of such finite mixtures, in general, has been discussed extensively by researchers (cf. Chandra 1977, Teicher 1963, Yakowitz and Spragins 1968, and Titterington et al. 1985, pp. 35-42, for a discussion of necessary and sufficient conditions). Yakowitz and Spragins (1968) provide a proof for the identifiability of a mixture of all nondegenerate negative binomial distributions.

An EM Framework

We discuss the implementation of a maximum likelihood procedure utilizing the EM algorithm (see Dempster et al. 1977). For such a formulation, we introduce the nonobserved group membership data via the indicator function:

$$ z_{ik} = \begin{cases} 1 & \text{iff cross-section } i \text{ belongs to group } k, \\ 0 & \text{otherwise.} \end{cases} \tag{17} $$
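Expression (16) involves products of many per-period probabilities, which can underflow in floating point when T is large. A common remedy, which the paper does not prescribe, is to work with ln P and the log-sum-exp trick. The sketch below (hypothetical function name, pure Python) takes $\ln P[Y_i \mid X_i, \beta_k, \delta_k]$ for every cross-section and group and returns the posterior matrix $p_{ik}$ together with ln L from (15).

```python
import math


def posterior_probabilities(w, log_lik):
    """Posterior probabilities p_ik (16) and ln L (15) via the log-sum-exp trick.

    w       : mixing proportions w_1..w_K (positive, summing to 1)
    log_lik : log_lik[i][k] = ln P[Y_i | X_i, beta_k, delta_k] from expression (11)
    """
    post, log_L = [], 0.0
    for row in log_lik:
        a = [math.log(wk) + lk for wk, lk in zip(w, row)]          # ln(w_k P[Y_i | group k])
        m = max(a)
        log_den = m + math.log(sum(math.exp(v - m) for v in a))    # ln sum_k w_k P[Y_i | group k]
        post.append([math.exp(v - log_den) for v in a])
        log_L += log_den
    return post, log_L


# A naive exp(-1000.0) underflows to 0.0 and would give 0/0; log-sum-exp does not.
post, log_L = posterior_probabilities([0.5, 0.5], [[-1000.0, -1001.0]])
print([round(p, 4) for p in post[0]])  # prints [0.7311, 0.2689]
```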

The column vector $z_i$ is defined as $(z_{i1}, \ldots, z_{iK})'$, and the matrix $Z = (z_1, \ldots, z_I)'$. We assume that for a particular cross-section i, the nonobserved data $z_i$ are independently and identically multinomially distributed with probabilities w, such that:

$$ P[z_i \mid w] = \prod_k w_k^{z_{ik}}, \tag{18} $$

where $P[z_i \mid w]$ is the likelihood of observing the group membership vector $z_i$ for cross-section i, given the group proportions w. The joint density of $Y_i$ and $z_i$ (i.e., the "complete" data) is:

$$ P[Y_i, z_i \mid X_i, B, \Delta] = P[z_i; w]\, P[Y_i \mid X_i, B, \Delta, z_i] = \prod_k \big(w_k\, P[Y_i \mid X_i, \beta_k, \delta_k]\big)^{z_{ik}}. \tag{19} $$

The complete likelihood function over all cross-sections now becomes:

$$ L_C = \prod_i \prod_k w_k^{z_{ik}}\, P[Y_i \mid X_i, \beta_k, \delta_k]^{z_{ik}}, \tag{20} $$

and the corresponding ln likelihood is:

$$ \ln L_C = \sum_i \sum_k z_{ik} \ln w_k + \sum_i \sum_k z_{ik} \ln P[Y_i \mid X_i, \beta_k, \delta_k]. \tag{21} $$


With the matrix Z considered as missing data, an EM algorithm amounts to iteratively alternating between an E-step (expectation step) and an M-step (maximization step).

The E-Step. In the E-step, the expectation of $\ln L_C$ is evaluated over the conditional distribution of the nonobserved data Z, given the observed count data Y and provisional estimates $w^{(*)}$, $B^{(*)}$, and $\Delta^{(*)}$ of the parameters w, B, and Δ, respectively. This conditional expectation is:

$$ E_Z[\ln L_C \mid w^{(*)}, B^{(*)}, \Delta^{(*)}] = \sum_i \sum_k E[z_{ik} \mid Y_i] \ln w_k + \sum_i \sum_k E[z_{ik} \mid Y_i] \ln P[Y_i \mid X_i, \beta_k, \delta_k]. \tag{22} $$

Using Bayes' rule and expression (19), we can derive the conditional distribution of $z_i$ as:

$$ P[z_i \mid Y_i] = \prod_k \left\{ \frac{w_k^{(*)}\, P[Y_i \mid X_i, \beta_k^{(*)}, \delta_k^{(*)}]}{\sum_{k'} w_{k'}^{(*)}\, P[Y_i \mid X_i, \beta_{k'}^{(*)}, \delta_{k'}^{(*)}]} \right\}^{z_{ik}}, \tag{23} $$

and hence, the conditional expectation of $z_{ik}$ is computed as:

$$ E[z_{ik} \mid Y_i] = \frac{w_k^{(*)}\, P[Y_i \mid X_i, \beta_k^{(*)}, \delta_k^{(*)}]}{\sum_{k'} w_{k'}^{(*)}\, P[Y_i \mid X_i, \beta_{k'}^{(*)}, \delta_{k'}^{(*)}]}, \tag{24} $$

which is identical to the posterior probability $p_{ik}$ in expression (16) computed using the provisional estimates $w^{(*)}$, $B^{(*)}$, and $\Delta^{(*)}$. Consequently,

$$ E[z_{ik} \mid Y_i] = p_{ik}^{(*)}, \tag{25} $$

where $p_{ik}^{(*)}$ denotes the posterior probability evaluated with the provisional estimates $w^{(*)}$, $B^{(*)}$, and $\Delta^{(*)}$. Thus, in the E-step, the nonobserved discrete data Z are replaced by these posterior probabilities on the basis of the provisional parameter estimates, and expression (22) is computed as:

$$ E_Z[\ln L_C \mid w^{(*)}, B^{(*)}, \Delta^{(*)}] = \sum_i \sum_k p_{ik}^{(*)} \ln w_k + \sum_i \sum_k p_{ik}^{(*)} \ln P[Y_i \mid X_i, \beta_k, \delta_k]. \tag{26} $$

The M-Step. In the M-step, $E_Z[\ln L_C; w^{(*)}, B^{(*)}, \Delta^{(*)}]$ needs to be maximized with respect to w, B, and Δ, subject to constraints (13) and (14) as well as a nonnegativity restriction on Δ. To ensure the nonnegativity of Δ, we cast the dispersion parameters in the form $\delta_k = \exp(\theta_k)$ and estimate $\theta = (\theta_1, \ldots, \theta_K)$ instead.³ The constraints (13) and (14) are enforced by forming the augmented function:

$$ \Phi = \sum_i \sum_k p_{ik}^{(*)} \ln w_k + \sum_i \sum_k p_{ik}^{(*)} \ln P[Y_i \mid X_i, \beta_k, \theta_k] - \mu\Big(\sum_k w_k - 1\Big), \tag{27} $$

where μ is the corresponding Lagrange multiplier. The resulting maximum likelihood stationary equations are obtained by equating the first-order partial derivatives of Φ to zero. The stationary equations concerning w are:

$$ \frac{\partial \Phi}{\partial w_k} = \sum_i \frac{p_{ik}^{(*)}}{w_k} - \mu = 0. \tag{28} $$

Multiplying (28) by $w_k$ and summing over k yields μ = I (because $\sum_k p_{ik}^{(*)} = 1$ for each i and $\sum_k w_k = 1$), and hence

$$ \hat w_k = \frac{1}{I} \sum_i p_{ik}^{(*)}. \tag{29} $$

The stationary equations concerning B and θ are:

$$ \frac{\partial \Phi}{\partial \beta_{jk}} = \sum_i p_{ik}^{(*)}\, \frac{\partial \ln P[Y_i \mid X_i, \beta_k, \theta_k]}{\partial \beta_{jk}} = 0, \tag{30} $$

$$ \frac{\partial \Phi}{\partial \theta_k} = \sum_i p_{ik}^{(*)}\, \frac{\partial \ln P[Y_i \mid X_i, \beta_k, \theta_k]}{\partial \theta_k} = 0, \tag{31} $$

where the gradients of the ln likelihood function are:

$$ \frac{\partial \ln P[Y_i \mid X_i, \beta_k, \theta_k]}{\partial \beta_{jk}} = \sum_t X_{ijt}\, \frac{\delta_k\,(Y_{it} - \phi_{itk})}{\delta_k + \phi_{itk}}, \tag{32} $$

$$ \frac{\partial \ln P[Y_i \mid X_i, \beta_k, \theta_k]}{\partial \theta_k} = \delta_k \sum_t \left[ \sum_{n=1}^{Y_{it}} \frac{1}{\delta_k + n - 1} - \ln\!\Big(1 + \frac{\phi_{itk}}{\delta_k}\Big) + \frac{\phi_{itk} - Y_{it}}{\delta_k + \phi_{itk}} \right], \tag{33} $$

with the inner sum in (33) taken as zero when $Y_{it} = 0$. The stationary equations obtained by substituting expressions (32) and (33) into equations (30) and (31) do not yield closed-form solutions for the respective $\beta_k^{(*)}$ and $\theta_k^{(*)}$ parameters. A gradient-based search procedure is used in the M-step (Press et al. 1989), wherein expressions (32) and (33) are used to compute the gradients analytically. The parameter estimates resulting from the M-step are used in the subsequent E-step to compute new estimates of the nonobserved data Z. The new estimate of Z is used in the subsequent M-step to arrive at new estimates of w, B, and Δ. The E-step and the M-step are applied successively until no further improvement in the ln likelihood function is possible, based on a specified convergence criterion. Dempster et al. (1977) provide a relatively simple proof, based on Jensen's inequality, that $\ln L^{(s+1)} \ge \ln L^{(s)}$ (monotonically increasing), so that convergence to at least a locally optimal solution can be proven using a limiting sums argument. Boyles (1983) and Wu (1983) discuss the convergence properties of the EM algorithm. In general, we find that estimating the regression coefficients B with Δ fixed, and then letting Δ vary, speeds up the convergence of the M-step. Schematically, the proposed EM algorithm may be described as follows:

(i) Initialize the iteration index s: s ← 0. Specify initial estimates $w^{(0)}$, $B^{(0)}$, and $\Delta^{(0)}$.
(ii) Compute $Z^{(s)}$ with $z_{ik} = p_{ik}$.
(iii) Obtain revised estimates $B^{(s+1)}$ and $\Delta^{(s+1)}$ for the K groups.
(iv) Compute a new estimate $w^{(s+1)}$ of w using expression (29).
(v) Test for convergence: if the change in the ln likelihood from iteration s to s + 1 is smaller than some positive constant, stop.
(vi) Increment the iteration index, s ← s + 1, and return to step (ii).

Hence, upon convergence, we obtain estimates of the group proportions $w_k$, the group-specific regression coefficients $\beta_k$, and the dispersion parameter $\delta_k$ for each derived group k. An asymptotic estimate of the variance-covariance matrix of the parameters is obtained using the inverse of the estimated Hessian matrix.

³ For ease of exposition, we will continue to refer to $\delta_k$ as the dispersion "parameter," although, strictly speaking, $\theta_k$ is the parameter that is actually estimated.
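To make the E-step/M-step cycle concrete, here is a compact pure-Python sketch of the procedure under some simplifying assumptions. The M-step replaces the gradient-based search of Press et al. (1989) with a few steps of gradient ascent with step halving. It updates $\beta_k$ and $\theta_k$ jointly rather than B first with Δ fixed, and it assumes every $w_k$ stays positive. All function names and tuning constants are hypothetical. Each accepted step does not decrease expression (26), and the w-update (29) is exact, so this is a generalized EM and ln L should not decrease between iterations. The gradients follow (32) and (33) directly.

```python
import math


def nb_loglik(y, x, beta, theta):
    """ln P[Y_i | X_i, beta_k, delta_k] from expression (11), with delta_k = exp(theta_k)."""
    d = math.exp(theta)
    total = 0.0
    for yt, xt in zip(y, x):
        phi = math.exp(sum(b * v for b, v in zip(beta, xt)))  # expression (12)
        total += (math.lgamma(d + yt) - math.lgamma(1 + yt) - math.lgamma(d)
                  - d * math.log1p(phi / d) + yt * math.log(phi / (d + phi)))
    return total


def nb_grad(y, x, beta, theta):
    """Analytic gradients: expression (32) for beta_k and expression (33) for theta_k."""
    d = math.exp(theta)
    g_beta, g_theta = [0.0] * len(beta), 0.0
    for yt, xt in zip(y, x):
        phi = math.exp(sum(b * v for b, v in zip(beta, xt)))
        r = d * (yt - phi) / (d + phi)
        g_beta = [g + v * r for g, v in zip(g_beta, xt)]
        s = sum(1.0 / (d + n) for n in range(yt))  # sum_{n=1}^{Y} 1 / (delta + n - 1)
        g_theta += d * (s - math.log1p(phi / d) + (phi - yt) / (d + phi))
    return g_beta, g_theta


def e_step(w, data, betas, thetas):
    """Posterior probabilities (16) and ln L (15), computed with log-sum-exp."""
    post, log_L = [], 0.0
    for y, x in data:
        a = [math.log(wk) + nb_loglik(y, x, b, t) for wk, b, t in zip(w, betas, thetas)]
        m = max(a)
        den = m + math.log(sum(math.exp(v - m) for v in a))
        post.append([math.exp(v - den) for v in a])
        log_L += den
    return post, log_L


def q_k(post, data, k, beta, theta):
    """Group-k part of expression (26): sum_i p_ik * ln P[Y_i | X_i, beta_k, theta_k]."""
    return sum(p[k] * nb_loglik(y, x, beta, theta) for p, (y, x) in zip(post, data))


def fit_em(data, betas, thetas, iters=30, inner=10, step0=0.05, tol=1e-8):
    """Generalized EM: exact w update (29) plus step-halving gradient ascent on (30)-(31)."""
    K, I = len(betas), len(data)
    w = [1.0 / K] * K
    betas, thetas = [list(b) for b in betas], list(thetas)
    history = []
    for _ in range(iters):
        post, log_L = e_step(w, data, betas, thetas)  # E-step
        history.append(log_L)
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
        w = [sum(p[k] for p in post) / I for k in range(K)]  # expression (29)
        for k in range(K):  # M-step for beta_k and theta_k
            for _ in range(inner):
                g_b, g_t = [0.0] * len(betas[k]), 0.0
                for p, (y, x) in zip(post, data):
                    db, dt = nb_grad(y, x, betas[k], thetas[k])
                    g_b = [u + p[k] * v for u, v in zip(g_b, db)]
                    g_t += p[k] * dt
                current, step = q_k(post, data, k, betas[k], thetas[k]), step0
                while step > 1e-12:  # halve the step until (26) does not decrease
                    new_b = [b + step * g for b, g in zip(betas[k], g_b)]
                    new_t = thetas[k] + step * g_t
                    if q_k(post, data, k, new_b, new_t) >= current:
                        betas[k], thetas[k] = new_b, new_t
                        break
                    step /= 2.0
    return w, betas, thetas, history
```

For real data one would likely swap the inner loop for a quasi-Newton routine and use several starting values, since EM only guarantees convergence to a local optimum.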

Number of Groups

Since the number of groups is rarely known in practice, the estimation algorithm must be run for a varying number of groups K. Bozdogan and Sclove (1984) propose using Akaike's (1984) information criterion (AIC) for choosing the number of groups (the value of K) in mixture models. Accordingly, the value of K is the solution that minimizes

$$ \mathrm{AIC}_K = -2 \ln L + 2 N_K, \tag{34} $$

where $N_K$ is the number of free parameters (in the full model):

$$ N_K = (K - 1) + K(J + 1), \tag{35} $$

given no additional restrictions on any of the parameters. More recently, however, Bozdogan (1987) proposed that researchers use the CAIC (consistent AIC) as a heuristic, since it penalizes overparameterization more strongly than the AIC. This CAIC statistic is computed as:

$$ \mathrm{CAIC}_K = -2 \ln L + N_K(\ln IT + 1). \tag{36} $$
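Expressions (34) to (36) are simple enough to check by hand against Table 2. The sketch below uses hypothetical function names. It assumes that IT counts household-month-brand observations, i.e., 504 × 12 × 8 = 48,384; the text does not state this, but under that assumption the code reproduces both CAIC values reported in Table 2. The aggregate negative binomial model has N = 12 (K = 1, J = 11: price, trend, seasonality, and eight brand dummies), and the Poisson model has N = 11 because it has no dispersion parameter.

```python
import math


def n_free_params(K, J):
    """Expression (35): K - 1 mixing proportions plus J coefficients and one dispersion per group."""
    return (K - 1) + K * (J + 1)


def aic(log_L, n_params):
    """Expression (34)."""
    return -2.0 * log_L + 2.0 * n_params


def caic(log_L, n_params, n_obs):
    """Expression (36); n_obs plays the role of IT."""
    return -2.0 * log_L + n_params * (math.log(n_obs) + 1.0)


n_obs = 504 * 12 * 8  # assumption: households x months x brands
print(round(caic(-20976.910, n_free_params(1, 11), n_obs), 3))  # negative binomial: 42095.263
print(round(caic(-35553.650, 11, n_obs), 3))  # Poisson, no dispersion parameter: 71236.956
```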

We use the CAIC as the measure of choice here, since it is more conservative than the AIC. The CAIC should be used especially when the data entail a large number of observations (Bozdogan 1987). While the CAIC accounts for overparameterization as more groups are derived, one must also ensure that the group centroids are sufficiently separated for the solution that is chosen. To assess the separation of the groups (when K > 1), we use an entropy-based measure to examine the degree of fuzziness in group membership based on the posterior probabilities (Ramaswamy et al. 1993):

$$ E_K = 1 - \frac{\sum_i \sum_k -p_{ik} \ln p_{ik}}{I \ln K}. \tag{37} $$

$E_K$ is a relative measure bounded between 0 and 1. Given K groups, $E_K = 0$ when all the posterior probabilities are equal for each cross-section (maximum entropy), and $E_K = 1$ when every cross-section belongs to a single group with probability one. A value of $E_K$ very close to zero is cause for concern, as it implies that the centroids of the conditional parametric distributions are not sufficiently separated for the particular number of groups derived.
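The entropy measure (37) needs only the posterior matrix from the final E-step. The minimal sketch below (hypothetical function name) follows the usual convention that 0 · ln 0 = 0 and rejects K = 1, for which ln K = 0 and the measure is undefined.

```python
import math


def entropy_separation(post):
    """Expression (37): E_K = 1 - sum_i sum_k (-p_ik ln p_ik) / (I ln K)."""
    I, K = len(post), len(post[0])
    if K < 2:
        raise ValueError("E_K is defined only for K > 1")
    ent = -sum(p * math.log(p) for row in post for p in row if p > 0.0)  # 0 ln 0 taken as 0
    return 1.0 - ent / (I * math.log(K))


# One fuzzy and one crisp cross-section with K = 2.
print(round(entropy_separation([[0.5, 0.5], [1.0, 0.0]]), 6))  # prints 0.5
```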

4. An Illustrative Application with Purchase Count Data

We provide an illustrative application of the disaggregate negative binomial regression procedure for count data analysis. We demonstrate the use of the proposed procedure using count data on the unit number of purchases of a frequently purchased nondurable consumer good for a cross-sectional panel of households living in a large midwestern city. These count data cover purchases for 504 households that made at least one purchase over 12 monthly time periods beginning January 1985. Additional data, such as the price paid and the specific item (e.g., brand) bought, are also known for each household. In all, eight nationally advertised brands accounted for over 75% of total purchase counts in the data set; since our objective here is to provide an illustrative application of the proposed procedure, we restrict ourselves to count data representing these eight brands.

We model the vector of the monthly number of units bought ($Y_i$) as a function of explanatory variables ($X_i$) such as price, trend, and seasonality. Price is measured in cents per ounce. We use the price paid by each household in each time period (if a purchase was made) to compute the price per ounce for a brand. Hence, the price variable is actually a "promotional" price, since it reflects cents-off promotions by both retailers and manufacturers. In a given month, if a household does not purchase a brand, the price variable is specified as the mean price per ounce paid by other panel households who bought the brand in that month. The trend variable represents each of the 12 months, to capture any specific trends in total consumption over time. The seasonality variable is coded as 1 for the summer months, when consumption of this good is typically high, and 0 otherwise. The trend and seasonality variables are examples of exogenous variables, other than attributes of the good, that can influence the magnitude of mean purchase rates. In addition to price, trend, and seasonality, we include dummy variables for the eight brands to account for differences in the mean purchase rates due to the type of good and its positioning in the marketplace.
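The covariate coding just described can be sketched as follows. This illustration is not the authors' code. The dictionary layout, the helper names, and the choice of June through August as the "summer months" are all hypothetical, and the trend is coded simply as the month index 1 to 12. The eight brand dummies take the place of a common intercept, matching the parameter list in Table 2.

```python
BRANDS = ["A1", "A2", "A3", "A4", "B1", "B2", "C1", "C2"]   # brands of Table 1
SUMMER = {6, 7, 8}   # hypothetical coding of the "summer months" (June-August)


def imputed_price(paid, household, brand, month):
    """Price per ounce paid, or the mean paid by other buyers of the brand that month.

    paid: dict mapping (household, brand, month) -> cents per ounce, for purchases only.
    """
    if (household, brand, month) in paid:
        return paid[(household, brand, month)]
    # This household did not buy, so every matching entry belongs to another household.
    others = [p for (h, b, m), p in paid.items() if b == brand and m == month]
    if not others:
        raise ValueError("no panel household bought this brand in this month")
    return sum(others) / len(others)


def design_row(paid, household, brand, month):
    """Covariates for one observation: price, trend, seasonality, eight brand dummies."""
    row = [imputed_price(paid, household, brand, month),
           float(month),                          # trend: month index 1..12
           1.0 if month in SUMMER else 0.0]       # seasonality dummy
    row += [1.0 if brand == b else 0.0 for b in BRANDS]
    return row


paid = {(1, "A1", 7): 4.5, (2, "A1", 7): 4.3, (3, "A1", 7): 4.1}
print(design_row(paid, 4, "A1", 7)[:3])  # household 4 did not buy A1 in month 7
```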
This particular specification is motivated by the work of Wagner and Taudes (1986, 1987). A summary of the descriptions for the eight brands appears in Table 1. The eight brands belong to three manufacturers (denoted A, B, and C), with manufacturer A having four brands and manufacturers B and C having two brands each. The brands of manufacturers A and B have national distribution and recognition, while the brands of manufacturer C have regional appeal due to the limited distribution of its goods. These branded consumer good items can be classified into two basic types: regular and low calorie ("low cal"). Table 1 also gives the average price paid (in cents per ounce) by the panelists for each of the eight brands, and the market share (i.e., share of total purchase counts) based on the sample data. Brand B2 is the leader with a 23% share, followed closely by Brand A1 with 20%. Brands C1 and C2 have a combined share of 31.5%, reflecting a strong position in the area from which the data were obtained. The premium priced brands A3 and A4 have small shares and together account for 10% of the total sample counts.

Table 1  Consumer Good Descriptions

Brand  Manufacturer  Type     Price^a (cents/oz.)  Market Share^b
A1     A             Regular  4.39                 0.20
A2     A             Low cal  4.18                 0.07
A3     A             Regular  4.78                 0.06
A4     A             Low cal  4.82                 0.05
B1     B             Regular  4.30                 0.09
B2     B             Low cal  4.33                 0.23
C1     C             Regular  4.24                 0.17
C2     C             Low cal  4.25                 0.14

^a Average price paid by purchasing panelists.
^b Share of total purchase counts in panel.

Aggregate Analyses

The results of an aggregate Poisson regression analysis and an aggregate negative binomial regression analysis, wherein a single set of parameters is estimated for all cross-sections (households), are shown in Table 2. The negative binomial regression model has an extra parameter δ, which captures the degree of overdispersion in the data. To test the adequacy of the Poisson regression model, we used the regression-based test for overdispersion discussed in §2. We obtained a t-ratio of 15.776, indicating that the Poisson regression model is inappropriate for the data. Note from the negative binomial regression results that the dispersion parameter δ = exp(θ) = 0.056, which confirms that a substantial amount of overdispersion exists in the purchase count data. From expressions (8) and (9), the variance-to-mean ratio using

Table 2  Aggregate Parameter Estimates^a

Parameter       Poisson Regression    Negative Binomial Regression
Price           -0.663*** (0.011)     -0.463*** (0.045)
Trend            0.055*** (0.003)      0.042*** (0.007)
Seasonality      0.345*** (0.019)      0.294*** (0.049)
Brand A1         1.434*** (0.043)      0.680*** (0.213)
Brand A2         0.235*** (0.052)     -0.566*** (0.188)
Brand A3         0.407*** (0.058)     -0.430** (0.213)
Brand A4         0.176*** (0.052)     -0.603** (0.227)
Brand B1         0.516*** (0.037)     -0.165 (0.201)
Brand B2         1.548*** (0.042)      0.781*** (0.214)
Brand C1         1.196*** (0.044)      0.451** (0.209)
Brand C2         1.033*** (0.045)      0.277 (0.205)
θ                                     -2.880*** (0.204)
Ln likelihood   -35553.650            -20976.910
CAIC             71236.956             42095.263

^a Standard errors for parameter estimates are given in parentheses. ***p
