Bayesian Approaches for Overdispersion in Generalized Linear Models


Bayesian Approaches for Overdispersion in Generalized Linear Models

Dipak K. Dey and Nalini Ravishanker

ABSTRACT Generalized linear models (GLM's) have been routinely used in statistical data analysis. The evolution of these models as well as details regarding model fitting, model checking and inference is thoroughly documented in McCullagh and Nelder (1989). However, in many applications, heterogeneity in the observed samples is too large to be explained by the simple variance function which is implicit in GLM's. To overcome this, several parametric and nonparametric approaches for creating overdispersed generalized linear models (OGLM's) were developed. In this article, we summarize recent approaches to OGLM's, with special emphasis given to the Bayesian framework. We also discuss computational aspects of Bayesian model fitting, model determination and inference through examples.

1 Introduction

Generalized linear models (GLM) are a standard class of models in contemporary statistical data analysis (McCullagh and Nelder 1989). The widely available GLIM software as well as S-Plus facilitate computation under these models. Bayesian fitting of GLM's via Gibbs sampling is discussed in Dellaportas and Smith (1993). In GLM's, the underlying distribution of responses is assumed to be of the exponential family form, and a link function transformation of its expectation is modeled as a linear function of observed covariates, assuming that the variance of the response is a specified function of its mean. This allows modeling in various nonnormal situations such as the binomial, Poisson, negative binomial etc. However, in many applications, such a simple functional relationship is inadequate to handle the heterogeneity in the data; a common problem is the so-called overdispersion problem, where the variance of the response exceeds the nominal variance (Cox 1983). For instance, regression analysis of count data in biomedical research areas such as toxicology, epidemiology etc., must handle extra-Poisson variation. Count data analyzed under a Poisson assumption (Breslow 1984; Lawless 1987a,b; McCullagh and Nelder 1989, Sec. 6.2) often exhibit overdispersion. In modeling categorical or ordered

categorical data in longitudinal studies, toxicity studies, or in biological experimental research, the use of binomial or multinomial distributions with overdispersion is encountered. Crowder (1985) and Williams (1982) analyzed proportions using overdispersed binomial distributions. Overdispersion often results from latent heterogeneity, i.e., the sample of responses is drawn from a population consisting of many subpopulations. Historically, the presence of overdispersion prompted the need to define a wider class of models than the GLM, through finite mixture models (Everitt and Hand 1981; Titterington, Makov and Smith 1985), through exponential dispersion models, EDM (Jorgensen 1987), Efron's (1986) double exponential family models, through EDM's embedded within Efron's double exponential family (Ganio and Schafer 1992), or through the use of random terms in the linear predictor, leading to generalized linear mixed models, GLMM (Breslow and Clayton 1993). These classes are described in Section 2. Numerous methods have been proposed in the literature for estimation in overdispersed GLM's, with particular emphasis given to binomial or Poisson models. The quasi-likelihood (QL) approach requires specification of the mean-variance relationship rather than a full likelihood function. If y is a random variable with mean μ and variance V(μ) (a known function), the QL for y is defined as (Wedderburn 1974)

Q(μ; y) = ∫_y^μ (y − t)/V(t) dt.

For a sequence of independent observations ỹ = (y_1, …, y_n), the QL is defined as Q(μ̃; ỹ) = Σ_{i=1}^n Q(μ_i; y_i), where μ̃ = (μ_1, …, μ_n). In general, if μ̃ is a function of p parameters β̃ = (β_1, …, β_p), the QL estimates of β̃ may be obtained by maximizing Q(μ̃; ỹ) under mild conditions (see McCullagh and Nelder 1989). For the exponential family, the QL coincides with the log-likelihood, and therefore the estimation of β̃ retains full asymptotic efficiency (Nelder and Lee 1992). The QL approach for a model with a single dispersion parameter in the variance function was discussed by Finney (1971), McCullagh and Nelder (1989) and Wedderburn (1974). Moore and Tsiatis (1989) and Williams (1988) showed that the QL approach generally gives consistent estimates of the regression coefficients even under a misspecified variance function. Wang (1996) presented a QL approach for ordered categorical data with overdispersion and illustrated it using fish mortality data from quantal response experiments. The extended quasi-likelihood was defined in Nelder and Pregibon (1987), using saddlepoint arguments. There has been some concern over the use of overly simple models for overdispersion in GLM's, especially in the context of the widely used binomial and Poisson models. To overcome this, it has been proposed that explanatory variables be included in the model for dispersion (see Carroll and Ruppert 1982; Efron 1986; Jorgensen 1987; McCullagh and Nelder 1989; Nelder and Pregibon 1987; Smyth 1989).
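The QL construction can be sketched numerically. Assuming the Poisson-type variance function V(μ) = μ and a hypothetical log-linear mean (data, coefficients and sample size below are all illustrative), Q(μ; y) = y log μ − μ up to a constant; maximizing the summed QL recovers the regression estimates, and the Pearson statistic gives a moment estimate of the dispersion:

```python
import numpy as np
from scipy.optimize import minimize

# Quasi-likelihood for a log-linear model with V(mu) = mu:
# Q(mu; y) = integral_y^mu (y - t)/V(t) dt = y*log(mu) - mu (up to a constant).
rng = np.random.default_rng(42)
n = 200
x = rng.uniform(0, 1, n)
mu_true = np.exp(0.5 + 1.2 * x)
# overdispersed counts: negative binomial with the same mean as the Poisson model
y = rng.negative_binomial(n=5, p=5 / (5 + mu_true))

def neg_ql(beta):
    mu = np.exp(beta[0] + beta[1] * x)
    return -np.sum(y * np.log(mu) - mu)

fit = minimize(neg_ql, x0=np.zeros(2), method="BFGS")
b0, b1 = fit.x
mu_hat = np.exp(b0 + b1 * x)
# moment (Pearson) estimate of the dispersion; > 1 indicates overdispersion
phi_hat = np.sum((y - mu_hat) ** 2 / mu_hat) / (n - 2)
```

Because V(μ) = μ makes the QL score identical to the Poisson likelihood score, the regression estimates remain consistent even though the data are overdispersed; only the dispersion estimate changes.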

The QL/M approach (also called the pseudolikelihood approach) uses only the mean and variance structure implied by a mixture model; the regression parameters are estimated by quasi-likelihood while the variance parameter is estimated by the method of moments. For Poisson counts, use of a gamma mixing distribution results in a negative binomial distribution for the observed data. The QL approach yields asymptotically efficient estimates of the regression coefficients for the negative binomial model with fixed shape parameter, although the variance estimation by the method of moments is less efficient (see Breslow 1984; Lawless 1987a; Williams 1982). Carroll and Ruppert (1982) discuss the QL/M approach for general heteroscedastic models. The penalized quasi-likelihood (PQL) was proposed as an approximate Bayes procedure for GLMM's (Laird 1978; Stiratelli, Laird and Ware 1984). The marginal quasi-likelihood (MQL) procedure, which is similar to Goldstein's (1991) approach for GLMM's with nested random effects, is appropriate when interest is focused on the marginal relationship between covariates and response (Liang, Zeger and Qaqish 1992). Both PQL and MQL may be implemented by standard software for variance components analysis of normally distributed responses and provide shrinkage estimates of random error terms and for estimates of the variance components. Breslow and Clayton (1993) used PQL and MQL to analyze data on seed germination (Crowder 1978) under an overdispersed binomial assumption and a GLMM with a linear predictor including a random effect. They observed that there was little difference between PQL and MQL from a practical viewpoint; a major distinction between the two methods is that whereas the regression estimates of the former depend strongly on the estimated variance components under a non-identity link, those of the latter do not.
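The gamma-mixed Poisson construction and the moment estimate of the variance parameter can be illustrated with simulated data (all settings below are hypothetical):

```python
import numpy as np

# Gamma mixing of Poisson means yields a negative binomial marginal:
# lambda_i ~ Gamma(shape=k, scale=mu/k) gives E[y] = mu, Var[y] = mu + mu^2/k.
rng = np.random.default_rng(7)
mu, k, n = 4.0, 2.0, 50_000
lam = rng.gamma(shape=k, scale=mu / k, size=n)
y = rng.poisson(lam)

ybar = y.mean()
s2 = y.var(ddof=1)
# method-of-moments estimate of the mixing shape k from Var = mu + mu^2/k
k_hat = ybar**2 / (s2 - ybar)
```

The sample variance exceeds the sample mean, and inverting the variance relationship recovers the mixing shape by the method of moments, exactly the moment step of QL/M.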
The PQL also has a limitation in that inferences on variance components are not very accurate, and it fails to account for the contribution of estimated variance components when assessing uncertainty in both fixed and random effects (the latter problem exists in empirical Bayes methods as well). They also point out that use of PQL and MQL is more natural for simple overdispersion problems involving a log-linear, conditionally Poisson model than a competing approach in Liang and Waclawiw (1990). Zeger (1988) considered a time series model for the random effects in a simple overdispersion model for count data. Recent Bayesian procedures for GLMM's have used importance sampling ideas (Raghunathan 1994) or Gibbs sampling techniques (Besag, York and Mollie 1991; Zeger and Karim 1991). A general estimating equation approach (Diggle, Liang and Zeger 1994) provides nearly efficient estimation relative to maximum likelihood estimates in overdispersion problems. Efron (1992) discussed asymmetric maximum likelihood (AML) estimation of overdispersed generalized linear regression models, focusing on Poisson regression with application to an archeological data set. The AML approach is easy to implement, does not require explicit specification of a model for overdispersion and can be

a good starting point for a more complete parametric analysis based on quasi-likelihood models, double exponential families or the overdispersion models (Gelfand and Dalal 1990; Dey, Gelfand and Peng 1997). Several tests for overdispersion have been suggested in the literature. A test for Poisson overdispersion is given by Bohning (1994). Dean (1992) discussed tests for overdispersion with respect to a natural exponential family, which are powerful against arbitrary alternative mixture models with just the first two moments of the mixture distributions specified. Breslow (1990) derived tests of hypotheses in overdispersed Poisson regression and other quasi-likelihood models. The format of the rest of this paper is as follows. We review various models for handling overdispersion in Section 2, parametric approaches for Bayesian model fitting using MCMC techniques in Section 3 and nonparametric approaches in Section 4. Section 5 discusses extension to multistage modeling.
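As a concrete illustration of the tests for overdispersion mentioned above, the following is a sketch of a Dean/Lawless-type score statistic for extra-Poisson variation; for simplicity the mean is taken as known here (in practice it would be replaced by fitted values from a Poisson GLM), and the data are simulated:

```python
import numpy as np

# Score test for extra-Poisson variation: under H0 (Poisson with mean mu),
# T = sum((y - mu)^2 - y) / sqrt(2 * n * mu^2) is asymptotically N(0, 1);
# a large positive T signals overdispersion.
rng = np.random.default_rng(0)
n, mu = 500, 3.0

def score_stat(y, mu):
    return np.sum((y - mu) ** 2 - y) / np.sqrt(2 * len(y) * mu**2)

y_pois = rng.poisson(mu, n)                          # no overdispersion
y_over = rng.poisson(rng.gamma(2.0, mu / 2.0, n))    # gamma-mixed: overdispersed

t_pois = score_stat(y_pois, mu)
t_over = score_stat(y_over, mu)
```

The statistic stays near zero for genuinely Poisson counts and is large and positive for the gamma-mixed (negative binomial) counts.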

2 Classes of Overdispersed Generalized Linear Models

Mixture models, especially finite mixture models, have been most frequently used for creating a larger class of models for overdispersion. For instance, the one parameter exponential family defining the GLM is mixed with a two parameter exponential family for the canonical parameter θ (or equivalently the mean parameter μ), resulting in a two parameter marginal mixture family for the data. For the Poisson case, the gamma mixing distribution has been widely used, resulting in a negative binomial distribution for the observed data (Manton, Woodbury, and Stallard 1981; Margolin, Kaplan and Zeiger 1981), while Hinde (1982) proposed a log-normal mixing distribution. Shaked (1980) showed that such mixing necessarily inflates the model variance, while Gelfand and Dalal (1990) argued that taking additional observations within a population does not provide more information about heterogeneity across populations. Finite parametric mixture models have another drawback in that the resulting overdispersed family of mixture models will no longer be an exponential family and consequently would be awkward to work with. In many cases, it is unclear how many components should be used. Further, even for a given number of components, inconvenient constraints on the model are necessary to insure identifiability when

the mixing weights and the parameters of the component densities in the mixture are unknown. Although continuous mixtures alleviate these problems to a certain extent, once again the choice of a mixing distribution is a problem. For a given one parameter exponential family, Gelfand and Dalal (1990) considered a class of two parameter exponential family models

f(y | θ, τ) = b(y) exp{θy + τT(y) − ρ(θ, τ)},   (1.1)

where, depending on whether y is continuous or discrete, f is assumed to be a density with respect to Lebesgue measure or counting measure respectively. They showed that under the assumption that (1.1) is integrable and T(y) is convex, then for a common mean, var(y) increases in τ. It is presumed that the natural parameter space contains a two dimensional rectangle which, by translation, can be taken to contain the line τ = 0. The associated one parameter exponential family, obtained at τ = 0, has the form

f(y | θ) = b(y) exp{θy − ψ(θ)},   (1.2)

with ψ(θ) = ρ(θ, 0). As τ increases from 0, var(y) increases relative to the variance under the associated one parameter exponential family, so that τ describes the overdispersion. A GLM is usually developed from the form of the one parameter exponential family (1.2). In particular, E(y) = μ = ψ′(θ) and var(y) = ψ″(θ) = V(μ), the variance function. Here ψ′(θ) is strictly increasing in θ, so that μ and θ are one-to-one (θ = (ψ′)⁻¹(μ)). A link function g is defined, which is a strictly increasing differentiable transformation from μ to R¹, so that g(μ) = η = x̃ᵀβ̃, where x̃ and β̃ are, respectively, a p × 1 vector of known explanatory variables and an unknown vector of model parameters.

Efron (1986) presented an alternative approach through so-called double exponential families. Such families are derived as the saddlepoint approximation to the density of an average of n random variables from a one parameter exponential family for large n. The sample size, written suggestively by Efron as nφ, 0 < φ < 1, for actual sample size n, introduces φ as a second parameter in the model along with the canonical parameter θ. The density corresponding to the double exponential family is

f(y | θ, φ, n) = c(θ, φ, n) φ^{1/2} exp{nφ(θy − ψ(θ)) + n(1 − φ)(θ(y)y − ψ(θ(y)))},   (1.3)

where θ(y) = (ψ′)⁻¹(y), y may be viewed as an average of n i.i.d. random variables, θ is the canonical parameter and φ is a dispersion parameter. For regression problems, Efron assumed that θ = x̃ᵀβ̃ and that φ = h(z̃ᵀγ̃) for a suitable h, where z̃ is a known q × 1 vector and γ̃ is an unknown parameter vector. Using various expansions, he showed that (1.3) permits attractive approximation as n increases. Most notably, f behaves like (1.2) with w = n and φ⁻¹ as the dispersion parameter.
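The variance-inflation property of the Gelfand–Dalal family (1.1) can be checked numerically for a Poisson base, b(y) = 1/y!, with the convex choice T(y) = (y + 1) log(y + 1) (the same T used later in the cargo ship example); ρ(θ, τ) and the moments are routine univariate summations. The truncation level and parameter values below are illustrative:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import gammaln

# OGLM family (1.1) with Poisson base b(y) = 1/y! and convex T(y) = (y+1)log(y+1).
ys = np.arange(0, 400)            # truncated support for the summation
logb = -gammaln(ys + 1)           # log b(y) = -log y!
T = (ys + 1) * np.log(ys + 1)

def moments(theta, tau):
    logw = logb + theta * ys + tau * T
    w = np.exp(logw - logw.max())           # stabilised unnormalised weights
    p = w / w.sum()                         # normalising = exp(rho(theta, tau))
    m = np.sum(ys * p)
    v = np.sum((ys - m) ** 2 * p)
    return m, v

def var_at_mean(target, tau):
    # adjust theta so the family keeps a common mean, then report the variance
    theta = brentq(lambda th: moments(th, tau)[0] - target, -20, 5)
    return moments(theta, tau)[1]

v0 = var_at_mean(3.0, 0.0)     # tau = 0: ordinary Poisson, variance = mean
v1 = var_at_mean(3.0, 0.05)    # tau > 0: variance inflated at the same mean
```

At τ = 0 the family reduces to Poisson (variance equals the mean), and increasing τ at a matched mean inflates the variance, as the convexity result asserts.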

Model (1.2) is extended to an exponential dispersion model, EDM (Jorgensen 1987), by incorporating a dispersion parameter φ, to give

f(y | θ, φ) = b(y, φ) exp{w(θy − ψ(θ))/φ},   (1.4)

where w is a known "sample size"; here var(y) = φV(μ)/w. Whereas (1.1) is a customary two parameter exponential family, (1.4) is a one parameter family for each fixed φ. The ensuing problem was circumvented in an approximate fashion by Ganio and Schafer (1992), who considered the EDM as embedded in Efron's double exponential family and appealed to the associated asymptotic inference. Unlike the mixture case, these asymptotics result in overdispersion relative to the original exponential family which approaches a constant as n → ∞ (Efron 1986; Gelfand and Dalal 1990). Specifically, Ganio and Schafer directly set φ = h(z̃ᵀγ̃) in (1.3). Assuming that z̃ᵀγ̃ includes an intercept, they fit double exponential family models by the method of maximum likelihood using an approximation to (1.3). Note that the Normal and Gamma families have scale parameters and hence can be written as EDM's. Gelfand and Dalal (1990) argued that an appeal to asymptotics is not necessary to justify these models. Specifically, (1.1) not only includes Efron's model as a special case, but also a family discussed by Lindsay (1986). This is attractive because retaining the exponential family structure simplifies inference and the relative overdispersion behaves as in Efron's model. Note that regardless of n, (1.3) is of the form (1.1) with T(y) = θ(y)y − ψ(θ(y)), τ = n(1 − φ) and θ replaced by nφθ. Straightforward calculation shows this T(y) is convex so that, in fact, (1.3) is a special case of (1.1). Although Gelfand and Dalal suggested that with the specification of link functions these two parameters could each be given the usual GLM structure, they did not pursue the matter. These models, which are referred to as overdispersed generalized linear models (OGLM's), were fully examined by Dey et al. (1997).
They assumed independent responses y_i with associated covariates x̃_i and z̃_i, i = 1, 2, …, n, where the components of x̃ and z̃ need not be exclusive. Let ỹ = (y_1, …, y_n) and define θ_i = g(x̃_iᵀβ̃) and τ_i = h(z̃_iᵀγ̃), where g and h are strictly increasing. From (1.1), the corresponding joint density is

f(ỹ | β̃, γ̃) = ∏_{i=1}^n exp{θ_i y_i + τ_i T(y_i) − ρ(θ_i, τ_i)}.   (1.5)

The monotonicity of g and h is natural and insures that θ_i is monotonic in x_{il} and that τ_i is monotonic in z_{il}, facilitating interpretation. Of course such monotonicity does not imply that, e.g., μ_i is monotone in each covariate. The form x̃_iᵀβ̃ allows a covariate to enter as a polynomial. If an explanatory variable appears in x̃_i but not in z̃_i, say x_{il}, then ∂μ_i/∂x_{il} = ρ^{(2,0)}(θ_i, τ_i) g′(x̃_iᵀβ̃) β_l. But since ρ^{(2,0)} and g′ are strictly positive, μ_i is strictly monotone in x_{il} with the sign of β_l determining the direction. If the variable appears in z̃_i as well, its influence on μ_i is less clear, since now ∂μ_i/∂x_{il} involves ρ^{(1,1)}, the covariance between y and T(y). A practical drawback to working with (1.1) is that ρ(θ, τ) is not available explicitly. While ψ(θ) in (1.2) is usually an explicit function of θ,

ρ(θ, τ) = log ∫ b(y) exp{θy + τT(y)} dy

usually requires a univariate numerical integration or summation. Dey et al. (1997) mentioned that this was not a problem in the practical examples they investigated. Gelfand and Dalal (1990) argued that the performance of (1.1) is not sensitive to the choice of T(y); attention should focus on the specification of θ_i and τ_i rather than on the stochastic mechanism (1.1). Dey, Peng and Larose (1995) modeled heterogeneity and overdispersion through the notion of a parametrized weighted distribution. Suppose the random variable of interest Y is distributed over a population of interest with probability density f(y | θ), where θ is the underlying parameter of interest. The observed data are then a random sample from the weighted distribution with density function

f_w(y | θ, τ) = w(y; τ) f(y | θ) / E_f[w(y; τ)],   (1.6)

where the expectation in the denominator is the normalizing constant. The overdispersed model is viewed as a perturbation of the original model, formed as a weighted distribution of the form (1.6). These models are regular and parsimonious compared to a GLM and enable the capture of overdispersion within an exponential family framework. Section 3 describes fitting (1.5) in a parametric setting, while Section 4 describes a nonparametric framework using MCMC techniques.

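A small numerical sketch of the weighted distribution (1.6) with a Poisson base and a hypothetical increasing weight w(y; τ) = (1 + y)^τ (the weight, μ and τ below are illustrative choices, not from Dey, Peng and Larose); the normalizing constant E_f[w(y; τ)] is a routine summation:

```python
import numpy as np
from scipy.special import gammaln

# Weighted Poisson per (1.6): f_w(y | theta, tau) ∝ w(y; tau) f(y | theta),
# with base Poisson(mu) and the illustrative weight w(y; tau) = (1 + y)^tau.
ys = np.arange(0, 200)
mu, tau = 3.0, 1.5
log_f = ys * np.log(mu) - mu - gammaln(ys + 1)   # log of the base Poisson pmf
log_w = tau * np.log(1 + ys)

log_num = log_f + log_w
p_w = np.exp(log_num - log_num.max())
p_w /= p_w.sum()                                 # normalising = E_f[w(y; tau)]

mean_w = np.sum(ys * p_w)
var_w = np.sum((ys - mean_w) ** 2 * p_w)
```

Because the weight is increasing in y, the weighted distribution is stochastically larger than the base Poisson; the perturbation changes both the mean and the variance relative to the original model.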
3 Fitting OGLM in the Parametric Bayesian Framework

Dey et al. (1997) examined inference for OGLM's using Markov chain Monte Carlo methods. The advantages of their approach include the familiarity of exponential families, the ready interpretation of model parameters, the unification of modeling by absorbing earlier cases and exact inference rather than inference based on asymptotic approximations. They adopted a Bayesian perspective in fitting these models but used noninformative prior specifications since primary concern lies in the modeling incorporated in the likelihood in (1.5). Hence inference will be close to that arising from maximum likelihood, though estimates of variability will be exact rather than asymptotic and entire posteriors for model parameters would result. Required Bayesian computation is handled through Markov chain Monte Carlo using a Metropolis algorithm, resulting in samples essentially from the joint posterior distribution which may be summarized to provide any

desired inference. Such samples may also be used as the starting point for sampling from predictive distributions to investigate questions of model adequacy and model choice.
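A minimal sketch of this sampling strategy for the τ = 0 (ordinary Poisson GLM) case, with simulated data, a flat prior and a Gaussian random-walk proposal; the data, step size and chain length are all illustrative:

```python
import numpy as np

# Random-walk Metropolis for a Poisson log-linear model (the tau = 0 case of
# the OGLM) under a flat prior, using a Gaussian proposal.
rng = np.random.default_rng(1)
n = 100
x = rng.normal(0, 1, n)
y = rng.poisson(np.exp(0.3 + 0.7 * x))

def log_post(beta):
    eta = beta[0] + beta[1] * x
    return np.sum(y * eta - np.exp(eta))   # Poisson log-likelihood (flat prior)

beta = np.zeros(2)
lp = log_post(beta)
draws = []
for _ in range(5000):
    prop = beta + rng.normal(0, 0.1, 2)    # Gaussian random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        beta, lp = prop, lp_prop           # accept
    draws.append(beta)

post = np.array(draws[1000:])              # discard burn-in
b0_hat, b1_hat = post.mean(axis=0)
```

For the full OGLM, each log-posterior evaluation would additionally require the univariate numerical integration for ρ(θ_i, τ_i) noted above, but the Metropolis mechanics are unchanged.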

3.1 Model Fitting

The Fisher information matrix associated with (1.5) is of interest in Bayesian model fitting, since the square root of the determinant of this matrix, as a function of β̃ and γ̃, is known as Jeffreys' prior and is commonly used as a "noninformative" specification. Denoting the right side of (1.5) as L(β̃, γ̃; ỹ), Dey et al. (1997) showed that

−E(∂²log L(β̃, γ̃; ỹ)/∂β_j ∂β_k) = Σ_i ρ^{(2,0)}(θ_i, τ_i) x_{ij} x_{ik} (g′(x̃_iᵀβ̃))²,
−E(∂²log L(β̃, γ̃; ỹ)/∂γ_j ∂γ_k) = Σ_i ρ^{(0,2)}(θ_i, τ_i) z_{ij} z_{ik} (h′(z̃_iᵀγ̃))²,
−E(∂²log L(β̃, γ̃; ỹ)/∂β_j ∂γ_k) = Σ_i ρ^{(1,1)}(θ_i, τ_i) x_{ij} z_{ik} g′(x̃_iᵀβ̃) h′(z̃_iᵀγ̃).

Let X̃ denote the n × p design matrix arising from the x̃_i's, Z̃ the n × q design matrix arising from the z̃_i's, M̃_θ an n × n diagonal matrix with (M̃_θ)_{ii} = ρ^{(2,0)}(θ_i, τ_i)(g′(x̃_iᵀβ̃))², M̃_τ an n × n diagonal matrix with (M̃_τ)_{ii} = ρ^{(0,2)}(θ_i, τ_i)(h′(z̃_iᵀγ̃))², and M̃_{θ,τ} an n × n diagonal matrix with (M̃_{θ,τ})_{ii} = ρ^{(1,1)}(θ_i, τ_i) g′(x̃_iᵀβ̃) h′(z̃_iᵀγ̃). Then

I(β̃, γ̃) = [ X̃ᵀM̃_θX̃       X̃ᵀM̃_{θ,τ}Z̃
             Z̃ᵀM̃_{θ,τ}X̃    Z̃ᵀM̃_τZ̃ ],   (1.7)

and Jeffreys' prior is |I(β̃, γ̃)|^{1/2}. Computation of (1.7) requires calculation of ρ, ρ^{(2,0)}, ρ^{(0,2)} and ρ^{(1,1)}, which in turn requires numerical integration or summation of the form ∫ y^c T(y)^d b(y) exp{θy + τT(y)} dy for various c's and d's. Dey et al. showed that the posterior is proper under Jeffreys' prior or a flat prior for β̃ and γ̃, provided L(β̃, γ̃; ỹ) is log concave. Log concavity of L(β̃, γ̃; ỹ) is discussed in Wedderburn (1976) in the context of properties of MLE's and in Dellaportas and Smith (1993) with respect to simplifying Monte Carlo sampling of β̃. In particular, for OGLM's, Dey et al. established log concavity by verifying a simple nonnegative definiteness condition. Letting θ = g(η) and τ = h(ξ), they considered the function a(η, ξ) = g(η)y + h(ξ)T(y) − ρ(g(η), h(ξ)). If −∂²a/∂η² ≥ 0, −∂²a/∂ξ² ≥ 0 and

(−∂²a/∂η²)(−∂²a/∂ξ²) − (−∂²a/∂η∂ξ)² ≥ 0,

then (1.5) is log concave. For the canonical case θ = η, τ = ξ, −∂²a/∂θ² = var(y), −∂²a/∂τ² = var(T(y)), and the determinant
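The nonnegativity of this determinant for the canonical case can be verified numerically for the Poisson base with T(y) = (y + 1) log(y + 1); the parameter values and truncation below are illustrative:

```python
import numpy as np
from scipy.special import gammaln

# Numerical check of the log-concavity determinant for the canonical case:
# var(y) * var(T(y)) - cov(y, T(y))^2 >= 0, a Cauchy-Schwarz inequality.
ys = np.arange(0, 300)
T = (ys + 1) * np.log(ys + 1)
theta, tau = 0.8, 0.03
logp = theta * ys + tau * T - gammaln(ys + 1)   # Poisson base b(y) = 1/y!
p = np.exp(logp - logp.max())
p /= p.sum()

ey = np.sum(ys * p)
eT = np.sum(T * p)
var_y = np.sum((ys - ey) ** 2 * p)
var_T = np.sum((T - eT) ** 2 * p)
cov_yT = np.sum((ys - ey) * (T - eT) * p)
det = var_y * var_T - cov_yT**2
```

The determinant is strictly positive here because y and T(y) are not affinely related, so the log concavity condition holds at this (θ, τ).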

3.2 Example: Overdispersed Poisson Regression Model

Dey et al. (1997) used as a motivating example data on damage incidents to cargo ships (McCullagh and Nelder 1989, p. 204). For each of 34 ships, the aggregate months in service were recorded as well as the number of damage incidents over that period. Explanatory factors are ship type with 5 levels (A, B, C, D, E), year of construction with 4 levels (CP1, CP2, CP3, CP4) and period of operation with two levels (SP1, SP2). Since the response is a count, McCullagh and Nelder proposed a Poisson regression, presuming that the expected number of damage incidents is directly proportional to the aggregate months in service, i.e., the total period of risk. Using a canonical link, the GLM sets

θ = log(aggregate months service) + β₀ + effect due to ship type + effect due to year of construction + effect due to service period.   (1.8)

The log(aggregate months service) term is called an offset; its coefficient is fixed at 1 as a result of the proportionality assumption. They also incorporated a dispersion parameter φ in the Poisson model, whose estimate was φ̂ = 1.69, indicating overdispersion relative to the standard Poisson density which, intrinsically, has φ = 1. On examining the (nonstandardized) residuals y_i − μ̂_i under the GLM in (1.8), Dey et al. noted that some observations did not show a good model fit, and this situation is not improved through an EDM. The use of OGLM's to fit these data is illustrated in Dey et al., a brief summary of which follows. Taking (1.1) as the model with θ = x̃ᵀβ̃ as in (1.8), Dey et al. considered three specifications for τ. In the simplest case (model 1) they set τ = 0, which resulted in a 9 parameter Bayesian GLM for (1.1). Model 2 incorporated a constant dispersion parameter τ = γ₀ with the convex function T(y) = (y + 1) log(y + 1), yielding a 10 parameter model. Finally, anticipating that overdispersion might increase with exposure, model 3 set τ = γ₀ + γ₁ log(aggregate months service), using the same T(y), an 11 parameter model.
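The offset construction in (1.8) can be sketched as follows; the exposures, the single binary factor and the coefficients below are simulated stand-ins, not the actual cargo ship data:

```python
import numpy as np
from scipy.optimize import minimize

# Poisson log-linear model with an offset, as in (1.8): the coefficient of
# log(exposure) is fixed at 1 by the proportionality assumption.
rng = np.random.default_rng(3)
n = 34
exposure = rng.uniform(100, 5000, n)     # aggregate months of service
ship_type = rng.integers(0, 2, n)        # one binary factor, for brevity
rate = np.exp(-6.0 + 0.5 * ship_type)    # incidents per month of service
y = rng.poisson(exposure * rate)

def nll(beta):
    # offset enters the linear predictor with coefficient fixed at 1
    eta = np.log(exposure) + beta[0] + beta[1] * ship_type
    return -np.sum(y * eta - np.exp(eta))

fit = minimize(nll, np.zeros(2), method="BFGS")
b0_hat, b1_hat = fit.x
```

Because the offset coefficient is pinned at 1, β₀ and the factor effect are interpreted on the incident-rate-per-month scale, mirroring the proportionality assumption in the cargo analysis.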

Results from these model fits provide evidence of overdispersion (τ > 0) and, in addition, evidence that γ₁ > 0, supporting the hypothesis that overdispersion increases with exposure to risk. It was also seen that model 3 gave a better fit for larger values of y, which are associated with greater exposure.

3.3 Model Determination for Parametric OGLM's

Model determination includes model adequacy and model choice. Formal Bayesian model determination proceeds from the marginal predictive distribution of the data, f(ỹ), evaluated at the observed data ỹ_obs. Since this high dimensional density is hard to estimate well and its value is hard to calibrate, it may be preferable, as argued in Gelfand (1995), to look at alternative predictive distributions, in particular univariate ones such as the posterior predictive density at a new y₀, f(y₀ | ỹ_obs), or the cross-validated predictive density of, say, y_r, f(y_r | ỹ_(r),obs) (which can be compared with y_r,obs). Although the model determination diagnostics are admittedly informal, they are in the spirit of widely used classical EDA approaches and are attractive since they permit examination of model performance at the level of the individual observation. Required calculations may be carried out by sampling f(y_r | ỹ_(r),obs), which may be achieved using the posterior sampler as described in Gelfand and Dey (1994). For the cargo data analysis, Dey et al. (1997) adopted a cross-validation approach, paralleling widely used classical regression strategy. In particular, they considered the proper densities f(y_r | ỹ_(r)), r = 1, 2, …, 34, where ỹ_(r) denotes ỹ with y_r removed. In fact, they conditioned on the actual observations ỹ_(r),obs, creating the predictive distribution for y_r under the model and all the data except y_r. For model determination this requires comparison, in some fashion, of f(y_r | ỹ_(r),obs) with the rth observation, y_r,obs. Such cross validation is discussed in Gelfand, Dey and Chang (1992) and in further references provided therein. A natural approach for model adequacy is to draw, for each r, a sample from f(y_r | ỹ_(r),obs) and compare this sample with y_r,obs. In particular, based on this sample, it is possible to obtain the .025 and .975 quantiles of f(y_r | ỹ_(r),obs) and see how many of the y_r,obs fall between them.
Under each of the three models fit to the cargo ship data, at least 27 of the 34 intervals contained the corresponding y_r,obs. They also obtained the lower and upper quartiles of f(y_r | ỹ_(r),obs) to see how many y_r,obs belonged in their interquartile ranges; they found 13, 15 and 18 under model 1, model 2 and model 3 respectively. Since we would expect half, i.e., 17, under the true model, both model 2 and model 3 perform close to expectation, though all three models seem adequate. A well established tool for model choice is the conditional predictive ordinate (CPO), f(y_r,obs | ỹ_(r),obs), a large value of which implies agreement between the observation and the model. For comparing models, the ratio
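The interval-coverage check described above can be mimicked in a conjugate toy model where the leave-one-out predictive density is available in closed form (a gamma-Poisson sketch with simulated data, not the OGLM itself):

```python
import numpy as np
from scipy import stats

# Leave-one-out predictive checks: y_i ~ Poisson(lam), lam ~ Gamma(a, rate b),
# so f(y_r | y_(r)) is negative binomial and the .025/.975 predictive
# quantiles are exact; we count how many observations fall inside.
rng = np.random.default_rng(11)
a, b = 1.0, 1.0
y = rng.poisson(4.0, size=34)

inside = 0
for r in range(len(y)):
    y_rest = np.delete(y, r)
    a_post = a + y_rest.sum()
    b_post = b + len(y_rest)
    # predictive: NegBin(a_post, p) with p = b_post / (b_post + 1)
    p = b_post / (b_post + 1.0)
    lo = stats.nbinom.ppf(0.025, a_post, p)
    hi = stats.nbinom.ppf(0.975, a_post, p)
    inside += lo <= y[r] <= hi
```

For the OGLM the same counts would be obtained by Monte Carlo draws from f(y_r | ỹ_(r),obs) rather than from a closed-form predictive.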

d_r = f(y_r,obs | ỹ_(r),obs, M_i) / f(y_r,obs | ỹ_(r),obs, M_i′)

(or perhaps log d_r) indicates support by the rth point for one model versus the other (see Pettit and Young, 1990). For the cargo data, model 3 provided the best fit. A simple diagnostic with a frequentist flavor, Σ_{r=1}^{34} (y_r,obs − E(μ_r | ỹ))²/34, yielded values 6.74, 7.11 and 8.61 respectively for model 1, model 2 and model 3; compared with the EDM result of 6.70, all three models are adequate. All of the foregoing calculations are carried out by sampling f(y_r | ỹ_(r),obs), which may be achieved using the posterior sampler as described in Gelfand and Dey (1994).

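CPO values are conveniently estimated from posterior draws via the identity CPO_r⁻¹ = E_post[1/f(y_r | θ)] (a harmonic-mean form); a sketch in a conjugate Poisson toy model with simulated data:

```python
import numpy as np
from scipy import stats

# CPO_r from posterior draws: CPO_r = 1 / mean_s( 1 / f(y_r | theta^(s)) ).
rng = np.random.default_rng(5)
y = rng.poisson(4.0, size=34)

# conjugate posterior for the Poisson mean under a Gamma(1, 1) prior
lam_draws = rng.gamma(1.0 + y.sum(), 1.0 / (1.0 + len(y)), size=4000)

like = stats.poisson.pmf(y[:, None], lam_draws[None, :])   # 34 x 4000 matrix
cpo = 1.0 / (1.0 / like).mean(axis=1)
log_pseudo_ml = np.log(cpo).sum()   # summed log-CPO's, for comparing models
```

For model comparison, the per-observation ratios d_r (or the summed log-CPO's) would be computed under each competing model from its own posterior sample.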
4 Modeling Overdispersion in the Nonparametric Bayesian Framework

In some problems, samples exhibit extra heterogeneity because observations are drawn from an overall population which would be more effectively modeled as a mixture of subpopulations. Such behavior cannot be captured by univariate exponential family models, and even OGLM's cannot explain all types of heterogeneity adequately. To alleviate the problems in finite parametric mixture models, Mukhopadhyay and Gelfand (1997) captured the specification of overdispersion nonparametrically through Dirichlet process (DP) mixing. They extended GLM's to DPMGLM's (Dirichlet process mixed GLM's) and OGLM's to DPMOGLM's, with specific emphasis on binomial and Poisson regression models. DPMOGLM's afford the possibility of capturing a very broad range of heterogeneity, resulting in the most flexible GLM's yet proposed, and are an alternative to GLMM's (Breslow and Clayton 1993). Note that they intentionally retained the GLM aspect with regard to the mean. In theory, although one could DP mix over the coefficients in the presumed linear mean structure on a translated scale, these coefficients would vanish from the likelihood, resulting in a mixture model which would no longer be a linear model in any sense. Their structure retains these coefficients and the associated linear structure to permit the appealing interpretation they provide with regard to the relationship between a response variable and a collection of explanatory variables.
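The random distributions underlying such DP mixing can be sketched via a truncated stick-breaking construction (a Sethuraman-type representation; the base measure, concentration and truncation level below are hypothetical choices):

```python
import numpy as np

# Truncated stick-breaking draw of G ~ DP(alpha, G0) with G0 = N(0, 1);
# mixing a GLM parameter over G induces the DP-mixed overdispersion.
rng = np.random.default_rng(8)
alpha, K = 2.0, 200                      # concentration; truncation level

v = rng.beta(1.0, alpha, size=K)
w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))   # stick weights
atoms = rng.normal(0.0, 1.0, size=K)     # atoms drawn from the base measure

# draws from the (truncated) discrete random distribution G
sample_idx = rng.choice(K, size=1000, p=w / w.sum())
theta_sample = atoms[sample_idx]
```

Because G is almost surely discrete, draws from it exhibit ties, which is exactly the clustering into latent subpopulations that DP mixing exploits for overdispersion.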

4.1 Fitting DP mixed GLM and OGLM

4.1 Fitting DP mixed GLM and OGLM The basics of DP mixing as described in Blackwell and MacQueen (1973), Ferguson (1973) and Sethuraman and Tiwari (1982) are extended to mixture modeling as follows (for details, see Mukhopadhyay and Gelfand, 1997). Consider {f(· | θ̃) : θ̃ ∈ Θ
