A Simulation Approach to Hierarchical Models

Petros Dellaportas and Dimitris Karlis
October 1995; revised December 1996

Abstract

We deal with general mixture models of the form $m(x) = \int_\Theta f(x \mid \theta)\, g(\theta)\, d\theta$, where $g(\theta)$ and $m(x)$ are called mixing and mixed or compound densities respectively, and $\theta$ is called the mixing parameter. The usual statistical application of these models emerges when we have data $x_i$, $i = 1, \ldots, n$, with densities $f(x_i \mid \theta_i)$ for given $\theta_i$, and the $\theta_i$ are independent with common density $g(\theta)$. Then, the marginal density of any $x_i$ is $m(x)$. These models are often called hierarchical models due to the ordered structure of the data and parameter distributions. Interest lies mainly in the estimation of the unit-specific parameters $\theta_i$, but an interesting problem is also the smooth reconstruction of $m(x)$, which can also be viewed as the marginal density of the data. For a certain well-known class of densities $f(x \mid \theta)$, we present a sample-based approach to reconstruct $g(\theta)$. The idea is based on the fact that, at least for the class of densities we consider, the moments of $g(\theta)$ can be derived analytically from the moments of $m(x)$. We first provide theoretical results to justify the above result, and then we use, in an empirical Bayes spirit, the first four moments of the data to estimate the first four moments of $g(\theta)$. With these moments, we are in a position to generate a random sample from a density which has such first four moments. Thus, using sampling techniques, we proceed in a fully Bayesian fashion to obtain any posterior summaries of interest. Moreover, reconstruction of $m(x)$ in a smoother version than the commonly used one is available. Simulations which investigate the operating characteristics of our proposed methodology are presented. We illustrate our approach using data from mixed Poisson and mixed normal densities.

Keywords: Bayesian inference; Empirical Bayes; Method of moments; Mixtures; Monte Carlo.

(Petros Dellaportas is Lecturer, Department of Statistics, Athens University of Economics and Business, 76 Patission Str, 10434 Athens, Greece, e-mail address: [email protected]. Dimitris Karlis is a Ph.D. candidate at the same department, e-mail address: [email protected]. The authors thank Evdokia Xekalaki for her encouragement, Edward George for helpful comments on an earlier version of the paper, and an editor and a referee for many detailed constructive suggestions. The second author acknowledges financial support from the State Scholarship Foundation of Greece.)

1 INTRODUCTION

We deal with situations in which data are available from several different, but related, sources of variability, giving rise to what are known as general mixture models or compound decision problems. Examples include agricultural trials involving many varieties, clinical trial centers involved in the same study and actuarial risk assessment of many policy holders. The probabilistic formulation of such models is as follows. Suppose that there are $n$ sources of variability or population units and we observe data $x_i$, $i = 1, 2, \ldots, n$. The within-unit uncertainty is expressed by assuming that the $x_i$ are independent and distributed according to a parametric family of distributions $f(x \mid \theta_i)$. The between-unit uncertainty can be modeled by assuming that the unit-specific parameters $\theta_i \in \Theta$ have some common distribution $g(\theta)$ belonging to a class of distributions $\mathcal{G}$ on $\Theta$. Then the unconditional or marginal distribution of any $x_i$ has density

$$m(x) = \int_{\Theta} f(x \mid \theta)\, g(\theta)\, d\theta \qquad (1)$$

where $g(\theta)$ and $m(x)$ are called mixing or prior and mixed or compound densities respectively, and $\theta$ is called the mixing parameter. The above model specification allows us to combine information from the different sources of variability (population units) to improve our inference on the parameters $\theta_i$. It is often called a two-stage hierarchical model, or a conditionally independent hierarchical model; see Kass and Steffey (1989). Following the formalism of Draper (1995), the model $M$ specified by (1) has two uncertain parts: the structural uncertainty $S$, specified by the form of the density $g$, and the parameter uncertainty $P$ of the parameters $\theta_i$. We can thereby write $M = (S, P)$, keeping in mind that the uncertainties $S$ and $P$ are equivalent to uncertainties on $g$ and $\theta_i$. Below, we shall be using this formalism to present a unification of different approaches to inference for models in (1).

There is a large literature on methodology for and applications of the models defined by (1). We can distinguish between three main avenues. First, interest may lie in the structural uncertainty $S$. This is emphasized in the writings of workers in classical methodology via non-parametric density estimation and identification questions for the distribution $g$; see, for example, Laird (1978), Bohning, Schlattman and Lindsay (1992). The idea is that inference on $g$ can be used to predict future $\theta_i$. Very close ideas are found in the field often called non-parametric empirical Bayes; see Robbins (1964). Second, we may focus on the parameter uncertainty $P$, adopting a parametric empirical Bayes approach. This strategy involves the introduction of hyperparameters $\eta$, so that $g(\theta)$ is written as $g(\theta \mid \eta)$. Estimation of the parameters $\eta$ through maximum likelihood or some variant leads to a specific choice of $\hat g \in \mathcal{G}$, where in this case $\mathcal{G}$ is a pre-specified parametric family of distributions. Inference for the unknown parameters $\theta_i$ is based on the posterior empirical Bayes approximation $\hat p(\theta_i \mid x_i) = \hat p(\theta_i \mid x_i, \eta = \hat\eta_x)$. Here, $\eta$ is set to the estimate $\hat\eta_x$ based on data $x$. See, for example, Deely and Lindley (1981), Morris (1983a) and Kass and Steffey (1989). Third, we may choose the fully Bayesian formulation in which both $P$ and $S$ must be described probabilistically. Thus, another density $h$ is introduced so that $\eta$ has density $h(\eta)$ and the model is now called a three-stage hierarchical model. The parameters of $h(\eta)$ are specified in advance, or another hierarchical stage can be added. The parameter uncertainty is based on the posterior density given by $p(\theta_i \mid x) = \int p(\theta_i \mid x_i, \eta)\, p(\eta \mid x)\, d\eta$, with $x$ denoting all data, whereas inference on $\eta$, based on straightforward probability manipulations involving Bayes' theorem, provides the required information about $g$. See, for example, Lindley and Smith (1972) and George, Makov and Smith (1994). Here, the uncertainty $S$ is specified by extending the uncertainty $P$ to cover a further set of parameters $\eta$. However, the family $\mathcal{G}$ is still pre-specified. An approach to incorporate $S$ while retaining a Bayesian formulation can be based on variations of the Dirichlet process models introduced by Ferguson (1973). See, for example, Escobar and West (1995). A broad review of all the above approaches can be found in the books of Berger (1985), Maritz and Lwin (1989) and Carlin and Louis (1996). Finally, probably in parallel with the above methodological list, it is often desirable not to infer directly about the uncertainties in $M$, but, possibly via $M$, to reconstruct $m(x)$ in (1). Interest here lies in the search for a smoother version than the commonly used empirical frequencies or empirical distribution functions, which can result in an efficient estimation of any data functionals of interest.

Leaving aside philosophical issues referring to Bayes versus non-Bayes debates, it is certainly true that each of the above approaches has specific disadvantages. In summary, non-parametric methodologies require a large amount of data and fail to produce smooth versions of $g(\theta)$; naive parametric empirical Bayes underestimates the variance $Var(\theta_i \mid x_i)$; the fully Bayesian approach presents implementation impediments, especially when it is not restricted to the conjugate cases, although recent papers by George et al. (1993, 1994) suggest an implementation avenue. Here, we would like to focus on, and offer alternative solutions to, the following more general drawbacks: parametric empirical Bayes methodologies assess the parametric uncertainty $P$ given, probably after a data-driven investigation, a pre-specified $g$. Hence, the structural uncertainty $S$ is not assessed, and it is not included in the uncertainty of the posterior empirical Bayes rule. Non-parametric approaches focus on $S$ but fail to provide an inferential mechanism to specify $P$. Moreover, they provide only a finite step approximation to $g(\theta)$.

In this paper we develop a semi-parametric sample-based methodology to approximate a density $g(\theta)$ based on the method of moments, and consequently obtain approximate Bayesian inference about the unit-specific parameters $\theta_i$. A key innovation is the simulation from a distribution with given moments. We first show that for a broad range of distributions, the moments of $g(\theta)$ can be derived analytically from the moments of $m(x)$. Morris (1983b) has shown that this is possible for natural exponential families with quadratic variance functions (NEF-QVF). Next, rather than producing, as is commonly the case, a finite step distribution as an approximation to $g(\theta)$, we incorporate a smooth semi-parametric sampling-based approximation. $S$ is restricted to a particular, but broad, family of distributions $\mathcal{G}$, and $g(\theta)$ is specified conditionally on its first four moments $\mu$. These moments can also be specified in a purely subjectivist position from personal beliefs. Identification of $g(\theta \mid \mu, S)$ is then automatically equivalent to a variant of the famous method of moments problem. Our approach is sampling-based in the sense that this identification is achieved through the ability to obtain a sample from $g(\theta \mid \mu, S)$ rather than obtain its precise functional form. This ability is useful for two reasons. First, posterior inference on the parameters $\theta_i$ can be routinely made via the rejection and sampling-resampling algorithms. Second, reconstruction of the density $m(x)$ based on a smooth rather than a finite step approximation of $g(\theta)$ in (1) is straightforward using the composition method. Note that rather than specifying first $S$ and then $\mu$, as is commonly the case, we specify first $\mu$ and then $S$.

The remainder of the paper proceeds as follows. In section 2 we present the theoretical background which enables us to specify, in an empirical Bayes spirit, the moments of the mixing distribution $g(\theta)$. In section 3 we discuss how we can obtain a sample from $g(\theta)$ given its first four moments. In section 4 we discuss the simulation approaches which result in the required samples for inference, while in section 5 we illustrate our suggested methodology with two examples. Section 6 deals with the operating characteristics of our method, investigating the properties of Bayes and empirical Bayes risks as well as sensitivity to the suggested moment specification. Finally, section 7 contains a short discussion and concluding remarks.

2 THEORY

Let $\mathcal{F}$ be a class of distributions, discrete or continuous, with moment generating function of the form $M(t) = c(t)[\phi(t)]^{\theta}$, where $\theta$ is the mixing parameter as in (1) and $c(t)$, $\phi(t)$ are functions continuously differentiable of order $r \ge 4$ in some neighborhood of the origin that do not depend on $\theta$ but possibly depend on some other parameters. Some well-known distributions belong to this class; the Poisson, exponential, gamma, normal, binomial and negative binomial are some examples. Note that some multiparameter distributions may belong to this family only with respect to some parameters. For example, the binomial distribution with parameters $p$ and $n$ has moment generating function $(q + pe^t)^n$ where $q = 1 - p$. Clearly, the binomial belongs to this class with respect to $n$, but not with respect to $p$. Some common probability distributions in $\mathcal{F}$ are given in Table 1.

Table 1. Common distributions in $\mathcal{F}$

| Density | Moment Generating Function | Mixing parameter $\theta$ |
|---|---|---|
| Normal | $\exp(\mu t + \sigma^2 t^2/2)$ | $\mu$, $\sigma^2$ |
| Gamma | $[\beta/(\beta - t)]^{\alpha}$ | $\alpha$ |
| Poisson | $\exp[\lambda(e^t - 1)]$ | $\lambda$ |
| Binomial | $(q + pe^t)^n$ | $n$ |
| Negative Binomial | $[p/(1 - qe^t)]^n$ | $n$ |
| Logistic | $e^{at}\,\Gamma(1 - \beta t)\,\Gamma(1 + \beta t)$ | $a$ |
| Extreme value | $e^{at}\,\Gamma(1 - \beta t)$ | $a$ |
| Double exponential | $e^{at}/[1 - (\beta t)^2]$ | $a$ |
| Generalized Poisson | $\exp[\lambda(h(t) - 1)]$ | $\lambda$ |
| Generalized Binomial | $(q + p\,h(t))^n$ | $n$ |
| Generalized Neg. Binomial | $[p/(1 - q\,h(t))]^n$ | $n$ |

In the last three densities, $h(t)$ is the moment generating function of any density.

Our results in this section refer mainly to the above family of distributions. We shall show that analytic moment estimation of the mixing density is possible for distributions in $\mathcal{F}$. In fact, using a different approach, Lindsay (1989) gave similar results for the NEF-QVF distributions identified by Morris (1982, 1983b), which contain six distributions, modulo certain transformations. These are the normal, binomial, Poisson, negative binomial, generalized hyperbolic secant and gamma families of distributions. As is evident from Table 1, the class $\mathcal{F}$ is different from Morris' NEF-QVF because, for example, it does not contain the binomial distribution with parameter $p$. However, it contains all the generalized Poisson, binomial and negative binomial distributions, which have many important applications in risk theory, reliability and ecology. Recall that a generalized distribution is defined as the random sum $S_K = X_1 + X_2 + \cdots + X_K$, with $X_i$, $i = 1, \ldots, K$, independent and identically distributed random variables from any density function and $K$ a random variable from a discrete distribution. The last three generalized distributions in Table 1 correspond to cases in which $K$ is drawn from a Poisson, binomial or negative binomial density respectively. See Douglas (1980) for a detailed account. The family $\mathcal{F}$ possesses the following property:

Lemma 1. If a distribution function belongs to $\mathcal{F}$, then its $r$-th moment is a polynomial of degree at most $r$ with respect to $\theta$.

Proof: The $r$-th derivative of the moment generating function can be written as $M^{(r)}(t) = [\phi(t)]^{\theta - r}\,\psi(\theta)$, where $\psi(\theta)$ is a polynomial in $\theta$ of degree at most $r$ and number of terms given by the recursive relation $t_r = 3t_{r-1} - 1$, with $t_0 = 1$. Noting that the $r$-th moment $\mu_r$ is given by $M^{(r)}(t)$ evaluated at $t = 0$ and that $\phi(0) = 1$, the result is immediate.

Theorem 1. If $f(x \mid \theta)$ belongs to $\mathcal{F}$, then the moments of $m(x)$ are linear functions of the moments of $g(\theta)$.

Proof: It is well known that the moments of $m(x)$ are given by the corresponding moments of $f(x \mid \theta)$ weighted by $g(\theta)$. More specifically, the $r$-th moment of the mixed random variable $X$ can be given by $E(X^r) = \int E(X^r \mid \theta)\, g(\theta)\, d\theta$; see, for example, Douglas (1980). The result is then trivial from Lemma 1.

Consider two simple examples of the above theorem. If $f(x \mid \theta)$ is the Poisson density with parameter $\theta$, then its first four moments, denoted $\mu_i(x)$, $i = 1, \ldots, 4$, are $\theta$, $\theta^2 + \theta$, $\theta^3 + 3\theta^2 + \theta$ and $\theta^4 + 6\theta^3 + 7\theta^2 + \theta$ respectively. Then, the moments of the mixed Poisson are given by $\mu_1(x) = \mu_1(\theta)$, $\mu_2(x) = \mu_2(\theta) + \mu_1(\theta)$, $\mu_3(x) = \mu_3(\theta) + 3\mu_2(\theta) + \mu_1(\theta)$ and $\mu_4(x) = \mu_4(\theta) + 6\mu_3(\theta) + 7\mu_2(\theta) + \mu_1(\theta)$, where $\mu_i(\theta)$, $i = 1, \ldots, 4$, denote the moments of $g(\theta)$. The above relations allow us to calculate $\mu_i(\theta)$, $i = 1, \ldots, 4$, as functions of $\mu_i(x)$, $i = 1, \ldots, 4$. If $f(x \mid \theta)$ denotes the normal density with mean $\theta$ and known standard deviation $\sigma$, the respective relations between the first moments are $\mu_1(x) = \mu_1(\theta)$, $\mu_2(x) = \sigma^2 + \mu_2(\theta)$, $\mu_3(x) = 3\sigma^2\mu_1(\theta) + \mu_3(\theta)$ and $\mu_4(x) = 3\sigma^4 + \mu_4(\theta) + 6\sigma^2\mu_2(\theta)$. It is interesting to note that had we considered $\sigma^2$ as the mixing parameter with the mean assumed known, we would have derived similar results. It is now clear that, at least for densities in $\mathcal{F}$, we can estimate analytically the moments of $g(\theta)$ by using data estimates of the moments of $m(x)$.
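To make the Poisson case concrete, the relations above can be inverted numerically. The sketch below (in Python, with hypothetical variable names) computes raw sample moments of count data and solves for the first four moments of $g(\theta)$; it is a minimal illustration of Theorem 1 for the Poisson kernel, not a complete implementation of our method.

```python
import numpy as np

def mixing_moments_poisson(x):
    """Recover the first four raw moments of g(theta) from Poisson-mixed counts x."""
    x = np.asarray(x, dtype=float)
    # raw (non-central) sample moments of the data, mu_k(x) = mean(x^k)
    m1, m2, m3, m4 = (np.mean(x**k) for k in (1, 2, 3, 4))
    # invert the linear relations of Theorem 1 for the Poisson kernel
    t1 = m1
    t2 = m2 - t1
    t3 = m3 - 3.0 * t2 - t1
    t4 = m4 - 6.0 * t3 - 7.0 * t2 - t1
    return t1, t2, t3, t4

# usage sketch: counts drawn from a Poisson mixed over a Gamma(10, 2) density,
# so the true mixing moments are E(theta) = 5, E(theta^2) = 27.5, ...
rng = np.random.default_rng(0)
theta = rng.gamma(shape=10.0, scale=0.5, size=5000)
x = rng.poisson(theta)
print(mixing_moments_poisson(x))
```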

In fact, there are other distributions outside $\mathcal{F}$ for which the property of Theorem 1 holds. For example, Lindsay (1989) notes that the Weibull distribution has this property. In full Bayesian analysis, the moments can be specified in a subjective spirit as a way to express uncertainty about the prior; see Goutis (1994). Before we proceed in the next section to describe how this result can be exploited, we give an interesting corollary of Theorem 1. Note that, for clarification, we introduce subscripts which denote the distributions with respect to which the expectations are taken:

Corollary 1. If a random variable $X$ has marginal density $m(x)$ as defined in (1), with $f \in \mathcal{F}$, then $Var_x(X) = k\,Var_\theta(\theta) + Var_{x\mid\theta}[X \mid \theta = E_\theta(\theta)]$, where $k \ge 0$.

Proof: For $m$, $f$ and $g$ defined in (1), we have from Theorem 1 that $E_{x\mid\theta}(X \mid \theta) = a_1\theta + a_2$, $E_{x\mid\theta}(X^2 \mid \theta) = q_a(\theta) = a_3\theta^2 + a_4\theta + a_5$ and $Var_{x\mid\theta}(X \mid \theta) = q_b(\theta) = b_1\theta^2 + b_2\theta + b_3$. Note that $q_b(\theta) = q_a(\theta) - (a_1\theta + a_2)^2$. Then, $E_\theta\!\left[Var_{x\mid\theta}(X \mid \theta)\right] = b_1 Var_\theta(\theta) + q_b(E_\theta(\theta))$ and $Var_\theta\!\left[E_{x\mid\theta}(X \mid \theta)\right] = a_1^2 Var_\theta(\theta)$, so $Var_x(X) = (b_1 + a_1^2) Var_\theta(\theta) + q_b(E_\theta(\theta)) = k\,Var_\theta(\theta) + Var_{x\mid\theta}[X \mid \theta = E_\theta(\theta)]$ for $k = b_1 + a_1^2 = a_3$. Note that, since $M(0) = 1$, $k = (\phi'(0)/\phi(0))^2$.

For example, if we consider mixtures of Poisson distributions we obtain the useful expression $Var_x(X) = E_{x\mid\theta}(X \mid \theta = E_\theta(\theta)) + Var_\theta(\theta) = E_\theta(\theta) + Var_\theta(\theta)$, since for the Poisson distribution the variance equals the mean. In fact, this property leads to the variance test for the Poisson assumption. Another useful application is known in accident theory. Assuming a model of the form (1), we can decompose the total variance into a random and a proneness term. Xekalaki (1984) generalized the above property by decomposing the total variance into three terms: randomness, proneness and liability. Corollary 1 is also the basis for the `one-way analysis of variance' problem in normal theory.
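The Poisson special case of Corollary 1 is easy to check by simulation. The following sketch (Python, with an arbitrary Gamma mixing density chosen purely for illustration) compares the empirical variance of the mixed counts with $E_\theta(\theta) + Var_\theta(\theta)$.

```python
import numpy as np

rng = np.random.default_rng(1)

# an arbitrary mixing density g(theta): Gamma with shape 3 and scale 2
theta = rng.gamma(shape=3.0, scale=2.0, size=200_000)
x = rng.poisson(theta)                      # Poisson mixture draws

lhs = x.var()                               # Var_x(X)
rhs = theta.mean() + theta.var()            # E(theta) + Var(theta)
print(lhs, rhs)                             # the two agree up to Monte Carlo error
```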


3 SIMULATION WITH GIVEN MOMENTS

We shall be concerned with the following variant of the moment problem. Can we reconstruct a density function, say $g(\theta)$, if we have a finite set of its moments $\mu_1, \mu_2, \ldots, \mu_k$? Conditions which ensure the existence of a solution to the above problem have been given by Shohat and Tamarkin (1943). In the past, such attempts have been made assuming that the distribution has a given functional form. Tucker (1963) treated the Poisson case and Lindsay (1989) generalized the idea of Tucker by showing that for all the members of the NEF-QVF of Morris (1982) we can consistently estimate the mixing distribution using the method of moments. Heckman and Walker (1990) and Heckman, Robb and Walker (1990) exploited the case of exponential mixtures, from a theoretical and an application perspective respectively. See also Heckman (1990) for geometric mixtures. Using an empirical Bayes framework, Maritz and Lwin (1989) discussed how the knowledge elicited from the moments can be used to find the mixing distribution; they also presented a critical discussion of the applicability of such procedures. The practicality of the above methodologies has been questioned because of the underlying variability of the prior moments, from both an empirical Bayes and a subjective viewpoint; see Morris (1983a), and the discussion of Berger in Morris (1983a).

It has been noted by many authors that there is an obvious relationship between probability densities and samples, in that, given a sample, an estimate or approximate version of the corresponding density or distribution function may be obtained; see, for example, Smith and Gelfand (1992). Motivated by this approach, we suggest here that the moment problem above can be solved by taking a new look from a sampling perspective. Simulation from a density with specified moments provides a sensible way to approximate the density $g(\theta)$. We shall denote this approximation $\hat g_x(\theta)$. We have experimented with two algorithms to simulate from a unimodal density with given first four (standardized) moments. The first is given by Devroye (1986). The second is an algorithm to simulate from the appropriate member of the Pearson family; see Elderton and Johnson (1969) for details on how to choose the appropriate member of the Pearson family and Devroye (1986) for details about its simulation. In the Appendix we outline the basic steps of these algorithms. (A Pascal code of both these algorithms can be obtained from the authors upon request.) These are only two of many possible algorithms that can be used; see Devroye (1986, pp. 685) for a long discussion, and Devroye (1989, 1991) for related references. Our experimentation with the above algorithms indicated that with an adequate sample size they produce identical results. However, for small sample sizes we tend to recommend the Pearson family generator because it produces smoother densities. A simulation study which provides evidence that there is no visual difference between eight different families of distributions with the same first moments is given by Pearson, Johnson and Burr (1979). The Pearson family is one of these families. Finally, Rutherford and Krutchkoff (1967) have shown that the member of the Pearson family estimated via the first four moments, $\hat g_x(\theta)$, converges almost surely to $g(\theta)$.

Note that the conditions of Shohat and Tamarkin (1943) for the existence of a density with given first moments imply, for the first four standardized moments $\mu'_r$, $r = 1, \ldots, 4$, that $\mu'_2 \ge 0$ and $\mu'_4 \ge (\mu'_3)^2 + 1$. The above conditions may not hold if the data sample size is small. In this case, we can use only the first three moments; Algorithm 1 of the Appendix can be readily adjusted for this purpose.

Let us pause here and take a close look at another relevant issue which is important in the non-parametric empirical Bayes setting. First, notice that a distribution function with given first four moments is not unique, therefore different generating algorithms will probably produce different results. Even though for our purposes all algorithms will produce good approximations to $g(\theta)$, we must connect this issue with the identifiability of mixing densities. We recall that the mixtures of the density $f(x \mid \theta)$ are said to be identifiable if $\int f(x \mid \theta) g_1(\theta)\, d\theta = \int f(x \mid \theta) g_2(\theta)\, d\theta$ implies that $g_1(\theta) = g_2(\theta)$. For identifiable mixtures one and only one mixing distribution can result in a specific mixed distribution. Naturally, it is only sensible to estimate the mixing distribution if identifiability holds. See Maritz and Lwin (1989) for a counterexample with a non-identifiable mixture. For a further discussion see Titterington, Makov and Smith (1985) and the references therein. Teicher (1961) has shown that for a subset of $\mathcal{F}$, namely the distributions with moment generating function of the form $[\phi(t)]^{\theta}$, identifiability is guaranteed. This sub-family contains the gamma, Poisson, binomial, negative binomial, and the generalized Poisson, binomial and negative binomial densities. Identifiability for normal mixtures has been shown again by Teicher (1961), whereas for the logistic and double exponential densities identifiability is straightforward using a Laplace transformation. Therefore, at least for the densities in Table 1 we have results to guarantee a unique $g(\theta)$.

The algorithms in the Appendix provide samples from unimodal densities. In some situations, the unimodality property can be very important. For example, it is known (Holgate 1970) that if the mixing distribution $g(\theta)$ of a Poisson distribution $f(x \mid \theta)$ is continuous and unimodal, then the resulting marginal distribution $m(x)$ is also unimodal. However, if $g(\theta)$ is discrete and unimodal, $m(x)$ is not necessarily unimodal. Therefore, if a preliminary data investigation reveals strong evidence for unimodality, a continuous and unimodal mixing distribution is a safe route to choose.

Algorithm 1 in the Appendix provides random variates on the real line, whereas Algorithm 2, depending on the choice of the member of the Pearson family, provides random variates on different domains. It is a simple matter to transform our sample to the desired domain using a Taylor expansion. For example, assume that we require random variates defined on the positive line, but we use Algorithm 1 which provides variates taking values on the real line. We can simply transform the sampled variates using the transformation $\theta \rightarrow \exp(\theta)$. The required moments of $\log(\theta)$ can be approximated very accurately using the first two terms of a Taylor expansion: if $X$ is a random variable with mean $\mu$ and variance $\sigma^2$, then $Y = \log X$ has moments

$$E(Y^n) \simeq (\log\mu)^n + \frac{n\sigma^2}{2\mu^2}\left[(n-1)(\log\mu)^{n-2} - (\log\mu)^{n-1}\right].$$
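As an illustration of this transformation step, the sketch below (Python; the sample `theta_real` is a hypothetical stand-in for output of Algorithm 1) computes the approximate moments of $\log\theta$ from given moments of $\theta$ and maps a real-line sample back to the positive line.

```python
import numpy as np

def log_moments(mu, sigma2, n_max=4):
    """Two-term Taylor approximation to E[(log X)^n] for X with mean mu, variance sigma2."""
    lm = np.log(mu)
    return [lm**n + n * sigma2 / (2.0 * mu**2) * ((n - 1) * lm**(n - 2) - lm**(n - 1))
            for n in range(1, n_max + 1)]

# target: a positive mixing variable with mean 5 and variance 2 (illustrative values only)
print(log_moments(mu=5.0, sigma2=2.0))

# hypothetical real-line sample generated with the log-scale moments above
theta_real = np.random.default_rng(2).normal(np.log(5.0), 0.28, size=1000)
theta_positive = np.exp(theta_real)   # transform back to the positive line
```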

Up to now we have based our estimation of $\hat g_x(\theta)$ on the first four sample moments. A natural question that arises is the sensitivity of this estimation to the sample moments' variability. We suggest the use of the moments' sampling distributions as follows: instead of using $\hat g_x(\theta)$, we may use $\tilde g_x(\theta) = \int g(\theta \mid \mu)\, h(\mu)\, d\mu$, where $\mu$ denotes the four moments of $g(\theta)$ and $h(\mu)$ is the moments' sampling distribution, which is a multivariate normal density with mean vector the estimated sample moments $\hat\mu_k$, $k = 1, \ldots, 4$, and covariance matrix with $(r, q)$-th element $n^{-1}(\hat\mu_{r+q} - \hat\mu_q\hat\mu_r)$; see, for example, Kendall and Stuart (1961). This multivariate normal density should be constrained to the values of $\mu$ for which the moment problem has a solution. The required sampling for $\tilde g_x(\theta)$ is now achieved by first obtaining a sample from $h(\mu)$, say $\tilde\mu$, and then using $\tilde\mu$ to obtain a sample from $g(\theta \mid \mu = \tilde\mu)$.
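A sketch of how a moment vector can be drawn from this sampling distribution is given below (Python; `x` is a hypothetical data vector, and the admissibility check only enforces the conditions on the standardized moments quoted above).

```python
import numpy as np

def sample_moment_vector(x, rng):
    """Draw one moment vector (mu_1,...,mu_4) from its approximate sampling distribution h(mu)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    mu_hat = np.array([np.mean(x**k) for k in range(1, 9)])   # raw moments up to order 8
    mean = mu_hat[:4]
    # covariance of the sample moments: (r,q)-th element n^{-1}(mu_{r+q} - mu_r mu_q)
    cov = np.array([[(mu_hat[r + q + 1] - mu_hat[r] * mu_hat[q]) / n
                     for q in range(4)] for r in range(4)])
    while True:
        m1, m2, m3, m4 = rng.multivariate_normal(mean, cov)
        var = m2 - m1**2
        if var <= 0:
            continue
        skew = (m3 - 3 * m1 * var - m1**3) / var**1.5
        kurt = (m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4) / var**2
        if kurt >= skew**2 + 1:          # admissible standardized moments
            return np.array([m1, m2, m3, m4])

rng = np.random.default_rng(3)
x = rng.poisson(rng.gamma(10.0, 0.5, size=200))   # illustrative data
print(sample_moment_vector(x, rng))
```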

4 INFERENCES

4.1 Bayesian inference of $\theta_i$

With $x^{(n)}$ denoting the totality of the data, the posterior density function for a particular $\theta_i$, for some $1 \le i \le n$, is

$$p(\theta_i \mid x^{(n)}) \propto f(x_i \mid \theta_i)\, \hat g_x(\theta_i). \qquad (2)$$

Note that only $x_i$, which is directly associated with $\theta_i$, is used in the right hand side of (2); see Deely and Lindley (1981). Sampling from the posterior density can be achieved using either the rejection or the sampling-resampling algorithms; see Smith and Gelfand (1992). For the rejection sampling algorithm, note that we only need to obtain a sample from $p^*(\theta_i \mid x_i) = m(x_i)\, p(\theta_i \mid x_i) \propto p(\theta_i \mid x_i)$. But $p^*(\theta_i \mid x_i) \le f(x_i \mid \hat\theta_i)\, \hat g_x(\theta_i)$, where $\hat\theta_i$ is the value of $\theta$ which maximizes $f(x_i \mid \theta)$. The algorithm proceeds by drawing samples from $\hat g_x(\theta)$, and accepting them with probability $f(x_i \mid \theta_i)/f(x_i \mid \hat\theta_i)$. The maximization of $f(x_i \mid \theta_i)$ is sometimes straightforward. For example, if $f(x \mid \theta)$ is a Poisson density, $\hat\theta_i = x_i$. If maximization of $f(x_i \mid \theta_i)$ is not easy, the sampling-resampling algorithm can be used: first draw a sample $\theta_1, \theta_2, \ldots, \theta_N$ from $\hat g_x(\theta)$, calculate $w_j = f(x_i \mid \theta_j)/\sum_{l=1}^{N} f(x_i \mid \theta_l)$, and re-draw $\theta$ from the discrete distribution over $(\theta_1, \ldots, \theta_N)$ placing mass $w_j$ on $\theta_j$.

Another interesting issue emerges here. Having obtained a mechanism which generates samples from $\hat g_x(\theta)$, we can proceed in a fully Bayesian fashion to estimate any posterior densities of interest. For example, if a parameter $\theta_i$ depends on a group of data $x_{ij}$, then the term $f(x_i \mid \theta_i)$ in the right hand side of (2) can be replaced by the product $\prod_j f(x_{ij} \mid \theta_i)$. This will be applied in the second example of the next section. The advantages of obtaining a posterior sample of the parameters of interest are well demonstrated in the statistical literature, mainly due to the advances of Markov chain Monte Carlo methods. For example, when a posterior sample is obtained, graphs of posterior densities as well as estimation of functions of parameters and any posterior summaries of interest are readily available. References related to the context of hierarchical modeling are George et al. (1993, 1994). Comparing their approach with ours, we note that the price for using a fully Bayesian approach is the adoption of sophisticated sampling (for example from log-concave densities) in a Markov chain Monte Carlo setting.
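Both sampling schemes are easy to code once draws from $\hat g_x(\theta)$ are available. The sketch below (Python) assumes a Poisson kernel, for which $\hat\theta_i = x_i$; the array `g_sample` stands in for draws from $\hat g_x(\theta)$ and is purely illustrative here.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(4)
g_sample = rng.gamma(10.0, 0.5, size=50_000)   # stand-in for draws from g_hat_x(theta)

def posterior_sample_rejection(x_i, g_sample, rng):
    """Rejection sampling from p(theta_i | x_i) with a Poisson kernel (theta_hat = x_i)."""
    bound = poisson.pmf(x_i, mu=max(x_i, 1e-12))        # f(x_i | theta_hat)
    accept_prob = poisson.pmf(x_i, mu=g_sample) / bound
    keep = rng.uniform(size=g_sample.size) < accept_prob
    return g_sample[keep]

def posterior_sample_sir(x_i, g_sample, rng, size=1000):
    """Sampling-importance-resampling alternative (no maximization needed)."""
    w = poisson.pmf(x_i, mu=g_sample)
    w = w / w.sum()
    return rng.choice(g_sample, size=size, replace=True, p=w)

draws = posterior_sample_rejection(3, g_sample, rng)
print(draws.mean(), draws.std())
```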

4.2 Reconstruction of the marginal density

Having obtained the ability to sample from $\hat g_x(\theta)$, we can obtain a sample from the marginal density $m(x)$ using the composition method (see Tanner 1991): we draw a value $\theta_j \sim \hat g_x(\theta)$, then a value $x_j \sim f(x \mid \theta = \theta_j)$, and we repeat the process $N$ times, for $j = 1, \ldots, N$. The resulting values $x_j$, $j = 1, \ldots, N$, constitute an independent and identically distributed sample from the marginal density implied by $\hat g_x(\theta)$, which in turn possesses an obvious relationship with the required density $m(x)$. Approximate versions of $m(x)$ are readily available via some reconstruction technique which relates a given sample to the corresponding density. Obvious choices are the normal kernel density estimate (Silverman 1986) and the histogram for continuous densities, and the empirical density estimate for discrete densities. In any case, this recovery of a density from a sample can be achieved fairly automatically.
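A direct transcription of the composition method is given below (Python, again with a Poisson kernel and a purely illustrative stand-in for the $\hat g_x(\theta)$ sampler).

```python
import numpy as np

rng = np.random.default_rng(5)

def sample_marginal(draw_g, kernel_sampler, N, rng):
    """Composition method: theta_j ~ g_hat_x, then x_j ~ f(. | theta_j)."""
    theta = draw_g(N, rng)
    return kernel_sampler(theta, rng)

# illustrative choices: g_hat_x approximated by a Gamma density, Poisson kernel
draw_g = lambda N, rng: rng.gamma(10.0, 0.5, size=N)
kernel = lambda theta, rng: rng.poisson(theta)

x_marg = sample_marginal(draw_g, kernel, N=100_000, rng=rng)

# empirical density estimate for the (discrete) reconstructed marginal
values, counts = np.unique(x_marg, return_counts=True)
m_hat = dict(zip(values.tolist(), (counts / counts.sum()).tolist()))
print({k: round(v, 4) for k, v in list(m_hat.items())[:8]})
```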


5 ILLUSTRATIVE EXAMPLES

5.1 A re-analysis of the pump-failure data

The pump-failure data presented by Gaver and O'Muircheartaigh (1987) have attracted the attention of many authors; see for example Carlin and Gelfand (1991), George et al. (1993), Christiansen and Morris (1996). They refer to $n = 10$ power plant pumps in which the number of failures $x_i$, $i = 1, \ldots, n$, is assumed to follow a Poisson distribution with mean $\lambda_i t_i$, where $\lambda_i$ is the failure rate for pump $i$ and $t_i$ is the length of operation time of the pump (in thousands of hours). George et al. (1993) presented a full Bayesian approach using the Gibbs sampler. They assumed that, conditional on $\alpha$ and $\beta$, the failure rates $\lambda_i$ are independent and identically distributed as Gamma($\alpha$, $\beta$) with density function $G(\lambda_i \mid \alpha, \beta) = \beta^{\alpha}\lambda_i^{\alpha-1}\exp(-\beta\lambda_i)/\Gamma(\alpha)$. Then, instead of fixing $\alpha$ and $\beta$ at empirical Bayes estimates as suggested by Gaver and O'Muircheartaigh (1987), they placed third-stage priors on $\alpha$ and $\beta$, $\pi(\alpha)$ and $\pi(\beta)$ respectively, suggesting, for illustrative purposes, four different choices of the joint prior $\pi(\alpha, \beta) = \pi(\alpha)\pi(\beta)$. Nelder (1994) commented that the maximum-conjugate-likelihood estimates provide similar results to the full Bayesian approach of George et al., and that a better model is achieved by taking into account the fact that the pumps fall into two groups, one consisting of pumps 1, 3, 4 and 6, which were operated continuously, and one consisting of the rest, which were operated intermittently.

We analyze the same data using our suggested sampling-based approach. First, we treat all $\lambda_i$ as exchangeably distributed according to a density $g(\lambda)$. The moments of $g(\lambda)$ can be derived, after some algebra, as follows: denote $r_i = x_i/t_i$, $\mu_k(r) = n^{-1}\sum_{i=1}^{n} r_i^k$ and $\mu_k(t) = n^{-1}\sum_{i=1}^{n} t_i^{-k}$. Then $\mu_1(\lambda) = \mu_1(r)$, $\mu_2(\lambda) = \mu_2(r) - \mu_1(r)\mu_1(t)$, $\mu_3(\lambda) = \mu_3(r) - 3\mu_2(\lambda)\mu_1(t) - \mu_1(r)\mu_2(t)$ and $\mu_4(\lambda) = \mu_4(r) - 6\mu_3(\lambda)\mu_1(t) - 7\mu_2(\lambda)\mu_2(t) - \mu_1(r)\mu_3(t)$. We construct $\hat g_x(\lambda)$ using the first four moments of the data as described in section 3 and we derive posterior samples for the $\lambda_i$ by using the rejection algorithm of section 4.1. The posterior densities, reconstructed through kernel density estimation, are depicted in Figure 1 (solid lines). Note that the posterior densities of $\lambda_{7-8}$ and $\lambda_9$ are not unimodal. This is an indication that the mixing density and the unit-specific data conflict.
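For completeness, the moment relations just derived are straightforward to evaluate. The sketch below (Python) takes hypothetical vectors of failure counts `x` and operation times `t` (illustrative values of the same form as the pump data, not necessarily the exact published figures) and returns the first four moments of $g(\lambda)$.

```python
import numpy as np

def pump_mixing_moments(x, t):
    """First four raw moments of g(lambda) from counts x_i ~ Poisson(lambda_i t_i)."""
    x, t = np.asarray(x, float), np.asarray(t, float)
    r = x / t
    mr = [np.mean(r**k) for k in range(1, 5)]        # mu_k(r)
    mt = [np.mean(t**(-k)) for k in range(1, 4)]     # mu_k(t)
    l1 = mr[0]
    l2 = mr[1] - mr[0] * mt[0]
    l3 = mr[2] - 3 * l2 * mt[0] - mr[0] * mt[1]
    l4 = mr[3] - 6 * l3 * mt[0] - 7 * l2 * mt[1] - mr[0] * mt[2]
    return l1, l2, l3, l4

# illustrative inputs of the same structure as the pump data (counts, 1000s of hours)
x = np.array([5, 1, 5, 14, 3, 19, 1, 1, 4, 22])
t = np.array([94.3, 15.7, 62.9, 126.0, 5.2, 31.4, 1.0, 1.0, 2.1, 10.5])
print(pump_mixing_moments(x, t))
```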

[Figure 1 about here. Caption: Posterior densities of failure rates.]

It is of particular interest to compare our methodology with the empirical Bayes approach of Gaver and O'Muircheartaigh (1987), the fully Bayesian approach of George et al. (1993) and the maximum-conjugate-likelihood estimates of Nelder (1994). The modeling adequacy can be checked with the Bayes factors of our model against the above models. The Bayes factor of a model $M_s$ against a model $M_t$ is given by $B_{st} = m(x \mid M_s)/m(x \mid M_t)$, with $m(x \mid M_s)$ and $m(x \mid M_t)$ denoting the joint marginal densities under models $M_s$ and $M_t$ respectively. Denoting by $f(x_i \mid \lambda, t_i)$ the Poisson density with mean $\lambda t_i$, it turns out that the marginal densities for the pump failure data are

$$m(x \mid M_1) = \int \prod_{i=1}^{n} f(x_i \mid \lambda, t_i)\, \hat g_x(\lambda)\, d\lambda$$

for the general hierarchical model we propose,

$$m(x \mid M_2) = \int\!\!\int\!\!\int \prod_{i=1}^{n} f(x_i \mid \lambda, t_i)\, G(\lambda \mid \alpha, \beta)\, \pi(\alpha, \beta)\, d\lambda\, d\alpha\, d\beta$$

for the full Bayesian model of George et al. (1993),

$$m(x \mid M_3) = \int \prod_{i=1}^{n} f(x_i \mid \lambda, t_i)\, G(\lambda \mid \alpha = 1.04, \beta = 1.04)\, d\lambda$$

for the maximum-conjugate-likelihood estimates of Nelder (1994) and

$$m(x \mid M_4) = \int \prod_{i=1}^{n} f(x_i \mid \lambda, t_i)\, G(\lambda \mid \alpha = 0.8, \beta = 1.4)\, d\lambda$$

for the empirical Bayes estimates of Gaver and O'Muircheartaigh (1987). It is well known (see Kass and Raftery 1995) that, in general, the calculation of joint marginal densities presents computational difficulties. For the marginals above, note that the forms of $m(x \mid M_3)$ and $m(x \mid M_4)$ are equal to

$$\left(\prod_{i=1}^{n}\frac{t_i^{x_i}}{x_i!}\right)\frac{\Gamma\!\left(\alpha + \sum_{i=1}^{n} x_i\right)\beta^{\alpha}}{\Gamma(\alpha)\left(\beta + \sum_{i=1}^{n} t_i\right)^{\alpha + \sum_{i=1}^{n} x_i}}.$$

To calculate $m(x \mid M_1)$ and $m(x \mid M_2)$, we have chosen to use a simple Monte Carlo integration. For the former, we first obtain a sample $\lambda_j$, $j = 1, \ldots, N$, from $\hat g_x(\lambda)$, and then we use the estimate $\hat m(x \mid M_1) = N^{-1}\sum_{j=1}^{N}\prod_{i=1}^{n} f(x_i \mid \lambda_j, t_i)$. For the latter, we obtain first a sample $(\alpha_j, \beta_j)$, $j = 1, \ldots, N$, from $\pi(\alpha, \beta)$, then a sample $\lambda_j$, $j = 1, \ldots, N$, from $G(\lambda \mid \alpha_j, \beta_j)$, and then we use the estimate $\hat m(x \mid M_2) = N^{-1}\sum_{j=1}^{N}\prod_{i=1}^{n} f(x_i \mid \lambda_j, t_i)$. Using the prior $\pi(\alpha, \beta) \propto \alpha^{-0.9}\exp[-(\beta/100)]$, the resulting Bayes factors are, for $N = 100000$,

$B_{12} = 1.34$, $B_{13} = 1.89$, $B_{14} = 1.42$. There is evidence that our model approach provides a better fit to the data.

Next we treat the pumps as belonging to two groups as suggested by Nelder (1994), so that the parameters $\lambda_i$ are partially exchangeable: $\lambda_1, \lambda_3, \lambda_4$ and $\lambda_6$ are distributed according to a density $g_1(\lambda)$, whereas $\lambda_2, \lambda_5, \lambda_7, \lambda_8, \lambda_9$ and $\lambda_{10}$ are distributed according to a density $g_2(\lambda)$. Therefore, the first four moments from the data related to pumps 1, 3, 4 and 6 were used to construct $g_1(\lambda)$, whereas the moments from the rest of the data were used to construct $g_2(\lambda)$. Denoting $S_1 = \{1, 3, 4, 6\}$ and $S_2 = \{2, 5, 7, 8, 9, 10\}$, the joint marginal density under this model is given by

$$m(x \mid M_5) = \int\!\!\int \prod_{i \in S_1} f(x_i \mid \lambda, t_i) \prod_{i \in S_2} f(x_i \mid \phi, t_i)\, g_1(\lambda)\, g_2(\phi)\, d\lambda\, d\phi$$

with a Monte Carlo estimate given by

$$\hat m(x \mid M_5) = N^{-1}\sum_{j=1}^{N}\prod_{i \in S_1} f(x_i \mid \lambda_j, t_i)\prod_{i \in S_2} f(x_i \mid \phi_j, t_i).$$

Here, we first obtain samples $\lambda_j$, $j = 1, \ldots, N$, and $\phi_j$, $j = 1, \ldots, N$, from $g_1$ and $g_2$ respectively. Naturally, this model drastically improves the fit to the data, resulting in a Bayes factor $B_{51} \simeq 3.7 \times 10^4$. In Figure 1 the dotted lines present the posterior densities derived from the partially exchangeable two-groups approach. The first group (pumps 1, 3, 4 and 6) contains parameters with small dispersion and is less affected by the relaxation of full exchangeability. Pumps 7-8 and 9, on the contrary, correspond to the smallest operation times $t_i$ and therefore are very much influenced by the shrinkage effects. Naturally, the partial exchangeability assumption changes their posterior densities drastically and, in particular, $\lambda_9$ no longer suffers from the prior-likelihood conflict. In general, with the exception of $\lambda_5$, the posterior variances of the $\lambda_i$ in the partially exchangeable model are smaller than in the fully exchangeable model.
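The Monte Carlo estimate of a marginal density such as $\hat m(x \mid M_1)$ is a one-line computation once prior draws are available; working on the log scale avoids underflow. A minimal sketch (Python; `x`, `t` and the prior samples are placeholders, not the paper's actual draws from $\hat g_x$):

```python
import numpy as np
from scipy.stats import poisson
from scipy.special import logsumexp

def log_marginal_mc(x, t, lam_draws):
    """log m_hat(x | M) = log N^{-1} sum_j prod_i f(x_i | lam_j, t_i)."""
    x, t = np.asarray(x, float), np.asarray(t, float)
    # log-likelihood of the whole data set for each prior draw lam_j
    loglik = np.array([poisson.logpmf(x, mu=lam * t).sum() for lam in lam_draws])
    return logsumexp(loglik) - np.log(len(lam_draws))

rng = np.random.default_rng(6)
x = np.array([5, 1, 5, 14, 3, 19, 1, 1, 4, 22])                          # placeholder counts
t = np.array([94.3, 15.7, 62.9, 126.0, 5.2, 31.4, 1.0, 1.0, 2.1, 10.5])  # placeholder times

lam_M1 = rng.gamma(10.0, 0.5, size=100_000)        # stand-in for draws from g_hat_x
lam_M4 = rng.gamma(0.8, 1.0 / 1.4, size=100_000)   # Gamma(alpha=0.8, beta=1.4) draws
log_B14 = log_marginal_mc(x, t, lam_M1) - log_marginal_mc(x, t, lam_M4)
print(np.exp(log_B14))                              # estimated Bayes factor B_14
```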

5.2 Normal groups data

To illustrate our approach in normal mixture models, we have generated 4 independent data sets of sample size 10. The data are shown in Table 2.

Table 2. One way analysis of variance

|      | Group 1 | Group 2 | Group 3 | Group 4 |
|------|---------|---------|---------|---------|
| 1    | -2.768  | -1.791  | -1.069  |  0.048  |
| 2    | -3.403  | -1.618  | -1.669  |  0.490  |
| 3    | -1.449  | -2.122  | -1.861  |  0.201  |
| 4    | -2.367  | -1.833  | -0.837  | -0.392  |
| 5    |  0.222  | -1.997  | -1.413  |  0.508  |
| 6    | -2.381  | -1.929  | -1.381  |  1.087  |
| 7    | -1.682  | -2.212  | -0.741  |  0.357  |
| 8    | -0.708  | -1.519  | -1.407  |  0.001  |
| 9    | -3.043  | -1.944  | -1.667  |  0.209  |
| 10   | -1.419  | -1.738  | -1.875  |  0.558  |
| mean | -1.900  | -1.870  | -1.392  |  0.307  |
| s.d. |  1.114  |  0.215  |  0.400  |  0.397  |

Denote the data either as group vectors $x_j$ or data points $x_{ij}$ with $i = 1, \ldots, 10$, $j = 1, \ldots, 4$. We assume that within each group $j$ we have obtained $x_j$ from a normal density $N(\mu_j, \sigma_j^2)$. Interest lies in the estimation of the group means $\mu_j$, $j = 1, \ldots, 4$. The usual hierarchical model for the `one way analysis of variance' model assumes that for given $\mu_j$, $x_{ij} \sim N(\mu_j, \sigma_j^2)$ and, for given $\xi$ and $\tau$, $\mu_j \sim N(\xi, \tau)$. A list of different estimation approaches to deal with the above model will be outlined when we revisit this data set in section 6.2. Our suggested sampling-based approach replaces $N(\xi, \tau)$ above with a density $\hat g_x(\mu)$, the moments of which are estimated from the moments of all 40 data points. Then, note that

$$p(\mu_j \mid x^{(40)}) \propto \left\{\prod_{i=1}^{10} ND(x_{ij}; \mu_j, \sigma_j)\right\}\hat g_x(\mu_j) = f(x_j \mid \mu_j)\,\hat g_x(\mu_j)$$

where $ND(x; \mu, \sigma)$ denotes the normal density $N(\mu, \sigma^2)$ at $x$, and $x^{(40)}$ denotes all data points. Estimation of $\sigma_j$ is available from $x_j$. Thus, after obtaining a sample from $\hat g_x(\mu)$, the rejection algorithm can be applied, noting that the value $\hat\mu_j$ which maximizes $f(x_j \mid \mu_j)$ is the sample mean of group $j$. The approximate Bayesian posterior densities for the group means are produced via a kernel density approximation technique and are illustrated in Figure 2 (solid lines).
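In this normal setting the acceptance probability of the rejection step has a closed form, $f(x_j \mid \mu)/f(x_j \mid \bar x_j) = \exp\{-n_j(\mu - \bar x_j)^2/(2\sigma_j^2)\}$, so the algorithm reduces to a few lines. A sketch (Python; `g_sample` again stands in for draws from $\hat g_x(\mu)$, and $\sigma_j$ is replaced by the group standard deviation):

```python
import numpy as np

rng = np.random.default_rng(7)

def group_mean_posterior(x_j, g_sample, rng):
    """Rejection sampling for a group mean mu_j given group data x_j and draws from g_hat_x."""
    x_j = np.asarray(x_j, float)
    n, xbar, s = x_j.size, x_j.mean(), x_j.std(ddof=1)   # sigma_j estimated from the group
    accept_prob = np.exp(-n * (g_sample - xbar) ** 2 / (2.0 * s**2))
    keep = rng.uniform(size=g_sample.size) < accept_prob
    return g_sample[keep]

# illustrative use with group 2 of Table 2
group2 = np.array([-1.791, -1.618, -2.122, -1.833, -1.997,
                   -1.929, -2.212, -1.519, -1.944, -1.738])
g_sample = rng.normal(-1.2, 1.2, size=100_000)            # stand-in for g_hat_x draws
post = group_mean_posterior(group2, g_sample, rng)
print(post.mean(), post.std())
```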

[Figure 2 about here. Caption: Posterior densities for group means.]

Note the difference in the dispersion of each density, which reflects the unequal sample group variances. As a result of this, we have encountered the important property of order reversal, in which the group sample means have a different order than the group posterior means. This phenomenon is well known in empirical Bayes estimation; see Efron and Morris (1975). We would like to emphasize that the rejection algorithm above not only relaxes the usual assumption that $g(\mu)$ is a normal density, but also incorporates in the sampling algorithm the fact that the variances $\sigma_j^2$ might not be equal. A comparison with the fully Bayesian approach is difficult because the latter depends on the choice of hyperparameters in $g$ and the prior distribution of $\sigma_j$.

6 OPERATING CHARACTERISTICS

6.1 Bayes Risk: Poisson-Gamma model

We investigate the Bayes risk performance of our suggested method by using 50 data points generated in Maritz and Lwin (1989, pp. 85) from a Poisson distribution mixed by a $g(\lambda)$ taken to be a Gamma($\alpha$, $\beta$) density with $\alpha = 10$, $\beta = 2$. Maritz and Lwin compare a series of empirical Bayes estimators by calculating, for each estimator $\delta(x)$, the expected loss

$$W(\delta) = \int \sum_{x} (\delta(x) - \lambda)^2 f(x \mid \lambda)\, g(\lambda)\, d\lambda \qquad (3)$$

where $f(x \mid \lambda) = e^{-\lambda}\lambda^x/x!$, $x = 0, 1, \ldots$. Full details of these estimators are given by Maritz and Lwin (1989). In summary, the estimating procedures are (a) method of moments with $g(\lambda)$ assumed to be a Gamma density, (b) maximum likelihood estimation with $g(\lambda)$ assumed to be Gamma distributed, (c) step function approximation of $g(\lambda)$ using a variant of the method of moments, (d) simple empirical Bayes estimator based on the empirical density estimate, (e) a smoothed version of (d) fitting a straight line, (f) a monotonized version of (d), (g) monotonic ordinates fitted by maximum likelihood, (h) linear empirical Bayes estimator. It is well known that the Bayes estimator $\delta_G(x)$ which minimizes $W(\delta)$ is $\delta_G(x) = (\alpha + x)/(\beta + 1)$, with corresponding $W(\delta_G) = \alpha/[\beta(\beta + 1)]$. The expected loss for any other estimator $\delta(x)$ is related to the Bayes estimator by $W(\delta) = W(\delta_G) + \sum_{x}[\delta(x) - \delta_G(x)]^2 m(x)$, with $m(x)$ being the marginal (mixed) probability distribution, a negative binomial in the Poisson-Gamma setting.

Our sampling-based estimator, denoted $\delta_S(x)$, can be obtained in two ways. First, we may apply the rejection algorithm described in section 4.1: denote the responses by $x_i$, $i = 0, 1, \ldots$, and draw $\lambda_{ij} \sim \hat g_x(\lambda)$, accepting it with probability $f(x_i \mid \lambda = \lambda_{ij})/f(x_i \mid \lambda = \hat\lambda = x_i)$. The resulting sample $\lambda_{ij}$, $j = 1, \ldots, N$, constitutes a sample from $p(\lambda_i \mid x^{(50)})$, so the sample mean $N^{-1}\sum_{j=1}^{N}\lambda_{ij}$ is an estimate of $\delta_S(x_i)$. Second, we may take advantage of the expression $\delta_S(x_i) = (x_i + 1)\, m(x_i + 1)/m(x_i)$ and reconstruct the required $m(x)$ with the sampling approach described in section 4.2: draw $\lambda_j \sim \hat g_x(\lambda)$, $j = 1, \ldots, N$, and then draw $x_j \sim f(x \mid \lambda = \lambda_j)$. The empirical frequencies of the resulting sample $x_j$, $j = 1, \ldots, N$, can be used to reconstruct $m(x)$. Note that in neither of the above methods is the explicit form of $\hat g_x(\lambda)$ necessary. We used both the above methods based on a sample of size $N = 100000$ from $\hat g_x(\lambda)$. Both estimators give identical results. Table 3 presents the expected losses for all the estimators as well as the (optimum) Bayes estimator and our suggested estimator $\delta_S$. Even though many methods in the table assume that the density $g(\lambda)$ is a Gamma density, our semi-parametric estimator, which does not make such an assumption, is superior to all the estimating approaches (a)-(h). We re-emphasize here that the other methods are designed to obtain Bayes estimators, and they require considerable effort for different Bayesian output, whereas our approach readily obtains, through a drawn sample of $\lambda_i$, any desired posterior summary.

Table 3. Expected losses for different estimators

| Method | Loss |
|---|---|
| (a) Parametric $g(\lambda)$, method of moments | 1.77 |
| (b) Parametric $g(\lambda)$, maximum likelihood | 1.74 |
| (c) Step-function approximation of $g(\lambda)$ | 1.90 |
| (d) Simple empirical Bayes estimator | 41.04 |
| (e) Smooth version of (d) | 1.74 |
| (f) Monotonized version of (d) | 3.38 |
| (g) Monotonic ordinates-maximum likelihood | 2.08 |
| (h) Linear empirical Bayes | 5.00 |
| (i) Bayes (optimum) | 1.67 |
| (j) Sampling-based $\delta_S$ | 1.68 |
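The entries for estimators with a closed form can be reproduced from the relation $W(\delta) = W(\delta_G) + \sum_x [\delta(x) - \delta_G(x)]^2 m(x)$, since $m(x)$ is negative binomial here. A minimal sketch (Python), shown for the optimum Bayes rule and a generic user-supplied rule `delta`:

```python
import numpy as np
from scipy.stats import nbinom

alpha, beta = 10.0, 2.0
# marginal of a Poisson mixed over Gamma(alpha, beta): NegBin(alpha, p = beta/(beta+1))
m = nbinom(alpha, beta / (beta + 1.0))

def expected_loss(delta, x_max=200):
    """W(delta) = W(delta_G) + sum_x (delta(x) - delta_G(x))^2 m(x), truncated at x_max."""
    x = np.arange(x_max + 1)
    delta_g = (alpha + x) / (beta + 1.0)              # Bayes rule
    w_bayes = alpha / (beta * (beta + 1.0))           # = 1.67 here
    return w_bayes + np.sum((delta(x) - delta_g) ** 2 * m.pmf(x))

print(expected_loss(lambda x: (alpha + x) / (beta + 1.0)))   # recovers 1.67
print(expected_loss(lambda x: x.astype(float)))              # loss of the naive rule delta(x) = x
```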

6.2 Empirical Bayes performance: normal-normal case

In this subsection we study the empirical Bayes performance of our suggested estimator using simulated data from a standard normal-normal hierarchical model defined as $Y_i \sim N(\theta_i, 1)$ and $\theta_i \sim N(\mu, \tau^2)$ for $i = 1, \ldots, k$. Carlin and Louis (1996) use the same model to investigate the empirical Bayes risk of five different estimators. The data have been generated as follows. We first simulate values $\theta_1, \ldots, \theta_k$ from $N(0, 1)$ for $k = 10$ and $k = 50$. We then generate values $y_i$, $i = 1, \ldots, k$, from $N(\theta_i, 1)$. The estimators we compare are the frequentist estimator $\hat\theta_i^{F} = y_i$; the Bayes estimator $\hat\theta_i^{B} = y_i/2$, derived if we assume the true values $\mu = 0$ and $\tau^2 = 1$; the empirical Bayes estimator $\hat\theta_i^{EB} = B\bar y + (1 - B)y_i$, where $B = 1/(1 + \hat\tau^2)$, $\hat\tau^2 = \max\{0, s^2 - 1\}$ and $\bar y$, $s^2$ denote the sample mean and variance respectively; the Morris (1983a) modified empirical Bayes estimator $\hat\theta_i^{M} = B_M\bar y + (1 - B_M)y_i$, where $B_M = [(k-3)/(k-1)]B$; and the hierarchical Bayes point estimator with hyperprior $\pi(\mu, \tau^2) = 1$, $\hat\theta_i^{H} = B_H\bar y + (1 - B_H)y_i$, where

$$B_H = \frac{k-3}{(k-1)s^2}\left[1 - g(s^2)\right]$$

and

$$g(s^2) = \frac{(k-1)s^2}{2}\left\{\exp\left[\frac{(k-1)s^2}{2}\right] - 1\right\}^{-1}.$$

To compare the performance of our sampling-based estimator with the five above estimators, we calculated for every competitor estimator, say $\hat\theta^{(\cdot)}$, the empirical Bayes risk $\hat r$. This was achieved by repeating the data generation process $N = 10000$ times, resulting in estimators $\hat\theta_{ij}^{(\cdot)}$, $i = 1, \ldots, k$, $j = 1, \ldots, N$. Then, the Monte Carlo estimate $\hat r = (Nk)^{-1}\sum_{j=1}^{N}\sum_{i=1}^{k}(\hat\theta_{ij}^{(\cdot)} - \theta_{ij})^2$ provides an estimate of the risk. Moreover, to check the sensitivity of the above estimators, we also changed the model by using $\theta_i \sim SLN(0, 1)$ and $\theta_i \sim SE(0, 1)$ for $i = 1, \ldots, k$, where $SLN$ and $SE$ denote the shifted lognormal distribution and the shifted exponential distribution respectively, each with mean 0 and variance 1. These distributions were chosen so that departures from normality can be tested. The results are illustrated in Table 4. They indicate that our suggested sampling method performs poorly when the $\theta_i$ really come from a normal distribution. This is a natural outcome based on the fact that our method is not specially tailored to normal-normal models. However, in cases where the underlying $g(\theta)$ does not resemble the normal distribution, our sampling method performs better than all other estimators, with the exception of the Bayes estimator, which naturally has a great advantage because its two prior moments are matched exactly. This verifies the robustness of our methodology to the assumption that $g(\theta)$ belongs to a particular class of distributions.
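The competitor estimators and the risk computation are short to code. The sketch below (Python) reproduces the structure of this experiment on a reduced number of replications; the sampling-based estimator itself is omitted since it requires the moment-based generator of section 3.

```python
import numpy as np

rng = np.random.default_rng(8)

def shrinkage_estimators(y, k):
    """Frequentist, Bayes (known prior), EB, Morris and hierarchical Bayes point estimates."""
    ybar, s2 = y.mean(), y.var(ddof=1)
    B_eb = 1.0 / (1.0 + max(0.0, s2 - 1.0))
    B_m = (k - 3.0) / (k - 1.0) * B_eb
    c = (k - 1.0) * s2 / 2.0
    g = c / np.expm1(c)                                   # g(s^2) as quoted above
    B_h = (k - 3.0) / ((k - 1.0) * s2) * (1.0 - g)
    return {
        "Frequentist": y,
        "Bayes": y / 2.0,
        "EB": B_eb * ybar + (1.0 - B_eb) * y,
        "Morris": B_m * ybar + (1.0 - B_m) * y,
        "HB": B_h * ybar + (1.0 - B_h) * y,
    }

def eb_risk(k=10, reps=2000):
    sums = {}
    for _ in range(reps):
        theta = rng.standard_normal(k)                    # theta_i ~ N(0, 1)
        y = theta + rng.standard_normal(k)                # y_i ~ N(theta_i, 1)
        for name, est in shrinkage_estimators(y, k).items():
            sums[name] = sums.get(name, 0.0) + np.sum((est - theta) ** 2)
    return {name: s / (reps * k) for name, s in sums.items()}

print(eb_risk(k=10))
```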

6.3 Normal groups data revisited: robustness to moment estimation

In this section we repeat the analysis of section 5.2, adopting the approach described at the end of section 3: instead of $\hat g_x(\mu)$ we use $\tilde g_x(\mu) = \int g(\mu \mid \nu)\, h(\nu)\, d\nu$, with $\nu$ denoting the four moments of $g(\mu)$ and $h(\nu)$ the moments' sampling distribution calculated using the data moments. Our aim is to demonstrate that ignoring the variability in estimating the prior moments is a rather robust choice. However, at the same time, if one feels that the more conservative approach which accounts for the moments' variability has to be chosen, the sampling implementation is straightforward. Using $\tilde g_x(\mu)$ instead of $\hat g_x(\mu)$, we produced posterior densities of the group means which are depicted in Figure 2 (dotted lines). The inferences for groups 2 and 3 are practically identical.

Table 4. Empirical Bayes risks

| Method | $N(0,1)$, $k=10$ | $N(0,1)$, $k=50$ | $SLN(0,1)$, $k=10$ | $SLN(0,1)$, $k=50$ | $SE(0,1)$, $k=10$ | $SE(0,1)$, $k=50$ |
|---|---|---|---|---|---|---|
| Frequentist | 1.060 | 1.011 | 1.334 | 1.104 | 1.214 | 1.077 |
| Bayes | 0.497 | 0.495 | 0.425 | 0.393 | 0.473 | 0.472 |
| Empirical Bayes | 0.604 | 0.522 | 0.448 | 0.350 | 0.564 | 0.493 |
| Morris | 0.612 | 0.522 | 0.533 | 0.354 | 0.609 | 0.494 |
| Hierarchical Bayes | 0.611 | 0.522 | 0.536 | 0.354 | 0.610 | 0.494 |
| Sampling-based | 0.620 | 0.530 | 0.449 | 0.336 | 0.565 | 0.476 |

Group 1 has a much larger variance than the other groups, and it seems that its posterior density is less robust to moment variability and more vulnerable to the shrinkage effect. Consequently, the order reversal effect is further emphasized. Finally, the data in group 4 are located far away from the other 3 groups and are subject to a strong shrinkage effect. In this case, however, the moment variability approach seems to soothe the shrinkage effect because $\tilde g_x(\mu)$ generates much more dispersed values than $\hat g_x(\mu)$.

7 DISCUSSION AND CONCLUSIONS

We have adopted an empirical Bayes viewpoint and obtained a sampling-based estimate of the mixing density $g(\theta)$ in (1). We believe that our suggested methodology offers an important alternative for approximate posterior inference on unit-specific parameters. It is computationally straightforward and it relaxes the usual parametric restrictions on the mixing density.

A related comment refers to the form of the posterior densities. We have noted in the pump failure data example that bimodal densities are possible when the mixing density and the unit-specific data conflict. We would like to emphasize that the usual log-concave conjugate forms of $f$ and $g$ do not allow non-log-concave posterior densities for the unit-specific parameters, a fact which is well known in the usual normal-normal setting. O'Hagan (1988) noticed this problematic behavior and suggested the use of $t$ distributions for the mixing density $g$, so that outlier accommodation becomes possible. Our approach allows $g$ to take any unimodal shape, and a possible conflict with the unit-specific data may produce bimodal posterior densities.

Another issue, which we only mentioned in passing, is the fact that the moments of $g$ can be specified subjectively. Such a specification is not easy because the quantification of moments higher than the first two is generally considered difficult. However, sensitivity analysis is readily available: different prior beliefs for the moments of $g$ will produce different approximate posterior densities for the unit-specific parameters, and a graphical inspection of them will provide suitable insight into the effect of our prior specification.

Our approach cannot be generalized, at least straightforwardly, to multivariate hyperparameters. The method of moments becomes prohibitively difficult, so a result similar to Theorem 1 of section 2 is not easily available. We currently investigate how such a generalization can be obtained. Last but not least, we would like to focus on the applicability of our methodology in different areas of statistics. As we remarked in the introduction, general mixture problems are dealt with from many different perspectives. This paper exposes and directs attention to useful results from both the classical and Bayesian literature.


APPENDIX: ALGORITHMS FOR SIMULATING WITH GIVEN FIRST FOUR MOMENTS

We give two algorithms for simulation from a density with given first four moments.

Algorithm 1. Devroye (1986, pp. 690-691) gave an algorithm to generate random variates from a unimodal distribution with given first moments $\mu_1, \mu_2, \mu_3, \mu_4$. Due to some typographic errors in that algorithm, we give it again here:

- Readjustment of moments: $\mu_1 \leftarrow 2\mu_1$, $\mu_2 \leftarrow 3\mu_2$, $\mu_3 \leftarrow 4\mu_3$, $\mu_4 \leftarrow 5\mu_4$.
- Normalize the moments:
  - $\sigma \leftarrow \sqrt{\mu_2 - (\mu_1)^2}$
  - $(\mu_3, \mu_4) \leftarrow \left(\dfrac{\mu_3 - 3\mu_2\mu_1 + 2\mu_1^3}{\sigma^3},\ \dfrac{\mu_4 - 4\mu_3\mu_1 + 6\mu_2\mu_1^2 - 3\mu_1^4}{\sigma^4}\right)$
  - compute the probabilities $q$ and $p$ of the discrete distribution determined by the standardized moments $\mu_3$ and $\mu_4$.
- Generate a uniform $[0, 1]$ random variate $U$.
- IF $U \le q$ THEN set $X \leftarrow I[U \le pq]$ and rescale $X$, using $\mu_1$, $\sigma$ and $q$, to the corresponding support point;
  ELSE set $X \leftarrow I[U \le q + (1-q)p]$ and rescale $X$, using $\mu_1$, $\sigma$ and $q$, to the corresponding support point.
- Generate a uniform $[0, 1]$ random variate $Y$.
- RETURN $Z \leftarrow XY$.

Algorithm 2. The Pearson family of curves contains 12 distributions which can be derived as the solutions of the basic differential equation which creates the system. There are three main types (Types I, IV and VI) while the rest are special or limiting cases. For applications with real data these three types cover all possible cases. We give below the basic steps of the algorithm, which is based on Elderton and Johnson (1969) and Devroye (1986).

1. Choose the appropriate member of the Pearson family.
2. Estimate via moment matching the parameters of the chosen distribution.
3. Simulate from the appropriate member:
   - If the Pearson Type I family is chosen, then simulate using the fact that if the random variable $X$ follows a Beta type II distribution with parameters $b + 1$ and $d + 1$, then the random variable $Y$ defined as $Y = (cX - a)/(1 + X)$ follows a Pearson type I distribution with parameters $(a, b, c, d)$.
   - If the Pearson Type IV family is chosen, then simulate using the fact that if the random variable $X$ follows a Pearson type IV distribution with parameters $(a, b, c)$, then the random variable $Y$ defined as $Y = \arctan(X/a)$ follows a distribution which is log-concave, so standard algorithms which generate from log-concave densities can be used; see Devroye (1986).
   - If the Pearson Type VI family is chosen, then simulate using the fact that if the random variable $X_1$ follows a Gamma($c - b - 1$, 1) distribution and $X_2$ follows a Gamma($b + 1$, 1) distribution, then the random variable $Y$ defined as $Y = a(X_1 + X_2)/X_1$ follows a Pearson type VI distribution with parameters $(a, b, c)$.


References

Berger, J. O. (1985), Statistical Decision Theory and Bayesian Analysis, Springer-Verlag, New York.
Bohning, D., Schlattman, P., and Lindsay, B. (1992), "Computer Assisted Analysis of Mixtures (C.A.M.AN.): Statistical Algorithms," Biometrics, 48, 283-303.
Carlin, B. P., and Gelfand, A. E. (1991), "A Sample Reuse Method for Accurate Parametric Empirical Bayes Confidence Intervals," Journal of the Royal Statistical Society, Series B, 53, 189-200.
Carlin, B. P., and Louis, T. A. (1996), Bayes and Empirical Bayes Methods for Data Analysis, Chapman and Hall, Great Britain.
Christiansen, C. L., and Morris, C. (1996), "Hierarchical Poisson Regression Modelling," Manuscript, Department of Statistics, Harvard University, Cambridge.
Deely, J. J., and Lindley, D. V. (1981), "Bayes Empirical Bayes," Journal of the American Statistical Association, 76, 833-841.
Devroye, L. (1986), Non-Uniform Random Variate Generation, Springer-Verlag, New York.
Devroye, L. (1989), "On Random Variate Generation When Only Moments or Fourier Coefficients are Known," Mathematics and Computers in Simulation, 31, 71-89.
Devroye, L. (1991), "Algorithms for Generating Discrete Random Variables With a Given Generating Function or a Given Moment Sequence," SIAM Journal on Scientific and Statistical Computing, 12, 107-126.
Douglas, J. B. (1980), Analysis With Standard Contagious Distributions, Statistical Distributions in Scientific Work Series, Vol. 4, ICPH, Maryland, USA.
Draper, D. (1995), "Assessment and Propagation of Model Uncertainty (with discussion)," Journal of the Royal Statistical Society, Series B, 57, 45-97.
Efron, B., and Morris, C. (1975), "Data Analysis Using Stein's Estimator and its Generalizations," Journal of the American Statistical Association, 70, 311-319.
Elderton, W. P., and Johnson, N. L. (1969), Systems of Frequency Curves, Cambridge University Press, England.
Escobar, M. D., and West, M. (1995), "Bayesian Density Estimation and Inference Using Mixtures," Journal of the American Statistical Association, 90, 577-588.
Ferguson, T. S. (1973), "A Bayesian Analysis of Some Nonparametric Problems," The Annals of Statistics, 1, 209-230.
Gaver, D., and O'Muircheartaigh, I. G. (1987), "Robust Empirical Bayes Analyses of Event Rates," Technometrics, 29, 1-15.
George, E. I., Makov, U. E., and Smith, A. F. M. (1993), "Conjugate Likelihood Distributions," Scandinavian Journal of Statistics, 20, 147-156.
George, E. I., Makov, U. E., and Smith, A. F. M. (1994), "Fully Bayesian Hierarchical Analysis for Exponential Families via Monte Carlo Computation," in Aspects of Uncertainty, eds. P. R. Freeman and A. F. M. Smith, pp. 181-199, Wiley, England.
Goutis, C. (1994), "Ranges of Posterior Measures for Some Classes of Priors with Specified Moments," International Statistical Review, 62, 245-256.
Heckman, J. (1990), "A Nonparametric Method of Moments Estimator for the Mixture of Exponentials Model and the Mixture of Geometrics Model," in Nonparametric and Semiparametric Estimation Methods in Econometrics and Statistics, eds. W. A. Barnett, J. Powell and G. Tauchen, pp. 243-258, Cambridge University Press, UK.
Heckman, J., and Walker, J. (1990), "Estimating Fecundability from Data on Waiting Times to First Conception," Journal of the American Statistical Association, 85, 283-294.
Heckman, J., Robb, R., and Walker, J. (1990), "Testing the Mixture of Exponentials Hypothesis and Estimating the Mixing Distribution by the Method of Moments," Journal of the American Statistical Association, 85, 582-589.
Holgate, P. (1970), "The Modality of Some Compound Poisson Distributions," Biometrika, 57, 666-667.
Kass, R. E., and Raftery, A. E. (1995), "Bayes Factors," Journal of the American Statistical Association, 90, 773-795.
Kass, R. E., and Steffey, D. (1989), "Approximate Bayesian Inference in Conditionally Independent Hierarchical Models (Parametric Empirical Bayes Models)," Journal of the American Statistical Association, 84, 717-726.
Kendall, M. G., and Stuart, A. (1961), The Advanced Theory of Statistics, Hafner, New York.
Laird, N. (1978), "Nonparametric Maximum Likelihood Estimation of a Mixing Distribution," Journal of the American Statistical Association, 64, 1459-1471.
Lindley, D. V., and Smith, A. F. M. (1972), "Bayes Estimates for the Linear Model (with discussion)," Journal of the Royal Statistical Society, Series B, 34, 1-41.
Lindsay, B. (1989), "Moment Matrices: Applications in Mixtures," Annals of Statistics, 17, 722-740.
Maritz, J. S., and Lwin, T. (1989), Empirical Bayes Methods, Chapman and Hall, London.
Morris, C. (1982), "Natural Exponential Families with Quadratic Variance Functions," Annals of Statistics, 10, 65-80.
Morris, C. (1983a), "Parametric Empirical Bayes Inference: Theory and Applications," Journal of the American Statistical Association, 78, 47-59.
Morris, C. (1983b), "Natural Exponential Families with Quadratic Variance Functions: Statistical Theory," Annals of Statistics, 11, 515-529.
Nelder, J. A. (1994), "A Re-analysis of the Pump Failure Data," Scandinavian Journal of Statistics, 21, 187-191.
O'Hagan, A. (1988), "Modelling with Heavy Tails," in Bayesian Statistics 3, eds. J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, pp. 345-359, Oxford University Press.
Pearson, E. S., Johnson, N. L., and Burr, I. W. (1979), "Comparisons of the Percentage Points of Distributions with the Same First Moments from Eight Different Systems of Frequency Curves," Communications in Statistics - Simulation and Computation, B 8, 191-229.
Robbins, H. (1964), "The Empirical Bayes Approach to Statistical Problems," Annals of Mathematical Statistics, 35, 1-20.
Rutherford, G. R., and Krutchkoff, R. G. (1967), "The Empirical Bayes Approach: Estimating the Prior Distribution," Biometrika, 54, 326-328.
Shohat, J. A., and Tamarkin, J. D. (1943), The Problems of Moments, American Mathematical Society, New York.
Silverman, B. (1986), Density Estimation for Statistics and Data Analysis, Chapman and Hall, London.
Smith, A. F. M., and Gelfand, A. E. (1992), "Bayesian Statistics Without Tears: a Sampling-Resampling Perspective," The American Statistician, 46, 84-88.
Tanner, M. (1991), Tools for Statistical Inference: Observed Data and Data Augmentation Methods, Springer-Verlag, New York.
Teicher, H. (1961), "Identifiability of Finite Mixtures," Annals of Mathematical Statistics, 28, 75-88.
Titterington, D. M., Smith, A. F. M., and Makov, U. E. (1985), Statistical Analysis of Finite Mixture Distributions, Wiley, UK.
Tucker, H. G. (1963), "An Estimate of the Compounding Distribution of a Compound Poisson Distribution," Theory of Probability and its Applications, 8, 195-200.
Xekalaki, E. (1984), "The Bivariate Generalized Waring Distribution and its Application to Accident Theory," Journal of the Royal Statistical Society, Series A, 147, 488-498.

List of Figures and Tables

- Table 1. Common distributions in $\mathcal{F}$. In the last three densities, $h(t)$ is a moment generating function of any density.
- Table 2. Normal groups data. Artificial data generated from 4 different normal densities.
- Table 3. Expected losses for different estimators. Replication of the example given by Maritz and Lwin (1989) where the expected losses of 9 different estimators are compared. Added to the list is the last estimator, $\delta_S$, which is based on our suggested methodology.
- Table 4. Empirical Bayes risks. Simulated results which compare empirical Bayes risks for a series of methods in a hierarchical model. The data come from a normal distribution and the priors for the means are taken as normal ($N$), shifted lognormal ($SLN$) and shifted exponential ($SE$) with mean 0 and variance 1.
- Figure 1. Posterior densities of failure rates. Posterior densities of the parameters $\lambda_i$ in the pump-failure data example. Solid and dotted lines represent results in the exchangeable and partially exchangeable case respectively.
- Figure 2. Posterior densities for group means. Posterior densities of the parameters $\mu_i$ in the normal groups example. Solid lines represent results when the moments are specified from the data, dotted lines represent results when the moments vary according to their sampling distribution.


[Figure 1: posterior densities of the failure rates, one panel per pump (Pump 1 to Pump 6, Pumps 7-8, Pump 9, Pump 10).]

[Figure 2: posterior densities of the four group means (groups 1-4).]