Chapter 1
Bayesian semi-parametric symmetric models for binary data

Márcio Augusto Diniz, Carlos Alberto de Bragança Pereira and Adriano Polpo
Abstract This work proposes a general Bayesian semi-parametric model for binary data. Symmetric prior probability curves are considered as an extension of the ideas discussed in [4], using the blocked Gibbs sampler, which is more general than the Polya urn Gibbs sampler. The semi-parametric approach makes it possible to incorporate the uncertainty around the distribution F of the latent data and to model distributions with heavier or lighter tails than the prior guess. In particular, the Bayesian semi-parametric Logistic model is introduced, which enables one to elicit prior distributions for the regression coefficients from information about odds ratios, a feature of great interest in applied research. This framework therefore opens several possibilities for dealing with binary data from the Bayesian perspective.
1.1 Introduction

Modeling binary data is a recurrent challenge in several applied research areas, and the Logistic regression popularized by [10] and the Probit model introduced by [6] are often the strategies adopted. These models are obtained when the probability curve of success is defined as a distribution function F, in these cases the Logistic and Normal distributions, evaluated at some covariates.

Márcio Augusto Diniz, Institute of Mathematics and Statistics - University of São Paulo, São Paulo, e-mail: [email protected]
Carlos Alberto de Bragança Pereira, Institute of Mathematics and Statistics - University of São Paulo, São Paulo, e-mail: [email protected]
Adriano Polpo, Department of Statistics - Federal University of São Carlos, São Carlos, e-mail: [email protected]
The distribution function F is usually symmetric with mean $\mu = 0$ and precision $\tau = 1$, so the probability of success for a binary response approaches zero at the same rate as it approaches one. Furthermore, F can be a scale mixture of a distribution G by a mixing distribution H.

In the Bayesian parametric approach, a binary random variable is treated by [1] as a discretization of a latent variable with distribution F. In that work, a scale mixture of distributions is considered in which G is the Normal distribution and H is the Gamma distribution, so that F is the t distribution. Consequently, this structure makes the t, Cauchy and Normal distributions available for describing the probability curve of success. The Kolmogorov-Smirnov test statistic distribution for H was suggested by [8] in the Bayesian parametric approach, resulting in the Logistic distribution for the probability curve. This is highly desirable since it allows one to elicit prior distributions for $\beta$, the vector of regression coefficients, by reasoning about odds ratios, which greatly eases communication with applied researchers.

Modeling under the Bayesian non-parametric approach through the Dirichlet process, proposed by [14], arises as a more flexible alternative to the previous models because it removes the need to fix the mixing distribution H, which means it is possible to model distributions with heavier or lighter tails than the prior guess. A semi-parametric model based on the same structure created by [1] was introduced by [4]. There, the mixing distribution H is unknown and treated as a random quantity, with a Dirichlet process taken as its distribution. The computational implementation is based on the Polya urn Gibbs sampler developed by [12], which is constructed from the Dirichlet process representation of [5]. Thus, the work presented in [4] resulted in the t, Cauchy and Normal distributions as options for the prior expected distribution of the probability curve of success, as in [1].

This work proposes a general Bayesian semi-parametric model for binary data. Symmetric probability curves are considered, extending the ideas discussed in [4] to the Logistic prior expected distribution through the blocked Gibbs sampler. The blocked Gibbs sampler was introduced by [15] and is more general and easier to implement than the Polya urn Gibbs sampler.

Section 1.2 defines the model and presents some concepts of the Bayesian non-parametric approach; in Section 1.3, the Gibbs sampler is established. Finally, the beetle data from [6] is revisited in Section 1.5, and concluding remarks are given in Section 1.6.
1.2 The model

The problem can be described by considering $Y_{il} \mid p_i \sim \mathrm{Bernoulli}(p_i)$ as independent binary random variables with $p_i = F(x_i'\beta \mid \mu, \tau)$ for $i = 1, \ldots, L$ and $l = 1, \ldots, n_i$, where F is a distribution function of a location-scale family $\mathcal{F} = \{F(\cdot \mid \mu, \tau) : \mu \in \Re, \tau > 0\}$, $x_i$ is a $p \times 1$ vector of covariates for the ith subject, and $\beta$ is a $p \times 1$ parameter vector. To simplify the notation, a specific distribution $F(\cdot \mid \mu = a, \tau = b)$ will be denoted simply by $F(\cdot \mid a, b)$. Following [1],

$$Y_{il} \mid W_{il} = 1_{(W_{il} > 0)}, \qquad (1.1)$$

where $W_{il} \mid \beta, \tau \sim F(\cdot \mid \mu = x_i'\beta, \tau)$ for $i = 1, \ldots, L$, because

$$p_i = E(Y_{il}) = P(Y_{il} = 1) = P(W_{il} > 0) = P(W_{il} - x_i'\beta > -x_i'\beta) = 1 - F(-x_i'\beta \mid \mu = 0, \tau) = F(x_i'\beta \mid \mu = 0, \tau), \qquad (1.2)$$
if F is symmetric around zero. Usually, it is assumed that $\tau = 1$; however, it is possible to give the modeling of W more flexibility when the distribution F is defined as a mixture of symmetric distributions G in a location-scale family $\mathcal{G}$, that is,

$$f(W \mid \beta, H) = \int g(W \mid x'\beta, \tau(\psi))\, dH(\psi). \qquad (1.3)$$

In a hierarchical perspective,

$$W_{il} \mid \beta, \tau_i \overset{ind}{\sim} G(W_{il} \mid \mu = x_i'\beta, \tau_i) \quad \text{for } i = 1, \ldots, L \text{ and } l = 1, \ldots, n_i, \qquad (1.4)$$

$$\tau_1, \ldots, \tau_L \mid H \overset{i.i.d.}{\sim} H, \qquad (1.5)$$
where $\tau_i = \tau(\psi_i)$. Under the Bayesian paradigm, prior distributions should be assigned to the unknown quantities,

$$\beta \mid \tau_\beta \sim N_p(0, \tau_\beta I_p), \qquad (1.6)$$

$$H \mid \alpha \sim \mathcal{P}(\alpha, H_0), \qquad (1.7)$$
where $\mathcal{P}(\alpha, H_0)$ is the Dirichlet process introduced by [14], $H_0$ is a distribution that indicates the expected distribution for H, and $\alpha$ is a parameter that controls the dispersion around $H_0$ relative to the sample size n. Moreover, $\alpha$ determines the expected number of clusters of $\tau$,

$$C(\alpha, n) = \sum_{i=1}^{n} \frac{\alpha}{\alpha + i - 1}. \qquad (1.8)$$
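Equation (1.8) is easy to evaluate numerically; the minimal sketch below (the function name `expected_clusters` is ours) reproduces the expected cluster counts quoted for the beetle data in Section 1.5:

```python
import numpy as np

def expected_clusters(alpha: float, n: int) -> float:
    """Expected number of clusters C(alpha, n) under a Dirichlet process,
    equation (1.8): sum_{i=1}^{n} alpha / (alpha + i - 1)."""
    i = np.arange(1, n + 1)
    return float(np.sum(alpha / (alpha + i - 1)))

# For the beetle data of Section 1.5 (n = 481 observations):
# expected_clusters(1.0, 481) ~ 6.75 and expected_clusters(2.0, 481) ~ 11.5,
# matching the values quoted there for fixed and (prior mean of) random alpha.
```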
Notice that the distribution $H_0$ should be chosen according to the prior expected distribution for W, denoted by $F_0$. In this regard, it is worth pointing out [2], who discussed scale mixtures of Normals and presented the relations used in [1] and [8]. Another point to highlight is that the posterior distribution for H is not a simple Dirichlet process, but a mixture of Dirichlet processes as defined by [3].

The computational implementation of mixtures of Dirichlet processes was an enormous difficulty until the Polya urn algorithm was presented in [12], since the previous algorithms could not sample adequately from the posterior distribution of H. The Polya urn algorithm is based on the representation of the Dirichlet process introduced by [5]. Although the algorithm is appropriate on several occasions, it is limited to situations where there is conjugacy between the $H_0$ and G distributions, and it suffers from slow mixing because of the one-at-a-time updates. Solutions for these limitations were presented by [16], [13] and [17], among others.

An alternative algorithm was developed by [15] based on the stick-breaking representation introduced by [19],

$$\mathcal{P}(\cdot) = \sum_{k=1}^{\infty} q_k\, \delta_{\tau_k}(\cdot), \qquad (1.9)$$

where $\delta_{\tau_k}$ is a discrete probability measure concentrated on $\tau_k$ such that $\tau_k \sim H_0$, and $q_k$ is a random variable independent of $\tau_k$ given by

$$q_1 = V_1, \qquad q_k = (1 - V_1)(1 - V_2) \times \cdots \times (1 - V_{k-1})\, V_k \quad \text{for } k \geq 2, \qquad \sum_{k=1}^{\infty} q_k = 1, \qquad (1.10)$$

where $V_k$, $k = 1, 2, \ldots$, are independent and identically distributed Beta(1, $\alpha$) random variables.
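The stick-breaking construction (1.9)-(1.10) is straightforward to simulate; the sketch below generates the weights of a truncated process (the truncation is discussed next, and the helper name `stick_breaking` is ours):

```python
import numpy as np

def stick_breaking(alpha: float, N: int, rng=np.random.default_rng()):
    """Draw the weights q_1, ..., q_N of a truncated stick-breaking
    representation, equation (1.10), with V_k ~ Beta(1, alpha) and
    V_N = 1 so that the weights sum exactly to one."""
    V = rng.beta(1.0, alpha, size=N)
    V[-1] = 1.0                      # truncation: last stick takes the rest
    pieces = np.concatenate(([1.0], np.cumprod(1.0 - V[:-1])))
    return V * pieces                # q_k = V_k * prod_{j<k} (1 - V_j)

# q = stick_breaking(alpha=1.0, N=100); q.sum() equals 1 up to rounding.
```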
The algorithm considers an approximation of the Dirichlet process obtained by truncating the sum in equation (1.9) at a finite number of terms N. The quality of this approximation was also established by [15],

$$\| f(\mathbf{W}) - f_N(\mathbf{W}) \|_1 \leq 4n \exp\{-(N-1)/\alpha\}, \qquad (1.11)$$

so the number of components of the truncated Dirichlet process, N, should be chosen from the sample size n and the dispersion parameter $\alpha$.
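Solving the bound (1.11) for N gives a simple rule for choosing the truncation level; a minimal sketch (the function name is ours) follows:

```python
import math

def truncation_level(n: int, alpha: float, delta: float = 1e-6) -> int:
    """Smallest N with 4*n*exp(-(N-1)/alpha) < delta, from bound (1.11)."""
    return math.ceil(1 + alpha * math.log(4 * n / delta))

# truncation_level(481, 1.0) -> 23, so the N = 100 used in Section 1.5 is
# far more than enough for the beetle data.
```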
The implementation of the algorithm requires that the model presented in equations (1.4)-(1.7) be rewritten by considering $\tau_i = Z_{K_i}$, where $K_i$, $i = 1, \ldots, L$, are classification variables that identify the variable $Z_k$ associated with each $\tau_i$. The model can then be described as
$$W_{il} \mid \beta, \mathbf{Z}, \mathbf{K} \overset{ind}{\sim} G(W_{il} \mid \mu = x_i'\beta, Z_{K_i}) \quad \text{for } i = 1, \ldots, L \text{ and } l = 1, \ldots, n_i,$$
$$\beta \mid \tau_\beta \sim N_p(0, \tau_\beta I_p),$$
$$Z_j \mid v \overset{i.i.d.}{\sim} H_0(v) \quad \text{for } j = 1, \ldots, N,$$
$$\mathbf{K} \mid \mathbf{q} \sim \mathrm{Multinomial}(L, q_1, \ldots, q_N),$$
$$\mathbf{q} \mid \alpha \sim GD_N(\alpha, 1), \qquad (1.12)$$

where GD is the Generalized Dirichlet distribution discussed by [9]. Finally, it is possible to present the algorithm derived from the model.
1.3 Blocked Gibbs Sampler

In this structure, the sampling of the posterior distribution of H is simplified and divided into sampling from $\mathbf{Z} \mid \mathbf{K}, \mathbf{W}, \beta, v$; $\mathbf{K} \mid \mathbf{q}, \mathbf{Z}, \mathbf{W}, \beta$; $\mathbf{q} \mid \alpha, \mathbf{K}$; and $\alpha \mid \mathbf{q}, c_1, c_2$. Moreover, there is the sampling of $\mathbf{W} \mid \mathbf{Z}, \mathbf{K}, \beta, \mathbf{Y}, \mathbf{X}$ and $\beta \mid \mathbf{W}, \mathbf{Z}, \mathbf{K}$.

The sampling of the complete conditional posterior distribution $\mathbf{Z} \mid \mathbf{K}, \mathbf{W}, \beta, v$ is divided into two parts,

$$Z_k \overset{i.i.d.}{\sim} h_0(Z_k \mid v) \quad \text{for } k \in \{1, \ldots, N\} - K^*, \qquad (1.13)$$

$$Z_k \overset{ind}{\sim} h_0(Z_k \mid v) \prod_{\{i:\, K_i = k\}} \prod_{l=1}^{n_i} g(W_{il} \mid x_i'\beta, Z_k) \quad \text{for } k \in K^*, \qquad (1.14)$$
where $K^* = \{K_1^*, \ldots, K_m^*\}$ is the set of unique values of the vector $\mathbf{K}$.

For the sampling of $\mathbf{K} \mid \mathbf{q}, \mathbf{Z}, \mathbf{W}, \beta$, each component $K_i$ follows a Multinomial distribution with probabilities given by

$$q_{i,k} \propto q_k\, Z_k^{n_i/2} \exp\left\{ -\frac{Z_k}{2} \sum_{l=1}^{n_i} (W_{il} - x_i'\beta)^2 \right\}, \qquad (1.15)$$

for $i = 1, \ldots, L$ and $k = 1, \ldots, N$. The Generalized Dirichlet is conjugate to the Multinomial distribution of $\mathbf{K}$, following [20]; thus $\mathbf{q} \mid \alpha, \mathbf{K}$ is also Generalized Dirichlet, with parameters

$$a_k^W = 1 + m_k, \qquad b_k^W = \alpha + \sum_{j=k+1}^{N} m_j, \qquad (1.16)$$
where $m_k$ is the number of $K_i$ equal to k, for $k = 1, \ldots, N - 1$.

The sampling of $\alpha \mid \mathbf{q}, c_1, c_2$ also takes advantage of conjugacy: given a Gamma$(c_1, c_2)$ prior, the posterior distribution is still Gamma, with parameters

$$c_1^W = N + c_1 - 1, \qquad c_2^W = c_2 - \ln(q_N). \qquad (1.17)$$
As in Bayesian linear regression with Normal errors, $\beta \mid \mathbf{W}, \mathbf{Z}, \mathbf{K}$ follows the $N_p(\mu^W, \Sigma^W)$ distribution, where $\Sigma$ denotes the diagonal matrix of the latent precisions $Z_{K_i}$ and

$$\mu^W = \left( \tau_\beta I_p + \mathbf{X}'\Sigma\mathbf{X} \right)^{-1} \mathbf{X}'\Sigma\mathbf{W}, \qquad \Sigma^W = \tau_\beta I_p + \mathbf{X}'\Sigma\mathbf{X}, \qquad (1.18)$$

so that $\Sigma^W$ plays the role of the posterior precision matrix.
The sampling of $\mathbf{W} \mid \mathbf{Z}, \mathbf{K}, \beta, \mathbf{Y}$ corresponds to sampling from the $g(\mathbf{W} \mid \mathbf{Z}, \mathbf{K}, \beta)$ distribution truncated by $\mathbf{Y}$ as defined in (1.1). Following [18], it is easy to generate univariate truncated Normal variables,

$$\pi(\mathbf{W} \mid \mathbf{Z}, \mathbf{K}, \beta, \mathbf{Y}, \mathbf{X}) \propto \prod_{i=1}^{L} \prod_{l=1}^{n_i} g(W_{il} \mid x_i'\beta, Z_{K_i})\, 1_{\{Y_{il} = 1_{(W_{il} > 0)}\}}. \qquad (1.19)$$
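A minimal sketch of this truncated sampling step using scipy, assuming Normal kernels G as in Section 1.3.1 (the helper name `sample_latent` is ours):

```python
import numpy as np
from scipy.stats import truncnorm

def sample_latent(y, mean, precision, rng=np.random.default_rng()):
    """Draw W_il from g(. | x_i' beta, Z_{K_i}) truncated to (0, inf)
    when y_il = 1 and to (-inf, 0] when y_il = 0, as in (1.19)."""
    sd = 1.0 / np.sqrt(precision)
    # truncnorm takes bounds standardized as (bound - mean) / sd
    lower = np.where(y == 1, (0.0 - mean) / sd, -np.inf)
    upper = np.where(y == 1, np.inf, (0.0 - mean) / sd)
    return truncnorm.rvs(lower, upper, loc=mean, scale=sd, random_state=rng)
```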
The algorithm that samples the posterior distribution $\mathbf{W}, H, \beta \mid \mathbf{Y}, \mathbf{X}$ is defined by:

1. Define initial values for $\mathbf{W}$, $\mathbf{Z}$, $\mathbf{K}$, $\mathbf{q}$, $\alpha$ and $\beta$;
2. Generate H from $\pi(H \mid \beta, \mathbf{W})$ by parts:
   i. $\mathbf{Z} \sim \pi(\mathbf{Z} \mid \mathbf{K}, \mathbf{W}, \beta, v)$ defined in (1.13) and (1.14);
   ii. $\mathbf{K} \sim \pi(\mathbf{K} \mid \mathbf{q}, \alpha, \mathbf{Z}, \mathbf{W}, \beta)$ detailed in (1.15);
   iii. $\mathbf{q} \sim GD_N(a^W, b^W)$ with parameters defined in (1.16);
   iv. $\alpha \sim \mathrm{Gamma}(c_1^W, c_2^W)$ with parameters given in (1.17);
3. Generate $\beta \sim N_p(\mu^W, \Sigma^W)$ with parameters presented in (1.18);
4. Generate $\mathbf{W} \sim \pi(\mathbf{W} \mid \mathbf{Z}, \mathbf{K}, \beta, \mathbf{Y}, \mathbf{X})$ expressed in (1.19).

A code sketch of this loop is given below.
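The skeleton below illustrates one possible organization of steps 1-4 in Python; it is a sketch, not the authors' implementation. The helpers `sample_K`, `sample_q` and `sample_beta` are hypothetical names standing in for the conditional draws (1.15)-(1.18), while `sample_latent` is sketched above and a Gamma-kernel `sample_Z` is sketched in Section 1.3.1:

```python
import numpy as np

def blocked_gibbs(y, X, groups, N=100, c1=4.0, c2=2.0, tau_beta=1e-3,
                  n_iter=10_000, rng=np.random.default_rng()):
    """Skeleton of the blocked Gibbs sampler of Section 1.3.
    y: binary responses; X: L x p design matrix (one row per group);
    groups: group index of each observation; N: DP truncation level."""
    L, p = X.shape
    # step 1: initial values
    beta = np.zeros(p)
    Z = np.ones(N)                       # precisions Z_j ~ H_0
    K = rng.integers(0, N, size=L)       # classification variables
    q = np.full(N, 1.0 / N)              # stick-breaking weights
    alpha, W = 1.0, np.where(y == 1, 0.5, -0.5)
    draws = []
    for _ in range(n_iter):
        Z = sample_Z(K, W, beta, X, groups, N)       # (1.13)-(1.14)
        K = sample_K(q, Z, W, beta, X, groups)       # (1.15)
        q = sample_q(alpha, K, N)                    # GD_N(a^W, b^W), (1.16)
        alpha = rng.gamma(N + c1 - 1,                # (1.17)
                          1.0 / (c2 - np.log(q[-1])))
        beta = sample_beta(W, Z, K, X, tau_beta)     # N_p(mu^W, Sigma^W), (1.18)
        W = sample_latent(y, X[groups] @ beta, Z[K[groups]])  # (1.19)
        draws.append((beta.copy(), Z.copy(), q.copy(), alpha))
    return draws
```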
The next two subsections present the scale Normal mixture with a Gamma distribution for the scale parameter, as an alternative version of the work presented in [4], and the Kolmogorov-Smirnov distribution, which provides the Logistic distribution as the prior expected distribution of the Dirichlet process. Other distributions can be constructed, since the algorithm is general enough.
1.3.1 Gamma distribution

If $\tau(\psi) = \psi$ with $\psi \sim \mathrm{Gamma}(v/2, v/2)$, whose density is

$$h_0(\tau) = \frac{(v/2)^{v/2}}{\Gamma(v/2)}\, \tau^{v/2 - 1} \exp\left\{ -\frac{v}{2}\tau \right\} 1_{(\tau > 0)}, \qquad (1.20)$$
it follows that $F_0$ is the $t_v(0, 1)$ distribution. If $v = 1$, $F_0$ is the Cauchy(0, 1) distribution, while if $v \to \infty$, $F_0$ is the Normal(0, 1) distribution. Notice that the Student's t distribution with $v > 30$ is already considered a reasonable approximation to the Normal distribution. The posterior distribution in (1.14) is then a Gamma distribution with parameters

$$v_1^W = \frac{v}{2} + \frac{m_k}{2}, \qquad v_2^W = \frac{v}{2} + \frac{1}{2} \sum_{\{i:\, K_i = k\}} \sum_{l=1}^{n_i} (W_{il} - x_i'\beta)^2. \qquad (1.21)$$
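A minimal sketch of this conjugate update, assuming Normal kernels and matching the `sample_Z` helper named in the skeleton of Section 1.3 (here the shape counts the latent observations allocated to component k, which is what the product in (1.14) implies):

```python
import numpy as np

def sample_Z(K, W, beta, X, groups, N, v=4.0, rng=np.random.default_rng()):
    """Update Z: unoccupied components are drawn from the Gamma(v/2, v/2)
    prior as in (1.13); occupied ones from the Gamma posterior (1.21)."""
    Z = rng.gamma(v / 2.0, 2.0 / v, size=N)          # prior draws, (1.13)
    resid2 = (W - X[groups] @ beta) ** 2
    for k in np.unique(K):
        mask = K[groups] == k
        shape = v / 2.0 + mask.sum() / 2.0           # v_1^W
        rate = v / 2.0 + resid2[mask].sum() / 2.0    # v_2^W
        Z[k] = rng.gamma(shape, 1.0 / rate)          # numpy uses scale = 1/rate
    return Z
```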
1.3.2 Kolmogorov-Smirnov distribution

If $\tau(\psi)$ is defined as

$$\tau(\psi) = \left( \frac{1}{2\psi} \right)^2 \quad \text{with} \quad \psi \sim \text{Kolmogorov-Smirnov}, \qquad (1.22)$$

where the Kolmogorov-Smirnov density, following [11], is

$$h_0(\psi) = 8 \sum_{n=1}^{\infty} (-1)^{n+1} n^2 \psi \exp\{-2 n^2 \psi^2\}\, 1_{(\psi > 0)}, \qquad (1.23)$$
it follows that $F_0$ is the Logistic(0, 1) distribution.

It is not simple to evaluate the Kolmogorov-Smirnov density, since the summation limit is not finite, although it is possible to generate these random variables following [11], who presented an algorithm based on the Alternating Series method. Differently from the last subsection, it is preferable to work with the random variable $\psi_k$ when calculating the posterior distribution, which results in

$$\pi(\psi_k \mid \mathbf{K}, \mathbf{W}, \beta) \propto 8 \sum_{n=1}^{\infty} (-1)^{n+1} n^2 \psi_k \exp\{-2 n^2 \psi_k^2\} \times \left( \frac{1}{4\psi_k^2} \right)^{m_k/2} \exp\left\{ -\frac{1}{8\psi_k^2} \sum_{\{i:\, K_i = k\}} \sum_{l=1}^{n_i} (W_{il} - x_i'\beta)^2 \right\}. \qquad (1.24)$$
This posterior distribution has no known closed form. Here the blocked Gibbs sampler introduced by [15] becomes extremely interesting, because a Metropolis-Hastings step can be added without difficulty to sample from (1.14). In the Bayesian parametric approach, a proposal distribution for the same distribution in (1.24) was suggested by [8], who considered the empirical relation between the $t_v$ and Logistic distributions discussed by [1].
In this way, the proposal distribution is defined by $\psi_k^2 \sim$ Inverse Gamma with parameters

$$v_1^W = \frac{v}{2} + \frac{m_k}{2}, \qquad v_2^W = \frac{1}{8}\left[ \frac{v}{b^2} + \sum_{\{i:\, K_i = k\}} \sum_{l=1}^{n_i} (W_{il} - x_i'\beta)^2 \right], \qquad (1.25)$$

and the acceptance probability of a value $\psi_k^*$ generated from the proposal distribution is

$$\lambda = \frac{h_0(\psi_k^*)/h_0^a(\psi_k^*)}{h_0(\psi_k)/h_0^a(\psi_k)}, \qquad (1.26)$$
where $h_0^a$ denotes the proposal density.

It is necessary to evaluate the Kolmogorov-Smirnov density to calculate the acceptance probability; therefore, [8] presented a limit to truncate the infinite sum. The limit is based on the decomposition of the density in an alternating series,

$$h_0(\psi) = c\, f_d(\psi) \sum_{n=0}^{\infty} (-1)^n a_n(\psi),$$

where c is a constant, $f_d$ is a density that is easy to generate from, and $a_n$ is a monotone decreasing sequence. The first decomposition is

$$c\, f_d(\psi) = 8\psi \exp\{-2\psi^2\}, \qquad a_n(\psi) = (n+1)^2 \exp\{-2\psi^2((n+1)^2 - 1)\}, \qquad (1.27)$$

such that $a_n \searrow 0$ for $\psi > \sqrt{1/3}$, while the second decomposition is given by

$$c\, f_d(\psi) = \frac{\sqrt{2\pi}\,\pi^2}{4\psi^4} \exp\left\{ -\frac{\pi^2}{8\psi^2} \right\}, \qquad a_n(\psi) = \begin{cases} \dfrac{4\psi^2}{\pi^2} \exp\left\{ -\dfrac{(n^2 - 1)\pi^2}{8\psi^2} \right\} & \text{if } n \text{ is odd}, \\[2mm] (n+1)^2 \exp\left\{ -\dfrac{((n+1)^2 - 1)\pi^2}{8\psi^2} \right\} & \text{if } n \text{ is even}, \end{cases} \qquad (1.28)$$

with $a_n \searrow 0$ for $\psi < \pi/2$. Observe that the convergence intervals of the two series intersect, so $\psi = 0.75$ is chosen as the cutoff between them, as in [11]. The limit to truncate the sum in (1.23) follows,

$$n^* = \inf\{n : c\, f_d(\psi)\, a_n(\psi) < \delta\}, \qquad (1.29)$$

where $\delta$ is the precision of the approximation.
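A sketch of this truncated evaluation, using the right-tail series (1.27) for $\psi > 0.75$ and the left-tail series (1.28) otherwise (a minimal version; the reconstruction of (1.27)-(1.28) above follows the alternating-series treatment in [11]):

```python
import math

def ks_density(psi: float, delta: float = 1e-10) -> float:
    """Evaluate the Kolmogorov-Smirnov density (1.23), truncating the
    alternating series at n* as in (1.29)."""
    if psi <= 0.0:
        return 0.0
    if psi > 0.75:
        # right-tail decomposition (1.27)
        cfd = 8.0 * psi * math.exp(-2.0 * psi ** 2)

        def a(n):
            return (n + 1) ** 2 * math.exp(-2.0 * psi ** 2 * ((n + 1) ** 2 - 1))
    else:
        # left-tail decomposition (1.28)
        cfd = (math.sqrt(2.0 * math.pi) * math.pi ** 2 / (4.0 * psi ** 4)
               * math.exp(-math.pi ** 2 / (8.0 * psi ** 2)))

        def a(n):
            if n % 2 == 1:
                return (4.0 * psi ** 2 / math.pi ** 2
                        * math.exp(-(n ** 2 - 1) * math.pi ** 2 / (8.0 * psi ** 2)))
            return ((n + 1) ** 2
                    * math.exp(-((n + 1) ** 2 - 1) * math.pi ** 2 / (8.0 * psi ** 2)))
    total, sign, n = 0.0, 1.0, 0
    while cfd * a(n) >= delta:           # truncation rule (1.29)
        total += sign * a(n)
        sign, n = -sign, n + 1
    return cfd * total
```

Both branches agree in the overlap region; for example, `ks_density(0.6)` and a direct evaluation of the partial sums of (1.23) both give roughly 1.32.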
1.4 Predictive distribution

A direct by-product of the blocked Gibbs sampler is the predictive distribution for a new observation $W_{(n+1)l}$ and, consequently, for $Y_{(n+1)l}$, $l = 1, \ldots, L$. Note that

$$f(W_{(n+1)l} \mid \mathbf{Y}, \mathbf{X}) = \int g(W_{(n+1)l} \mid x_{(n+1)}'\beta, \tau)\, d\pi(\beta, \tau \mid \mathbf{Y}, \mathbf{X}) = \int\!\!\int g(W_{(n+1)l} \mid x_{(n+1)}'\beta, \tau)\, d\pi(\tau \mid H)\, d\pi(H, \beta \mid \mathbf{Y}, \mathbf{X}). \qquad (1.30)$$

Considering $H \sim \mathcal{P}(\alpha, H_0)$, the inner integral in (1.30) can be approximated by

$$\int g(W_{(n+1)l} \mid x_{(n+1)}'\beta, \tau)\, d\pi(\tau \mid H) \approx \sum_{k=1}^{N} q_k\, g(W_{(n+1)l} \mid x_{(n+1)}'\beta, Z_k). \qquad (1.31)$$

The approximated predictive distribution $f(W_{(n+1)l} \mid \mathbf{Y}, \mathbf{X})$ follows as

$$\frac{1}{B} \sum_{b=1}^{B} \sum_{k=1}^{N} q_k^{(b)}\, g(W_{(n+1)l} \mid x_{(n+1)}'\beta^{(b)}, Z_k^{(b)}), \qquad (1.32)$$

where $(\mathbf{q}^{(b)}, \mathbf{Z}^{(b)}, \beta^{(b)})$ is the bth element of the sample generated by the Gibbs sampler. From equations (1.30) and (1.31), the predictive distribution for $Y_{(n+1)l}$ is given by

$$P(Y_{(n+1)l} = 1 \mid \mathbf{Y}, \mathbf{X}) \approx \frac{1}{B} \sum_{b=1}^{B} \sum_{k=1}^{N} q_k^{(b)}\, G(x_{(n+1)}'\beta^{(b)} \mid \mu = 0, Z_k^{(b)}). \qquad (1.33)$$
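A minimal Monte Carlo sketch of (1.33), assuming Normal kernels so that $G(\cdot \mid \mu = 0, Z_k)$ is a Normal CDF with precision $Z_k$ (`draws` is the output of the skeleton in Section 1.3):

```python
import numpy as np
from scipy.stats import norm

def predict_prob(x_new, draws):
    """Approximate P(Y = 1 | Y, X) for a new covariate vector x_new
    via (1.33), averaging over the Gibbs draws."""
    probs = []
    for beta, Z, q, _alpha in draws:
        eta = x_new @ beta
        # mixture of Normal CDFs with precisions Z_k and weights q_k
        probs.append(np.sum(q * norm.cdf(eta * np.sqrt(Z))))
    return float(np.mean(probs))
```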
1.4.1 Conditional Predictive Ordinate

The ith conditional predictive ordinate (CPO) is constructed on $T_i = \sum_{l=1}^{n_i} Y_{il}$, which follows a Binomial$(n_i, p_i)$ distribution with $p_i = F(x_i'\beta)$. Let $T_{[i]}$ be the vector of variables $T_1, \ldots, T_L$ excluding $T_i$; then

$$\mathrm{CPO}_i = P(T_i = t_i \mid T_{[i]}) = \left[ \frac{1}{B} \sum_{b=1}^{B} P(T_i = t_i \mid p_i^{(b)})^{-1} \right]^{-1}, \qquad (1.34)$$

with

$$p_i^{(b)} = \sum_{k=1}^{N} q_k^{(b)}\, G(x_i'\beta^{(b)} \mid \mu = 0, Z_k^{(b)}). \qquad (1.35)$$
Then, the sum of the logged CPOs (SLCPO) can be used as an estimator of the logarithm of the marginal likelihood of the model.
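A sketch of (1.34)-(1.35) and the SLCPO, again assuming Normal kernels; the harmonic-mean form of the CPO is computed on the log scale for numerical stability (the function name `slcpo` is ours):

```python
import numpy as np
from scipy.stats import binom, norm
from scipy.special import logsumexp

def slcpo(t, n_trials, X, draws):
    """Sum of logged CPOs over the L groups, using (1.34)-(1.35).
    t: successes per group; n_trials: group sizes; X: L x p design."""
    B = len(draws)
    log_inv = np.empty((B, len(t)))          # log 1 / P(T_i = t_i | p_i^(b))
    for b, (beta, Z, q, _alpha) in enumerate(draws):
        eta = X @ beta
        p = norm.cdf(np.outer(eta, np.sqrt(Z))) @ q   # (1.35)
        log_inv[b] = -binom.logpmf(t, n_trials, p)
    log_cpo = np.log(B) - logsumexp(log_inv, axis=0)  # (1.34) on log scale
    return float(np.sum(log_cpo))
```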
1.5 Beetle Data

A study of beetle mortality after five hours of exposure to gaseous carbon disulphide is reported in [6]. It is a classical data set that was originally fitted through a Normal model. The Bayesian parametric (BP) approach was considered, as well as the Bayesian semi-parametric (BSP) approach with fixed $\alpha = 1$ and with random $\alpha \sim$ Gamma(4, 2), which corresponds to a minimal degree of faith in $F_0$: the expected number of clusters for $\tau$ from equation (1.8) is 6.75 and 11.51 for fixed and random $\alpha$, respectively. Normal and Logistic models were considered to fit the data. The prior distribution for $\beta$ is Normal(0, 1/1000), and the number of components in the approximation of the Dirichlet process is N = 100, so that the quality of the approximation is verified through equation (1.11) in both cases.

The MCMC convergence was evaluated for $\beta$ using the [7] statistic, with a thinning interval chosen so that the autocorrelation function is smaller than 0.2. The resulting MCMC settings are shown in Table 1.1, and the estimates for $\beta$ with their 95% credibility intervals are presented in Table 1.2.

The Bayesian semi-parametric approach requires more computational effort than the Bayesian parametric approach, which is expected since the former is a more general model. Moreover, the Logistic models demand larger burn-in and thinning than the Normal models. The estimates are very similar within each class of models regardless of the approach, but the credibility intervals are wider for the Bayesian semi-parametric models. There is no significant difference between the semi-parametric models with fixed and random $\alpha$. Finally, the Bayesian semi-parametric models present better (larger) SLCPO values in both classes of models. In particular, the Logistic models benefit most from the semi-parametric approach, as can be seen in Figures 1.1, 1.2 and 1.3.

Table 1.1 Beetle data - MCMC conditions for the Bayesian models

Model                   burn-in   thin
Normal BP                  2400     11
Normal BSP fixed α         5125     21
Normal BSP random α        3000     22
Logistic BP                4420     20
Logistic BSP fixed α      15542     81
Logistic BSP random α     13829     82
Table 1.2 Beetle data - Estimates for the Bayesian models

Model                        Estimate   95% CI                SLCPO
Normal BP                                                     -19.126
  β0                           -8.683   (-10.246 ; -7.252)
  β1                            0.146   (0.122 ; 0.171)
Normal BSP fixed α                                            -18.482
  β0                           -8.699   (-10.669 ; -6.977)
  β1                            0.146   (0.118 ; 0.180)
Normal BSP random α                                           -18.558
  α                             0.541   (0.135 ; 1.268)
  β0                           -8.753   (-10.764 ; -7.078)
  β1                            0.147   (0.119 ; 0.181)
Logistic BP                                                   -22.846
  β0                          -12.028   (-14.481 ; -9.933)
  β1                            0.202   (0.166 ; 0.240)
Logistic BSP fixed α                                          -18.618
  β0                          -12.236   (-16.526 ; -8.607)
  β1                            0.206   (0.145 ; 0.278)
Logistic BSP random α                                         -18.410
  α                             0.577   (0.146 ; 1.402)
  β0                          -12.335   (-16.869 ; -8.665)
  β1                            0.208   (0.146 ; 0.284)
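For reference, a sketch of the data setup: the values below are the widely reprinted Bliss (1935) dose-mortality table (not printed in this chapter), and the call at the end assumes the hypothetical helpers sketched in Section 1.3. The coefficient scale in Table 1.2 suggests the CS2 concentration itself, not its logarithm, was used as the covariate (e.g. $-\beta_0/\beta_1 \approx 59.5$ mg/l, a plausible LD50 on that scale).

```python
import numpy as np

# Bliss (1935) beetle mortality data, as widely reprinted: CS2 concentration
# (mg/l), number of beetles exposed and number killed in 8 dose groups
# (481 beetles in total).
conc   = np.array([49.06, 52.99, 56.91, 60.84, 64.76, 68.69, 72.61, 76.54])
n_i    = np.array([59, 60, 62, 56, 63, 59, 62, 60])
killed = np.array([ 6, 13, 18, 28, 52, 53, 61, 60])

X = np.column_stack([np.ones_like(conc), conc])   # intercept + dose
groups = np.repeat(np.arange(len(conc)), n_i)     # observation -> group map
y = np.concatenate([np.r_[np.ones(k), np.zeros(n - k)]
                    for k, n in zip(killed, n_i)])

# draws = blocked_gibbs(y, X, groups, N=100)      # skeleton of Section 1.3
```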
Fig. 1.1 Normal and Logistic Bayesian parametric models
Fig. 1.2 Normal and Logistic Bayesian semi-parametric with fixed α models
Fig. 1.3 Normal and Logistic Bayesian semi-parametric with random α models
1.6 Concluding Remarks

This work presents a Bayesian semi-parametric model for binary data that improves on [4] through the use of the blocked Gibbs sampler, which provides a more general framework than previous works based on the parametric approach or on the Polya urn Gibbs sampler.

The semi-parametric approach makes it possible to incorporate the uncertainty around the distribution F of the latent data and to model heavy-tailed curves. The Logistic Bayesian semi-parametric model allows one to elicit prior distributions for the regression coefficients through odds-ratio information without losing the flexibility of modeling heavier-tailed or lighter-tailed distributions. Future work is to expand the model to encompass asymmetric distributions.
References

1. Albert, J.H., Chib, S.: Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association 88(422), 669-679 (1993)
2. Andrews, D.F., Mallows, C.L.: Scale mixtures of normal distributions. Journal of the Royal Statistical Society, Series B (Methodological), pp. 99-102 (1974)
3. Antoniak, C.E.: Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics, pp. 1152-1174 (1974)
4. Basu, S., Mukhopadhyay, S.: Binary response regression with normal scale mixture links. In: D.K. Dey, S.K. Ghosh, B.K. Mallick (eds.) Generalized Linear Models: A Bayesian Perspective, pp. 231-241. Marcel Dekker, New York (1998)
5. Blackwell, D., MacQueen, J.B.: Ferguson distributions via Pólya urn schemes. The Annals of Statistics, pp. 353-355 (1973)
6. Bliss, C.I.: The calculation of the dosage-mortality curve. Annals of Applied Biology 22(1), 134-167 (1935)
7. Brooks, S.P., Gelman, A.: General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics 7(4), 434-455 (1998)
8. Chen, M.H., Dey, D.K.: Bayesian modeling of correlated binary responses via scale mixture of multivariate normal link functions. Sankhyā: The Indian Journal of Statistics, Series A, pp. 322-343 (1998)
9. Connor, R.J., Mosimann, J.E.: Concepts of independence for proportions with a generalization of the Dirichlet distribution. Journal of the American Statistical Association 64(325), 194-206 (1969)
10. Cox, D.R.: The Analysis of Binary Data. Chapman and Hall, London (1970)
11. Devroye, L.: Non-uniform Random Variate Generation. Springer-Verlag (1986)
12. Escobar, M.D.: Estimating normal means with a Dirichlet process prior. Journal of the American Statistical Association 89(425), 268-277 (1994)
13. Escobar, M.D., West, M.: Computing nonparametric hierarchical models. In: D.K. Dey, P. Müller, D. Sinha (eds.) Practical Nonparametric and Semiparametric Bayesian Statistics, pp. 1-22. Springer, New York (1998)
14. Ferguson, T.S.: A Bayesian analysis of some nonparametric problems. The Annals of Statistics, pp. 209-230 (1973)
15. Ishwaran, H., Zarepour, M.: Markov chain Monte Carlo in approximate Dirichlet and beta two-parameter process hierarchical models. Biometrika 87(2), 371-390 (2000)
16. MacEachern, S.N.: Estimating normal means with a conjugate style Dirichlet process prior. Communications in Statistics - Simulation and Computation 23(3), 727-741 (1994)
17. MacEachern, S.N., Müller, P.: Estimating mixture of Dirichlet process models. Journal of Computational and Graphical Statistics 7(2), 223-238 (1998)
18. Robert, C.P.: Simulation of truncated normal variables. Statistics and Computing 5(2), 121-125 (1995)
19. Sethuraman, J.: A constructive definition of Dirichlet priors. Statistica Sinica 4, 639-650 (1994)
20. Wong, T.T.: Generalized Dirichlet distribution in Bayesian analysis. Applied Mathematics and Computation 97(2), 165-181 (1998)