Stat Papers (2012) 53:821–832 DOI 10.1007/s00362-011-0385-2 REGULAR ARTICLE

Bayesian nonparametric inference for unimodal skew-symmetric distributions

A. Ghalamfarsa Mostofi · M. Kharrati-Kopaei

Received: 30 June 2010 / Revised: 25 May 2011 / Published online: 16 June 2011 © Springer-Verlag 2011

Abstract This paper studies the case where the observations come from a unimodal and skew density function with an unknown mode. The skew-symmetric representation of such a density has a symmetric component which can be written as a scale mixture of uniform densities. A Dirichlet process (DP) prior is assigned to the mixing distribution. We also assume prior distributions for the mode and the skewed component. A computational approach is used to obtain the Bayes estimates of the components. An example is given to illustrate the approach.

Keywords Dirichlet process · Gibbs sampler · Skew-symmetric · Skewness · Unimodal density

Mathematics Subject Classification (2000) 62G99 · 62F15

1 Introduction

The usual assumption in classical statistical analysis is that the distribution of the data is symmetric. Sometimes this condition is a desirable property and sometimes it is a necessary one. However, in practice we frequently encounter data with an asymmetric distribution. The problem of modeling skewed data has attracted considerable interest in recent years, and it is desirable to have a flexible class of distributions that covers departures from symmetry. For example, the family of skew normal (SN) distributions allows for continuous variation from normality to non-normality.

A. Ghalamfarsa Mostofi (B) · M. Kharrati-Kopaei
Department of Statistics, College of Sciences, Shiraz University, 71454 Shiraz, Iran
e-mail: [email protected]

M. Kharrati-Kopaei
e-mail: [email protected]


This family was formally introduced by Azzalini (1985). In detail, a random variable Z_λ is said to have a SN distribution with skewness parameter λ if it has the density function φ(z; λ) = 2φ(z)Φ(λz), where φ(·) and Φ(·) denote the standard normal density and distribution function, respectively. This family of distributions and its extensions have been studied by many authors, including Azzalini (1986), Chiogna (1998), Gupta and Gupta (2004), Gupta et al. (2004), Henze (1986), and Pewsey (2000). Several authors introduced multivariate versions of the SN density, see e.g., Azzalini (2005), Azzalini and Capitanio (1999), Azzalini and Dalla Valle (1996), Gupta et al. (2004), and Liseo and Loperfido (2003). There are also other skew-symmetric distributions which can be generated by a symmetric kernel. The construction of such models is based on the following density function

f(y) = 2 f_0(y) H(w(y)),   (1)

where f_0(·) is a probability density function symmetric about zero, H(·) is the cumulative distribution function of a continuous random variable that is symmetric about zero, and w(·) is an odd function (i.e. w(−y) = −w(y)). Since w(y) ≡ 0 gives f = f_0, the set of densities which can be obtained from a given 'basis' f_0(·) includes this symmetric density. Actually, the density function in (1) can be regarded as a generalization of the SN distribution. Note that the class of skew-symmetric distributions in (1) is a rich family and includes many skew-symmetric distributions such as the SN, skew Laplace and skew Cauchy distributions; for more details see Nadarajah and Kotz (2003) and references therein. Genton and Loperfido (2005) introduced the density function

f(y) = 2 f_0(y) π(y),   (2)

where π(·) is a skewing function which satisfies 0 ≤ π(y) ≤ 1 and π(−y) = 1 − π(y). This density is called a skew-symmetric density with symmetric component f_0(·) and skewed component π(·). Wang et al. (2004) proved that any probability density function f(·) has a unique skew-symmetric representation of the form (2), and they also showed that the class of skew-symmetric distributions described by (1) is the same as that described by (2). In recent years, many authors have considered the class of distributions in (1). They usually assumed a specific form for f_0(·) and H(w(·)) and then parameterized f(·) by some special parameters such as location, scale and skewness parameters. Estimating the parameters and fitting the proposed density function to a data set are the usual inferences which are made about this class of skew-symmetric distributions. For example, Azzalini (2005) discussed the problem of estimating the parameters of a SN distribution; see also Fruhwirth-Schnatter and Pyne (2010) and Arellano-Valle et al. (2007).
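The uniqueness can be seen constructively: f_0(y) = {f(y) + f(−y)}/2 and π(y) = f(y)/{f(y) + f(−y)} recover the two components of (2) from any density f. The short Python sketch below (ours, not part of the paper) checks this on the SN density 2φ(y)Φ(λy); the function names and the value of λ are illustrative only.

import numpy as np
from scipy.stats import norm

def skew_symmetric_decomposition(f, y):
    """Unique decomposition f(y) = 2 * f0(y) * pi(y), with f0 symmetric about zero
    and pi(-y) = 1 - pi(y), in the sense of Wang et al. (2004)."""
    fy, fmy = f(y), f(-y)
    f0 = 0.5 * (fy + fmy)                               # symmetric component
    pi = np.where(fy + fmy > 0, fy / (fy + fmy), 0.5)   # skewing function
    return f0, pi

# Check on the SN density 2*phi(y)*Phi(lam*y) with an illustrative lam = 3.
lam = 3.0
sn_pdf = lambda y: 2.0 * norm.pdf(y) * norm.cdf(lam * y)
y = np.linspace(-4.0, 4.0, 9)
f0, pi = skew_symmetric_decomposition(sn_pdf, y)
assert np.allclose(f0, norm.pdf(y))          # symmetric component is the N(0,1) density
assert np.allclose(pi, norm.cdf(lam * y))    # skewing function is Phi(lam*y)
assert np.allclose(2.0 * f0 * pi, sn_pdf(y))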


However, as far as we know, there is no approach to non-parametric inference for f_0(·) and H(·). In this paper, one of our interests is estimating these two components. For this purpose, we suppose that the random variable X has the following unimodal density function with a mode at η

f_X(x; η, λ) = 2 f_0(x − η) H(w(x − η)).   (3)
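It may help to see (3) in action: for any symmetric density f_0, symmetric continuous distribution function H and odd function w, the right-hand side integrates to one. A minimal Python sketch with purely illustrative choices (a Laplace f_0, a standard normal H, w(y) = 2y and η = 1.5, none of which are prescribed by the paper):

import numpy as np
from scipy.stats import laplace, norm
from scipy.integrate import quad

def skew_symmetric_pdf(x, eta, w, f0=laplace.pdf, H=norm.cdf):
    """Density (3): 2 * f0(x - eta) * H(w(x - eta)), with illustrative f0, H and odd w."""
    y = x - eta
    return 2.0 * f0(y) * H(w(y))

# The construction integrates to one for any odd w, since H(w(y)) + H(w(-y)) = 1.
w = lambda y: 2.0 * y                              # one possible odd function (our choice)
total, _ = quad(skew_symmetric_pdf, -np.inf, np.inf, args=(1.5, w))
print(round(total, 6))                             # ~1.0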

For simplicity, we consider the odd function w(y) = λy, where λ is an unknown parameter. The objective is to estimate H(·), f_0(·), λ and η from the data set x_1, ..., x_n. To do this, f_0(·) and H(·) are first represented as mixtures of known distributions. Then we take a Bayesian approach by assuming that the mixing distributions have Dirichlet process (DP) priors and the other parameters have prior distributions (e.g. non-informative distributions).

The organization of the paper is as follows. In Sect. 2, we propose a Bayesian semiparametric approach to estimate H(·), f_0(·), η and λ. We also present an algorithm based on the Gibbs sampling method to obtain realizations from the posterior distributions. These realizations are used to obtain Bayes estimates of the parameters and components of interest. Section 3 gives a numerical example to illustrate how the approach is applied.

2 The method

In this section, we propose a Bayesian semiparametric model which is based on the skew-symmetric density function (1) with the function w(y) = λy. In this model, the symmetric component f_0(·) and the skewed component H(·) are assumed to be mixtures of known distributions. We also assume DP priors for these mixing distributions. Posterior inference about the parameters (H(·), f_0(·), λ and η) is discussed in Sect. 2.2.

Since the DP plays an important role in our Bayesian approach, we first briefly present the definition of this prior. We say that a random distribution G(·) is a DP with parameter c, if for any measurable partition (B_1, ..., B_k)

(G(B_1), ..., G(B_k)) ∼ Dirichlet(c G_0(B_1), ..., c G_0(B_k)),

where G_0(·) is a distribution function. The parameter c > 0 may be interpreted as controlling the variability of G(·) about G_0(·), see Walker et al. (1999). The distribution function G_0(·) reflects a prior belief about G(·) in the sense that E(G(t)) = G_0(t).

2.1 The Bayesian model

Suppose that the random variable X has the unimodal density function (3) with a mode at η. Also let f_Y(·) denote the density function of Y = X − η, and let the density function in (1) be the skew-symmetric representation of this density. Since f_Y(·) is unimodal about zero, it can be shown that so is f_0(·) (for a proof see Appendix A). The symmetry and unimodality of f_0(·) about zero allow us to represent it as a scale mixture of uniform densities centered at zero, and therefore the density f_0(·) corresponds


to a unique mixing distribution G_1(·), say (see Feller 1971; Brunner 1995). That is, the symmetric unimodal density function f_0(y) can be written as

f_0(y) = ∫_0^{+∞} (1/(2θ)) I_{(|y| < θ)} G_1(dθ).

Similarly, the skewed component H(·) is represented as a mixture of known distribution functions,

H(y) = ∫_{−∞}^{+∞} K(y; θ) G_2(dθ),

where K(·; θ) is a known continuous distribution function, symmetric about zero for each θ, and G_2(·) is a mixing distribution. Combining the two representations, the density function of X is

f_X(x; η, λ, G_1, G_2) = 2 ∫_0^{+∞} ∫_{−∞}^{+∞} (1/(2θ_1)) I_{(|x−η| < θ_1)} K(λ(x − η); θ_2) G_2(dθ_2) G_1(dθ_1).   (4)

We assume that G_1 and G_2 have DP priors with base distribution functions G_{01}(·) and G_{02}(·), that is G_1 ∼ DP(c_1 G_{01}) and G_2 ∼ DP(c_2 G_{02}), where c_1, c_2 > 0 are precision parameters (Ferguson 1973). Also let π_1(·) and π_2(·) denote the prior density functions of η and λ, respectively. It is also assumed that η, λ, G_1 and G_2 are mutually independent.
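To make the model concrete, the following Python sketch (ours, not the authors' code) simulates data from (4): truncated stick-breaking draws stand in for G_1 ∼ DP(c_1 G_{01}) and G_2 ∼ DP(c_2 G_{02}), a scale drawn from G_1 feeds the uniform kernel, and the skewing factor is applied through a sign-flipping step. The base measures, the normal kernel and all numerical settings are illustrative assumptions.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def dp_stick_breaking(c, base_sampler, n_atoms=200):
    """Truncated stick-breaking draw from a DP with precision c and base sampler base_sampler."""
    v = rng.beta(1.0, c, size=n_atoms)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return base_sampler(n_atoms), w / w.sum()

def sample_from_model(n, eta, lam, c1=2.0, c2=2.0):
    """Simulate x_1,...,x_n from (4): f0 is a scale mixture of uniforms (mixing G1) and the
    skewing CDF is H(y) = sum_j w2_j * Phi(y / sqrt(theta2_j)) (mixing G2); illustrative choices."""
    th1, w1 = dp_stick_breaking(c1, lambda m: rng.gamma(2.0, 1.0, m))   # G01 = Gamma(2, 1)
    th2, w2 = dp_stick_breaking(c2, lambda m: rng.exponential(1.0, m))  # G02 = Exp(1)
    H = lambda y: np.sum(w2 * norm.cdf(y / np.sqrt(th2)))
    x = np.empty(n)
    for i in range(n):
        theta = rng.choice(th1, p=w1)                    # a scale drawn from G1
        y0 = rng.uniform(-theta, theta)                  # uniform kernel centred at zero
        y = y0 if rng.uniform() < H(lam * y0) else -y0   # sign flip gives density 2*f0(y)*H(lam*y)
        x[i] = eta + y
    return x

x = sample_from_model(200, eta=1.0, lam=5.0)
print(round(float(np.mean(x)), 3), round(float(np.median(x)), 3))

The sign-flipping step uses the fact that if Y_0 has the symmetric density f_0 and Y = Y_0 with probability H(λY_0), Y = −Y_0 otherwise, then Y has density 2 f_0(y) H(λy).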


2.2 Posterior inference

Let D = (x_1, ..., x_n) be an observed random sample from (4). We know that the Bayes estimate of a parameter is its posterior mean when the squared error loss function is used. Hence, if we can generate a sample from the posterior distribution, then the sample mean provides a good estimate of the posterior mean. Therefore, we focus on how realizations from the posterior distributions can be obtained. In what follows, Gelfand and Smith's (1990) bracket notation is used to write the distributions of random variables. For example, the prior and posterior distributions of η given the data D are denoted by [η] and [η|D], respectively.

The Gibbs sampler algorithm can be used to approximate the posterior mean of each parameter. For this purpose, we need to know how to sample from the full conditional posterior distributions. Note that the posterior distribution of η given λ, f_0, H and D is given by

[η | λ, f_0, H, D] ∝ ( ∏_{i=1}^{n} f_0(x_i − η) H(λ(x_i − η)) ) [η],

and hence, given λ, f_0, H and D, a sample point η can be obtained. Also, a sample λ given η, f_0, H and D can be generated from

[λ | η, f_0, H, D] ∝ ( ∏_{i=1}^{n} H(λ(x_i − η)) ) [λ].
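Neither full conditional has a standard form, so some generic sampling step is needed; the paper leaves π_1 and π_2 generic and does not prescribe a particular scheme. The Python sketch below uses a simple griddy-Gibbs approximation with flat priors (our choices), with f_0 and H standing in for the current draws of the two components.

import numpy as np
from scipy.stats import laplace, norm

rng = np.random.default_rng(1)

def sample_eta(x, lam, f0, H, prior=lambda e: 1.0, grid=None):
    """Griddy-Gibbs draw from [eta | lam, f0, H, D], proportional to
    prod_i f0(x_i - eta) * H(lam * (x_i - eta)) times [eta]."""
    if grid is None:
        grid = np.linspace(x.min(), x.max(), 400)
    logw = np.array([np.sum(np.log(2.0 * f0(x - e) * H(lam * (x - e)) + 1e-300))
                     + np.log(prior(e)) for e in grid])
    w = np.exp(logw - logw.max())
    return rng.choice(grid, p=w / w.sum())

def sample_lam(x, eta, H, prior=lambda l: 1.0, grid=np.linspace(-20.0, 20.0, 401)):
    """Griddy-Gibbs draw from [lam | eta, H, D], proportional to prod_i H(lam * (x_i - eta)) [lam]."""
    logw = np.array([np.sum(np.log(H(l * (x - eta)) + 1e-300)) + np.log(prior(l)) for l in grid])
    w = np.exp(logw - logw.max())
    return rng.choice(grid, p=w / w.sum())

# Illustrative use, with a Laplace f0 and a normal-CDF H standing in for the current draws.
x = rng.normal(1.0, 0.3, size=50)
eta = sample_eta(x, lam=2.0, f0=laplace.pdf, H=norm.cdf)
lam = sample_lam(x, eta, H=norm.cdf)
print(round(float(eta), 3), round(float(lam), 3))

The grid is only a transparent stand-in; a random-walk Metropolis step within the Gibbs sampler would serve equally well.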

Now, we discuss the problem of generating H from [H | η, λ, f_0, D]. According to the representation of H(·) as a mixture of known distributions, the density function of X given η, λ and f_0 can be written as

f_X(x; η, λ, f_0, G_2) = 2 f_0(x − η) ∫_{−∞}^{+∞} K(λ(x − η); θ) G_2(dθ).   (5)

Following Neal (2000) and Gelfand and Mukhopadhyay (1995), we introduce a latent θ_i for each x_i, i = 1, ..., n, to replace (5) by a hierarchical model with the x_i conditionally independent given the θ_i's,

x_i | θ_1, ..., θ_n ∼ 2 f_0(x − η) K(λ(x − η); θ_i),

and

θ_1, ..., θ_n ∼ G_2(·)   i.i.d.

To generate H from the corresponding posterior distribution, we use the approach proposed by Gelfand and Mukhopadhyay (1995). In this approach, a sample is first generated from [θ_1, ..., θ_n | η, λ, f_0, D]. To this aim, the following Gibbs sampler algorithm can be used:

(1) Specify the initial values θ_2^{(0)}, ..., θ_n^{(0)}.
(2) For j = 1 to j = k, generate θ_i^{(j)} from

    [θ_i | θ_1^{(j)}, ..., θ_{i−1}^{(j)}, θ_{i+1}^{(j−1)}, ..., θ_n^{(j−1)}, η, λ, f_0, D],

    for i = 1, ..., n. (Note that for i = 1 the given part starts with θ_2^{(j−1)}, and for i = n it ends with θ_{n−1}^{(j)}.)
(3) (θ_1^{(k)}, ..., θ_n^{(k)}) can be regarded as an observation from [θ_1, ..., θ_n | η, λ, f_0, D].

Note that G_2 | θ_1, ..., θ_n ∼ DP(c_2 + n, G*_{02}), with updated base measure

G*_{02}(·) ∝ c_2 G_{02}(·) + ∑_{i=1}^{n} δ_{θ_i}(·).

So a draw of H(·) should really first generate from the posterior DP, and then convolute with K(·; θ). The large total mass parameter c_2 + n implies little variability in G_2 | θ_1, ..., θ_n. For large n, we can ignore the first term, c_2 G_{02}(·), in the base measure G*_{02}(·) of the posterior DP. Hence, a realization from H(·) can be approximately generated by

H(·) = (1/n) ∑_{i=1}^{n} K(·; θ_i^{(k)}).
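In other words, given the k-th sweep θ_1^{(k)}, ..., θ_n^{(k)}, the approximate draw of H(·) is just an equally weighted mixture of the kernels. A minimal Python sketch, assuming the normal kernel K(y; θ) = Φ(y/√θ) used later in the numerical example:

import numpy as np
from scipy.stats import norm

def draw_H(theta_k):
    """Approximate realization H(.) = (1/n) * sum_i K(.; theta_i^(k)),
    here with the kernel K(y; theta) = Phi(y / sqrt(theta)) (an assumption)."""
    theta_k = np.asarray(theta_k, dtype=float)
    return lambda y: np.mean(norm.cdf(np.atleast_1d(y)[:, None] / np.sqrt(theta_k)), axis=1)

H = draw_H([0.5, 1.2, 2.0, 0.8])          # illustrative posterior draws of theta
print(H(np.array([-1.0, 0.0, 1.0])))      # H(0) = 0.5 by symmetry of the kernel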

To use the above algorithm, we need [θ_i | θ_j, j ≠ i, η, λ, f_0, D]. In this regard, notice that the density function of θ_i given θ_j, j ≠ i, η, λ, f_0 and D at a point θ is proportional to

q_{i,0} G_2(θ | η, λ, f_0) + ∑_{j=1, j ≠ i}^{n} q_{i,j} δ_{θ_j}(θ),

for i = 1, ..., n, where

dG_2(θ | η, λ, f_0) ∝ 2 f_0(x_i − η) K(λ(x_i − η); θ) G_{02}(dθ),
q_{i,0} ∝ c_2 ∫ 2 f_0(x_i − η) K(λ(x_i − η); θ) G_{02}(dθ),
q_{i,j} ∝ 2 f_0(x_i − η) K(λ(x_i − η); θ_j),

and δ_a(·) is a degenerate distribution at a (see Gelfand and Mukhopadhyay 1995). Note that [θ_i | θ_j, j ≠ i, η, λ, f_0, D] can be viewed as a mixture of two distributions: the first is related to the continuous distribution function G_2(· | η, λ, f_0) and the second is related to a discrete distribution with support {θ_j; j ≠ i}.


Thus, an observation from this conditional distribution can be obtained by running a random experiment such as a Bernoulli trial with probability of success p_i = c_2/(c_2 + M_i), where

M_i = ∑_{j ≠ i} K(λ(x_i − η); θ_j),   i = 1, ..., n.

If the Bernoulli trial results in a success, then an observation is generated from the continuous part; otherwise it is generated from the discrete part.
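Putting the pieces together, one update of θ_i draws from the continuous part with probability p_i and otherwise picks one of the current θ_j, j ≠ i, with weights proportional to K(λ(x_i − η); θ_j). The Python sketch below assumes the numerical example's choices G_{02} = Exp(1) and K(y; θ) = Φ(y/√θ), and draws from the continuous part dG_2(θ | η, λ, f_0) ∝ K(λ(x_i − η); θ) G_{02}(dθ) by accept–reject (valid because K ≤ 1); it is an illustration, not the authors' implementation.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

def K(y, theta):
    """Kernel distribution function; the example's choice Phi(y / sqrt(theta)) is assumed."""
    return norm.cdf(y / np.sqrt(theta))

def update_theta_i(i, theta, x, eta, lam, c2=200.0):
    """One Gibbs update of theta_i given theta_j (j != i), eta, lam and the data x."""
    r = lam * (x[i] - eta)
    others = np.delete(theta, i)
    M_i = np.sum(K(r, others))
    if rng.uniform() < c2 / (c2 + M_i):
        # Continuous part: density proportional to K(r; theta) * g02(theta).
        # Propose from G02 = Exp(1) and accept with probability K(r; theta) <= 1
        # (can be slow when r is very negative; adequate for a sketch).
        while True:
            cand = rng.exponential(1.0)
            if rng.uniform() < K(r, cand):
                return cand
    # Discrete part: pick an existing theta_j with weight proportional to K(r; theta_j).
    return rng.choice(others, p=K(r, others) / M_i)

# Illustrative sweep over i = 1,...,n for one inner iteration of step (2).
x = rng.normal(1.0, 0.3, size=30)
theta = rng.exponential(1.0, size=30)
for i in range(len(x)):
    theta[i] = update_theta_i(i, theta, x, eta=1.0, lam=2.0)

Sweeping i = 1, ..., n and repeating k times gives steps (2)–(3) of the algorithm above.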

We note that the density function of X, given η, λ and H, can be written as

f_X(x; η, λ, H, G_1) = 2 H(λ(x − η)) ∫_{−∞}^{+∞} (1/(2θ)) I_{(|x−η| < θ)} G_1(dθ),   (6)

and realizations of f_0 from its posterior distribution can be obtained in the same way as for H, with the uniform kernel (1/(2θ)) I_{(|x−η| < θ)} and the base measure G_{01} playing the roles of K(·; θ) and G_{02}.

3 A numerical example

In this example, the following specifications are used: G_{01} is the Gamma distribution with shape parameter 2 and scale parameter 1, that is dG_{01}(θ) ∝ θ exp(−θ) dθ, θ > 0; G_{02}(θ) = 1 − exp(−θ), θ > 0 (the Exponential distribution); K(x; θ) = Φ(x/√θ), x ∈ R, θ > 0 (the Normal distribution function); and c_1 = c_2 = 200. For these values of c_i, most of the values of c_1/(c_1 + M_i) and c_2/(c_2 + M_i) are near one, and hence our results tend to have the same properties as the G_{0i}(·). For example, theoretically we know that if G_{01}(·) is the Gamma distribution with shape and scale parameters 2 and 1, then the prior mean of f_0*(·) is the Laplace density.
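As a quick check of this claim, under the scale-mixture representation of Sect. 2.1 the prior mean of the symmetric component with a Gamma(2, 1) base measure is

E[f_0(y)] = ∫_0^{+∞} (1/(2θ)) I_{(|y| < θ)} θ e^{−θ} dθ = (1/2) ∫_{|y|}^{+∞} e^{−θ} dθ = (1/2) e^{−|y|},

which is the standard Laplace density.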


Fig. 1 Histogram of the transformed data (density versus transformed data)

Fig. 2 The Bayes estimate of the symmetric component f_0*(·) (density versus x)

Therefore, we expect that the posterior mean of f_0*(·) is similar to the Laplace density. With these priors, we can obtain the Bayes estimate of each parameter. We used the second Gibbs algorithm with m = 520 to draw observations from the posterior distributions (the first 20 observations were discarded to guarantee that the initial values did not affect the results). To generate f_0* and H_0* from the corresponding posterior distributions, we used the first algorithm with k = 50. The resulting Bayes estimate of f_0*(x) is shown in Fig. 2.


Fig. 3 The Bayes estimate of the skew-symmetric density (density versus transformed data)

Fig. 4 a The difference between the estimates of f_0*(x) when (m, k) are (500, 50) and (500, 40); b the difference between the estimates of f*(x; η*, λ*) when (m, k) are (500, 50) and (500, 40); c the difference between the estimates of f_0*(x) when (m, k) are (500, 50) and (400, 50); d the difference between the estimates of f*(x; η*, λ*) when (m, k) are (500, 50) and (400, 50)


Actually, for a fixed value of x ∈ [−2, 2], the average was taken over 500 realizations to obtain the Bayes estimate of f_0*(x). As we see, it resembles the Laplace density function, as expected. Consequently, the Bayes estimates of η* and λ* are 0.97998 and 19.4768, respectively; equivalently, the Bayes estimates of the original parameters are η̂ = 97.998 and λ̂ = 0.19768. Note that the skewing parameter λ of our model is different from that of the Balakrishnan SN distribution obtained by Sharafi and Behboodian (2008). However, our mode estimate is close to their estimate. Finally, the Bayes estimate of f*(x; η*, λ*) is presented in Fig. 3. It is clear that f̂*(x; η̂*, λ̂*) fits the histogram of the transformed data very well. To evaluate whether the lengths of the Gibbs sequences, (m, k) = (500, 50), are sufficient to obtain an accurate Bayes estimate, we obtained the Bayes estimates of f*(x; η*, λ*) and f_0*(x) again with (m, k) = (500, 40) and (m, k) = (400, 50). When (m, k) are (500, 50) and (500, 40), the difference between the estimates of f_0*(x) is shown in Fig. 4a. Also, when (m, k) are (500, 50) and (400, 50), the difference between the estimates of f_0*(x) is shown in Fig. 4c. Similarly, the differences between the estimates of f*(x; η*, λ*) are shown in Fig. 4b, d. These figures show that the lengths m = 500 and k = 50 are approximately sufficient.

Acknowledgements We would like to thank the referee and the editor for their valuable and constructive comments.

Appendix A

We want to show that the unimodality of f_Y(y) = 2 f_0(y) H(λy) about zero results in the unimodality of f_0(y) at zero. From the unimodality of f_Y(y), we have f_Y(y) ≤ f_Y(0) and f_Y(−y) ≤ f_Y(0) for all y. Therefore, for all y we can conclude that f_Y(y) + f_Y(−y) ≤ 2 f_Y(0). By substituting f_Y(y) = 2 f_0(y) H(λy) in both sides of the inequality and using the symmetry of f_0(·) and H(·) about zero, the result holds.
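Spelled out, the substitution uses the symmetry relations H(λy) + H(−λy) = 1 and H(0) = 1/2:

f_Y(y) + f_Y(−y) = 2 f_0(y) {H(λy) + H(−λy)} = 2 f_0(y),   and   2 f_Y(0) = 4 f_0(0) H(0) = 2 f_0(0),

so f_0(y) ≤ f_0(0) for all y.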

References

Arellano-Valle RB, Bolfarine H, Lachos VH (2007) Bayesian inference for skew-normal linear mixed models. J Appl Stat 34(6):663–682
Azzalini A (1985) A class of distributions which includes the normal ones. Scand J Stat 12:171–178
Azzalini A (1986) Further results on a class of distributions which includes the normal ones. Statistica 46:199–208
Azzalini A (2005) The skew-normal distribution and related multivariate families (with discussion). Scand J Stat 32:159–188 (C/R 189–200)
Azzalini A, Capitanio A (1999) Statistical applications of the multivariate skew normal distributions. J R Stat Soc B 61:579–602
Azzalini A, Dalla Valle A (1996) The multivariate skew-normal distribution. Biometrika 83:715–726
Brunner LJ (1995) Bayesian linear regression with error terms that have symmetric unimodal densities. J Nonparametr Stat 4:335–348
Chiogna M (1998) Some results on the scalar skew-normal distribution. J Ital Stat Soc 7:1–13
Feller W (1971) An introduction to probability theory and its applications. Wiley, New York
Ferguson TS (1973) A Bayesian analysis of some nonparametric problems. Ann Stat 1:209–230
Fruhwirth-Schnatter S, Pyne S (2010) Bayesian inference for finite mixtures of univariate and multivariate skew normal and skew-t distributions. Biostatistics 11:317–336


Gelfand AE, Mukhopadhyay S (1995) On nonparametric Bayesian inference for the distribution of a random sample. Can J Stat 23:411–420
Gelfand AE, Smith AFM (1990) Sampling-based approaches to calculating marginal densities. J Am Stat Assoc 85:398–409
Genton MG, Loperfido N (2005) Generalized skew-elliptical distributions and their quadratic forms. Ann Inst Stat Math 57:389–401
Gupta RC, Gupta RD (2004) Generalized skew normal model. Test 13:501–524
Gupta AK, Nguyen TT, Sanqui JAT (2004) Characterization of the skew-normal distribution. Ann Inst Stat Math 56(2):351–360
Henze N (1986) A probabilistic representation of the 'skew-normal' distribution. Scand J Stat 13:271–275
Liseo B, Loperfido N (2003) A Bayesian interpretation of the multivariate skew-normal distribution. Stat Probab Lett 61:395–401
Nadarajah S, Kotz S (2003) Skewed distributions generated by the normal kernel. Stat Probab Lett 65:269–277
Neal RM (2000) Markov chain sampling methods for Dirichlet process mixture models. J Comput Graph Stat 9:249–265
Pewsey A (2000) Problems of inference for Azzalini's skew-normal distribution. J Appl Stat 27:859–870
Roberts HV (1988) Data analysis for managers with Minitab. Scientific Press, Redwood City, CA
Sharafi M, Behboodian J (2008) The Balakrishnan skew-normal density. Stat Pap 49:769–778
Walker SG, Damien P, Laud PW, Smith AFM (1999) Bayesian nonparametric inference for random distributions and related functions. J R Stat Soc B 61:485–527
Wang T, Boyer J, Genton MG (2004) A skew-symmetric representation of multivariate distributions. Stat Sin 14:1259–1270
