AStA Adv Stat Anal (2016) 100:421–441
DOI 10.1007/s10182-016-0266-z

ORIGINAL PAPER

Likelihood-based inference for multivariate skew scale mixtures of normal distributions

Clécio S. Ferreira¹ · Víctor H. Lachos² · Heleno Bolfarine³

Received: 29 March 2015 / Accepted: 5 January 2016 / Published online: 19 January 2016
© Springer-Verlag Berlin Heidelberg 2016
Abstract Scale mixtures of normal distributions are a flexible class often used for the statistical analysis of symmetrical data. Recently, Ferreira et al. (Stat Methodol 8:154–171, 2011) defined the univariate skew scale mixtures of normal distributions, which offer much-needed flexibility by combining skewness with heavy tails. In this paper, we develop a multivariate version of the skew scale mixtures of normal distributions, with emphasis on the multivariate skew-Student-t, skew-slash and skew-contaminated normal distributions. The main virtues of the members of this family of distributions are that they are easy to simulate from and that they admit genuine expectation/conditional maximisation either (ECME) algorithms for maximum likelihood estimation. The observed information matrix is derived analytically so that standard errors can be computed. Results obtained from real and simulated datasets are reported to illustrate the usefulness of the proposed method.

Keywords EM algorithm · ECME algorithm · Multivariate scale mixtures of normal distributions · Skew distributions
Corresponding author: Víctor H. Lachos, [email protected]
Clécio S. Ferreira, [email protected]
Heleno Bolfarine, [email protected]

1 Department of Statistics, Federal University of Juiz de Fora, Juiz de Fora, Minas Gerais, Brazil
2 Departamento de Estatística, Universidade Estadual de Campinas, Cidade Universitária “Zeferino Vaz”, Campinas, São Paulo, Brazil
3 Departamento de Estatística, Universidade de São Paulo, São Paulo, Brazil
1 Introduction

Scale mixtures of normal distributions (Andrews and Mallows 1974) compose a group of thick-tailed distributions that are often used for robust inference with symmetrical data. This class includes the Student-t, slash and contaminated normal distributions, among others. However, theory and application (through simulation or experimentation) often generate datasets that are both skewed and heavy-tailed, such as datasets of family income (Azzalini et al. 2003) or substance concentration (Bolfarine and Lachos 2007). Thus, distributions appropriate for fitting such skewed and heavy-tailed data are needed.

The skew-normal (SN) distribution is a class of density functions that depends on an additional shape parameter and includes the normal density as a special case. Azzalini (1985) proposed the univariate SN distribution, which was later generalized to the multivariate case by Azzalini and Dalla-Valle (1996) and Arellano-Valle et al. (2005). The multivariate SN density extends the multivariate normal model by allowing a shape parameter to account for skewness. The probability density function (pdf) of the generic element of a multivariate skew-normal distribution is

f(y|μ, Σ, λ) = 2 φ_p(y|μ, Σ) Φ₁(λ′Σ^{−1/2}(y − μ)),  y ∈ R^p,   (1)

where φ_p(·|μ, Σ) stands for the pdf of the p-variate normal distribution with mean vector μ and covariance matrix Σ, Φ₁(·) represents the cumulative distribution function (cdf) of the standard normal distribution, and Σ^{−1/2} satisfies Σ^{−1/2}Σ^{−1/2} = Σ^{−1}. When λ = 0, the skew-normal distribution reduces to the normal distribution, Y ∼ N_p(μ, Σ). A p-dimensional random vector Y with pdf as in (1) will be denoted by Y ∼ SN_p(μ, Σ, λ). Its stochastic representation, which can be used to derive several of its properties, is

Y =_d μ + Σ^{1/2}[δ|T₀| + (I_p − δδ′)^{1/2} T₁],  with δ = λ/(1 + λ′λ)^{1/2},   (2)

where |T₀| denotes the absolute value of T₀, T₀ ∼ N₁(0, 1) and T₁ ∼ N_p(0, I_p) are independent, I_p denotes the identity matrix of order p, and “=_d” means “distributed as”. A conditional representation of Y is

Y|T = t ∼ N_p(μ + Σ^{1/2}δt, Σ^{1/2}(I_p − δδ′)Σ^{1/2}),  T ∼ TN(0, 1; (0, +∞)),   (3)

where TN(μ, σ²; (a, b)) represents the N(μ, σ²) distribution truncated to the interval (a, b) (Johnson et al. 1994). From (2), the expectation and covariance of Y are, respectively,

E[Y] = μ + (2/π)^{1/2} Σ^{1/2}δ  and  Cov[Y] = Σ − (2/π) Σ^{1/2}δδ′Σ^{1/2}.   (4)
Reasoning as in Azzalini and Dalla-Valle (1996), it is natural to construct multivariate distributions that combine skewness with heavy tails. Notice that the main idea behind the construction in (1) is a density defined as the product of a normal density and a normal cdf. Following Lin et al. (2013), a more general set-up can be considered by multiplying a scale mixtures of normal (SMN) density by the cdf of the normal distribution (as the skewing function). This approach leads to a family of asymmetric distributions that will be called skew scale mixtures of normal (SSMN) distributions. For this new class, we study some of its probabilistic and inferential properties and discuss applications to real data. One interesting and simplifying aspect of the family is that the implementation of the expectation/conditional maximisation either (ECME) algorithm is facilitated by the fact that the E-step is exactly as in the class of scale mixtures of normal distributions proposed in Andrews and Mallows (1974) (see also Osorio et al. 2007). Moreover, the M-step involves closed-form expressions, facilitating the implementation of the EM-type algorithm. The multivariate SSMN class proposed here is fundamentally different from the scale mixtures of skew-normal (SMSN) distributions developed by Lachos et al. (2010), because we start our construction from the SMN densities and not from the stochastic representation of a skew-normal random variable as presented in Branco and Dey (2001) and Lachos et al. (2010).

The rest of the article is organized as follows. In Sect. 2, the SSMN class is defined by extending the elliptical class of SMN distributions; properties such as moments and a stochastic representation are discussed, and some examples of SSMN distributions are presented. We also show how to compute maximum likelihood (ML) estimates via the ECME algorithm, which presents advantages over direct maximization, especially in terms of robustness with respect to starting values, and the observed information matrix is derived analytically. Section 3 presents simulation studies, and Sect. 4 reports an application to a real dataset, indicating the usefulness of the proposed methodology. Finally, Sect. 5 concludes with some discussion and avenues for future research.
2 A skew version of multivariate scale mixtures of normal distributions

Andrews and Mallows (1974) used the Laplace transform technique to characterize the standardized continuous random variables that admit a scale mixture of normal (SMN) representation. This symmetric family of distributions has attracted much attention in recent years, mainly because it includes distributions such as the Student-t, slash, power exponential and contaminated normal distributions, all of which have heavier tails than the normal. We say that a p-dimensional vector Y has an SMN distribution (Lange and Sinsheimer 1993), with location parameter μ ∈ R^p, positive definite scale matrix Σ and hyperparameter τ, if its density function assumes the form

f₀(y) = ∫₀^∞ φ_p(y|μ, u^{−1}Σ) dH(u; τ)
      = (2π)^{−p/2}|Σ|^{−1/2} ∫₀^∞ u^{p/2} exp{−(u/2)(y − μ)′Σ^{−1}(y − μ)} dH(u; τ),  y ∈ R^p,   (5)
where H(u; τ) is the cdf of a one-dimensional positive random variable U indexed by the parameter vector τ. For a random vector with pdf as in (5), we use the notation Y ∼ SMN_p(μ, Σ; H); when μ = 0 and Σ = I_p, we write Y ∼ SMN_p(H). We now extend the SN distribution to a wider class, incorporating heavier tails, by replacing the component φ_p(·) in (1) with the robust SMN density (5).

Definition 1 A p-dimensional random vector Y follows a multivariate skew scale mixtures of normal (SSMN) distribution with location parameter μ ∈ R^p, positive definite scale matrix Σ and skewness parameter λ ∈ R^p, if its pdf is given by

f(y) = 2 f₀(y) Φ₁(λ′Σ^{−1/2}(y − μ)),  y ∈ R^p,   (6)
where f₀(·) is as in (5). For a random vector with pdf as in (6), we use the notation Y ∼ SSMN_p(μ, Σ, λ; H). If p = 1, we recover the univariate SSMN distribution developed in Ferreira et al. (2011); when λ = 0, we recover the corresponding SMN distribution. A random vector Y ∼ SSMN_p(μ, Σ, λ; H) has the stochastic representation

Y =_d μ + Σ^{1/2} [ λ|T₀| / {U(U + λ′λ)}^{1/2} + (U I_p + λλ′)^{−1/2} T₁ ],   (7)

where U ∼ H(·; τ), T₀ ∼ N(0, 1) and T₁ ∼ N_p(0, I_p) are mutually independent. Let Υ = U^{−1/2}(U + λ′λ)^{1/2}|T₀|. From (7), we have

Y | Υ = υ, U = u ∼ N_p( μ + Σ^{1/2}λυ/(u + λ′λ), Σ^{1/2}(u I_p + λλ′)^{−1}Σ^{1/2} ),
Υ | U = u ∼ TN(0, (u + λ′λ)/u; (0, +∞)),
U ∼ H(·; τ).   (8)
From (8), the joint pdf of Y, Υ and U is given by

f(y, υ, u) = [2^{(1−p)/2} u^{p/2} / (π^{(1+p)/2}|Σ|^{1/2})] exp{ −(u/2) v′v − (1/2)(υ − λ′v)² } h(u; τ),   (9)
where v = Σ^{−1/2}(y − μ) and h(·; τ) is the pdf corresponding to H(·; τ). Integrating out υ in (9), we get

f(y, u) = [2^{(2−p)/2} u^{p/2} / (π^{p/2}|Σ|^{1/2})] exp{ −(u/2) v′v } Φ₁(λ′v) h(u; τ).   (10)
Finally, integrating out u in (10), we obtain the marginal pdf of Y:

f(y) = 2 ∫₀^{+∞} φ_p(y|μ, Σ/u) h(u; τ) du · Φ₁(λ′v) = 2 f₀(y) Φ₁(λ′Σ^{−1/2}(y − μ)).   (11)
Proposition 1 (An invariance result) If Y ∼ SSMN_p(μ, Σ, λ; H), then the conditional distribution of U|Y = y does not depend on λ.

The proof follows by dividing (10) by (11). The conditional distributions U|Y = y, for each member of the SSMN class, are given in Sect. 2.1. For an SSMN random vector, a convenient hierarchical representation is given next; it can be used to simulate realizations of Y quickly, to implement the EM-type algorithm, and to study many properties of the class.

Proposition 2 Let Y ∼ SSMN_p(μ, Σ, λ; H). Then its hierarchical representation is

Y | U = u ∼ SN_p(μ, (1/u)Σ, (1/u^{1/2})λ),  U ∼ H(·; τ).   (12)

Proof Conditional on U = u, it follows that

f_Y(y|U = u) = 2 φ_p(y|μ, u^{−1}Σ) Φ₁(λ′Σ^{−1/2}(y − μ))
             = 2 φ_p(y|μ, u^{−1}Σ) Φ₁(u^{−1/2}λ′(u^{−1}Σ)^{−1/2}(y − μ)).

Thus, from (1), Y|U = u ∼ SN_p(μ, u^{−1}Σ, u^{−1/2}λ).
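Proposition 2 suggests a simple two-stage sampler. A sketch for the StN member defined in Sect. 2.1 (U ∼ Gamma(ν/2, ν/2); numpy/scipy assumed, function name ours) is given below; as a check, Proposition 4 together with the StN example implies that the Mahalanobis distance satisfies D/p ∼ F_{p,ν}:

```python
import numpy as np
from scipy.linalg import sqrtm

def rstn(n, mu, Sigma, lam, nu, rng):
    """Two-stage draw from StN_p(mu, Sigma, lam; nu) via (12):
    U ~ Gamma(nu/2, nu/2), then Y|U=u ~ SN_p(mu, Sigma/u, lam/sqrt(u))."""
    p = len(mu)
    u = rng.gamma(nu/2.0, 2.0/nu, size=n)        # numpy uses shape/scale, rate nu/2
    S_half = np.real(sqrtm(Sigma))
    T0 = np.abs(rng.standard_normal(n))
    T1 = rng.standard_normal((n, p))
    Y = np.empty((n, p))
    for i in range(n):
        lam_u = lam / np.sqrt(u[i])
        delta = lam_u / np.sqrt(1.0 + lam_u @ lam_u)
        M = np.real(sqrtm(np.eye(p) - np.outer(delta, delta)))
        z = delta * T0[i] + M @ T1[i]            # SN step, representation (2)
        Y[i] = mu + (S_half @ z) / np.sqrt(u[i]) # scale matrix Sigma/u
    return Y

rng = np.random.default_rng(1)
Y = rstn(20_000, np.zeros(2), np.eye(2), np.array([2.0, 2.0]), 10.0, rng)
d = np.sum(Y**2, axis=1)                         # Mahalanobis, mu = 0, Sigma = I_2
```

Since D/p ∼ F_{p,ν}, the sample mean of D should be close to p ν/(ν − 2).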
From (12), to generate an SSMN random vector we proceed in two steps: first we generate from the distribution of U, and then from the conditional distribution of Y|U using, for instance, the stochastic representation given in (2).

Proposition 3 Let Y ∼ SSMN_p(μ, Σ, λ; H). Then the moment generating function of Y is

M_Y(s) = E[e^{s′Y}] = 2 ∫₀^{+∞} e^{s′μ + s′Σs/(2u)} Φ₁( λ′Σ^{1/2}s / {u(u + λ′λ)}^{1/2} ) dH(u; τ),  s ∈ R^p.

Proof From Proposition 2, we have that Y|U = u ∼ SN_p(μ, Σ/u, λ/u^{1/2}). Moreover, from well-known properties of conditional expectations, it follows that
M_Y(s) = E_U[E(e^{s′Y}|U)], and the proof is concluded using the fact that U is a positive random variable with cdf H and that, if Z ∼ SN_p(μ, Σ, λ), then M_Z(s) = 2 e^{s′μ + s′Σs/2} Φ₁(δ′Σ^{1/2}s).

Proposition 4 Let Y ∼ SSMN_p(μ, Σ, λ; H). Then the Mahalanobis distance D_λ = (Y − μ)′Σ^{−1}(Y − μ) has the same distribution as D = (X − μ)′Σ^{−1}(X − μ), where X ∼ SMN_p(μ, Σ; H).

Proof The pdf of Y ∼ SSMN_p(μ, Σ, λ; H) in (6) has a skew-symmetric construction (see Proposition 1 in Wang et al. 2004). Then, from Proposition 6 of the same article, the distribution of ω(Y − μ), where ω is an even function, does not depend on the skewing function Φ₁(·). The distributions of the quadratic forms can be found in Lange and Sinsheimer (1993).

Some particular cases of SSMN distributions are discussed next.

2.1 Examples of SSMN distributions

• The skew Student-t normal (StN) distribution with ν > 0 degrees of freedom, denoted by StN_p(μ, Σ, λ; ν). The use of the Student-t distribution as an alternative to the normal distribution has frequently been suggested in the literature; for instance, Little (1988) and Lange et al. (1989) recommend it for robust modeling. Taking U ∼ Gamma(ν/2, ν/2), the pdf of Y becomes

f(y) = 2 t_p(y|μ, Σ; ν) Φ₁(λ′Σ^{−1/2}(y − μ)),  y ∈ R^p,   (13)

where

t_p(y|μ, Σ; ν) = [Γ((ν + p)/2) / (Γ(ν/2)(νπ)^{p/2}|Σ|^{1/2})] (1 + d/ν)^{−(ν+p)/2},  d = (y − μ)′Σ^{−1}(y − μ),

is the density of a p-dimensional Student-t variate with ν degrees of freedom. The univariate skew Student-t normal distribution was developed by Gómez et al. (2007), who showed that the StN distribution can present a much wider asymmetry range than the ordinary skew-normal distribution of Azzalini (1985). Lin et al. (2013) used the multivariate StN distribution in the context of finite mixture models, including the implementation of an interesting EM-type algorithm for ML estimation. When ν ↑ ∞, we get the skew-normal distribution as the limiting case. The quadratic form satisfies D/p ∼ F_{p,ν}. Finally, U|Y = y ∼ Gamma((ν + p)/2, (ν + d)/2), so that

E[U^k | Y = y] = Γ((ν + p)/2 + k) / [Γ((ν + p)/2) ((ν + d)/2)^k].

• The skew-slash (SSL) distribution, with shape parameter ν > 0, denoted by SSL_p(μ, Σ, λ; ν).
The mixing distribution is U ∼ Beta(ν, 1), 0 < u < 1, ν > 0, and the pdf of Y is

f(y) = 2ν ∫₀¹ u^{ν−1} φ_p(y|μ, u^{−1}Σ) du · Φ₁(λ′Σ^{−1/2}(y − μ)),  y ∈ R^p.   (14)

The skew-slash distribution reduces to the skew-normal distribution when ν ↑ ∞. The Mahalanobis distance has distribution function

Pr(D ≤ r) = Pr(χ²_p ≤ r) − [2^ν Γ(ν + p/2) / (r^ν Γ(p/2))] Pr(χ²_{2ν+p} ≤ r).

It is easy to see that U|Y = y ∼ TG(ν + p/2, d/2, 1), where TG(a, b, t) denotes the right-truncated gamma distribution with pdf

f(x|a, b, t) = [b^a / γ(a, bt)] x^{a−1} exp(−bx) I_{(0,t)}(x),

and γ(a, b) = ∫₀^b u^{a−1}e^{−u} du is the (lower) incomplete gamma function. Hence

E[U^k | Y = y] = Γ(ν + p/2 + k) P₁(ν + p/2 + k, d/2) / [Γ(ν + p/2) (d/2)^k P₁(ν + p/2, d/2)],

where P_x(a, b) denotes the cdf of the Gamma(a, b) distribution (with mean a/b) evaluated at x.

• The skew-contaminated normal (SCN) distribution, denoted by SCN_p(μ, Σ, λ; ν, γ), 0 ≤ ν ≤ 1, 0 < γ ≤ 1.
Here, U is a discrete random variable taking one of two states. Its probability function, given the parameter vector τ = (ν, γ)′, is h(u; τ) = ν I_{(u=γ)} + (1 − ν) I_{(u=1)}. It follows that

f(y) = 2 {ν φ_p(y|μ, γ^{−1}Σ) + (1 − ν) φ_p(y|μ, Σ)} Φ₁(λ′Σ^{−1/2}(y − μ)),  y ∈ R^p.

The skew-contaminated normal distribution reduces to the skew-normal distribution when γ = 1. The Mahalanobis distance has cdf Pr(D ≤ r) = ν Pr(χ²_p ≤ γr) + (1 − ν) Pr(χ²_p ≤ r). The conditional distribution of U|Y = y is also discrete, given by

f(u|Y = y) = (1/f₀(y)) {ν φ_p(y|μ, γ^{−1}Σ) I_{(u=γ)} + (1 − ν) φ_p(y|μ, Σ) I_{(u=1)}},

where f₀(y) = ν φ_p(y|μ, γ^{−1}Σ) + (1 − ν) φ_p(y|μ, Σ). Therefore,

E[U^k | Y = y] = [1 − ν + ν γ^{p/2+k} exp{(1 − γ)d/2}] / [1 − ν + ν γ^{p/2} exp{(1 − γ)d/2}].

From Proposition 3, it follows that the expectation and covariance of Y ∼ SSMN_p(μ, Σ, λ; H) are given, respectively, by

E[Y] = μ + (2/π)^{1/2} η Σ^{1/2}λ  and  Cov[Y] = κΣ − (2/π) η² Σ^{1/2}λλ′Σ^{1/2},   (15)

where η = E[{U(U + λ′λ)}^{−1/2}] can be evaluated by numerical integration. The constant κ = E[U^{−1}] equals ν/(ν − 2) (ν > 2) for the StN case, ν/(ν − 1) (ν > 1) for the SSL case, and ν(1 − γ)/γ + 1 for the SCN case. Note that if U ≡ 1, that is, Y ∼ SN_p(μ, Σ, λ), we recover the result in (4).
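The constants η and κ in (15) are one-dimensional integrals over the mixing distribution. A quadrature sketch for the StN case (numpy/scipy assumed; the function name is ours):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma as gamma_dist
from scipy.linalg import sqrtm

def stn_mean_cov(mu, Sigma, lam, nu):
    """E[Y] and Cov[Y] of StN_p(mu, Sigma, lam; nu) via (15):
    eta by quadrature, kappa = E[1/U] = nu/(nu - 2) for nu > 2."""
    ll = lam @ lam
    g = gamma_dist(a=nu/2.0, scale=2.0/nu)        # U ~ Gamma(nu/2, nu/2)
    eta = quad(lambda u: g.pdf(u)/np.sqrt(u*(u + ll)), 0.0, np.inf)[0]
    kappa = nu/(nu - 2.0)
    Sl = np.real(sqrtm(Sigma)) @ lam
    mean = mu + np.sqrt(2.0/np.pi)*eta*Sl
    cov = kappa*Sigma - (2.0/np.pi)*(eta**2)*np.outer(Sl, Sl)
    return mean, cov

mean, cov = stn_mean_cov(np.zeros(2), np.eye(2), np.array([2.0, 2.0]), 5.0)
```

The closed form κ = ν/(ν − 2) can itself be checked against a direct quadrature of E[U^{−1}].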
Fig. 1 Contour plots of some elements of the standard bivariate SSMN family: a SN₂(λ), b StN₂(λ; 2), c SCN₂(λ; 0.5, 0.5) and d SSL₂(λ; 1), where λ = (2, 1)′
Figure 1 provides contour plots of four bivariate SSMN densities. Figure 2 displays density contours of the SSMN distributions at two levels (c = 0.005 and c = 0.1). Note that the SN contour lies inside the StN, SSL and SCN contours for c = 0.005, while the opposite occurs for c = 0.1.
2.2 Maximum likelihood estimation

The EM algorithm, originally proposed by Dempster et al. (1977), has several appealing features, such as the stability of monotone convergence (each iteration increases the likelihood) and simplicity of implementation. However, ML estimation in SSMN models is complicated, and the plain EM algorithm is less advisable because of the computational difficulty of its M-step. To cope with this problem, we apply an extension of the EM algorithm, the ECME algorithm (Liu and Rubin 1994), which shares the appealing features of EM and has a typically faster convergence rate than the
Fig. 2 A comparison of density contours of the SSMN distributions, obtained by plotting f(y|0, I₂, (4, 4)′; H) = c, with ν = 2 for the StN and SSL models and (ν, γ) = (0.5, 0.5) for the SCN model: a c = 0.005, b c = 0.1
EM. The ECME algorithm is obtained by maximizing the Q-function (the expected complete-data log-likelihood) in some conditional maximization (CM) steps, while other steps maximize the corresponding constrained actual marginal likelihood function; the latter are called CML steps. In the following, we demonstrate how to employ the ECME algorithm for ML estimation of SSMN models. Given Y ∼ SSMN_p(μ, Σ, λ; H), the joint distribution of (Y, U, T) (see Appendix 2) is

f(y, u, t) = 2 φ_p(y|μ, Σ/u) h(u; τ) φ₁(t | λ′Σ^{−1/2}(y − μ), 1).   (16)

Let y = (y₁′, ..., y_n′)′, u = (u₁, ..., u_n)′ and t = (t₁, ..., t_n)′. Treating u and t as hypothetical missing data, it follows from (16) that the complete-data log-likelihood function associated with y_c = (y′, u′, t′)′ is

ℓ_c(θ|y_c) ∝ −(n/2) log |Σ| + Σ_{i=1}^n [ −(u_i/2)(y_i − μ)′Σ^{−1}(y_i − μ) + t_i λ′Σ^{−1/2}(y_i − μ) − (1/2){λ′Σ^{−1/2}(y_i − μ)}² + log h(u_i; τ) ].   (17)
To facilitate the estimation process, consider the reparameterization Δ = Σ^{−1/2}λ. Given the current estimate θ̂^(k) = (μ̂^(k), Σ̂^(k), λ̂^(k), τ̂^(k)) of θ at the kth iteration, the (k+1)th E-step finds the conditional expectation of the complete-data log-likelihood with respect to the conditional distribution of (U, T) given y and θ̂^(k):

Q(θ | θ̂^(k)) = E[ℓ_c(θ|y_c) | y, θ = θ̂^(k)] = Σ_{i=1}^n Q_{1i}(θ | θ̂^(k)) + Σ_{i=1}^n Q_{2i}(θ | θ̂^(k)),   (18)
with Q_{2i}(θ | θ̂^(k)) = E[log h(U_i; τ) | y_i, θ = θ̂^(k)] and

Q_{1i}(θ | θ̂^(k)) = −(1/2) log |Σ| − (û_i^(k)/2)(y_i − μ)′Σ^{−1}(y_i − μ) + t̂_i^(k) Δ′(y_i − μ) − (1/2){Δ′(y_i − μ)}².
The (k+1)th M-step then finds θ̂^(k+1) to maximize Q(θ | θ̂^(k)). From (25), it follows that T|Y = y ∼ TN(λ′Σ^{−1/2}(y − μ), 1; (0, +∞)), so t̂_i^(k) = E[T_i | y_i, θ = θ̂^(k)] can be expressed as

t̂_i^(k) = λ̂^(k)′Σ̂^{(k)−1/2}(y_i − μ̂^(k)) + W_Φ(λ̂^(k)′Σ̂^{(k)−1/2}(y_i − μ̂^(k))),   (19)

where W_Φ(u) = φ₁(u)/Φ₁(u). The expressions û_i^(k) = E[U_i | y_i, θ = θ̂^(k)] can be readily evaluated for the StN, SSL and SCN distributions, as follows:

û_i = 1,  for the SN distribution,
û_i = (ν + p)/(ν + d_i),  for the StN distribution,
û_i = [(2ν + p)/d_i] P₁(p/2 + ν + 1, d_i/2) / P₁(p/2 + ν, d_i/2),  for the SSL distribution,
û_i = [1 − ν + ν γ^{p/2+1} exp{(1 − γ)d_i/2}] / [1 − ν + ν γ^{p/2} exp{(1 − γ)d_i/2}],  for the SCN distribution,   (20)

where d_i = (y_i − μ)′Σ^{−1}(y_i − μ) is the Mahalanobis distance.

E-step Given θ = θ̂^(k), compute t̂_i^(k) and û_i^(k), for i = 1, ..., n, using (19) and (20).

The CM-step then conditionally maximizes Q(θ | θ̂^(k)) with respect to θ, obtaining a new estimate θ̂^(k+1), as described next:
CM-step Update μ̂^(k), Δ̂^(k), Σ̂^(k) and λ̂^(k) as

μ̂^(k+1) = { Σ_{i=1}^n û_i^(k) Σ̂^{(k)−1} + n Δ̂^(k)Δ̂^(k)′ }^{−1} Σ_{i=1}^n { û_i^(k) Σ̂^{(k)−1} y_i − Δ̂^(k) t̂_i^(k) + Δ̂^(k)Δ̂^(k)′ y_i },

Δ̂^(k+1) = { Σ_{i=1}^n (y_i − μ̂^(k))(y_i − μ̂^(k))′ }^{−1} Σ_{i=1}^n t̂_i^(k)(y_i − μ̂^(k)),

Σ̂^(k+1) = (1/n) Σ_{i=1}^n û_i^(k)(y_i − μ̂^(k))(y_i − μ̂^(k))′,

λ̂^(k+1) = Σ̂^{(k+1)1/2} Δ̂^(k+1).   (21)
CML-step Fix μ̂^(k+1) and Σ̂^(k+1) and update τ̂^(k) by maximizing the constrained log-likelihood function, i.e.,

τ̂^(k+1) = argmax_τ Σ_{i=1}^n log f₀(y_i | μ̂^(k+1), Σ̂^(k+1), τ),   (22)
where f₀(y) is the respective symmetric pdf defined in (5). This step requires a one-dimensional search for the StN and SSL models and a two-dimensional search for the SCN model, which can easily be accomplished using, for example, the "optim" or "optimize" routines in R (R Core Team 2015) or "fmincon" in Matlab. The iterations of the above algorithm are repeated until the difference between two successive log-likelihood values, |ℓ(θ̂^(k+1)) − ℓ(θ̂^(k))|, is sufficiently small, say 10^{−5}, where ℓ(θ) = Σ_{i=1}^n log f(y_i), with f(y_i) as defined in (11). As initial values for the ECME algorithm, we use the sample mean for μ, the sample covariance matrix for Σ and the vector of sample skewness coefficients for λ (see, for instance, Cabral et al. 2012).
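To make the cycle concrete, here is a compact, illustrative implementation of one possible ECME scheme for the StN model (a sketch assuming numpy/scipy; the function names and the fixed iteration budget are ours, not the authors'). Monotonicity of the log-likelihood across sweeps is a convenient correctness check:

```python
import numpy as np
from scipy.linalg import sqrtm, inv
from scipy.optimize import minimize_scalar
from scipy.stats import multivariate_t, norm

def stn_loglik(Y, mu, Sigma, lam, nu):
    """Marginal StN log-likelihood, cf. (11) and (13)."""
    A = (Y - mu) @ np.real(inv(sqrtm(Sigma))) @ lam
    return np.sum(np.log(2.0) + multivariate_t(mu, Sigma, df=nu).logpdf(Y)
                  + norm.logcdf(A))

def ecme_stn(Y, n_iter=15):
    n, p = Y.shape
    mu, Sigma, lam, nu = Y.mean(0), np.cov(Y.T), np.zeros(p), 10.0
    hist = []
    for _ in range(n_iter):
        Sinv = inv(Sigma)
        Delta = np.real(inv(sqrtm(Sigma))) @ lam
        res = Y - mu
        d = np.sum((res @ Sinv) * res, axis=1)             # Mahalanobis distances
        A = res @ Delta
        u = (nu + p) / (nu + d)                            # E-step, (20), StN case
        t = A + np.exp(norm.logpdf(A) - norm.logcdf(A))    # (19), stable W_Phi
        # CM-step, (21)
        lhs = u.sum()*Sinv + n*np.outer(Delta, Delta)
        rhs = Sinv @ (u[:, None]*Y).sum(0) + Delta*((Y @ Delta - t).sum())
        mu = np.linalg.solve(lhs, rhs)
        res = Y - mu
        Delta = np.linalg.solve(res.T @ res, res.T @ t)
        Sigma = (res*u[:, None]).T @ res / n
        lam = np.real(sqrtm(Sigma)) @ Delta
        # CML-step, (22): nu maximizes the symmetric Student-t likelihood
        obj = lambda v: -multivariate_t(mu, Sigma, df=v).logpdf(Y).sum()
        nu = minimize_scalar(obj, bounds=(2.1, 100.0), method="bounded").x
        hist.append(stn_loglik(Y, mu, Sigma, lam, nu))
    return mu, Sigma, lam, nu, hist

rng = np.random.default_rng(7)
Y_demo = rng.gamma(2.0, 1.0, size=(300, 2))                # skewed synthetic data
mu_hat, Sigma_hat, lam_hat, nu_hat, hist = ecme_stn(Y_demo, n_iter=8)
```

Because each CM step increases the Q-function and the CML step increases the actual likelihood directly (the skewing factor in (11) does not involve ν), ℓ(θ̂^(k)) should be non-decreasing over the sweeps, up to numerical error.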
2.3 The observed information matrix in multivariate SSMN distributions

Suppose that we have observations on n independent individuals, Y₁, ..., Y_n, where Y_i ∼ SSMN_p(μ, Σ(α), λ; H), for i = 1, ..., n. Then the log-likelihood function for θ = (μ′, α′, λ′)′, with α = vech(Σ^{1/2}), given the observed sample y = (y₁′, ..., y_n′)′, is of the form ℓ(θ) = Σ_{i=1}^n ℓ_i(θ), where

ℓ_i(θ) = log 2 − (p/2) log 2π − (1/2) log |Σ| + log K_i + log Φ₁(A_i),

with

K_i = K_i(θ) = ∫₀^∞ u^{p/2} exp{−(u/2) d_i} h(u; τ) du,

d_i = (y_i − μ)′Σ^{−1}(y_i − μ) and A_i = A_i(θ) = λ′Σ^{−1/2}(y_i − μ). The score vector is ∂ℓ(θ)/∂θ = Σ_{i=1}^n ∂ℓ_i(θ)/∂θ, where

∂ℓ_i(θ)/∂θ = −(1/2) ∂(log |Σ|)/∂θ + (1/K_i) ∂K_i/∂θ + W_Φ(A_i) ∂A_i/∂θ,

with W_Φ(x) = φ₁(x)/Φ₁(x). The second derivatives of ℓ_i(θ) with respect to θ take the form

∂²ℓ_i(θ)/∂θ∂θ′ = −(1/2) ∂²(log |Σ|)/∂θ∂θ′ + (1/K_i) ∂²K_i/∂θ∂θ′ − (1/K_i²)(∂K_i/∂θ)(∂K_i/∂θ′) + W_Φ(A_i) ∂²A_i/∂θ∂θ′ + W′_Φ(A_i)(∂A_i/∂θ)(∂A_i/∂θ′),

with W′_Φ(x) = −W_Φ(x){x + W_Φ(x)}. The first and second derivatives of K_i(θ) with respect to θ are

∂K_i(θ)/∂θ = −(1/2) K_i(θ) (∂d_i/∂θ) E[U|Y = y_i],

∂²K_i(θ)/∂θ∂θ′ = (K_i(θ)/2) { (1/2)(∂d_i/∂θ)(∂d_i/∂θ′) E[U²|Y = y_i] − (∂²d_i/∂θ∂θ′) E[U|Y = y_i] }.

Simplifying the above expressions, we have

∂²ℓ_i(θ)/∂θ∂θ′ = −(1/2) ∂²(log |Σ|)/∂θ∂θ′ − (1/2)(∂²d_i/∂θ∂θ′) E[U|Y = y_i] + (1/4)(∂d_i/∂θ)(∂d_i/∂θ′){ E[U²|Y = y_i] − E[U|Y = y_i]² } + W_Φ(A_i) ∂²A_i/∂θ∂θ′ + W′_Φ(A_i)(∂A_i/∂θ)(∂A_i/∂θ′).

The derivatives of Σ, d_i and A_i are given in Appendix A. The expected values E[U^k|Y = y] can be calculated using the expressions given in Sect. 2.1. A simple way of obtaining the standard errors of the ML estimates of the SSMN parameters is to approximate the asymptotic covariance matrix of θ̂ by the inverse of the observed information matrix. Let J(θ|y) = −Σ_{i=1}^n ∂²ℓ_i(θ)/∂θ∂θ′ denote the observed information matrix for the marginal log-likelihood ℓ(θ); by the consistency and asymptotic normality of the ML estimates, θ̂ has approximately an N(θ, J^{−1}) distribution. In practice, J is unknown and has to be replaced by Ĵ, that is, the matrix J evaluated at the ML estimate θ̂.
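When coding the analytic second derivatives above, a finite-difference check is useful. The helper below (ours, not from the paper) approximates a Hessian numerically; since the log-density of a normal is exactly quadratic in μ, checking against the known information n Σ^{−1} validates the machinery up to rounding:

```python
import numpy as np
from scipy.stats import multivariate_normal

def fd_hessian(f, x, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at x."""
    k = len(x)
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k); ei[i] = h
            ej = np.zeros(k); ej[j] = h
            H[i, j] = (f(x+ei+ej) - f(x+ei-ej)
                       - f(x-ei+ej) + f(x-ei-ej)) / (4.0*h*h)
    return H

# sanity check: observed information of N_p(mu, Sigma) w.r.t. mu is n*Sigma^{-1}
rng = np.random.default_rng(3)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
Y = rng.multivariate_normal([0.0, 0.0], Sigma, size=50)
ll = lambda m: multivariate_normal(mean=m, cov=Sigma).logpdf(Y).sum()
J = -fd_hessian(ll, np.zeros(2))
```

The same helper can be pointed at the SSMN log-likelihood ℓ(θ) of this section to cross-check the analytic J(θ|y).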
3 Simulation experiments

3.1 Simulation study 1: parameter recovery

To examine the performance of our proposed method, we conducted a simulation study. The goal was to investigate the behavior of the proposed EM-type algorithm and to check the standard errors provided by the observed information matrix. We generated 1000 samples of sizes n = 300, 500 and 1000, with true parameter values μ = (0.5, −1)′, Σ = I₂ and λ = (2, 2)′, ν = 5 for the StN and SSL models and (ν, γ) = (0.3, 0.8) for the SCN model. From Tables 1 and 2
Table 1 Mean, Monte Carlo standard deviation (MC SD) and average information-based standard error (IM SE) of the EM estimates, based on 1000 samples from the StN₂ model

Parameter (true)    n = 300                  n = 500                  n = 1000
                    Mean    MC SD   IM SE    Mean    MC SD   IM SE    Mean    MC SD   IM SE
μ₁ (0.5)            0.505   0.123   0.112    0.500   0.088   0.086    0.501   0.062   0.061
μ₂ (−1)            −0.999   0.122   0.112   −1.001   0.089   0.086   −1.000   0.062   0.061
α₁ (1)              1.002   0.089   0.076    1.001   0.067   0.059    1.000   0.048   0.042
α₂ (0)             −0.004   0.046   0.043   −0.002   0.034   0.033   −0.001   0.024   0.023
α₃ (1)              1.004   0.092   0.076    1.003   0.069   0.059    1.000   0.048   0.042
λ₁ (2)              2.150   0.686   0.549    2.085   0.446   0.406    2.044   0.325   0.281
λ₂ (2)              2.157   0.674   0.552    2.093   0.465   0.407    2.040   0.313   0.280
Table 2 Mean, Monte Carlo standard deviation (MC SD) and average information-based standard error (IM SE) of the EM estimates, based on 1000 samples from the SSL₂ model

Parameter (true)    n = 300                  n = 500                  n = 1000
                    Mean    MC SD   IM SE    Mean    MC SD   IM SE    Mean    MC SD   IM SE
μ₁ (0.5)            0.526   0.161   0.202    0.508   0.068   0.069    0.499   0.071   0.069
μ₂ (−1)            −0.977   0.170   0.203   −1.003   0.067   0.069   −1.000   0.074   0.069
α₁ (1)              1.009   0.104   0.077    1.009   0.062   0.041    1.014   0.063   0.041
α₂ (0)             −0.014   0.053   0.041   −0.001   0.023   0.021   −0.001   0.022   0.021
α₃ (1)              1.009   0.106   0.075    1.018   0.064   0.042    1.013   0.066   0.041
λ₁ (2)              2.054   0.691   0.602    2.045   0.323   0.271    2.073   0.321   0.276
λ₂ (2)              2.052   0.715   0.599    2.064   0.311   0.272    2.065   0.333   0.275
we observe that the bias related to all parameters tends to zero when the sample size increases, indicating that the ML estimates based on the proposed EM-type algorithm have good large-sample properties. These tables also provide the average values of the approximate standard errors of the EM estimates obtained through the information-based method described in Sect. 2.3 (IM SE) and the Monte Carlo standard deviations (MC SD) of the parameter estimates. As expected, the results summarized in these tables suggest that the approximation produced by the information-based method is reliable.

3.2 Simulation study 2: flexibility

Here, the experiment is designed to show the flexibility of our proposed SSMN class. Our strategy is to generate artificial data whose distribution is totally different in nature from the class of SSMN distributions studied here. Specifically, we consider the normal inverse Gaussian (NIG) distribution, which is a scale mixture of a normal distribution with an inverse Gaussian mixing distribution, producing both asymmetry and heavy tails (Cabral et al.
2014). We say that a random variable U has an inverse Gaussian (IG) distribution when its density is given by

f(u|ρ, δ) = (δ/√(2π)) u^{−3/2} exp{ −(ρ²/(2u))(u − δ/ρ)² },  u > 0,

where ρ > 0 and δ > 0. We use the notation U ∼ IG(ρ, δ).

Definition 2 We say that the random vector X has a p-dimensional NIG distribution if it admits the representation

X | U = u ∼ N_p(μ + uΔλ, uΔ),  U ∼ IG(ρ, δ),

where μ and λ are p-dimensional parameter vectors, Δ is a p × p positive definite matrix, and ρ and δ are positive numbers. We use the notation X ∼ NIG_p(μ, Δ, λ; ρ, δ).

Aiming at evaluating the performance of the SSMN distributions for multivariate data with skewness and heavy tails, we also use the multivariate skew-t (ST) distribution proposed by Azzalini and Capitanio (2003). We generated a sample of size n = 300 from X ∼ NIG₂((−3, 1)′, I₂, (−4, 3)′; 1, 0.7), and the results for the four fitted SSMN distributions (SN, StN, SSL and SCN) are presented in Table 3. Note that the StN distribution presented the best fit, according to the AIC and BIC measures. The ST distribution was also fitted, providing a log-likelihood value ℓ(θ̂) = −1133.039, leading to AIC = 2282.079 and BIC = 2311.709; these values are higher than those obtained for the StN and SCN members of the SSMN class. A second simulation study was also conducted by generating 1000 Monte Carlo samples from X ∼ NIG₂((−3, 1)′, I₂, (−4, 3)′; 1, 0.7), with size n = 300. Under these parameter values, the sample skewness and kurtosis ranged from (−9, −1.5) to (1.4, 9.7) and from (5.2, 105) to (4.8, 118.9) in each coordinate, respectively. For each generated sample, we fitted the StN, SSL and SCN distributions from the SSMN class, as well as the ST model of Azzalini and Capitanio (2003). Table 4 presents the results of this simulation experiment, showing the percentage of samples for which each model gave the best fit under the AIC and BIC. Note that the AIC and BIC pick the StN model in more than 90 % of the Monte Carlo samples, indicating the flexibility of the StN distribution within the proposed SSMN class.
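The IG density above corresponds to scipy's invgauss with mu = 1/(ρδ) and scale = δ² (mean δ/ρ, shape δ²); the sketch below (function names ours) checks this mapping and draws from the NIG representation of Definition 2:

```python
import numpy as np
from scipy.stats import invgauss

def ig_pdf(u, rho, delta):
    """IG(rho, delta) density as displayed above."""
    return (delta/np.sqrt(2*np.pi)) * u**-1.5 \
        * np.exp(-(rho**2/(2*u))*(u - delta/rho)**2)

def rnig(n, mu, Delta, lam, rho, delta, rng):
    """Draw from NIG_p(mu, Delta, lam; rho, delta) via Definition 2."""
    u = invgauss.rvs(1.0/(rho*delta), scale=delta**2, size=n, random_state=rng)
    L = np.linalg.cholesky(Delta)
    Z = rng.standard_normal((n, len(mu)))
    return mu + u[:, None]*(Delta @ lam) + np.sqrt(u)[:, None]*(Z @ L.T)

u_grid = np.linspace(0.05, 5.0, 200)
pdf_vals = ig_pdf(u_grid, 1.0, 0.7)                 # rho = 1, delta = 0.7, as in the study
rng = np.random.default_rng(5)
X = rnig(1000, np.array([-3.0, 1.0]), np.eye(2),
         np.array([-4.0, 3.0]), 1.0, 0.7, rng)
```

The two parametrizations agree exactly on the grid, which confirms the mapping before using it to generate test data.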
4 Application

This example concerns the Australian Institute of Sport (AIS) dataset, which includes 11 physical and hematological attributes measured on 202 athletes, together with a binary indicator variable for gender (100 females and 102 males). The data were originally collected by Cook and Weisberg (1994) and have previously been analyzed by various authors, including Azzalini and Dalla-Valle (1996) and Azzalini (1985). We now revisit this dataset with the aim of extending the inferential results to the SSMN family; specifically, we focus on the SN, StN, SSL and SCN distributions. The
Table 3 Simulation study: ML estimates (MLE) for the SSMN models fitted to a sample from X ∼ NIG₂((−3, 1)′, I₂, (−4, 3)′; 1, 0.7), n = 300. The SE values are the averages of the estimated asymptotic standard errors

            SN                StN               SSL               SCN
Parameter   MLE       SE      MLE       SE      MLE       SE      MLE       SE
μ₁         −5.255    2.516   −3.004    0.075   −3.178    0.097   −3.303    0.089
μ₂          2.266    2.305    0.994    0.068    1.074    0.092    1.157    0.082
α₁          2.535    0.101    1.591    0.113    3.060    0.148    2.010    0.094
α₂         −0.922    0.064   −0.577    0.060   −1.921    0.107   −1.262    0.068
α₃          1.628    0.062    0.892    0.062    2.058    0.099    1.425    0.066
λ₁          0.007    1.119   −4.906    1.162   −6.723    1.114   −3.625    0.573
λ₂          0.001    1.699    2.782    0.674    5.071    0.907    2.787    0.478
ν           –                 2.100             2.100             0.438
γ           –                 –                 –                 0.100
ℓ(θ̂)     −1208.245          −957.137         −1148.703         −1123.452
AIC         2430.491          1930.274          2313.406          2264.905
BIC         2456.418          1959.904          2343.036          2298.239

Table 4 Percentage of samples (out of 1000 generated from X ∼ NIG₂((−3, 1)′, I₂, (−4, 3)′; 1, 0.7)) for which each model gives the best fit

        ST     StN    SSL    SCN
AIC     0      93.9   0      6.1
BIC     0      97.3   0      2.7
dataset is available in the R package "sn" (R Core Team 2015). For illustration, we consider a subset of variables: Y₁, body mass index (BMI); Y₂, percentage of body fat (Bfat); and Y₃, lean body mass (LBM). Table 5 presents the ML estimates of the parameters of the SN, StN, SSL and SCN models, along with their corresponding standard errors (SE) calculated via the observed information matrix (see Sect. 2.3), and compares the models through information selection criteria. Looking at the values of the information criteria presented in Table 5, we observe that the SCN presents the best fit, followed closely by the StN and SSL models. The fit of the SN is the worst, indicating a lack of adequacy of the SN assumptions for this dataset. The Q-Q plots and simulated envelopes shown in Fig. 3 are based on the distribution of the Mahalanobis distance given in Proposition 4, which is the same as for the SMN class (see Lange and Sinsheimer 1993). The lines in these plots represent the 5th percentile, the mean and the 95th percentile of 100 simulated points for each observation. These plots clearly show once again that the SSMN distributions with heavy tails provide a better fit than the SN model to the AIS dataset. Figure 4 presents the contour plots of the skew-contaminated normal model fitted to the AIS data. We can see from this figure
Table 5  AIS data: ML estimation results (MLE) for the SSMN models. SE are the estimated standard errors

           SN                    StN                   SSL                   SCN
           MLE        SE         MLE        SE         MLE        SE         MLE        SE
μ1         22.45      0.35       22.39      0.32       22.35      0.31       22.35      0.30
μ2         5.82       0.16       5.81       0.16       5.78       0.17       5.82       0.13
μ3         71.66      1.44       71.49      1.38       71.32      1.37       71.22      1.33
B1,1       2.03       0.10       1.84       0.09       1.58       0.08       1.76       0.09
B2,1       1.19       0.20       1.06       0.19       0.93       0.16       1.03       0.18
B3,1       1.69       0.16       1.53       0.16       1.32       0.14       1.46       0.15
B2,2       9.11       0.48       8.68       0.49       7.64       0.42       8.68       0.46
B3,2       −3.58      0.56       −3.58      0.54       −3.14      0.47       −3.58      0.52
B3,3       14.15      0.90       13.51      0.90       11.75      0.78       13.26      0.86
λ1         2.74       1.23       2.45       1.08       2.15       0.93       3.60       1.60
λ2         21.01      6.84       20.04      6.49       17.66      5.68       30.54      12.13
λ3         −9.22      3.12       −9.16      3.08       −8.04      2.71       −13.797    5.58
ν          –                     16.59                 2.97                  0.07
γ          –                     –                     –                     0.27
ℓ(θ̂)      −1768.03              −1765.11              −1764.28              −1761.54
AIC        3560.06               3556.22               3555.56               3551.08
BIC        3599.76               3599.23               3598.57               3597.40
that the fitted SCN density captures the asymmetry present in the data reasonably well.
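The information criteria in Table 5 follow directly from the maximized log-likelihoods: AIC = −2ℓ(θ̂) + 2k and BIC = −2ℓ(θ̂) + k log n, with n = 202 observations in the AIS data and k free parameters (3 for μ, 6 for the lower-triangular B, 3 for λ, plus ν and/or γ where applicable). A minimal check (the SSL row is omitted here because its printed log-likelihood appears rounded inconsistently with its printed criteria):

```python
import math

def info_criteria(loglik, k, n):
    """AIC = -2*loglik + 2*k and BIC = -2*loglik + k*log(n)."""
    return -2.0 * loglik + 2.0 * k, -2.0 * loglik + k * math.log(n)

n = 202  # athletes in the AIS dataset
# (maximized log-likelihood, number of free parameters) from Table 5
fits = {"SN": (-1768.03, 12), "StN": (-1765.11, 13), "SCN": (-1761.54, 14)}
for name, (ll, k) in fits.items():
    aic, bic = info_criteria(ll, k, n)
    print(f"{name}: AIC = {aic:.2f}, BIC = {bic:.2f}")
# SN:  AIC = 3560.06, BIC = 3599.76
# StN: AIC = 3556.22, BIC = 3599.23
# SCN: AIC = 3551.08, BIC = 3597.40
```

Note how BIC, with its heavier penalty k log(202) ≈ 5.31k, narrows the gap between the SN and the heavier-tailed models but does not change the ranking.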
5 Conclusions

In this work, we defined a new family of asymmetric models by extending the symmetric scale mixtures of normal family, generalizing the univariate results of Ferreira et al. (2011) to the multivariate context. In addition, we developed a very general EM-type method for estimating the parameters of the skew scale mixtures of normal distributions, with closed-form expressions for the iterative estimation process. This is a consequence of the fact that the proposed distributions possess a stochastic representation that allows them to be expressed conditionally; this representation also makes many of their properties easy to study. We believe that the approaches proposed here can also be used to study other asymmetric multivariate models, such as those proposed by Lachos et al. (2009, 2010). The models in those articles have a stochastic representation of the form Y = μ + κ^{1/2}(U)Z and include, as proper elements, the skew-t, skew-slash, skew-contaminated normal, skew-logistic, skew-stable and skew-exponential power distributions. The assessment of the influence of the data and of the model assumptions on the results of a statistical analysis is a key aspect of any new class of distributions.
Fig. 3 Simulated envelope for SSMN distributions adjusted to AIS data: a skew-normal, b skew-Student-t-normal, c skew-slash and d skew-contaminated normal [figure: Q-Q plots of sample values against theoretical χ²_p or F(p, ν) quantiles, with simulated envelopes]
We are currently exploring local influence and residual analysis to address this issue. One anonymous referee suggested generalizing the presented results to “multivariate skew scale mixtures of t distributions” by replacing φ_p in f_0 in (5) by the Student-t density. However, the theoretical development does not appear to be as easy, elegant and clean as for our proposal. For instance, having different and independent u_i for the t-density (t_p(·; ν) instead of φ_p(·)) and for H(·; τ) makes the estimation quite complicated, and we could not find an easy way to estimate the parameters in this scenario. Note also that this generalization includes an additional parameter ν, which would need to be included in the estimation process. For fitting SSMN models, we present feasible ECME algorithms with simple analytical expressions (also for the observed information matrix), based on a three-level hierarchical representation (8) of the model. In addition, numerical results show that the SSMN class of distributions is very flexible and has already been applied successfully in models of practical interest. For instance, Lin et al. (2013)
Fig. 4 Contour plots for the skew-contaminated normal fitted to the AIS data [figure: pairwise fitted density contours over BMI, Bfat and LBM]
introduced a new family of mixture models based on the multivariate skew-t-normal (FMStN) distribution, which is a particular case of our proposed SSMN class. They showed that the FMStN model is substantially more computationally feasible than the mixture model based on the multivariate skew-t distribution of Sahu et al. (2003), whose fitting procedure relies heavily on high-dimensional integration and may be computationally intensive.

Acknowledgments We thank the editor, associate editor and two referees, whose constructive comments led to an improved presentation of the paper. C.S. acknowledges support from FAPEMIG (Minas Gerais State Foundation for Research Development), Grant CEX APQ 01845/14. V.H. acknowledges support from CNPq-Brazil (Grant 305054/2011-2) and FAPESP-Brazil (Grant 2014/02938-9).
Appendix 1: Details of the observed information matrix

Considering α = vech(B), where Σ^{1/2} = B = B(α), we obtain the first and second derivatives of log |Σ|, A_i and d_i. The notation used is that of Sect. 2 and, for a p-dimensional vector ρ = (ρ_1, …, ρ_p)⊤, we write Ḃ_r = ∂B(α)/∂α_r, with r = 1, 2, …, p(p + 1)/2. Thus:

• Σ:

  ∂² log |Σ| / ∂α_k ∂α_s = −2 tr(B⁻¹ Ḃ_s B⁻¹ Ḃ_k).

• A_i:

  ∂A_i/∂μ = −B⁻¹λ,   ∂A_i/∂α_k = −λ⊤B⁻¹Ḃ_k B⁻¹(y_i − μ),   ∂A_i/∂λ = B⁻¹(y_i − μ),

  ∂²A_i/∂μ ∂μ⊤ = 0,   ∂²A_i/∂μ ∂α_k = B⁻¹Ḃ_k B⁻¹λ,   ∂²A_i/∂μ ∂λ⊤ = −B⁻¹,

  ∂²A_i/∂α_k ∂α_s = −λ⊤B⁻¹[Ḃ_s B⁻¹Ḃ_k + Ḃ_k B⁻¹Ḃ_s]B⁻¹(y_i − μ),

  ∂²A_i/∂α_k ∂λ = −B⁻¹Ḃ_k B⁻¹(y_i − μ),   ∂²A_i/∂λ ∂λ⊤ = 0.

• d_i:

  ∂d_i/∂μ = −2B⁻²(y_i − μ),

  ∂d_i/∂α_k = −(y_i − μ)⊤B⁻¹[Ḃ_k B⁻¹ + B⁻¹Ḃ_k]B⁻¹(y_i − μ),

  ∂d_i/∂λ = 0,   ∂²d_i/∂μ ∂μ⊤ = 2B⁻²,

  ∂²d_i/∂μ ∂α_k = 2B⁻¹[Ḃ_k B⁻¹ + B⁻¹Ḃ_k]B⁻¹(y_i − μ),

  ∂²d_i/∂μ ∂λ⊤ = 0,   ∂²d_i/∂α_k ∂λ = 0,   ∂²d_i/∂λ ∂λ⊤ = 0,

  ∂²d_i/∂α_k ∂α_s = (y_i − μ)⊤B⁻¹[Ḃ_s B⁻¹Ḃ_k B⁻¹ + Ḃ_k B⁻¹Ḃ_s B⁻¹ + Ḃ_k B⁻²Ḃ_s + Ḃ_s B⁻²Ḃ_k + B⁻¹Ḃ_s B⁻¹Ḃ_k + B⁻¹Ḃ_k B⁻¹Ḃ_s]B⁻¹(y_i − μ).
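These expressions are straightforward to verify numerically. The sketch below (helper names are ours, not from the paper) checks the formula for ∂d_i/∂α_k, where d_i = (y_i − μ)⊤B⁻²(y_i − μ), against a central finite difference in the direction of a symmetric basis matrix Ḃ_k:

```python
import numpy as np

def d_of(B, v):
    """d = v' B^{-2} v for a symmetric positive-definite B."""
    Binv = np.linalg.inv(B)
    return v @ Binv @ Binv @ v

def dd_dalpha(B, Bdot, v):
    """Analytic derivative: -v' B^{-1} [Bdot B^{-1} + B^{-1} Bdot] B^{-1} v."""
    Binv = np.linalg.inv(B)
    return -v @ Binv @ (Bdot @ Binv + Binv @ Bdot) @ Binv @ v

p = 3
rng = np.random.default_rng(1)
M = rng.standard_normal((p, p))
B = M @ M.T + p * np.eye(p)    # a symmetric positive-definite "square root" B
v = rng.standard_normal(p)     # plays the role of y_i - mu

# basis matrix Bdot_k for the off-diagonal entry (2,1) of vech(B)
Bdot = np.zeros((p, p))
Bdot[1, 0] = Bdot[0, 1] = 1.0

eps = 1e-6
numeric = (d_of(B + eps * Bdot, v) - d_of(B - eps * Bdot, v)) / (2.0 * eps)
analytic = dd_dalpha(B, Bdot, v)
print(abs(numeric - analytic))  # agreement up to finite-difference error
```

The same pattern (perturb B along Ḃ_k, difference, compare) validates the second derivatives as well.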
Appendix 2: Joint, conditional and marginal distributions of (Y, U, T)

Note first that from (7) it follows that

  Y | T = t, U = u ∼ N_p(μ + u^{−1/2}Σ^{1/2}δ_u t, u^{−1}Σ^{1/2}(I_p + λ_u λ_u⊤)^{−1}Σ^{1/2}),
  U ∼ H(τ),   T ∼ TN(0, 1; (0, +∞)),                                        (23)

with U and T independent, δ_u = λ/√(u + λ⊤λ) and λ_u = λ/√u.

Using some results given in Lachos et al. (2010), it follows that the joint distribution of (Y, U, T) is given by

  f(y, u, t) = 2 φ_p(y | μ + A t, Σ_a) φ_1(t | 0, 1) h(u; τ)
             = 2 φ_p(y | μ, Σ_a + AA⊤) φ_1(t | ΛA⊤Σ_a^{−1}(y − μ), Λ) h(u; τ),   y ∈ R^p, t > 0, u > 0,

where A = u^{−1/2}Σ^{1/2}δ_u, Σ_a = u^{−1}Σ^{1/2}(I_p + λ_u λ_u⊤)^{−1}Σ^{1/2} and Λ = (1 + A⊤Σ_a^{−1}A)^{−1}. Using the results given in Harville (1997), and after some algebraic manipulations, it follows that Σ_a + AA⊤ = u^{−1}Σ, Λ = u/(u + λ⊤λ) and ΛA⊤Σ_a^{−1} = Λ^{1/2}λ⊤Σ^{−1/2}.
Thus, the marginal distribution of Y ∼ SSMN_p(μ, Σ, λ; H) is given by

  f(y) = 2 ∫₀^{+∞} ∫₀^{+∞} φ_p(y | μ, u^{−1}Σ) φ_1(t | ΛA⊤Σ_a^{−1}(y − μ), Λ) h(u; τ) dt du
       = 2 ∫₀^{+∞} h(u; τ) φ_p(y | μ, u^{−1}Σ) ∫₀^{+∞} φ_1(t | Λ^{1/2}λ⊤Σ^{−1/2}(y − μ), Λ) dt du
       = 2 ∫₀^{+∞} h(u; τ) φ_p(y | μ, u^{−1}Σ) ∫₀^{+∞} φ_1(t | λ⊤Σ^{−1/2}(y − μ), 1) dt du
       = 2 ∫₀^{+∞} φ_p(y | μ, u^{−1}Σ) h(u; τ) du  Φ_1(λ⊤Σ^{−1/2}(y − μ)).

Then the joint distribution of (Y, T) is given by

  f(y, t) = 2 f_0(y | μ, Σ) φ_1(t | λ⊤Σ^{−1/2}(y − μ), 1),   y ∈ R^p, t > 0,   (24)

and

  f(t | y) = φ_1(t | λ⊤Σ^{−1/2}(y − μ), 1) / Φ_1(λ⊤Σ^{−1/2}(y − μ)),            (25)

so that T | Y = y ∼ TN(λ⊤Σ^{−1/2}(y − μ), 1; (0, +∞)).
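The hierarchy in (23) also gives a direct recipe for simulating Y ∼ SSMN_p(μ, Σ, λ; H): draw u from H, draw t from the standard half-normal, and then draw Y from the conditional p-variate normal. A minimal sketch (function names are ours, not from the paper; taking U degenerate at 1 recovers the skew-normal case, and, e.g., a Gamma(ν/2, ν/2) draw for U gives the skew-t-normal member):

```python
import numpy as np

rng = np.random.default_rng(0)

def rssmn(n, mu, B, lam, draw_u):
    """Simulate from SSMN_p(mu, Sigma, lambda; H) with Sigma^{1/2} = B, via (23):
    U ~ H, T ~ TN(0, 1; (0, inf)), and
    Y | T=t, U=u ~ N_p(mu + u^{-1/2} B delta_u t, u^{-1} B (I + lam_u lam_u')^{-1} B)."""
    p = len(mu)
    I, out = np.eye(p), np.empty((n, p))
    for i in range(n):
        u = draw_u()                       # mixing variable U ~ H(tau)
        t = abs(rng.standard_normal())     # half-normal draw of T
        lam_u = lam / np.sqrt(u)
        delta_u = lam / np.sqrt(u + lam @ lam)
        mean = mu + (B @ delta_u) * t / np.sqrt(u)
        cov = B @ np.linalg.inv(I + np.outer(lam_u, lam_u)) @ B / u
        out[i] = rng.multivariate_normal(mean, cov)
    return out

# Skew-normal special case (U = 1): skewness only in the first coordinate
y = rssmn(2000, mu=np.zeros(2), B=np.eye(2), lam=np.array([3.0, 0.0]), draw_u=lambda: 1.0)
```

With u = 1 this reduces to the classical SN representation Y = μ + Σ^{1/2}δ|T| + Σ^{1/2}(I_p − δδ⊤)^{1/2}Z, so the first coordinate of the sample above is right-skewed while the second remains symmetric.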
References

Andrews, D.F., Mallows, C.L.: Scale mixtures of normal distributions. J. R. Stat. Soc. Ser. B 36, 99–102 (1974)
Arellano-Valle, R.B., Bolfarine, H., Lachos, V.H.: Skew-normal linear mixed models. J. Data Sci. 3, 415–438 (2005)
Azzalini, A.: A class of distributions which includes the normal ones. Scand. J. Stat. 12, 171–178 (1985)
Azzalini, A., Capitanio, A.: Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. J. R. Stat. Soc. Ser. B 65, 367–389 (2003)
Azzalini, A., Dalla-Valle, A.: The multivariate skew-normal distribution. Biometrika 83(4), 715–726 (1996)
Azzalini, A., Dal Cappello, T., Kotz, S.: Log-skew-normal and log-skew-t distributions as models for family income data. J. Income Distrib. 11, 13–21 (2003)
Bolfarine, H., Lachos, V.: Skew probit error-in-variables models. Stat. Methodol. 3, 1–12 (2007)
Branco, M.D., Dey, D.K.: A general class of multivariate skew-elliptical distributions. J. Multivar. Anal. 79, 99–113 (2001)
Cabral, C.R.B., Lachos, V.H., Prates, M.O.: Multivariate mixture modeling using skew-normal independent distributions. Comput. Stat. Data Anal. 56(1), 126–142 (2012)
Cabral, C.R.B., Lachos, V.H., Zeller, C.B.: Multivariate measurement error models using finite mixtures of skew-Student t distributions. J. Multivar. Anal. 124, 179–198 (2014)
Cook, R.D., Weisberg, S.: An Introduction to Regression Graphics. Wiley, Hoboken (1994)
Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39(1), 1–38 (1977)
Ferreira, C.S., Bolfarine, H., Lachos, V.H.: Skew scale mixtures of normal distributions: properties and estimation. Stat. Methodol. 8, 154–171 (2011)
Gómez, H.W., Venegas, O., Bolfarine, H.: Skew-symmetric distributions generated by the normal distribution function. Environmetrics 18, 395–407 (2007)
Harville, D.: Matrix Algebra From a Statistician’s Perspective. Springer, New York (1997)
Johnson, N.L., Kotz, S., Balakrishnan, N.: Continuous Univariate Distributions, vol. 1. Wiley, New York (1994)
Lachos, V.H., Vilca, L.F., Bolfarine, H., Ghosh, P.: Robust multivariate measurement error models with scale mixtures of skew-normal distributions. Statistics 44(6), 541–556 (2009)
Lachos, V.H., Ghosh, P., Arellano-Valle, R.B.: Likelihood based inference for skew-normal independent linear mixed models. Stat. Sin. 20(1), 303–322 (2010)
Lange, K.L., Sinsheimer, J.S.: Normal/independent distributions and their applications in robust regression. J. Comput. Graph. Stat. 2, 175–198 (1993)
Lange, K.L., Little, R., Taylor, J.: Robust statistical modeling using the t distribution. J. Am. Stat. Assoc. 84, 881–896 (1989)
Lin, T.I., Ho, H.J., Lee, C.R.: Flexible mixture modelling using the multivariate skew-t-normal distribution. Stat. Comput. 24, 531–546 (2013)
Little, R.J.A.: Robust estimation of the mean and covariance matrix from data with missing values. Appl. Stat. 37, 23–38 (1988)
Liu, C., Rubin, D.B.: The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81, 633–648 (1994)
Osorio, F., Paula, G.A., Galea, M.: Assessment of local influence in elliptical linear models with longitudinal structure. Comput. Stat. Data Anal. 51(9), 4354–4368 (2007)
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna (2015). http://www.R-project.org/
Sahu, S.K., Dey, D.K., Branco, M.D.: A new class of multivariate distributions with applications to Bayesian regression models. Can. J. Stat. 31, 129–150 (2003)
Wang, J., Boyer, J., Genton, M.: A skew-symmetric representation of multivariate distributions. Stat. Sin. 14, 1259–1270 (2004)