Sankhyā: The Indian Journal of Statistics
2007, Volume 69, Part 4, pp. 648-670
© 2007, Indian Statistical Institute
Influence Diagnostics for Skew-Normal Linear Mixed Models

Heleno Bolfarine
Universidade de São Paulo, São Paulo, Brazil
Lourdes C. Montenegro
Universidade Federal de Minas Gerais, Minas Gerais, Brazil
Victor H. Lachos
Universidade Estadual de Campinas, São Paulo, Brazil

Abstract
Normality (symmetry) of the random effects is a routine assumption in linear mixed models, but it may sometimes be unrealistic, obscuring important features of among-subjects variation. We relax this assumption by assuming that the random effects density is skew-normal, considered as an extension of the univariate version proposed by Sahu, Dey and Branco (CJS, 2003). Following Zhu and Lee (JRSSB, 2001), we implement an EM-type algorithm for parameter estimation and then, using the related conditional expectation of the complete-data log-likelihood function, develop diagnostic measures for implementing the local influence approach under four model perturbation schemes. Results obtained from simulated and real data sets are reported, illustrating the usefulness of the approach.

AMS (2000) subject classification. Primary 62H12, 60E05.
Keywords and phrases. Skew-normal distribution, EM-algorithm, skewness, local influence, case deletion.
1 Introduction
Linear mixed models (LMM) are an important statistical tool for practising statisticians. Further, the LMM have become the most commonly used class of models for analysing continuous repeated measures data from a sample of individuals, in agricultural, environmental, biomedical applications and also in economics and social sciences. Repeated-measures data
Skew-normal linear mixed models
are typically generated by observing a number of subjects repeatedly under different experimental conditions. Observations on the same subject are usually made at different times, as in longitudinal studies. Mixed-effects models assume that the intra-subject model relating the response variables to time is the same for all subjects, but the model parameters may vary among subjects. Despite the attractive statistical properties of LMM, a standard but possibly restrictive assumption is that the random effects and the residual components follow normal (symmetric) distributions. Hence, considerable interest has been focused on relaxing the normality (symmetry) assumption and jointly estimating the random effects and model parameters. Relaxation of the normality assumption on the random effects can be found in Zhang and Davidian (2001), Verbeke and Lesaffre (1996), Magder and Zeger (1996) and Tao et al. (1999), among others. Ma et al. (2004) consider a generalized flexible skew-elliptical distribution to represent the random effects density and propose complicated algorithms for maximum likelihood (ML) and Bayesian inference by using MCMC methods. Recently, Arellano-Valle et al. (2005) defined a skew-normal linear mixed model by assuming that both the random error and the random effects follow skew-normal distributions within the setting introduced by Azzalini and Dalla-Valle (1996), and presented EM-type algorithms for ML estimation by using the marginal likelihood. As in Ma et al. (2004), in this work we assume that the random effects follow a skew-normal distribution, which is an extension of the univariate version proposed by Sahu et al. (2003), and present an EM-type algorithm for ML estimation. For special cases and common situations, this algorithm yields closed form expressions for the M-step.
We also draw the reader's attention to the fact that with this definition, the estimator for the asymmetric parameter is always finite and the assessment of local influence is a trivial problem. Studying the sensitivity to departures from basic assumptions is important in statistical analysis. Following the pioneering work of Cook (1986), this area of research has received much attention. See Lesaffre and Verbeke (1998), Zhu and Lee (2001), Lee and Xu (2004) and Galea-Rojas et al. (1997), among others. A study involving semi-parametric longitudinal mixed models is presented in Fung et al. (2002). For skew-normal linear mixed models, however, we are not aware of any work deriving appropriate local influence measures. The main objective of this paper is to develop methods to obtain such measures. Given the ML estimates, we obtain the diagnostic measures via the approach proposed by Zhu and Lee (2001), which follows as a byproduct of the EM algorithm. They proposed a method
H. Bolfarine, L.C. Montenegro and V.H. Lachos
to assess the local influence resulting from a minor perturbation of a statistical model with incomplete data. The key idea of the development is to work with the conditional expectation of the complete-data log-likelihood function in the EM-algorithm (Dempster et al., 1977), leading to closed form local influence measures. The plan of the paper is as follows. In Section 2, for the sake of completeness, we present a multivariate extension of the univariate skew-normal distribution proposed by Sahu et al. (2003). Properties such as moments and the stochastic representation of the proposed multivariate distribution are also discussed. In Section 3, the skew-normal linear mixed model (SN-LMM, hereafter) is defined by extending the usual normal mixed model. The marginal density of the observed quantities is obtained analytically by integrating out the random effects, leading to the observed (marginal) likelihood function, which can be maximized directly by using existing statistical software such as Ox, R or Matlab. Additionally, we present an EM-type algorithm, which has advantages over the direct maximization approach, especially in terms of robustness with respect to starting values. In Section 4, we give a brief sketch of the local influence approach for models with incomplete data and develop the methodology required for SN-LMM. The approach follows as a byproduct of the EM-algorithm. Different perturbation schemes are considered. Section 5 reports applications to simulated and real data sets that illustrate the usefulness of the proposed methodology. Concluding remarks are given in Section 6.

2 A Skew-normal Distribution
In this section, we consider a multivariate extension of the univariate skew-normal distribution proposed by Sahu et al. (2003). Because its skewing function is of dimension one, this extension makes it feasible to propose algorithms for obtaining ML estimators. Sahu et al. (2003) also proposed a multivariate version, but the skewing function there is multivariate as well, which complicates the implementation of the EM-algorithm, for example. We start by introducing notation that will be used throughout the paper and by reviewing the univariate skew-normal distribution of Sahu et al. (2003). Let $\phi_n(\cdot|\mu, \Sigma)$ and $\Phi_n(\cdot|\mu, \Sigma)$ be the probability density function (pdf) and the cumulative distribution function (cdf), respectively, of the $N_n(\mu, \Sigma)$ distribution. When $\mu = 0$ and $\Sigma = I_n$ (the $n \times n$ identity matrix), we denote these functions by $\phi_n(\cdot)$ and $\Phi_n(\cdot)$. As considered in Sahu et al. (2003), a random variable $Y$ follows a univariate skew-normal distribution with parameters $\mu$, $\sigma^2$ and $\lambda$ (skewness parameter) if the pdf of $Y$ is given by

$$f(y) = 2\,\phi_1(y|\mu, \sigma^2 + \lambda^2)\,\Phi_1\!\left(\frac{\lambda}{\sigma}\,\frac{(y - \mu)}{(\sigma^2 + \lambda^2)^{1/2}}\right). \tag{2.1}$$

We use the notation $Y \sim SN_1(\mu, \sigma^2, \lambda)$ to denote this distribution. Note that if $\lambda = 0$, then the density of $Y$ in (2.1) reduces to the density of the normal distribution $N_1(\mu, \sigma^2)$. The stochastic representation, which can be used to study some properties and also to quickly simulate realizations from $SN_1(\mu, \sigma^2, \lambda)$, is given by

$$Y \overset{d}{=} \lambda|X_0| + X_1, \tag{2.2}$$

where $X_0 \sim N_1(0, 1)$ is independent of $X_1 \sim N_1(\mu, \sigma^2)$, with "$\overset{d}{=}$" meaning "distributed as". In the following, we propose a multivariate extension of the skew-normal distribution defined in (2.1).

Definition 2.1. An $n$-dimensional random vector $Y$ follows a skew-normal distribution with location vector $\mu \in R^n$, dispersion matrix $\Sigma$ (an $n \times n$ positive definite matrix) and skewness vector $\lambda \in R^n$ if its pdf is given by

$$f(y) = 2\,\phi_n(y|\mu, \Sigma + \lambda\lambda^\top)\,\Phi_1\!\left(\frac{\lambda^\top \Sigma^{-1}(y - \mu)}{(1 + \lambda^\top \Sigma^{-1}\lambda)^{1/2}}\right), \quad y \in R^n. \tag{2.3}$$

We denote this by $Y \sim SN_n(\mu, \Sigma, \lambda)$, and by $Y \sim SN_n(\lambda)$ when $\mu = 0$ and $\Sigma = I_n$. Note that the density (2.3) is not in the class of the multivariate skew distributions defined by Sahu et al. (2003), since the skewing function in this expression is of dimension one. However, it is in the class of the fundamental skew-normal distributions defined by Arellano-Valle and Genton (2005). Hence, many properties of the above skew-normal distribution may be derived from the results developed by Arellano-Valle and Genton (2005). For example, the stochastic representation given next for a skew-normal random vector follows from there.
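As a minimal sketch, the density (2.3) can be evaluated numerically as follows. The function name `sn_pdf` and all numerical values are illustrative assumptions, not part of the paper; the two checks only confirm the reduction to the normal density when λ = 0 and that the univariate density integrates to one.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import multivariate_normal, norm


def sn_pdf(y, mu, Sigma, lam):
    """Evaluate the SN_n(mu, Sigma, lam) density (2.3) at a single point y."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    Sigma = np.atleast_2d(np.asarray(Sigma, dtype=float))
    # Symmetric factor: N_n(mu, Sigma + lam lam^T) density.
    base = multivariate_normal.pdf(y, mean=mu, cov=Sigma + np.outer(lam, lam))
    # One-dimensional skewing factor Phi_1(.).
    sinv_lam = np.linalg.solve(Sigma, lam)  # Sigma^{-1} lam
    arg = sinv_lam @ (y - mu) / np.sqrt(1.0 + lam @ sinv_lam)
    return 2.0 * base * norm.cdf(arg)


# With lam = 0 the density reduces to the N_2(mu, Sigma) density.
point = np.array([0.3, -0.2])
sym_value = sn_pdf(point, np.zeros(2), np.eye(2), np.zeros(2))
normal_value = multivariate_normal.pdf(point, mean=np.zeros(2), cov=np.eye(2))

# In one dimension the density should integrate to one (illustrative values).
total, _ = quad(lambda t: sn_pdf(t, 1.0, 0.5, 0.8), -np.inf, np.inf)
print(sym_value - normal_value, total)
```

Solving the linear system with `np.linalg.solve` rather than inverting Σ is a numerical convenience only; it does not change the formula.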
Figure 1. Contours of the bivariate skew-normal distribution for different values of λ (both axes range from −3 to 3), including (a) λ1 = 0 and λ2 = 0, (b) λ1 = −0.7 and λ2 = 0.9, (c) λ1 = 0.9 and λ2 = −0.7 and (d) λ1 = 0.3 and λ2 = 0.8.
Proposition 2.1. Let $Y \sim SN_n(\mu, \Sigma, \lambda)$. Then

$$Y \overset{d}{=} \lambda|X_0| + X_1, \tag{2.4}$$

where $X_0 \sim N_1(0, 1)$ is independent of $X_1 \sim N_n(\mu, \Sigma)$.

Notice that the stochastic representation given in (2.2) for the univariate case is a special case of (2.4). Hence, we have extended the univariate skew-normal distribution given in Sahu et al. (2003) to the multivariate case in a natural way. In Figure 1, we present some contours of the densities
associated with the bivariate skew-normal distribution $SN_2(0, \Sigma, \lambda)$, with $\Sigma = I_2 - \lambda\lambda^\top$, for different values of $\lambda$. Note that these contours are not elliptical and can be strongly asymmetric depending on the choice of the parameters. A direct consequence of Proposition 2.1 is given in the following corollary.

Corollary 2.1. Let $Y \sim SN_n(\mu, \Sigma, \lambda)$. Then $E[Y] = \mu + (2/\pi)^{1/2}\lambda$ and $Var[Y] = \Sigma + (1 - 2/\pi)\lambda\lambda^\top$.
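The stochastic representation (2.4) gives a direct way to simulate from the distribution, and Corollary 2.1 can then be checked empirically. The sketch below is illustrative only: the function name `sn_rvs`, the seed and the parameter values are assumptions chosen for the example.

```python
import numpy as np


def sn_rvs(mu, Sigma, lam, size, rng):
    """Draw `size` vectors from SN_n(mu, Sigma, lam) via Y = lam*|X0| + X1."""
    mu = np.asarray(mu, dtype=float)
    lam = np.asarray(lam, dtype=float)
    x0 = np.abs(rng.standard_normal(size))              # |X0|, half-normal
    x1 = rng.multivariate_normal(mu, Sigma, size=size)  # X1 ~ N_n(mu, Sigma)
    return x0[:, None] * lam + x1


rng = np.random.default_rng(0)
mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
lam = np.array([0.9, -0.7])

y = sn_rvs(mu, Sigma, lam, 200_000, rng)

# Moments stated in Corollary 2.1.
mean_theory = mu + np.sqrt(2.0 / np.pi) * lam
var_theory = Sigma + (1.0 - 2.0 / np.pi) * np.outer(lam, lam)

print(np.abs(y.mean(axis=0) - mean_theory).max())
print(np.abs(np.cov(y.T) - var_theory).max())
```

Both printed discrepancies should be small and shrink as the sample size grows.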
From Corollary 2.1, it is easy to see that if $Var[Y]$ is finite, then the elements of the parameter vector $\lambda$ are also finite. This is not necessarily the case for the skew-normal distribution introduced by Azzalini and Dalla-Valle (1996) (see also Arellano-Valle et al., 2005), where the elements of the asymmetry parameter vector can be infinite.

3 The Skew-normal Linear Mixed Model
In its most general form, a linear mixed effects model is defined as (see Verbeke and Molenberghs, 2000)

$$Y_i = X_i\beta + Z_i b_i + \epsilon_i, \quad i = 1, \ldots, n, \tag{3.1}$$

where $Y_i$ is an $n_i \times 1$ vector of observed continuous responses for sample unit $i$, $X_i$ with dimension $n_i \times p$ is the design matrix corresponding to the fixed effects, $\beta$ with dimension $p \times 1$ is a vector of population-averaged regression coefficients called fixed effects, $Z_i$ with dimension $n_i \times q$ is the design matrix corresponding to the $q \times 1$ random effects vector $b_i$, and $\epsilon_i$, with dimension $n_i \times 1$, is the vector of random errors. It is assumed that the random effects $b_i$ and the residual components $\epsilon_i$ are independent, with $b_i \overset{iid}{\sim} N_q(0, D)$ and $\epsilon_i \overset{ind}{\sim} N_{n_i}(0, \Sigma_i)$ (N-LMM hereafter), where $D = D(\alpha)$ and $\Sigma_i = \Sigma_i(\gamma)$, $i = 1, \ldots, n$, are dispersion matrices, typically associated with the between- and within-individual variability, which depend on unknown and reduced parameters $\alpha$ and $\gamma$, respectively. Following the same ideas as in Arellano-Valle et al. (2004), we extend the N-LMM defined above by considering the linear model in (3.1) with the assumptions

$$b_i \overset{iid}{\sim} SN_q(0, D, \lambda) \quad \text{and} \quad \epsilon_i \overset{ind}{\sim} N_{n_i}(0, \Sigma_i), \tag{3.2}$$
where $b_i$ is independent of $\epsilon_i$, $i = 1, \ldots, n$. The asymmetry parameter $\lambda$ incorporates asymmetry in the random effects $b_i$ and, consequently, in the observed quantities $Y_i$, $i = 1, \ldots, n$. If $\lambda = 0$, then the asymmetric model reduces to the usual N-LMM, for which inference has been extensively studied in the literature. Note from (2.4) that the regression setup defined in (3.1)-(3.2) can be written hierarchically as

$$Y_i \mid b_i \overset{ind}{\sim} N_{n_i}(X_i\beta + Z_i b_i, \Sigma_i), \tag{3.3}$$

$$b_i \mid T_i = t_i \overset{ind}{\sim} N_q(\lambda t_i, D) \quad \text{and} \tag{3.4}$$

$$T_i \overset{iid}{\sim} HN_1(0, 1), \tag{3.5}$$

$i = 1, \ldots, n$, all independent, where $HN_1(0, 1)$ denotes the standardized univariate half-normal distribution (see $|X_0|$ in equation (2.2)). Strategies to obtain a correct interpretation of the model parameters are given in Arellano-Valle et al. (2005). From their Corollary 2, it also follows that the marginal density of $Y_i$, which is important for making inference on the parameter vector $\theta = (\beta^\top, \gamma^\top, \alpha^\top, \lambda^\top)^\top$, is given by

$$f(y_i|\theta) = 2\,\phi_{n_i}\!\left(y_i \,|\, X_i\beta,\ \Sigma_i + Z_i(D + \lambda\lambda^\top)Z_i^\top\right)\Phi_1\!\left(\bar{\lambda}_i (y_i - X_i\beta)\right), \quad i = 1, \ldots, n,$$

where

$$\bar{\lambda}_i = \frac{\lambda^\top D^{-1}\Lambda_i Z_i^\top \Sigma_i^{-1}}{(1 + \lambda^\top D^{-1}\lambda + \lambda^\top D^{-1}\Lambda_i D^{-1}\lambda)^{1/2}},$$

with $\Lambda_i = \{(D + \lambda\lambda^\top)^{-1} + Z_i^\top \Sigma_i^{-1} Z_i\}^{-1}$.
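A minimal sketch of how the marginal log-density of one subject could be computed is given below. The function names and the numerical inputs are hypothetical. The first function follows the expression with $\bar{\lambda}_i$ and $\Lambda_i$ as written above; the second uses the observation, implied by (2.4) and (3.3)-(3.5), that $Y_i \sim SN_{n_i}(X_i\beta, \Sigma_i + Z_i D Z_i^\top, Z_i\lambda)$, so that (2.3) applies directly. The two should agree up to rounding.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm


def logpdf_text(y, X, Z, beta, Sigma, D, lam):
    """log f(y_i | theta) using lambda_bar_i and Lambda_i as in the text."""
    resid = y - X @ beta
    A = D + np.outer(lam, lam)
    sinv_Z = np.linalg.solve(Sigma, Z)                 # Sigma^{-1} Z
    Lambda = np.linalg.inv(np.linalg.inv(A) + Z.T @ sinv_Z)
    dinv_lam = np.linalg.solve(D, lam)                 # D^{-1} lam
    num = dinv_lam @ Lambda @ sinv_Z.T                 # lam^T D^{-1} Lambda Z^T Sigma^{-1}
    den = np.sqrt(1.0 + lam @ dinv_lam + dinv_lam @ Lambda @ dinv_lam)
    cov = Sigma + Z @ A @ Z.T
    return (np.log(2.0)
            + multivariate_normal.logpdf(y, mean=X @ beta, cov=cov)
            + norm.logcdf((num / den) @ resid))


def logpdf_sn(y, X, Z, beta, Sigma, D, lam):
    """Same quantity via (2.3) with dispersion Psi and skewness Z lam."""
    resid = y - X @ beta
    Psi = Sigma + Z @ D @ Z.T
    v = Z @ lam
    pinv_v = np.linalg.solve(Psi, v)                   # Psi^{-1} Z lam
    arg = pinv_v @ resid / np.sqrt(1.0 + v @ pinv_v)
    return (np.log(2.0)
            + multivariate_normal.logpdf(y, mean=X @ beta, cov=Psi + np.outer(v, v))
            + norm.logcdf(arg))


# Illustrative inputs for a single subject with n_i = 4, p = q = 2.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(4), np.arange(4.0)])
Z = X.copy()
beta = np.array([1.0, 0.5])
Sigma = 0.3 * np.eye(4)
D = np.array([[0.5, 0.1], [0.1, 0.2]])
lam = np.array([0.8, -0.3])
y = rng.normal(size=4)

a = logpdf_text(y, X, Z, beta, Sigma, D, lam)
b = logpdf_sn(y, X, Z, beta, Sigma, D, lam)
print(a, b)
```

The second form avoids inverting $D + \lambda\lambda^\top$ and may be preferable numerically, although that is a design choice of this sketch rather than something stated in the paper.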
Hence, the log-likelihood function for $\theta$ given the observed sample $y = (y_1^\top, \ldots, y_n^\top)^\top$ is given by $\ell(\theta|y) = \sum_{i=1}^n \log f(y_i|\theta)$. In the following, we describe an iterative procedure to obtain the ML estimate of $\theta$ based on the EM-algorithm.

3.1. ML estimation via the EM-algorithm. Let $y_c = (y^\top, b^\top, t^\top)^\top$ with $y = (y_1^\top, \ldots, y_n^\top)^\top$, $b = (b_1^\top, \ldots, b_n^\top)^\top$ and $t = (t_1, \ldots, t_n)^\top$. Under the hierarchical representation (3.3)-(3.5), it follows that the complete log-likelihood function associated with $y_c$ is given by $\ell_c(\theta|y_c) = \sum_{i=1}^n \ell_i(\theta|y_c)$, with

$$\ell_i(\theta|y_c) = -\frac{1}{2}\log|\Sigma_i| - \frac{1}{2}(y_i - X_i\beta - Z_i b_i)^\top \Sigma_i^{-1}(y_i - X_i\beta - Z_i b_i) - \frac{1}{2}\log|D| - \frac{1}{2}(b_i - \lambda t_i)^\top D^{-1}(b_i - \lambda t_i) + c,$$
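where $c$ is a constant that is independent of the parameter vector $\theta$. Letting $\hat{b}_i = E[b_i|\theta = \hat\theta, y_i]$, $\hat\Omega_i = Cov[b_i|\theta = \hat\theta, y_i]$, $\hat{t}_i = E[T_i|\theta = \hat\theta, y_i]$, $\widehat{t^2}_i = E[T_i^2|\theta = \hat\theta, y_i]$ and $\widehat{tb}_i = E[T_i b_i|\theta = \hat\theta, y_i]$, we obtain from (3.3)-(3.5), after some tedious algebraic manipulation, that

$$\hat{t}_i = \hat\mu_{T_i} + W_{\Phi_1}\!\left(\frac{\hat\mu_{T_i}}{\hat{M}_{T_i}}\right)\hat{M}_{T_i}, \qquad \widehat{t^2}_i = \hat\mu_{T_i}^2 + \hat{M}_{T_i}^2 + W_{\Phi_1}\!\left(\frac{\hat\mu_{T_i}}{\hat{M}_{T_i}}\right)\hat{M}_{T_i}\hat\mu_{T_i},$$

$$\hat{b}_i = \hat{r}_i + \hat{s}_i\hat{t}_i, \qquad \hat\Omega_i = \hat{T}_{b_i}^2 + \hat{s}_i\hat{s}_i^\top\left(\widehat{t^2}_i - (\hat{t}_i)^2\right) \qquad \text{and} \qquad \widehat{tb}_i = \hat{r}_i\hat{t}_i + \hat{s}_i\widehat{t^2}_i, \tag{3.6}$$

where $W_{\Phi_1}(u) = \phi_1(u)/\Phi_1(u)$ and

$$\hat{M}_{T_i}^2 = \left[1 + \hat\lambda^\top Z_i^\top\left(\hat\Sigma_i + Z_i\hat{D}Z_i^\top\right)^{-1} Z_i\hat\lambda\right]^{-1},$$

$$\hat\mu_{T_i} = \hat{M}_{T_i}^2\,\hat\lambda^\top Z_i^\top\left(\hat\Sigma_i + Z_i\hat{D}Z_i^\top\right)^{-1}\left(y_i - X_i\hat\beta\right),$$

$$\hat{T}_{b_i}^2 = \left[\hat{D}^{-1} + Z_i^\top\hat\Sigma_i^{-1}Z_i\right]^{-1},$$

$$\hat{r}_i = \hat{T}_{b_i}^2 Z_i^\top\hat\Sigma_i^{-1}\left(y_i - X_i\hat\beta\right) \quad \text{and}$$

$$\hat{s}_i = \left(I_q - \hat{T}_{b_i}^2 Z_i^\top\hat\Sigma_i^{-1}Z_i\right)\hat\lambda,$$

$i = 1, \ldots, n$. It follows that the conditional expectation of the complete log-likelihood function has the form

$$Q(\theta|\hat\theta) = E[\ell_c(\theta|y_c)|y, \hat\theta] = \sum_{i=1}^n Q_i(\theta|\hat\theta), \tag{3.7}$$

where

$$Q_i(\theta|\hat\theta) = Q_{1i}(\beta, \gamma|\hat\theta) + Q_{2i}(\alpha, \lambda|\hat\theta),$$

with

$$Q_{1i}(\beta, \gamma|\hat\theta) = -\frac{1}{2}\log|\Sigma_i| - \frac{1}{2}(y_i - X_i\beta - Z_i\hat{b}_i)^\top\Sigma_i^{-1}(y_i - X_i\beta - Z_i\hat{b}_i) - \frac{1}{2}\mathrm{tr}\{\Sigma_i^{-1}Z_i\hat\Omega_i Z_i^\top\} \tag{3.8}$$

and

$$Q_{2i}(\alpha, \lambda|\hat\theta) = -\frac{1}{2}\log|D| - \frac{1}{2}\mathrm{tr}\left\{D^{-1}\left(\hat\Omega_i + \hat{b}_i\hat{b}_i^\top - 2\,\widehat{tb}_i\lambda^\top + \widehat{t^2}_i\lambda\lambda^\top\right)\right\}, \tag{3.9}$$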
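where $\hat{b}_i$, $\hat\Omega_i$, $\widehat{t^2}_i$ and $\widehat{tb}_i$ are as in (3.6), and $\mathrm{tr}\{A\}$ indicates the trace of the matrix $A$. Therefore, the following steps of the EM-algorithm can be formulated to obtain the ML estimate of $\theta$ for the SN-LMM defined above.

E-step: Given $\theta = \hat\theta$, compute $\hat{b}_i$, $\hat\Omega_i$, $\hat{t}_i$, $\widehat{t^2}_i$ and $\widehat{tb}_i$ for $i = 1, \ldots, n$, using (3.6).

M-step: Update $\hat\theta$ by maximizing $Q(\theta|\hat\theta)$ over $\theta$, which leads to the following conditional maximization (CM) steps (see Meng and Rubin, 1993).

CM-step 1: Fix $\gamma = \hat\gamma$ and update $\beta$ as

$$\hat\beta = \left(\sum_{i=1}^n X_i^\top\hat\Sigma_i^{-1}X_i\right)^{-1}\sum_{i=1}^n X_i^\top\hat\Sigma_i^{-1}\left(y_i - Z_i\hat{b}_i\right).$$

CM-step 2: Update $\lambda$ as

$$\hat\lambda = \frac{\sum_{i=1}^n \widehat{tb}_i}{\sum_{i=1}^n \widehat{t^2}_i}.$$

CM-step 3: Fix $\beta = \hat\beta$ and update $\gamma$ as $\hat\gamma = \mathrm{argmax}_\gamma \sum_{i=1}^n Q_{1i}(\hat\beta, \gamma|\hat\theta)$.

CM-step 4: Fix $\lambda = \hat\lambda$ and update $\alpha$ as $\hat\alpha = \mathrm{argmax}_\alpha \sum_{i=1}^n Q_{2i}(\alpha, \hat\lambda|\hat\theta)$.

Notice that steps CM-3 and CM-4 each require a search over a single parameter block only and can be carried out, for instance, using quasi-Newton methods. Under the special (and common) situation where $D$ is unstructured and $\Sigma_i = \sigma_e^2 R_i$, with $R_i$ a known matrix and $\gamma = \sigma_e^2$, steps CM-3 and CM-4 reduce, respectively, to the following closed forms:

$$\hat\sigma_e^2 = \frac{1}{N}\sum_{i=1}^n\left[\left(y_i - X_i\hat\beta - Z_i\hat{b}_i\right)^\top R_i^{-1}\left(y_i - X_i\hat\beta - Z_i\hat{b}_i\right) + \mathrm{tr}\left(R_i^{-1}Z_i\hat\Omega_i Z_i^\top\right)\right] \quad \text{and}$$

$$\hat{D} = \frac{1}{n}\sum_{i=1}^n\left(\hat\Omega_i + \hat{b}_i\hat{b}_i^\top - \widehat{tb}_i\hat\lambda^\top - \hat\lambda\,\widehat{tb}_i^\top + \widehat{t^2}_i\hat\lambda\hat\lambda^\top\right),$$

where $N = \sum_{i=1}^n n_i$ is the total number of observations.

When $\lambda = 0$, the M-step equations reduce to the equations obtained by Pinheiro and Bates (2000), for example.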
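The E-step and the closed-form CM-steps can be sketched as follows for the special case $\Sigma_i = \sigma_e^2 I_{n_i}$ (that is, $R_i = I_{n_i}$) with unstructured $D$. This is a minimal illustration, not the authors' MATLAB implementation: the function names (`e_step`, `em_step`, `observed_loglik`), the simulated data, the starting values and the number of iterations are all assumptions made for the example. The observed log-likelihood is tracked only to confirm that it does not decrease across iterations.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm


def e_step(y, X, Z, beta, sigma2, D, lam):
    """Conditional moments (3.6) for one subject, assuming Sigma_i = sigma2 * I."""
    resid = y - X @ beta
    Psi = sigma2 * np.eye(len(y)) + Z @ D @ Z.T
    v = Z @ lam
    pinv_v = np.linalg.solve(Psi, v)
    M2 = 1.0 / (1.0 + v @ pinv_v)
    muT = M2 * (pinv_v @ resid)
    M = np.sqrt(M2)
    w = np.exp(norm.logpdf(muT / M) - norm.logcdf(muT / M))  # W_Phi1 = phi/Phi
    t1 = muT + w * M
    t2 = muT**2 + M2 + w * M * muT
    T2b = np.linalg.inv(np.linalg.inv(D) + Z.T @ Z / sigma2)
    r = T2b @ Z.T @ resid / sigma2
    s = (np.eye(len(lam)) - T2b @ Z.T @ Z / sigma2) @ lam
    b = r + s * t1
    Omega = T2b + np.outer(s, s) * (t2 - t1**2)
    tb = r * t1 + s * t2
    return b, Omega, t1, t2, tb


def em_step(data, beta, sigma2, D, lam):
    """One EM iteration; `data` is a list of (y_i, X_i, Z_i) tuples."""
    E = [e_step(y, X, Z, beta, sigma2, D, lam) for y, X, Z in data]
    # CM-step 1 (sigma2 cancels because Sigma_i = sigma2 * I).
    XtX = sum(X.T @ X for _, X, _ in data)
    Xty = sum(X.T @ (y - Z @ e[0]) for (y, X, Z), e in zip(data, E))
    beta = np.linalg.solve(XtX, Xty)
    # CM-step 2.
    lam = sum(e[4] for e in E) / sum(e[3] for e in E)
    # CM-step 3 in closed form, dividing by the total number of observations.
    N = sum(len(y) for y, _, _ in data)
    sigma2 = sum(np.sum((y - X @ beta - Z @ e[0]) ** 2) + np.trace(Z @ e[1] @ Z.T)
                 for (y, X, Z), e in zip(data, E)) / N
    # CM-step 4 in closed form.
    D = sum(e[1] + np.outer(e[0], e[0]) - np.outer(e[4], lam) - np.outer(lam, e[4])
            + e[3] * np.outer(lam, lam) for e in E) / len(data)
    return beta, sigma2, D, lam


def observed_loglik(data, beta, sigma2, D, lam):
    """Observed log-likelihood, using Y_i ~ SN(X_i beta, Psi_i, Z_i lam)."""
    total = 0.0
    for y, X, Z in data:
        Psi = sigma2 * np.eye(len(y)) + Z @ D @ Z.T
        v = Z @ lam
        pinv_v = np.linalg.solve(Psi, v)
        arg = pinv_v @ (y - X @ beta) / np.sqrt(1.0 + v @ pinv_v)
        total += (np.log(2.0) + norm.logcdf(arg)
                  + multivariate_normal.logpdf(y, mean=X @ beta, cov=Psi + np.outer(v, v)))
    return total


# Illustrative simulated data: 60 subjects, 5 occasions, random intercept and slope.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5), np.arange(5) - 2.0])
data = []
for _ in range(60):
    b = np.array([0.8, 0.0]) * abs(rng.standard_normal()) \
        + rng.multivariate_normal(np.zeros(2), 0.1 * np.eye(2))
    data.append((X @ np.array([1.0, 2.0]) + X @ b + 0.3 * rng.standard_normal(5), X, X))

theta = (np.zeros(2), 1.0, np.eye(2), np.array([0.5, 0.5]))  # arbitrary start
trace = [observed_loglik(data, *theta)]
for _ in range(25):
    theta = em_step(data, *theta)
    trace.append(observed_loglik(data, *theta))
print(trace[0], trace[-1])
```

In practice one would iterate until the change in the log-likelihood (or in the parameter estimates) falls below a tolerance rather than for a fixed number of iterations.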
4 Diagnostic Analysis
One important aspect of model building is checking model assumptions. In this section, we use the local influence approach to detect observations that, under small perturbations of the model, exert great influence on the maximum likelihood estimators. There are basically two approaches for detecting observations that seriously influence the results of a statistical analysis. The first is the case-deletion approach, in which the impact of deleting an observation on the estimators is directly assessed by metrics such as the likelihood distance and Cook's distance (see Cook, 1977). The second assesses the sensitivity of the estimation outputs to minor perturbations of the model inputs, as in the local influence approach (see Cook, 1986). Inspired by the basic idea of the EM-algorithm, Zhu and Lee (2001) proposed a unified method for local influence analysis of general statistical models with missing data on the basis of the Q-function defined in (3.7). We briefly review that procedure in the next section.

4.1. The local influence approach. Consider a perturbation vector $\omega = (\omega_1, \ldots, \omega_m)^\top$ varying in an open region $\Omega \subset R^m$. Let $\ell_c(\theta, \omega|Y_c)$, $\theta \in R^p$, be the complete-data log-likelihood of the perturbed model. We assume that there is an $\omega_0$ such that $\ell_c(\theta, \omega_0|Y_c) = \ell_c(\theta|Y_c)$ for all $\theta$. Let $\hat\theta(\omega)$ denote the maximum of the function $Q(\theta, \omega|\hat\theta) = E[\ell_c(\theta, \omega|Y_c)|y, \hat\theta]$. The influence graph is defined as $\alpha(\omega) = (\omega^\top, f_Q(\omega))^\top$, where $f_Q(\omega)$ is the Q-displacement function defined as follows:

$$f_Q(\omega) = 2\left[Q(\hat\theta|\hat\theta) - Q(\hat\theta(\omega)|\hat\theta)\right].$$

Following the approach developed in Cook (1986) and Zhu and Lee (2001), the normal curvature $C_{f_Q,d}$ of $\alpha(\omega)$ at $\omega_0$ in the direction of some unit vector $d$ can be used to summarize the local behaviour of the Q-displacement function. It can be shown that (see Zhu and Lee, 2001)

$$C_{f_Q,d} = -2\,d^\top\ddot{Q}_{\omega_0}d \quad \text{and} \quad -\ddot{Q}_{\omega_0} = \Delta_{\omega_0}^\top\left\{-\ddot{Q}_\theta(\hat\theta)\right\}^{-1}\Delta_{\omega_0},$$

where

$$\ddot{Q}_\theta(\hat\theta) = \left.\frac{\partial^2 Q(\theta|\hat\theta)}{\partial\theta\,\partial\theta^\top}\right|_{\theta = \hat\theta} \quad \text{and} \quad \Delta_\omega = \left.\frac{\partial^2 Q(\theta, \omega|\hat\theta)}{\partial\theta\,\partial\omega^\top}\right|_{\theta = \hat\theta(\omega)}.$$

As in Cook (1986), the expression for $-\ddot{Q}_{\omega_0}$ is the fundamental equation for detecting influential observations. A clear picture of $-\ddot{Q}_{\omega_0}$ (a symmetric
matrix) is given by its spectral decomposition

$$-2\ddot{Q}_{\omega_0} = \sum_{k=1}^m \lambda_k e_k e_k^\top,$$

where $(\lambda_1, e_1), \ldots, (\lambda_m, e_m)$ are the eigenvalue-eigenvector pairs of the matrix $-2\ddot{Q}_{\omega_0}$ with $\lambda_1 \geq \ldots \geq \lambda_p$, $\lambda_{p+1} = \ldots = \lambda_m = 0$, and $e_1, \ldots, e_m$ are elements of the associated orthonormal basis. Lesaffre and Verbeke (1998), Poon and Poon (1999) and Zhu and Lee (2001) proposed to inspect all eigenvectors corresponding to nonzero eigenvalues for more revealing information, but this can be computationally intensive for large $m$. Following Zhu and Lee (2001) and Lu and Song (2006), we consider an aggregated contribution vector of all eigenvectors corresponding to nonzero eigenvalues. Let $\tilde\lambda_k = \lambda_k/(\lambda_1 + \ldots + \lambda_p)$, $e_k^2 = (e_{k1}^2, \ldots, e_{km}^2)^\top$ and

$$M(0) = \sum_{k=1}^p \tilde\lambda_k e_k^2.$$

Hence, the assessment of influential cases is based on $\{M(0)_l, l = 1, \ldots, m\}$, and one can obtain $M(0)_l$ via $B_{f_Q,u_l} = -2u_l^\top\ddot{Q}_{\omega_0}u_l/\mathrm{tr}[-2\ddot{Q}_{\omega_0}]$, where $u_l$ is a column vector in $R^m$ with the $l$-th entry equal to one and all other entries zero. Refer to Zhu and Lee (2001) for other theoretical properties of $B_{f_Q,u_l}$, such as invariance under reparametrization of $\theta$. Additionally, Lee and Xu (2004) propose to use $1/m + c^* SM(0)$ as a bench-mark to regard the $l$-th case as influential, where $c^*$ is a selected constant (depending on the real application) and $SM(0)$ is the standard deviation of $\{M(0)_l, l = 1, \ldots, m\}$.

4.2. Case-deletion measures. Case-deletion diagnosis is a common approach for studying the effect of dropping the $i$-th case from the data set. We develop diagnostic measures with the whole vector $(y_i^\top, x_i^\top)$ deleted and denote these by the subindex $(i)$. In the literature, the classical measures are the Cook distance and the likelihood displacement. Based on these ideas, Lee and Xu (2004) propose analogues of Cook's distance and the likelihood displacement for the Q-function $Q(\theta|\hat\theta)$. These measures are denoted by $D_i^c$ and $LD_i^c$, respectively, and are given by

$$D_i^c = \left(\hat\theta_{(i)} - \hat\theta\right)^\top\left\{-\ddot{Q}(\hat\theta|\hat\theta)\right\}\left(\hat\theta_{(i)} - \hat\theta\right) \quad \text{and}$$

$$LD_i^c = 2\left[\ell_c(\hat\theta) - \ell_c\!\left(\hat\theta_{(i)}\right)\right],$$
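where $\hat\theta_{(i)}$ is the maximizer of the Q-function $Q_{(i)}(\theta|\hat\theta)$, $i = 1, \ldots, n$. In this work we use $M(0)$ as a diagnostic for local influence, and $D_i^c$ and $LD_i^c$ as diagnostics for global influence. In the following sections, we obtain analytical expressions for the Hessian matrix $\ddot{Q}_\theta(\hat\theta)$ and the matrix $\Delta_{\omega_0}$ under different perturbation schemes.

4.3. The Hessian matrix $\ddot{Q}_\theta(\hat\theta)$. To obtain the diagnostic measures for local influence of a particular perturbation scheme, it is necessary to compute $\ddot{Q}_\theta(\hat\theta) = \partial^2 Q(\theta|\hat\theta)/\partial\theta\,\partial\theta^\top$, where $\theta = (\beta^\top, \gamma^\top, \alpha^\top, \lambda^\top)^\top$. It follows from (3.7) that the Hessian matrix is given by $\ddot{Q}_\theta(\hat\theta) = \sum_{i=1}^n \ddot{Q}_i(\theta)$ with

$$\ddot{Q}_i(\theta) = -\frac{\partial^2 Q_i(\theta|\hat\theta)}{\partial\theta\,\partial\theta^\top} = \begin{pmatrix}\ddot{Q}_{1i}(\beta, \gamma) & 0 \\ 0 & \ddot{Q}_{2i}(\alpha, \lambda)\end{pmatrix},$$

where the matrices

$$\ddot{Q}_{1i}(\beta, \gamma) = -\frac{\partial^2 Q_{1i}(\beta, \gamma|\hat\theta)}{\partial\tau\,\partial\tau^\top}, \ \text{with } \tau = (\beta^\top, \gamma^\top)^\top, \quad \text{and} \quad \ddot{Q}_{2i}(\alpha, \lambda) = -\frac{\partial^2 Q_{2i}(\alpha, \lambda|\hat\theta)}{\partial\pi\,\partial\pi^\top}, \ \text{with } \pi = (\alpha^\top, \lambda^\top)^\top,$$

have elements given by (see Magnus and Neudecker, 1988)

$$\frac{\partial^2 Q_{1i}(\beta, \gamma|\hat\theta)}{\partial\beta\,\partial\beta^\top} = -X_i^\top\Sigma_i^{-1}X_i,$$

$$\frac{\partial^2 Q_{1i}(\beta, \gamma|\hat\theta)}{\partial\beta\,\partial\gamma_r} = -X_i^\top\Sigma_i^{-1}\dot\Sigma_i(r)\Sigma_i^{-1}\left(y_i - X_i\beta - Z_i\hat{b}_i\right),$$

$$\frac{\partial^2 Q_{1i}(\beta, \gamma|\hat\theta)}{\partial\gamma_r\,\partial\gamma_s} = \frac{1}{2}\mathrm{tr}\left\{\Sigma_i^{-1}\left(\dot\Sigma_i(r)\Sigma_i^{-1}\dot\Sigma_i(s) - \ddot\Sigma_i(r, s)\right)\right\} - \frac{1}{2}\mathrm{tr}\left\{M_i\Sigma_i^{-1}\left(\dot\Sigma_i(r)\Sigma_i^{-1}\dot\Sigma_i(s) + \dot\Sigma_i(s)\Sigma_i^{-1}\dot\Sigma_i(r) - \ddot\Sigma_i(r, s)\right)\Sigma_i^{-1}\right\},$$

$$\frac{\partial^2 Q_{2i}(\alpha, \lambda|\hat\theta)}{\partial\lambda\,\partial\alpha_r} = -D^{-1}\dot{D}(r)D^{-1}\left(\widehat{tb}_i - \widehat{t^2}_i\lambda\right),$$

$$\frac{\partial^2 Q_{2i}(\alpha, \lambda|\hat\theta)}{\partial\alpha_r\,\partial\alpha_s} = \frac{1}{2}\mathrm{tr}\left\{D^{-1}\left(\dot{D}(r)D^{-1}\dot{D}(s) - \ddot{D}(r, s)\right)\right\} - \frac{1}{2}\mathrm{tr}\left\{N_i D^{-1}\left(\dot{D}(r)D^{-1}\dot{D}(s) + \dot{D}(s)D^{-1}\dot{D}(r) - \ddot{D}(r, s)\right)D^{-1}\right\} \quad \text{and}$$

$$\frac{\partial^2 Q_{2i}(\alpha, \lambda|\hat\theta)}{\partial\lambda\,\partial\lambda^\top} = -\widehat{t^2}_i D^{-1},$$

where

$$M_i = \left(y_i - X_i\beta - Z_i\hat{b}_i\right)\left(y_i - X_i\beta - Z_i\hat{b}_i\right)^\top + Z_i\hat\Omega_i Z_i^\top,$$

$$N_i = \hat\Omega_i + \hat{b}_i\hat{b}_i^\top - 2\,\widehat{tb}_i\lambda^\top + \widehat{t^2}_i\lambda\lambda^\top,$$

$\dot\Sigma_i(r) = \partial\Sigma_i/\partial\gamma_r$, $\ddot\Sigma_i(r, s) = \partial^2\Sigma_i/\partial\gamma_r\partial\gamma_s$, $r, s = 1, \ldots, \dim(\gamma)$, $\dot{D}(r) = \partial D/\partial\alpha_r$ and $\ddot{D}(r, s) = \partial^2 D/\partial\alpha_r\partial\alpha_s$, $r, s = 1, \ldots, \dim(\alpha)$.

4.4. Perturbation schemes. In this section, we consider four different perturbation schemes for the baseline model defined in (3.1)-(3.2).

Case weights perturbation. Let $\omega = (\omega_1, \ldots, \omega_n)^\top$ be an $n \times 1$ vector with $\omega_0 = (1, \ldots, 1)^\top$. Then the expected value of the perturbed complete-data log-likelihood function (perturbed Q-function) can be written as

$$Q(\theta, \omega|\hat\theta) = E[\ell_c(\theta, \omega|y_c)] = \sum_{i=1}^n \omega_i E[\ell_i(\theta|y_c)] = \sum_{i=1}^n Q_{1i}(\beta, \gamma, \omega|\hat\theta) + \sum_{i=1}^n Q_{2i}(\alpha, \lambda, \omega|\hat\theta),$$

where $Q_{1i}(\beta, \gamma, \omega|\hat\theta) = \omega_i Q_{1i}(\beta, \gamma|\hat\theta)$ and $Q_{2i}(\alpha, \lambda, \omega|\hat\theta) = \omega_i Q_{2i}(\alpha, \lambda|\hat\theta)$, with $Q_{1i}(\beta, \gamma|\hat\theta)$ and $Q_{2i}(\alpha, \lambda|\hat\theta)$ as in (3.8) and (3.9), respectively. In this case, the matrix

$$\Delta_{\omega_0} = \left.\frac{\partial^2 Q(\theta, \omega|\hat\theta)}{\partial\theta\,\partial\omega^\top}\right|_{\omega = \omega_0} = \frac{\partial^2 Q(\theta, \omega_0|\hat\theta)}{\partial\theta\,\partial\omega^\top}$$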
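has elements given by

$$\frac{\partial^2 Q_{1i}(\beta, \gamma, \omega_0|\hat\theta)}{\partial\beta\,\partial\omega_i} = X_i^\top\Sigma_i^{-1}\left(y_i - X_i\beta - Z_i\hat{b}_i\right),$$

$$\frac{\partial^2 Q_{1i}(\beta, \gamma, \omega_0|\hat\theta)}{\partial\gamma_r\,\partial\omega_i} = -\frac{1}{2}\mathrm{tr}\left\{\Sigma_i^{-1}\dot\Sigma_i(r) - \Sigma_i^{-1}\dot\Sigma_i(r)\Sigma_i^{-1}M_i\right\},$$

$$\frac{\partial^2 Q_{2i}(\alpha, \lambda, \omega_0|\hat\theta)}{\partial\alpha_r\,\partial\omega_i} = -\frac{1}{2}\mathrm{tr}\left\{D^{-1}\dot{D}(r)\right\} + \frac{1}{2}\mathrm{tr}\left\{D^{-1}N_i D^{-1}\dot{D}(r)\right\} \quad \text{and}$$

$$\frac{\partial^2 Q_{2i}(\alpha, \lambda, \omega_0|\hat\theta)}{\partial\lambda\,\partial\omega_i} = D^{-1}\left(\widehat{tb}_i - \widehat{t^2}_i\lambda\right),$$

where $M_i$ and $N_i$, $i = 1, \ldots, n$, are as defined in Section 4.3.

Perturbation of the matrix D. To study the effects of departures from the assumptions about the matrix $D$ associated with the random effects, we consider the following perturbation: $D(\omega_i) = \omega_i^{-1}D$, meaning that the distribution of the random effects is heteroscedastic, that is, $b_i \sim SN_q(0, D/\omega_i, \lambda)$, $i = 1, \ldots, n$. The perturbed Q-function is of the form

$$Q(\theta, \omega|\hat\theta) = \sum_{i=1}^n Q_{1i}(\beta, \gamma, \omega|\hat\theta) + \sum_{i=1}^n Q_{2i}(\alpha, \lambda, \omega|\hat\theta),$$

where $Q_{1i}(\beta, \gamma, \omega|\hat\theta) = Q_{1i}(\beta, \gamma|\hat\theta)$ and

$$Q_{2i}(\alpha, \lambda, \omega|\hat\theta) = \frac{q}{2}\log\omega_i - \frac{1}{2}\log|D| - \frac{\omega_i}{2}\mathrm{tr}\left\{D^{-1}\left(\hat\Omega_i + \hat{b}_i\hat{b}_i^\top - 2\,\widehat{tb}_i\lambda^\top + \widehat{t^2}_i\lambda\lambda^\top\right)\right\}.$$

Under this perturbation scheme, the no-perturbation model follows by considering $\omega_0 = (1, \ldots, 1)^\top$. The matrix

$$\Delta_{\omega_0} = \left.\frac{\partial^2 Q(\theta, \omega|\hat\theta)}{\partial\theta\,\partial\omega^\top}\right|_{\omega = \omega_0} = \frac{\partial^2 Q(\theta, \omega_0|\hat\theta)}{\partial\theta\,\partial\omega^\top}$$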
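has elements given by

$$\frac{\partial^2 Q_{1i}(\beta, \gamma, \omega_0|\hat\theta)}{\partial\beta\,\partial\omega_i} = 0, \qquad \frac{\partial^2 Q_{1i}(\beta, \gamma, \omega_0|\hat\theta)}{\partial\gamma\,\partial\omega_i} = 0,$$

$$\frac{\partial^2 Q_{2i}(\alpha, \lambda, \omega_0|\hat\theta)}{\partial\alpha_r\,\partial\omega_i} = \frac{1}{2}\mathrm{tr}\left\{N_i D^{-1}\dot{D}(r)D^{-1}\right\} \quad \text{and} \quad \frac{\partial^2 Q_{2i}(\alpha, \lambda, \omega_0|\hat\theta)}{\partial\lambda\,\partial\omega_i} = D^{-1}\left(\widehat{tb}_i - \widehat{t^2}_i\lambda\right).$$

Perturbation of explanatory variables. In this case, we are interested in perturbing a specific explanatory variable. Under this condition, we have the perturbed explanatory matrix $X_{i\omega} = (x_{i1}, \ldots, x_{iu}(\omega_i), \ldots, x_{ip})$, where $x_{iu}(\omega_i) = x_{iu} + \omega_i 1_{n_i}$, $u = 1, \ldots, p$, $x_{iu}$ is the $u$-th column of the matrix $X_i$ and $1_{n_i}$ is an $n_i \times 1$ vector of ones. Hence, this case may cover situations where $x$ is measured with error. In this case, $\omega_0 = 0$,

$$Q_{1i}(\beta, \gamma, \omega|\hat\theta) = -\frac{1}{2}\log|\Sigma_i| - \frac{1}{2}\left(y_i - X_i(\omega)\beta - Z_i\hat{b}_i\right)^\top\Sigma_i^{-1}\left(y_i - X_i(\omega)\beta - Z_i\hat{b}_i\right) - \frac{1}{2}\mathrm{tr}\left\{\Sigma_i^{-1}Z_i\hat\Omega_i Z_i^\top\right\}$$

and $Q_{2i}(\alpha, \lambda, \omega|\hat\theta) = Q_{2i}(\alpha, \lambda|\hat\theta)$. As in the previous case, the matrix

$$\Delta_{\omega_0} = \left.\frac{\partial^2 Q(\theta, \omega|\hat\theta)}{\partial\theta\,\partial\omega^\top}\right|_{\omega = \omega_0} = \frac{\partial^2 Q(\theta, \omega_0|\hat\theta)}{\partial\theta\,\partial\omega^\top}$$

has elements given by

$$\frac{\partial^2 Q_{1i}(\beta, \gamma, \omega_0|\hat\theta)}{\partial\beta\,\partial\omega_i} = f_u\left(y_i - X_i\beta - Z_i\hat{b}_i\right)^\top\Sigma_i^{-1}1_{n_i} - \beta_u X_i^\top\Sigma_i^{-1}1_{n_i},$$

$$\frac{\partial^2 Q_{1i}(\beta, \gamma, \omega_0|\hat\theta)}{\partial\gamma_r\,\partial\omega_i} = -\beta_u 1_{n_i}^\top\Sigma_i^{-1}\dot\Sigma_i(r)\Sigma_i^{-1}\left(y_i - X_i\beta - Z_i\hat{b}_i\right),$$

$$\frac{\partial^2 Q_{2i}(\alpha, \lambda, \omega_0|\hat\theta)}{\partial\alpha_r\,\partial\omega_i} = 0 \quad \text{and} \quad \frac{\partial^2 Q_{2i}(\alpha, \lambda, \omega_0|\hat\theta)}{\partial\lambda\,\partial\omega_i} = 0,$$

$i = 1, \ldots, n$, where $f_u$ denotes a $p \times 1$ vector of zeros with one in the $u$-th position, and $\beta_u$ denotes the $u$-th element of $\beta$.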
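Perturbation of response variables. A perturbation of the response variables $(y_1^\top, \ldots, y_n^\top)^\top$ is introduced by replacing $y_i$ by $y_{i\omega} = y_i + \omega_i 1_{n_i}$, where $1_{n_i}$ is an $n_i \times 1$ vector of ones, $i = 1, \ldots, n$. In this case, $\omega_0 = 0$,

$$Q_{1i}(\beta, \gamma, \omega|\hat\theta) = -\frac{1}{2}\log|\Sigma_i| - \frac{1}{2}\left(y_{i\omega} - X_i\beta - Z_i\hat{b}_i\right)^\top\Sigma_i^{-1}\left(y_{i\omega} - X_i\beta - Z_i\hat{b}_i\right) - \frac{1}{2}\mathrm{tr}\left\{\Sigma_i^{-1}Z_i\hat\Omega_i Z_i^\top\right\},$$

and $Q_{2i}(\alpha, \lambda, \omega|\hat\theta) = Q_{2i}(\alpha, \lambda|\hat\theta)$. Moreover, the matrix

$$\Delta_{\omega_0} = \left.\frac{\partial^2 Q(\theta, \omega|\hat\theta)}{\partial\theta\,\partial\omega^\top}\right|_{\omega = \omega_0} = \sum_{i=1}^n\frac{\partial^2 Q_i(\theta, \omega_0|\hat\theta)}{\partial\theta\,\partial\omega^\top}$$

has the following elements:

$$\frac{\partial^2 Q_{1i}(\beta, \gamma, \omega_0|\hat\theta)}{\partial\beta\,\partial\omega_i} = X_i^\top\Sigma_i^{-1}1_{n_i},$$

$$\frac{\partial^2 Q_{1i}(\beta, \gamma, \omega_0|\hat\theta)}{\partial\gamma_r\,\partial\omega_i} = \left(y_i - X_i\beta - Z_i\hat{b}_i\right)^\top\Sigma_i^{-1}\dot\Sigma_i(r)\Sigma_i^{-1}1_{n_i},$$

$$\frac{\partial^2 Q_{2i}(\alpha, \lambda, \omega_0|\hat\theta)}{\partial\alpha_r\,\partial\omega_i} = 0 \quad \text{and} \quad \frac{\partial^2 Q_{2i}(\alpha, \lambda, \omega_0|\hat\theta)}{\partial\lambda\,\partial\omega_i} = 0.$$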
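Once the Hessian $\ddot{Q}_\theta(\hat\theta)$ and a matrix $\Delta_{\omega_0}$ are available, the quantities of Section 4.1 reduce to a few matrix operations. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the inputs are synthetic (not derived from any fitted SN-LMM), and the sample standard deviation (`ddof=1`) is one possible reading of $SM(0)$. It computes $M(0)$ from the diagonal of $-2\ddot{Q}_{\omega_0}$ and, as a cross-check, from the spectral decomposition.

```python
import numpy as np


def local_influence(q_hess, delta, c_star=2.0):
    """Return M(0), the bench-mark 1/m + c* SM(0) and the flagged indices.

    q_hess : (p, p) Hessian of Q at theta_hat (assumed negative definite).
    delta  : (p, m) matrix Delta_{omega_0}.
    """
    # -2 Q_omega0 = 2 Delta^T {-Q_theta}^{-1} Delta
    B = 2.0 * delta.T @ np.linalg.solve(-q_hess, delta)
    B = 0.5 * (B + B.T)                   # remove rounding asymmetry
    m0 = np.diag(B) / np.trace(B)         # M(0)_l = B_{fQ, u_l}
    benchmark = 1.0 / len(m0) + c_star * m0.std(ddof=1)
    return m0, benchmark, np.flatnonzero(m0 > benchmark)


def m0_spectral(q_hess, delta):
    """M(0) as the eigenvalue-weighted sum of squared eigenvector entries."""
    B = 2.0 * delta.T @ np.linalg.solve(-q_hess, delta)
    vals, vecs = np.linalg.eigh(0.5 * (B + B.T))
    keep = vals > 1e-10 * vals.max()      # nonzero eigenvalues only
    weights = vals[keep] / vals[keep].sum()
    return (vecs[:, keep] ** 2) @ weights


# Synthetic example: p = 4 parameters, m = 50 cases, case index 7 inflated.
p, m = 4, 50
delta = np.cos(np.outer(np.arange(1, p + 1), np.arange(m)))
delta[:, 7] *= 10.0
q_hess = -np.diag([1.0, 2.0, 3.0, 4.0])

m0, benchmark, flagged = local_influence(q_hess, delta)
m0_alt = m0_spectral(q_hess, delta)
print(flagged, benchmark)
```

Working with the diagonal avoids an eigendecomposition of an $m \times m$ matrix, which matters when the number of perturbed cases is large.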
5 Illustrative Examples
In order to illustrate the usefulness of the proposed methodology, in this section we present results obtained from simulated and real data. MATLAB software was used to implement the programs required for computing the matrices derived in the previous sections.

5.1. Simulated data. To investigate the empirical performance of the proposed methods, a simulation study was conducted. We started with the following skew-normal linear mixed model:

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 w_i + b_i + \epsilon_{ij}, \tag{5.1}$$

where $j = 1, \ldots, 5$, $i = 1, \ldots, 200$. We took $t_{ij} = j - 3$, $w_i = 1$ if $i \leq 100$ and $w_i = 0$ if $i > 100$, $\beta_1 = 2$, $\beta_2 = 1$, $\epsilon_{ij} \sim N(0, 0.5^2)$ and $\beta_0 + b_i \sim SN_1(0, 0.2^2, 0.3)$, yielding highly skewed data. Note that $t_{ij}$ represents a covariate whose values change within individuals and are the same for all individuals, while $w_i$ is an individual-level covariate, e.g., a treatment indicator. Now we consider the following atypical points $\{y_{ij}: j = 1, \ldots, 5,\ i = 5, 20, 50\}$, where the corresponding $b_i$ was replaced with the fixed value $b_i = 2$, for $i = 5, 20, 50$. These atypical points correspond to individuals with the same individual-level covariate $w$. With this setup, we generate a data set with outliers according to the SN-LMM defined in (5.1). Following the procedures proposed in Section 4, Figure 2 depicts the index plots of $M(0)$ for case weights perturbation, perturbation of the dispersion component $D$, perturbation of the response variables and for the likelihood displacement based on the complete data ($LD_i^c$). Using the bench-mark with $c^* = 2$ (see Section 4.1), the atypical points 5, 20 and 50 were correctly picked up under all the perturbation schemes considered, indicating that the methodology works well when suspicious points are present in the data set. The large jump in Figure 2(c) might be due to the individual-level covariate considered in the simulation study.

Figure 2. Simulated data set. Index plots of (a) $M(0)$ for case weights perturbation, (b) $M(0)$ for perturbation of the dispersion component $D$, (c) $M(0)$ for perturbation of the response variable and (d) complete-data likelihood displacement $LD_i^c$ (the dotted line is the bench-mark for $M(0)$ with $c^* = 2$).
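The simulation design of Section 5.1 can be sketched as below. Several choices are assumptions made for illustration rather than details confirmed by the text: the error and random-effect scales are read as $0.5^2$ and $0.2^2$, the atypical cases are produced by setting the whole term $\beta_0 + b_i$ to 2, and the seed and variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2007)
n, n_i = 200, 5
beta1, beta2 = 2.0, 1.0

t = np.arange(1, n_i + 1) - 3.0                 # t_ij = j - 3
w = (np.arange(1, n + 1) <= 100).astype(float)  # w_i = 1 for i <= 100, else 0

# beta0 + b_i ~ SN_1(0, 0.2^2, 0.3), drawn via representation (2.2).
intercept = 0.3 * np.abs(rng.standard_normal(n)) + 0.2 * rng.standard_normal(n)

# Atypical individuals 5, 20 and 50 (1-based); assumption: the term is set to 2.
atypical = np.array([5, 20, 50]) - 1
intercept[atypical] = 2.0

# Noise-free part and observed responses with N(0, 0.5^2) errors.
signal = intercept[:, None] + beta1 * t[None, :] + beta2 * w[:, None]
y = signal + 0.5 * rng.standard_normal((n, n_i))
print(y.shape)
```

The resulting array has one row per individual and one column per measurement occasion, which is a convenient layout for building the per-subject tuples used by an EM routine.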
Figure 3. Framingham cholesterol data set: (a) histogram of cholesterol levels with kernel density estimate (horizontal axis: cholesterol level; vertical axis: density) and (b) contour plot of the estimated density of $b_i$ (horizontal axis: intercept; vertical axis: slope).
5.2. Framingham cholesterol data set. Zhang and Davidian (2001) studied a data set on cholesterol levels, collected as part of the famous Framingham heart study. The data set includes the cholesterol levels over time, age at baseline and gender for $n = 200$ randomly selected individuals. We adopt the same linear mixed model used by these authors, given by

$$y_{ij} = \beta_0 + \beta_1 sex_i + \beta_2 age_i + \beta_3 t_{ij} + b_{0i} + b_{1i}t_{ij} + \epsilon_{ij}, \tag{5.2}$$

where $y_{ij}$ is the cholesterol level divided by 100 at the $j$-th time for subject $i$ and $t_{ij}$ is (time − 5)/10, with time measured in years from baseline; $age_i$ is age at baseline; $sex_i$ is the gender indicator (0 = female, 1 = male). Hence, $x_{ij} = (1, sex_i, age_i, t_{ij})^\top$, $b_i = (b_{0i}, b_{1i})^\top$ and $Z_{ij} = (1, t_{ij})^\top$. Figure 3(a) shows the histogram of cholesterol levels, clearly indicating its asymmetric nature, which suggests that a skew-normal model is adequate for this data set. In all cases, we considered $\Sigma_i = \sigma_e^2 I_{n_i}$, $i = 1, \ldots, 200$.
Table 1. Results of fitting SN-LMM and N-LMM to the Framingham cholesterol data set. d11, d12 and d22 are the distinct elements of the matrix D^{1/2}. SE are the estimated asymptotic standard errors.

                        SN-LMM                   N-LMM
Parameter          Estimate      SE         Estimate      SE
β0                   1.3555    0.1397         1.5967    0.1543
β1                  –0.0484    0.0496        –0.0631    0.0568
β2                   0.0150    0.0035         0.0184    0.0037
β3                   0.3541    0.0555         0.2817    0.0242
σe²                  0.0429    0.0024         0.0434    0.0024
d11                  0.1875    0.0547         0.3716    0.0201
d12                  0.1363    0.0380         0.0563    0.0179
d22                  0.1434    0.0518         0.1868    0.0329
λb1                  0.4776    0.0673           -         -
λb2                 –0.0918    0.0633           -         -
log-likelihood     –152.0384                –160.9864
AIC                  0.1552                   0.1619
BIC                  0.1789                   0.1808
HQ                   0.1642                   0.1691
The resulting parameter estimates are given in Table 1. Note that the values of the Akaike information criterion (AIC), Schwarz's information criterion (BIC) and the Hannan-Quinn (HQ) criterion shown at the bottom of this table favour the SN-LMM, supporting the contention of a departure from normality. Furthermore, for this data set, we conducted the local influence study with interest in $\theta$ based on $M(0)$. In all cases, we have used $c^* = 3$ to construct the bench-mark. For the case weights perturbation, individuals 39, 90, 146, 160 and 175 stand out, as depicted in Figure 4(a). Individuals 2, 39 and 160 are found to be influential under the perturbation of the matrix $D$, as shown in Figure 4(b). Figures 5(a) and (b) present index plots of $M(0)$ for the perturbation of the explanatory variable $sex_i$ and of the response variables, respectively. Once again, individuals 39, 26, 90 and 146 seem to be the most influential. Note that individual 39 appears as influential in the majority of the perturbation schemes. Individual 39 (female) is influential because her cholesterol level is low (about the average) in comparison with the other individuals, considering that she is 59 years old. We also notice that, for the case-deletion diagnosis, the measures $LD_i^c$ and $D_i^c$ presented in Figures 6(a) and (b), respectively, indicate individual 39 as very influential.
[Figure 4: index plots, panels (a) and (b); vertical axis M(0), horizontal axis index.]
Figure 4. Framingham cholesterol data set. Index plots of (a) M(0) for perturbation of case weights and (b) M(0) for perturbation of the dispersion component matrix. (The dotted line is the bench-mark for M(0) with c∗ = 3.)
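The bench-mark line in these index plots can be illustrated with a small sketch. It assumes the bench-mark has the form mean(M(0)) + c∗ · SD(M(0)), the rule proposed by Lee and Xu (2004); this section does not restate the definition, so the form, the use of the sample standard deviation, the function name `flag_influential` and the M(0) values below are all hypothetical and are not the Framingham results.

```python
from statistics import mean, stdev


def flag_influential(m0, c_star=3.0):
    """Return the bench-mark and the 1-based indices whose M(0) exceeds it.

    Assumed bench-mark: mean(M(0)) + c_star * SD(M(0)), with the sample SD.
    """
    benchmark = mean(m0) + c_star * stdev(m0)
    flagged = [i + 1 for i, value in enumerate(m0) if value > benchmark]
    return benchmark, flagged


# Hypothetical M(0) values for 50 individuals: one clearly stands out.
m0_values = [0.004] * 48 + [0.005, 0.06]

benchmark, flagged = flag_influential(m0_values, c_star=3.0)
print(f"bench-mark = {benchmark:.4f}, flagged individuals = {flagged}")
```

A larger c∗ raises the line and flags fewer individuals, so the choice c∗ = 3 is a fairly conservative cut-off.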
[Figure 5: index plots, panels (a) and (b); vertical axis M(0), horizontal axis index.]
Figure 5. Framingham cholesterol data set. Index plots of (a) M(0) for perturbation of the explanatory variable and (b) M(0) for perturbation of the response variable. (The dotted line is the bench-mark for M(0) with c∗ = 3.)
[Figure 6: index plots, panels (a) and (b); vertical axes LD_i^c and D_i^c, horizontal axis index.]
Figure 6. Framingham cholesterol data set. (a) Complete-data likelihood displacement LD_i^c and (b) Cook's distance D_i^c based on the complete data.
6
Final Conclusion
In this paper, we have proposed a multivariate extension of the univariate skew-normal distribution of Sahu et al. (2003) as an alternative distribution to be used in practice. This multivariate distribution is attractive because it allows easy implementation of the EM-algorithm. To compute the ML estimates in the SN-LMM, we developed an EM-type algorithm and then, using the related conditional expectation of the complete-data log-likelihood function, derived the matrices needed to assess local influence on the parameter estimates under four perturbation schemes. The simulation study indicates that the methodology works well when the data contain a suspicious point, that is, one that can significantly distort the model estimators under a minor perturbation. It is still an open problem, however, to develop model adequacy procedures for asymmetric regression models. We point out that the methodology and the analytical expressions provided in this work do not seem to be available elsewhere in the literature.
Acknowledgements. The authors thank the co-editor and anonymous referees for helpful comments and suggestions on a previous version of this article. Grants from CNPq and FAPESP-Brazil are also acknowledged.
References
Arellano-Valle, R.B. and Genton, M.G. (2005). Fundamental skew distributions. J. Multivariate Anal., 96, 93–116.
Arellano-Valle, R.B., Bolfarine, H. and Lachos, V.H. (2005). Skew-normal linear mixed models. J. Data Science, 3, 415–438.
Azzalini, A. and Dalla-Valle, A. (1996). The multivariate skew-normal distribution. Biometrika, 83, 715–726.
Cook, R.D. (1977). Detection of influential observation in linear regression. Technometrics, 19, 5–18.
Cook, R.D. (1986). Assessment of local influence (with discussion). J. Roy. Statist. Soc. Ser. B, 48, 133–169.
Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM-algorithm. J. Roy. Statist. Soc. Ser. B, 39, 1–22.
Fung, W.K., Zhu, Z.Y., Wei, B.C. and He, X. (2002). Influence diagnostics and outlier tests for semiparametric mixed models. J. Roy. Statist. Soc. Ser. B, 64, 565–579.
Galea, M., Paula, G.A. and Bolfarine, H. (1997). Local influence in elliptical linear regression models. The Statistician, 46, 71–79.
Lee, S. and Xu, L. (2004). Influence analysis of nonlinear mixed-effects models. Comput. Statist. Data Anal., 45, 321–341.
Lesaffre, E. and Verbeke, G. (1998). Local influence in linear mixed models. Biometrics, 54, 570–582.
Lu, B. and Song, X.Y. (2006). Local influence of multivariate probit latent variable models. J. Multivariate Anal., 97, 1783–1798.
Ma, Y., Genton, M.G. and Davidian, M. (2004). Linear mixed effects models with flexible generalized skew-elliptical random effects. In Skew-Elliptical Distributions and Their Applications: A Journey Beyond Normality, Genton, M.G., ed., Chapman & Hall / CRC, Boca Raton, FL, 339–358.
Magder, L.S. and Zeger, S.L. (1996). A smooth nonparametric estimate of a mixing distribution using mixtures of Gaussians. J. Amer. Statist. Assoc., 91, 1141–1151.
Magnus, J.R. and Neudecker, H. (1988). Matrix Differential Calculus with Applications in Statistics and Econometrics, Wiley, New York.
Meng, X. and Rubin, D.B. (1993). Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika, 80, 267–278.
Pinheiro, J.C. and Bates, D.M. (2000). Mixed-Effects Models in S and S-plus, Springer-Verlag, New York.
Poon, W.Y. and Poon, Y.S. (1999). Conformal normal curvature and assessment of local influence. J. Roy. Statist. Soc. Ser. B, 61, 51–61.
Sahu, S.K., Dey, D.K. and Branco, M.D. (2003). A new class of multivariate skew distributions with applications to Bayesian regression models. Canad. J. Statist., 31, 129–150.
Tao, H., Palta, M., Yandell, B.S. and Newton, M.A. (1999). An estimation method for the semi-parametric mixed effects model. Biometrics, 55, 102–110.
Verbeke, G. and Lesaffre, E. (1996). A linear mixed-effects model with heterogeneity in the random-effects population. J. Amer. Statist. Assoc., 91, 217–221.
Verbeke, G. and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data, Springer, New York.
Zhang, D. and Davidian, M. (2001). Linear mixed models with flexible distributions of random effects for longitudinal data. Biometrics, 57, 795–802.
Zhu, H. and Lee, S. (2001). Local influence for incomplete-data models. J. Roy. Statist. Soc. Ser. B, 63, 111–126.
Zhu, H. and Lee, S. (2003). Local influence for generalized linear mixed models. Canad. J. Statist., 31, 293–309.
Heleno Bolfarine
Departamento de Estatística
Universidade de São Paulo
Caixa Postal 66281 - CEP 05315-970
São Paulo, Brazil
E-mail: [email protected]

Lourdes C. Montenegro
Departamento de Estatística
Universidade Federal de Minas Gerais
31270-901 Belo Horizonte
Minas Gerais, Brazil
E-mail: [email protected]

Victor H. Lachos
Departamento de Estatística
Universidade Estadual de Campinas
Caixa Postal 6065 - CEP 13083-859
Campinas, São Paulo, Brazil
E-mail: [email protected]
Paper received July 2006; revised January 2007.