Robust mixture regression modeling using the least trimmed squares (LTS)-estimation method

Fatma Zehra Doğru (a) and Olcay Arslan (b)

(a) Department of Econometrics, Faculty of Economics and Administrative Sciences, Giresun University, Giresun, Turkey
(b) Department of Statistics, Faculty of Science, Ankara University, Ankara, Turkey

Communications in Statistics - Simulation and Computation
ISSN: 0361-0918 (Print), 1532-4141 (Online)
DOI: 10.1080/03610918.2017.1341528
Published online: 14 June 2017

ABSTRACT

Mixture regression models are used to investigate the relationship between variables that come from unknown latent groups and to model heterogeneous datasets. The error terms in a mixture regression model are generally assumed to be normally distributed; however, the estimators obtained under the normality assumption are sensitive to outliers. In this article, we introduce a robust mixture regression procedure based on the LTS-estimation method to combat outliers in the data. We present a simulation study and a real data example to illustrate the performance of the proposed estimators against their counterparts in terms of dealing with outliers.

KEYWORDS: EM algorithm; LTS-estimation method; mixture regression model; robust regression

1. Introduction

In this article, we consider parameter estimation in a g-component mixture regression model, defined as follows. Let $x$ be a $p$-dimensional explanatory variable, $y$ be the response variable, and $Z$ be the latent class variable, independent of $x$, with $P(Z_j = i \mid x) = \pi_i$ for $i = 1, \dots, g$. Here, the $\pi_i$ are the mixing probabilities, with $\sum_{i=1}^{g} \pi_i = 1$ and $0 \le \pi_i \le 1$. Assume that, given $Z = i$, the response variable $y$ depends on the explanatory variable $x$ in a linear way:

\[
y = x^\top \beta_i + \epsilon_i, \qquad i = 1, 2, \dots, g, \tag{1}
\]

where $\beta_i = (\beta_{i1}, \dots, \beta_{ip})^\top$ is the vector of regression parameters, $\epsilon_i$ is the error term, and $x$ contains both the predictors and the constant 1. Then, the conditional density function of $y$ given $x$ is

\[
f(y \mid x, \Theta) = \sum_{i=1}^{g} \pi_i \, f_i\left(y; x^\top \beta_i, \sigma_i\right), \tag{2}
\]

where $f_i(y; x^\top \beta_i, \sigma_i)$ is the density function of the $i$th component and $\Theta = (\pi_1, \dots, \pi_{g-1}, \beta_1, \dots, \beta_g, \sigma_1, \dots, \sigma_g)^\top$ is the unknown parameter vector. This model is called a g-component mixture regression model.

Mixture regression models were first defined by Quandt (1972) and Quandt and Ramsey (1978) as switching regression models. They are commonly applied in areas such as engineering, genetics, biology, econometrics, and marketing, and are used to model heterogeneous datasets and to explore the connection between variables that come from unknown latent groups.

In general, the error terms in a mixture regression model are assumed to be normally distributed, and the estimation is carried out using the Expectation-Maximization (EM) algorithm (Dempster et al., 1977). However, the estimators obtained under the normality assumption are very sensitive to outliers and heavy-tailed errors. Therefore, robust mixture regression models have been proposed to cope with these problems. There are several works on robust mixture regression procedures. For example, Markatou (2000) and Shen et al. (2004) introduced a weight factor for each observation to robustify the resulting estimators for the mixture regression model. Neykov et al. (2007) used the trimmed likelihood approach for the mixture regression model to obtain robust estimators. Bashir and Carter (2012) adapted the S-estimation method to the mixture regression model to obtain robust estimators for the parameters. Bai (2010) and Bai et al. (2012) proposed a robust mixture regression procedure based on the M-regression estimation method. Recently, Doğru (2015) and Doğru and Arslan (2015) proposed robust mixture regression modeling based on the generalized M (GM)-estimation method. In addition, robust estimators for the parameters of the mixture regression model have been proposed using heavy-tailed and heavy-tailed skew distributions (Doğru, 2015; Doğru and Arslan, 2016a, 2016b, 2017; Song et al., 2014; Yao et al., 2014; Zhang, 2013).

Although most of these robust estimators gain some robustness over the estimators based on the normality assumption, they are usually not robust to leverage points; only the GM- and S-estimators are robust against leverage points. Yao et al. (2014) introduced a mixture regression model based on the t distribution and proposed trimming the leverage points before carrying out the mixture regression. Neykov et al. (2007) proposed using a weighted trimmed likelihood approach to obtain estimators that are robust against any type of outliers. In this article, we also use a trimming approach to obtain robust estimators. In our approach, however, we embed the trimming in the complete-data log-likelihood function during the steps of the EM algorithm; that is, in the complete-data log-likelihood we use the least trimmed squares (LTS) criterion as the objective to be maximized. By doing so, we trim the outliers and hence reduce their effect on the estimators. Since the LTS-estimator of the regression parameters has a high breakdown point, the resulting estimators for the mixture regression model will also have a high breakdown point.

The rest of the article is organized as follows. In Section 2, we give the mixture regression model based on the normal distribution. In Section 3, we propose the mixture regression model based on the LTS-estimation method and give an EM-type algorithm to obtain the parameter estimators. In Sections 4 and 5, we provide a simulation study and a real data example to compare the performance of the proposed estimation procedure with the estimation procedures given in the literature. The article ends with a conclusion section.
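Before turning to the estimation details, a minimal sketch of evaluating the mixture density in (2), with normal components as assumed in Section 2, may help fix the notation. The function name and array layout below are our own illustrative choices, not from the paper.

```python
import numpy as np
from scipy.stats import norm

def mixreg_density(y, X, pi, beta, sigma):
    """f(y | x, Theta) of Eq. (2) with normal components.

    y: (n,) responses; X: (n, p) design matrix (includes the constant 1);
    pi: (g,) mixing probabilities; beta: (g, p) coefficients; sigma: (g,) scales.
    """
    # Component densities phi(y; x'beta_i, sigma_i), stacked column-wise as (n, g)
    comp = np.column_stack([norm.pdf(y, loc=X @ b, scale=s)
                            for b, s in zip(beta, sigma)])
    return comp @ np.asarray(pi)   # sum_i pi_i * phi(y; x'beta_i, sigma_i)
```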

2. Mixture regression model based on the normal distribution

Let $\{(x_1, y_1), \dots, (x_n, y_n)\}$ be a sample. If the error terms in the mixture regression model are assumed to be normally distributed with mean 0 and variance $\sigma^2$, the estimator of $\Theta$ can be found by maximizing the following log-likelihood function:

\[
\ell(\Theta) = \sum_{j=1}^{n} \log \left\{ \sum_{i=1}^{g} \pi_i \, \phi\left(y_j; x_j^\top \beta_i, \sigma_i^2\right) \right\}. \tag{3}
\]


However, since direct maximization of (3) is usually not possible, the EM algorithm is generally used to find the ML estimate of $\Theta$. Let $Z_{ij}$ be the latent variables with

\[
Z_{ij} =
\begin{cases}
1, & \text{if the } j\text{th observation comes from the } i\text{th component}, \\
0, & \text{otherwise},
\end{cases} \tag{4}
\]

where $j = 1, \dots, n$ and $i = 1, \dots, g$. Here, $Z_j = (Z_{1j}, \dots, Z_{gj})^\top$ is regarded as missing because it cannot be observed. Then, the complete-data log-likelihood function for $(y, Z_j)$ given $X$ is

\[
\ell_c(\Theta; y, Z_j) = \sum_{j=1}^{n} \sum_{i=1}^{g} z_{ij} \left[ \log(\pi_i) - \frac{1}{2}\log(2\pi) - \frac{1}{2}\log \sigma_i^2 \right] - \sum_{j=1}^{n} \sum_{i=1}^{g} z_{ij} \, \frac{(y_j - x_j^\top \beta_i)^2}{2\sigma_i^2}, \tag{5}
\]

where $X = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)^\top$. After computing the conditional expectation of the complete-data log-likelihood function, $E(\ell_c(\Theta; y, Z_j) \mid y_j)$, the EM algorithm proceeds as follows.

EM algorithm

1. Take an initial estimate of the parameter, say $\Theta^{(0)}$, and fix a stopping rule $\varepsilon$.
2. E-step: Given $y$ and the current parameter value $\hat{\Theta}^{(k)}$, compute the conditional expectation

\[
\hat{z}_{ij}^{(k)} = E\left(Z_{ij} \mid y_j, \hat{\Theta}^{(k)}\right) = \frac{\hat{\pi}_i^{(k)} \, \phi\left(y_j; x_j^\top \hat{\beta}_i^{(k)}, \hat{\sigma}_i^{2(k)}\right)}{\sum_{i=1}^{g} \hat{\pi}_i^{(k)} \, \phi\left(y_j; x_j^\top \hat{\beta}_i^{(k)}, \hat{\sigma}_i^{2(k)}\right)}. \tag{6}
\]

3. M-step: Compute the updated estimates

\[
\hat{\pi}_i^{(k+1)} = \frac{\sum_{j=1}^{n} \hat{z}_{ij}^{(k)}}{n}, \tag{7}
\]

\[
\hat{\beta}_i^{(k+1)} = \left( \sum_{j=1}^{n} \hat{z}_{ij}^{(k)} x_j x_j^\top \right)^{-1} \left( \sum_{j=1}^{n} \hat{z}_{ij}^{(k)} x_j y_j \right), \tag{8}
\]

\[
\hat{\sigma}_i^{2(k+1)} = \frac{\sum_{j=1}^{n} \hat{z}_{ij}^{(k)} \left( y_j - x_j^\top \hat{\beta}_i^{(k+1)} \right)^2}{\sum_{j=1}^{n} \hat{z}_{ij}^{(k)}}. \tag{9}
\]

ˆ (k) || <  is satˆ (k+1) −  4. Repeat E and M steps until the convergence criterion || ˆ (k+1) ) − isfied. Moreover, the absolute difference of the actual log-likelihood ||( (k) (k+1) (k) ˆ )|| <  or ||( ˆ ˆ ) − 1|| <  can be used (see Dias and Wedel, ( )/( 2004) for the convergence.


3. Mixture regression model based on the LTS-estimation method

The LTS estimators (Rousseeuw, 1984) of the regression parameters are obtained as the solution of the following minimization problem:

\[
\min_{\beta} \sum_{j=1}^{h} (r^2)_{j:n}, \tag{10}
\]

where the $r_j = y_j - x_j^\top \beta$ are the residuals, $(r^2)_{1:n} \le \cdots \le (r^2)_{n:n}$ are the ordered squared residuals, $h = [n(1-\alpha) + 1]$ is the number of observations retained after trimming, and $\alpha$ is the trimming proportion. If $h$ equals $[n/2] + 1$, the breakdown point of the LTS regression estimator is $1/2$.

The estimator based on the normal distribution in the mixture regression model is not robust against outliers because of the second term of the complete-data log-likelihood function given in (5). This term is essentially the least-squares (LS) criterion, and the LS method is known to be sensitive to outliers. To obtain robust estimators, this term should be robustified. Several approaches for robustifying it exist in the literature: mixture regression based on the M- (Bai, 2010; Bai et al., 2012), GM- (Doğru and Arslan, 2015), and S- (Bashir and Carter, 2012) estimation methods. In this article, we use the LTS criterion instead. Adapting the LTS criterion in the complete-data log-likelihood is much easier than the other robust methods and produces a high breakdown point estimator.
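The criterion in (10) is easy to state in code. Below is a minimal sketch (function name and default trimming proportion are ours) that only evaluates the objective; the actual minimization over beta is a separate combinatorial problem, typically attacked with FAST-LTS-type subset sampling rather than brute force.

```python
import numpy as np

def lts_objective(beta, y, X, alpha=0.25):
    """Sum of the h smallest squared residuals, as in Eq. (10)."""
    n = len(y)
    h = int(n * (1 - alpha) + 1)       # h = [n(1 - alpha) + 1]
    r2 = (y - X @ beta) ** 2
    return np.sort(r2)[:h].sum()       # sum_{j=1}^{h} (r^2)_{j:n}
```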

The adaptation is as follows. We take the complete-data log-likelihood function given in (5) and replace its second term with the LTS criterion given in (10). This results in the following adapted complete-data log-likelihood function:

\[
\ell_c(\Theta; y, Z_j) = \sum_{j=1}^{n} \sum_{i=1}^{g} z_{ij} \left[ \log(\pi_i) - \frac{1}{2}\log(2\pi) - \frac{1}{2}\log \sigma_i^2 \right] - \sum_{j=1}^{h} \sum_{i=1}^{g} z_{ij} \, \frac{(r_i^2)_{j:n}}{2\sigma_i^2}, \tag{11}
\]

where $r_{ij} = y_j - x_j^\top \beta_i$ and $(r_i^2)_{1:n} \le \cdots \le (r_i^2)_{n:n}$ are the ordered squared residuals of the $i$th component, $i = 1, \dots, g$. To run the EM algorithm, we take the conditional expectation of the complete-data log-likelihood function to remove the latent $z_{ij}$:

\[
E\left(\ell_c(\Theta; y, Z_j) \mid y_j\right) = \sum_{j=1}^{n} \sum_{i=1}^{g} E(Z_{ij} \mid y_j) \left[ \log(\pi_i) - \frac{1}{2}\log(2\pi) - \frac{1}{2}\log \sigma_i^2 \right] - \sum_{j=1}^{h} \sum_{i=1}^{g} E(Z_{ij} \mid y_j) \, \frac{(r_i^2)_{j:n}}{2\sigma_i^2}.
\]

Note that the conditional expectation $E(Z_{ij} \mid y_j)$ can be calculated using the classical theory of mixture modeling. The steps of the EM algorithm for the mixture regression based on the LTS-estimation method are then as follows.

EM algorithm

1. Set an initial parameter estimate $\Theta^{(0)}$ and fix a stopping rule $\varepsilon$.
2. E-step: Given $y$ and the current parameter value $\hat{\Theta}^{(k)}$, compute the conditional expectation


\[
\hat{z}_{ij}^{(k)} = E\left(Z_{ij} \mid y_j, \hat{\Theta}^{(k)}\right) = \frac{\hat{\pi}_i^{(k)} \, \phi\left(y_j; x_j^\top \hat{\beta}_i^{(k)}, \hat{\sigma}_i^{2(k)}\right)}{\sum_{i=1}^{g} \hat{\pi}_i^{(k)} \, \phi\left(y_j; x_j^\top \hat{\beta}_i^{(k)}, \hat{\sigma}_i^{2(k)}\right)}. \tag{12}
\]

3. M-step: Compute the parameter estimates for the $(k+1)$th step:

\[
\hat{\pi}_i^{(k+1)} = \frac{\sum_{j=1}^{n} \hat{z}_{ij}^{(k)}}{n}, \tag{13}
\]

\[
\hat{\beta}_i^{(k+1)} = \left( \sum_{j=1}^{h} \hat{z}_{ij}^{(k)} x_j x_j^\top \right)^{-1} \left( \sum_{j=1}^{h} \hat{z}_{ij}^{(k)} x_j y_j \right), \tag{14}
\]

\[
\hat{\sigma}_i^{2(k+1)} = c_\alpha \, \frac{\sum_{j=1}^{h} \hat{z}_{ij}^{(k)} \left( y_j - x_j^\top \hat{\beta}_i^{(k+1)} \right)^2}{\sum_{j=1}^{h} \hat{z}_{ij}^{(k)} - p}, \tag{15}
\]

where $c_\alpha$ is a consistency constant. For normal errors, $c_\alpha = (1-\alpha)/F_{\chi^2_3}(q_\alpha)$ with $q_\alpha = \chi^2_{1, 1-\alpha}$ (see Agulló et al., 2008 for the case of multivariate normal errors). Here, $F_{\chi^2_3}$ denotes the cumulative distribution function of the $\chi^2$ distribution with 3 degrees of freedom, and $q_\alpha$ is the upper $\alpha$ percent point of the $\chi^2$ distribution with 1 degree of freedom.

4. Repeat the E- and M-steps until the convergence criterion $\|\hat{\Theta}^{(k+1)} - \hat{\Theta}^{(k)}\| < \varepsilon$ is satisfied.
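A minimal sketch of the trimmed M-step in (13)-(15), including the consistency constant $c_\alpha$, might look as follows. The E-step is unchanged from (12); the per-component selection of the $h$ smallest squared residuals reflects our reading of the $h$-term sums, and all names are ours.

```python
import numpy as np
from scipy.stats import chi2

def c_alpha(alpha):
    """Consistency constant of Section 3 for normal errors:
    c_alpha = (1 - alpha) / F_{chi2_3}(q_alpha), with q_alpha = chi2_{1, 1-alpha}."""
    q = chi2.ppf(1 - alpha, df=1)
    return (1 - alpha) / chi2.cdf(q, df=3)

def lts_m_step(y, X, z, beta, alpha):
    """Trimmed M-step, Eqs. (13)-(15); z holds the E-step weights from Eq. (12)."""
    n, p = X.shape
    h = int(n * (1 - alpha) + 1)                  # observations kept per component
    pi_new = z.mean(axis=0)                       # Eq. (13)
    beta_new = np.empty_like(beta)
    sigma2_new = np.empty(z.shape[1])
    for i in range(z.shape[1]):
        r2 = (y - X @ beta[i]) ** 2
        keep = np.argsort(r2)[:h]                 # indices of h smallest squared residuals
        w, Xh, yh = z[keep, i], X[keep], y[keep]
        beta_new[i] = np.linalg.solve(Xh.T @ (w[:, None] * Xh),
                                      Xh.T @ (w * yh))                    # Eq. (14)
        rh = yh - Xh @ beta_new[i]
        sigma2_new[i] = c_alpha(alpha) * (w * rh**2).sum() / (w.sum() - p)  # Eq. (15)
    return pi_new, beta_new, sigma2_new
```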

4. Simulation study

In this section, we provide a simulation study to compare the performance of the robust mixture regression procedure based on the LTS-estimation method (MixregLTS) with the mixture regression model based on the normal distribution (MixregN), the mixture regression model based on the t distribution (Mixregt) proposed by Yao et al. (2014), the mixture regression model based on the M-estimation method (MixregM) proposed by Bai (2010) and Bai et al. (2012), and the mixture regression model based on the GM-estimation method with a Mallows-type function (MixregGM) proposed by Doğru and Arslan (2015). These estimators are compared using the following bias and mean squared error (MSE) criteria:

\[
\widehat{\mathrm{bias}}(\hat{\theta}) = \frac{1}{N} \sum_{j=1}^{N} \hat{\theta}_j - \theta, \qquad
\widehat{\mathrm{MSE}}(\hat{\theta}) = \frac{1}{N} \sum_{j=1}^{N} \left(\hat{\theta}_j - \theta\right)^2,
\]

where $\theta$ is the true parameter value, $\hat{\theta}_j$ is the estimate of $\theta$ from the $j$th simulated dataset, and $N = 500$ is the number of replications. The sample sizes are taken as 200 and 400 for all simulation settings. For the M-estimator, we use Huber's $\psi$ function, $\psi_c(x) = \max(-c, \min(c, x))$, with $c = 1.345$. The simulation study and the real data example are carried out in MATLAB R2013a. For all numerical calculations, the stopping rule $\varepsilon$ is taken as $10^{-6}$. For simplicity, we assume that all $\sigma_i^2$ are equal throughout the simulation study. We consider the following simulation configurations.

Scenario 1. The data $\{(x_{1j}, x_{2j}, y_j), j = 1, \dots, n\}$ are generated from the following two-component mixture regression model (Bai et al., 2012):

\[
Y =
\begin{cases}
0 + X_1 + X_2 + \epsilon_1, & Z = 1, \\
0 - X_1 - X_2 + \epsilon_2, & Z = 2,
\end{cases}
\]

where $P(Z = 1) = 0.25 = \pi_1$, $P(Z = 2) = 0.75 = 1 - \pi_1$, $X_1 \sim N(0, 1)$, and $X_2 \sim N(0, 1)$. The model coefficients are $\beta_1 = (\beta_{10}, \beta_{11}, \beta_{12})^\top = (0, 1, 1)^\top$ and $\beta_2 = (\beta_{20}, \beta_{21}, \beta_{22})^\top = (0, -1, -1)^\top$. The following error distributions are considered:

Case I: $\epsilon_1, \epsilon_2 \sim N(0, 1)$, the standard normal distribution.
Case II: $\epsilon_1, \epsilon_2 \sim t_3$, the t distribution with 3 degrees of freedom.
Case III: $\epsilon_1, \epsilon_2 \sim 0.95 N(0, 1) + 0.05 N(0, 25)$, a contaminated normal distribution.
Case IV: $\epsilon_1, \epsilon_2 \sim N(0, 1)$, the standard normal distribution, with 5% outliers from the model $Y = 50 + X_1 + X_2 + \epsilon$, where $\epsilon \sim N(0, 1)$, $X_1 \sim U(15, 20)$, and $X_2 \sim U(15, 20)$. In this case, we add 10 outliers to the sample of size 200, giving a sample size of 210; similarly, we add 20 outliers to the sample of size 400, giving a final sample size of 420.

A sketch of generating one such dataset and of the bias/MSE computation is given below.

[Table 1. Bias and MSE values of the estimates for n = 200 in Scenario 1. The numeric entries are not recoverable from the source.]
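To make the setup concrete, here is a sketch of generating one Scenario 1, Case IV dataset and of the bias/MSE criteria of this section; `fit_estimator` in the commented loop is a hypothetical placeholder for whichever of the compared methods is being evaluated, and all names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def scenario1_case4(n=200):
    """One Scenario 1, Case IV dataset: two components plus 5% leverage outliers."""
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    z1 = rng.random(n) < 0.25                        # P(Z = 1) = 0.25
    beta1, beta2 = np.array([0., 1., 1.]), np.array([0., -1., -1.])
    y = np.where(z1, X @ beta1, X @ beta2) + rng.normal(size=n)
    # Append 5% outliers: Y = 50 + X1 + X2 + eps, with X1, X2 ~ U(15, 20)
    m = n // 20
    Xo = np.column_stack([np.ones(m), rng.uniform(15, 20, size=(m, 2))])
    yo = Xo @ np.array([50., 1., 1.]) + rng.normal(size=m)
    return np.vstack([X, Xo]), np.concatenate([y, yo])

def bias_mse(estimates, theta):
    """Monte Carlo bias and MSE over the N replications stacked in `estimates`."""
    d = np.asarray(estimates) - theta
    return d.mean(axis=0), (d**2).mean(axis=0)

# Hypothetical replication loop (fit_estimator stands in for any method):
#   est = [fit_estimator(*scenario1_case4(200)) for _ in range(500)]
#   bias, mse = bias_mse(est, theta_true)
```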


Scenario 2. We generate the data $\{(x_j, y_j), j = 1, \dots, n\}$ from the following three-component mixture regression model (Bai et al., 2012):

\[
Y =
\begin{cases}
1 + X + \epsilon_1, & Z = 1, \\
2 + 2X + \epsilon_2, & Z = 2, \\
3 + 5X + \epsilon_3, & Z = 3,
\end{cases}
\]

where $P(Z = 1) = P(Z = 2) = 0.3 = \pi_1 = \pi_2$, $P(Z = 3) = 0.4 = 1 - (\pi_1 + \pi_2)$, and $X \sim N(0, 1)$. The model coefficients are $\beta_1 = (\beta_{10}, \beta_{11})^\top = (1, 1)^\top$, $\beta_2 = (\beta_{20}, \beta_{21})^\top = (2, 2)^\top$, and $\beta_3 = (\beta_{30}, \beta_{31})^\top = (3, 5)^\top$. For the error distribution, we take the following cases:

Case I: $\epsilon_1, \epsilon_2, \epsilon_3 \sim N(0, 1)$, the standard normal distribution.
Case II: $\epsilon_1, \epsilon_2, \epsilon_3 \sim t_3$, the t distribution with 3 degrees of freedom.
Case III: $\epsilon_1, \epsilon_2, \epsilon_3 \sim 0.95 N(0, 1) + 0.05 N(0, 25)$, a contaminated normal distribution.

[Table 2. Bias and MSE values of the estimates for n = 400 in Scenario 1. The numeric entries are not recoverable from the source.]


Case IV: $\epsilon_1, \epsilon_2, \epsilon_3 \sim N(0, 1)$, the standard normal distribution, with 5% outliers from the model $Y = 50 + X_1 + \epsilon$, where $\epsilon \sim N(0, 1)$ and $X_1 \sim U(15, 20)$. Here, we add 10 outliers to the sample of size 200, giving a sample size of 210, and 20 outliers to the sample of size 400, giving a final sample size of 420.

Overall, the error distributions considered in the two scenarios represent different situations met in applications. For example, Cases II and III represent heavy-tailed errors, under which it is usual to get some vertical outliers (outliers in the y direction). Case IV, on the other hand, is designed to produce leverage points (outliers in the x direction).

[Table 3. Bias and MSE values of the estimates for n = 200 in Scenario 2. The numeric entries are not recoverable from the source.]


[Table 4. Bias and MSE values of the estimates for n = 400 in Scenario 2. The numeric entries are not recoverable from the source.]

The results of the simulation study: The simulation results are shown in Tables 1–4, which display the bias and MSE values of the parameter estimates along with the true parameter values. For Scenario 1, the results are shown in Tables 1 and 2, from which we observe the following. For Case I, as expected, all estimators perform similarly. However, when the error terms have a heavy-tailed distribution ($\epsilon_1, \epsilon_2 \sim t_3$), the estimators obtained from MixregN are badly influenced by the heavy-tailedness of the errors, while the estimators obtained from Mixregt, MixregM, MixregGM, and MixregLTS perform similarly. When the error terms follow the contaminated normal mixture model ($\epsilon_1, \epsilon_2 \sim 0.95 N(0, 1) + 0.05 N(0, 25)$), all estimators except those obtained from MixregN behave similarly. Finally, for Case IV, which is designed to generate leverage points, the estimators obtained from MixregLTS are superior to the other estimators in terms of bias and MSE values.

For Scenario 2, the estimation results are displayed in Tables 3 and 4. For the three-component mixture regression model, when the error terms have the normal distribution, the estimators behave similarly. For Cases II and III, the estimators obtained from MixregN are drastically affected by the heavy-tailedness and the contamination. Furthermore, the estimators obtained from MixregLTS have smaller bias and MSE values than those obtained from Mixregt, MixregM, and MixregGM under most conditions. For the outlier case, when we add 10 outliers for sample size 200 and 20 outliers for sample size 400, all estimators except those obtained from MixregLTS are influenced by the outliers.

5. Real data example

In this section, we investigate the dataset given by García-Escudero et al. (2010), who used it for robust clusterwise linear regression through trimming. The dataset consists of the heights (in meters) and diameters (in millimeters) of 362 trees in a cultivated forest of Pinus Nigra located in the north of Palencia (Spain). In Fig. 1, we display the scatterplot of the "Pinus Nigra" tree dataset and the histogram of the heights. We can observe from Fig. 1(a) that there are three groups in the dataset, some outliers in the top right corner, and one isolated point in the bottom right corner, as was also pointed out by García-Escudero et al. (2010). We use the mixture regression procedure to model this dataset and compare the performance of the proposed mixture regression model (MixregLTS) with the mixture regression models MixregN, Mixregt, MixregM, and MixregGM. Table 5 displays the results obtained from the estimation methods along with the values of the complete-data log-likelihood and the integrated completed likelihood (ICL; Biernacki et al., 2000) criterion. According to the ICL values given in Table 5, the best result is obtained from MixregLTS, followed by Mixregt. Figure 2 depicts the scatterplot along with the fitted lines obtained from these estimators. From this figure, we observe that, unlike the other estimators, the estimators obtained from MixregLTS give the best fit to the dataset: they find all three groups correctly, whereas the other estimators are badly affected by the group of outliers.

Figure 1. (a) Scatterplot of the "Pinus Nigra" tree dataset. (b) Histogram of the heights.


Table . Parameter estimates and the values of ICL information criterion for fitting MixregN, Mixregt, MixregM, MixregGM, and MixregLTS to the “Pinus Nigra” tree dataset. MixregN

Mixregt

MixregM

MixregGM

MixregLTS

w ˆ1 w ˆ2 βˆ

. .

. .

. .

. .

. .

.

.

.

.

.

βˆ20 βˆ

.

.

.

.

.

.

.

.

.

.

βˆ11 βˆ

.

.

.

.

.

21

.

.

.

.

.

βˆ31

.

.

.

.

.

σˆ 1 σˆ 2 σˆ 3

. . .

. . .

. . .

. . .

. . .

− .

− .

− .

− .

− 66.0902

.

.

.

196.9885

10

30

ˆ c () ICL

.

Figure . Fitted mixture regression lines for the “Pinus Nigra” tree dataset.

6. Conclusions

In this article, we have proposed a robust estimation procedure for the mixture regression model based on the LTS-estimation method, and we have given an EM-type algorithm to compute the parameter estimates. We have provided a simulation study and a real data example to illustrate the performance of the proposed estimators against the estimators based on the normal distribution, the t distribution (Yao et al., 2014), the M-estimation method (Bai, 2010; Bai et al., 2012), and the GM-estimation method (Doğru and Arslan, 2015). From the simulation study and the real data example, we observe that in the presence of outliers the estimators obtained from the mixture regression model based on the LTS-estimation method give the best fit. The proposed procedure can therefore be used as a robust alternative to the mixture regression estimation methods already in the literature.


Acknowledgments

The authors thank the anonymous referees and the associate editor, whose comments and suggestions have greatly improved the article.

ORCID

Fatma Zehra Doğru: http://orcid.org/0000-0001-8220-2375
Olcay Arslan: http://orcid.org/0000-0002-7067-4997

References

Agulló, J., Croux, C., Van Aelst, S. (2008). The multivariate least-trimmed squares estimator. Journal of Multivariate Analysis 99(3):311–338.

Bai, X. (2010). Robust Mixture of Regression Models. Master's thesis, Kansas State University, Manhattan, KS.

Bai, X., Yao, W., Boyer, J. E. (2012). Robust fitting of mixture regression models. Computational Statistics and Data Analysis 56(7):2347–2359.

Bashir, S., Carter, E. M. (2012). Robust mixture of linear regression models. Communications in Statistics - Theory and Methods 41(18):3371–3388.

Biernacki, C., Celeux, G., Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(3):719–725.

Dempster, A. P., Laird, N. M., Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39:1–38.

Dias, J. G., Wedel, M. (2004). An empirical comparison of EM, SEM and MCMC performance for problematic Gaussian mixture likelihoods. Statistics and Computing 14:323–332.

Doğru, F. Z. (2015). Robust Parameter Estimation in Mixture Regression Models. Ph.D. dissertation, Ankara University, Ankara, Turkey.

Doğru, F. Z., Arslan, O. (2015). Robust mixture regression modeling based on the generalized M (GM)-estimation method. arXiv preprint arXiv:1511.07384.

Doğru, F. Z., Arslan, O. (2016a). Robust mixture regression using the mixture of different distributions. In: Agostinelli, C., et al., eds. Recent Advances in Robust Statistics: Theory and Applications. India: Springer, pp. 57–79.

Doğru, F. Z., Arslan, O. (2016b). Parameter estimation for mixtures of skew Laplace normal distributions and application in mixture regression modeling. Communications in Statistics - Theory and Methods (just-accepted).

Doğru, F. Z., Arslan, O. (2017). Robust mixture regression based on the skew t distribution. Revista Colombiana de Estadística 40(1):45–64.

García-Escudero, L. A., Gordaliza, A., Mayo-Iscar, A., San Martín, R. (2010). Robust clusterwise linear regression through trimming. Computational Statistics and Data Analysis 54:3057–3069.

Markatou, M. (2000). Mixture models, robustness, and the weighted likelihood methodology. Biometrics 56(2):483–486.

Neykov, N., Filzmoser, P., Dimova, R., Neytchev, P. (2007). Robust fitting of mixtures using the trimmed likelihood estimator. Computational Statistics and Data Analysis 52(1):299–308.

Quandt, R. E. (1972). A new approach to estimating switching regressions. Journal of the American Statistical Association 67(338):306–310.

Quandt, R. E., Ramsey, J. B. (1978). Estimating mixtures of normal distributions and switching regressions. Journal of the American Statistical Association 73(364):730–752.

Rousseeuw, P. J. (1984). Least median of squares regression. Journal of the American Statistical Association 79:871–880.

Shen, H., Yang, J., Wang, S. (2004). Outlier detecting in fuzzy switching regression models. In: Bussler, C., Fensel, D., eds. Artificial Intelligence: Methodology, Systems, and Applications (Lecture Notes in Computer Science, Vol. 3192). Berlin, Heidelberg: Springer, pp. 208–215.


Song, W., Yao, W., Xing, Y. (2014). Robust mixture regression model fitting by Laplace distribution. Computational Statistics and Data Analysis 71:128–137.

Yao, W., Wei, Y., Yu, C. (2014). Robust mixture regression using the t-distribution. Computational Statistics and Data Analysis 71:116–127.

Zhang, J. (2013). Robust Mixture Regression Modeling with Pearson Type VII Distribution. Master's thesis, Kansas State University, Manhattan, KS.
