Stat. Meth. & Appl. (2007) 15:395–407
DOI 10.1007/s10260-006-0032-6

ORIGINAL ARTICLE

Robust selection of variables in linear discriminant analysis

Valentin Todorov

Accepted: 24 October 2006 / Published online: 23 January 2007 © Springer-Verlag 2007

Abstract A commonly used procedure for reduction of the number of variables in linear discriminant analysis is the stepwise method for variable selection. Although often criticized, when used carefully this method can be a useful prelude to a further analysis. The contribution of a variable to the discriminatory power of the model is usually measured by the maximum likelihood ratio criterion, referred to as Wilks' lambda. It is well known that the Wilks' lambda statistic is extremely sensitive to the influence of outliers. In this work a robust version of the Wilks' lambda statistic will be constructed based on the Minimum Covariance Determinant (MCD) estimator and its reweighted version, which has a higher efficiency. Taking advantage of the availability of a fast algorithm for computing the MCD, a simulation study will be done to evaluate the performance of this statistic.

Keywords Linear discriminant analysis · Variable selection · Wilks' lambda · Minimum covariance determinant · MCD

1 Introduction

The problem of discriminant analysis arises when one wants to assign an individual to one of g populations on the basis of a p-dimensional feature vector x. Usually the case is considered where the p-dimensional vectors x_ik come from multivariate normal populations

The presentation of material in this article does not imply the expression of any opinion whatsoever on the part of Austro Control GmbH and is the sole responsibility of the authors. V. Todorov (B) Austro Control GmbH, Vienna, Austria e-mail: [email protected]


x_{ik} : \pi_k \sim N(\mu_k, \Sigma_k), \quad i = 1, \ldots, n_k; \; k = 1, \ldots, g.   (1)

Here n_k is the size of the sample from population k for each of the g different groups. If it is further assumed that all covariance matrices are equal (\Sigma_1 = \cdots = \Sigma_g = \Sigma), the overall probability of misclassification is minimized by assigning a new observation x to the population \pi_k which maximizes

d_k(x) = -\frac{1}{2}(x - \mu_k)^T \Sigma^{-1} (x - \mu_k) + \log(\alpha_k), \quad k = 1, \ldots, g,   (2)

where \alpha_k is the prior probability that an individual comes from population \pi_k. If the means \mu_k, k = 1, \ldots, g, and the common covariance matrix \Sigma are unknown, which is the usual case, a training set consisting of samples drawn from each of the populations is required.

Often the data have been collected on a large number of variables and it is desired to reduce their number in the later stages of the analysis. Reasons to use fewer variables in the discrimination are: (1) easier estimation and interpretation; (2) reduced costs of data collection and processing; (3) avoiding a weakening of the predictive power caused by the inclusion of irrelevant and redundant variables (see McLachlan 1992).

A commonly used procedure for reduction of the number of variables in linear discriminant analysis is the stepwise method for variable selection, i.e. stepwise discriminant analysis, which could also be called stepwise MANOVA. This is a sequence of steps, and at each step a variable is added or deleted. The procedure is usually based on the likelihood ratio test statistic \Lambda known as Wilks' \Lambda statistic. Wilks' \Lambda is the ratio of the within generalized dispersion to the total generalized dispersion, where the within generalized dispersion is the determinant of the within-group sum of squares and cross-products matrix W and the total generalized dispersion is the determinant of the total sum of squares and cross-products matrix T (see e.g. Johnson and Wichern 2002, Chap. 6, p. 299). The statistic

\Lambda = \frac{\det(W)}{\det(T)},   (3)

where det(A) denotes the determinant of A, takes values between zero and one. Because \Lambda is a kind of inverse measure, values near zero denote high discrimination between groups while higher values indicate poor discrimination. However, this measure, which is based on the classical normal theory estimates, as well as the inference based on it, can be adversely affected by outliers present in the data. The effect of outliers on the quality of the variable selection based on the classical Wilks' \Lambda statistic will be illustrated by the examples in Sect. 3 and by the simulation study in Sect. 4.
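For concreteness, the computation in Eq. (3) can be sketched in a few lines of R; this is a minimal illustration only, and the names wilks_lambda, X and grp are ours rather than part of any package.

```r
## Minimal sketch of the classical Wilks' lambda in (3).
## X is an n x p data matrix, grp a factor with g group labels.
wilks_lambda <- function(X, grp) {
  X <- as.matrix(X)
  ## within-group sum of squares and cross-products matrix W
  W <- Reduce(`+`, lapply(split(as.data.frame(X), grp), function(Xk) {
    Xk <- as.matrix(Xk)
    crossprod(sweep(Xk, 2, colMeans(Xk)))   # center by the group mean
  }))
  ## total sum of squares and cross-products matrix T
  Tot <- crossprod(sweep(X, 2, colMeans(X)))
  det(W) / det(Tot)                          # value between 0 and 1
}
```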


The problem of the non-robustness of the classical estimates in the setting of quadratic and linear discriminant analysis has been addressed by many authors, see for example Krusinska (1988), Todorov et al. (1990, 1994), Chork and Rousseeuw (1992), Hawkins and McLachlan (1997), He and Fung (2000), Croux and Dehon (2001), Hubert and Van Driessen (2004), Croux and Joossens (2005). Although there exists much literature on robust methods for discrimination, robust variable selection for discriminant analysis has not received much attention (this holds for variable selection in linear regression, too). Krusinska and Liebhart (1988) proposed to use the numerical equivalence of linear discriminant analysis and multiple linear regression in the case of only two groups in order to devise a robust variable selection procedure. After transforming the discrimination problem into the context of regression, the ideas of Ronchetti (1985) can be applied. Later Krusinska and Liebhart (1989) proposed to use M-estimates of location and scatter in order to obtain a robust version of the Wilks' \Lambda statistic. This method was extended by Todorov et al. (1990) by using the high breakdown point MVE estimates instead of the M-estimates. Robust variable selection was illustrated by an example based on the well known Fisher's Iris data, but the computational burden did not allow a more thorough study of its performance. In this paper we propose to use the Minimum Covariance Determinant estimator of Rousseeuw (1984), which is a highly robust estimator of location and scatter. Furthermore, a fast algorithm for computing the MCD estimator is available: the FAST-MCD of Rousseeuw and Van Driessen (1999). The adaptations of the MCD estimator for computing the common covariance matrix will be summarized in Sect. 2. In Sect. 3 illustrative examples are presented, and Sect. 4 describes the design of the simulation study and its results.

2 Robust Wilks' \Lambda

In order to obtain a robust procedure with a high breakdown point for the selection of variables in linear discriminant analysis, we construct a robust version of the Wilks' \Lambda statistic by replacing the classical estimators with the reweighted MCD estimators. The Minimum Covariance Determinant (MCD) estimator introduced by Rousseeuw (1984) looks for a subset of h observations whose covariance matrix has the lowest determinant. The MCD location estimate T is defined as the mean of that subset, and the MCD scatter estimate C is a multiple of its covariance matrix. The multiplication factor is selected so that C is consistent at the multivariate normal model and unbiased in small samples; see Pison et al. (2002). This estimator is not very efficient at normal models, especially if h is selected so that the maximal breakdown point is achieved, but in spite of its low efficiency it is the most widely used robust estimator in practice, mainly because of the existing efficient algorithm for its computation as well as the readily available implementations in most of the well known statistical software packages like R, S-Plus, SAS and Matlab. This was also the main reason for choosing the MCD estimator in the present work. To overcome the low efficiency of the MCD estimator, a reweighted version is used.
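In R, the reweighted MCD estimates are readily available; the following minimal sketch uses covMcd() from the robustbase package (the rrcov package used later provides an equivalent interface). The data in the example are simulated and purely illustrative.

```r
## Reweighted MCD location and scatter for a single sample.
library(robustbase)

set.seed(1)
x <- matrix(rnorm(100 * 4), ncol = 4)   # illustrative 4-variate sample
mcd <- covMcd(x)                        # FAST-MCD, reweighted by default
mcd$center                              # robust location estimate T
mcd$cov                                 # robust scatter estimate C
mcd$raw.center                          # the raw (unweighted) MCD location
mcd$raw.cov                             # the raw (unweighted) MCD scatter
```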


We start by finding initial estimates of the group means m_k^0 and the common covariance matrix C_0 based on the reweighted MCD estimates. There are several methods for estimating the common covariance matrix based on a high breakdown point estimator. The easiest one is to obtain the estimates of the group means and group covariance matrices from the individual groups, (m_k, C_k), k = 1, \ldots, g, and then pool them to yield the common covariance matrix

C = \frac{\sum_{k=1}^{g} n_k C_k}{\sum_{k=1}^{g} n_k - g}.   (4)

This method, using MVE and MCD estimates, was proposed by Todorov et al. (1990, 1994) and was also used, based on the MVE estimator, by Chork and Rousseeuw (1992). Croux and Dehon (2001) applied this procedure for robustifying linear discriminant analysis based on S estimates. A drawback of this method is that the same trimming proportions are applied to all groups, which could lead to a loss of efficiency if some groups are outlier free.

Another method was proposed by He and Fung (2000) for the S estimates and was later adapted by Hubert and Van Driessen (2004) for the MCD estimates. Instead of pooling the group covariance matrices, the observations are centered and pooled to obtain a single sample for which the covariance matrix is estimated. It starts with obtaining the individual group location estimates t_k, k = 1, \ldots, g, as the reweighted MCD location estimates of each group. These group means are swept from the original observations to obtain the centered observations

Z = \{z_{ik}\}, \quad z_{ik} = x_{ik} - t_k.   (5)

The common covariance matrix C is estimated as the reweighted MCD covariance matrix of the centered observations Z. The location estimate \delta of Z is used to adjust the group means m_k, and thus the final group means are

m_k = t_k + \delta.   (6)
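A minimal sketch of this pooled-centering estimate, again based on covMcd() from robustbase, could look as follows; the function name common_cov_pooled and the argument names are ours.

```r
## Sketch of the pooled-centering estimate in (5) and (6).
## X is an n x p data matrix, grp a factor with g group labels.
library(robustbase)

common_cov_pooled <- function(X, grp) {
  X <- as.matrix(X)
  grp <- as.factor(grp)
  ## step 1: reweighted MCD location t_k of each group
  tk <- lapply(levels(grp), function(k)
    covMcd(X[grp == k, , drop = FALSE])$center)
  names(tk) <- levels(grp)
  ## step 2: sweep the group means out to get the centered sample Z, Eq. (5)
  Z <- X
  for (k in levels(grp))
    Z[grp == k, ] <- sweep(X[grp == k, , drop = FALSE], 2, tk[[k]])
  ## step 3: reweighted MCD of the pooled centered sample
  mcdZ <- covMcd(Z)
  ## adjust the group means by the location estimate delta of Z, Eq. (6)
  mk <- lapply(tk, function(t) t + mcdZ$center)
  list(means = mk, cov = mcdZ$cov)
}
```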

This process could be iterated until convergence, but since the improvements from such iterations are negligible (see He and Fung 2000; Hubert and Van Driessen 2004) we are not going to use it. The third approach is to modify the algorithm for high breakdown point estimation itself in order to accommodate the pooled sample. He and Fung (2000) modified Ruppert's SURREAL algorithm for S estimation in the case of two groups. Hawkins and McLachlan (1997) defined the Minimum Within-group Covariance Determinant estimator (MWCD), which does not apply the same trimming proportion to each group. Unfortunately their estimator is based on the Feasible Solution Algorithm (see Hawkins and McLachlan 1997 and the references therein), which is extremely time consuming compared to the FAST-MCD algorithm. Hubert and Van Driessen (2004) proposed a modification of this algorithm taking advantage of the FAST-MCD, but it is still necessary to compute the MCD for each individual group. A thorough investigation and comparison of these methods in the context of variable selection is worth doing,


but in this work, for the sake of facilitating the computations, we choose the method of pooling the observations.

Using the obtained estimates m_k^0 and C_0 we can calculate the initial robust distances (Rousseeuw and van Zomeren 1991)

RD_{ik}^{0} = \sqrt{(x_{ik} - m_k^0)^t \, C_0^{-1} \, (x_{ik} - m_k^0)}.   (7)

With these initial robust distances we can define a weight for each observation x_{ik}, i = 1, \ldots, n_k and k = 1, \ldots, g, by setting the weight to 1 if the corresponding robust distance is less than or equal to \sqrt{\chi^2_{p,0.975}} and to 0 otherwise, i.e.

w_{ik} = \begin{cases} 1 & RD_{ik}^{0} \le \sqrt{\chi^2_{p,0.975}} \\ 0 & \text{otherwise.} \end{cases}   (8)
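In R the weights in (8) can be obtained directly from the squared Mahalanobis distances; the sketch below assumes m0 is a list of initial group means (named by group level) and C0 the initial common covariance matrix, as produced, e.g., by the pooled-centering sketch above.

```r
## Sketch of the 0/1 weights in (8) from the robust distances in (7).
robust_weights <- function(X, grp, m0, C0) {
  X <- as.matrix(X)
  grp <- as.factor(grp)
  cutoff <- sqrt(qchisq(0.975, df = ncol(X)))
  rd <- numeric(nrow(X))
  for (k in levels(grp)) {
    idx <- which(grp == k)
    ## robust distance of each observation from its group center
    rd[idx] <- sqrt(mahalanobis(X[idx, , drop = FALSE], m0[[k]], C0))
  }
  as.numeric(rd <= cutoff)   # w_ik = 1 for regular points, 0 for outliers
}
```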

With these weights we can calculate the final reweighted estimates: the group means m_k, the within-groups sum of squares and cross-products matrix W, the between-groups sum of squares and cross-products matrix B, and the total sum of squares and cross-products matrix T = W + B, which are necessary for constructing the robust Wilks' lambda statistic \Lambda_R as defined in Eq. (3) (Todorov et al. 1990):

m_k = \frac{1}{\nu_k} \sum_{i=1}^{n_k} w_{ik} x_{ik}, \qquad m = \frac{1}{\nu} \sum_{k=1}^{g} \nu_k m_k,

W = \sum_{k=1}^{g} \sum_{i=1}^{n_k} w_{ik} (x_{ik} - m_k)(x_{ik} - m_k)^t,   (9)

B = \sum_{k=1}^{g} \nu_k (m_k - m)(m_k - m)^t,

T = \sum_{k=1}^{g} \sum_{i=1}^{n_k} w_{ik} (x_{ik} - m)(x_{ik} - m)^t = W + B,

where \nu_k = \sum_{i=1}^{n_k} w_{ik} are the sums of the weights within group k, for k = 1, \ldots, g, and \nu = \sum_{k=1}^{g} \nu_k is the total sum of weights.


Substituting these estimates of the matrices W and T in Eq. (3) we obtain a robust version of the test statistic \Lambda, given by

\Lambda_R = \frac{\det(W)}{\det(T)}.   (10)
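Putting the pieces together, the reweighted matrices in (9) and the statistic in (10) can be sketched as follows; robust_wilks is our illustrative name, and w is the 0/1 weight vector from (8).

```r
## Sketch of the reweighted estimates (9) and the robust Wilks' lambda (10).
robust_wilks <- function(X, grp, w) {
  X <- as.matrix(X)
  grp <- as.factor(grp)
  p <- ncol(X)
  nu_k <- tapply(w, grp, sum)             # within-group sums of weights
  nu <- sum(nu_k)                         # total sum of weights
  ## weighted group means m_k and overall mean m
  mk <- lapply(levels(grp), function(k)
    colSums(X[grp == k, , drop = FALSE] * w[grp == k]) / nu_k[[k]])
  names(mk) <- levels(grp)
  m <- Reduce(`+`, Map(`*`, mk, as.list(nu_k))) / nu
  W <- B <- matrix(0, p, p)
  for (k in levels(grp)) {
    idx <- which(grp == k)
    Xc <- sweep(X[idx, , drop = FALSE], 2, mk[[k]])
    W <- W + crossprod(Xc * sqrt(w[idx]))         # weighted within SSCP
    B <- B + nu_k[[k]] * tcrossprod(mk[[k]] - m)  # between SSCP
  }
  det(W) / det(W + B)                     # Lambda_R = det(W)/det(T)
}
```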

For computing the MCD and the related estimators the FAST-MCD algorithm (Rousseeuw and Van Driessen 1999) will be used as implemented in the package rrcov for R (R Development Core Team 2005).

The stepwise selection procedure which will be considered here proceeds similarly to the classical one, described in detail in Jennrich (1977). The selection is guided by the robust \Lambda_R statistic and the corresponding F-statistic which can be used to test the significance of the change in \Lambda_R when adding or removing a variable. This F-statistic serves as F-to-enter or F-to-remove. At each stage:

1. The variable with the smallest F-to-remove is removed unless this value is greater than or equal to the F_out threshold.
2. The variable with the largest F-to-enter is entered unless this value is below the F_in threshold.
3. The procedure stops if no variable can be entered or removed.

The same critical values as in the classical procedure are used, although the distribution of the robust \Lambda_R statistic differs from the classical one. This issue deserves further investigation. An alternative approach would be not to consider the statistical properties of \Lambda_R but to use it as a descriptive statistic which provides a measure of the discrimination contribution of the variables, and on its basis to devise an "all possible subsets" selection procedure. Other alternatives to the stepwise selection are forward selection (if step 1 above is omitted) and backward elimination (if we start with all variables in the model and omit step 2); a sketch of the forward variant follows below.

3 Example

For illustration of the method we use a data set containing measurements on 159 fish caught in Lake Laengelmavesi, Finland. For the 159 fish of 7 species the weight, length, height, and width were measured. Three different length measurements are recorded: from the nose of the fish to the beginning of its tail, from the nose to the notch of its tail, and from the nose to the end of its tail. The height and width are calculated as percentages of the third length variable. This results in six observed variables, listed in Table 1. Observation 14 has a missing value in the variable Weight, therefore this observation was excluded from the analysis. The data set is available online (Fish Catch Data Set 2006) and is also included in the R package rrcov.

First we apply the classical and the robust variable selection procedures to the original data. The upper part of Table 2 shows the results.
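As promised above, here is a minimal sketch of the forward part of the selection (step 2 only, with no removal), built on the robust_wilks sketch from Sect. 2. The F-to-enter below uses the usual partial-F form for adding one variable when p variables and g groups are present; we state this form, and the default threshold F_in = 4, as assumptions for illustration rather than as the exact choices of Jennrich (1977).

```r
## Sketch of forward selection guided by Lambda_R and an F-to-enter.
forward_select <- function(X, grp, w, F_in = 4) {
  grp <- as.factor(grp)
  g <- nlevels(grp)
  nu <- sum(w)                            # total sum of weights
  selected <- integer(0)
  lambda_old <- 1                         # Lambda of the empty model
  repeat {
    rest <- setdiff(seq_len(ncol(X)), selected)
    if (length(rest) == 0) break
    ## Lambda_R of each candidate model (current variables plus one)
    lam <- sapply(rest, function(j)
      robust_wilks(X[, c(selected, j), drop = FALSE], grp, w))
    p <- length(selected)
    ## assumed partial-F for adding one variable
    F_enter <- (nu - g - p) / (g - 1) * (lambda_old / lam - 1)
    best <- which.max(F_enter)
    if (F_enter[best] < F_in) break       # no variable passes F-to-enter
    selected <- c(selected, rest[best])
    lambda_old <- lam[best]
  }
  selected                                # indices of the chosen variables
}
```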


Table 1 Fish measurements data: Variables

1  Weight   Weight of the fish (in g)
2  Length1  Length from the nose to the beginning of the tail (in cm)
3  Length2  Length from the nose to the notch of the tail (in cm)
4  Length3  Length from the nose to the end of the tail (in cm)
5  Height%  Maximal height as % of Length3
6  Width%   Maximal width as % of Length3

Table 2 Fish measurements data: Results of the classical and robust stepwise variable selection on clean data and on data with one large outlier

                   Classical                 Robust
Step   Variables   \Lambda       Variables   \Lambda_R
Clean data
1      5           0.24467       5           0.07732
2      35          0.01886       35          0.00493
3      435         0.00221       435         0.00064
4      6435        0.00094       6435        0.00019
5      16435       0.00052       16435       0.00010
6      216435      0.00036       216435      0.00006
One large outlier
1      3           0.39024       5           0.08142
2      13          0.16192       35          0.00539
3      213         0.07244       435         0.00070
4      5213        0.05999       6435        0.00020
5      65213       0.00583       16435       0.00010
6      465213      0.00166       216435      0.00007

The subsets of variables selected at each step by the two procedures, as well as the values of the corresponding statistics \Lambda and \Lambda_R, are reported. Both procedures deliver the same selection results in all steps, i.e. the order of selection of the variables is the same.

Then we introduce outliers in the data by modifying one observation: for example, we change the value of the variable Length3 of observation 120 (belonging to class 7 = Perch) from 23.5 to 235, i.e. we move the decimal point one position to the right, which is a quite plausible error. This one outlier is enough to distort the classical selection procedure, and it selects completely different sets of variables, as seen from the lower part of Table 2. When the robust procedure is applied to the modified data, it delivers the same selection results as those obtained on the clean data in all steps.

Figure 1 presents the results graphically. The variables are plotted on the horizontal axis in the order of their inclusion in the model. For each step the value of the Wilks' lambda as well as the apparent error rate (AER) are shown. The apparent error rate (known also as resubstitution error rate or reclassification error rate) is a commonly used estimator of the actual error rate in discriminant analysis (the true error is likely to be higher) and is calculated by applying the classification criterion to the same data set from which it was derived and then counting the number of misclassified observations; see Lachenbruch (1975). In the top two panels the results for the clean data are shown. It is clearly seen that both curves (Wilks' \Lambda and AER) computed by the classical and robust selection procedures follow the same pattern. The bottom two panels show the results obtained from the modified data (with one large outlier added). The left panel shows the completely changed order of selection of the variables by the classical


procedure, while the right panel (robust procedure) remains unchanged with respect to the order of the selected variables and almost unchanged with respect to the values of Wilks' \Lambda and the AER.

Fig. 1 Fish measurements data: Results of the classical (left) and robust (right) stepwise variable selection on clean data (top) and on data with one large outlier (bottom). The solid line represents the Wilks' \Lambda statistic (\Lambda or \Lambda_R) and the dotted line represents the apparent error rate (AER)

4 Simulation

The example with the Fish Measurement data set presented in Sect. 3 shows that using the robustified Wilks' lambda statistic is useful for robust variable selection in linear discriminant analysis. In this section a simulation study will be conducted in order to give more insight into the performance of this robust procedure. The objective of this study is twofold: (1) to show that the robust procedure for variable selection is not influenced by the presence of a moderate amount of outliers in the data and provides results similar to those of the classical MLE procedure applied to clean data, and (2) to show that, when used on uncontaminated data, the robust procedure behaves similarly to the classical MLE procedure.

Distributions. The data will be generated from the following p-dimensional normal distributions, where each group k = 1, \ldots, g has a separate mean \mu_k but all groups have the same covariance matrix \Sigma:

\pi_k \sim N_p(\mu_k, \Sigma), \quad k = 1, \ldots, g.   (11)


Further, we will restrict the number of groups to g = 2 and the common covariance matrix will be set to \Sigma = I_p. The number of variables will be set to p = 8. The mean of the first group will be 0, i.e. \mu_1 = (0, 0, 0, 0, 0, 0, 0, 0), and the mean of the second group will be \mu_2 = (b, b, b, b, 0, 0, 0, 0), where b is a parameter of the simulation. Thus only the first four variables contribute to the discrimination between the two groups. Data sets with the following sample sizes will be generated: n_1 = n_2 = n \in \{50, 100, 200, 400\}. An additional case with unequal size training samples will be considered, where n_1 = 50 and n_2 = 200. These distributions will be contaminated in the following way (scale contamination):

\pi_k \sim (1 - \varepsilon) N_p(\mu_k, I_p) + \varepsilon N_p(\mu_k, \kappa I_p), \quad k = 1, 2,   (12)

where \varepsilon \in \{0, 0.1, 0.2, 0.4\} and \kappa \in \{3, 9, 100\} are parameters of the simulation. In this way, the variation of the parameters n, \varepsilon and \kappa results in 50 data sets. In order to make the comparisons sensitive to the differences between the compared selection procedures, the mean value b for the four discriminating variables was selected in such a way that classical stepwise discriminant analysis applied to uncontaminated data makes the correct variable selection in approximately 50% of the cases. If larger values were used for b, the procedure would almost always select the right variables; on the other hand, if the value of b were too small, the procedures would rarely select the four discriminating variables. The values used are given in Table 3.
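A sketch of this generator, using mvrnorm() from the MASS package, is given below; gen_group is an illustrative name and the parameter values in the usage lines reproduce one of the settings above.

```r
## Sketch of the contamination scheme (12): with probability eps an
## observation is drawn from N_p(mu, kappa * I_p) instead of N_p(mu, I_p).
library(MASS)

gen_group <- function(n, mu, eps, kappa) {
  p <- length(mu)
  bad <- runif(n) < eps                   # which observations are outliers
  X <- mvrnorm(n, mu, diag(p))            # clean part
  if (any(bad))
    X[bad, ] <- mvrnorm(sum(bad), mu, kappa * diag(p))
  X
}

## e.g. p = 8, n1 = n2 = 100, so b = 0.50 from Table 3
b  <- 0.50
X1 <- gen_group(100, rep(0, 8),               eps = 0.1, kappa = 9)
X2 <- gen_group(100, c(rep(b, 4), rep(0, 4)), eps = 0.1, kappa = 9)
```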

Criteria. Having in mind the way in which the data sets were generated, the outcome of the variable selection procedure falls into one of four categories:

• Correct: the stepwise procedure selects the four discriminating variables, generated with different means in the two groups, and only them
• Overselected: the procedure selects all four discriminating variables, but additionally some of the other variables
• Underselected: the procedure selects not all of the discriminating variables and does not select any of the other variables
• Mixed: the procedure does not select all four discriminating variables but selects some of the other variables

We will denote this outcome, which characterizes the agreement of the observed result of the selection procedure with the expected one, as COUM. Another measure for the quality of the selected model, and thus for the performance of the selection procedure, is the Overall Probability of Misclassification (OPM).

Table 3 Values for b used for the related variables in the second group

n_1    n_2    b
50     50     0.80
100    100    0.50
200    200    0.35
400    400    0.24
50     200    0.56


The Overall Probability of Misclassification can be estimated by simulation (similarly to He and Fung 2000; Hubert and Van Driessen 2004). For this purpose we generate a test sample consisting of 2,000 observations from each group (with the known distribution), classify them using each of the estimated discrimination rules and obtain the corresponding proportions of misclassified observations. This procedure is repeated N = 500 times and the mean and standard error of the probability of misclassification are calculated for each method.

For each of the 50 types of data sets described above, classical MLE stepwise discriminant analysis is performed and COUM and OPM are recorded. Denote these by COUM_MLE(clean) and OPM_MLE(clean), respectively. Then contamination according to the selected parameters is applied and the robust and classical procedures are performed on the contaminated data set. Denote the results by COUM_MCD, OPM_MCD, COUM_MLE and OPM_MLE, respectively. This is repeated N = 500 times.

Results. First we will consider the variable selection procedures applied to the clean data. Figure 2 shows the values of the criterion COUM for the robust (MCD) and classical (MLE) procedures in this case for different sample sizes. Because of the special selection of the mean parameter, the classical procedure selects the correct variables in about 50% of the trials in all cases. The robust procedure performs only slightly worse. The picture changes when contamination is added (see Fig. 3). Even with 10% outliers (\varepsilon = 0.1, \kappa = 9) the percentage of correct selections of the MLE procedure drops to 20% and less, and the percentage of under-selection increases drastically. The robust procedure retains its selection capabilities and the results remain almost unchanged.
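The OPM estimate can be sketched as follows; for brevity the sketch fits the rule with lda() from the MASS package as a stand-in for the (classical or robust) discrimination rule actually estimated in the study, and opm is an illustrative name. The training labels grp are assumed to be a factor with levels 1 and 2.

```r
## Sketch of the OPM estimate: classify a large clean test sample with a
## rule fitted on the selected variables and count the misclassifications.
library(MASS)

opm <- function(train, grp, selected, mu1, mu2, n_test = 2000) {
  p <- length(mu1)
  fit <- lda(train[, selected, drop = FALSE], grouping = grp)
  ## test sample of n_test observations per group from the known model
  test <- rbind(mvrnorm(n_test, mu1, diag(p)),
                mvrnorm(n_test, mu2, diag(p)))
  truth <- factor(rep(1:2, each = n_test))
  pred <- predict(fit, test[, selected, drop = FALSE])$class
  mean(pred != truth)                     # proportion misclassified
}
```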

Fig. 2 Performance of the classical (MLE) and robust (MCD) variable selection procedures on clean data for different training sample sizes n1 = n2 = {50, 100, 400} and n1 = 50, n2 = 200


Fig. 3 Performance of the classical (MLE) and robust (MCD) variable selection procedures on contaminated data with ε = 10% and κ = 9 for different training sample sizes n1 = n2 = {50, 100, 400} and n1 = 50, n2 = 200


Fig. 4 Overall probability of misclassification for different degrees of contamination in the training samples, achieved by the classical (solid line) and robust (dotted line) variable selection procedures (ε = 0, 10, 20, 40%; κ = 9, 100; n1 = n2 = 50, 100, 200 as well as n1 = 50, n2 = 200)

Since the main purpose of discriminant analysis is to find appropriate rules which can be used for classifying future observations, we want to assess the performance of the proposed high breakdown procedure for model selection


against this criterion. As described above, the classification ability of the classical and robust procedures is represented by the overall probability of misclassification, which is estimated by simulation. Figure 4 shows the results for different degrees of contamination in the training samples. The classification power achieved by the classical and robust variable selection procedures is shown for \varepsilon = 0, 10, 20, 40%; \kappa = 9, 100 and for different sizes of the training samples: n_1 = n_2 = 50, 100, 200 as well as n_1 = 50, n_2 = 200. The four panels representing the different sample sizes show a more or less similar pattern. In the case of the classical estimates (solid line), the overall probability of misclassification increases with increasing percentage of contamination, and this is much more pronounced for larger values of the scale inflation factor \kappa. The dotted line representing the MCD based estimates remains almost unchanged, which shows the positive effect of the robust variable selection procedure on the classification performance of linear discriminant analysis.

5 Conclusions

In this paper we propose a high breakdown point approach to variable selection in linear discriminant analysis based on the well known Wilks' \Lambda statistic, but utilizing MCD estimates of the multivariate multigroup location and scatter instead of the classical ones. This proposal is illustrated by analyzing a data set known from the literature, and this example suggests that it is preferable to use the robust variable selection procedure when outliers are expected in the data. To get further insight into the behavior of the proposed procedure, a simulation study is carried out which shows that the classical variable selection procedure based on the popular Wilks' \Lambda statistic is affected by the presence of outliers in the training samples, while the robust procedure based on the MCD is not much influenced by the contamination. On the other hand, when there are no outliers, both procedures perform almost equally well.

References

Chork CY, Rousseeuw PJ (1992) Integrating a high breakdown option into discriminant analysis in exploration geochemistry. J Geochem Explor 43:191–203
Croux C, Dehon C (2001) Robust linear discriminant analysis using S-estimators. Can J Stat 29:473–492
Croux C, Joossens K (2005) Influence of observations on the misclassification probability in quadratic discriminant analysis. J Multivar Anal 96:384–403
Fish Catch Data Set (2006) Journal of Statistics Education data archive. http://www.amstat.org/publications/jse/datasets/fishcatch.txt, accessed January 2006
Hawkins DM, McLachlan GJ (1997) High-breakdown linear discriminant analysis. J Am Stat Assoc 92(437):136–143
He X, Fung WK (2000) High breakdown estimation for multiple populations with applications to discriminant analysis. J Multivar Anal 72:151–162
Hubert M, Van Driessen K (2004) Fast and robust discriminant analysis. Comput Stat Data Anal 45:301–320
Jennrich R (1977) Stepwise discriminant analysis. In: Enslein K, Ralston A, Wilf HS (eds) Statistical methods for digital computers. Wiley, New York, pp 76–95


Johnson RA, Wichern DW (2002) Applied multivariate statistical analysis, 5th edn. Prentice Hall, International Editions
Krusinska E (1988) Robust methods in discriminant analysis. Rivista di Statistica Applicata 21(3):239–253
Krusinska E, Liebhart J (1988) Robust selection of the most discriminative variables in the dichotomous problem with application to some respiratory disease data. Biometrical J 30(2):295–304
Krusinska E, Liebhart J (1989) Some further remarks on the robust selection of variables in discriminant analysis. Biometrical J 31(2):227–233
Lachenbruch PA (1975) Discriminant analysis. Hafner Press, New York
McLachlan GJ (1992) Discriminant analysis and statistical pattern recognition. Wiley, New York
Pison G, Van Aelst S, Willems G (2002) Small sample corrections for LTS and MCD. Metrika 55:111–123
R Development Core Team (2005) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, http://www.R-project.org
Ronchetti E (1985) Robust model selection in regression. Stat Probab Lett 3:21–23
Rousseeuw PJ (1984) Least median of squares regression. J Am Stat Assoc 79:851–857
Rousseeuw PJ, Van Driessen K (1999) A fast algorithm for the minimum covariance determinant estimator. Technometrics 41:212–223
Rousseeuw PJ, van Zomeren BC (1991) Robust distances: simulation and cutoff values. In: Stahel W, Weisberg S (eds) Directions in robust statistics, Part II. Springer, Berlin Heidelberg New York
Todorov V, Neykov N, Neytchev P (1990) Robust selection of variables in the discriminant analysis based on MVE and MCD estimators. In: Proceedings in computational statistics, COMPSTAT'90. Physica Verlag, Heidelberg
Todorov V, Neykov N, Neytchev P (1994) Robust two-group discrimination by bounded influence regression. Comput Stat Data Anal 17:289–302
