Robust PLS regression based on simple least median squares regression

Biagio Simonetti, DASES, University of Sannio
[email protected]
Smail Mahdi, Department of Computer Sciences, University of the West Indies, Barbados
[email protected]
Ida Camminatiello, Dipartimento di Matematica e Statistica, Università di Napoli Federico II, Italy
[email protected]
Keywords: multivariate regression, partial least squares regression, least median squares regression, simulation study.

1. Introduction
A classical statistical problem is to estimate the linear relationship between two sets of variables, X (n, p) (explanatory variables) and y (n, 1) (dependent variable), where n is the number of statistical units and p the number of explanatory variables. The technique most widely used to solve this problem is the multivariate regression model y = a + Xb + e, where a (n, 1) is the intercept term, b (p, 1) the vector of slopes and e (n, 1) the error term. This linear model is well defined only if p < n. If p > n, the inverse of X'X does not exist (or is unstable) and therefore the classical least squares regression model cannot be applied. A solution to this problem is offered by Partial Least Squares (PLS) Regression (Wold, 1975).
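The rank argument can be checked numerically. The following minimal numpy sketch (the dimensions and simulated data are illustrative assumptions, not taken from the paper) shows that X'X has full rank when p < n but is necessarily singular when p > n:

```python
import numpy as np

rng = np.random.default_rng(0)

# p < n: X'X is a p x p matrix of full rank, so OLS is well defined.
X_tall = rng.normal(size=(50, 5))          # n = 50, p = 5
rank_tall = np.linalg.matrix_rank(X_tall.T @ X_tall)

# p > n: X'X is a p x p matrix whose rank is at most n, hence singular,
# and the OLS estimate (X'X)^{-1} X'y does not exist.
X_wide = rng.normal(size=(5, 50))          # n = 5, p = 50
rank_wide = np.linalg.matrix_rank(X_wide.T @ X_wide)
```

Here `rank_tall` equals p = 5, while `rank_wide` cannot exceed n = 5 even though X'X is 50 x 50.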
Garthwaite (1994) shows that linear combinations of the explanatory variables can be formed sequentially and related to the y variable by ordinary least squares regression. Since PLS regression is very sensitive to the presence of outliers in the data, several robust versions have been proposed. In this paper we propose a robust version of PLS regression based on least median of squares regression (Rousseeuw, 1984): the least median of squares regression is substituted for the least squares regression used systematically in Garthwaite's (1994) set-up for forming the latent PLS factors.
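To fix ideas, a minimal sketch of a least median of squares fit for simple regression is given below. It uses the classical device of searching over lines through pairs of observations, a common approximation to the exact LMS fit; the simulated data and function names are my own illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def lms_simple(x, y):
    # Approximate LMS for simple regression y = a + b*x:
    # among lines through each pair of points, keep the one that
    # minimizes the median of the squared residuals.
    n = len(x)
    best_a, best_b, best_med = 0.0, 0.0, np.inf
    for i in range(n):
        for j in range(i + 1, n):
            if x[j] == x[i]:
                continue
            b = (y[j] - y[i]) / (x[j] - x[i])
            a = y[i] - b * x[i]
            med = np.median((y - a - b * x) ** 2)
            if med < best_med:
                best_a, best_b, best_med = a, b, med
    return best_a, best_b

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=50)
y[:10] += 20.0                  # contaminate 20% of the responses
a_hat, b_hat = lms_simple(x, y)
```

Despite 20% contamination, the fit stays close to the true line (intercept 2, slope 3), illustrating the high breakdown point that motivates substituting LMS for OLS in the sequential construction.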
2. Partial Least Squares Regression
Partial Least Squares (PLS) regression is a multivariate data analysis technique that relates a response variable y to several explanatory x-variables. The method aims to identify the underlying factors, that is, the linear combinations of the x-variables, that best model the dependent variable y. PLS deals efficiently with data sets consisting of a large number of variables that may be highly correlated and affected by substantial random noise. To describe the technique, let X (n, p) be a matrix of p explanatory variables observed on n units and y the vector of the response variable. PLS projects X onto a set of latent variables t_j (linear combinations of the columns of X), j = 1, 2, ..., a, chosen to maximize Cov(t_j, y), where a stands for the number of components retained in the model.
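For the first component, the covariance-maximizing weight vector has a closed form: among unit-norm weights w, Cov(Xw, y) is maximized by w proportional to X'y. A minimal numpy sketch (the simulated data are an illustrative assumption, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = 1.0
y = X @ beta + rng.normal(scale=0.1, size=n)

# PLS works with centred variables.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# First PLS component: the unit-norm weight vector maximising
# Cov(t, y), with t = X w, is proportional to X'y.
w = Xc.T @ yc
w /= np.linalg.norm(w)
t1 = Xc @ w

cov_t_y = t1 @ yc / (n - 1)
```

By the Cauchy-Schwarz inequality, no other unit-norm linear combination of the columns of Xc has a larger covariance with y than t1; subsequent components are extracted in the same way after deflation.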
3. Garthwaite's PLS approach
In order to ease the interpretation of the regression coefficients, Garthwaite (1994) proposed an alternative PLS approach based on simple linear regression. To study the relationship between y and X, he proposed the following form for the regression equation:
ŷ = b_1 t_1 + b_2 t_2 + ... + b_p t_p
where each component t_j is a linear combination of the predictors weighted by the regression coefficient b_j. The components are computed sequentially, as in PLS regression.

4. Least median of squares regression
Rousseeuw (1984) proposed the Least Median of Squares (LMS) estimator, which is obtained from the following optimization problem:

minimize the median over i of e_i^2

where e_i is the residual of observation i. This estimator turns out to be very robust with respect to outliers in y as well as outliers in X: its breakdown point is 50%. Furthermore, the LMS estimator is equivariant with respect to linear transformations of the explanatory variables, because it only makes use of the residuals.

5. PLS under LMS
The PLS method is known to be very sensitive to outlying observations, which are typically expected to be present in experimental data. This drawback of classical partial least squares regression has been addressed by several authors, who propose different ways to construct a robust version of partial least squares regression. We propose here to combine Garthwaite's PLS technique with Rousseeuw's least median of squares estimator and to study the performance of this new procedure by simulation.

6. A Simulation Study
In this section we illustrate the statistical properties of R-PLS regression in comparison with PLS and the main robust techniques proposed in the literature. First, the efficiency of the estimators is investigated. We generate 1000 samples of size n with components
randomly drawn from a normal distribution with mean zero and standard deviation 0.001.

References
Garthwaite, P. H. (1994), An interpretation of partial least squares, Journal of the American Statistical Association, 89(425), 122-127.
Hubert, M., Verboven, S. (2003), A robust PCR method for high-dimensional regressors, Journal of Chemometrics, 17, 438-452.
Rousseeuw, P. J. (1984), Least median of squares regression, Journal of the American Statistical Association, 79, 871-880.
Vanden Branden, K., Hubert, M. (2002), A robustified version of the SIMPLS algorithm, Proceedings of the International Conference on Robust Statistics, ICORS 2002, Canada.
Vanden Branden, K., Hubert, M. (2003), The influence function of the classical and robust PLS weight vectors, submitted.
Wold, H. (1966), Estimation of principal components and related models by iterative least squares, in Multivariate Analysis, Krishnaiah, P. R. (Ed.), Academic Press, New York.
Wold, H. (1975), Soft modeling by latent variables: the non-linear iterative partial least squares approach, in Perspectives in Probability and Statistics, Papers in Honour of M. S. Bartlett.