Elastic Net Regularization in Learning Theory
Christine De Mol ([email protected])
Department of Mathematics and ECARES, Université Libre de Bruxelles, Campus Plaine CP 217, Bd du Triomphe, 1050 Brussels, Belgium

Ernesto De Vito ([email protected])
Università di Genova, Stradone Sant'Agostino 37, 16123 Genova, Italy, and INFN, Sezione di Genova, Via Dodecaneso 33, 16146 Genova, Italy

Lorenzo Rosasco ([email protected])
Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, and DISI, Università di Genova, v. Dodecaneso 35, 16146 Genova, Italy
Abstract

In many applications of supervised learning a main goal, besides achieving good generalization properties, is to detect which features are meaningful for building an estimator. There are at least two main difficulties in solving this type of problem: the initial number of potentially relevant features is often much larger than the number of examples, and many of the variables (including the relevant ones) are often strongly dependent. Note that if generalization were the only criterion for selecting variables, then, when faced with two highly dependent variables, we would obtain the same discriminative power by selecting either one of them or both. Both of these issues make the problem of variable selection ill-posed and suggest that the minimizer of the empirical risk is prone to overfitting the data. Our study explores the use of regularization techniques for restoring well-posedness and for ensuring statistically meaningful solutions within the framework of statistical learning. A key to solving the above problems is to assume that the number of relevant features is small. Such a sparsity assumption advocates the use of sparsity-enhancing learning algorithms, and indeed this class of methods has recently attracted increasing attention as a way to deal with high-dimensional data. In this paper we develop a sparsity-based algorithm that can deal with features which may be dependent and infinite in number. To this aim, we study a regularization procedure based on penalized empirical risk minimization with a penalty that is a weighted sum of an $\ell^1$ and an $\ell^2$ norm, namely elastic-net regularization (Zou & Hastie, 2005).
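For concreteness, the elastic-net scheme can be sketched as the following penalized functional (a sketch only: the notation, with n examples (x_i, y_i), a dictionary of features φ_γ, coefficients β_γ, a regularization parameter τ > 0, and a weight ε ≥ 0 balancing the two penalties, is assumed here and need not match the paper's exact setting):

$$
\hat\beta^{\,\tau} \;=\; \operatorname*{argmin}_{\beta}\;
\frac{1}{n}\sum_{i=1}^{n}\Big(y_i - \sum_{\gamma}\beta_\gamma\,\varphi_\gamma(x_i)\Big)^{2}
\;+\; \tau\Big(\sum_{\gamma}|\beta_\gamma| \;+\; \varepsilon\sum_{\gamma}\beta_\gamma^{2}\Big).
$$

Setting ε = 0 recovers a pure $\ell^1$ (lasso-type) penalty, while any ε > 0 makes the penalty strictly convex, which is what yields uniqueness and stability of the minimizer.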
Such a method yields estimators that are both sparse and stable with respect to the data. The $\ell^1$ term in the penalty promotes sparsity of the estimator. The effect of the $\ell^2$ term is twofold: first, a whole group of correlated relevant variables is selected rather than a single variable in the group; second, when the variables are dependent, stability is ensured with respect both to noise and to random sampling. Our study focuses on the generalization properties as well as the algorithmic properties of elastic-net regularization. We prove that if the hypothesis space induced by the set of features is sufficiently rich, the algorithm is universally consistent. Under suitable assumptions on the target function we derive sample bounds, and when such assumptions are not available we propose an adaptive parameter choice that achieves the same rates. A key step in our study is the characterization, via tools from convex analysis, of the optimization problem defining the elastic net. As a by-product we obtain an iterative thresholding algorithm (Daubechies et al., 2004) to compute the regularized solution, which is very easy to implement and provides an alternative to the LARS iteration used in Zou and Hastie (2005).
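To illustrate the algorithmic part, here is a minimal sketch of iterative soft-thresholding applied to the elastic-net functional above, in the spirit of Daubechies et al. (2004). This is not the authors' exact scheme: the NumPy implementation, the step size derived from a Lipschitz bound, and the stopping rule are assumptions of this sketch.

    import numpy as np

    def soft_threshold(v, t):
        # Component-wise soft-thresholding: shrink each entry toward zero by t.
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def elastic_net_ista(X, y, tau, eps, n_iter=5000, tol=1e-8):
        # Minimize (1/n)||X b - y||^2 + tau * (||b||_1 + eps * ||b||_2^2)
        # by a gradient step on the smooth part followed by soft-thresholding.
        n, p = X.shape
        beta = np.zeros(p)
        # Step size 1/L, with L a Lipschitz constant of the smooth part's gradient.
        L = 2.0 * np.linalg.norm(X, 2) ** 2 / n + 2.0 * tau * eps
        for _ in range(n_iter):
            grad = (2.0 / n) * X.T @ (X @ beta - y) + 2.0 * tau * eps * beta
            beta_next = soft_threshold(beta - grad / L, tau / L)
            if np.linalg.norm(beta_next - beta) <= tol * max(1.0, np.linalg.norm(beta)):
                return beta_next
            beta = beta_next
        return beta

Each iteration costs two matrix-vector products and a component-wise shrinkage, which is what makes the scheme so easy to implement compared with the LARS path-following iteration.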
References

Daubechies, I., Defrise, M., & De Mol, C. (2004). An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics, 57, 1413–1457.

Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67(2), 301–320.