Multi-task feature selection
Guillaume Obozinski Ben Taskar Michael Jordan
[email protected] [email protected] [email protected]
Technical Report, June 2006. Department of Statistics, University of California, Berkeley
Abstract
We address the problem of joint feature selection across a group of related classification or regression tasks. We propose a novel type of joint regularization of the model parameters in order to couple feature selection across tasks. Intuitively, we extend the ℓ1 regularization for single-task estimation to the multi-task setting. By penalizing the sum of ℓ2-norms of the blocks of coefficients associated with each feature across different tasks, we encourage multiple predictors to have similar parameter sparsity patterns. To fit parameters under this regularization, we propose a blockwise boosting scheme that follows the regularization path. The algorithm introduces and updates simultaneously the coefficients associated with one feature in all tasks. We show empirically that this approach outperforms independent ℓ1-based feature selection on several datasets.
1 Introduction
We consider the setting of multi-task learning, where the goal is to estimate predictive models for several related tasks. For example, we might need to recognize the speech of different speakers or the handwriting of different writers, or learn to control a robot to grasp different objects or to drive in different landscapes. We assume that the tasks are sufficiently different that learning a specific model for each task results in improved performance, but similar enough that they share some common underlying representation that should make simultaneous learning beneficial. In particular, we focus on the scenario where the different tasks share a subset of relevant features to be selected from a large common space of features.
1.1 Feature selection for a group of tasks
Feature selection has been shown to improve generalization in situations in which many irrelevant features are present. In particular, regularization by the ℓ1-norm, as implemented by the LASSO procedure [12], has been shown to yield useful feature selection. Model fits obtained under the ℓ1-norm penalty are typically sparse, in the sense that only a few of the parameters are non-zero, and the resulting models are often more accurate and more easily interpretable. Fu and Knight [7] characterize the asymptotic behavior of the solutions and of their sparsity patterns, and Donoho [5] shows that, for certain large linear systems of equations, the ℓ1-regularized solution achieves optimal sparsity in a certain sense. Recent papers [6, 11, 15] have established clearer correspondences between the regularization path of the LASSO and the path obtained with certain boosting schemes.
To learn models for several tasks, the ℓ1 regularization can of course be applied individually to each task. However, in the multi-task setting we prefer a regularization scheme that encourages solutions with a shared sparsity pattern. For example, when learning personalized handwriting recognition models for each writer, we expect that even though individual styles vary, there is a common subset of image features (pixels, strokes) shared across writers. If we consider the entire block of coefficients associated with a feature across tasks as a unit, we can encourage sparsity at the block level, leading to estimated parameters in which several blocks of coefficients are set to zero.
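To make the contrast concrete, the following sketch compares the independent ℓ1 penalty with the joint penalty that sums the ℓ2-norms of the per-feature coefficient blocks across tasks. It is illustrative only, with hypothetical function names and made-up numbers, and is not the fitting algorithm proposed in this report; it merely shows why a feature whose entire coefficient block is zero is favored by the joint penalty, which is what encourages a shared sparsity pattern.

```python
import numpy as np

def independent_l1_penalty(W):
    """Sum of absolute values of all coefficients.

    W is an (n_features, n_tasks) matrix; column l holds the coefficients
    of the model for task l. Penalizing this quantity selects features for
    each task independently.
    """
    return np.abs(W).sum()

def joint_l1_l2_penalty(W):
    """Sum over features of the L2-norm of each feature's coefficient block.

    Row j of W is the block of coefficients associated with feature j
    across all tasks. Penalizing the sum of the row norms drives entire
    rows to zero, so the tasks share a common sparsity pattern.
    """
    return np.linalg.norm(W, axis=1).sum()

# Made-up example: the second feature is irrelevant for every task, so its
# whole row is zero and it contributes nothing to either penalty; the joint
# penalty rewards such shared zeros at the level of entire rows.
W = np.array([[1.0, -0.5,  2.0],
              [0.0,  0.0,  0.0],
              [0.3,  0.1, -0.2]])
print(independent_l1_penalty(W))  # 4.1
print(joint_l1_l2_penalty(W))     # ||row 0||_2 + 0 + ||row 2||_2
```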
2 Joint Regularization for Feature Selection
2.1 From single tasks to multiple tasks
Formally, we assume that there are L models to learn and that our training set consists of the samples {(x_i^l, y_i^l) ∈ X × Y : i = 1, …, N_l, l = 1, …, L}, where l indexes tasks and i indexes the i.i.d. samples for each task. We assume that the input space X is