Proceedings of the First International Conference on Machine Learning and Cybernetics, Beijing, 4-5 November 2002

SVM-BASED INCREMENTAL ACTIVE LEARNING FOR USER ADAPTATION FOR ONLINE GRAPHICS RECOGNITION SYSTEM

BIN-BIN PENG, ZHENG-XING SUN, XIAO-GANG XU
State Key Lab for Novel Software Technology, Nanjing University, Nanjing 210095, PR China
Department of Computer Science and Technology, Nanjing University, Nanjing 210095, PR China
E-MAIL: [email protected], [email protected], [email protected]

Abstract: User adaptation is critical in the future design of human-computer interaction systems. Many pattern recognition problems, such as handwriting/sketching recognition and speech recognition, are user dependent, since different users' handwriting, drawing styles, and accents differ. Hence, the classifiers for solving these problems should provide user adaptation so that all users experience better recognition results. However, user adaptation requires that the classifiers have incremental learning ability in order to learn fast. In this paper, an SVM-based incremental active learning algorithm is presented to solve this problem. By using the support vectors, only a small portion of the non-support vectors, and the new interrogative samples in the iterative training and reclassification cycle, both training time and storage space are saved while very little classification precision is lost. Theoretical analysis, experimentation, evaluation, and real application samples in our on-line graphics recognition system are presented to show the effectiveness of this algorithm.

Keywords: Active learning; Incremental learning; Support vector machines; User adaptation

1 Introduction

User adaptation is a classical and important problem in user interface study. Many pattern recognition problems must deal with user specialization, since different users' handwriting, drawing styles, and accents differ. The purpose of user adaptation is to recognize the user's intention, and the intentions of different users are varied and even inconsistent. Take on-line graphics recognition [1] as an example: different users have different drawing habits and preferences. A user adaptation system can be defined as an interactive system that adapts its behavior to each individual user on the basis of a process of user model acquisition and application that involves some form of learning, inference, or decision making [2].

There are two approaches to such problems: rule-based approaches and approaches based on statistical machine learning. In rule-based approaches, some parameters are available to be adjusted through user feedback. This strategy has been widely used in information retrieval and pattern recognition. Unfortunately, although a system tuned by user feedback can adapt to a new user, its universality to other users is usually not preserved. In order to maintain the system's general performance at a relatively high level, the retraining process should involve all the historical samples besides the new samples of the particular user. The strategy based on statistical machine learning is to combine the historical samples and the new samples into one huge sample set and then retrain the classifier on it, which is clumsy and uneconomical in terms of both time and space.

As a means of handling the scalability and expandability of learning systems, incremental learning has attracted much attention in recent years. It can utilize historical training results to make the retraining process much faster, and it avoids the full use of all previous samples, making the storage cost much smaller. Its two advantages are thus computational efficiency and storage economy. By employing incremental learning, the user adaptation process may terminate in less time and give the user a much better experience (with smoother interaction). Most incremental learning algorithms are based on decision trees and neural networks [3][4][5][6]. These algorithms lack a mechanism for controlling the expected risk over the whole sample set, and they cannot discard samples optimally. Syed et al. [7] retrain the classifier with all SVs [8] and the newly input samples. Xiao et al. [9] proposed a different incremental learning approach for SVM based on the boosting idea.

Over the past years, progress has been made in constructing multi-class classifiers based on SVM theory. Platt et al. [10] adopted the Decision Directed Acyclic Graph (DDAG) to combine many binary (two-class) classifiers into a multi-class classifier. Weston et al. [11] proposed an extension that solves multi-class classification problems in one step. Generally speaking, these approaches are either too complicated or too time consuming. The one-against-all structure is the easiest to implement, and most current multi-class utilities, such as SVMTorch [12] and SVMLight [13], are based on the one-against-all structure. However, this structure is not suitable for incremental learning. We will analyze

0-7803-7508-4/02/$17.00 ©2002 IEEE 1379

this problem in Section 3 and Section 4.

Active learning can be used to choose good requests or queries from the samples. There are two different active learning strategies, corresponding to two different ways of collecting samples. One is pool-based active learning [14][15][16][17], in which the learner has a pool of unlabeled data and can request the true class label for a certain number of instances in the pool. The other is incremental active learning [18][19], in which the learner must actively learn from successively input datasets.

In this paper, we present an SVM-based incremental active learning algorithm for user adaptation. By analyzing the input instances, the machine selects only those it considers questionable, submits them to the user to determine the right class, and then trains the classifier with these interrogative instances together with the historical support vectors. As a result, both training time and storage space are saved, while only very little classification precision is lost. This algorithm forms the basis of the multi-class classifier implemented in our on-line graphics recognition system. Experimental results show the effectiveness and efficiency of this algorithm.

The paper is organized as follows. Section 2 presents a brief introduction to SVM and the SVM-based incremental active learning algorithm. Our solution for user adaptation, which is based on a multi-class SVM classifier with incremental active learning ability, is presented in Section 3. Experimentation and performance evaluation are given in Section 4. Finally, Section 5 draws conclusions.

2 The SVM-based Incremental Active Learning Algorithm

Support Vector Machine (SVM) is a new and promising pattern recognition technique developed by Vapnik and his research group [8]. SVM is based on the theory of Structural Risk Minimization and can achieve good performance on small sample sets without overfitting.

2.1 An SVM-based incremental active learning algorithm

The main idea of Support Vector Machine (SVM) [8] is to construct a nonlinear kernel function that maps the data from the input space into a possibly high-dimensional feature space and then to find the optimal hyper-plane with maximum margin between the two classes. Hence, it is basically used for binary (positive or negative) classification. Given a training set of l examples S = {(x_1, y_1), …, (x_l, y_l)}, where y_i ∈ {+1, −1}, we project the original training data into a higher-dimensional feature space F via a Mercer kernel [20] operator K(x, y) = Φ(x)·Φ(y), where Φ : X → F maps the input space X into F, so as to make the samples linearly separable in the feature space.

Now we can build the hyper-plane that separates the training data with maximal margin in the feature space F, and the binary classifier is:

    f(x) = sgn( Σ_{x_i∈SV} α_i y_i (Φ(x)·Φ(x_i)) − b )    (1)

where the coefficients α_i are obtained by solving the following optimization problem: maximize the objective function

    W(α) = Σ_{i=1..l} α_i − (1/2) Σ_{i,j=1..l} α_i α_j y_i y_j (Φ(x_i)·Φ(x_j))    (2)

subject to

    Σ_{i=1..l} α_i y_i = 0,  α_i ≥ 0,  i = 1, 2, …, l.

The hyper-plane can then be expressed as:

    Σ_{x_i∈SV} α_i y_i K(x_i, x) + b = 0,  0 < α_i ≤ C    (3)
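As a concrete check of Eqs. (1) and (3), the following sketch trains a binary RBF-kernel SVM on toy data and reproduces its decision values from the SV expansion alone. This is an illustration assuming scikit-learn (not used in the paper), which exposes the products α_i·y_i as `dual_coef_` and the SVs as `support_vectors_`:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Two Gaussian clouds as a toy binary problem with labels -1/+1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (40, 2)), rng.normal(2, 1, (40, 2))])
y = np.array([-1] * 40 + [1] * 40)

clf = SVC(kernel="rbf", C=1.0).fit(X, y)

# Eq. (1)/(3): decision value = sum over SVs of alpha_i * y_i * K(x_i, x) + b.
gamma = 1.0 / (X.shape[1] * X.var())          # SVC's default gamma="scale"
K = rbf_kernel(X, clf.support_vectors_, gamma=gamma)
manual = K @ clf.dual_coef_.ravel() + clf.intercept_[0]

print(np.allclose(manual, clf.decision_function(X)))   # the SV expansion matches
print(len(clf.support_vectors_), "support vectors out of", len(X), "samples")
```

Only the support vectors enter the expansion, which is precisely what makes discarding the non-SVs attractive.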

The training instances that lie closest to the hyper-plane are selected as the support vectors; these always have nonzero α_i coefficients. Therefore, the SV set can fully describe the classification characteristics of the entire training set. Because the training process of SVM involves solving a quadratic programming problem, its computational complexity is much higher than linear in the number of training samples. Hence, if we train the SVM on the SV set instead of the whole training set, the training time can be reduced greatly without much loss of classification precision. This is the main idea of incremental learning algorithms such as [7][9]. Although this method already saves much time, we can save even more by selecting only a small portion of the samples, namely those close to the hyper-plane, into the training data set. This is the main idea of our active learning algorithm.

The main issue in active learning is to decide whether an input instance is in question and to submit such instances to the user to determine their right class. In the questing process (denoted as Quest()), the machine identifies the interrogative instances by the distance between the point and the hyper-plane, defined as:

    DIS(x, w) = | Σ_{i=1..n} α_i y_i (Φ(x)·Φ(x_i)) + b |,  x_i ∈ SV    (4)
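A minimal sketch of the questing step, assuming scikit-learn (whose `decision_function` returns the signed quantity inside Eq. (4)); the function name `quest` and the threshold value are illustrative assumptions, not from the paper:

```python
import numpy as np
from sklearn.svm import SVC

def quest(clf, candidates, threshold=1.0):
    """Select the 'interrogative' instances: those whose unsigned
    distance to the hyper-plane (Eq. 4) falls below a threshold."""
    dist = np.abs(clf.decision_function(candidates))
    return candidates[dist < threshold]

# Toy demonstration: train on two clouds, then quest an unlabeled batch.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
y = np.array([-1] * 30 + [1] * 30)
clf = SVC(kernel="rbf", C=1.0).fit(X, y)

new_batch = rng.normal(0, 2, (50, 2))   # unlabeled incoming samples
asked = quest(clf, new_batch)           # only these are shown to the user
print(len(asked), "of", len(new_batch), "samples submitted to the user")
```

Samples far from the hyper-plane are classified confidently and never bother the user; only the uncertain ones trigger an interrogation.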

For simplicity, we focus on the binary classification problem in our discussion of incremental active learning; it can easily be extended to the multi-class classification problem. Four extra sample sets are used in the discussion: the initial training set (IS), the incremental training set (INS), the new set (NS), and the working set (WS). Our incremental active learning process can then be illustrated as follows (Table 1):


3. All polygons with the same number of vertices are regarded as the same shape type. For instance, all squares, rectangles, trapezoids, and parallelograms are regarded as quadrangles.
4. No hooklet at the beginning or end of a stroke (as in Figure 1(c)).
5. No circlet at the corner between two lines (as in Figure 1(d)).
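The whole cycle, summarized in Table 1 below, might be sketched as follows. This is an illustrative implementation assuming scikit-learn, with a hypothetical `oracle` callback standing in for the user who labels the interrogative samples:

```python
import numpy as np
from sklearn.svm import SVC

def incremental_active_learning(IS, y_IS, batches, oracle, threshold=1.0):
    """Sketch of the Table 1 cycle.  `oracle` stands in for the user,
    returning the true class of each interrogative sample."""
    # Step 1: train on the initial set, keep only its support vectors.
    clf = SVC(kernel="rbf", C=1.0).fit(IS, y_IS)
    WS, y_WS = IS[clf.support_], y_IS[clf.support_]
    for INS in batches:
        # Step 2: quest the interrogative samples near the hyper-plane.
        NS = INS[np.abs(clf.decision_function(INS)) < threshold]
        if len(NS) == 0:
            continue
        WS = np.vstack([WS, NS])
        y_WS = np.concatenate([y_WS, oracle(NS)])   # user labels NS
        # Step 3: retrain on the working set, then shrink it to its SVs.
        clf = SVC(kernel="rbf", C=1.0).fit(WS, y_WS)
        WS, y_WS = WS[clf.support_], y_WS[clf.support_]
    return clf
```

Keeping only the support vectors after each retraining is what bounds the working set, trading a little precision for training time and storage.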

Table 1. Incremental Active Learning Algorithm

ALGORITHM: INCREMENTAL ACTIVE LEARNING ALGORITHM
STEP 1. Γ = Train(IS), WS = IS_SV;
STEP 2. NS = Quest(INS), WS = WS ∪ NS;
STEP 3. Γ = Train(WS), WS = WS_SV;
STEP 4. GOTO STEP 2.

Here Γ denotes the trained classifier, and X_SV denotes the set of support vectors obtained by training on the set X.

2.2 Computational complexity

In the training process, the training set µ is divided into two subsets: the SV set (µ_sv) and the non-SV set (µ_non-sv). If most SVs are not on the classification boundary, the computational complexity of the training process can be specified as in Eq. (5):

    O( |µ_sv|³ + |µ|·|µ_sv|² + d·|µ|·|µ_sv| )    (5)

where d is the dimension of the input data. Generally speaking, |µ_sv| is much smaller than |µ|.
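The premise of this argument, namely that the SV set essentially determines the classifier, can be checked empirically. A minimal sketch (assuming scikit-learn; the data and the agreement threshold are illustrative assumptions):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([-1] * 100 + [1] * 100)

full = SVC(kernel="rbf", C=1.0).fit(X, y)
# Retrain on the (much smaller) SV set alone.
sv_only = SVC(kernel="rbf", C=1.0).fit(X[full.support_], y[full.support_])

agreement = (full.predict(X) == sv_only.predict(X)).mean()
print(f"{len(full.support_)} SVs of {len(X)} samples; "
      f"prediction agreement {agreement:.2%}")
```

On well-separated data the two classifiers agree almost everywhere, which is why retraining on µ_sv alone loses little precision while the cubic term in Eq. (5) shrinks dramatically.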
