
Proceedings of the Third International Conference on Machine Learning and Cybernetics, Shanghai, 26-29 August 2004

EXPLANATION BASED GENERALIZED ε-SVM AND ITS APPLICATION IN INTELLIGENT PROJECT MANAGEMENT

YOU-FA SUN, FEI-QI DENG
College of Automation Science & Engineering, South China University of Technology, Guangzhou 510610, China
E-MAIL: [email protected], [email protected]

Abstract:
Support Vector Machine works well in classifying populations characterized by abrupt decreases in density functions. Its generalization accuracy, however, is not always optimal when dealing with real world problems that have neither Gaussian distributions nor sharp boundaries. Incorporating domain theory about the problem and excellent intelligent techniques in machine learning into SVM becomes one of the promising alternatives. In this paper, a novel approach, explanation based generalized ε-SVM, which synthesizes SVM, prior knowledge, fuzzy logic and neural networks, is proposed. Prior knowledge is expressed as a trained fuzzy neural network. An optimal subset of features is obtained by dynamically reducing the feature space dimensionality according to the training derivatives extracted from the network. By examining a subset of the practical data sampled from the Guangdong Natural Science Foundation and testing on the remaining set of data, the application shows that explanation based generalized ε-SVM performs better than pure SVM and other traditional classifiers.

Keywords:
Rationalization principle; domain theory; explanation based; training derivative; SVM; SRM

1. Introduction

Following the seminal work of Vapnik [1], the support vector machine (SVM) has become a popular tool for pattern recognition problems characterized by abrupt decreases in density functions. However, since the real world offers neither Gaussian populations nor data with sharp linear boundaries, and more than a fixed number of support vectors contribute to correct classification [2], the generalization performance of pure SVM is not always optimal. How to make SVM work well in such areas remains one of the key open questions. Combining SVM with domain theories and other excellent intelligent techniques such as fuzzy logic, neural networks, genetic algorithms and rough sets remains one of the potential alternatives. Until now, few theoretical studies of explanation based generalized ε-SVM and its applications in intelligent management characterized by real-life phenomena [11] have been reported. Evaluation of project candidates (P.C.) [10, 11], the critical part of scientific research project management, is just such a case. Epistemologically, the evaluation of a P.C. consists of at least three parts: the P.C. itself, the juries (experts), and the interactions between the P.C. and the juries. In other words, a P.C. cannot be objectively valued without fully considering all of these parts of information. Therefore, it is quite necessary to model this problem by taking into account the indicators of the P.C., the faculty database of experts, and the interaction effects. Most canonical approaches are ineffective or invalid due to false methodology, improper models, or ineffective techniques [11]. In this paper, a new approach, explanation based generalized ε-SVM, is proposed. We first express prior knowledge about the evaluation of P.C.s as a previously trained fuzzy neural network, with the training data preprocessed according to the rationalization principle; then we dynamically reduce the feature space dimensionality according to the training derivatives of the target function with respect to each feature, yielding an optimal subset of the original features; and finally we apply these features to SVM. Results show that our method has strong generalization performance.

The rest of the paper is organized as follows. After the introduction in this section, the rationalization principle for evaluation is described in Section 2. Section 3 discusses the methodology of our explanation based learning for feature space dimensionality reduction. Section 4 introduces the general theory of ε-SVM for classification. The application and computational processes are analyzed in Section 5. Finally, conclusions and further research are addressed in Section 6.

2. Principle of rationalization

Generally, the evaluation result of a P.C. may vary greatly under different methods. Which one is the best or closest to its real value? Here, we apply the rationalization principle to obtain an almost "real" value. In our model, an expert's credibility is introduced as the weight determining how much his single evaluation accounts for in the final synthetic evaluation. Calculation of an expert's credibility is mainly based on his title, his position, and information about how much knowledge and ability he manifests with respect to the P.C., etc. [11]. The last factor accounts for a large part of an expert's credibility and indirectly reflects the interaction effect between an expert and a P.C. Let $s$ denote the number of experts who participate in evaluating a P.C. Denote by $w_i$ (where $i = 1, 2, \ldots, s$) the credibility of the $i$th expert and by $w = (w_1, w_2, \ldots, w_s)$ the credibility vector of all experts. Given a P.C. with $n$ attributes, denote by $u_{ij}$ (where $i = 1, 2, \ldots, s$; $j = 1, 2, \ldots, n$) the value of the $j$th attribute of the P.C. evaluated by the $i$th expert. The resulting $s \cdot n$ values form an $s \times n$ matrix denoted by $U = (u_{ij})_{s \times n}$. Denote by $\bar{u}_j$ the final synthetic value of the $j$th attribute and by $\bar{u} = [\bar{u}_1, \bar{u}_2, \ldots, \bar{u}_n]$ the final synthetic evaluation of the P.C. According to the principle of rationalization, we can get a much more objective value $\bar{u}$:

$$\bar{u} = [\bar{u}_1, \bar{u}_2, \ldots, \bar{u}_n] = w \cdot U = w \cdot (u_{ij})_{s \times n} \qquad (1)$$
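To make the synthesis concrete, the following minimal sketch (not from the paper) computes equation (1) with NumPy; the number of experts, the credibility weights and the score matrix are hypothetical illustrative values.

```python
import numpy as np

# Hypothetical credibility weights for s = 3 experts (normalized to sum to 1).
w = np.array([0.5, 0.3, 0.2])

# Hypothetical score matrix U (s x n): each row holds one expert's scores
# for the n = 4 attributes of a project candidate.
U = np.array([
    [8.0, 7.5, 9.0, 6.0],
    [7.0, 8.0, 8.5, 7.0],
    [9.0, 6.5, 8.0, 6.5],
])

# Equation (1): synthetic evaluation u_bar = w . U, one value per attribute.
u_bar = w @ U
print(u_bar)  # [7.9  7.45 8.65 6.4]
```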

3. Explanation based learning for feature space dimensionality reduction

Generally, all available attributes can serve as the inputs of SVM, but irrelevant or weak features can deteriorate the generalization accuracy. To develop a good SVM forecaster, the first important step is feature selection. Within the framework of SVM, several approaches to feature selection are available. In [4], Bradley and Mangasarian find that SVM with a 1-norm regularized term is an indirect approach to feature selection. Weston et al. [5] propose a gradient descent method for SVM feature selection. In [6], F. E. H. Tay et al. apply saliency analysis and a genetic algorithm for selecting important features. However, these approaches are complex or computationally infeasible, and they neglect domain theories altogether. In this paper, we propose explanation based learning for feature selection, which is conceptually simple and computationally feasible. First, domain theories are expressed as a set of previously learned fuzzy neural networks. Then, for each training example, the training derivatives are extracted from the domain theories. Finally, according to the training derivatives, we dynamically prune the neural network "input" units to obtain an optimal subset of features.

3.1. Expression of domain theories

A standard structured neural network (NN) with two layers of sigmoid units (one hidden layer and one output layer) is constructed to learn the domain theories concealed in the evaluation data of P.C.s. The input to the fuzzy neural network is the $n$ attribute values obtained from the synthetic evaluation with the rationalization principle. The learning task here involves classifying P.C.s into "Yes" and "No".
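The paper gives no implementation details for the fuzzy neural network, so the following sketch uses a plain $n \times h \times 1$ sigmoid network trained by backpropagation as a stand-in for the domain theory; the class name, the squared-error objective and all parameter values are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TwoLayerNet:
    """Sketch of an n x h x 1 network with sigmoid hidden and output units."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(n_hidden, n_inputs))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=n_hidden)
        self.b2 = 0.0

    def forward(self, x):
        h = sigmoid(self.W1 @ x + self.b1)   # hidden activations
        y = sigmoid(self.W2 @ h + self.b2)   # output: probability of "Yes"
        return y, h

    def train_step(self, x, t, lr=0.1):
        """One stochastic gradient step on squared error (t = 1 for Yes, 0 for No)."""
        y, h = self.forward(x)
        # Output and hidden "delta" terms, as in standard backpropagation.
        delta_out = (y - t) * y * (1.0 - y)
        delta_hid = delta_out * self.W2 * h * (1.0 - h)
        self.W2 -= lr * delta_out * h
        self.b2 -= lr * delta_out
        self.W1 -= lr * np.outer(delta_hid, x)
        self.b1 -= lr * delta_hid
```

A network of this form, once trained on the rationalized evaluation data, plays the role of the domain theory from which the training derivatives of Section 3.3 are extracted.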

3.3. Extraction of training derivatives

Consider a learning task involving an instance space $X$ and target function $f$. Assume that each training example can be expressed as an ordered pair $\langle x_i, f(x_i) \rangle$, where the $j$th attribute of $x_i$ is denoted by $a_j$. Following a process very similar to calculating the $\delta$ terms in the backpropagation algorithm for an artificial neural network [8], we can calculate the partial derivative of the target function with respect to each attribute, yielding the set of derivatives

$$\left[\frac{\partial f(x)}{\partial a_1}, \frac{\partial f(x)}{\partial a_2}, \ldots, \frac{\partial f(x)}{\partial a_n}\right]\bigg|_{x=x_i} \qquad (2)$$

To make sense of these training derivatives, consider the derivative $\partial f(x)/\partial a_j\,|_{x=x_i}$. If the domain theory encodes the knowledge that the attribute $a_j$ is irrelevant to the target function, the derivative $\partial f(x)/\partial a_j\,|_{x=x_i}$ extracted from the explanation will have the value zero. A derivative of zero corresponds to the assertion that a change in the attribute $a_j$ will have no impact on the predicted value of $f(x_i)$. On the other hand, a large positive or negative derivative corresponds to the assertion that the feature is highly relevant to determining the target value. Thus, the derivatives extracted from the domain theory explanation provide important information for distinguishing relevant from irrelevant features [9].
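Assuming the TwoLayerNet sketch above, the training derivatives of equation (2) can be obtained by propagating the output sensitivity back to the inputs; the helper names below are hypothetical.

```python
import numpy as np

def input_gradients(net, x):
    """Partial derivatives of the network output f(x) with respect to each
    input attribute a_j, i.e. the vector of equation (2) evaluated at x."""
    y, h = net.forward(x)
    # Chain rule through the sigmoid output and hidden layers:
    # df/dx = y(1-y) * (W2 * h(1-h)) . W1
    d_out = y * (1.0 - y)            # derivative of the output sigmoid
    d_hid = net.W2 * h * (1.0 - h)   # back through the hidden units
    return d_out * (d_hid @ net.W1)  # shape (n,), one value per attribute

def saliency(net, X):
    """Saliency of each attribute: mean |df(x_i)/da_j| over the training set."""
    return np.mean(np.abs([input_gradients(net, x) for x in X]), axis=0)
```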

3.4. Feature selection

After the training derivatives are extracted, we dynamically modify the neural network structure by shrinking the number of network "input" units and the corresponding "hidden" units. Beginning with a previously trained network structured as $n \times h \times 1$, at each iterative step we prune the one "input" attribute unit corresponding to the least salient training derivative, until the network achieves its highest generalization accuracy. The final remaining "input" attributes are the optimal features, which guarantee that the highest accuracy is accomplished.
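A rough sketch of this pruning loop follows; it retrains the network on each candidate feature subset rather than literally removing units from a single trained network, and train_network and validation_accuracy are hypothetical helpers (saliency is the sketch from Section 3.3).

```python
import numpy as np

def select_features(X_train, y_train, X_val, y_val, n_hidden):
    """Iteratively drop the input attribute with the least salient training
    derivative, keeping the feature subset with the best validation accuracy."""
    features = list(range(X_train.shape[1]))
    best_features, best_acc = list(features), 0.0

    while len(features) > 1:
        net = train_network(X_train[:, features], y_train, n_hidden)
        acc = validation_accuracy(net, X_val[:, features], y_val)
        if acc > best_acc:
            best_acc, best_features = acc, list(features)
        # Prune the feature whose mean |df/da_j| is smallest.
        sal = saliency(net, X_train[:, features])
        features.pop(int(np.argmin(sal)))

    return best_features, best_acc
```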

4. Theory of generalized ε-SVM for classification

The training data points can be expressed as $(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)$, where $x_i \in R^m$ is the transformed input vector ($m \leq n$) and $y_i \in R$ is the target value. SVM approximates the decision function using the form $f(x) = w \cdot \phi(x) + b$, where $\phi(x)$ represents a high-dimensional feature space which is nonlinearly mapped from the input space. The coefficients $w$ and $b$ are estimated by minimizing the regularized risk function (3):

$$\min \;\; \frac{1}{2}\|w\|^2 + C\,\frac{1}{l}\sum_{i=1}^{l} L_\varepsilon\!\left(y_i, f(\phi(x_i))\right) \qquad (3)$$

The first term $\frac{1}{2}\|w\|^2$ is called the regularized term. Minimizing $\|w\|^2$ makes the function as flat as possible, thus playing the role of controlling the function capacity. The second term $\frac{1}{l}\sum_{i=1}^{l} L_\varepsilon\!\left(y_i, f(\phi(x_i))\right)$ is the empirical error measured by the $\varepsilon$-insensitive loss function (4). This loss function provides the advantage of using sparse data points to represent the designed function $f(x)$. $C$ is referred to as the regularization constant and $\varepsilon$ is the tube size of the SVM. Both parameters are determined empirically.

$$L_\varepsilon\!\left(y, f(x)\right) = \begin{cases} |y - f(x)| - \varepsilon, & |y - f(x)| \geq \varepsilon \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$

To obtain the estimates of $w$ and $b$, Eq. (3) is transformed into the primal objective function (5) by introducing the positive slack variables $\xi_i$ and $\xi_i^*$ for the two cases $f(x_i) - y_i > \varepsilon$ and $y_i - f(x_i) > \varepsilon$ respectively:

$$\min \;\; \frac{1}{2}\|w\|^2 + C\,\frac{1}{l}\sum_{i=1}^{l}(\xi_i + \xi_i^*)$$
$$\text{subject to} \quad (w \cdot \phi(x_i)) + b - y_i \leq \varepsilon + \xi_i, \quad y_i - (w \cdot \phi(x_i)) - b \leq \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \geq 0, \quad i = 1, \ldots, l \qquad (5)$$

Such an $\varepsilon$-tube, however, rests on the assumption that the $\varepsilon$-insensitive zone has a tube shape. Indeed, $\varepsilon$ does not need to be a constant; it can be any non-negative variable, that is, the $\varepsilon$-tube in parametric models may be of arbitrary shape. We call an SVM with this kind of $\varepsilon$-tube a generalized $\varepsilon$-SVM. This new parametric insensitivity model is especially useful in situations where the noise depends on $x$ (so-called heteroscedastic noise). The evaluation of P.C.s in intelligent project management studied in this paper is just such a case. Let $\{\zeta_q(x)\}$ and $\{\zeta_q^*(x)\}$ (here and below, $q = 1, 2, \ldots, p$ is understood) be a set of $2p$ positive functions on the input space. Then the quadratic program (5) can be replaced by one in which, for given $u^{(1)}, u^{(2)}, \ldots, u^{(p)} \geq 0$, the fixed tube size $\varepsilon$ is expressed in terms of these parametric insensitivity functions.
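As a point of reference only, the sketch below fits a standard ε-SVR with a fixed tube using scikit-learn; it does not implement the generalized ε-tube of arbitrary shape described above, and the data and parameter values (C, ε, kernel) are hypothetical.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical training data: rows hold the selected attribute values of
# project candidates, y is the target evaluation score.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(100, 4))
y = X @ np.array([0.4, 0.2, 0.3, 0.1]) + rng.normal(scale=0.2, size=100)

# Standard epsilon-SVR: C is the regularization constant of (3),
# epsilon is the (fixed) tube size of the insensitive loss (4).
model = SVR(kernel="rbf", C=10.0, epsilon=0.5)
model.fit(X, y)

print("support vectors:", model.support_.size)
print("prediction:", model.predict(X[:1]))
```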
