Support Vector Machines

Saurabh Joshi (03305R02), Nitin Agrawal (03305019), and Vaibhav Gupta (03305903)
(sbjoshi, nitina, [email protected])

Guide: Prof. P. Bhattacharya

Oct. 2004


Motivation

Problems with neural-network training that motivate SVMs:

- Getting stuck at local minima
- Considering only the training error may result in overfitting
- The time required for training
- Finding the optimal number of hidden units


Perceptron Training Algorithm

Given a training set S = {(x_i, y_i)}, i = 1, …, l, with y_i ∈ {−1, +1}:

Let w_0 ← 0; b_0 ← 0; k ← 0; R ← max_i ||x_i||
Repeat
  for i = 1 to l
    if y_i(⟨w_k · x_i⟩ + b_k) ≤ 0 then
      w_{k+1} ← w_k + η y_i x_i
      b_{k+1} ← b_k + η y_i R²
      k ← k + 1
Until some stopping criterion is met (e.g. no mistakes in a full pass)

where η > 0 is the learning rate.

So the final weight vector is w = Σ_i α_i y_i x_i, where α_i is proportional to the number of mistakes made on point x_i.
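The update rule above can be sketched in Python (a minimal illustration, not the deck's own code; the toy data, η = 1, and the stopping check are assumptions):

```python
import numpy as np

def perceptron_train(X, y, eta=1.0, max_epochs=100):
    """Primal perceptron: on a mistake, w <- w + eta*y_i*x_i, b <- b + eta*y_i*R^2."""
    w = np.zeros(X.shape[1])
    b = 0.0
    R2 = max(np.dot(x, x) for x in X)   # R = max_i ||x_i||, so R2 = R^2
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:   # misclassified (or on the boundary)
                w += eta * yi * xi
                b += eta * yi * R2
                mistakes += 1
        if mistakes == 0:   # stopping criterion: a full pass with no mistakes
            break
    return w, b

# Linearly separable toy data: class +1 in the upper-right, -1 in the lower-left
X = np.array([[2.0, 2.0], [1.5, 1.0], [-1.0, -1.0], [-0.5, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = perceptron_train(X, y)
```

Because the data is linearly separable, the perceptron convergence theorem guarantees the loop terminates with every point classified correctly.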




Dual Representation

The decision function can be written as

  f(x) = sgn(⟨w · x⟩ + b) = sgn( Σ_{j=1}^l α_j y_j ⟨x_j · x⟩ + b )

The update rule can now be written as: if

  y_i ( Σ_{j=1}^l α_j y_j ⟨x_j · x_i⟩ + b ) ≤ 0

then α_i ← α_i + 1.

α_i represents the amount of information provided by point x_i.
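A sketch of this dual form in Python (the toy data is an assumption); the Gram matrix G holds the inner products ⟨x_j · x_i⟩, so each update touches only α_i:

```python
import numpy as np

def dual_perceptron_train(X, y, max_epochs=100):
    """Dual perceptron: alpha_i counts mistakes on x_i; w = sum_i alpha_i y_i x_i."""
    l = len(X)
    G = X @ X.T                      # Gram matrix: G[j, i] = <x_j . x_i>
    alpha = np.zeros(l)
    b = 0.0
    R2 = max(np.dot(x, x) for x in X)
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(l):
            if y[i] * (np.sum(alpha * y * G[:, i]) + b) <= 0:
                alpha[i] += 1.0      # dual update rule: alpha_i <- alpha_i + 1
                b += y[i] * R2
                mistakes += 1
        if mistakes == 0:
            break
    return alpha, b

X = np.array([[2.0, 2.0], [1.5, 1.0], [-1.0, -1.0], [-0.5, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha, b = dual_perceptron_train(X, y)
w = (alpha * y) @ X                  # recover the primal weights from the alphas
```

The recovered w is exactly Σ_i α_i y_i x_i, matching the primal algorithm run with η = 1.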


Margin

[Figure: the same data set separated by (a) a small-margin hyperplane and (b) a large-margin hyperplane.]

Better generalization is expected in case of fig (b).


Maximum Margin Classifier

The equation of a hyperplane is

  ⟨w · x⟩ + b = 0

The geometric distance of a point x_i from the hyperplane is

  y_i (⟨w · x_i⟩ + b) / ||w||

If for the closest point y_i(⟨w · x_i⟩ + b) = 1, then the geometric margin is 1/||w||.

Minimizing ||w|| would therefore increase the geometric margin.


Maximum Margin Classifier (Contd.)

Quadratic Programming formulation:

  Minimize  (1/2) ⟨w · w⟩
  subject to  y_i(⟨w · x_i⟩ + b) ≥ 1  for i = 1, …, l

- The optimal separating hyperplane is the one with the maximum margin
- There is a unique optimal hyperplane
- The greater the margin, the better the generalization

Primal and Dual Lagrangian

The primal Lagrangian is

  L(w, b, α) = (1/2) ⟨w · w⟩ − Σ_{i=1}^l α_i [ y_i(⟨w · x_i⟩ + b) − 1 ],  with α_i ≥ 0

At the optimum ∂L/∂w = 0 and ∂L/∂b = 0, therefore

  w = Σ_{i=1}^l α_i y_i x_i  and  Σ_{i=1}^l α_i y_i = 0

Dual Problem:

  Maximize  W(α) = Σ_{i=1}^l α_i − (1/2) Σ_{i=1}^l Σ_{j=1}^l α_i α_j y_i y_j ⟨x_i · x_j⟩
  subject to  α_i ≥ 0  and  Σ_{i=1}^l α_i y_i = 0
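A tiny worked instance of the dual (hypothetical two-point data chosen so the algebra is easy): the equality constraint forces α_1 = α_2 = a, the dual objective reduces to W(a) = 2a − 2a², and its maximizer a = 1/2 recovers w = (1, 0):

```python
import numpy as np

# Two points, one per class: x1 = (1, 0) with y = +1, x2 = (-1, 0) with y = -1.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])

def W(alpha):
    """Dual objective: sum_i alpha_i - (1/2) sum_ij alpha_i alpha_j y_i y_j <x_i.x_j>."""
    return alpha.sum() - 0.5 * np.sum(np.outer(alpha * y, alpha * y) * (X @ X.T))

# The constraint sum_i alpha_i y_i = 0 forces alpha_1 = alpha_2 = a,
# so we can maximize over the single scalar a by a grid search.
best_a, best_W = max(((a, W(np.array([a, a]))) for a in np.linspace(0.0, 1.0, 101)),
                     key=lambda t: t[1])

alpha = np.array([best_a, best_a])
w = (alpha * y) @ X   # w = sum_i alpha_i y_i x_i from the optimality condition
```

The resulting hyperplane ⟨w · x⟩ = 0 is the x2-axis, with both points at geometric distance 1/||w|| = 1, as expected.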

Support Vectors

At the solution, the Karush–Kuhn–Tucker conditions give

  α_i [ y_i(⟨w · x_i⟩ + b) − 1 ] = 0  for all i

so α_i > 0 only when y_i(⟨w · x_i⟩ + b) = 1.

- Points with α_i > 0 are called support vectors
- Support vectors are the only points necessary for the decision function
- Support vectors in a sense "support" the decision surface

The decision function is now

  f(x) = sgn( Σ_{i ∈ SV} α_i y_i ⟨x_i · x⟩ + b )

The bias can be calculated as

  b = y_s − Σ_{i ∈ SV} α_i y_i ⟨x_i · x_s⟩  for any support vector x_s
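Continuing the same hypothetical two-point example (both points are support vectors with α = 1/2), the bias formula and the support-vector decision function can be checked numerically:

```python
import numpy as np

# Toy solution: x1 = (1, 0), y = +1; x2 = (-1, 0), y = -1; optimal alpha = (1/2, 1/2).
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])

s = 0                                        # index of any support vector
b = y[s] - np.sum(alpha * y * (X @ X[s]))    # b = y_s - sum_i alpha_i y_i <x_i . x_s>

def f(x):
    """Decision function built from the support vectors only."""
    return np.sign(np.sum(alpha * y * (X @ x)) + b)
```

Here b comes out to 0 and f classifies by the sign of the first coordinate, matching the hyperplane found in the dual example.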






Soft Margin Classifier

Non-linearly separable data, linear decision surface:

  Minimize  (1/2) ⟨w · w⟩ + C Σ_{i=1}^l ξ_i
  subject to  y_i(⟨w · x_i⟩ + b) ≥ 1 − ξ_i  and  ξ_i ≥ 0,  for i = 1, …, l

- ξ_i indicates the amount of constraint violation by point x_i
- C indicates how much we penalize violation of the constraints
- C controls the trade-off between generalization error and training error
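For a fixed (w, b) the smallest feasible slack is ξ_i = max(0, 1 − y_i(⟨w · x_i⟩ + b)), which gives a quick way to evaluate the soft-margin objective (the w, b, and data below are illustrative assumptions):

```python
import numpy as np

# Candidate separator (not optimized), plus data that is not linearly separable by it.
w = np.array([1.0, 0.0])
b = 0.0
X = np.array([[2.0, 0.0], [0.5, 1.0], [-0.3, 0.0], [-2.0, 1.0]])
y = np.array([1.0, 1.0, 1.0, -1.0])

# Optimal slack for fixed (w, b): xi_i = max(0, 1 - y_i(<w.x_i> + b)).
# Points at or beyond the margin get xi_i = 0; a misclassified point gets xi_i > 1.
xi = np.maximum(0.0, 1.0 - y * (X @ w + b))

C = 1.0
objective = 0.5 * np.dot(w, w) + C * xi.sum()   # soft-margin primal objective
```

The third point, at (-0.3, 0) with label +1, lies on the wrong side of the hyperplane, so its slack exceeds 1.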


Soft Margin Classifier (Contd.)

[Figure: a soft-margin separator showing the weight vector w, the margin, and the slack ξ of a point violating the margin constraint.]


Soft Margin Classifier (Contd.)

The corresponding dual is:

  Maximize  W(α) = Σ_{i=1}^l α_i − (1/2) Σ_{i=1}^l Σ_{j=1}^l α_i α_j y_i y_j ⟨x_i · x_j⟩
  subject to  0 ≤ α_i ≤ C  and  Σ_{i=1}^l α_i y_i = 0

At the optimum we have

  ξ_i (α_i − C) = 0

The decision function is the same as in the linearly separable case, with the additional upper-bound constraint α_i ≤ C.


Non-linear Decision Surfaces

- Map the input space to a high-dimensional feature space
- Introduce a linear decision surface in the high-dimensional space

Mapping:

  φ : X → F,  x ↦ φ(x)


XOR example

[Figure: the XOR points plotted in the mapped space (x1, x2, x3) with x3 = x1 x2, where a plane separates the two classes.]

Map (x1, x2) ↦ (x1, x2, x1 x2), i.e. introduce the extra coordinate x3 = x1 x2.

In the input space the separating curve x1 x2 = c is a hyperbola; under this mapping the equation of a hyperbola is changed to the equation of a plane, x3 = c.
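The mapping can be checked directly: XOR is not linearly separable in two dimensions, but after adding x3 = x1·x2 a plane (chosen by hand here for illustration, not computed by an SVM) separates the classes:

```python
import numpy as np

# XOR-style labels: +1 when x1 == x2, -1 otherwise (not linearly separable in 2-D).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([1.0, -1.0, -1.0, 1.0])

def phi(x):
    """Map (x1, x2) -> (x1, x2, x1*x2)."""
    return np.array([x[0], x[1], x[0] * x[1]])

# A separating plane in the mapped space, picked by hand:
# score = -x1 - x2 + 2*x1*x2 + 1/2 is positive exactly on the +1 points.
w = np.array([-1.0, -1.0, 2.0])
b = 0.5
pred = np.sign(np.array([phi(x) for x in X]) @ w + b)
```

All four mapped points land on the correct side of the plane, which is impossible for any line in the original two dimensions.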


Kernel

- For solving the dual problem we only need to calculate dot products of vectors in the feature space
- A kernel function does this implicitly:  K(x, z) = ⟨φ(x) · φ(z)⟩
- It is insensitive to the dimension of the feature space

For example, for x, z ∈ R²:

  K(x, z) = ⟨x · z⟩² = (x1 z1 + x2 z2)²
          = x1² z1² + 2 x1 z1 x2 z2 + x2² z2²
          = ⟨φ(x) · φ(z)⟩  for  φ(x) = (x1², √2 x1 x2, x2²)

In general, K(x, z) = (⟨x · z⟩ + 1)^d can be used as a polynomial kernel function.
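The kernel identity above can be verified numerically (the test vectors are arbitrary choices):

```python
import numpy as np

def K(x, z):
    """Degree-2 polynomial kernel with no constant term: K(x, z) = <x.z>^2."""
    return np.dot(x, z) ** 2

def phi(x):
    """Explicit feature map realizing K: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

# Arbitrary test vectors in R^2
x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
```

The kernel evaluates the 3-dimensional inner product ⟨φ(x) · φ(z)⟩ without ever forming φ(x) or φ(z) explicitly.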










Several powerful kernel functions exist:

- Gaussian RBF kernel:  K(x, z) = exp( −||x − z||² / (2σ²) )
- Two-layer perceptron:  K(x, z) = tanh( κ ⟨x · z⟩ + θ )

The choice of kernel is made by the user. The decision function is given by

  f(x) = sgn( Σ_{i=1}^l α_i y_i K(x_i, x) + b )


Some snapshots

[Screenshots from the SVM demonstration applet [1], omitted.]

Training Algorithm

Training algorithm for fixed bias b and fixed learning rates η_i:

Given training set S = {(x_i, y_i)}, i = 1, …, l
α ← 0
Repeat
  for i = 1 to l
    α_i ← α_i + η_i ( 1 − y_i Σ_{j=1}^l α_j y_j K(x_j, x_i) )
    if α_i < 0 then α_i ← 0
    else if α_i > C then α_i ← C
  end for
Until some stopping criterion satisfied
return α
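A sketch of this fixed-bias algorithm in Python with a Gaussian RBF kernel; the data, C, η, and epoch count are illustrative assumptions, and the bias is held fixed at b = 0 as in the algorithm above:

```python
import numpy as np

def rbf(x, z, sigma=1.0):
    """Gaussian RBF kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    d = x - z
    return np.exp(-np.dot(d, d) / (2.0 * sigma ** 2))

def train_fixed_bias(X, y, C=10.0, eta=0.2, epochs=300, kernel=rbf):
    """Fixed-bias training: sweep the alphas, clipping each to [0, C]."""
    l = len(X)
    G = np.array([[kernel(X[i], X[j]) for j in range(l)] for i in range(l)])
    alpha = np.zeros(l)
    for _ in range(epochs):
        for i in range(l):
            alpha[i] += eta * (1.0 - y[i] * np.sum(alpha * y * G[:, i]))
            alpha[i] = min(max(alpha[i], 0.0), C)   # enforce 0 <= alpha_i <= C
    return alpha

def predict(x, X, y, alpha, kernel=rbf):
    """Decision function with fixed bias b = 0."""
    return np.sign(sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X)))

# Tiny XOR-style problem that an RBF kernel separates easily
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha = train_fixed_bias(X, y)
```

The per-point update is a small gradient step on the dual objective, so with a small η the alphas settle toward values where y_i f(x_i) ≈ 1 for the active points.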




















More about SVMs

- For large data sets, Chunking and Sequential Minimal Optimization (SMO) are used to reduce memory requirements
- In Chunking, apply the SVM training algorithm to a subset of the data and discard all but the support vectors; add points violating the constraints to the support vectors to form a new chunk, and iterate until all points are considered
- In SMO, modify two Lagrange multipliers α_i at a time to increase the value of the objective function without violating the constraints


Multiclass SVMs

- Approaches include One-against-All, One-against-One, etc.
- One-against-All: for k classes, use k SVMs, one corresponding to each class
- One-against-One: use k(k−1)/2 SVMs, one corresponding to each pair of classes
- SVMs can be used for regression also
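A One-against-All sketch; for brevity a perceptron stands in for the per-class binary SVM trainer, and the data and hyperparameters are illustrative assumptions:

```python
import numpy as np

def train_linear(X, y, eta=0.5, epochs=200):
    """Stand-in binary trainer (a perceptron); a real system would train an SVM here."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:
                w, b = w + eta * yi * xi, b + eta * yi
    return w, b

def one_vs_all_train(X, labels):
    """Train one binary classifier per class: class c vs. the rest."""
    return {c: train_linear(X, np.where(labels == c, 1.0, -1.0))
            for c in np.unique(labels)}

def one_vs_all_predict(x, models):
    # Pick the class whose classifier gives the largest score.
    return max(models, key=lambda c: np.dot(models[c][0], x) + models[c][1])

# Three well-separated clusters, labels 0/1/2
X = np.array([[0.0, 5.0], [0.5, 4.5], [5.0, 0.0], [4.5, 0.5],
              [-5.0, -5.0], [-4.5, -4.4]])
labels = np.array([0, 0, 1, 1, 2, 2])
models = one_vs_all_train(X, labels)
```

With separable clusters, each point scores positive only under its own class's classifier, so the argmax recovers the correct label.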


Applications of SVMs

- Text Categorization
- Intrusion Detection
- Image Recognition
- Bioinformatics
- Handwritten Character Recognition


Applications of SVMs (contd.)

For example, in Text Categorization each dimension of the feature space is represented by a stem word. "Compute" is the stem word of "computers", "computation", etc. Stop words like "and", "or", "the", "of" are not considered. So a document is represented as a point depending on which stem words are present and in what numbers. The order of words is insignificant in this representation.
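A toy bag-of-words sketch of this representation; the stemming table is a hypothetical stand-in for a real stemmer:

```python
from collections import Counter

STOP_WORDS = {"and", "or", "the", "of", "a", "to"}

# Hypothetical stem table; a real system would use an actual stemming algorithm.
STEMS = {"computers": "compute", "computation": "compute", "computing": "compute"}

def bag_of_words(text):
    """Map a document to stem-word counts, ignoring stop words and word order."""
    counts = Counter()
    for word in text.lower().split():
        if word in STOP_WORDS:
            continue                        # stop words are not features
        counts[STEMS.get(word, word)] += 1  # collapse word forms to their stem
    return dict(counts)

doc = "the computation and computing of the answer"
```

The resulting count vector ignores word order entirely, which is exactly the property the slide describes.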


Real Life Example

US Postal Service: Digit Recognition Problem

- Training data: 7291 examples
- Test data: 2007 examples
- Input space dimensionality: 256 (16 × 16 pixel images)


Classifier                       Raw Error
Human performance                2.5%
Decision tree                    16.2%
Best two-layer neural network    5.9%
5-layer neural network           5.1%

SVM with Kernel    Number of SVs    Raw Error
Polynomial         274              4.0%
RBF                291              4.1%
Neural Network     254              4.2%


Conclusion

- SVM performance is sensitive to the choice of kernel
- The global optimum can be achieved
- Training takes polynomial time
- SVMs are a potential alternative to neural networks, having performed better in many classification tasks
- We expect SVMs to extend their scope of application to more diverse fields


References

[1] http://cortex.informatik.tu-ilmenau.de/ koenig/monist/applets/html/AppletSVM.
[2] Christopher J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121–167, 1998.
[3] Nello Cristianini and John Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000.
[4] Edgar E. Osuna, Robert Freund, and Federico Girosi. Support vector machines: Training and applications. Technical report, MIT AI Lab, 1997.
[5] Simon Haykin. Neural Networks. Pearson Education, 2003.
[6] Vladimir N. Vapnik. Statistical Learning Theory. A Wiley-Interscience Publication, 1998.


Proposal for the project

- Porting some non-trivial application to an SVM tool and analyzing it, OR
- Comparing neural networks and SVMs using tools like SNNS and SVMLight.


Q&A
