Support Vector Machines
Saurabh Joshi (03305R02), Nitin Agrawal (03305019) and Vaibhav Gupta (03305903)
(sbjoshi,nitina,[email protected])
Guide : Prof. P Bhattacharya
Oct. 2004
Motivation
– Getting stuck at local minima is a problem
– Considering only training error may result in overfitting
– Time required for training
– Finding the optimal number of hidden units is a problem
Perceptron Training Algorithm
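Given training set $\{(x_i, y_i)\}_{i=1}^{l}$ with $y_i \in \{-1, +1\}$ and learning rate $\eta > 0$
Let $w \leftarrow 0$, $b \leftarrow 0$
Repeat
  for $i = 1$ to $l$
    if $y_i (w \cdot x_i + b) \le 0$
    then $w \leftarrow w + \eta y_i x_i$, $b \leftarrow b + \eta y_i R^2$
Until some stopping criterion met
where $R = \max_i \|x_i\|$
So final $w = \sum_{i=1}^{l} \alpha_i y_i x_i$, with $\alpha_i \ge 0$ proportional to the number of times $x_i$ was misclassified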
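The loop on this slide can be sketched in a few lines of Python. This is a minimal sketch under assumptions that are not in the slides: the bias update uses $R^2 = \max_i \|x_i\|^2$, the stopping criterion is one full pass without mistakes, and the toy data set (logical AND) and the names `dot` and `train_perceptron` are chosen only for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def train_perceptron(points, labels, eta=1.0, max_epochs=1000):
    """Primal perceptron; returns the weight vector and the bias."""
    w = [0.0] * len(points[0])
    b = 0.0
    r_squared = max(dot(x, x) for x in points)  # R^2 = max_i ||x_i||^2
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            if y * (dot(w, x) + b) <= 0:  # misclassified, or on the boundary
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y * r_squared
                mistakes += 1
        if mistakes == 0:  # assumed stopping criterion: one clean pass
            break
    return w, b


# Toy data (assumption): logical AND with labels in {-1, +1}.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
labels = [-1, -1, -1, 1]
w, b = train_perceptron(points, labels)
predictions = [1 if dot(w, x) + b > 0 else -1 for x in points]
```

Because the toy set is linearly separable, the loop stops after finitely many updates and `predictions` agrees with `labels`.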
Dual Representation
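Decision function can be written as
$f(x) = \mathrm{sgn}(w \cdot x + b) = \mathrm{sgn}\left(\sum_{j=1}^{l} \alpha_j y_j (x_j \cdot x) + b\right)$
Update rule can now be written as
if $y_i \left(\sum_{j=1}^{l} \alpha_j y_j (x_j \cdot x_i) + b\right) \le 0$ then $\alpha_i \leftarrow \alpha_i + \eta$ (with learning rate $\eta$)
$\alpha_i$ represents the amount of information provided by point $x_i$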
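The dual form touches the data only through inner products, which a short sketch makes visible. The code below is an illustration under assumptions: the learning rate is fixed to 1, the bias is updated with $R^2 = \max_i \|x_i\|^2$, and the toy data and the names `gram` and `train_dual_perceptron` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def train_dual_perceptron(points, labels, max_epochs=1000):
    """Dual perceptron: learns one coefficient alpha_i per training point."""
    n = len(points)
    # All the algorithm needs from the data is the matrix of inner products.
    gram = [[dot(xi, xj) for xj in points] for xi in points]
    r_squared = max(gram[i][i] for i in range(n))
    alpha = [0.0] * n
    b = 0.0
    for _ in range(max_epochs):
        updated = False
        for i in range(n):
            activation = sum(alpha[j] * labels[j] * gram[j][i] for j in range(n)) + b
            if labels[i] * activation <= 0:
                alpha[i] += 1.0  # point i was misclassified once more
                b += labels[i] * r_squared
                updated = True
        if not updated:
            break
    return alpha, b


# Toy data (assumption): logical AND with labels in {-1, +1}.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
labels = [-1, -1, -1, 1]
alpha, b = train_dual_perceptron(points, labels)

# Recover w = sum_i alpha_i * y_i * x_i and classify the training points.
w = [sum(alpha[i] * labels[i] * points[i][d] for i in range(len(points))) for d in range(2)]
predictions = [1 if dot(w, x) + b > 0 else -1 for x in points]
```

Points that were never misclassified keep $\alpha_i = 0$ and contribute nothing to the decision function.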
Margin
[Figure: panels (a) and (b)]
Better generalization is expected in case of fig (b).
Maximum Margin Classifier
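Equation of a hyperplane is $w \cdot x + b = 0$
Geometric distance of a point $x$ from the hyperplane is $\frac{|w \cdot x + b|}{\|w\|}$
If $|w \cdot x + b| = 1$ for the closest point, then margin $= \frac{1}{\|w\|}$
Minimizing $\|w\|$ would increase the geometric margin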
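A small numerical check of these statements, as a sketch: the two training points, the hyperplane parameters and the function name `geometric_margin` are assumptions chosen for illustration. Rescaling $(w, b)$ changes the functional margin but not the geometric one, which is why the closest point can be normalized to $|w \cdot x + b| = 1$.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def geometric_margin(w, b, points, labels):
    """Smallest signed distance y_i (w . x_i + b) / ||w|| over the data."""
    norm = math.sqrt(dot(w, w))
    return min(y * (dot(w, x) + b) / norm for x, y in zip(points, labels))


# Toy data (assumption): one positive and one negative point.
points = [(1.0, 1.0), (-1.0, -1.0)]
labels = [1, -1]

# Canonical scaling: y_i (w . x_i + b) = 1 for the closest points.
margin = geometric_margin((0.5, 0.5), 0.0, points, labels)

# Same hyperplane, scaled by 10: the geometric margin is unchanged.
scaled_margin = geometric_margin((5.0, 5.0), 0.0, points, labels)

one_over_norm = 1.0 / math.sqrt(dot((0.5, 0.5), (0.5, 0.5)))
```

Here `margin` equals `one_over_norm`, that is $1/\|w\| = \sqrt{2}$, up to rounding.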
Maximum Margin Classifier (Contd.)
Quadratic Programming
Minimize $\frac{1}{2}\|w\|^2$
subject to $y_i (w \cdot x_i + b) \ge 1$ for $i = 1, \dots, l$
– The optimal separating hyperplane is the one with the maximum margin
– There is a unique optimal hyperplane
– The greater the margin, the better the expected generalization
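Primal and Dual Lagrangian
$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{l} \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right]$, where $\alpha_i \ge 0$
$\frac{\partial L}{\partial w} = 0$ and $\frac{\partial L}{\partial b} = 0$ at optimum, therefore $w = \sum_{i=1}^{l} \alpha_i y_i x_i$ and $\sum_{i=1}^{l} \alpha_i y_i = 0$
Dual Problem
Maximize $W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$
Subject to $\sum_{i=1}^{l} \alpha_i y_i = 0$, $\alpha_i \ge 0$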
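The step from the primal Lagrangian to the dual objective can be written out as follows. This is a sketch of the standard substitution for the hard-margin problem, assuming the Lagrangian $L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i [y_i (w \cdot x_i + b) - 1]$ and the stationarity conditions $w = \sum_i \alpha_i y_i x_i$ and $\sum_i \alpha_i y_i = 0$.

```latex
\begin{align*}
L(w, b, \alpha)
  &= \frac{1}{2}\, w \cdot w
     - \sum_{i=1}^{l} \alpha_i y_i \,(w \cdot x_i)
     - b \sum_{i=1}^{l} \alpha_i y_i
     + \sum_{i=1}^{l} \alpha_i \\
  &= \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \,(x_i \cdot x_j)
     - \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \,(x_i \cdot x_j)
     - b \cdot 0
     + \sum_{i=1}^{l} \alpha_i \\
  &= \sum_{i=1}^{l} \alpha_i
     - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \,(x_i \cdot x_j)
   \;=\; W(\alpha)
\end{align*}
```

The result depends on the training points only through the inner products $x_i \cdot x_j$.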
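$\alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right] = 0$ at solution
so $\alpha_i > 0$ only when $y_i (w \cdot x_i + b) = 1$
points with $\alpha_i > 0$ are called support vectors
– Support vectors are the only points necessary for the decision function
– Support vectors, in a sense, support the decision surface
Decision function is now $f(x) = \mathrm{sgn}\left(\sum_{i \in SV} \alpha_i y_i (x_i \cdot x) + b\right)$
Bias can be calculated as $b = y_k - w \cdot x_k$ for any support vector $x_k$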
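To make the role of the support vectors concrete, the sketch below recovers $w$ and $b$ from a set of multipliers. The three training points and the values in `alpha` are assumptions: they are the dual solution of this particular toy problem, worked out by hand, not the output of any solver.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


# Toy data (assumption). The third point lies well outside the margin.
points = [(1.0, 1.0), (-1.0, -1.0), (3.0, 3.0)]
labels = [1, -1, 1]
alpha = [0.25, 0.25, 0.0]  # assumed dual solution for this toy set

# Support vectors are exactly the points with alpha_i > 0.
support = [i for i, a in enumerate(alpha) if a > 0]

# w = sum_i alpha_i y_i x_i; only support vectors contribute.
w = [sum(alpha[i] * labels[i] * points[i][d] for i in support) for d in range(2)]

# b = y_k - w . x_k for any support vector x_k.
k = support[0]
b = labels[k] - dot(w, points[k])

# Every training point satisfies y_i (w . x_i + b) >= 1,
# with equality exactly at the support vectors.
margins = [y * (dot(w, x) + b) for x, y in zip(points, labels)]
```

Dropping the third point from the training set would leave `w` and `b` unchanged, since its multiplier is zero.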
Soft Margin Classifier
Non-linearly separable data - Linear Decision Surface
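Minimize $\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i$
Subject to $y_i (w \cdot x_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$ for $i = 1, \dots, l$
– $\xi_i$ indicates the amount of constraint violation by point $x_i$
– $C$ indicates how much we penalize violation of the constraints
– $C$ controls the trade-off between generalization error and training error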
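The effect of the penalty can be seen by evaluating the soft-margin objective for a fixed hyperplane. This is a sketch under assumptions: the objective is taken to be $\frac{1}{2}\|w\|^2 + C \sum_i \xi_i$ with $\xi_i = \max(0, 1 - y_i (w \cdot x_i + b))$, and the data, the hyperplane and the name `soft_margin_objective` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def soft_margin_objective(w, b, points, labels, penalty):
    """Return (objective value, list of slacks) for a given hyperplane."""
    # The smallest slack that satisfies y_i (w . x_i + b) >= 1 - xi_i, xi_i >= 0.
    slacks = [max(0.0, 1.0 - y * (dot(w, x) + b)) for x, y in zip(points, labels)]
    objective = 0.5 * dot(w, w) + penalty * sum(slacks)
    return objective, slacks


# Toy data (assumption): the second point is inside the margin,
# the fourth point is on the wrong side of the hyperplane.
points = [(2.0, 0.0), (0.5, 0.0), (-1.0, 0.0), (0.5, 1.0)]
labels = [1, 1, -1, -1]
w, b = (1.0, 0.0), 0.0

objective_small_c, slacks = soft_margin_objective(w, b, points, labels, penalty=1.0)
objective_large_c, _ = soft_margin_objective(w, b, points, labels, penalty=10.0)
```

With a larger $C$ the same violations cost more, so the optimizer is pushed towards fitting the training data at the expense of a smaller margin.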
Contd.
[Figure: soft-margin separating hyperplane, with labels $w$, $\xi$ and Margin]
Contd.
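Corresponding Dual is
Maximize $W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$
Subject to $\sum_{i=1}^{l} \alpha_i y_i = 0$, $0 \le \alpha_i \le C$
at optimum point we have $\alpha_i \left[ y_i (w \cdot x_i + b) - 1 + \xi_i \right] = 0$ and $\xi_i (\alpha_i - C) = 0$
The decision function is the same as in the linearly separable case; the only change is the additional constraint $\alpha_i \le C$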
Non-linear Decision Surfaces
– Map input space to a high-dimensional feature space
– Introduce a linear decision surface in the high-dimensional space
Mapping $\phi$ from input space to feature space
XOR example
[Figure: XOR points plotted against axes $x_1$, $x_2$ and $x_3$, with tick values 0 and 1]
The equation of a hyperbola in input space is changed to the equation of a plane in feature space
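One way to see this on XOR is sketched below. The mapping $(x_1, x_2) \mapsto (x_1, x_2, x_1 x_2)$ and the plane coefficients are assumptions chosen for illustration; the mapping actually used on the slide did not survive. Under this mapping the curve $x_1 + x_2 - 2 x_1 x_2 = 0.5$, a hyperbola in input space, becomes the plane $x_1 + x_2 - 2 x_3 = 0.5$ in feature space.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def feature_map(x1, x2):
    """Assumed mapping: add the product x1 * x2 as a third coordinate x3."""
    return (x1, x2, x1 * x2)


# XOR truth table with inputs and outputs in {0, 1}.
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# A plane in feature space (hand-picked, not learned): x1 + x2 - 2*x3 = 0.5.
WEIGHTS = (1.0, 1.0, -2.0)
BIAS = -0.5


def classify(x1, x2):
    """Linear decision in feature space, non-linear in input space."""
    return 1 if dot(WEIGHTS, feature_map(x1, x2)) + BIAS > 0 else 0


results = {inputs: classify(*inputs) for inputs in XOR}
```

No line in the original two-dimensional input space separates the XOR classes, but the single plane above does so in three dimensions.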
Kernel
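– For solving the dual problem we only need to calculate dot products of vectors in feature space
– A kernel function does this implicitly
– The computation is insensitive to the dimension of the feature space
For example, for $x, z \in \mathbb{R}^2$ and $\phi(x) = (x_1^2, \sqrt{2} x_1 x_2, x_2^2)$,
$K(x, z) = \phi(x) \cdot \phi(z) = (x \cdot z)^2$
In general, $K(x, z) = (x \cdot z + c)^d$ serves as polynomial kernel function.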
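The claim that a kernel computes the feature-space dot product implicitly can be checked numerically. The sketch below assumes the degree-2 kernel $K(x, z) = (x \cdot z)^2$ on two-dimensional inputs and the explicit map $\phi(x) = (x_1^2, \sqrt{2} x_1 x_2, x_2^2)$; the function names and sample vectors are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel in 2-D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(x, z):
    """The same quantity computed in input space, without building phi."""
    return dot(x, z) ** 2


x = (1.0, 2.0)
z = (3.0, 4.0)
explicit = dot(phi(x), phi(z))  # map first, then take the dot product
implicit = poly2_kernel(x, z)   # one dot product in input space, then square
```

Both routes give the same number up to rounding, but the kernel route never constructs the feature vectors, which matters when the feature space is very high-dimensional.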
Several powerful kernel functions exist
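– Gaussian RBF kernel : $K(x, z) = \exp\left(-\frac{\|x - z\|^2}{2\sigma^2}\right)$
– Two-layer perceptron : $K(x, z) = \tanh(\kappa (x \cdot z) + \theta)$
The choice of kernel is made by the user
The decision function is given by $f(x) = \mathrm{sgn}\left(\sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b\right)$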
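A kernelized decision function can be sketched directly from these formulas. The parameter names `sigma`, `kappa` and `theta`, the two support vectors and the multiplier values are assumptions for illustration; in practice the multipliers come from training.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(x, z, sigma=1.0):
    """Gaussian RBF kernel exp(-||x - z||^2 / (2 sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def tanh_kernel(x, z, kappa=1.0, theta=0.0):
    """Two-layer perceptron (sigmoid) kernel tanh(kappa * x.z + theta)."""
    return math.tanh(kappa * dot(x, z) + theta)


def decision(x, support_vectors, labels, alpha, b, kernel):
    """Sign of sum_i alpha_i y_i K(x_i, x) + b; a zero score maps to -1 here."""
    score = sum(a * y * kernel(sv, x) for a, y, sv in zip(alpha, labels, support_vectors))
    return 1 if score + b > 0 else -1


# Hypothetical support vectors and multipliers.
support_vectors = [(1.0, 1.0), (-1.0, -1.0)]
labels = [1, -1]
alpha = [0.25, 0.25]

near_positive = decision((2.0, 2.0), support_vectors, labels, alpha, 0.0, rbf_kernel)
near_negative = decision((-2.0, -2.0), support_vectors, labels, alpha, 0.0, rbf_kernel)
```

Swapping `rbf_kernel` for `tanh_kernel` changes the shape of the decision surface without touching the rest of the code, which is the sense in which the kernel is a user choice.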
Some snapshots
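Training Algorithm
Training algorithm for fixed bias
Given training set $S = \{(x_i, y_i)\}_{i=1}^{l}$ and learning rates $\eta_i > 0$
$\alpha \leftarrow 0$
Repeat
  for i = 1 to l
    $\alpha_i \leftarrow \alpha_i + \eta_i \left(1 - y_i \sum_{j=1}^{l} \alpha_j y_j K(x_i, x_j)\right)$
    if $\alpha_i < 0$ then $\alpha_i \leftarrow 0$
    else if $\alpha_i > C$ then $\alpha_i \leftarrow C$
  end for
Until some stopping criterion satisfied
return $\alpha$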
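The fixed-bias update can be sketched as a short gradient-ascent loop on the dual. Assumptions in this sketch, none of which come from the slides: the bias is fixed at 0, the learning rates are set to $\eta_i = 1 / K(x_i, x_i)$, the stopping criterion is a fixed number of passes, and the kernel is a Gaussian RBF applied to XOR data with labels in $\{-1, +1\}$.

```python
import math


def rbf_kernel(x, z, sigma=1.0):
    """Gaussian RBF kernel exp(-||x - z||^2 / (2 sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def train_fixed_bias(points, labels, kernel, upper_bound=100.0, epochs=200):
    """Coordinate-wise gradient ascent on the dual, with alpha clipped to [0, C]."""
    n = len(points)
    gram = [[kernel(xi, xj) for xj in points] for xi in points]
    eta = [1.0 / gram[i][i] for i in range(n)]  # assumed learning rates
    alpha = [0.0] * n
    for _ in range(epochs):  # assumed stopping criterion: fixed number of passes
        for i in range(n):
            margin = labels[i] * sum(alpha[j] * labels[j] * gram[i][j] for j in range(n))
            alpha[i] += eta[i] * (1.0 - margin)
            alpha[i] = min(max(alpha[i], 0.0), upper_bound)  # keep 0 <= alpha_i <= C
    return alpha


def predict(x, points, labels, alpha, kernel):
    """Decision function with the bias fixed at 0."""
    score = sum(a * y * kernel(xi, x) for a, y, xi in zip(alpha, labels, points))
    return 1 if score > 0 else -1


# XOR data (assumption), not linearly separable in input space.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
labels = [-1, 1, 1, -1]
C = 100.0
alpha = train_fixed_bias(points, labels, rbf_kernel, upper_bound=C)
predictions = [predict(x, points, labels, alpha, rbf_kernel) for x in points]
```

Fixing the bias removes the equality constraint $\sum_i \alpha_i y_i = 0$ from the dual, which is what allows each $\alpha_i$ to be updated on its own.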
More about SVMs
– For large data sets, Chunking and Sequential Minimal Optimization (SMO) are used to reduce memory requirements
– In Chunking, apply the SVM training algorithm on a subset of the data and discard all but the support vectors
– Add points violating the constraints to the support vectors to form a new chunk, and iterate until all points are considered
– In SMO, modify two $\alpha_i$ at a time to increase the value of the objective function without violating the constraints
– Multiclass SVMs: One-against-All, One-against-One, etc.
– One-against-All : for $k$ classes, use $k$ SVMs, one corresponding to each class
– One-against-One : use $k(k-1)/2$ SVMs, one corresponding to each pair of classes
– SVMs can be used for regression also
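The number of binary classifiers each scheme needs can be enumerated directly. This sketch only lists the binary sub-problems and does not train anything; the ten digit classes and the function names are assumptions for illustration.

```python
from itertools import combinations


def one_against_all_tasks(classes):
    """One binary task per class: that class versus all the others."""
    return [(c, [other for other in classes if other != c]) for c in classes]


def one_against_one_tasks(classes):
    """One binary task per unordered pair of classes."""
    return list(combinations(classes, 2))


digits = list(range(10))  # e.g. a ten-class digit recognition problem
ova = one_against_all_tasks(digits)
ovo = one_against_one_tasks(digits)
ova_count = len(ova)
ovo_count = len(ovo)
```

For ten classes this gives 10 one-against-all classifiers and 45 one-against-one classifiers; the latter are more numerous, but each is trained on the data of only two classes.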
Application of SVMs
– Text Categorization
– Intrusion Detection
– Image Recognition
– Bioinformatics
– Handwritten Character Recognition
Application of SVMs (contd.)
For example, in Text Categorization:
– Each dimension of the feature space is represented by a stem word
– "compute" is a stem word with respect to computers, computation, etc.
– Stop words such as and, or, the, of, etc. are not considered
– So a document is represented as a point, depending upon which stem words are present and in what number
– Order of words is insignificant in this case
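The document representation described here can be sketched as a stem-count vector. The stop-word list, the suffix-stripping rule in `stem` and the sample sentence are crude assumptions made for illustration; a real system would use a proper stemmer, and this toy rule produces the stem "comput" rather than "compute".

```python
from collections import Counter

# Hypothetical stop-word list and suffix list, for illustration only.
STOP_WORDS = {"and", "or", "the", "of", "a", "is", "in", "to"}
SUFFIXES = ("ation", "ers", "er", "ing", "es", "s", "e")


def stem(word):
    """Strip the first matching suffix, keeping at least four characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def document_vector(text, vocabulary):
    """Count stems of non-stop words; one dimension per vocabulary stem."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    counts = Counter(stem(t) for t in tokens)
    return [counts[s] for s in vocabulary]


vocabulary = ["comput", "vector", "kernel"]
document = "The computer computes the computation of computers and vectors"
shuffled = "vectors and computers of the computation computes the computer"

vector = document_vector(document, vocabulary)
shuffled_vector = document_vector(shuffled, vocabulary)
```

Both word orders map to the same point, which is what makes the representation insensitive to word order.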
Real Life Example
US Postal Service: Digit Recognition Problem
– Training data: 7291
– Test data: 2007
– Input space dimensionality: 256 (16 x 16)
Classifier                      Raw Error
Human Performance               2.5%
Decision tree                   16.2%
Best two layer neural network   5.9%
5-layer neural network          5.1%

SVM with Kernel   Number of SV   Raw Error
Polynomial        274            4.0%
RBF               291            4.1%
Neural Network    254            4.2%
Conclusion
– SVM performance is sensitive to the choice of kernel
– A global optimum can be achieved, since training is a convex quadratic programming problem
– Training in polynomial time
– Potential alternative to neural networks, as SVMs performed better in many classification tasks
– We expect that SVMs will extend their scope of application to more diverse fields
References
[1] http://cortex.informatik.tu-ilmenau.de/ koenig/monist/applets/html/AppletSVM.
[2] Christopher J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121–167, 1998.
[3] Nello Cristianini and John Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000.
[4] Edgar E. Osuna, Robert Freund and Federico Girosi. Support vector machines: Training and applications. Technical report, MIT, AI Lab, 1997.
[5] Simon Haykin. Neural Networks. Pearson Education, 2003.
[6] Vladimir N. Vapnik. Statistical Learning Theory. A Wiley-Interscience Publication, 1998.
Proposal for the project
Porting some non-trivial application to an SVM tool and analyzing the results
OR
Comparison of Neural Network and SVM using tools like SNNS and SVMLight.
Q&A