Introduction to Kernel Methods
Fabio A. González, Ph.D.
Depto. de Ing. de Sistemas e Industrial
Universidad Nacional de Colombia, Bogotá
September 30, 2010
Outline
1 Introduction
   Motivation
2 The Kernel Trick
   Mapping the input space to the feature space
   Calculating the dot product in the feature space
3 The Kernel Approach to Machine Learning
4 A Kernel Pattern Analysis Algorithm
   Primal linear regression
   Dual linear regression
5 Kernel Functions
   Mathematical characterisation
   Visualizing kernels in input space
6 Kernel Algorithms
7 Kernels in Complex Structured Data
Problem 1
How to separate these two classes using a linear function?
Problem 2
How to do symbolic regression?
\[ \Sigma = \{A, C, G, T\}, \qquad f : \Sigma^d \to \mathbb{R} \]
\[
\begin{array}{lcl}
\text{ACGTA} & \mapsto & 10.0 \\
\text{GTCCA} & \mapsto & 11.3 \\
\text{GGTAC} & \mapsto & 1.0 \\
\text{CCTGA} & \mapsto & 4.5 \\
\vdots & & \vdots
\end{array}
\]
Problem 1
• How to separate these two classes using a linear function?
Solution
• Map to $\mathbb{R}^3$:
\[ \phi : \mathbb{R}^2 \to \mathbb{R}^3, \qquad (x, y) \mapsto (x^2, y^2, xy) \]
Input space vs. feature space
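A minimal sketch of why the map helps, assuming (hypothetically) that one class lies inside the unit circle and the other outside it. The sample points and the names `phi` and `linear_score` are illustrative and are not taken from the slides. No straight line separates such classes in the input space, but the plane $x^2 + y^2 = 1$, which is linear in the feature coordinates, does:

```python
def phi(x, y):
    """Feature map from the slide: (x, y) -> (x^2, y^2, xy)."""
    return (x * x, y * y, x * y)


def linear_score(features, w=(1.0, 1.0, 0.0), b=-1.0):
    """Linear function in feature space: <w, features> + b."""
    return sum(wi * fi for wi, fi in zip(w, features)) + b


# Hypothetical toy data: an inner class (radius 0.5) and an outer class (radius 2).
inner = [(0.5, 0.0), (0.0, -0.5), (0.3, 0.4), (-0.3, -0.4)]
outer = [(2.0, 0.0), (0.0, -2.0), (1.2, 1.6), (-1.2, -1.6)]

inner_scores = [linear_score(phi(x, y)) for x, y in inner]
outer_scores = [linear_score(phi(x, y)) for x, y in outer]

print(all(s < 0 for s in inner_scores))  # inner class falls on one side of the plane
print(all(s > 0 for s in outer_scores))  # outer class falls on the other side
```

The weight vector `(1, 1, 0)` and offset `-1` are chosen by hand for this toy data; in practice they would be learned.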
Dot product in the feature space
•
\[ \phi : \mathbb{R}^2 \to \mathbb{R}^3, \qquad (x_1, x_2) \mapsto (x_1^2, x_2^2, \sqrt{2}\, x_1 x_2) \]
•
\[
\begin{aligned}
\langle \phi(x), \phi(z) \rangle
  &= \left\langle (x_1^2, x_2^2, \sqrt{2}\, x_1 x_2),\; (z_1^2, z_2^2, \sqrt{2}\, z_1 z_2) \right\rangle \\
  &= x_1^2 z_1^2 + x_2^2 z_2^2 + 2 x_1 x_2 z_1 z_2 \\
  &= (x_1 z_1 + x_2 z_2)^2 \\
  &= \langle x, z \rangle^2
\end{aligned}
\]
• A function $k : X \times X \to \mathbb{R}$ such that $k(x, z) = \langle \phi(x), \phi(z) \rangle$ is called a kernel
• Moral: you don't need to apply $\phi$ explicitly to calculate the dot product in the feature space!
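The identity can be checked numerically. The following is a small sketch with arbitrarily chosen points; the helper names `phi`, `dot` and `k` are illustrative. It computes the dot product once through the explicit map into $\mathbb{R}^3$ and once with the kernel in the input space:

```python
import math


def phi(x):
    """Explicit feature map (x1, x2) -> (x1^2, x2^2, sqrt(2) x1 x2)."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2.0) * x1 * x2)


def dot(a, b):
    """Ordinary dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def k(x, z):
    """Kernel k(x, z) = <x, z>^2, evaluated in the input space only."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, -1.0)   # arbitrary illustrative points
explicit = dot(phi(x), phi(z))   # map to R^3 first, then take the dot product
implicit = k(x, z)               # never leaves R^2

print(math.isclose(explicit, implicit))
```

The two values agree up to floating-point rounding, which is why the comparison uses `math.isclose` rather than `==`.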
Kernel induced feature space
• The feature space induced by the kernel is not unique: the kernel $k(x, z) = \langle x, z \rangle^2$ also calculates the dot product in the four-dimensional feature space
\[ \phi : \mathbb{R}^2 \to \mathbb{R}^4, \qquad (x_1, x_2) \mapsto (x_1^2, x_2^2, x_1 x_2, x_2 x_1) \]
• The example can be generalised to $\mathbb{R}^n$
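One way to read the generalisation, sketched here as an assumption rather than as the construction used in the lecture, is to take all $n^2$ ordered products $x_i x_j$ as features. Then $\langle \phi(x), \phi(z) \rangle = \sum_{i,j} x_i x_j z_i z_j = \langle x, z \rangle^2$. The function name `phi_all_pairs` is hypothetical:

```python
import math


def phi_all_pairs(x):
    """Map x in R^n to all ordered products x_i * x_j (n^2 features)."""
    return [xi * xj for xi in x for xj in x]


def dot(a, b):
    """Ordinary dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


x, z = [1.0, 2.0, -1.0], [0.5, 3.0, 2.0]         # arbitrary points in R^3
lhs = dot(phi_all_pairs(x), phi_all_pairs(z))    # dot product in R^9
rhs = dot(x, z) ** 2                             # kernel in the input space

print(len(phi_all_pairs(x)), math.isclose(lhs, rhs))
```

For $n = 2$ this gives the four features of the slide, up to the order of the coordinates.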
The Process
The Approach
• Data items are embedded into a vector space called the feature space
• Linear relations are sought among the images of the data items in the feature space
• The pattern analysis algorithms are based only on the pairwise dot products; they do not need the actual coordinates of the embedded points
• The pairwise dot products in the feature space can be efficiently calculated using a kernel function
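As an illustration of an algorithmic quantity that needs only dot products, the distance between two embedded points satisfies $\|\phi(x) - \phi(z)\|^2 = k(x, x) - 2k(x, z) + k(z, z)$. The sketch below uses the quadratic kernel from the earlier slides; the function names are illustrative, and the explicit map `phi` appears only to check the result:

```python
import math


def k(x, z):
    """Quadratic kernel from the earlier slides: k(x, z) = <x, z>^2."""
    return sum(xi * zi for xi, zi in zip(x, z)) ** 2


def feature_space_distance(x, z, kernel):
    """||phi(x) - phi(z)|| computed from kernel evaluations only."""
    sq = kernel(x, x) - 2.0 * kernel(x, z) + kernel(z, z)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors


def phi(x):
    """Explicit map, used here only to verify the kernel-based result."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2.0) * x1 * x2)


x, z = (1.0, 2.0), (3.0, -1.0)
via_kernel = feature_space_distance(x, z, k)
via_coordinates = math.dist(phi(x), phi(z))

print(math.isclose(via_kernel, via_coordinates))
```

Replacing `k` with another kernel changes the geometry without touching `feature_space_distance`, which is the modularity the approach relies on.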
Problem definition
• Given a training set $S = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_l, y_l)\}$ of points $\mathbf{x}_i \in \mathbb{R}^n$ with corresponding labels $y_i \in \mathbb{R}$, the problem is to find a real-valued linear function that best interpolates the training set:
\[ g(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle = \mathbf{w}'\mathbf{x} = \sum_{i=1}^{n} w_i x_i \]
• If the data points were generated by a function like $g(\mathbf{x})$, it is possible to find the parameters $\mathbf{w}$ by solving
\[ \mathbf{X}\mathbf{w} = \mathbf{y}, \qquad \text{where} \qquad \mathbf{X} = \begin{pmatrix} \mathbf{x}_1' \\ \vdots \\ \mathbf{x}_l' \end{pmatrix} \]
Graphical representation
Loss function
• Minimize
\[ L(g, S) = L(\mathbf{w}, S) = \sum_{i=1}^{l} (y_i - g(\mathbf{x}_i))^2 = \sum_{i=1}^{l} \xi_i^2 = \sum_{i=1}^{l} L(g, (\mathbf{x}_i, y_i)) \]
• With $\boldsymbol{\xi} = \mathbf{y} - \mathbf{X}\mathbf{w}$, this can be written as
\[ L(\mathbf{w}, S) = \|\boldsymbol{\xi}\|^2 = (\mathbf{y} - \mathbf{X}\mathbf{w})'(\mathbf{y} - \mathbf{X}\mathbf{w}) \]
Solution
\[ \frac{\partial L(\mathbf{w}, S)}{\partial \mathbf{w}} = -2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\mathbf{w} = 0, \]
therefore
\[ \mathbf{X}'\mathbf{X}\mathbf{w} = \mathbf{X}'\mathbf{y}, \]
and
\[ \mathbf{w} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} \]
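The primal solution can be sketched with NumPy as follows. The data are synthetic and noise-free, and `w_true` is a hypothetical generating parameter vector chosen for the example. The normal equations are solved with `np.linalg.solve` rather than by forming $(\mathbf{X}'\mathbf{X})^{-1}$ explicitly, which is the usual numerical practice:

```python
import numpy as np

rng = np.random.default_rng(0)
l, n = 50, 3                          # l samples, n input dimensions
X = rng.normal(size=(l, n))           # rows are the training points x_i'
w_true = np.array([2.0, -1.0, 0.5])   # hypothetical generating parameters
y = X @ w_true                        # noise-free labels, so Xw = y is solvable

# Normal equations X'X w = X'y.
w = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with NumPy's least-squares routine.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(w, w_true), np.allclose(w, w_lstsq))
```

This requires $\mathbf{X}'\mathbf{X}$ to be invertible, which holds here because the $l = 50$ random rows span $\mathbb{R}^3$.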
Dual representation of the problem
•
\[ \mathbf{w} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-2}\mathbf{X}'\mathbf{y} = \mathbf{X}'\boldsymbol{\alpha}, \qquad \text{with} \quad \boldsymbol{\alpha} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-2}\mathbf{X}'\mathbf{y} \]
• So, $\mathbf{w}$ is a linear combination of the training samples, $\mathbf{w} = \sum_{i=1}^{l} \alpha_i \mathbf{x}_i$.
Solution
• From the solution of the primal problem:
\[ \mathbf{X}'\mathbf{X}\mathbf{w} = \mathbf{X}'\mathbf{y}, \]
• then
\[ \mathbf{X}\mathbf{X}'\mathbf{X}\mathbf{w} = \mathbf{X}\mathbf{X}'\mathbf{y}, \]
• using the dual representation
\[ \mathbf{X}\mathbf{X}'\mathbf{X}\mathbf{X}'\boldsymbol{\alpha} = \mathbf{X}\mathbf{X}'\mathbf{y}, \]
• then
\[ \boldsymbol{\alpha} = (\mathbf{X}\mathbf{X}')^{-1}\mathbf{y}, \]
• and
\[ g(\mathbf{x}) = \mathbf{w}'\mathbf{x} = \boldsymbol{\alpha}'\mathbf{X}\mathbf{x}. \]
• Note: $\mathbf{X}\mathbf{X}'$ may be close to singular, or singular according to machine precision.
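A small NumPy sketch of the dual solution, on random data chosen for illustration. It uses fewer samples than dimensions ($l < n$) so that $\mathbf{X}\mathbf{X}'$ is invertible, and then shows the singular case mentioned in the note: with $l > n$ the $l \times l$ matrix $\mathbf{X}\mathbf{X}'$ has rank at most $n$.

```python
import numpy as np

rng = np.random.default_rng(1)
l, n = 3, 5                         # fewer samples than dimensions
X = rng.normal(size=(l, n))
y = rng.normal(size=l)

G = X @ X.T                         # l x l matrix of pairwise dot products
alpha = np.linalg.solve(G, y)       # alpha = (XX')^{-1} y
w = X.T @ alpha                     # w = X' alpha

x_new = rng.normal(size=n)
g_primal = w @ x_new                # g(x) = w'x
g_dual = alpha @ (X @ x_new)        # g(x) = alpha' X x

print(np.isclose(g_primal, g_dual), np.allclose(X @ w, y))

# With more samples than dimensions, XX' has rank at most n < l, so it is singular.
X_tall = rng.normal(size=(10, 2))
print(np.linalg.matrix_rank(X_tall @ X_tall.T))
```

Note that the dual prediction touches the data only through `X @ x_new`, that is, through dot products between training points and the new point.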
Ridge regression
• If $\mathbf{X}\mathbf{X}'$ is singular, the pseudo-inverse can be used to find the $\mathbf{w}$ that satisfies $\mathbf{X}'\mathbf{X}\mathbf{w} = \mathbf{X}'\mathbf{y}$ with minimal norm.
• Optimisation problem:
\[ \min_{\mathbf{w}} L_\lambda(\mathbf{w}, S) = \min_{\mathbf{w}} \; \lambda \|\mathbf{w}\|^2 + \sum_{i=1}^{l} (y_i - g(\mathbf{x}_i))^2, \]
where $\lambda$ defines the trade-off between norm and loss. This controls the complexity of the model (the process is called regularization).
Solution
• Taking the derivative and making it equal to zero:
\[ \mathbf{X}'\mathbf{X}\mathbf{w} + \lambda\mathbf{w} = (\mathbf{X}'\mathbf{X} + \lambda\mathbf{I}_n)\mathbf{w} = \mathbf{X}'\mathbf{y}, \]
where $\mathbf{I}_n$ is the $n \times n$ identity matrix,
• then,
\[ \mathbf{w} = (\mathbf{X}'\mathbf{X} + \lambda\mathbf{I}_n)^{-1}\mathbf{X}'\mathbf{y}. \]
• In terms of $\boldsymbol{\alpha}$:
\[ \mathbf{w} = \lambda^{-1}\mathbf{X}'(\mathbf{y} - \mathbf{X}\mathbf{w}) = \mathbf{X}'\boldsymbol{\alpha}, \]
• then
\[ \boldsymbol{\alpha} = \lambda^{-1}(\mathbf{y} - \mathbf{X}\mathbf{w}) = (\mathbf{X}\mathbf{X}' + \lambda\mathbf{I}_l)^{-1}\mathbf{y}. \]
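The equivalence of the primal and dual ridge solutions can be checked numerically. This is a sketch on random data; the value `lam = 0.1` is an arbitrary choice for the example:

```python
import numpy as np

rng = np.random.default_rng(2)
l, n, lam = 20, 4, 0.1              # lam is the regularisation parameter lambda
X = rng.normal(size=(l, n))
y = rng.normal(size=l)

# Primal: w = (X'X + lam I_n)^{-1} X'y   (an n x n system)
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# Dual: alpha = (XX' + lam I_l)^{-1} y, then w = X' alpha   (an l x l system)
alpha = np.linalg.solve(X @ X.T + lam * np.eye(l), y)
w_dual = X.T @ alpha

print(np.allclose(w_primal, w_dual))
# The identity alpha = (y - Xw) / lam from the slide also holds:
print(np.allclose(alpha, (y - X @ w_primal) / lam))
```

Here $l > n$, so $\mathbf{X}\mathbf{X}'$ alone would be singular; adding $\lambda\mathbf{I}_l$ with $\lambda > 0$ makes the dual system solvable.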
Prediction function
\[ g(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle = \left\langle \sum_{i=1}^{l} \alpha_i \mathbf{x}_i, \mathbf{x} \right\rangle = \sum_{i=1}^{l} \alpha_i \langle \mathbf{x}_i, \mathbf{x} \rangle \]
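Ridge regression as a kernel method
• The Gram matrix $\mathbf{G} = \mathbf{X}\mathbf{X}'$ is the matrix of dot products
\[
\mathbf{G} = \mathbf{X}\mathbf{X}' =
\begin{pmatrix} \mathbf{x}_1' \\ \vdots \\ \mathbf{x}_l' \end{pmatrix}
\begin{pmatrix} \mathbf{x}_1 & \cdots & \mathbf{x}_l \end{pmatrix}
=
\begin{pmatrix}
\langle \mathbf{x}_1, \mathbf{x}_1 \rangle & \cdots & \langle \mathbf{x}_1, \mathbf{x}_l \rangle \\
\vdots & \ddots & \vdots \\
\langle \mathbf{x}_l, \mathbf{x}_1 \rangle & \cdots & \langle \mathbf{x}_l, \mathbf{x}_l \rangle
\end{pmatrix}
\]
• $\mathbf{G}$ may be replaced by a general kernel matrix, $\mathbf{K}$, with
\[ k_{ij} = k(\mathbf{x}_i, \mathbf{x}_j) = \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \rangle \]
• The $\alpha$'s are calculated as:
\[ \boldsymbol{\alpha} = (\mathbf{K} + \lambda\mathbf{I}_l)^{-1}\mathbf{y} \]
• The predicted function is approximated as:
\[
g(\mathbf{x}) = \sum_{i=1}^{l} \alpha_i k(\mathbf{x}, \mathbf{x}_i)
= \mathbf{y}'(\mathbf{K} + \lambda\mathbf{I}_l)^{-1}
\begin{pmatrix} k(\mathbf{x}, \mathbf{x}_1) \\ \vdots \\ k(\mathbf{x}, \mathbf{x}_l) \end{pmatrix}
\]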
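The formulas above translate almost line by line into code. The following is a minimal sketch, not a reference implementation: the function names `gaussian_kernel`, `fit` and `predict`, the toy target $\sin(x)$, and the values of `sigma` and `lam` are all assumptions made for the example.

```python
import numpy as np


def gaussian_kernel(A, B, sigma=1.0):
    """Matrix of k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for rows of A and B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        - 2.0 * A @ B.T
        + np.sum(B**2, axis=1)[None, :]
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def fit(X, y, lam, kernel):
    """alpha = (K + lam I_l)^{-1} y."""
    K = kernel(X, X)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def predict(X_train, alpha, X_new, kernel):
    """g(x) = sum_i alpha_i k(x, x_i) for each row x of X_new."""
    return kernel(X_new, X_train) @ alpha


# Hypothetical 1-D toy problem: learn sin(x) from 30 samples.
X = np.linspace(0.0, 2.0 * np.pi, 30)[:, None]
y = np.sin(X[:, 0])
alpha = fit(X, y, lam=1e-3, kernel=gaussian_kernel)

X_test = np.array([[1.0], [2.5], [4.0]])
pred = predict(X, alpha, X_test, gaussian_kernel)
max_error = np.max(np.abs(pred - np.sin(X_test[:, 0])))
print(max_error)  # absolute error at the three test points
```

Passing a linear kernel such as `lambda A, B: A @ B.T` to the same two functions recovers ordinary dual ridge regression, since the kernel enters only through `K` and the vector of $k(\mathbf{x}, \mathbf{x}_i)$ values.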
Characterisation
Theorem (Mercer's Theorem). A function $k : X \times X \to \mathbb{R}$, which is either continuous or has a countable domain, can be decomposed as
\[ k(\mathbf{x}, \mathbf{z}) = \langle \phi(\mathbf{x}), \phi(\mathbf{z}) \rangle \]
into a feature map $\phi$ into a Hilbert space $F$ applied to both its arguments, followed by the evaluation of the inner product in $F$, if and only if it satisfies the finitely positive semi-definite property.
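The finitely positive semi-definite property says that $k$ is symmetric and that the kernel matrix built on any finite set of points is positive semi-definite. A numerical check on one sample cannot prove the property, but a failing check does disprove it. The sketch below, with illustrative helper names `gram` and `looks_psd`, tests the quadratic kernel from the earlier slides and a function that is not a kernel (the negative squared distance, whose kernel matrix has zero trace and therefore a negative eigenvalue):

```python
import numpy as np


def gram(kernel, points):
    """Kernel matrix K with K[i, j] = kernel(points[i], points[j])."""
    return np.array([[kernel(p, q) for q in points] for p in points])


def looks_psd(K, tol=1e-9):
    """True if K is symmetric and has no eigenvalue below -tol."""
    return bool(np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol)


rng = np.random.default_rng(3)
points = rng.normal(size=(15, 2))   # an arbitrary finite subset of R^2


def quadratic(x, z):
    """k(x, z) = <x, z>^2, a kernel."""
    return float(x @ z) ** 2


def neg_sq_dist(x, z):
    """-||x - z||^2, which is not a kernel."""
    return -float(np.sum((x - z) ** 2))


print(looks_psd(gram(quadratic, points)))
print(looks_psd(gram(neg_sq_dist, points)))
```

The tolerance `tol` absorbs rounding errors in the eigenvalue computation; its value is a judgment call for this toy example.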
Introduction to Kernel Methods
Some kernel functions
Assume $k_1$ and $k_2$ are kernels. Then the following are also kernels:
• $k(x, z) = p(k_1(x, z))$, where $p$ is a polynomial with positive coefficients.
• $k(x, z) = \exp(k_1(x, z))$.
• $k(x, z) = \exp(-\|x - z\|^2 / (2\sigma^2))$, the Gaussian kernel.
• $k(x, z) = k_1(x, z)\, k_2(x, z)$.
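The closure rules above can be spot-checked numerically by building each derived Gram matrix entry-wise and inspecting its smallest eigenvalue. This is only a sketch: the base kernels (a linear kernel and an inhomogeneous quadratic kernel), the polynomial p(t) = 3t^2 + 2t + 1, and sigma = 1 are choices made here, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(15, 2))

# Gram matrices of two base kernels evaluated on the same sample.
K1 = X @ X.T                   # linear kernel <x, z>
K2 = (X @ X.T + 1.0) ** 2      # (<x, z> + 1)^2, used as the second kernel

# Squared Euclidean distances, needed by the Gaussian kernel.
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
sigma = 1.0

# Kernel functions act on scalars, so every operation here is entry-wise.
derived = {
    "polynomial 3t^2 + 2t + 1 of k1": 3 * K1**2 + 2 * K1 + 1,
    "exp(k1)": np.exp(K1),
    "Gaussian": np.exp(-sq_dists / (2 * sigma**2)),
    "product k1 * k2": K1 * K2,
}

def relative_min_eig(K):
    """Smallest eigenvalue divided by the largest one."""
    w = np.linalg.eigvalsh(K)
    return w.min() / w.max()

for name, K in derived.items():
    print(f"{name}: relative min eigenvalue = {relative_min_eig(K):.2e}")
```

Values at or very slightly below zero are round-off; a clearly negative value would indicate a function that is not a kernel.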
Embeddings corresponding to kernels
• It is possible to calculate the feature space induced by a kernel (Mercer's Theorem)
• This can be done in a constructive way
• The feature space can even be of infinite dimension.
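On a finite sample, one constructive route is the eigendecomposition of the Gram matrix: writing K = V diag(w) V^T, the rows of V diag(sqrt(w)) are vectors whose inner products reproduce K. The sketch below illustrates this for a Gaussian kernel. It covers only the finite-sample case, not the general construction behind the theorem, and the function names are assumptions.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gram matrix of the Gaussian kernel on the rows of X."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma**2))

def embed_from_gram(K):
    """Return Phi such that Phi @ Phi.T approximates K."""
    w, V = np.linalg.eigh(K)       # K = V diag(w) V^T
    w = np.clip(w, 0.0, None)      # drop tiny negative round-off
    return V * np.sqrt(w)          # scales column j of V by sqrt(w[j])

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 2))
K = gaussian_gram(X)
Phi = embed_from_gram(K)

print(Phi.shape)                     # one embedded vector per sample point
print(np.allclose(Phi @ Phi.T, K))   # inner products reproduce the kernel
```

The embedding has as many coordinates as there are sample points, which hints at why the Gaussian kernel needs an infinite-dimensional feature space once all of input space is considered.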
How to visualize?
• Choose a point $p_0$ in input space
• Calculate the distance from another point $x$ to $p_0$ in the feature space:
$$\begin{aligned}
\|\phi(p_0) - \phi(x)\|_F^2 &= \langle \phi(p_0) - \phi(x), \phi(p_0) - \phi(x) \rangle_F \\
&= \langle \phi(p_0), \phi(p_0) \rangle_F + \langle \phi(x), \phi(x) \rangle_F - 2 \langle \phi(p_0), \phi(x) \rangle_F \\
&= k(p_0, p_0) + k(x, x) - 2 k(p_0, x)
\end{aligned}$$
• Plot $f(x) = \|\phi(p_0) - \phi(x)\|_F^2$
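The recipe above can be sketched in a few lines: evaluate f on a grid using only kernel calls, then hand the grid to any plotting tool. The reference point, the grid range, and the three kernel functions defined here are illustrative assumptions.

```python
import numpy as np

def feature_distance_sq(kernel, p0, x):
    """Squared feature-space distance, computed only through the kernel."""
    return kernel(p0, p0) + kernel(x, x) - 2.0 * kernel(p0, x)

def linear(x, z):
    return float(np.dot(x, z))

def quadratic(x, z):
    return float(np.dot(x, z)) ** 2

def gaussian(x, z, sigma=1.0):
    diff = np.asarray(x) - np.asarray(z)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma**2)))

p0 = np.array([1.0, 0.5])            # reference point (arbitrary choice)
axis = np.linspace(-2.0, 2.0, 41)    # grid over a 2-D input space

surfaces = {}
for name, k in [("linear", linear), ("quadratic", quadratic), ("gaussian", gaussian)]:
    # surfaces[name][row, col] holds f at the point (axis[col], axis[row]).
    surfaces[name] = np.array(
        [[feature_distance_sq(k, p0, np.array([a, b])) for a in axis] for b in axis]
    )
    print(name, surfaces[name].min(), surfaces[name].max())
```

Each array in `surfaces` can be passed to a contour or surface plot to see how the kernel reshapes distances around the reference point.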
Identity kernel
$$k(x, z) = \langle x, z \rangle$$
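As a quick worked check, substituting this kernel into the feature-space distance f(x) = k(p0, p0) + k(x, x) - 2k(p0, x) from the visualization recipe gives the ordinary squared Euclidean distance, so the plot is a paraboloid centred at p0:

```latex
f(x) = \langle p_0, p_0 \rangle + \langle x, x \rangle - 2 \langle p_0, x \rangle
     = \langle p_0 - x, p_0 - x \rangle
     = \| p_0 - x \|^2
```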
Quadratic kernel (1)
$$k(x, z) = \langle x, z \rangle^2$$
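For inputs in R^2, a standard explicit feature map for this kernel is phi(x) = (x1^2, sqrt(2) x1 x2, x2^2). The sketch below checks numerically that the kernel value and the feature-space inner product agree. The restriction to two dimensions and the function names are assumptions made for illustration.

```python
import numpy as np

def quadratic_kernel(x, z):
    """k(x, z) = <x, z>^2, evaluated in input space."""
    return float(np.dot(x, z)) ** 2

def phi_quadratic(x):
    """Explicit feature map for <x, z>^2 when x is in R^2."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2.0) * x1 * x2, x2**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print(quadratic_kernel(x, z))                        # kernel side
print(float(phi_quadratic(x) @ phi_quadratic(z)))    # feature-space side
```

Both lines evaluate the same quantity; the kernel side never forms the three-dimensional vectors.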
Quadratic kernel (2)
$$k(x, z) = \langle x, z \rangle^2$$
Gaussian kernel
$$k(x, z) = e^{-\frac{\|x - z\|^2}{2\sigma^2}}$$
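For the Gaussian kernel, k(x, x) = 1 for every x, so the feature-space distance f(x) = k(p0, p0) + k(x, x) - 2k(p0, x) from the visualization recipe simplifies. A short worked form, which also shows that the distance is bounded:

```latex
f(x) = 1 + 1 - 2\, e^{-\frac{\| p_0 - x \|^2}{2\sigma^2}}
     = 2 \left( 1 - e^{-\frac{\| p_0 - x \|^2}{2\sigma^2}} \right),
\qquad 0 \le f(x) < 2
```

Every point is mapped onto the unit sphere in feature space, and points far from p0 in input space all end up at nearly the same distance, just under sqrt(2), from phi(p0).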
Basic computations in feature space
• Means
• Distances
• Projections
• Covariance
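As an illustration of the first two items, the norm of the feature-space mean and the distance of each point to that mean can be read directly off the Gram matrix, without ever forming phi. The sketch below uses a linear kernel so that the results can be compared against the explicit computation; the function names are assumptions.

```python
import numpy as np

def mean_norm_sq(K):
    """Squared norm of the feature-space mean: (1/n^2) * sum_ij K[i, j]."""
    return K.mean()

def dist_to_mean_sq(K):
    """Squared distance of each phi(x_i) to the feature-space mean."""
    # K[i, i] - (2/n) * sum_j K[i, j] + (1/n^2) * sum_jl K[j, l]
    return np.diag(K) - 2.0 * K.mean(axis=1) + K.mean()

rng = np.random.default_rng(3)
X = rng.normal(size=(8, 3))
K = X @ X.T   # linear kernel, so phi(x) = x and the answer is known

centre = X.mean(axis=0)
print(np.isclose(mean_norm_sq(K), centre @ centre))
print(np.allclose(dist_to_mean_sq(K), np.sum((X - centre) ** 2, axis=1)))
```

Replacing `K` with the Gram matrix of any other kernel gives the same quantities in that kernel's feature space.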
Classification and regression
• Support Vector Machines
• Support Vector Regression
• Kernel Fisher Discriminant
• Kernel Perceptron
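To give a flavour of the simplest entry in this list, here is a minimal sketch of a kernel perceptron in dual form: the weight vector is never built, only a mistake count per training example. The Gaussian kernel, sigma = 0.5, the XOR-style toy data, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_gram(A, B, sigma=1.0):
    """Gaussian kernel values between the rows of A and the rows of B."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma**2))

def train_kernel_perceptron(K, y, epochs=100):
    """Dual perceptron: alpha[i] counts the mistakes made on example i."""
    alpha = np.zeros(len(y))
    for _ in range(epochs):
        mistakes = 0
        for i in range(len(y)):
            # Decision value for example i uses kernel evaluations only.
            if y[i] * np.dot(alpha * y, K[:, i]) <= 0:
                alpha[i] += 1.0
                mistakes += 1
        if mistakes == 0:   # a full pass without errors: stop
            break
    return alpha

def predict(alpha, y_train, K_test_train):
    """Sign of the kernel expansion for each test row."""
    return np.sign(K_test_train @ (alpha * y_train))

# XOR-like labels: not linearly separable in input space.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

K = gaussian_gram(X, X, sigma=0.5)
alpha = train_kernel_perceptron(K, y)
print(predict(alpha, y, K))
```

On this toy set the learned expansion separates the four points, which a perceptron working directly in input space cannot do.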
Dimensionality reduction and clustering
• Kernel PCA
• Kernel CCA
• Kernel k-means
• Kernel SOM
Kernels in complex structured data
• Since kernel methods do not require an attribute-based representation of objects, it is possible to perform learning over complex structured data (or unstructured data)
• We only need to define a dot product operation (similarity, dissimilarity measure)
• Examples:
  • Strings
  • Texts
  • Trees
  • Graphs
Problem 2
How to do symbolic regression?
$$\Sigma = \{A, C, G, T\}$$
$$f : \Sigma^d \to \mathbb{R}$$
$$\begin{array}{ccc}
\mathtt{ACGTA} & \mapsto & 10.0 \\
\mathtt{GTCCA} & \mapsto & 11.3 \\
\mathtt{GGTAC} & \mapsto & 1.0 \\
\mathtt{CCTGA} & \mapsto & 4.5 \\
\vdots & & \vdots
\end{array}$$
Solution
• Define a kernel on strings
$$k : \Sigma^d \times \Sigma^d \to \mathbb{R}$$
• Use the kernel with a kernel-based regression algorithm to find the regression function
• What is a good candidate for $k$?
  • a function that measures string similarity
  • higher value for similar strings, smaller value for different strings
• Candidate:
$$k(s_1 \ldots s_d, t_1 \ldots t_d) = \sum_{i=1}^{d} \mathrm{equal}(s_i, t_i)$$
$$\mathrm{equal}(s_i, t_i) = \begin{cases} 1 & \text{if } s_i = t_i \\ 0 & \text{otherwise} \end{cases}$$
• $k(\mathtt{ACTAG}, \mathtt{CCTCG}) = ?$
• Is it a kernel?
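A minimal sketch of this solution: the position-matching kernel, its Gram matrix on the four training strings from the problem statement, and a dual (kernel) ridge regression fit with alpha = (K + lambda I)^{-1} y. The regularisation value `lam = 0.1` and the function names are assumptions, and the ridge formulation is one possible choice of kernel regression algorithm, not necessarily the one intended in the slides.

```python
import numpy as np

def match_kernel(s, t):
    """Number of positions at which two equal-length strings agree."""
    if len(s) != len(t):
        raise ValueError("strings must have the same length")
    return sum(1 for a, b in zip(s, t) if a == b)

# Training pairs taken from the problem statement.
strings = ["ACGTA", "GTCCA", "GGTAC", "CCTGA"]
targets = np.array([10.0, 11.3, 1.0, 4.5])

K = np.array([[match_kernel(s, t) for t in strings] for s in strings], dtype=float)

# Dual ridge regression: solve (K + lam * I) alpha = y.
lam = 0.1   # regularisation strength, an arbitrary illustrative value
alpha = np.linalg.solve(K + lam * np.eye(len(strings)), targets)

def predict(s):
    """Regression estimate for a new string, via kernel evaluations only."""
    k_vec = np.array([match_kernel(s, t) for t in strings], dtype=float)
    return float(k_vec @ alpha)

print(match_kernel("ACTAG", "CCTCG"))         # 3: positions 2, 3 and 5 agree
print(np.linalg.eigvalsh(K).min() >= -1e-9)   # Gram matrix has no negative eigenvalue
print(predict("ACGTA"))                        # estimate for a training string
```

A non-negative spectrum on one sample is consistent with k being a kernel; the explicit feature map is what settles the question.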
Induced Feature Space
• What is the feature space induced by $k$?
$$\phi : \Sigma^d \to \mathbb{R}^{4d}$$
$$s_1 \ldots s_d \mapsto (x_{11}, \ldots, x_{41}, x_{12}, \ldots, x_{42}, \ldots, x_{1d}, \ldots, x_{4d})$$
$$(x_{1j}, \ldots, x_{4j}) = \begin{cases}
(1, 0, 0, 0) & \text{if } s_j = \text{'A'} \\
(0, 1, 0, 0) & \text{if } s_j = \text{'C'} \\
(0, 0, 1, 0) & \text{if } s_j = \text{'G'} \\
(0, 0, 0, 1) & \text{if } s_j = \text{'T'}
\end{cases}$$
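The feature map above is a position-wise one-hot encoding, and a short sketch can confirm that its inner product equals the number of matching positions. The names `phi`, `match_kernel` and `ALPHABET` are assumptions introduced for this example.

```python
import numpy as np

ALPHABET = "ACGT"

def phi(s):
    """Position-wise one-hot encoding of s into R^(4 * len(s))."""
    vec = np.zeros(4 * len(s))
    for j, symbol in enumerate(s):
        # Block j holds the indicator vector of the j-th symbol.
        vec[4 * j + ALPHABET.index(symbol)] = 1.0
    return vec

def match_kernel(s, t):
    """Number of positions at which two equal-length strings agree."""
    return sum(1 for a, b in zip(s, t) if a == b)

s, t = "ACTAG", "CCTCG"
print(phi(s).shape)          # (20,)
print(phi(s) @ phi(t))       # 3.0
print(match_kernel(s, t))    # 3
```

Two blocks contribute 1 to the inner product exactly when the symbols at that position coincide, so the counting function is an inner product in R^{4d} and therefore a kernel.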
References
Shawe-Taylor, J. and Cristianini, N. 2004 Kernel Methods for Pattern Analysis. Cambridge University Press.