
Introduction to Kernel Methods

Fabio A. González, Ph.D.
Depto. de Ing. de Sistemas e Industrial
Universidad Nacional de Colombia, Bogotá

September 30, 2010

Outline

1 Introduction
    Motivation
2 The Kernel Trick
    Mapping the input space to the feature space
    Calculating the dot product in the feature space
3 The Kernel Approach to Machine Learning
4 A Kernel Pattern Analysis Algorithm
    Primal linear regression
    Dual linear regression
5 Kernel Functions
    Mathematical characterisation
    Visualizing kernels in input space
6 Kernel Algorithms
7 Kernels in Complex Structured Data


Problem 1 How to separate these two classes using a linear function?

Problem 2

How to do symbolic regression?

$\Sigma = \{A, C, G, T\}$

$f : \Sigma^d \to \mathbb{R}$

ACGTA $\mapsto$ 10.0
GTCCA $\mapsto$ 11.3
GGTAC $\mapsto$ 1.0
CCTGA $\mapsto$ 4.5
...


Problem 1

• How to separate these two classes using a linear function?

Solution

• Map to $\mathbb{R}^3$:

$\phi : \mathbb{R}^2 \to \mathbb{R}^3$, $(x, y) \mapsto (x^2, y^2, xy)$
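A minimal Python sketch of this mapping, under the assumption (the slide's figure is not reproduced here) that the two classes lie on concentric circles around the origin. The radii, the use of NumPy, and the names `phi`, `points`, and `labels` are illustrative choices, not part of the original slides. In the mapped space the plane $x^2 + y^2 = 4$ acts as a linear separator.

```python
import numpy as np

def phi(points):
    """Map each row (x, y) to (x^2, y^2, x*y), the feature map from the slide."""
    x, y = points[:, 0], points[:, 1]
    return np.column_stack([x**2, y**2, x * y])

rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=200)

# Hypothetical data: class -1 on a circle of radius 1, class +1 on a circle of radius 3.
radii = np.where(np.arange(200) < 100, 1.0, 3.0)
labels = np.where(radii > 2.0, 1, -1)
points = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

# Linear function in feature space: <w, phi(p)> + b with w = (1, 1, 0), b = -4.
# It equals x^2 + y^2 - 4, which is negative on the inner circle and positive on the outer one.
w, b = np.array([1.0, 1.0, 0.0]), -4.0
predictions = np.sign(phi(points) @ w + b)

print("all points separated:", bool(np.all(predictions == labels)))
```

No straight line separates these two circles in the input space, while a single plane does so after the mapping.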


Input space vs. feature space


Dot product in the feature space

$\phi : \mathbb{R}^2 \to \mathbb{R}^3$, $(x_1, x_2) \mapsto (x_1^2, x_2^2, \sqrt{2}\,x_1 x_2)$

$$
\begin{aligned}
\langle \phi(x), \phi(z) \rangle &= \left\langle (x_1^2, x_2^2, \sqrt{2}\,x_1 x_2),\ (z_1^2, z_2^2, \sqrt{2}\,z_1 z_2) \right\rangle \\
&= x_1^2 z_1^2 + x_2^2 z_2^2 + 2 x_1 x_2 z_1 z_2 \\
&= (x_1 z_1 + x_2 z_2)^2 \\
&= \langle x, z \rangle^2
\end{aligned}
$$

• A function $k : X \times X \to \mathbb{R}$ such that $k(x, z) = \langle \phi(x), \phi(z) \rangle$ is called a kernel
• Moral: you don't need to apply $\phi$ explicitly to calculate the dot product in the feature space!
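The identity can be checked numerically. The sketch below uses NumPy and two arbitrarily chosen vectors (both are assumptions for illustration) and computes the dot product once through the explicit map $\phi$ and once through the kernel $k(x, z) = \langle x, z \rangle^2$.

```python
import numpy as np

def phi(x):
    """Explicit feature map (x1, x2) -> (x1^2, x2^2, sqrt(2) * x1 * x2)."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2.0) * x1 * x2])

def k(x, z):
    """Kernel computed directly in the input space: <x, z>^2."""
    return np.dot(x, z) ** 2

# Arbitrary example vectors.
x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = np.dot(phi(x), phi(z))  # maps both points, then takes the dot product
implicit = k(x, z)                 # never leaves the input space

print(explicit, implicit)
```

Both values agree up to floating-point rounding, although `k` never builds the three-dimensional vectors.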


Kernel induced feature space

• The feature space induced by the kernel is not unique: the kernel $k(x, z) = \langle x, z \rangle^2$ also calculates the dot product in the four-dimensional feature space

$\phi : \mathbb{R}^2 \to \mathbb{R}^4$, $(x_1, x_2) \mapsto (x_1^2, x_2^2, x_1 x_2, x_2 x_1)$

• The example can be generalised to $\mathbb{R}^n$
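One way to read the generalisation to $\mathbb{R}^n$ (an interpretation, since the slide does not spell it out) is a feature map with one coordinate $x_i x_j$ per ordered pair $(i, j)$, giving $n^2$ features. The sketch below, with the hypothetical helper `phi_all_pairs`, checks that its dot product still equals $\langle x, z \rangle^2$.

```python
import numpy as np

def phi_all_pairs(x):
    """One feature x_i * x_j per ordered pair (i, j): n^2 coordinates in total.

    For n = 2 this gives (x1^2, x1*x2, x2*x1, x2^2), the same coordinates as the
    four-dimensional map on the slide, listed in a different order.
    """
    return np.outer(x, x).ravel()

rng = np.random.default_rng(1)
x, z = rng.normal(size=5), rng.normal(size=5)

lhs = phi_all_pairs(x) @ phi_all_pairs(z)  # dot product in the 25-dimensional feature space
rhs = (x @ z) ** 2                         # kernel evaluated in the 5-dimensional input space

print(lhs, rhs)
```

The explicit map costs $O(n^2)$ per point, while the kernel costs $O(n)$.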


The Process

The Approach

• Data items are embedded into a vector space called the feature space
• Linear relations are sought among the images of the data items in the feature space
• The pattern analysis algorithms are based only on the pairwise dot products; they do not need the actual coordinates of the embedded points
• The pairwise dot products in the feature space can be efficiently calculated using a kernel function
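As a rough illustration of these points, the sketch below builds the matrix of pairwise dot products from a kernel function alone and then derives a feature-space quantity (a squared distance) from that matrix. The helper names `gram_matrix` and `feature_distance_sq`, the quadratic kernel, and the sample points are assumptions made for the example.

```python
import numpy as np

def gram_matrix(kernel, items):
    """Matrix of pairwise kernel values K[i, j] = kernel(items[i], items[j])."""
    n = len(items)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(items[i], items[j])
    return K

def quadratic_kernel(x, z):
    return np.dot(x, z) ** 2

def feature_distance_sq(K, i, j):
    """||phi(x_i) - phi(x_j)||^2 computed from dot products only."""
    return K[i, i] - 2.0 * K[i, j] + K[j, j]

X = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
K = gram_matrix(quadratic_kernel, X)

print(K)
print(feature_distance_sq(K, 0, 1))
```

The coordinates of the embedded points are never computed; any algorithm that can be phrased in terms of `K` works in the feature space implicitly.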


Problem definition

• Given a training set $S = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_l, y_l)\}$ of points $\mathbf{x}_i \in \mathbb{R}^n$ with corresponding labels $y_i \in \mathbb{R}$, the problem is to find a real-valued linear function that best interpolates the training set:

$$g(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle = \mathbf{w}'\mathbf{x} = \sum_{i=1}^{n} w_i x_i$$

• If the data points were generated by a function like $g(\mathbf{x})$, it is possible to find the parameters $\mathbf{w}$ by solving

$$\mathbf{X}\mathbf{w} = \mathbf{y}, \quad \text{where} \quad \mathbf{X} = \begin{bmatrix} \mathbf{x}_1' \\ \vdots \\ \mathbf{x}_l' \end{bmatrix}$$


Graphical representation

Loss function

• Minimize

$$L(g, S) = L(\mathbf{w}, S) = \sum_{i=1}^{l} (y_i - g(\mathbf{x}_i))^2 = \sum_{i=1}^{l} \xi_i^2 = \sum_{i=1}^{l} L(g, (\mathbf{x}_i, y_i))$$

• This could be written as

$$L(\mathbf{w}, S) = \|\boldsymbol{\xi}\|^2 = (\mathbf{y} - \mathbf{X}\mathbf{w})'(\mathbf{y} - \mathbf{X}\mathbf{w})$$


Solution

$$\frac{\partial L(\mathbf{w}, S)}{\partial \mathbf{w}} = -2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\mathbf{w} = 0,$$

therefore

$$\mathbf{X}'\mathbf{X}\mathbf{w} = \mathbf{X}'\mathbf{y},$$

and

$$\mathbf{w} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
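A small NumPy sketch of the primal solution, assuming synthetic noiseless data generated from a hypothetical weight vector `w_true`. The normal equations $\mathbf{X}'\mathbf{X}\mathbf{w} = \mathbf{X}'\mathbf{y}$ are solved with `np.linalg.solve` rather than by forming the inverse explicitly, which is a common numerical choice and not something the slides prescribe.

```python
import numpy as np

rng = np.random.default_rng(0)
l, n = 50, 3                         # l training points in R^n
X = rng.normal(size=(l, n))          # rows are the training points x_i'
w_true = np.array([2.0, -1.0, 0.5])  # hypothetical generating weights
y = X @ w_true                       # noiseless labels, y_i = <w_true, x_i>

# Solve the normal equations X'X w = X'y.
w = np.linalg.solve(X.T @ X, X.T @ y)

# Sum-of-squares loss (y - Xw)'(y - Xw) at the solution.
loss = np.sum((y - X @ w) ** 2)

print(w, loss)
```

With more samples than dimensions and noiseless labels, the recovered `w` matches `w_true` up to rounding and the loss is numerically zero.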


Dual representation of the problem

• $\mathbf{w} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-2}\mathbf{X}'\mathbf{y} = \mathbf{X}'\boldsymbol{\alpha}$
• So, $\mathbf{w}$ is a linear combination of the training samples, $\mathbf{w} = \sum_{i=1}^{l} \alpha_i \mathbf{x}_i$.


Solution

• From the solution of the primal problem:

$$\mathbf{X}'\mathbf{X}\mathbf{w} = \mathbf{X}'\mathbf{y},$$

• then

$$\mathbf{X}\mathbf{X}'\mathbf{X}\mathbf{w} = \mathbf{X}\mathbf{X}'\mathbf{y},$$

• using the dual representation

$$\mathbf{X}\mathbf{X}'\mathbf{X}\mathbf{X}'\boldsymbol{\alpha} = \mathbf{X}\mathbf{X}'\mathbf{y},$$

• then

$$\boldsymbol{\alpha} = (\mathbf{X}\mathbf{X}')^{-1}\mathbf{y},$$

• and

$$g(\mathbf{x}) = \mathbf{w}'\mathbf{x} = \boldsymbol{\alpha}'\mathbf{X}\mathbf{x}.$$

• Note: $\mathbf{X}\mathbf{X}'$ may be close to singular, or singular according to machine precision.
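The dual solution can be sketched in NumPy as follows. The example assumes fewer training points than dimensions ($l < n$) and random data, so that the $l \times l$ matrix $\mathbf{X}\mathbf{X}'$ is invertible; with $l > n$ it would be singular, which is the situation the note above warns about. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
l, n = 4, 10                  # fewer samples than dimensions, so XX' (l x l) is invertible here
X = rng.normal(size=(l, n))   # rows are the training points x_i'
y = rng.normal(size=l)

alpha = np.linalg.solve(X @ X.T, y)  # alpha = (XX')^{-1} y
w = X.T @ alpha                      # w = X' alpha, a combination of the training points

def g(x):
    """Dual prediction g(x) = alpha' X x, using only dot products <x_i, x>."""
    return alpha @ (X @ x)

x_new = rng.normal(size=n)
print(g(x_new), w @ x_new)
```

The dual prediction agrees with the primal form $\mathbf{w}'\mathbf{x}$, and in this setting the fitted function reproduces the training labels exactly.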


Ridge regression

• If $\mathbf{X}\mathbf{X}'$ is singular, the pseudo-inverse could be used to find the $\mathbf{w}$ that satisfies $\mathbf{X}'\mathbf{X}\mathbf{w} = \mathbf{X}'\mathbf{y}$ with minimal norm.
• Optimisation problem:

$$\min_{\mathbf{w}} L_\lambda(\mathbf{w}, S) = \min_{\mathbf{w}} \lambda \|\mathbf{w}\|^2 + \sum_{i=1}^{l} (y_i - g(\mathbf{x}_i))^2,$$

where $\lambda$ defines the trade-off between norm and loss. This controls the complexity of the model (the process is called regularization).


Solution

• Taking the derivative and making it equal to zero:

$$\mathbf{X}'\mathbf{X}\mathbf{w} + \lambda\mathbf{w} = (\mathbf{X}'\mathbf{X} + \lambda\mathbf{I}_n)\mathbf{w} = \mathbf{X}'\mathbf{y},$$

where $\mathbf{I}_n$ is an identity matrix of $n \times n$ dimension,

• then,

$$\mathbf{w} = (\mathbf{X}'\mathbf{X} + \lambda\mathbf{I}_n)^{-1}\mathbf{X}'\mathbf{y}.$$

• In terms of $\boldsymbol{\alpha}$:

$$\mathbf{w} = \lambda^{-1}\mathbf{X}'(\mathbf{y} - \mathbf{X}\mathbf{w}) = \mathbf{X}'\boldsymbol{\alpha},$$

• then

$$\boldsymbol{\alpha} = \lambda^{-1}(\mathbf{y} - \mathbf{X}\mathbf{w}) = (\mathbf{X}\mathbf{X}' + \lambda\mathbf{I}_l)^{-1}\mathbf{y}.$$
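The two forms of the ridge solution can be compared numerically. The sketch below uses random data and an arbitrary value `lam = 0.1` (both assumptions) and checks that the primal weights and the dual coefficients describe the same model.

```python
import numpy as np

rng = np.random.default_rng(0)
l, n, lam = 20, 5, 0.1        # lam plays the role of lambda; its value is arbitrary
X = rng.normal(size=(l, n))
y = rng.normal(size=l)

# Primal: w = (X'X + lambda I_n)^{-1} X'y, an n x n system.
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# Dual: alpha = (XX' + lambda I_l)^{-1} y, an l x l system, then w = X' alpha.
alpha = np.linalg.solve(X @ X.T + lam * np.eye(l), y)
w_dual = X.T @ alpha

print(np.max(np.abs(w_primal - w_dual)))
```

Here $\mathbf{X}\mathbf{X}'$ alone has rank at most 5 and is singular, yet the regularised $l \times l$ system is well posed. Which system is cheaper depends on whether $n$ or $l$ is smaller.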


Prediction function

$$g(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle = \left\langle \sum_{i=1}^{l} \alpha_i \mathbf{x}_i, \mathbf{x} \right\rangle = \sum_{i=1}^{l} \alpha_i \langle \mathbf{x}_i, \mathbf{x} \rangle$$

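Ridge regression as a kernel method

• The Gram matrix $\mathbf{G} = \mathbf{X}\mathbf{X}'$ is the matrix of dot products

$$\mathbf{G} = \mathbf{X}\mathbf{X}' = \begin{bmatrix} \mathbf{x}_1' \\ \vdots \\ \mathbf{x}_l' \end{bmatrix} \begin{bmatrix} \mathbf{x}_1 & \cdots & \mathbf{x}_l \end{bmatrix} = \begin{bmatrix} \langle \mathbf{x}_1, \mathbf{x}_1 \rangle & \cdots & \langle \mathbf{x}_1, \mathbf{x}_l \rangle \\ \vdots & \ddots & \vdots \\ \langle \mathbf{x}_l, \mathbf{x}_1 \rangle & \cdots & \langle \mathbf{x}_l, \mathbf{x}_l \rangle \end{bmatrix}$$

• $\mathbf{G}$ may be replaced by a general kernel matrix, $\mathbf{K}$, with $k_{ij} = k(\mathbf{x}_i, \mathbf{x}_j) = \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \rangle$
• The $\alpha$'s are calculated as:

$$\boldsymbol{\alpha} = (\mathbf{K} + \lambda\mathbf{I}_l)^{-1}\mathbf{y}$$

• The predicted function is approximated as:

$$g(\mathbf{x}) = \sum_{i=1}^{l} \alpha_i k(\mathbf{x}, \mathbf{x}_i) = \mathbf{y}'(\mathbf{K} + \lambda\mathbf{I}_l)^{-1} \begin{bmatrix} k(\mathbf{x}, \mathbf{x}_1) \\ \vdots \\ k(\mathbf{x}, \mathbf{x}_l) \end{bmatrix}$$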
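A compact sketch of kernel ridge regression along these lines, assuming a Gaussian kernel and a toy one-dimensional target ($\sin$); the function names, the values of `lam` and `sigma`, and the data are all illustrative choices rather than anything fixed by the slides.

```python
import numpy as np

def gaussian_kernel_matrix(A, B, sigma):
    """Matrix of k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for rows a of A and b of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        - 2.0 * A @ B.T
        + np.sum(B**2, axis=1)[None, :]
    )
    # Clip tiny negative values caused by rounding before exponentiating.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def fit(X_train, y_train, lam, sigma):
    """alpha = (K + lambda I_l)^{-1} y."""
    K = gaussian_kernel_matrix(X_train, X_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)

def predict(X_train, alpha, X_new, sigma):
    """g(x) = sum_i alpha_i k(x, x_i), evaluated for every row x of X_new."""
    return gaussian_kernel_matrix(X_new, X_train, sigma) @ alpha

# Toy problem (an assumption for illustration): learn sin on [0, 2 pi].
lam, sigma = 1e-3, 1.0
X_train = np.linspace(0.0, 2.0 * np.pi, 40)[:, None]
y_train = np.sin(X_train[:, 0])
alpha = fit(X_train, y_train, lam, sigma)

X_test = np.array([[1.0], [2.5], [4.0]])
pred = predict(X_train, alpha, X_test, sigma)
print(pred, np.sin(X_test[:, 0]))
```

The fitted function is nonlinear in the input, although the algorithm is the same linear ridge regression applied to the kernel matrix.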






Introduction to Kernel Methods

Characterisation

F. Gonz´ alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm Kernel Functions Mathematical characterisation Visualizing kernels in input space

Kernel Algorithms Kernels in Complex Structured Data

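Theorem (Mercer’s Theorem). A function $k : X \times X \to \mathbb{R}$, which is either continuous or has a countable domain, can be decomposed as

$$k(x, z) = \langle \phi(x), \phi(z) \rangle$$

into a feature map $\phi$ into a Hilbert space $F$ applied to both its arguments, followed by the evaluation of the inner product in $F$, if and only if it satisfies the finitely positive semi-definite property. That is, $k$ is symmetric and every kernel matrix obtained by restricting $k$ to a finite subset of $X$ is positive semi-definite.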
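The finitely positive semi-definite property can be probed numerically on a finite sample. The sketch below is illustrative only: the helper names, the random sample and the tolerance are assumptions. Passing the check on one sample does not prove that a function is a kernel, but failing it does show that it is not.

```python
import numpy as np

def gram(points, kernel):
    """Kernel matrix of a finite set of points."""
    return np.array([[kernel(a, b) for b in points] for a in points])

def is_psd_on_sample(points, kernel, tol=1e-9):
    """True if the kernel matrix is symmetric with no eigenvalue below -tol."""
    K = gram(points, kernel)
    if not np.allclose(K, K.T):
        return False
    return bool(np.linalg.eigvalsh(K).min() >= -tol)

# A valid kernel and a symmetric function that is not a kernel.
gaussian = lambda x, z: np.exp(-np.sum((x - z) ** 2) / 2.0)
neg_sq_dist = lambda x, z: -np.sum((x - z) ** 2)

rng = np.random.default_rng(0)
sample = rng.normal(size=(20, 2))
print(is_psd_on_sample(sample, gaussian))
print(is_psd_on_sample(sample, neg_sq_dist))
```

The second function fails because its matrix has a zero diagonal and non-zero off-diagonal entries, so its eigenvalues sum to zero and at least one must be negative.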

Some kernel functions

Assume $k_1$ and $k_2$ are kernels. Then the following are also kernels:

• $k(x, z) = p(k_1(x, z))$, where $p$ is a polynomial with positive coefficients.
• $k(x, z) = \exp(k_1(x, z))$.
• $k(x, z) = \exp(-\|x - z\|^2 / (2\sigma^2))$ (Gaussian kernel).
• $k(x, z) = k_1(x, z)\, k_2(x, z)$.
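These closure properties can be composed. As a hedged sketch, the Gaussian kernel can be recovered from the linear kernel by scaling (a polynomial with one positive coefficient), exponentiating, and normalising. The normalisation step $k(x,z)/\sqrt{k(x,x)\,k(z,z)}$ is not listed on the slide; it is assumed here as a further kernel-preserving operation, and all function names are illustrative.

```python
import numpy as np

def linear(x, z):
    return float(np.dot(x, z))

def scale(k1, c):
    """p(t) = c * t with c > 0: a polynomial with a positive coefficient."""
    return lambda x, z: c * k1(x, z)

def exp_of(k1):
    """k(x, z) = exp(k1(x, z))."""
    return lambda x, z: float(np.exp(k1(x, z)))

def normalise(k1):
    """k(x, z) / sqrt(k(x, x) * k(z, z)); an assumption beyond the slide's list."""
    return lambda x, z: k1(x, z) / np.sqrt(k1(x, x) * k1(z, z))

def gaussian(x, z, sigma):
    d = np.asarray(x) - np.asarray(z)
    return float(np.exp(-np.dot(d, d) / (2.0 * sigma ** 2)))

sigma = 0.7
built = normalise(exp_of(scale(linear, 1.0 / sigma ** 2)))

x = np.array([0.3, -1.2])
z = np.array([1.0, 0.5])
print(built(x, z), gaussian(x, z, sigma))  # the two values agree up to rounding
```

The agreement follows from $\langle x, z \rangle - \tfrac{1}{2}\|x\|^2 - \tfrac{1}{2}\|z\|^2 = -\tfrac{1}{2}\|x - z\|^2$.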

Embeddings corresponding to kernels

• It is possible to calculate the feature space induced by a kernel (Mercer’s Theorem).
• This can be done in a constructive way.
• The feature space can even be of infinite dimension.
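On a finite sample, one constructive route is to factor the kernel matrix through its eigendecomposition. The sketch below is an illustration under that assumption, not the construction used in the proof of Mercer's theorem: it returns explicit feature vectors whose dot products reproduce $K$ on the sample only. The function name and the sample points are made up for the example.

```python
import numpy as np

def empirical_feature_map(K, tol=1e-10):
    """Return Phi with Phi @ Phi.T ~= K for a symmetric PSD matrix K."""
    eigvals, eigvecs = np.linalg.eigh(K)
    keep = eigvals > tol                      # drop numerically null directions
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])

# Gaussian kernel matrix (sigma = 1) on four sample points.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dists / 2.0)

Phi = empirical_feature_map(K)                # row i is the embedding of X[i]
print(Phi.shape)
print(np.allclose(Phi @ Phi.T, K))
```

The embedding dimension is at most the number of sample points, even when the feature space of the kernel itself is infinite-dimensional.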

How to visualize?

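• Choose a point $p_0$ in input space.
• Calculate the squared distance from another point $x$ to $p_0$ in the feature space:

$$\begin{aligned} \|\phi(p_0) - \phi(x)\|_F^2 &= \langle \phi(p_0) - \phi(x), \phi(p_0) - \phi(x) \rangle_F \\ &= \langle \phi(p_0), \phi(p_0) \rangle_F + \langle \phi(x), \phi(x) \rangle_F - 2 \langle \phi(p_0), \phi(x) \rangle_F \\ &= k(p_0, p_0) + k(x, x) - 2k(p_0, x) \end{aligned}$$

• Plot $f(x) = \|\phi(p_0) - \phi(x)\|_F^2$.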
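The procedure needs only kernel evaluations. The following sketch computes $f$ on a small grid for the quadratic kernel. The grid resolution, the choice of $p_0$ and the function names are assumptions; the resulting array could then be handed to any contour or surface plotting routine.

```python
import numpy as np

def feature_space_sq_dist(kernel, p0, x):
    """||phi(p0) - phi(x)||^2 = k(p0, p0) + k(x, x) - 2 k(p0, x)."""
    return kernel(p0, p0) + kernel(x, x) - 2.0 * kernel(p0, x)

def quadratic(x, z):
    return float(np.dot(x, z)) ** 2

p0 = np.array([1.0, 0.0])
axis = np.linspace(-2.0, 2.0, 5)

# grid[r, c] holds f at the input-space point (axis[c], axis[r]).
grid = np.array([[feature_space_sq_dist(quadratic, p0, np.array([a, b]))
                  for a in axis] for b in axis])
print(grid.shape)
```

For the quadratic kernel, $f(-p_0) = 0$ as well as $f(p_0) = 0$: the points $x$ and $-x$ are mapped to the same place in feature space.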

Identity kernel

$k(x, z) = \langle x, z \rangle$
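With this kernel the feature map is the identity, so substituting into $f(x) = k(p_0, p_0) + k(x, x) - 2k(p_0, x)$ gives the ordinary squared Euclidean distance in input space:

$$f(x) = \langle p_0, p_0 \rangle + \langle x, x \rangle - 2 \langle p_0, x \rangle = \|p_0 - x\|^2$$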

Quadratic kernel (1)

$k(x, z) = \langle x, z \rangle^2$

Quadratic kernel (2)

$k(x, z) = \langle x, z \rangle^2$
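For two-dimensional inputs, the kernel $k(x, z) = \langle x, z \rangle^2$ corresponds to the explicit feature map $\phi(x) = (x_1^2, x_2^2, \sqrt{2}\, x_1 x_2)$. The short check below is a sketch restricted to that 2-D case, with illustrative function names and test points.

```python
import math

def quadratic_kernel(x, z):
    """k(x, z) = <x, z>^2 for 2-D inputs."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    """Explicit feature map into R^3 for the 2-D quadratic kernel."""
    return (x[0] ** 2, x[1] ** 2, math.sqrt(2.0) * x[0] * x[1])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, z = (1.0, 2.0), (3.0, -1.0)
print(quadratic_kernel(x, z), dot(phi(x), phi(z)))  # equal up to rounding
```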

Gaussian kernel

$$k(x, z) = e^{-\frac{\|x - z\|^2}{2\sigma^2}}$$
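Substituting the Gaussian kernel into $f(x) = k(p_0, p_0) + k(x, x) - 2k(p_0, x)$, and using $k(x, x) = 1$ for every $x$, shows that the feature-space distance is bounded:

$$f(x) = 2 - 2 e^{-\frac{\|p_0 - x\|^2}{2\sigma^2}}, \qquad 0 \le f(x) < 2$$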

Basic computations in feature space

• Means
• Distances
• Projections
• Covariance
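As an illustration of the first two items, the norm of the mean of the mapped sample and the distance from a new point to that mean can both be written using kernel values alone. The sketch below assumes $\phi_S = \frac{1}{l}\sum_i \phi(x_i)$ as the definition of the mean; the function names and the data are made up, and a linear kernel is used so the result can be checked by hand.

```python
import numpy as np

def mean_sq_norm(K):
    """||phi_S||^2 = (1 / l^2) * sum_ij K[i, j]."""
    return float(K.mean())

def sq_dist_to_mean(k_xx, k_x, K):
    """||phi(x) - phi_S||^2 = k(x, x) - (2 / l) * sum_i k(x, x_i) + ||phi_S||^2."""
    return float(k_xx - 2.0 * np.mean(k_x) + K.mean())

# Linear kernel on four points whose mean is (1, 1).
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
K = X @ X.T
x = np.array([3.0, 1.0])

print(mean_sq_norm(K))                    # squared norm of (1, 1)
print(sq_dist_to_mean(x @ x, X @ x, K))   # squared distance from (3, 1) to (1, 1)
```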

Classification and regression

• Support Vector Machines
• Support Vector Regression
• Kernel Fisher Discriminant
• Kernel Perceptron
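Of the listed algorithms, the kernel perceptron is the shortest to write down. The following is a minimal sketch of its dual form without a bias term, assuming labels in {-1, +1}. The XOR-style data, the Gaussian width and the epoch limit are illustrative choices, not taken from the slides.

```python
import numpy as np

def train_kernel_perceptron(K, y, epochs=10):
    """Dual perceptron: alpha[i] counts the mistakes made on example i."""
    alpha = np.zeros(len(y))
    for _ in range(epochs):
        mistakes = 0
        for i in range(len(y)):
            score = np.sum(alpha * y * K[:, i])   # sum_j alpha_j y_j k(x_j, x_i)
            if y[i] * score <= 0:
                alpha[i] += 1.0
                mistakes += 1
        if mistakes == 0:                          # a full pass with no errors
            break
    return alpha

# XOR-like labels: not linearly separable in input space.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dists / (2.0 * 0.5 ** 2))           # Gaussian kernel, sigma = 0.5

alpha = train_kernel_perceptron(K, y)
pred = np.sign((alpha * y) @ K)
print(pred)
```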

Dimensionality reduction and clustering

• Kernel PCA
• Kernel CCA
• Kernel k-means
• Kernel SOM

Kernels in complex structured data

• Since kernel methods do not require an attribute-based representation of objects, it is possible to perform learning over complex structured data (or unstructured data).
• We only need to define a dot product operation (similarity, dissimilarity measure).
• Examples:
  • Strings
  • Texts
  • Trees
  • Graphs

Problem 2

How to do symbolic regression?

$$\Sigma = \{A, C, G, T\}$$

$$\begin{array}{rcl} f : \Sigma^d & \to & \mathbb{R} \\ ACGTA & \mapsto & 10.0 \\ GTCCA & \mapsto & 11.3 \\ GGTAC & \mapsto & 1.0 \\ CCTGA & \mapsto & 4.5 \\ \vdots & & \vdots \end{array}$$

Solution

• Define a kernel on strings:

$$k : \Sigma^d \times \Sigma^d \to \mathbb{R}$$

• Use the kernel along with a kernel-based regression algorithm to find the regression function.
• What is a good candidate for $k$?
  • A function that measures string similarity
  • Higher value for similar strings, smaller value for different strings
• Candidate:

$$k(s_1 \dots s_d, t_1 \dots t_d) = \sum_{i=1}^{d} \mathrm{equal}(s_i, t_i), \qquad \mathrm{equal}(s_i, t_i) = \begin{cases} 1 & \text{if } s_i = t_i \\ 0 & \text{otherwise} \end{cases}$$

• $k(ACTAG, CCTCG) = ?$
• Is it a kernel?
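The candidate kernel and its use in regression can be sketched as follows. The four training pairs are the ones shown in Problem 2. The dual ridge regression solver, the regularisation value `lam` and the function names are assumptions made for illustration.

```python
import numpy as np

def match_kernel(s, t):
    """Number of positions at which two equal-length strings agree."""
    if len(s) != len(t):
        raise ValueError("strings must have the same length d")
    return sum(1 for a, b in zip(s, t) if a == b)

strings = ["ACGTA", "GTCCA", "GGTAC", "CCTGA"]
targets = np.array([10.0, 11.3, 1.0, 4.5])
lam = 0.1  # regularisation strength, chosen arbitrarily for this sketch

# Dual ridge regression: alpha = (K + lam * I)^{-1} y
K = np.array([[match_kernel(s, t) for t in strings] for s in strings], dtype=float)
alpha = np.linalg.solve(K + lam * np.eye(len(strings)), targets)

def g(s):
    """Predicted value for string s: sum_i alpha_i * k(s, s_i)."""
    return float(sum(a * match_kernel(s, t) for a, t in zip(alpha, strings)))

print(match_kernel("ACTAG", "CCTCG"))
print(g("ACGTA"))
```

The regression code never touches the characters of a string directly. It only sees kernel values, which is why the same solver works on strings, trees or graphs once a kernel is defined.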

Induced Feature Space

• What is the feature space induced by $k$?
• $\phi : \Sigma^d \to \mathbb{R}^{4d}$, with

$$s_1 \dots s_d \mapsto (x_{11}, \dots, x_{41}, x_{12}, \dots, x_{42}, \dots, x_{1d}, \dots, x_{4d})$$

$$(x_{1j}, \dots, x_{4j}) = \begin{cases} (1, 0, 0, 0) & \text{if } s_j = \text{'A'} \\ (0, 1, 0, 0) & \text{if } s_j = \text{'C'} \\ (0, 0, 1, 0) & \text{if } s_j = \text{'G'} \\ (0, 0, 0, 1) & \text{if } s_j = \text{'T'} \end{cases}$$
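This feature map can be checked directly: the dot product of two encoded strings counts the positions where they agree, which is the value of the position-matching kernel. The sketch below uses plain Python with illustrative function names.

```python
ALPHABET = "ACGT"

def phi(s):
    """One-hot encode each symbol; the result has 4 * len(s) coordinates."""
    features = []
    for symbol in s:
        features.extend(1 if symbol == letter else 0 for letter in ALPHABET)
    return features

def match_kernel(s, t):
    """Number of positions at which the two strings carry the same symbol."""
    return sum(1 for a, b in zip(s, t) if a == b)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

s, t = "ACTAG", "CCTCG"
print(len(phi(s)))                               # 4d coordinates for d = 5
print(dot(phi(s), phi(t)), match_kernel(s, t))   # the two values coincide
```

Because $k$ equals an inner product of explicit feature vectors, it is a kernel.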

References

Shawe-Taylor, J. and Cristianini, N. (2004). Kernel Methods for Pattern Analysis. Cambridge University Press.
