Kernel spectral clustering: model representations, sparsity and out-of-sample extensions

Johan Suykens and Carlos Alzate
K.U. Leuven, ESAT-SCD/SISTA
Kasteelpark Arenberg 10, B-3001 Leuven (Heverlee), Belgium
Email: {johan.suykens,carlos.alzate}@esat.kuleuven.be
http://www.esat.kuleuven.be/scd/

4th Int. Conf. on Computational Harmonic Analysis, May 2011, Hong Kong


Overview

• Introduction
• Kernel PCA: primal and dual model representations
• Spectral clustering
• Kernel spectral clustering
• Model selection
• Sparsity
• Incorporating prior knowledge

Introduction

Application studies related to spectral clustering:
• complex networks: clustering time series in power grid networks
• clustering of scientific journals
• clustering time series in the CoRoT exoplanet database
• prognostic tools for predicting maintenance of machines
• image segmentation
[Alzate et al., 2007, 2009; Varon et al., 2011]


Power grid networks

[Figure: normalized load over one week (hours 1-168), contrasting a winter profile (around 0.9) with a summer profile (around 0.3-0.5)]

Network: 245 substations in the Belgian grid
Data: hourly load, with seasonal, weekly and intra-daily patterns
Aim: short-term load forecasting, important for power generation decisions
Clustering: identifying customer profiles from time series (over 5 years)
[Espinoza et al., IEEE CSM 2007; Alzate et al., 2009]

Kernel-based models and learning theory

• Classification and regression: the use of kernel-based models is well established
  aim: a good predictive model (achieving good generalization)
  model selection (kernel parameters, regularization constants): cross-validation (CV)

• Clustering: underlying kernel-based models? out-of-sample extensions?
  learning and generalization? training, validation, test data?
  tuning parameters: how to determine them?

Overview

• Introduction
• Kernel PCA: primal and dual model representations
• Spectral clustering
• Kernel spectral clustering
• Model selection
• Sparsity
• Incorporating prior knowledge

Kernel principal component analysis

[Figure: 2-D toy data in the (x(1), x(2)) plane, comparing linear PCA with kernel PCA (RBF kernel)]

Kernel PCA [Schölkopf et al., 1998]: eigenvalue decomposition of the kernel matrix

$$\begin{bmatrix} K(x_1, x_1) & \dots & K(x_1, x_N) \\ \vdots & & \vdots \\ K(x_N, x_1) & \dots & K(x_N, x_N) \end{bmatrix}$$

Kernel PCA: primal and dual problem

• Underlying primal problem [Suykens et al., 2003]:

$$\min_{w,b,e} \; -\frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^2 \quad \text{s.t.} \quad e_i = w^T \varphi(x_i) + b, \; i = 1, \dots, N$$

• (Lagrange) dual problem = kernel PCA:

$$\Omega_c \alpha = \lambda \alpha \quad \text{with } \lambda = 1/\gamma$$

with $\Omega_{c,ij} = (\varphi(x_i) - \hat{\mu}_\varphi)^T (\varphi(x_j) - \hat{\mu}_\varphi)$ the centered kernel matrix.

• Interpretation:
1. pool of candidate components (objective function equals zero)
2. select relevant components
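To make the dual route concrete, here is a minimal numpy sketch (an illustration, not the authors' code) that builds the centered kernel matrix $\Omega_c$ for a Gaussian RBF kernel and extracts its leading eigenvectors; the helper names `rbf_kernel` and `kernel_pca` and the parameters `sigma2`, `n_comp` are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, sigma2):
    # Gaussian RBF kernel: K(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma2))

def kernel_pca(X, sigma2, n_comp):
    # dual kernel PCA: eigendecomposition of the centered kernel matrix
    N = X.shape[0]
    C = np.eye(N) - np.ones((N, N)) / N          # centering matrix
    Omega_c = C @ rbf_kernel(X, X, sigma2) @ C   # Omega_c,ij = (phi_i - mu)^T (phi_j - mu)
    lam, alpha = np.linalg.eigh(Omega_c)         # eigenvalues in ascending order
    return lam[::-1][:n_comp], alpha[:, ::-1][:, :n_comp]
```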

Kernel PCA: model representations

Primal and dual model representations of the same model M:

$$(P): \quad \hat{e}_* = \sum_j w_j \varphi_j(x_*) + b = w^T \varphi(x_*) + b$$

$$(D): \quad \hat{e}_* = \sum_i \alpha_i K(x_*, x_i) + b$$

which can be evaluated at any point $x_* \in \mathbb{R}^d$, where $K(x_*, x_i) = \varphi(x_*)^T \varphi(x_i)$, with $K(\cdot, \cdot)$ a positive definite kernel and feature map $\varphi(\cdot): \mathbb{R}^d \to \mathbb{R}^{n_h}$.
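Evaluating the dual representation at new points is then a one-liner on top of the sketch above (again illustrative, reusing the hypothetical `rbf_kernel` helper):

```python
def project(Xstar, X, alpha, b, sigma2):
    # dual model representation: e_hat(x*) = sum_i alpha_i K(x*, x_i) + b
    return rbf_kernel(Xstar, X, sigma2) @ alpha + b
```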

Sparse and robust versions

• Iteratively weighted L2 loss (to reduce the influence of outliers):

$$\min_{w,b,e} \; -\frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{i=1}^{N} v_i e_i^2 \quad \text{subject to} \quad e_i = w^T \varphi(x_i) + b, \; i = 1, \dots, N$$

• Other loss functions, e.g. the Huber loss for robustness or the ε-insensitive loss for sparsity:

$$\min_{w,b,e} \; -\frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{i=1}^{N} L(e_i) \quad \text{subject to} \quad e_i = w^T \varphi(x_i) + b, \; i = 1, \dots, N$$

[Alzate & Suykens, IEEE-TNN, 2008]

Robustness: Kernel Component Analysis

[Figure: original image, corrupted image, KPCA reconstruction and KCA reconstruction]

Weighted LS-SVM: robustness and sparsity [Alzate & Suykens, IEEE-TNN 2008]

Overview

• Introduction
• Kernel PCA: primal and dual model representations
• Spectral clustering
• Kernel spectral clustering
• Model selection
• Sparsity
• Incorporating prior knowledge

Spectral graph clustering

Minimal cut: given the graph G = (V, E), find clusters A1, A2 from

$$\min_{q_i \in \{-1,+1\}} \; \frac{1}{2} \sum_{i,j} w_{ij} (q_i - q_j)^2$$

with cluster membership indicator $q_i$ ($q_i = 1$ if $i \in A_1$, $q_i = -1$ if $i \in A_2$) and $W = [w_{ij}]$ the weighted adjacency matrix.

[Figure: 6-node example graph, comparing a cut of size 2 with the minimal cut of size 1]

Spectral graph clustering

• Min-cut spectral clustering problem:

$$\min_{\tilde{q}^T \tilde{q} = 1} \; \tilde{q}^T L \tilde{q}$$

with $L = D - W$ the unnormalized graph Laplacian and degree matrix $D = \text{diag}(d_1, \dots, d_N)$, $d_i = \sum_j w_{ij}$, giving the eigenvalue problem $L \tilde{q} = \lambda \tilde{q}$. Cluster membership indicators: $\hat{q}_i = \text{sign}(\tilde{q}_i - \theta)$ with threshold $\theta$.

• Normalized cut: $L \tilde{q} = \lambda D \tilde{q}$
[Fiedler, 1973; Shi & Malik, 2000; Ng et al., 2002; Chung, 1997; von Luxburg, 2007]

• Discrete version to continuous problem (Laplace operator)
[Belkin & Niyogi, 2003; von Luxburg et al., 2008; Smale & Zhou, 2007]
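The relaxed min-cut above amounts to an eigenvalue problem on the Laplacian; a minimal numpy sketch of the two-cluster case (illustrative only; `spectral_bipartition` is a hypothetical name):

```python
import numpy as np

def spectral_bipartition(W, normalized=False, theta=0.0):
    # W: symmetric weighted adjacency matrix
    d = W.sum(axis=1)
    L = np.diag(d) - W                        # unnormalized Laplacian L = D - W
    if normalized:
        # solve L q = lambda D q via the symmetric form D^{-1/2} L D^{-1/2};
        # D^{-1/2} has a positive diagonal, so the signs are unchanged
        Dm = np.diag(1.0 / np.sqrt(d))
        L = Dm @ L @ Dm
    lam, Q = np.linalg.eigh(L)
    q = Q[:, 1]                               # Fiedler vector (2nd smallest eigenvalue)
    return np.where(q > theta, 1, -1)         # cluster indicators q_hat
```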

Spectral clustering + K-means

[Figure: spectral clustering combined with K-means on the eigenvector coordinates]

Overview

• Introduction
• Kernel PCA: primal and dual model representations
• Spectral clustering
• Kernel spectral clustering
• Model selection
• Sparsity
• Incorporating prior knowledge

Kernel spectral clustering: case of two clusters

• Underlying model (primal representation): $\hat{e}_* = w^T \varphi(x_*) + b$, with $\hat{q}_* = \text{sign}[\hat{e}_*]$ the estimated cluster indicator at any $x_* \in \mathbb{R}^d$.

• Primal problem: training on given data $\{x_i\}_{i=1}^{N}$:

$$\min_{w,b,e} \; -\frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{i=1}^{N} v_i e_i^2 \quad \text{subject to} \quad e_i = w^T \varphi(x_i) + b, \; i = 1, \dots, N$$

with positive weights $v_i$ (related below to the inverse degree matrix).
[Alzate & Suykens, IEEE-PAMI, 2010]

Lagrangian and conditions for optimality

• Lagrangian:

$$\mathcal{L}(w, b, e; \alpha) = -\frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{i=1}^{N} v_i e_i^2 - \sum_{i=1}^{N} \alpha_i (e_i - w^T \varphi(x_i) - b)$$

• Conditions for optimality:

$$\frac{\partial \mathcal{L}}{\partial w} = 0 \;\Rightarrow\; w = \sum_i \alpha_i \varphi(x_i), \qquad \frac{\partial \mathcal{L}}{\partial b} = 0 \;\Rightarrow\; \sum_i \alpha_i = 0,$$

$$\frac{\partial \mathcal{L}}{\partial e_i} = 0 \;\Rightarrow\; \alpha_i = \gamma v_i e_i, \qquad \frac{\partial \mathcal{L}}{\partial \alpha_i} = 0 \;\Rightarrow\; e_i = w^T \varphi(x_i) + b, \quad i = 1, \dots, N$$

• Eliminate $w, b, e$ and write the solution in terms of $\alpha$.

Kernel-based model representation

• Dual problem:

$$V M_V \Omega \alpha = \lambda \alpha \quad \text{with } \lambda = 1/\gamma,$$

where $M_V = I_N - \frac{1}{1_N^T V 1_N} 1_N 1_N^T V$ is a weighted centering matrix and $\Omega = [\Omega_{ij}]$ the kernel matrix with $\Omega_{ij} = \varphi(x_i)^T \varphi(x_j) = K(x_i, x_j)$.

• Dual model representation:

$$\hat{e}_* = \sum_{i=1}^{N} \alpha_i K(x_i, x_*) + b$$

with $K(x_i, x_*) = \varphi(x_i)^T \varphi(x_*)$.

Choice of weights v_i

• Take $V = D^{-1}$, where $D = \text{diag}\{d_1, \dots, d_N\}$ and $d_i = \sum_{j=1}^{N} \Omega_{ij}$.

• This gives the generalized eigenvalue problem

$$M_D \Omega \alpha = \lambda D \alpha \quad \text{with} \quad M_D = I_N - \frac{1}{1_N^T D^{-1} 1_N} 1_N 1_N^T D^{-1},$$

a modified version of random-walk spectral clustering.

• Note that sign[e_i] = sign[α_i] on the training data, but sign[e_*] applies beyond the training data.
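Solving this generalized eigenvalue problem is the whole training step; a minimal scipy sketch (illustrative, not the authors' code; `ksc_train` is a hypothetical name):

```python
import numpy as np
from scipy.linalg import eig

def ksc_train(Omega, k):
    # solve M_D Omega alpha = lambda D alpha for the top k-1 eigenvectors
    N = Omega.shape[0]
    d = Omega.sum(axis=1)                    # degrees d_i = sum_j Omega_ij
    Dinv = 1.0 / d
    # M_D = I - 1_N 1_N^T D^{-1} / (1_N^T D^{-1} 1_N)
    MD = np.eye(N) - np.outer(np.ones(N), Dinv) / Dinv.sum()
    lam, A = eig(MD @ Omega, np.diag(d))     # generalized eigenproblem
    order = np.argsort(-lam.real)            # sort by decreasing eigenvalue
    return A[:, order[:k - 1]].real          # alpha^(l), l = 1, ..., k-1
```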

Kernel spectral clustering: more clusters

• Case of k clusters: additional sets of constraints

$$\min_{w^{(l)}, e^{(l)}, b_l} \; -\frac{1}{2} \sum_{l=1}^{k-1} w^{(l)T} w^{(l)} + \frac{1}{2} \sum_{l=1}^{k-1} \gamma_l \, e^{(l)T} D^{-1} e^{(l)}$$

$$\text{subject to} \quad e^{(l)} = \Phi_{N \times n_h} w^{(l)} + b_l 1_N, \quad l = 1, \dots, k-1,$$

where $e^{(l)} = [e_1^{(l)}; \dots; e_N^{(l)}]$ and $\Phi_{N \times n_h} = [\varphi(x_1)^T; \dots; \varphi(x_N)^T] \in \mathbb{R}^{N \times n_h}$.

• Dual problem: $M_D \Omega \alpha^{(l)} = \lambda D \alpha^{(l)}$, $l = 1, \dots, k-1$.
[Alzate & Suykens, IEEE-PAMI, 2010]

Primal and dual model representations

k clusters, k − 1 sets of constraints (index l = 1, ..., k − 1):

$$(P): \quad \text{sign}[\hat{e}^{(l)}_*] = \text{sign}[w^{(l)T} \varphi(x_*) + b_l]$$

$$(D): \quad \text{sign}[\hat{e}^{(l)}_*] = \text{sign}\Big[\sum_j \alpha_j^{(l)} K(x_*, x_j) + b_l\Big]$$

Note: additional sets of constraints also appear in multi-class and vector-valued output LS-SVMs [Suykens et al., 1999].
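Out-of-sample cluster assignment then reduces to computing the sign pattern of the k − 1 score variables and decoding it against a codebook (e.g. the k most frequent training codewords); a sketch under those assumptions, with hypothetical names:

```python
import numpy as np

def ksc_assign(Ktest, alphas, b, codebook):
    # Ktest: (N_test, N) kernel evaluations against the training points
    # alphas: (N, k-1) eigenvectors, b: (k-1,) biases, codebook: (k, k-1) sign codewords
    E = Ktest @ alphas + b                   # e^(l)(x*) = sum_j alpha_j^(l) K(x*, x_j) + b_l
    codes = np.sign(E)                       # one +/-1 codeword per test point
    ham = (codes[:, None, :] != codebook[None, :, :]).sum(-1)
    return ham.argmin(axis=1)                # nearest codeword by Hamming distance
```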

Out-of-sample extension and coding

[Figure: two scatter plots in the (x(1), x(2)) plane, showing the training clusters and the out-of-sample cluster assignment of new points]

Overview

• Introduction
• Kernel PCA: primal and dual model representations
• Spectral clustering
• Kernel spectral clustering
• Model selection
• Sparsity
• Incorporating prior knowledge

Piecewise constant eigenvectors and extension

• Definition [Meila & Shi, 2001]. A vector $\alpha$ is called piecewise constant relative to a partition $(A_1, \dots, A_k)$ iff $\alpha_i = \alpha_j \;\forall x_i, x_j \in A_p$, $p = 1, \dots, k$.

• Proposition [Alzate & Suykens, 2010]. Assume
(i) a training set $D = \{x_i\}_{i=1}^{N}$ and a validation set $D^v = \{x_m^v\}_{m=1}^{N_v}$, i.i.d. sampled from the same underlying distribution;
(ii) a set of k clusters $\{A_1, \dots, A_k\}$ with $k > 2$;
(iii) an isotropic kernel function such that $K(x, z) = 0$ when $x$ and $z$ belong to different clusters;
(iv) eigenvectors $\alpha^{(l)}$, $l = 1, \dots, k-1$, that are piecewise constant.

Then validation set points belonging to the same cluster are collinear in the $(k-1)$-dimensional subspace spanned by the columns of $E^v \in \mathbb{R}^{N_v \times (k-1)}$, where

$$E^v_{ml} = e_m^{(l)} = \sum_{i=1}^{N} \alpha_i^{(l)} K(x_i, x_m^v) + b_l.$$

Piecewise constant eigenvectors and extension

• Key aspect of the proof: for $x_*$ in cluster $A_p$, one has

$$e_*^{(l)} = \sum_{i=1}^{N} \alpha_i^{(l)} K(x_i, x_*) + b_l = c_p^{(l)} \sum_{i \in A_p} K(x_i, x_*) + \sum_{i \notin A_p} \alpha_i^{(l)} K(x_i, x_*) + b_l = c_p^{(l)} \sum_{i \in A_p} K(x_i, x_*) + b_l,$$

since $K(x_i, x_*) = 0$ for $i \notin A_p$ by assumption (iii).

• Model selection to determine the kernel parameters and k: look for line structures in the space $(e_i^{(1)}, e_i^{(2)}, \dots, e_i^{(k-1)})$ evaluated on validation data (aiming for good generalization).

• Choice of kernel: Gaussian RBF kernel; $\chi^2$-kernel for images.
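The slides do not spell out the BLF (balanced line fit) criterion itself; as a crude, purely illustrative stand-in, one can score how collinear each tentative validation cluster is in the (k − 1)-dimensional score space (assumes k > 2 and clusters with several points; `line_structure` is a hypothetical name):

```python
import numpy as np

def line_structure(E, labels):
    # E: (N_v, k-1) validation score matrix, labels: tentative cluster indices
    ratios = []
    for c in np.unique(labels):
        P = E[labels == c]
        lam = np.linalg.eigvalsh(np.cov(P.T))   # cluster covariance spectrum
        ratios.append(lam[-1] / lam.sum())      # -> 1 when the cluster lies on a line
    return float(np.mean(ratios))
```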

Model selection (looking for lines): toy problem 1

[Figure: validation scores (e_i,val^(1), e_i,val^(2)) next to the data in the (x(1), x(2)) plane, for σ² = 0.5 (BLF = 0.56) and σ² = 0.16 (BLF = 1.0, clear line structure); left: validation set, right: train + validation + test data]

Model selection (looking for lines): toy problem 2

[Figure: validation scores (e_i,val^(1), e_i,val^(2)) next to 3-D data (x(1), x(2), x(3)), for σ² = 1.20 (BLF = 0.49) and σ² = 0.003 (BLF = 1.0, clear line structure); left: validation set, right: train + validation + test data]

Example: image segmentation (looking for lines)

[Figure: validation scores (e_i,val^(1), e_i,val^(2), e_i,val^(3)) exhibiting line structures]

[Figure: segmentation results on Berkeley images (IDs 145086, 42049, 167062, 147091, 196073, 62096, 119082, 3096, 295087, 37073): original image, proposed method, Nyström method and human segmentation]

Example: power grid networks - identifying customer profiles

Power load: 245 substations, hourly data (5 years), d = 43,824
Periodic AR modelling: dimensionality reduction 43,824 → 24
k-means clustering applied after dimensionality reduction

[Figure: normalized daily load profiles (hours 1-24) of the clusters obtained with k-means after dimensionality reduction]

Clustering time-series: kernel spectral clustering

Application of kernel spectral clustering, directly on d = 43,824
Model selection on the kernel parameter and the number of clusters
[Alzate, Espinoza, De Moor, Suykens, 2009]

[Figure: normalized daily load profiles of the detected clusters]

Electricity load: 245 substations in the Belgian grid (1/2 training, 1/2 validation)
x_i ∈ R^43,824: spectral clustering on high-dimensional data (5 years)
3 of the 7 detected clusters:
- 1: residential profile: morning and evening peaks
- 2: business profile: peaked around noon
- 3: industrial profile: increasing morning, oscillating afternoon and evening

"Out-of-sample eigenvectors"

• From the conditions for optimality, an eigenvector $\alpha$ satisfies $\alpha = \gamma D^{-1} e$ and $1_N^T \alpha = 0$.

• By defining $\deg(x) = \sum_{j=1}^{N} K(x, x_j)$, the notion of eigenvector is extended to a validation set as follows:

$$\alpha^{\text{val}} = \frac{[I - \frac{1}{N_v} 1_{N_v} 1_{N_v}^T] \, \gamma D_{\text{val}}^{-1} e^{\text{val}}}{\big\| [I - \frac{1}{N_v} 1_{N_v} 1_{N_v}^T] \, \gamma D_{\text{val}}^{-1} e^{\text{val}} \big\|_2},$$

satisfying $\|\alpha^{\text{val}}\|_2 = 1$ and $1_{N_v}^T \alpha^{\text{val}} = 0$, where $N_v$ denotes the validation set size.
[Alzate & Suykens, IJCNN 2011]
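This formula transcribes directly into code; a minimal sketch (illustrative; `out_of_sample_alpha` and `Kval` are hypothetical names, with `Kval` holding kernel evaluations of validation points against the training set):

```python
import numpy as np

def out_of_sample_alpha(Kval, e_val, gamma=1.0):
    # alpha_val: centered, normalized gamma * D_val^{-1} e_val
    d_val = Kval.sum(axis=1)              # deg(x) = sum_j K(x, x_j)
    v = gamma * e_val / d_val             # gamma * D_val^{-1} e_val
    v = v - v.mean()                      # apply [I - (1/N_v) 1 1^T]
    return v / np.linalg.norm(v)          # enforce ||alpha_val||_2 = 1
```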

Model selection (looking for dots): toy problem 2

[Figure: out-of-sample eigenvector coordinates (α_i,val^(1), α_i,val^(2)) next to the 3-D data; badly tuned panel annotated σ = 0.003, Fisher = 1.0; well tuned panel annotated σ = 1.20, Fisher = 0.12]

Example: image segmentation (looking for dots)

[Figure: Fisher criterion versus number of clusters k (2 to 14), and the corresponding out-of-sample eigenvector coordinates (α_i,val^(1), α_i,val^(2))]

Overview

• Introduction
• Kernel PCA: primal and dual model representations
• Spectral clustering
• Kernel spectral clustering
• Model selection
• Sparsity
• Incorporating prior knowledge

Kernel spectral clustering: sparse kernel models

[Figure: original image, binary clustering and sparse kernel model]

Incomplete Cholesky decomposition: $\|\Omega - G G^T\|_2 \leq \eta$ with $G \in \mathbb{R}^{N \times R}$ and $R \ll N$

Image (Berkeley image dataset): 321 × 481 (154,401 pixels), 175 SV

$$e_*^{(l)} = \sum_{i \in S_{SV}} \alpha_i^{(l)} K(x_i, x_*) + b_l$$
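A standard greedy pivoted incomplete Cholesky routine of the kind referred to here can be sketched as follows (not the authors' implementation; it stops when the trace of the residual, which upper-bounds the spectral norm of a positive semidefinite matrix, drops below η):

```python
import numpy as np

def incomplete_cholesky(kfun, X, eta):
    # approximate the kernel matrix: Omega ~ G G^T with few columns
    N = len(X)
    diag = np.array([kfun(X[i], X[i]) for i in range(N)])   # residual diagonal
    cols, pivots = [], []
    while diag.sum() > eta:
        j = int(np.argmax(diag))                 # pivot = largest residual entry
        pivots.append(j)
        col = np.array([kfun(X[i], X[j]) for i in range(N)])
        for g in cols:                           # subtract already-built columns
            col = col - g[j] * g
        g_new = col / np.sqrt(diag[j])
        cols.append(g_new)
        diag = diag - g_new ** 2
    return np.column_stack(cols), pivots         # G in R^{N x R}, R << N
```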

Highly sparse kernel models on images

• Application on images: $x_i \in \mathbb{R}^3$ (r, g, b values per pixel), $i = 1, \dots, N$, pre-processed into $z_i \in \mathbb{R}^8$ (quantization to 8 colors); a $\chi^2$-kernel compares two local color histograms (5 × 5 pixel windows)

• N > 100,000: select a subset of size M ≪ N based on quadratic Renyi entropy, as in the fixed-size method [Suykens et al., 2002] (see the sketch below)

• Highly sparse representations: #SV = 3k

• Completion of the cluster indicators based on the out-of-sample extension

$$\text{sign}[\hat{e}_*^{(l)}] = \text{sign}\Big[\sum_{j \in S_{SV}} \alpha_j^{(l)} K(x_*, x_j) + b_l\Big],$$

applied to the full image. [Alzate & Suykens, Neurocomputing 2011]
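A minimal sketch of entropy-based subset selection in the spirit of the fixed-size method (the random-swap loop, the iteration count and the reuse of the earlier hypothetical `rbf_kernel` helper are illustrative choices, not the published procedure):

```python
import numpy as np

def renyi_subset(X, M, sigma2, n_swaps=2000, seed=0):
    # keep a working set of M points; accept random swaps that increase the
    # quadratic Renyi entropy estimate H = -log( mean_ij K(x_i, x_j) )
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=M, replace=False)

    def entropy(ix):
        return -np.log(rbf_kernel(X[ix], X[ix], sigma2).mean())

    h = entropy(idx)
    for _ in range(n_swaps):
        cand = idx.copy()
        cand[rng.integers(M)] = rng.integers(len(X))
        if len(np.unique(cand)) < M:
            continue                             # skip swaps that duplicate a point
        h_new = entropy(cand)
        if h_new > h:
            idx, h = cand, h_new
    return idx
```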

Highly sparse kernel models: toy example 1

[Figure: three clusters in the (x(1), x(2)) plane and their images in the (e^(1), e^(2)) score space; only 3k = 9 support vectors]

Highly sparse kernel models: toy example 2

[Figure: four clusters in the (x(1), x(2)) plane and their images in the (ê^(1), ê^(2), ê^(3)) score space; only 3k = 12 support vectors]

Highly sparse kernel models: image segmentation

[Figure: pixels mapped into the (e^(1), e^(2), e^(3)) score space, with the support vectors marked by asterisks; only 3k = 12 support vectors]

Overview

• Introduction
• Kernel PCA: primal and dual model representations
• Spectral clustering
• Kernel spectral clustering
• Model selection
• Sparsity
• Incorporating prior knowledge

Kernel spectral clustering: adding prior knowledge

• Pair of points $x_\dagger, x_\ddagger$: $c = 1$ encodes a must-link constraint, $c = -1$ a cannot-link constraint.

• Primal problem [Alzate & Suykens, IJCNN 2009]:

$$\min_{w^{(l)}, e^{(l)}, b_l} \; -\frac{1}{2} \sum_{l=1}^{k-1} w^{(l)T} w^{(l)} + \frac{1}{2} \sum_{l=1}^{k-1} \gamma_l \, e^{(l)T} D^{-1} e^{(l)}$$

$$\text{subject to} \quad e^{(l)} = \Phi_{N \times n_h} w^{(l)} + b_l 1_N, \quad w^{(l)T} \varphi(x_\dagger) = c \, w^{(l)T} \varphi(x_\ddagger), \quad l = 1, \dots, k-1$$

• Dual problem: yields a rank-one downdate of the kernel matrix

Kernel spectral clustering: example

[Figure: original image segmented without constraints and with must-link/cannot-link constraints]

Conclusions

• Spectral clustering within a kernel-based learning framework
• Training problem: characterization in terms of a primal and a dual problem
• Out-of-sample extensions: primal and dual model representations
• The desirable piecewise constant property extends to the validation level
• New model selection criteria (learning and generalization aspects)
• (Highly) sparse kernel models
• Suitable for adding prior knowledge through constraints

References

Downloadable from www.esat.kuleuven.be/sista/lssvmlab/

Alzate C., Suykens J.A.K., "Multiway Spectral Clustering with Out-of-Sample Extensions through Weighted Kernel PCA", IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(2), 335-347, 2010.

Alzate C., Suykens J.A.K., "Sparse Kernel Spectral Clustering Models for Large-Scale Data Analysis", Neurocomputing, 74(9), 1382-1390, 2011.

Alzate C., Suykens J.A.K., "A Regularized Formulation for Spectral Clustering with Pairwise Constraints", International Joint Conference on Neural Networks (IJCNN 2009), Atlanta, USA, 2009, 141-148.

Alzate C., Suykens J.A.K., "Out-of-Sample Eigenvectors in Kernel Spectral Clustering", to appear in International Joint Conference on Neural Networks (IJCNN 2011).

Suykens J.A.K., "Data Visualization and Dimensionality Reduction using Kernel Maps with a Reference Point", IEEE Transactions on Neural Networks, 19(9), 1501-1517, 2008.

Suykens J.A.K., Alzate C., Pelckmans K., "Primal and dual model representations in kernel-based learning", Statistics Surveys, 4, 148-183, 2010.