A Conscience On-line Learning Approach for Kernel-Based Clustering

Chang-Dong Wang¹, Jian-Huang Lai¹, Jun-Yong Zhu²

¹ School of Information Science and Technology, Sun Yat-sen University, P. R. China.
² School of Mathematics and Computational Science, Sun Yat-sen University, P. R. China.

The 10th IEEE International Conference on Data Mining (ICDM 2010)
Outline

1. Motivation: Kernel Clustering and the Problem; Previous Work
2. Conscience On-Line Learning: The Model; Computation
3. Applications: Digit Clustering; Video Clustering
Kernel Clustering

Kernel clustering is one of the major methods for partitioning nonlinearly separable datasets.

(Figure: a two-dimensional, nonlinearly separable example dataset.)
Kernel Clustering

Given X = {x₁, …, xₙ}, a kernel mapping φ, and the number of clusters c, find an assignment function ν : X → {1, …, c} that minimizes the distortion error

\[
\sum_{i=1}^{n} \| \phi(x_i) - \mu_{\nu_i} \|^2,
\]

where νᵢ is short for ν(xᵢ), and

\[
\mu_k = \frac{1}{|\nu^{-1}(k)|} \sum_{\phi(x_i) \in \nu^{-1}(k)} \phi(x_i),
\qquad
\nu_i = \arg\min_{k=1,\dots,c} \| \phi(x_i) - \mu_k \|^2 .
\]
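As a concrete reading of this objective, here is a minimal NumPy sketch that evaluates the distortion error for the special case φ(x) = x (the linear kernel); the function name and signature are ours, for illustration only.

import numpy as np

def distortion_error(X, nu, c):
    """Sum of squared distances from each sample to its cluster prototype.

    X  : (n, d) float array, one sample per row (phi(x) = x assumed).
    nu : (n,) integer assignments in {0, ..., c-1}.
    c  : number of clusters.
    """
    error = 0.0
    for k in range(c):
        members = X[nu == k]
        if len(members) == 0:
            continue
        mu_k = members.mean(axis=0)              # prototype mu_k
        error += ((members - mu_k) ** 2).sum()   # ||phi(x_i) - mu_{nu_i}||^2
    return error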
Problem

k-means is used to solve \(\min \sum_{i=1}^{n} \|\phi(x_i) - \mu_{\nu_i}\|^2\) via:

1. Assigning the samples:
\[
\nu_i \leftarrow \arg\min_{k=1,\dots,c} \| \phi(x_i) - \mu_k \|^2, \quad \forall i = 1, \dots, n,
\]
2. Recomputing the prototypes:
\[
\mu_k \leftarrow \frac{1}{|\nu^{-1}(k)|} \sum_{\phi(x_i) \in \nu^{-1}(k)} \phi(x_i), \quad \forall k = 1, \dots, c.
\]

However, ill-initialization can cause degenerate results.
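To make the two-step iteration concrete, the following is a hedged NumPy sketch of the Lloyd-style loop for the linear case φ(x) = x; the kernelized variant performs the same steps through the kernel matrix (shown later), and all names here are illustrative.

import numpy as np

def lloyd_kmeans(X, c, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Random initial prototypes -- exactly the step that can go wrong.
    mu = X[rng.choice(len(X), size=c, replace=False)].astype(float)
    for _ in range(n_iter):
        # Step 1: assign every sample to its nearest prototype.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        nu = d2.argmin(axis=1)
        # Step 2: recompute each prototype as the mean of its members.
        for k in range(c):
            if (nu == k).any():
                mu[k] = X[nu == k].mean(axis=0)
            # An ill-initialized prototype that never wins a member keeps
            # its old position: the degenerate behaviour noted above.
    return nu, mu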
Ill-initialization and Degenerate Clustering

Figure: Ill-initialization and the degenerate result by k-means. (a) Ill-initialization: µ₂ is ill-initialized. (b) Degenerate result by k-means: µ₂ degenerates.
Previous Work

1. Methods refining the initial prototypes, e.g., Bradley and Fayyad '98; Khan and Ahmad '04.
2. Lower bound approach, e.g., Zhang et al. '06.
3. Evolutionary algorithms, e.g., Krishna and Murty '99; Abolhassani et al. '04.
4. Global search strategy, i.e., global k-means, e.g., Likas et al. '03; Tzortzis and Likas '09.

Computationally expensive.
Conscience On-Line Learning (COLL)

For each randomly taken data point φ(xᵢ):

1. Select the winner based on the conscience mechanism:
\[
\nu_i = \arg\min_{k=1,\dots,c} \{ f_k \, \| \phi(x_i) - \mu_k \|^2 \},
\]
2. Update the winner:
\[
\mu_{\nu_i} \leftarrow \mu_{\nu_i} + \eta_t \left( \phi(x_i) - \mu_{\nu_i} \right),
\]
3. Update the winning frequencies {f_k, k = 1, …, c}:
\[
n_{\nu_i} \leftarrow n_{\nu_i} + 1, \qquad f_k = n_k \Big/ \sum_{l=1}^{c} n_l, \quad \forall k = 1, \dots, c.
\]

The goal is to bring all available prototypes into the solution quickly, and to allow all prototypes to win the competition fairly.
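A minimal sketch of one COLL sweep, again written in input space (φ(x) = x) so the three steps are visible; the constant learning rate and the initialization of the counts n_k are our assumptions, not the paper's prescriptions.

import numpy as np

def coll_sweep(X, mu, eta=0.05, seed=0):
    """One pass of conscience on-line learning; mu is a (c, d) float array."""
    rng = np.random.default_rng(seed)
    wins = np.ones(len(mu))                # n_k; starting all counts at 1 is our choice
    for i in rng.permutation(len(X)):      # randomly taken data points
        f = wins / wins.sum()              # winning frequencies f_k
        d2 = ((mu - X[i]) ** 2).sum(axis=1)
        v = int((f * d2).argmin())         # 1. conscience-based winner selection
        mu[v] += eta * (X[i] - mu[v])      # 2. on-line update of the winner
        wins[v] += 1                       # 3. update the winning frequencies
    return mu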
Conscience On-Line Learning

Figure: Ill-initialization and the satisfactory result by COLL. (a) Ill-initialization: µ₂ is ill-initialized. (b) Satisfactory result by COLL.
Conscience On-Line Learning

Figure: The winning frequencies {f₁, f₂, f₃} of three prototypes as a function of the competition index over six iterations.
Kernel Matrix

The kernel mapping φ is often unknown or hard to obtain. The feature space Y is characterized by the kernel function κ(x, z) = ⟨φ(x), φ(z)⟩ and the corresponding kernel matrix Kᵢ,ⱼ = κ(xᵢ, xⱼ). The distance between two points in the kernel space is computed via the kernel trick. For instance, the distance between the point φ(xᵢ) and the prototype µ_k is computed as

\[
\| \phi(x_i) - \mu_k \|^2
= K_{i,i}
- \frac{2 \sum_{j \in \nu^{-1}(k)} K_{i,j}}{|\nu^{-1}(k)|}
+ \frac{\sum_{h,l \in \nu^{-1}(k)} K_{h,l}}{|\nu^{-1}(k)|^2}.
\]

The COLL model must work with only the kernel matrix K.
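A short NumPy sketch of this kernel-trick distance, where the argument members plays the role of ν⁻¹(k); the names are ours.

import numpy as np

def kernel_distance2(K, i, members):
    """||phi(x_i) - mu_k||^2, with mu_k the mean of phi(x_j), j in members."""
    m = len(members)
    return (K[i, i]
            - 2.0 * K[i, members].sum() / m
            + K[np.ix_(members, members)].sum() / m ** 2)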
Prototype Descriptor

To this end, we develop an efficient framework for computing the COLL model based on the prototype descriptor.

Definition (Prototype descriptor). A prototype descriptor is a matrix W^φ ∈ ℝ^{c×(n+1)} whose k-th row represents the prototype µ_k by

\[
W^{\phi}_{k,i} = \langle \mu_k, \phi(x_i) \rangle, \;\; \forall i = 1, \dots, n,
\qquad
W^{\phi}_{k,n+1} = \langle \mu_k, \mu_k \rangle,
\]

i.e.,

\[
W^{\phi} =
\begin{pmatrix}
\langle \mu_1, \phi(x_1) \rangle & \cdots & \langle \mu_1, \phi(x_n) \rangle & \langle \mu_1, \mu_1 \rangle \\
\langle \mu_2, \phi(x_1) \rangle & \cdots & \langle \mu_2, \phi(x_n) \rangle & \langle \mu_2, \mu_2 \rangle \\
\vdots & \ddots & \vdots & \vdots \\
\langle \mu_c, \phi(x_1) \rangle & \cdots & \langle \mu_c, \phi(x_n) \rangle & \langle \mu_c, \mu_c \rangle
\end{pmatrix}.
\]
Initialization

Theorem (Initialization). The random initialization of the prototype descriptor can be realized by

\[
W^{\phi}_{:,1:n} = AK,
\qquad
W^{\phi}_{:,n+1} = \mathrm{diag}(AKA^{\top}),
\]

where the matrix A = [A_{k,i}] ∈ ℝ₊^{c×n} reflects the initial assignment ν:

\[
A_{k,i} =
\begin{cases}
\frac{1}{|\nu^{-1}(k)|} & \text{if } i \in \nu^{-1}(k), \\
0 & \text{otherwise.}
\end{cases}
\]
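A hedged NumPy sketch of this initialization: draw a random assignment, build A, and form W^φ = [AK | diag(AKA^⊤)]. The variable names and the uniform random assignment are our choices.

import numpy as np

def init_prototype_descriptor(K, c, seed=0):
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    nu = rng.integers(0, c, size=n)            # random initial assignment
    A = np.zeros((c, n))
    for k in range(c):
        idx = np.flatnonzero(nu == k)
        if len(idx):
            A[k, idx] = 1.0 / len(idx)         # A_{k,i} = 1/|nu^{-1}(k)|
        # A cluster left empty by the random draw keeps a zero row.
    W = np.empty((c, n + 1))
    W[:, :n] = A @ K                           # <mu_k, phi(x_i)>
    W[:, n] = np.diag(A @ K @ A.T)             # <mu_k, mu_k>
    return W, nu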
Winner Selection & Updating

Theorem (Conscience-based winner selection). The winning prototype can be selected based on the conscience mechanism as

\[
\nu_i = \arg\min_{k=1,\dots,c} \left\{ f_k \cdot \left( K_{i,i} + W^{\phi}_{k,n+1} - 2 W^{\phi}_{k,i} \right) \right\}.
\]

Theorem (On-line winner updating). The winner can be updated as

\[
W^{\phi}_{\nu_i,j} \leftarrow
\begin{cases}
(1-\eta_t)\, W^{\phi}_{\nu_i,j} + \eta_t K_{i,j} & j = 1, \dots, n, \\
(1-\eta_t)^2\, W^{\phi}_{\nu_i,j} + \eta_t^2 K_{i,i} + 2(1-\eta_t)\eta_t W^{\phi}_{\nu_i,i} & j = n+1.
\end{cases}
\]
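Putting the two theorems together, one COLL step can be phrased purely in terms of K and W^φ. This is a sketch with our own names; note that the last column must be updated before the others, since its formula reads the pre-update value W^φ_{ν_i,i}.

def coll_kernel_step(W, K, f, i, eta):
    """Process sample i: select the winner via conscience, update W in place."""
    n = K.shape[0]
    # Conscience-based winner selection.
    d2 = K[i, i] + W[:, n] - 2.0 * W[:, i]
    v = int((f * d2).argmin())
    # On-line winner updating (column n+1 first: it uses the old W[v, i]).
    W[v, n] = ((1 - eta) ** 2 * W[v, n]
               + eta ** 2 * K[i, i]
               + 2 * (1 - eta) * eta * W[v, i])
    W[v, :n] = (1 - eta) * W[v, :n] + eta * K[i, :]
    return v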
Iteration Stopping Criterion

Theorem (Iteration stopping criterion). If e^φ < ε or t > t_max, stop the iteration, where

\[
e^{\phi} =
\sum_{k=1}^{c} \left( 1 - \frac{1}{(1-\eta_t)^{m_k}} \right)^2 W^{\phi}_{k,n+1}
+ \eta_t^2 \sum_{k=1}^{c} \sum_{h=1}^{m_k} \sum_{l=1}^{m_k} \frac{K_{\pi^k_h, \pi^k_l}}{(1-\eta_t)^{h+l}}
+ 2 \eta_t \sum_{k=1}^{c} \sum_{l=1}^{m_k} \left( 1 - \frac{1}{(1-\eta_t)^{m_k}} \right) \frac{W^{\phi}_{k,\pi^k_l}}{(1-\eta_t)^{l}}.
\]

Here, the array π^k stores the indices of the m_k ordered points assigned to the k-th prototype in one iteration.
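For completeness, a direct and heavily hedged transcription of the e^φ formula above; we assume pis[k] holds the ordered index array π^k of the m_k points that prototype k won in the last iteration, and a constant η_t within that iteration.

import numpy as np

def stopping_value(W, K, pis, eta):
    n = K.shape[0]
    e = 0.0
    for k, pi in enumerate(pis):
        m = len(pi)
        if m == 0:
            continue
        a = 1.0 - 1.0 / (1.0 - eta) ** m                 # (1 - 1/(1-eta)^{m_k})
        w = 1.0 / (1.0 - eta) ** np.arange(1, m + 1)     # 1/(1-eta)^l, l = 1..m_k
        e += a ** 2 * W[k, n]                                         # first term
        e += eta ** 2 * (np.outer(w, w) * K[np.ix_(pi, pi)]).sum()    # second term
        e += 2.0 * eta * a * (w * W[k, pi]).sum()                     # third term
    return e

Stopping then amounts to checking e^φ < ε or t > t_max.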
Demonstration of COLL Winner Updating

Figure: The mapping φ embeds the data into a feature space where the nonlinear pattern becomes linear. COLL is then performed in this feature space. The linear separator in the feature space is actually nonlinear in the input space, and so is the update of the winner.
Digit Clustering

Table: Summary of four digit datasets.

Dataset     n      c   d    Balanced   σ
Pendigits   10992  10  16   ×          60.53
Mfeat       2000   10  649  √          809.34
USPS        11000  10  256  √          1286.70
MNIST       5000   10  784  √          2018.30

Figure: Some samples of MNIST.
Comparing Convergence Rate

Figure: Comparing the convergence rate of kernel k-means and the proposed COLL, in terms of log(e^φ) as a function of the iteration step, on (a) Pendigits, (b) Mfeat, (c) USPS, and (d) MNIST.
Comparing Distortion Error

Figure: Comparing the distortion error as a function of the number of clusters for kernel k-means, global kernel k-means, and the proposed COLL on (a) Pendigits, (b) Mfeat, (c) USPS, and (d) MNIST.
Comparing Using Internal & External Measures

Table: Average distortion error (DE).

Dataset     Kk-means   Gkk-means   COLL
Pendigits   6704.0     6664.9      6619.3
Mfeat       1405.5     1363.3      1324.2
USPS        8004.7     7782.0      7561.0
MNIST       3034.3     2894.1      2754.9

Table: Average normalized mutual information (NMI).

Dataset     Kk-means   Gkk-means   COLL
Pendigits   0.715      0.736       0.753
Mfeat       0.533      0.542       0.604
USPS        0.354      0.369       0.461
MNIST       0.441      0.472       0.520
Video Clustering

Figure: Clustering frames of ANNI002 into 16 scenes using COLL. The discovered scenes (frame ranges) are:

Cluster 1: 0001-0061    Cluster 2: 0062-0163    Cluster 3: 0164-0289    Cluster 4: 0290-0400
Cluster 5: 0401-0504    Cluster 6: 0505-0699    Cluster 7: 0700-0827    Cluster 8: 0828-1022
Cluster 9: 1023-1083    Cluster 10: 1084-1238   Cluster 11: 1239-1778   Cluster 12: 1779-1983
Cluster 13: 1984-2119   Cluster 14: 2120-2243   Cluster 15: 2244-2400   Cluster 16: 2401-2492
Comparison Results

Table: The means of NMI and computational time (in seconds) on the 11 video sequences.

Video (#frames)    kk-means        gkk-means       Proposed COLL
                   NMI    Time     NMI    Time     NMI    Time
ANNI001 (914)      0.781  72.2     0.801  94.0     0.851  70.4
ANNI002 (2492)     0.705  94.7     0.721  126.4    0.741  89.0
ANNI003 (4265)     0.712  102.2    0.739  139.2    0.762  99.5
ANNI004 (3897)     0.731  98.3     0.750  121.6    0.759  93.6
ANNI005 (11361)    0.645  152.2    0.656  173.3    0.680  141.2
ANNI006 (16588)    0.622  193.0    0.638  255.5    0.642  182.3
ANNI007 (1588)     0.727  81.1     0.740  136.7    0.770  79.1
ANNI008 (2773)     0.749  95.9     0.771  119.0    0.794  81.5
ANNI009 (12304)    0.727  167.0    0.763  184.4    0.781  160.4
ANNI010 (30363)    0.661  257.2    0.709  426.4    0.734  249.0
ANNI011 (1987)     0.738  85.4     0.749  142.7    0.785  83.7
Summary

1. We have shown that existing methods for kernel clustering may suffer from the ill-initialization problem.
2. We present a conscience on-line learning (COLL) approach to address ill-initialization.
3. In the future, we may explore the on-line learning framework with other mechanisms, such as rival penalization, to realize automatic cluster-number selection in kernel clustering.
Thank you very much! Q&A