A Conscience On-line Learning Approach for Kernel-Based Clustering

Chang-Dong Wang¹, Jian-Huang Lai¹, Jun-Yong Zhu²

¹ School of Information Science and Technology, Sun Yat-sen University, P. R. China.
² School of Mathematics and Computational Science, Sun Yat-sen University, P. R. China.

The 10th IEEE International Conference on Data Mining (ICDM 2010)
Outline

1. Motivation: Kernel Clustering and the Problem; Previous Work
2. Conscience On-Line Learning: The Model; Computation
3. Applications: Digit Clustering; Video Clustering
Kernel Clustering

Kernel clustering is one of the major methods for partitioning nonlinearly separable datasets.

(Figure: a two-dimensional, nonlinearly separable example dataset.)
Kernel Clustering

Given X = {x₁, …, xₙ}, a kernel mapping φ, and the number of clusters c, find an assignment function ν : X → {1, …, c} that minimizes the distortion error

\[
\sum_{i=1}^{n} \| \phi(x_i) - \mu_{\nu_i} \|^2,
\]

where νᵢ is short for ν(xᵢ), and

\[
\mu_k = \frac{1}{|\nu^{-1}(k)|} \sum_{\phi(x_i) \in \nu^{-1}(k)} \phi(x_i),
\qquad
\nu_i = \arg\min_{k=1,\dots,c} \| \phi(x_i) - \mu_k \|^2 .
\]
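As a concrete reading of this objective, here is a minimal NumPy sketch that evaluates the distortion error for the special case φ(x) = x (the linear kernel); the function name and signature are ours, for illustration only.

import numpy as np

def distortion_error(X, nu, c):
    """Sum of squared distances from each sample to its cluster prototype.

    X  : (n, d) float array, one sample per row (phi(x) = x assumed).
    nu : (n,) integer assignments in {0, ..., c-1}.
    c  : number of clusters.
    """
    error = 0.0
    for k in range(c):
        members = X[nu == k]
        if len(members) == 0:
            continue
        mu_k = members.mean(axis=0)              # prototype mu_k
        error += ((members - mu_k) ** 2).sum()   # ||phi(x_i) - mu_{nu_i}||^2
    return error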
Problem

k-means is used to solve \(\min \sum_{i=1}^{n} \|\phi(x_i) - \mu_{\nu_i}\|^2\) via:

1. Assigning the samples:
\[
\nu_i \leftarrow \arg\min_{k=1,\dots,c} \| \phi(x_i) - \mu_k \|^2, \quad \forall i = 1, \dots, n,
\]
2. Recomputing the prototypes:
\[
\mu_k \leftarrow \frac{1}{|\nu^{-1}(k)|} \sum_{\phi(x_i) \in \nu^{-1}(k)} \phi(x_i), \quad \forall k = 1, \dots, c.
\]

However, ill-initialization can cause degenerate results.
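To make the two-step iteration concrete, the following is a hedged NumPy sketch of the Lloyd-style loop for the linear case φ(x) = x; the kernelized variant performs the same steps through the kernel matrix (shown later), and all names here are illustrative.

import numpy as np

def lloyd_kmeans(X, c, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Random initial prototypes -- exactly the step that can go wrong.
    mu = X[rng.choice(len(X), size=c, replace=False)].astype(float)
    for _ in range(n_iter):
        # Step 1: assign every sample to its nearest prototype.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        nu = d2.argmin(axis=1)
        # Step 2: recompute each prototype as the mean of its members.
        for k in range(c):
            if (nu == k).any():
                mu[k] = X[nu == k].mean(axis=0)
            # An ill-initialized prototype that never wins a member keeps
            # its old position: the degenerate behaviour noted above.
    return nu, mu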
Ill-initialization and Degenerate Clustering

Figure: Ill-initialization and the degenerate result by k-means. (a) Ill-initialization: µ₂ is ill-initialized. (b) Degenerate result by k-means: µ₂ degenerates.
Previous Work

1. Methods refining the initial prototypes, e.g., Bradley and Fayyad '98; Khan and Ahmad '04.
2. Lower bound approach, e.g., Zhang et al. '06.
3. Evolutionary algorithms, e.g., Krishna and Murty '99; Abolhassani et al. '04.
4. Global search strategy, i.e., global k-means, e.g., Likas et al. '03; Tzortzis and Likas '09.

Computationally expensive.
Conscience On-Line Learning (COLL)

For each randomly taken data point φ(xᵢ):

1. Select the winner based on the conscience mechanism:
\[
\nu_i = \arg\min_{k=1,\dots,c} \{ f_k \, \| \phi(x_i) - \mu_k \|^2 \},
\]
2. Update the winner:
\[
\mu_{\nu_i} \leftarrow \mu_{\nu_i} + \eta_t \left( \phi(x_i) - \mu_{\nu_i} \right),
\]
3. Update the winning frequencies {f_k, k = 1, …, c}:
\[
n_{\nu_i} \leftarrow n_{\nu_i} + 1, \qquad f_k = n_k \Big/ \sum_{l=1}^{c} n_l, \quad \forall k = 1, \dots, c.
\]

The goal is to bring all available prototypes into the solution quickly, and to allow all prototypes to win the competition fairly.
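A minimal sketch of one COLL sweep, again written in input space (φ(x) = x) so the three steps are visible; the constant learning rate and the initialization of the counts n_k are our assumptions, not the paper's prescriptions.

import numpy as np

def coll_sweep(X, mu, eta=0.05, seed=0):
    """One pass of conscience on-line learning; mu is a (c, d) float array."""
    rng = np.random.default_rng(seed)
    wins = np.ones(len(mu))                # n_k; starting all counts at 1 is our choice
    for i in rng.permutation(len(X)):      # randomly taken data points
        f = wins / wins.sum()              # winning frequencies f_k
        d2 = ((mu - X[i]) ** 2).sum(axis=1)
        v = int((f * d2).argmin())         # 1. conscience-based winner selection
        mu[v] += eta * (X[i] - mu[v])      # 2. on-line update of the winner
        wins[v] += 1                       # 3. update the winning frequencies
    return mu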
Conscience On-Line Learning

Figure: Ill-initialization and the satisfactory result by COLL. (a) Ill-initialization: µ₂ is ill-initialized. (b) Satisfactory result by COLL.
Conscience On-Line Learning

Figure: The winning frequencies {f₁, f₂, f₃} of three prototypes as a function of the competition index over six iterations.
Kernel Matrix

The kernel mapping φ is often unknown or hard to obtain. The feature space Y is characterized by the kernel function κ(x, z) = ⟨φ(x), φ(z)⟩ and the corresponding kernel matrix Kᵢ,ⱼ = κ(xᵢ, xⱼ). The distance between two points in the kernel space is computed via the kernel trick. For instance, the distance between the point φ(xᵢ) and the prototype µ_k is computed as

\[
\| \phi(x_i) - \mu_k \|^2
= K_{i,i}
- \frac{2 \sum_{j \in \nu^{-1}(k)} K_{i,j}}{|\nu^{-1}(k)|}
+ \frac{\sum_{h,l \in \nu^{-1}(k)} K_{h,l}}{|\nu^{-1}(k)|^2}.
\]

The COLL model must work with only the kernel matrix K.
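A short NumPy sketch of this kernel-trick distance, where the argument members plays the role of ν⁻¹(k); the names are ours.

import numpy as np

def kernel_distance2(K, i, members):
    """||phi(x_i) - mu_k||^2, with mu_k the mean of phi(x_j), j in members."""
    m = len(members)
    return (K[i, i]
            - 2.0 * K[i, members].sum() / m
            + K[np.ix_(members, members)].sum() / m ** 2)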
Prototype Descriptor

To this end, we develop an efficient framework for computing the COLL model based on the prototype descriptor.

Definition (Prototype descriptor). A prototype descriptor is a matrix W^φ ∈ ℝ^{c×(n+1)} whose k-th row represents the prototype µ_k by

\[
W^{\phi}_{k,i} = \langle \mu_k, \phi(x_i) \rangle, \;\; \forall i = 1, \dots, n,
\qquad
W^{\phi}_{k,n+1} = \langle \mu_k, \mu_k \rangle,
\]

i.e.,

\[
W^{\phi} =
\begin{pmatrix}
\langle \mu_1, \phi(x_1) \rangle & \cdots & \langle \mu_1, \phi(x_n) \rangle & \langle \mu_1, \mu_1 \rangle \\
\langle \mu_2, \phi(x_1) \rangle & \cdots & \langle \mu_2, \phi(x_n) \rangle & \langle \mu_2, \mu_2 \rangle \\
\vdots & \ddots & \vdots & \vdots \\
\langle \mu_c, \phi(x_1) \rangle & \cdots & \langle \mu_c, \phi(x_n) \rangle & \langle \mu_c, \mu_c \rangle
\end{pmatrix}.
\]
Initialization

Theorem (Initialization). The random initialization of the prototype descriptor can be realized by

\[
W^{\phi}_{:,1:n} = AK,
\qquad
W^{\phi}_{:,n+1} = \mathrm{diag}(AKA^{\top}),
\]

where the matrix A = [A_{k,i}] ∈ ℝ₊^{c×n} reflects the initial assignment ν:

\[
A_{k,i} =
\begin{cases}
\frac{1}{|\nu^{-1}(k)|} & \text{if } i \in \nu^{-1}(k), \\
0 & \text{otherwise.}
\end{cases}
\]
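A hedged NumPy sketch of this initialization: draw a random assignment, build A, and form W^φ = [AK | diag(AKA^⊤)]. The variable names and the uniform random assignment are our choices.

import numpy as np

def init_prototype_descriptor(K, c, seed=0):
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    nu = rng.integers(0, c, size=n)            # random initial assignment
    A = np.zeros((c, n))
    for k in range(c):
        idx = np.flatnonzero(nu == k)
        if len(idx):
            A[k, idx] = 1.0 / len(idx)         # A_{k,i} = 1/|nu^{-1}(k)|
        # A cluster left empty by the random draw keeps a zero row.
    W = np.empty((c, n + 1))
    W[:, :n] = A @ K                           # <mu_k, phi(x_i)>
    W[:, n] = np.diag(A @ K @ A.T)             # <mu_k, mu_k>
    return W, nu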
Winner Selection & Updating

Theorem (Conscience-based winner selection). The winning prototype can be selected based on the conscience mechanism as

\[
\nu_i = \arg\min_{k=1,\dots,c} \left\{ f_k \cdot \left( K_{i,i} + W^{\phi}_{k,n+1} - 2 W^{\phi}_{k,i} \right) \right\}.
\]

Theorem (On-line winner updating). The winner can be updated as

\[
W^{\phi}_{\nu_i,j} \leftarrow
\begin{cases}
(1-\eta_t)\, W^{\phi}_{\nu_i,j} + \eta_t K_{i,j} & j = 1, \dots, n, \\
(1-\eta_t)^2\, W^{\phi}_{\nu_i,j} + \eta_t^2 K_{i,i} + 2(1-\eta_t)\eta_t W^{\phi}_{\nu_i,i} & j = n+1.
\end{cases}
\]
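Putting the two theorems together, one COLL step can be phrased purely in terms of K and W^φ. This is a sketch with our own names; note that the last column must be updated before the others, since its formula reads the pre-update value W^φ_{ν_i,i}.

def coll_kernel_step(W, K, f, i, eta):
    """Process sample i: select the winner via conscience, update W in place."""
    n = K.shape[0]
    # Conscience-based winner selection.
    d2 = K[i, i] + W[:, n] - 2.0 * W[:, i]
    v = int((f * d2).argmin())
    # On-line winner updating (column n+1 first: it uses the old W[v, i]).
    W[v, n] = ((1 - eta) ** 2 * W[v, n]
               + eta ** 2 * K[i, i]
               + 2 * (1 - eta) * eta * W[v, i])
    W[v, :n] = (1 - eta) * W[v, :n] + eta * K[i, :]
    return v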
Iteration Stopping Criterion

Theorem (Iteration stopping criterion). If e^φ < ε or t > t_max, stop the iteration, where

\[
e^{\phi} =
\sum_{k=1}^{c} \left( 1 - \frac{1}{(1-\eta_t)^{m_k}} \right)^2 W^{\phi}_{k,n+1}
+ \eta_t^2 \sum_{k=1}^{c} \sum_{h=1}^{m_k} \sum_{l=1}^{m_k} \frac{K_{\pi^k_h, \pi^k_l}}{(1-\eta_t)^{h+l}}
+ 2 \eta_t \sum_{k=1}^{c} \sum_{l=1}^{m_k} \left( 1 - \frac{1}{(1-\eta_t)^{m_k}} \right) \frac{W^{\phi}_{k,\pi^k_l}}{(1-\eta_t)^{l}}.
\]

Here, the array π^k stores the indices of the m_k ordered points assigned to the k-th prototype in one iteration.
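For completeness, a direct and heavily hedged transcription of the e^φ formula above; we assume pis[k] holds the ordered index array π^k of the m_k points that prototype k won in the last iteration, and a constant η_t within that iteration.

import numpy as np

def stopping_value(W, K, pis, eta):
    n = K.shape[0]
    e = 0.0
    for k, pi in enumerate(pis):
        m = len(pi)
        if m == 0:
            continue
        a = 1.0 - 1.0 / (1.0 - eta) ** m                 # (1 - 1/(1-eta)^{m_k})
        w = 1.0 / (1.0 - eta) ** np.arange(1, m + 1)     # 1/(1-eta)^l, l = 1..m_k
        e += a ** 2 * W[k, n]                                         # first term
        e += eta ** 2 * (np.outer(w, w) * K[np.ix_(pi, pi)]).sum()    # second term
        e += 2.0 * eta * a * (w * W[k, pi]).sum()                     # third term
    return e

Stopping then amounts to checking e^φ < ε or t > t_max.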
Demonstration of COLL Winner Updating

Figure: The mapping φ embeds the data into a feature space where the nonlinear pattern becomes linear. COLL is then performed in this feature space. The linear separator in the feature space is actually nonlinear in the input space, and so is the update of the winner.
Digit Clustering

Table: Summary of four digit datasets.

Dataset     n      c   d    Balanced   σ
Pendigits   10992  10  16   ×          60.53
Mfeat       2000   10  649  √          809.34
USPS        11000  10  256  √          1286.70
MNIST       5000   10  784  √          2018.30

Figure: Some samples of MNIST.
Comparing Convergence Rate

Figure: Comparing the convergence rate of kernel k-means and the proposed COLL, in terms of log(e^φ) as a function of the iteration step, on (a) Pendigits, (b) Mfeat, (c) USPS, and (d) MNIST.
Comparing Distortion Error

Figure: Comparing the distortion error as a function of the number of clusters for kernel k-means, global kernel k-means, and the proposed COLL on (a) Pendigits, (b) Mfeat, (c) USPS, and (d) MNIST.
Comparing Using Internal & External Measures

Table: Average distortion error (DE).

Dataset     Kk-means   Gkk-means   COLL
Pendigits   6704.0     6664.9      6619.3
Mfeat       1405.5     1363.3      1324.2
USPS        8004.7     7782.0      7561.0
MNIST       3034.3     2894.1      2754.9

Table: Average normalized mutual information (NMI).

Dataset     Kk-means   Gkk-means   COLL
Pendigits   0.715      0.736       0.753
Mfeat       0.533      0.542       0.604
USPS        0.354      0.369       0.461
MNIST       0.441      0.472       0.520
Video Clustering

Figure: Clustering frames of ANNI002 into 16 scenes using COLL. The discovered scenes (frame ranges) are:

Cluster 1: 0001-0061    Cluster 2: 0062-0163    Cluster 3: 0164-0289    Cluster 4: 0290-0400
Cluster 5: 0401-0504    Cluster 6: 0505-0699    Cluster 7: 0700-0827    Cluster 8: 0828-1022
Cluster 9: 1023-1083    Cluster 10: 1084-1238   Cluster 11: 1239-1778   Cluster 12: 1779-1983
Cluster 13: 1984-2119   Cluster 14: 2120-2243   Cluster 15: 2244-2400   Cluster 16: 2401-2492
Comparison Results

Table: The means of NMI and computational time (in seconds) on the 11 video sequences.

Video (#frames)    kk-means        gkk-means       Proposed COLL
                   NMI    Time     NMI    Time     NMI    Time
ANNI001 (914)      0.781  72.2     0.801  94.0     0.851  70.4
ANNI002 (2492)     0.705  94.7     0.721  126.4    0.741  89.0
ANNI003 (4265)     0.712  102.2    0.739  139.2    0.762  99.5
ANNI004 (3897)     0.731  98.3     0.750  121.6    0.759  93.6
ANNI005 (11361)    0.645  152.2    0.656  173.3    0.680  141.2
ANNI006 (16588)    0.622  193.0    0.638  255.5    0.642  182.3
ANNI007 (1588)     0.727  81.1     0.740  136.7    0.770  79.1
ANNI008 (2773)     0.749  95.9     0.771  119.0    0.794  81.5
ANNI009 (12304)    0.727  167.0    0.763  184.4    0.781  160.4
ANNI010 (30363)    0.661  257.2    0.709  426.4    0.734  249.0
ANNI011 (1987)     0.738  85.4     0.749  142.7    0.785  83.7
Summary

1. We have shown that existing methods for kernel clustering may suffer from the ill-initialization problem.
2. We present a conscience on-line learning (COLL) approach to address ill-initialization.
3. In the future, we may explore the on-line learning framework with other mechanisms, such as rival penalization, to realize automatic cluster-number selection in kernel clustering.
Thank you very much! Q&A