Proceedings of the World Congress on Engineering 2010 Vol I WCE 2010, June 30 - July 2, 2010, London, U.K.
On An Interpretation Of Spectral Clustering Via Heat Equation And Finite Elements Theory

Sandrine Mouysset, Joseph Noailles and Daniel Ruiz∗

∗ IRIT-ENSEEIHT, University of Toulouse, 2 rue Charles Camichel, BP 122, 31017 Toulouse, France. {sandrine.mouysset,joseph.noailles,daniel.ruiz}@enseeiht.fr
Abstract— Spectral clustering methods use the eigenvectors of a matrix, called the Gaussian affinity matrix, in order to define a low-dimensional space in which data points can be clustered. This matrix is widely used and depends on a free parameter σ. It is usually interpreted as some discretization of the Green kernel of the Heat Equation. Combining tools from Partial Differential Equations and Finite Elements theory, we propose an interpretation of this spectral method which offers an alternative point of view on how spectral clustering works. This approach exploits a particular geometrical property of the eigenfunctions of a specific partial differential equation problem. We analyze experimentally how this geometrical property is recovered in the eigenvectors of the affinity matrix, and we also study the influence of the parameter σ.

Keywords: Clustering, Machine Learning, Spatial Data Mining
1 Introduction
Clustering aims to partition a data set by grouping similar elements into subsets. Two main issues arise: on the one hand, the choice of a similarity criterion and, on the other hand, how to separate clusters from one another. Spectral methods, and in particular the spectral clustering algorithm introduced by Ng, Jordan and Weiss (NJW) [8], are useful when considering non-convex shaped subsets of points. Spectral clustering (SC) consists in defining a low-dimensional data space, by selecting particular eigenvectors of a matrix called the normalized affinity matrix, in which data points are clustered. In this clustering method, two main problems arise: first, what sense can we give to the notion of cluster when considering finite discrete data sets, and second, how can we link these clusters to some spectral elements (the eigenvalues/eigenvectors extracted in SC). Spectral Clustering can be analyzed using graph theory and some normalized cut criterion, leading to the solution of some trace-maximization problems (see, for instance, [8], [9]). As the aim is to group data points in some sense of closeness, Spectral Clustering can also be interpreted by exploiting the neighbourhood property in the discretization of the Laplace-Beltrami operator and heat kernel on
manifolds [2]: in the algorithm, the neighbourhood property is represented by the adjacency graph step. Moreover, the consistency of the method was investigated by considering the asymptotic behaviour of clustering when samples become very large [11, 10]. Some properties under standard assumptions were proved for the normalized Laplacian matrices, including the convergence of the first eigenvectors to eigenfunctions of some limit operator [11, 7]. To the best of our knowledge, the main results establish this relation for very large numbers of points. However, from a numerical point of view, SC still works for smaller data sets. So, in this paper, we try to give explanations that may apply to a given data sample. A second problem appears when considering the most widely used affinity matrix in spectral clustering techniques, as in the NJW algorithm. It is based on the Gaussian measure between points, which depends on a free scale parameter σ that has to be properly defined. Many investigations of this parameter were carried out, and several definitions were suggested either heuristically [12, 3], from physical considerations [6], or from a geometrical point of view [5]. The difficulty of fixing this choice seems to be tightly connected to the lack of some clustering property explaining how the grouping in this low-dimensional space correctly defines the partitioning in the original data. We will show how this free parameter affects clustering results. In this paper, as the spectral elements used in SC do not explicitly give this topological criterion for a discrete data set, we go back to some continuous formulation wherein clusters appear as disjoint subsets, so that the eigenvectors of the affinity matrix A can be interpreted as eigenfunctions of some operator. This step is carried out (section 3) using Finite Elements (FE). Indeed, with FE whose nodes correspond to data points, the representation of any $L^2$ function is given by its nodal values. So we expect to interpret both A and its eigenvectors as representations of, respectively, an $L^2$ operator and $L^2$ functions on some bounded regular domain Ω. Now, the commonly used operator whose Finite Elements representation matches A is the kernel of the Heat equation on unbounded space (noted $K_H$). As its spectrum is essential, we cannot interpret the eigenvectors of A as a representation of eigenfunctions of $K_H$. However, on a bounded domain O, $K_H$ is close to $K_D$, the kernel of the Heat equation on a bounded domain Ω, for Ω including
strictly O (section 2.2). Now, the operator $S_D$ (convolution by $K_D$) has eigenfunctions $(v_{n_i})$ in $H_0^1(\Omega)$. From this, we show (Proposition 1) that the operator $S_H$ (convolution by $K_H$) admits these $v_{n_i}$ as near eigenfunctions, up to a residual (noted η). Finally, using a Finite Elements approximation, we show (Proposition 2) that the eigenvectors of A are a representation of these eigenfunctions, up to a residual (noted ψ). To summarize, the main result of this paper is that, for a fixed data set of points, the eigenvectors of A are the representation of functions whose support is included in only one connected component at a time. The accuracy of this representation is shown to depend, for a fixed density of points, on the affinity parameter t. This result is illustrated on small data sets (section 4).
1.1 Spectral Clustering Algorithm
Let us first give some notations and recall the NJW algorithm for a data set of n points in a p-dimensional Euclidean space. Assume that the number of targeted clusters k is known. The algorithm consists of a few steps, described as follows:

Algorithm 1: Spectral Clustering Algorithm

Input: data set $\{x_1,\ldots,x_n\}$, number of clusters k

1. Form the affinity matrix $A \in \mathbb{R}^{n\times n}$ defined by:

$$A_{ij} = \begin{cases} \exp\left(-\dfrac{\|x_i - x_j\|^2}{2\sigma^2}\right) & \text{if } i \neq j,\\[4pt] 0 & \text{otherwise,} \end{cases} \tag{1}$$

2. Construct the normalized matrix $L = D^{-1/2} A D^{-1/2}$ with $D_{ii} = \sum_{j=1}^{n} A_{ij}$,

3. Assemble the matrix $X = [X_1\, X_2 \ldots X_k] \in \mathbb{R}^{n\times k}$ by stacking the eigenvectors associated with the k largest eigenvalues of L,

4. Form the matrix Y by normalizing each row of the n × k matrix X,

5. Treat each row of Y as a point in $\mathbb{R}^k$, and group them into k clusters via the K-means method,

6. Assign the original point $x_i$ to cluster j when row i of matrix Y belongs to cluster j.
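For concreteness, the following is a minimal sketch of Algorithm 1 in Python, assuming NumPy and SciPy are available (the function name njw_spectral_clustering is ours, and the K-means step is delegated to scipy.cluster.vq.kmeans2):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def njw_spectral_clustering(X, k, sigma):
    """Sketch of the NJW algorithm for data X (n points, p features)."""
    # Step 1: Gaussian affinity matrix with zero diagonal.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Step 2: normalized matrix L = D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # Step 3: eigenvectors of the k largest eigenvalues (L is symmetric,
    # and eigh returns eigenvalues in ascending order).
    eigvals, eigvecs = np.linalg.eigh(L)
    Xk = eigvecs[:, -k:]
    # Step 4: normalize each row of Xk.
    Y = Xk / np.linalg.norm(Xk, axis=1, keepdims=True)
    # Steps 5-6: cluster the rows of Y with K-means.
    _, labels = kmeans2(Y, k, minit="++")
    return labels
```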
In this paper, we will not take into account the normalization of the affinity matrix described in step 2. Instead, we focus on the two main steps of this algorithm: assembling the Gaussian affinity matrix and extracting its eigenvectors to create the low-dimensional space.
2 Clustering property of the Heat kernel
As recalled in the introduction, Spectral Clustering consists in selecting particular eigenvectors of the normalized affinity matrix defined by (1). The Gaussian affinity coefficients $A_{ij}$ are usually interpreted as a representation of the Heat kernel evaluated at the data points $x_i$ and $x_j$. Let

$$K_H(t,x,y) = (4\pi t)^{-\frac{p}{2}} \exp\left(-\frac{\|x-y\|^2}{4t}\right)$$

be the Heat kernel in $\mathbb{R}_+^* \times \mathbb{R}^p \times \mathbb{R}^p$. Note that the Gaussian affinity between two distinct data points $x_i$ and $x_j$ is given by $A_{ij} = (2\pi\sigma^2)^{\frac{p}{2}}\, K_H(\sigma^2/2, x_i, x_j)$. Now, consider the initial value problem in $L^2(\mathbb{R}^p)$, for $f \in L^2(\mathbb{R}^p)$:

$$(P_{\mathbb{R}^p})\quad \begin{cases} \partial_t u - \Delta u = 0 & \text{for } (x,t) \in \mathbb{R}^p \times \mathbb{R}_+\setminus\{0\},\\ u(x,0) = f & \text{for } x \in \mathbb{R}^p, \end{cases}$$

and introduce the solution operator in $L^2(\mathbb{R}^p)$ of $(P_{\mathbb{R}^p})$:

$$(S_H(t)f)(x) = \int_{\mathbb{R}^p} K_H(t,x,y)\,f(y)\,dy, \quad x \in \mathbb{R}^p. \tag{2}$$
Now if $f(y) \approx \delta_{x_j}(y)$, where $\delta_{x_j}$ denotes the Dirac function at $x_j$, one can observe that:

$$\left(S_H\!\left(\tfrac{\sigma^2}{2}\right)f\right)(x_i) \approx K_H\!\left(\tfrac{\sigma^2}{2}, x_i, x_j\right) = \left(2\pi\sigma^2\right)^{-\frac{p}{2}} (A + I_n)_{ij}.$$

Thus, the spectral properties of the matrix A used in the Spectral Clustering algorithm seem to be related to those of the operator $S_H(t)$. We propose to analyze this in detail in the following.
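As a quick sanity check of this identification, the snippet below (a sketch assuming NumPy; the sampled kernel and the affinity are built directly from the definitions above, with arbitrary illustration values for σ and the two points) verifies numerically that $(2\pi\sigma^2)^{p/2} K_H(\sigma^2/2, x_i, x_j)$ reproduces the off-diagonal Gaussian affinity coefficient:

```python
import numpy as np

def heat_kernel(t, x, y):
    # K_H(t, x, y) = (4*pi*t)^(-p/2) * exp(-||x - y||^2 / (4t))
    p = x.shape[0]
    return (4 * np.pi * t) ** (-p / 2) * np.exp(-np.sum((x - y) ** 2) / (4 * t))

sigma = 0.5
xi, xj = np.array([0.0, 1.0]), np.array([1.0, 2.0])
p = xi.shape[0]

A_ij = np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))          # affinity (1), i != j
K_ij = (2 * np.pi * sigma ** 2) ** (p / 2) * heat_kernel(sigma ** 2 / 2, xi, xj)

assert np.isclose(A_ij, K_ij)  # the two quantities coincide
```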
2.1 Heat equation with Dirichlet boundary conditions
To simplify, in the following we fix the number of clusters k to 2, but the discussion extends to any arbitrary number k. Consider then a bounded open set Ω in $\mathbb{R}^p$ made up of two disjoint connected open subsets Ω1 and Ω2. Assume that the boundary ∂Ω = ∂Ω1 ∪ ∂Ω2, with ∂Ω1 ∩ ∂Ω2 = ∅, is regular enough so that the trace operator is well-posed on ∂Ω and the spectral decomposition of the Laplacian operator is well-defined; for instance, ∂Ω is $C^1$ on both ∂Ω1 and ∂Ω2. According to [4], determining the eigenvalues $(\lambda_n(\Omega))_{n>0}$ and associated eigenfunctions $(v_n)_{n>0}$ of the Dirichlet Laplacian on Ω:

$$(P_\Omega^L)\quad \begin{cases} \Delta v_n = \lambda v_n & \text{in } \Omega,\\ v_n = 0 & \text{on } \partial\Omega, \end{cases}$$

amounts to determining the eigenvalues $(\lambda_{n_1}(\Omega_1))_{n_1>0}$ and $(\lambda_{n_2}(\Omega_2))_{n_2>0}$ and the eigenfunctions of the Dirichlet Laplacian on Ω1 and Ω2 separately. In other words, if $v \in H_0^1(\Omega)$, then $\Delta v = \lambda v$ on Ω if and only if, for $i \in \{1,2\}$, $v_i = v_{|\Omega_i} \in H_0^1(\Omega_i)$ satisfies $\Delta v_i = \lambda v_i$ on $\Omega_i$. Therefore, $\{\lambda_n(\Omega)\}_{n>0} =$
$\{\lambda_{n_1}(\Omega_1)\}_{n_1>0} \cup \{\lambda_{n_2}(\Omega_2)\}_{n_2>0}$. Additionally, if we consider $v_{n_i}$, the solution on $\Omega_i$ of $(P_{\Omega_i}^L)$, an eigenfunction of the Laplacian operator Δ associated to $\lambda_{n_i}$, and if we extend $v_{n_i}$ to the whole set Ω by setting $v_{n_i} = 0$ in $\Omega\setminus\Omega_i$, we also get an eigenfunction of the Laplacian Δ in $H_0^1(\Omega)$ associated to the same eigenvalue $\lambda_{n_i}$. Note that since Ω1 and Ω2 are disjoint, the extension by 0 to the whole set Ω of any function in $H_0^1(\Omega_i)$ is a function in $H_0^1(\Omega)$. Consequently, the union of the two sets of eigenfunctions $\{(v_{n_i})_{n_i>0},\ i \in \{1,2\}\}$ is a Hilbert basis of $H_0^1(\Omega)$.

Now let $f \in L^2(\Omega)$ and consider the heat problem with Dirichlet conditions on the bounded set Ω = Ω1 ∪ Ω2:

$$(P_\Omega^D)\quad \begin{cases} \partial_t u - \Delta u = 0 & \text{in } \mathbb{R}_+ \times \Omega,\\ u = 0 & \text{on } \mathbb{R}_+ \times \partial\Omega,\\ u(t=0) = f & \text{in } \Omega. \end{cases}$$

Denote by $K_D^\Omega$ the Green's kernel of $(P_\Omega^D)$. The solution operator in $H^2(\Omega) \cap H_0^1(\Omega)$ associated to this problem is defined, for $f \in L^2(\Omega)$, by:

$$S_D^\Omega(t)f(x) = \int_\Omega K_D^\Omega(t,x,y)\,f(y)\,dy, \quad x \in \Omega.$$

The eigenfunctions of $S_D^\Omega(t)$ and those of the Laplacian operator Δ are the same. Indeed, if $v_{n_i}$ is an eigenfunction of $(P_\Omega^L)$ associated to an eigenvalue $\lambda_{n_i}$, then $v_{n_i}$ is also an eigenfunction of $S_D^\Omega(t)$ with eigenvalue $e^{t\lambda_{n_i}}$, in the sense that $S_D^\Omega(t)v_{n_i}(x) = e^{t\lambda_{n_i}} v_{n_i}(x)$. Thus the geometrical property of the eigenfunctions is preserved, i.e. the support of each eigenfunction lies in only one connected component, Ω1 or Ω2.
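As a concrete illustration (a hypothetical one-dimensional example, not taken from the paper), take Ω = Ω1 ∪ Ω2 with Ω1 = (0,1) and Ω2 = (2,3). The Dirichlet eigenfunctions on each component are sines, and extending them by zero gives eigenfunctions on the whole of Ω:

$$v_{n_1}(x) = \begin{cases} \sin(n_1\pi x) & x \in (0,1),\\ 0 & x \in (2,3), \end{cases} \qquad \Delta v_{n_1} = -(n_1\pi)^2\, v_{n_1} \text{ on } \Omega,$$

and similarly $v_{n_2}(x) = \sin(n_2\pi(x-2))$ on Ω2, extended by zero on Ω1. Each eigenfunction is thus supported in exactly one connected component, which is the geometrical property used throughout this section.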
2.2 A clustering property from eigenfunctions of the Heat equation with Dirichlet boundary conditions
We now consider the solution operator $S_H(t)$ of $(P_{\mathbb{R}^p})$. As opposed to the previous case, we cannot deduce similar properties as with the solution operator of $(P_\Omega^D)$, because the spectrum of $S_H(t)$ is essential and eigenfunctions are not localized in $\mathbb{R}^p$ without boundary conditions. However, we are going to compare, for fixed values of t, $S_H(t)$ and $S_D^\Omega(t)$, and derive some asymptotic properties of $S_H(t)$ that approximate the previously analyzed properties of $S_D^\Omega(t)$. For ε > 0, let us introduce an open subset O which approximates the open set Ω from the interior, in the sense that $O \subset \bar{O} \subset \Omega$ and Volume(Ω\O) ≤ ε. Denote by δ(y) the distance of a point y ∈ O to ∂Ω, the boundary of Ω, and by $\delta = \inf_{y\in O} \delta(y)$ the distance from O to ∂Ω (see figure 1). Finally, O is supposed to be chosen such that δ > 0. As recalled in [1], for t > 0 and x, y in Ω, $K_H(t,x,y)$ is a distribution that corresponds to the elementary solutions
of the heat equation $(P_{\mathbb{R}^p})_{|\Omega}$. $K_H$ is strictly positive on the boundary ∂Ω and, due to the maximum principle, $K_D$ is bounded above by $K_H$, for all $(t,x,y) \in \mathbb{R}_+ \times \Omega \times \Omega$:

$$0 < K_D(t,x,y) < K_H(t,x,y). \tag{3}$$

[Figure 1: (a) Approximation of Ω1 with an open set O1; (b) mesh representation: $x_i \in \mathring{\Omega}$, ∀i ∈ {1,..,n}.]
Furthermore, the Green's kernels $K_H$ and $K_D^{\Omega_i}$ approximate each other on each open set $O_i$, $i \in \{1,2\}$, and their difference can be estimated, for all $(x,y) \in O_i \times O_i$ and all $i \in \{1,2\}$, by:

$$0 < K_H(t,x,y) - K_D^{\Omega_i}(t,x,y) \le \frac{1}{(4\pi t)^{\frac{p}{2}}}\, e^{-\frac{\delta(y)^2}{4t}}. \tag{4}$$
Now, for any open set Θ and any $f \in L^2(\Theta)$, define the restriction of the operator $S_H$ to Θ by:

$$S_H^\Theta(t)f(x) = \left(\int_\Theta K_H(t,x,y)\,f(y)\,dy\right)_{|\Theta}, \quad x \in \Theta.$$

We then come back to the heat solution $S_H(t)f$ and restrict its support to the open set $O_i$, for $i \in \{1,2\}$.
Finally, define $T_i$, for $i \in \{1,2\}$, as the $L^2$-mapping from $L^2(O_i)$ into $L^2(O)$ which extends any function $u \in L^2(O_i)$ by zero:

$$\forall u \in L^2(O_i), \quad T_i(u) = \begin{cases} u & \text{in } O_i,\\ 0 & \text{in } \Omega\setminus O_i. \end{cases} \tag{5}$$

With these notations, we can state the following proposition.

Proposition 1. Let $i \in \{1,2\}$ and $\widetilde{v}_{n_i} = T_i(v_{n_i|O_i})$, where $v_{n_i}$ is the eigenfunction in $H^2(\Omega_i) \cap H_0^1(\Omega_i)$ associated to the eigenvalue $e^{-\lambda_{n_i} t}$ of the operator $S_D^{\Omega_i}$. Then, for t > 0, we have the following result:

$$S_H^O(t)\,\widetilde{v}_{n_i} = e^{-\lambda_{n_i} t}\,\widetilde{v}_{n_i} + \eta(t, \widetilde{v}_{n_i}), \tag{6}$$

with $\|\eta(t,\widetilde{v}_{n_i})\|_{L^2(O)} \to 0$ when $t \to 0$ and $\delta \to 0$.

Proof sketch. By the triangle inequality, $\|\eta(t,\widetilde{v}_{n_i})\|_{L^2(O)} = \|S_H^O(t)\widetilde{v}_{n_i} - e^{-\lambda_{n_i}t}\,\widetilde{v}_{n_i}\|_{L^2(O)}$ is bounded above by three terms:

$$\|\eta(t,\widetilde{v}_{n_i})\|_{L^2(O)} \le \|\varsigma(t, v_{n_i|O_i})\|_{L^2(O)} + \left\|S_D^{\Omega_i}(t)\left(T_i(v_{n_i|O_i}) - v_{n_i}\right)\right\|_{L^2(O)} + \left\|S_D^{\Omega_i}(t)v_{n_i} - e^{-\lambda_{n_i}t}\,\widetilde{v}_{n_i}\right\|_{L^2(O)}.$$

On the right-hand side, $T_i(v_{n_i|O_i}) - v_{n_i}$ has its support included in $\Omega_i \setminus O_i$, and $\Omega_i \setminus O_i$ tends to ∅ when δ → 0. Moreover, the operator $S_D^{\Omega_i}(t)$ tends to the identity when t → 0. So the last two terms on the right-hand side of this inequality vanish when (t, δ) → 0. For $v_{n_i|O_i} \in L^2(O_i)$, the first term $\|\varsigma(t, v_{n_i|O_i})\|_{L^2(O)} = \|S_H^O(t)\widetilde{v}_{n_i} - S_D^{\Omega_i}(t)\widetilde{v}_{n_i}\|_{L^2(O)}$ can be bounded above by means of equations (3) and (4), for t > 0 and all $i \in \{1,2\}$, by:

$$(4\pi t)^{-\frac{p}{2}}\, \|v_{n_i|O_i}\|_{L^2(O_i)}^2 \left(e^{-\frac{\delta^2}{2t}}\,\mathrm{Vol}(O_i) + e^{-\frac{\gamma^2}{4t}}\,\mathrm{Vol}(O)\right),$$

with $\gamma = \inf_{x \in O_j,\, y \in O_i} \|x-y\| \ge 2\delta$.
Proposition 1 means that the operator $S_H^O(t)$ has 'almost eigenfunctions', which are exactly those of the Dirichlet problem $(P_\Omega^D)$ restricted to $O_i \subset \Omega_i$. Moreover, it makes this closeness precise through its dependence on t. Furthermore, these eigenfunctions are supported in a single connected component, which means that they can be used to characterize whether a point belongs to a given $\Omega_i$ or not. Finally, if we consider the behaviour of the function $s \mapsto (4\pi s)^{-\frac{p}{2}} e^{-\frac{\delta(y)^2}{4s}}$ involved in (4), we may assume that the parameter t should be smaller than $\delta(y)^2$ for all y ∈ O, and should tend to 0 within a fraction of the squared distance $\delta^2$, to preserve the asymptotic result of Proposition 1.
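To see how strong this constraint is, one can evaluate the factor numerically; the short sketch below uses δ = 1 and p = 2 as arbitrary illustration values. Differentiating its logarithm shows the factor peaks at s = δ²/(2p), and it is negligible only when s is a small fraction of δ²:

```python
import numpy as np

def residual_factor(s, delta, p):
    # (4*pi*s)^(-p/2) * exp(-delta^2 / (4*s)), the factor appearing in (4)
    return (4 * np.pi * s) ** (-p / 2) * np.exp(-delta ** 2 / (4 * s))

delta, p = 1.0, 2
for frac in [0.01, 0.05, 0.1, 0.25, 0.5, 1.0]:
    s = frac * delta ** 2
    print(f"s = {frac:4.2f} * delta^2  ->  factor = {residual_factor(s, delta, p):.3e}")
```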
3 Interpretation of the Spectral Clustering algorithm with Finite Elements
Let us introduce Lagrange finite elements whose nodes are the set Σ of data points of the NJW algorithm, included in $\bar{O} \subset \mathbb{R}^p$. Let $\tau_h$ be a triangulation of $\bar{O}$ such that $h = \max_{K \in \tau_h} h_K$, $h_K$ being a characteristic length of the triangle K. The meshing of each part of O is represented in figure 1 (b). Thus, consider a finite decomposition of the domain $\bar{O} = \cup_{K\in\tau_h} K$ in which $(K, P_K, \Sigma_K)$ satisfies the Lagrange finite element assumptions for all $K \in \tau_h$: K is a compact, non-empty set of $\tau_h$ such that supp(K) ⊂ O; $P_K$ is a finite-dimensional vector space generated by shape functions $(\phi_i)_i$ such that $P_K \subset H^1(K)$; and $\Sigma = \cup_{K\in\tau_h} \Sigma_K$. Let us define the finite-dimensional approximation space:

$$V_h = \{w \in C^0(\bar{O});\ \forall K \in \tau_h,\ w_{|K} \in P_K\}.$$

For all i, 1 ≤ i ≤ n, let $(\phi_i)_i : O \to \mathbb{R}$ denote the shape functions of $V_h$, with $\phi_i = \phi_{|K} \in P_K$ satisfying $\phi_i(x_j) = \delta_{ij}$, ∀ $x_j \in \Sigma$, $\delta_{ij}$ being the Kronecker delta. This finite space $V_h$ is generated by the sequence $(\phi_i)_i$, with $V_h = \mathrm{Vect}\langle(\phi_i)_i\rangle \subset L^2(O,\mathbb{R})$. Let $\Pi_h$ be the linear interpolation from $L^2(O,\mathbb{R})$ onto $V_h$ with the usual scalar product $(\cdot|\cdot)_{L^2}$. With these notations, the $\Pi_h$-mapping of the kernel $y \mapsto K_H(t, x-y)$ is defined, for
t > 0, by, ∀y ∈ O:

$$\Pi_h(K_H(t, x-y)) = \frac{1}{(4\pi t)^{\frac{p}{2}}} \sum_{i=1}^{n} \exp\left(-\frac{\|x-x_i\|^2}{4t}\right)\phi_i(y).$$
Thus, for t > 0, the $\Pi_h$-mapped operator $S_H^O$ applied to each $\phi_j$ is, ∀ 1 ≤ j ≤ n:

$$\Pi_h(S_H^O(t)\phi_j)(x) = \frac{1}{(4\pi t)^{\frac{p}{2}}} \sum_{k=1}^{n} \left(\hat{A}M\right)_{kj} \phi_k(x) + O(h, x), \tag{7}$$

where M stands for the mass matrix defined by $M_{ij} = (\phi_i|\phi_j)_{L^2}$ and $\hat{A} = A + I_n$, where A is the affinity matrix. So, with the same assumptions as in section 2.2, equation (1) can be formulated via the linear interpolation $\Pi_h$:
Proposition 2. Let $v_{n_i}$ be an eigenfunction of $S_D^{\Omega_i}(t)$ in $H^2(\Omega) \cap H_0^1(\Omega)$ associated to the eigenvalue $e^{-\lambda_{n_i}t}$, such that $S_D^{\Omega_i}(t)v_{n_i} = e^{-\lambda_{n_i}t}\, v_{n_i}$. Let $i \in \{1,2\}$ and $\widetilde{v}_{n_i} = T_i(v_{n_i|O_i})$. Then, for $W_{n_i} = \Pi_h \widetilde{v}_{n_i} \in V_h$ and t > 0:

$$\Pi_h S_H^O(t)\, W_{n_i} = e^{-\lambda_{n_i}t}\, W_{n_i} + \psi(t,h), \tag{8}$$

where $\|\psi(t,h)\|_{L^2(V_h)} \to 0$ when (h,t) → 0 and δ → 0.

Proof sketch. Apply the $\Pi_h$ interpolation to the equation of Proposition 1. Using the linearity and continuity of $\Pi_h$ and the Cauchy-Schwarz inequality, an upper bound of $\|\Pi_h S_H^O(t)(\Pi_h \widetilde{v}_{n_i} - \widetilde{v}_{n_i})\|_2$ can be given, for a given constant $C_{\Pi_h} > 0$, by:

$$C_{\Pi_h}\,(4\pi t)^{-\frac{p}{4}}\, \|v_{n_i|O_i}\|_{L^2(O_i)} \left(e^{-\frac{\gamma^2}{4t}}\,\mathrm{Vol}(O)^{\frac{3}{2}} + e^{-\frac{\delta^2}{2t}}\right)^{\frac{1}{2}}.$$
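As a small illustration of the discrete operator $\hat{A}M$ appearing in (7), the following one-dimensional sketch uses hypothetical data: in 1D the P1 (hat-function) mass matrix is tridiagonal and can be assembled exactly, and the affinity is built at scale σ² = 2t so that $A_{ij} = \exp(-\|x_i-x_j\|^2/(4t))$, matching the interpolated kernel:

```python
import numpy as np

def p1_mass_matrix(x):
    """Mass matrix M_ij = (phi_i | phi_j)_{L^2} for 1D P1 (hat) elements
    on sorted nodes x_1 < ... < x_n: tridiagonal, classical h/3 and h/6 weights."""
    n = len(x)
    h = np.diff(x)                      # element lengths
    M = np.zeros((n, n))
    for i in range(n - 1):
        M[i, i]         += h[i] / 3.0   # contributions of element [x_i, x_{i+1}]
        M[i + 1, i + 1] += h[i] / 3.0
        M[i, i + 1]     += h[i] / 6.0
        M[i + 1, i]     += h[i] / 6.0
    return M

# Nodes forming two well-separated 1D clusters (hypothetical data).
x = np.sort(np.concatenate([np.linspace(0.0, 1.0, 20), np.linspace(3.0, 4.0, 20)]))
t = 0.01
A = np.exp(-(x[:, None] - x[None, :]) ** 2 / (4.0 * t))
np.fill_diagonal(A, 0.0)
A_hat = A + np.eye(len(x))
B = A_hat @ p1_mass_matrix(x)           # discrete operator appearing in (7)
eigvals, eigvecs = np.linalg.eig(B)     # B is not symmetric, use the general solver
```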
In this approach, we have to estimate the neighbourhood distance δ between $\Omega_i$ and $O_i$ for $i \in \{1,2\}$. But this result relies entirely on both the discretization step h and the Gaussian parameter t, so we investigate their roles in the following numerical experiments.
4 Numerical experiments
Let us consider two test cases: one in which clusters can be separated by hyperplanes (see figure 2(a)), and another one in which clusters are embedded (see figure 3(a)). We analyze the influence of the parameter t on the difference between the discretized eigenfunctions of $S_D^O(t)$ and the eigenvectors of the matrix $\hat{A}M$. In figures 2 (b)-(c)-(d) and 3 (b)-(c)-(d), we plot the discretized eigenfunction associated to the first eigenvalue of each connected component. We also show, in figures 2 (e)-(f)-(g) and (h)-(i)-(j), the product of these discretized eigenfunctions $W_{n_i}$, $i \in \{1,2,3\}$, with the matrices $\hat{A}M$ and A respectively. This shows that the locality of the eigenfunctions is preserved by the discretized operator $\hat{A}M$ or A. Still, to get better insight into this geometrical invariance property, we finally compare the projection coefficients between these discretized eigenfunctions $W_{n_i}$ and the eigenvectors of $\hat{A}M$ and A that maximize these projection coefficients. In other words, let $W_{n_i} = \Pi_h \widetilde{v}_{n_i} \in V_h$ and let $X_l$ be the eigenvector of $\hat{A}M$ such that $l = \mathrm{argmax}_j\, (W_{n_i}|X_j)$; the projection coefficient then corresponds to

$$\omega = \frac{|(W_{n_i}|X_l)|}{\|W_{n_i}\|_2\,\|X_l\|_2}.$$

Since the matrix $\hat{A}M$ depends on the parameter t, we show in figure (k) how this coefficient ω evolves with varying values of t. In figure (l), we also show the norm of the difference $\alpha = \|X_l - W_{n_i}\|_2$. We do the same in figures (m) and (n) with respect to the eigenvectors of the affinity matrix A instead of $\hat{A}M$, noted respectively τ and β. The projection coefficient ω, which gets close to 1 for some optimal value of t, highlights the fact that discretized eigenfunctions of the Laplacian operator with Dirichlet boundary conditions on some wider subset $\Omega_i$ that includes the connected component $O_i$ are indeed close to eigenvectors of the discretized heat operator without boundary conditions, but with a parameter t that needs to be chosen in an appropriate window.
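A minimal sketch of how ω and α can be computed (assuming W is a discretized eigenfunction and the columns of V are the eigenvectors of $\hat{A}M$, both as NumPy arrays; the inner product (·|·) is taken as the plain Euclidean one, and both vectors are normalized before computing α, which presumes a consistent scaling of the eigenvectors):

```python
import numpy as np

def projection_coefficient(W, V):
    """Return omega and alpha for the eigenvector in V best aligned with W."""
    scores = np.abs(V.T @ W)                 # |(W | X_j)| for every eigenvector
    l = int(np.argmax(scores))               # index maximizing the projection
    Xl = V[:, l]
    omega = scores[l] / (np.linalg.norm(W) * np.linalg.norm(Xl))
    alpha = np.linalg.norm(Xl / np.linalg.norm(Xl) - W / np.linalg.norm(W))
    return omega, alpha
```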
[Figure 2: Example 1: (a) Data set with n = 302 points; (b), (c), (d) discretized eigenfunctions $W_{n_i}$, $i \in \{1,2,3\}$; (e)-(f)-(g) product $\hat{A}M\,W_{n_i}$; (h)-(i)-(j) product $A\,W_{n_i}$; (k)-(l) correlation ω and Euclidean norm difference α between $W_{n_i}$, $i \in \{1,2,3\}$, and eigenvectors $X_l$ of $\hat{A}M$ as functions of t; (m)-(n) correlation τ and Euclidean norm difference β between $W_{n_i}$, $i \in \{1,2,3\}$, and eigenvectors $Y_l$ of A as functions of t.]

[Figure 3: Example 2: (a) Data set with n = 368 points; (b), (c), (d) discretized eigenfunctions $W_{n_i}$, $i \in \{1,2,3\}$; (e)-(f)-(g) product $\hat{A}M\,W_{n_i}$; (h)-(i)-(j) product $A\,W_{n_i}$; (k)-(l) correlation ω and Euclidean norm difference α between $W_{n_i}$, $i \in \{1,2,3\}$, and eigenvectors $X_l$ of $\hat{A}M$ as functions of t; (m)-(n) correlation τ and Euclidean norm difference β between $W_{n_i}$, $i \in \{1,2,3\}$, and eigenvectors $Y_l$ of A as functions of t.]
The vertical dash-dot line in figures (k)-(l)-(m)-(n) indicates the value of the parameter t based on the geometrical point of view proposed in [5]:

$$t_h = \frac{D_{max}}{2\, n^{1/p}}, \quad \text{with } D_{max} = \max_{1\le i,j\le n} \|x_i - x_j\|_2.$$

At first glance, it is close to an optimum in all these figures. This critical value $t_h$ seems to give a good estimate of the length δ between clusters. It relies on the assumption that the p-dimensional data set is isotropic enough, in the sense that no direction is privileged with very different magnitudes in the distances between points along it. It shows the link between eigenvectors and eigenfunctions with respect to the parameter t, and that $t_h$ is a good candidate to perform Spectral Clustering.
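A direct computation of this heuristic (a sketch assuming the data points are the rows of a NumPy array X):

```python
import numpy as np

def heuristic_t(X):
    """t_h = D_max / (2 * n^(1/p)), with D_max the data-set diameter."""
    n, p = X.shape
    d_max = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)).max()
    return d_max / (2.0 * n ** (1.0 / p))
```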
1
5
Conclusion
In this paper, we analyze some steps of Spectral Clustering by analogy with eigenvalue problems. From this interpretation, a clustering property based on the eigenvectors and an asymptotic condition on the Gaussian parameter have been extracted. This leads to some interpretation of how spectral clustering works and of how results may be affected by the affinity parameter and the separation between clusters. However, we did not take into account the normalization step of the spectral clustering algorithm, whose role is crucial to order the eigenvalues in an appropriate way, so that connected components are easy to detect by means of the eigenvectors associated with the largest eigenvalues.
References
[1] C. Bardos and O. Lafitte. Une synthèse de résultats anciens et récents sur le comportement asymptotique des valeurs propres du Laplacien sur une variété riemannienne. CMLA, 1998.
[2] M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. Advances in Neural Information Processing Systems, 14(3), 2002.
[3] M. Brand and K. Huang. A unifying theorem for spectral embedding and clustering. 9th International Conference on Artificial Intelligence and Statistics, 2002.
[4] F. Hirsch and G. Lacombe. Elements of functional analysis. Springer, 1999.
[5] S. Mouysset, J. Noailles, and D. Ruiz. Using a Global Parameter for Gaussian Affinity Matrices in Spectral Clustering. In High Performance Computing for Computational Science: 8th International Conference, Toulouse, France, June 24-27, 2008. Revised Selected Papers, pages 378–391. Springer, 2009.
[6] B. Nadler and M. Galun. Fundamental limitations of spectral clustering. Advances in Neural Information Processing Systems, 19:1017, 2007.
[7] B. Nadler, S. Lafon, R.R. Coifman, and I.G. Kevrekidis. Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators. Arxiv preprint math/0506090, 2005.
[8] A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: analysis and an algorithm. Advances in Neural Information Processing Systems, 2002.
[9] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
[10] U. Von Luxburg, M. Belkin, and O. Bousquet. Consistency of spectral clustering. Annals of Statistics, 36(2):555, 2008.

[11] U. Von Luxburg, O. Bousquet, and M. Belkin. Limits of spectral clustering. Advances in Neural Information Processing Systems, 17, 2004.

[12] L. Zelnik-Manor and P. Perona. Self-tuning spectral clustering. Advances in Neural Information Processing Systems, 17(1601-1608):16, 2004.