The 1-Laplacian Cheeger Cut: Theory and Algorithms
K.C. Chang†, Sihong Shao†, Dong Zhang†
arXiv:1603.01687v1 [math.SP] 5 Mar 2016
Abstract This paper presents a detailed review of both theory and algorithms for the Cheeger cut based on the graph 1-Laplacian. By virtue of the cell structure of the feasible set, we propose a cell descend (CD) framework for achieving the Cheeger cut. By plugging in the relaxation that guarantees the decrease of the objective value in the feasible set, from which both the inverse power (IP) method and the steepest descent (SD) method can also be recovered, we obtain two specific CD methods. A comparative study of all these methods is conducted on several typical graphs.
1 Introduction
Graph cut, partitioning the vertices of a graph into two or more disjoint subsets, is a fundamental problem in graph theory [1]. It is a very powerful tool in data clustering with wide applications ranging from statistics, computer learning, image processing and biology to social sciences [2]. There exist several kinds of balanced graph cut [3–5]. The Cheeger cut [6], which has recently been shown to provide excellent classification results [7–9], is one of them, and its definition is as follows.

Let G = (V, E) denote an undirected and unweighted graph with vertex set V = {1, 2, · · · , n} and edge set E. Each edge e ∈ E is a pair of vertices {i, j}. For any vertex i, the degree of i, denoted by $d_i$, is defined to be the number of edges passing through i. Let S and T be two nonempty subsets of V and use
\[ E(S, T) = \{\{i, j\} \in E : i \in S,\ j \in T\} \]
to denote the set of edges between S and T. The edge boundary of S is $\partial S = E(S, S^c)$ ($S^c$ is the complement of S in V) and the volume of S is defined to be $\mathrm{vol}(S) := \sum_{i \in S} d_i$. The number
\[ h(G) = \min_{S \subset V,\ S \notin \{\emptyset, V\}} \frac{|\partial S|}{\min\{\mathrm{vol}(S), \mathrm{vol}(S^c)\}} \]
is called the Cheeger constant, and a partition $(S, S^c)$ of V is called a Cheeger cut of G if
\[ \frac{|\partial S|}{\min\{\mathrm{vol}(S), \mathrm{vol}(S^c)\}} = h(G), \]
where $|\partial S|$ is the cardinality of the set $\partial S$.

† LMAM and School of Mathematical Sciences, Peking University, Beijing 100871, China.
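The definition of h(G) can be made concrete on a tiny graph by enumerating every nonempty proper subset S. The following is only a minimal sketch: the function name `cheeger_constant`, the 0-based vertex labels and the example graph (two triangles joined by one edge) are assumptions made for illustration, and the search is exponential in n, so it is not a practical algorithm.

```python
from fractions import Fraction
from itertools import combinations


def cheeger_constant(n, edges):
    """Brute-force h(G) for a small graph on vertices 0..n-1.

    Assumes every vertex has degree >= 1, so that no volume is zero.
    """
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    total = sum(deg)
    best, best_set = None, None
    # Enumerate every nonempty proper subset S of V.
    for size in range(1, n):
        for subset in combinations(range(n), size):
            S = set(subset)
            # |dS|: edges with exactly one endpoint in S.
            cut = sum(1 for i, j in edges if (i in S) != (j in S))
            vol_S = sum(deg[i] for i in S)
            ratio = Fraction(cut, min(vol_S, total - vol_S))
            if best is None or ratio < best:
                best, best_set = ratio, S
    return best, best_set


# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by the edge {2,3}.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
h, S = cheeger_constant(6, edges)
print(h, S)
```

On this example the minimizing partition separates the two triangles, cutting only the bridging edge.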
However, solving the Cheeger cut problem exactly is NP-hard [7], so approximate solutions are required. The most well-known approach to approximating the Cheeger cut is the spectral clustering method, which relaxes the original discrete combinatorial optimization problem into a continuous function optimization problem through the graph Laplacian [10]. The standard graph Laplacian (i.e. the 2-Laplacian) is defined as L := D − A, where $D = \mathrm{diag}(d_1, \cdots, d_n)$ is a diagonal matrix and A is the adjacency matrix of G. According to the linear spectral graph theory, the eigenvalues of L satisfy $0 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n \le 2$ and the second eigenvalue $\lambda_2$ can be used to bound the Cheeger constant as follows
\[ \frac{\lambda_2}{2} \le h(G) \le \sqrt{2\lambda_2}, \tag{1.1} \]
which is nothing but the Cheeger inequality [1]. Furthermore, the corresponding second eigenvector is also used to approximate the Cheeger cut, i.e. the 2-spectral clustering or the $\ell_2$ relaxation. It should be noted that this second eigenvector is not the Cheeger cut, but only an approximation [10]. In order to achieve a better cut than the $\ell_2$ relaxation, a spectral clustering based on the graph p-Laplacian defined by
\[ (\Delta_p x)_i = \sum_{j \sim i} |x_i - x_j|^{p-1}\, \mathrm{sign}(x_i - x_j) \]
with small p ∈ (1, 2) was proposed, in view of the fact that the cut obtained by thresholding the second eigenvector of the graph p-Laplacian tends to the Cheeger cut as $p \to 1^+$ [11]. Here j ∼ i denotes that vertex j is adjacent to vertex i, $\sum_{j \sim i}$ means the summation is with respect to all vertices adjacent to vertex i, and sign(t) is the standard sign function which equals 1 if t > 0, 0 if t = 0, and −1 if t < 0. The resulting $\ell_p$ relaxation
\[ \frac{\sum_{i \sim j} |x_i - x_j|^p}{\sum_{i=1}^{n} d_i |x_i|^p} \tag{1.2} \]
is differentiable but nonconvex, so that standard Newton-like methods can be applied, but only local minimizers are obtained. In actual calculations, multiple runs with random initializations are taken to approximate the global minimizer.

All the above-mentioned p-spectral clustering methods for p ∈ (1, 2] are indirect methods. Since the second (the first non-zero) eigenvalue of the 1-Laplacian (see Definition 1) for connected graphs equals the Cheeger constant, and the corresponding eigenvectors provide exact solutions of the Cheeger cut problem [7, 12], we study the numerical solution of the second eigenvector of the graph 1-Laplacian. However, the $\ell_1$ nonlinear eigenvalue problem (the corresponding objective function is obtained by setting p = 1 in Eq. (1.2)) is not only nonconvex but also nondifferentiable. Three types of algorithms have been proposed for the 1-spectral clustering problem: the Split-Bregman like ratio minimization algorithm [7], the inverse power (IP) method [8], and the steepest descent (SD) algorithm [13]. Unfortunately, all these methods fail to give global minimizers.

Motivated by a recently developed nonlinear spectral graph theory of the 1-Laplacian [12] by the first author of this paper, we propose a cell descend (CD) algorithm framework. The main idea is based on the fact that the feasible set consists of a collection of cells, and the objective function on each cell is convex. At each step, one obtains the minimum on the initial cell and then finds a suitable direction along which to transfer to a new cell while decreasing the value of the objective function. This CD algorithm framework combines the advantages of both the original discrete combinatorial optimization and its equivalent continuous optimization. Preliminary numerical results on several typical graphs demonstrate that the proposed CD algorithm framework could provide a better cut than the IP and SD methods. In order to provide a solid theoretical foundation for algorithms, we also present a unified framework for designing algorithms for the 1-Laplacian Cheeger cut problem, in which both the IP and SD algorithms can be easily recovered, but from a totally different angle.

The paper is organized as follows. In the rest of this section we list the notations, definitions and some known results. A brief review of the spectral theory of the 1-Laplacian and the Cheeger cut is given in Section 2. A new CD framework for solving the Cheeger cut problem is proposed in Section 3. The corresponding numerical experiments on several typical graphs are presented with discussions in Section 4. The conclusion with a few remarks is presented in Section 5.
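As a small illustration of the quotient in Eq. (1.2), the sketch below evaluates it for a given vector and exponent p. It assumes that the sum over i ∼ j counts each undirected edge once; the function name `lp_quotient` and the example graph are hypothetical. On the indicator vector of a vertex set S the value equals |∂S|/vol(S) for every p, which is one way to see why the relaxation is tied to the cut ratio.

```python
def lp_quotient(x, edges, p):
    """Value of the l_p quotient (1.2), counting each undirected edge once."""
    deg = [0] * len(x)
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    numerator = sum(abs(x[i] - x[j]) ** p for i, j in edges)
    denominator = sum(d * abs(xi) ** p for d, xi in zip(deg, x))
    return numerator / denominator


# Hypothetical example: two triangles joined by the edge {2, 3}.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
indicator = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]  # indicator of S = {0, 1, 2}
values = [lp_quotient(indicator, edges, p) for p in (2.0, 1.5, 1.1, 1.0)]
print(values)  # each entry is |dS| / vol(S) = 1/7
```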
1.1 Notations and definitions
In the following, we list some notations which will be used in this paper.
• To the edge e ∈ E, we assign an orientation: let i be the head and j be the tail, denoted by $i = e_h$ and $j = e_t$ respectively. Under this orientation, one defines the incidence matrix $B = (b_{ei})_{l \times n}$, where
\[ b_{ei} = \begin{cases} 1, & \text{if } i = e_h, \\ -1, & \text{if } i = e_t, \\ 0, & \text{if } i \notin e, \end{cases} \tag{1.3} \]
with e ∈ E, i ∈ V and l being the number of edges in E.
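• For any $x \in \mathbb{R}^n$, we denote
\[ D_+(x) = \{i \in V \mid x_i > 0\}, \quad D_0(x) = \{i \in V \mid x_i = 0\}, \quad D_-(x) = \{i \in V \mid x_i < 0\}, \tag{1.4} \]
\[ \delta_\pm(x) = \sum_{i \in D_\pm(x)} d_i, \quad \delta_0(x) = \sum_{i \in D_0(x)} d_i, \]
\[ I(x) = \frac{1}{2} \sum_{i,j=1}^{n} a_{ij} |x_i - x_j|, \quad \|x\|_{1,d} = \sum_{i=1}^{n} d_i |x_i|, \quad \|x\|_2 = \Big( \sum_{i=1}^{n} x_i^2 \Big)^{\frac{1}{2}}, \tag{1.5} \]
\[ F(x) = \frac{I(x)}{\|x\|_{1,d}}. \tag{1.6} \]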
• Let
\[ X = \{x \in \mathbb{R}^n : \|x\|_{1,d} = 1\}, \tag{1.7} \]
\[ \pi = \{x \in X : |\delta_+(x) - \delta_-(x)| \le \delta_0(x)\}. \tag{1.8} \]
• For given $S \subset \mathbb{R}^n$, define
\[ \mathrm{cone}(S) = \{k x : x \in S,\ k > 0\}. \tag{1.9} \]
• Given a subset T ⊂ X, let x, y ∈ T. We say that x is equivalent to y in T, denoted by x ≃ y in T, if there is a path γ connecting x and y in T, i.e. there exists a continuous γ : [0, 1] → T such that γ(0) = x, γ(1) = y.

Definition 1. Given a graph G = (V, E), for any $x \in \mathbb{R}^n$, the set valued map
\[ \Delta_1 : x \to \{B^T z \mid z : E \to \mathbb{R}^1 \text{ is an } \mathbb{R}^l \text{ vector satisfying } z_e (B x)_e = |(B x)_e|,\ \forall\, e \in E\} \]
is called the graph 1-Laplacian on G. Namely, it can be rewritten as
\[ \Delta_1 x = B^T \mathrm{Sgn}(B x), \tag{1.10} \]
where $\mathrm{Sgn} : \mathbb{R}^n \to (2^{\mathbb{R}})^n$ is a set valued mapping:
\[ \mathrm{Sgn}(y) = (\mathrm{Sgn}(y_1), \mathrm{Sgn}(y_2), \cdots, \mathrm{Sgn}(y_n)), \quad \forall\, y = (y_1, y_2, \cdots, y_n) \in \mathbb{R}^n, \]
and
\[ \mathrm{Sgn}(t) = \begin{cases} 1, & \text{if } t > 0, \\ -1, & \text{if } t < 0, \\ [-1, 1], & \text{if } t = 0. \end{cases} \]
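The notations above can be checked numerically on a small example. The sketch below builds the incidence matrix B of Eq. (1.3) and evaluates I(x) as the $\ell_1$ norm of Bx, which agrees with the definition in Eq. (1.5) for an unweighted graph. The helper names (`incidence_matrix`, `total_variation`, `weighted_l1_norm`), the 0-based vertex labels and the path graph are assumptions made for illustration only.

```python
from fractions import Fraction


def incidence_matrix(n, edges):
    """Rows are indexed by edges, columns by vertices: +1 at the head, -1 at the tail."""
    B = [[0] * n for _ in edges]
    for e, (head, tail) in enumerate(edges):
        B[e][head] = 1
        B[e][tail] = -1
    return B


def total_variation(x, edges):
    """I(x) computed as the l1 norm of B x (each edge contributes |x_i - x_j| once)."""
    B = incidence_matrix(len(x), edges)
    return sum(abs(sum(b * xi for b, xi in zip(row, x))) for row in B)


def weighted_l1_norm(x, edges):
    """||x||_{1,d} = sum_i d_i |x_i|, with degrees read off the edge list."""
    deg = [0] * len(x)
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return sum(d * abs(xi) for d, xi in zip(deg, x))


# Hypothetical example: the path 0 - 1 - 2 - 3 with degrees (1, 2, 2, 1).
edges = [(0, 1), (1, 2), (2, 3)]
x = [Fraction(1, 3), Fraction(1, 3), 0, 0]
norm = weighted_l1_norm(x, edges)      # x lies in X when this equals 1
F = total_variation(x, edges) / norm   # F(x) of Eq. (1.6)
flipped = [(j, i) for i, j in edges]   # reverse every edge orientation
print(norm, F, total_variation(x, flipped))
```

Reversing the orientation of every edge changes the sign of the rows of B but not I(x), in line with the remark that the construction does not depend on the chosen orientation.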
For convenience, we often write Eq. (1.10) in the coordinate form for i = 1, 2, · · · , n:
\[ (\Delta_1 x)_i = (B^T \mathrm{Sgn}(B x))_i = \Big\{ \sum_{j \sim i} z_{ij}(x) \ \Big|\ z_{ij}(x) \in \mathrm{Sgn}(x_i - x_j),\ z_{ji}(x) = -z_{ij}(x),\ \forall\, j \sim i \Big\}. \tag{1.11} \]
Obviously, from Eq. (1.11), the graph 1-Laplacian $\Delta_1$ is a nonlinear set valued mapping, which is independent of the special choice of orientation.

Definition 2. Given a graph G = (V, E), a pair $(\mu, x) \in \mathbb{R}^1 \times X$ is called an eigenpair of the 1-Laplacian $\Delta_1$ on G if
\[ \mu D\, \mathrm{Sgn}(x) \cap \Delta_1 x \neq \emptyset. \tag{1.12} \]
In the coordinate form, Eq. (1.12) is equivalent to the system: there exist $z_{ij}(x) \in \mathrm{Sgn}(x_i - x_j)$ satisfying $z_{ji}(x) = -z_{ij}(x)$, ∀ j ∼ i, and
\[ \sum_{j \sim i} z_{ij}(x) \in \mu d_i\, \mathrm{Sgn}(x_i), \quad i = 1, 2, \cdots, n. \tag{1.13} \]
The set of all solutions of Eq. (1.12) is denoted by S(G). Remark 1. According to Theorem 10 (vide post), one can choose either S = D+ (x) or S = D− (x) based on an eigenvector x with respect to the second eigenvalue to produce a Cheeger cut (S, S c ).
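The coordinate system (1.13) can be verified mechanically once a candidate certificate $z_{ij}$ is supplied. The sketch below does exactly that and nothing more: it does not search for z. The function names, the dictionary encoding of z (one value per listed edge, with $z_{ji} = -z_{ij}$ implied), the restriction to µ ≥ 0 and the path graph example are all assumptions made for illustration.

```python
from fractions import Fraction


def sgn_interval(t):
    """The set-valued Sgn(t), represented as a closed interval (lo, hi)."""
    if t > 0:
        return (1, 1)
    if t < 0:
        return (-1, -1)
    return (-1, 1)


def is_eigenpair(mu, x, edges, z):
    """Check (1.13) for mu >= 0, x in X and a certificate z = {(i, j): z_ij}."""
    n = len(x)
    deg = [0] * n
    flow = [Fraction(0)] * n  # flow[i] accumulates sum_{j ~ i} z_ij
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
        lo, hi = sgn_interval(x[i] - x[j])
        if not lo <= z[(i, j)] <= hi:
            return False
        flow[i] += z[(i, j)]
        flow[j] -= z[(i, j)]  # antisymmetry: z_ji = -z_ij
    if sum(d * abs(xi) for d, xi in zip(deg, x)) != 1:
        return False  # x must satisfy ||x||_{1,d} = 1
    for i in range(n):
        lo, hi = sgn_interval(x[i])
        if not mu * deg[i] * lo <= flow[i] <= mu * deg[i] * hi:
            return False
    return True


# Hypothetical example: the path 0 - 1 - 2 - 3 with x supported on S = {0, 1}.
edges = [(0, 1), (1, 2), (2, 3)]
x = [Fraction(1, 3), Fraction(1, 3), 0, 0]
z = {(0, 1): Fraction(1, 3), (1, 2): Fraction(1), (2, 3): Fraction(1, 3)}
print(is_eigenpair(Fraction(1, 3), x, edges, z))
```

Here µ = 1/3 coincides with |∂S|/vol(S) for S = D+(x) = {0, 1}, since one edge is cut and vol(S) = 3.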
Definition 3. Given a graph G = (V, E) and $x \in \mathbb{R}^n$, let σ : {1, 2, · · · , n} → {1, 2, · · · , n} be a permutation satisfying $x_{\sigma(1)} \le x_{\sigma(2)} \le \cdots \le x_{\sigma(n)}$. Then there exists a unique $k_0 \in \{1, \cdots, n\}$ such that $-\sum_{i=1}^{n} d_{\sigma(i)} < \cdots$
0 for m(i) ≠ 0}
is called a cell. We have several remarks about the cell as follows.
• The cells $\triangle_m$ are piecewise linear manifolds. The center of gravity of $\triangle_m$ is
\[ \mathrm{center}(\triangle_m) = \frac{1}{\delta} \sum_{i=1}^{n} d_i\, m(i)\, e_i. \]
The dimension of $\triangle_m$ is $\dim \triangle_m = \mathrm{Card}\{m(i) : m(i) \neq 0\} - 1$, and the total number of k-dimensional cells is $C_n^{k+1} 2^{k+1}$ for k = 0, 1, · · · , n − 1.
• It is evident that $\mathrm{sign}(x) := (\mathrm{sign}(x_1), \cdots, \mathrm{sign}(x_n)) = m$ if and only if $x \in \triangle_m$. Consequently, the set of all cells having nonempty intersection with π can be denoted by Π:
\[ \Pi = \{\triangle_m : \exists\, x \in \pi \text{ such that } \mathrm{sign}(x) = m\}. \tag{1.14} \]
• Hereafter, we sometimes use [i · m(i) : m(i) ≠ 0] to denote the cell $\triangle_m$ for simplicity.
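The counting remark and the set Π of Eq. (1.14) can be explored by enumerating sign vectors. The sketch below assumes that cells are indexed by the nonzero vectors m ∈ {−1, 0, 1}^n, as suggested by the relation sign(x) = m, and that membership in π is decided by the sign pattern alone through Eq. (1.8); the function names and the degree sequence are hypothetical.

```python
from itertools import product
from math import comb


def cells_by_dimension(n):
    """Count cells by dimension, where dim = (number of nonzero entries of m) - 1."""
    counts = {}
    for m in product((-1, 0, 1), repeat=n):
        support = sum(1 for s in m if s != 0)
        if support == 0:
            continue  # the zero sign vector does not index a cell
        counts[support - 1] = counts.get(support - 1, 0) + 1
    return counts


def cells_meeting_pi(deg):
    """Sign vectors m with |delta_+ - delta_-| <= delta_0, i.e. the members of Pi."""
    members = []
    for m in product((-1, 0, 1), repeat=len(deg)):
        if not any(m):
            continue
        d_plus = sum(d for d, s in zip(deg, m) if s > 0)
        d_minus = sum(d for d, s in zip(deg, m) if s < 0)
        d_zero = sum(d for d, s in zip(deg, m) if s == 0)
        if abs(d_plus - d_minus) <= d_zero:
            members.append(m)
    return members


counts = cells_by_dimension(4)
expected = {k: comb(4, k + 1) * 2 ** (k + 1) for k in range(4)}
pi_cells = cells_meeting_pi([1, 2, 1])  # hypothetical degrees of the path 0 - 1 - 2
print(counts == expected, len(pi_cells))
```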
2 Theory of 1-Laplacian Cheeger cut
In this section, we present a detailed review of the theory of 1-Laplacian Cheeger cut, including the spectrum of ∆1 , the property of the feasible set and the connection between the Cheeger constant h(G) and the second eigenvalue µ2 of ∆1 .
2.1 Spectrum of ∆1
The function I(x) defined in Eq. (1.6) is Lipschitzian on $\mathbb{R}^n$. Let $\tilde{I} = I|_X$. A characterization of the subgradient vector field on X has been studied in [12], by which one proves

Theorem 1 (Theorem 4.11 in [12]). x ∈ S(G) if and only if x is a critical point of $\tilde{I}$.

Remark 2. It must be pointed out that the definition (see Definition 4.4 in [12]) of the critical point of $\tilde{I}$ adopted in Theorem 1 is different from the usual one, which studies the critical point of $I(\cdot)/\|\cdot\|_{1,d}$ through the so-called Clarke derivative [8]. In fact, the critical points of $I(\cdot)/\|\cdot\|_{1,d}$ in the sense of the Clarke derivative must be eigenvectors of $\Delta_1$, but the converse is not true. To clarify this, we present an example in Appendix A.

Let K denote the set of critical points of $\tilde{I}$, and let $K_c$ denote the subset of K with critical value c. The Liusternik-Schnirelmann theory is extended to study the multiplicity of the critical points for the even function $\tilde{I}$. The notion of genus due to Krasnoselski is introduced, see for instance [14, 15]. Let $T \subset \mathbb{R}^n \setminus \{0\}$ be a symmetric set, i.e. −T = T satisfying 0 ∉ T. An integer valued function $\gamma : T \to \mathbb{Z}^+$, which is called the genus of T, is defined to be
\[ \gamma(T) = \begin{cases} 0, & \text{if } T = \emptyset, \\ \min\{k \in \mathbb{Z}^+ : \exists \text{ odd continuous } h : T \to S^{k-1}\}, & \text{otherwise}. \end{cases} \]
Obviously, the genus is a topological invariant. Let us define
\[ c_k = \inf_{\gamma(T) \ge k}\ \max_{x \in T \subset X} I(x), \quad k = 1, 2, \cdots, n. \tag{2.1} \]
It can be proved that these $c_k$ are critical values of $\tilde{I}$. One has
\[ c_1 \le c_2 \le \cdots \le c_n, \]
and if
\[ c = c_{k+1} = \cdots = c_{k+l}, \quad 0 \le k \le k + l \le n, \tag{2.2} \]
then $\gamma(K_c) \ge l$. A critical value c is said to be of multiplicity l if $\gamma(K_c) = l$.

Theorem 2 (Theorem 4.10 in [12]). There are at least n critical points $\phi_k$, k = 1, 2, · · · , n, of $\tilde{I}$ such that $\phi_k \in K_{c_k}$. Moreover, counting multiplicity, $\tilde{I}$ has at least n critical values.

Theorem 3 (Corollary 5.5 in [12]). If a graph G consists of r connected components $G_1, G_2, \cdots, G_r$, then the eigenvalue µ = 0 has multiplicity r, i.e. all the eigenvectors with respect to µ = 0 form a critical set $K_0$ with $\gamma(K_0) = r$.

Lemma 1. Assume that x, y ∈ X satisfy, for all i, j ∈ {1, 2, · · · , n}:
(P1) sign preserving property: $x_i = 0 \Rightarrow y_i = 0$; $x_i < 0 \Rightarrow y_i \le 0$; $x_i > 0 \Rightarrow y_i \ge 0$,
(P2) order preserving property: $x_i = x_j \Rightarrow y_i = y_j$; $x_i < x_j \Rightarrow y_i \le y_j$.
If (µ, x) is an eigenpair of $\Delta_1$, then (µ, tx + (1 − t)y) is also an eigenpair of $\Delta_1$, ∀ t ∈ [0, 1].

Proof. Since (µ, x) is an eigenpair, there exist $z_{ij}(x) \in \mathrm{Sgn}(x_i - x_j)$ satisfying
\[ z_{ji}(x) = -z_{ij}(x),\ \forall\, i \sim j \quad \text{and} \quad \sum_{j \sim i} z_{ij}(x) \in \mu d_i\, \mathrm{Sgn}(x_i),\ i = 1, \cdots, n. \]
By (P1) and (P2), we deduce that $\mathrm{Sgn}(x_i) \subset \mathrm{Sgn}(y_i)$ and $\mathrm{Sgn}(x_i - x_j) \subset \mathrm{Sgn}(y_i - y_j)$. If one takes $z_{ij}(y) = z_{ij}(x)$, ∀ i, j ∈ {1, 2, · · · , n}, then $z_{ij}(y) \in \mathrm{Sgn}(y_i - y_j)$ also satisfies
\[ z_{ji}(y) = -z_{ij}(y),\ \forall\, i \sim j \quad \text{and} \quad \sum_{j \sim i} z_{ij}(y) \in \mu d_i\, \mathrm{Sgn}(y_i),\ i = 1, \cdots, n. \]
This means that (µ, y) is an eigenpair, too. More generally, for any given t ∈ [0, 1], let $\tilde{y} = t x + (1 - t) y$. It is easy to check that x and $y := \tilde{y}$ satisfy (P1) and (P2). In the same way, $(\mu, \tilde{y})$ is also an eigenpair. This completes the proof.
A vector $\hat{x} = (\hat{x}_1, \hat{x}_2, \cdots, \hat{x}_n) \in X$ is called a binary valued vector if $\hat{x} = c(a_1, \cdots, a_n)$ for some c > 0, where $a_1, a_2, \cdots, a_n$ are either 1 or 0. By the above lemma, it follows

Theorem 4. For any eigenpair (µ, x) of $\Delta_1$, there exists a binary valued vector $\hat{x}$ such that $(\mu, \hat{x})$ is an eigenpair and $x \simeq \hat{x}$ in $S(G) \cap I^{-1}(\mu)$. Consequently, ∃A ⊂ V such that $\mu = \frac{|\partial A|}{\mathrm{vol}(A)}$.

Proof. We only need to prove the last conclusion. In fact, let $A = \{i \in V \mid \hat{x}_i = 1\}$. From the system
\[ \sum_{j \sim i} z_{ij}(\hat{x}) = \mu d_i \hat{x}_i, \quad \forall\, i, \]
after summation on both sides, we have $\mu = \frac{|\partial A|}{\mathrm{vol}(A)}$.
Theorem 5. The following facts are readily observed.
(1) For any eigenvalue µ of $\Delta_1$, we have 0 ≤ µ ≤ 1.
(2) The distance between two different eigenvalues is at least $\frac{4}{n^2 (n-1)^2}$, where n = |V|. So there are only finitely many eigenvalues of $\Delta_1$.
(3) If (µ, x) is an eigenpair of $\Delta_1$, then I(x) = µ. Moreover, if µ ≠ 0, then $0 \in \sum_{i=1}^{n} d_i\, \mathrm{Sgn}(x_i)$.
(4) If x ∈ X with $x_1 = x_2 = \cdots = x_n$, then $x = \pm \frac{1}{\delta}(1, 1, \cdots, 1)$ is an eigenvector with eigenvalue µ = 0. Conversely, if G is connected, then the eigenvector x corresponding to the eigenvalue µ = 0 must be $x = \pm \frac{1}{\delta}(1, 1, \cdots, 1)$.

Proof. We only prove fact (2) here. The other facts follow from [12]. For two given different critical values µ, $\tilde{\mu}$, by Theorem 4 there exist A, B ⊂ V such that $\mu = \frac{|\partial A|}{\mathrm{vol}(A)}$ and $\tilde{\mu} = \frac{|\partial B|}{\mathrm{vol}(B)}$. Accordingly, we obtain
\[ |\mu - \tilde{\mu}| = \left| \frac{|\partial A|}{\mathrm{vol}(A)} - \frac{|\partial B|}{\mathrm{vol}(B)} \right| = \frac{\big|\, |\partial A|\, \mathrm{vol}(B) - |\partial B|\, \mathrm{vol}(A) \,\big|}{\mathrm{vol}(A)\, \mathrm{vol}(B)} \ge \frac{1}{\mathrm{vol}(A)\, \mathrm{vol}(B)} \ge \left( \frac{1}{C_n^2} \right)^2. \]
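The gap estimate in fact (2) can be sanity-checked on a small graph by listing the candidate values |∂A|/vol(A) suggested by Theorem 4. The sketch below is only an illustration under stated assumptions: it restricts to subsets with vol(A) ≤ δ/2 (a restriction chosen here so that vol(A) vol(B) is bounded by $(C_n^2)^2$), it does not decide which candidates are actual eigenvalues, and the function name `cut_ratios` and the example graph are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def cut_ratios(n, edges):
    """Distinct values |dA| / vol(A) over nonempty A with 0 < vol(A) <= delta / 2."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    delta = sum(deg)
    ratios = set()
    for size in range(1, n):
        for subset in combinations(range(n), size):
            A = set(subset)
            vol = sum(deg[i] for i in A)
            if vol == 0 or 2 * vol > delta:
                continue
            cut = sum(1 for i, j in edges if (i in A) != (j in A))
            ratios.add(Fraction(cut, vol))
    return sorted(ratios)


# Hypothetical example: two triangles joined by the edge {2, 3}, so n = 6.
n = 6
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
ratios = cut_ratios(n, edges)
bound = Fraction(4, n ** 2 * (n - 1) ** 2)
smallest_gap = min(b - a for a, b in zip(ratios, ratios[1:]))
print(ratios[0], smallest_gap >= bound)
```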
Since the number of eigenvalues is finite, all eigenvalues can be ordered as follows: $0 = \mu_1 \le \mu_2 \le \cdots$. On the other hand, the subset of critical values $\{c_k\}$ (at least n if counting multiplicity) is ordered by the topological genus, i.e., $0 = c_1 \le c_2 \le \cdots \le c_n$. According to Theorem 1, µ is an eigenvalue of $\Delta_1$ if and only if µ is a critical value of $\tilde{I}(x)$. But we do not know whether these two sequences are identical.
Theorem 6. Any eigenvector x of $\Delta_1$ with eigenvalue µ ≠ 0 lies on π.

Proof. By Theorem 5, we have $0 \in \sum_{i=1}^{n} d_i\, \mathrm{Sgn}(x_i)$. So there exist $\theta_i \in \mathrm{Sgn}(x_i)$, i = 1, 2, · · · , n, such that $\sum_{i=1}^{n} d_i \theta_i = 0$. Then
\[ |\delta_+(x) - \delta_-(x)| = \Big| \sum_{i \in D_+(x)} d_i - \sum_{i \in D_-(x)} d_i \Big| = \Big| \sum_{i \in D_0(x)} \theta_i d_i \Big| \le \delta_0(x), \]
which means that x ∈ π.
2.2 Properties of π
According to Theorems 6 and 10 (vide post), π is the feasible set for searching the Cheeger cut and some useful properties of π will be shown in this section. By simple observation, we have

Theorem 7. π is compact and $\pi = \bigcup_{\triangle \in \Pi} \triangle$.
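Lemma 2. x ∈ π if and only if $\delta_+(x) \le \frac{\delta}{2}$ and $\delta_-(x) \le \frac{\delta}{2}$, with $\delta = \sum_{i=1}^{n} d_i$.

Proof. Without loss of generality, we assume $\delta_+(x) \ge \delta_-(x)$, then
\[ \begin{aligned} x \in \pi &\Leftrightarrow \delta_+(x) - \delta_-(x) \le \delta_0(x) \\ &\Leftrightarrow \delta_+(x) - \delta_-(x) \le \delta - \delta_+(x) - \delta_-(x) \\ &\Leftrightarrow 2\delta_+(x) \le \delta \\ &\Leftrightarrow \delta_+(x) \le \frac{\delta}{2} \text{ and } \delta_-(x) \le \frac{\delta}{2}. \end{aligned} \]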
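Because both characterizations of π depend only on the sign pattern of x, the equivalence in Lemma 2 can be checked exhaustively on a small degree sequence. The sketch below does so; the function names and the degree sequence are assumptions made for illustration, and the check is a numerical sanity test rather than a proof.

```python
from itertools import product


def signed_volumes(m, deg):
    """Return (delta_+, delta_-, delta_0) for a sign pattern m."""
    d_plus = sum(d for d, s in zip(deg, m) if s > 0)
    d_minus = sum(d for d, s in zip(deg, m) if s < 0)
    d_zero = sum(d for d, s in zip(deg, m) if s == 0)
    return d_plus, d_minus, d_zero


def in_pi_by_definition(m, deg):
    """Condition of Eq. (1.8): |delta_+ - delta_-| <= delta_0."""
    d_plus, d_minus, d_zero = signed_volumes(m, deg)
    return abs(d_plus - d_minus) <= d_zero


def in_pi_by_lemma2(m, deg):
    """Condition of Lemma 2: delta_+ <= delta/2 and delta_- <= delta/2."""
    d_plus, d_minus, _ = signed_volumes(m, deg)
    delta = sum(deg)
    return 2 * d_plus <= delta and 2 * d_minus <= delta


deg = [2, 2, 3, 3, 2, 2]  # hypothetical degrees (two triangles joined by an edge)
patterns = [m for m in product((-1, 0, 1), repeat=len(deg)) if any(m)]
agree = all(in_pi_by_definition(m, deg) == in_pi_by_lemma2(m, deg) for m in patterns)
print(agree)
```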
Theorem 8. π is connected when n ≥ 3.

Proof. For any x ∈ π, $\delta_+(x) \le \frac{\delta}{2}$ and $\delta_-(x) \le \frac{\delta}{2}$. Without loss of generality, we may assume $x_1 \neq 0$. On one hand, consider
\[ \gamma(t) = \frac{(x_1, t x_2, \cdots, t x_n)}{d_1 |x_1| + t \sum_{i=2}^{n} d_i |x_i|}, \quad t \in [0, 1]. \]
It is easily checked that both $\delta_+(\gamma(t)) \le \delta_+(x) \le \frac{\delta}{2}$ and $\delta_-(\gamma(t)) \le \delta_-(x) \le \frac{\delta}{2}$ hold, so we have γ([0, 1]) ⊂ π. This means that the path γ connecting x and $\big(\frac{\mathrm{sign}(x_1)}{d_1}, 0, \cdots, 0\big)$ lies in π, i.e. $x \simeq \big(\frac{\mathrm{sign}(x_1)}{d_1}, 0, \cdots, 0\big)$.

On the other hand, when n ≥ 3, we can prove $\big(\pm\frac{1}{d_1}, 0, 0, \cdots, 0\big) \simeq \big(0, \pm\frac{1}{d_2}, 0, \cdots, 0\big)$. Actually, for $\big(\frac{1}{d_1}, 0, 0, \cdots, 0\big)$ and $\big(0, -\frac{1}{d_2}, 0, \cdots, 0\big)$, this can be readily verified by taking the following path
\[ y(t) = \Big( \frac{t}{d_1}, \frac{t - 1}{d_2}, 0, \cdots, 0 \Big) \in \pi, \quad \forall\, t \in [0, 1]. \]
For $\big(\frac{1}{d_1}, 0, 0, \cdots, 0\big)$ and $\big(0, \frac{1}{d_2}, 0, \cdots, 0\big)$, we then have
\[ \Big( \frac{1}{d_1}, 0, 0, \cdots, 0 \Big) \simeq \Big( 0, 0, -\frac{1}{d_3}, \cdots, 0 \Big) \simeq \Big( 0, \frac{1}{d_2}, 0, \cdots, 0 \Big). \]
Hence, for any y ∈ π with $y_i \neq 0$, we have $x \simeq \big(\frac{1}{d_1}, 0, 0, \cdots, 0\big) \simeq \big(0, \cdots, \frac{1}{d_i}, \cdots, 0\big) \simeq y$.

Remark 3. When n = 2, it can be easily checked that π consists of two disjoint connected subsets.

Lemma 3. y ∈ median(x) if and only if $y = \arg\min_{t \in \mathbb{R}} \sum_{i=1}^{n} d_i |t - x_i|$.
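Lemma 3 characterizes median(x) through the minimizers of $f(t) = \sum_i d_i |t - x_i|$. Since f is convex and piecewise linear with breakpoints at the $x_i$, its set of minimizers is a closed interval whose endpoints are breakpoints, so it suffices to evaluate f at the entries of x. The sketch below uses only this characterization and does not rely on the definition of median(x) itself; the function name and the sample data are assumptions made for illustration, and all degrees are assumed to be positive.

```python
def weighted_median_interval(x, deg):
    """Minimizers of f(t) = sum_i d_i |t - x_i|, returned as an interval (lo, hi)."""

    def f(t):
        return sum(d * abs(t - xi) for d, xi in zip(deg, x))

    best = min(f(xi) for xi in x)
    # The minimizers form an interval whose endpoints are breakpoints of f.
    minimizers = [xi for xi in x if f(xi) == best]
    return min(minimizers), max(minimizers)


# Hypothetical data: degrees of the path 0 - 1 - 2 - 3 and an integer vector x.
interval_flat = weighted_median_interval([0, 1, 2, 5], [1, 2, 2, 1])
interval_point = weighted_median_interval([3, 1, 2], [1, 1, 1])
print(interval_flat, interval_point)
```

In the first call the total degree to the left and to the right of the gap between 1 and 2 is balanced, so f is flat there and the whole interval minimizes f; in the second call the minimizer is a single point.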
Proof. Let $f(t) = \sum_{i=1}^{n} d_i |t - x_i|$. Without loss of generality, we assume $x_1 \le x_2 \le \cdots \le x_n$ and $x_1 < x_n$, then there exists k ∈ {1, 2, · · · , n − 1} such that $x_k < x_{k+1}$. Assume $x_k \le t < t + h < x_{k+1}$, then we have $f(t + h) - f(t) = h \sum_{i=1}^{k} d_i - h \sum_{i=k+1}^{n} d_i$. If $\sum_{i=1}^{k} d_i - \sum_{i=k+1}^{n} d_i \le 0$, then f(t + h) − f(t) ≤ 0, so f is decreasing on $[x_k, x_{k+1}]$. If $\sum_{i=1}^{k} d_i - \sum_{i=k+1}^{n} d_i \ge 0$, then f(t + h) − f(t) ≥ 0, so f is increasing on $[x_k, x_{k+1}]$. Since there exists $k_0 \in \{1, \cdots, n\}$ such that $-\sum_{i=1}^{n} d_i < \cdots$
0, let $x_k = \min \mathrm{median}(x)$, then we have
\[ \sum_{i=1}^{k-1} d_i - \sum_{i=k}^{n} d_i < 0, \]
and $\delta_+(x) \ge \sum_{i=k}^{n} d_i > \frac{\delta}{2}$, implying that x ∉ π, which is a contradiction. When max median(x) < 0, let $x_{k+1} = \min \mathrm{median}(x)$, then we have 0 ≤ and $\delta_-(x) \ge$
Pk+1 i=1
di >
δ 2
k X i=1
di −
n X
di