Symmetric Nonnegative Matrix Factorization: Algorithms and Applications to Probabilistic Clustering

Zhaoshui He, Shengli Xie, Senior Member, IEEE, Rafal Zdunek, Member, IEEE, Guoxu Zhou, and Andrzej Cichocki, Senior Member, IEEE
Abstract—Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), which is a special case of NMF decomposition. Three parallel multiplicative update algorithms using level 3 basic linear algebra subprograms directly are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Based on it, we further propose another two fast parallel methods: the α-SNMF and β-SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression.

Index Terms—Basic linear algebra subprograms, completely positive, coordinate update, multiplicative update, nonnegative matrix factorization, parallel update, probabilistic clustering, symmetric nonnegative matrix factorization.
Manuscript received June 20, 2011; revised October 4, 2011; accepted October 7, 2011. Date of publication October 26, 2011; date of current version December 1, 2011. This work was supported in part by the National Natural Science Foundation of China under Grants 60974072, 61103122, 60804051, and 61104053, and by the Guangdong Natural Science Foundation (team project) under Grant S2011030002886.
Z. He is with the Faculty of Automation, Guangdong University of Technology, Guangzhou 510641, China, and also with the Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Saitama 3510198, Japan (e-mail: [email protected]).
S. Xie is with the Faculty of Automation, Guangdong University of Technology, Guangzhou 510641, China (e-mail: [email protected]).
R. Zdunek is with the Institute of Telecommunications, Teleinformatics, and Acoustics, Wroclaw University of Technology, Wroclaw 50-370, Poland (e-mail: [email protected]).
G. Zhou is with the Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Saitama 3510198, Japan, and also with the Faculty of Automation, Guangdong University of Technology, Guangzhou 510641, China (e-mail: [email protected]).
A. Cichocki is with the RIKEN Brain Science Institute, Saitama 3510198, Japan, and also with the Systems Research Institute, Polish Academy of Sciences, Warsaw 00-901, Poland (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNN.2011.2172457

I. INTRODUCTION

IN PRACTICE, many physical quantities are intrinsically nonnegative (e.g., probability and image pixel value) [1]-[4]. Owing to this, a nonnegativity requirement arises in many applications such as learning the parts of objects [5], processing multivariate data [5], [6], training support vector machines [7]-[9], clustering [10]-[15], blind source separation [1], [16], [17], and so on. Nonnegative matrix factorization (NMF) has been recognized as an important exploratory analysis tool for discovering the underlying structures of measured data in these problems [1], [9], [13], [18]-[24]. For example, Lee and Seung used NMF to learn the parts of objects [5], [25]. Symmetric nonnegative matrix factorization (SNMF), also known as completely positive factorization [26], [27], was applied to probabilistic clustering by Zass and Shashua [10]. A data matrix is said to be completely positive if it is nonnegative and positive definite [26], [27]. SNMF is a special case of NMF, in which both nonnegative factors are the same and the observed matrix is completely positive. More precisely, the problem can be stated as follows: given a completely positive matrix V ∈ R_+^{N×N}, SNMF aims to find a nonnegative matrix P ∈ R_+^{N×K} such that V ≈ P P^T

    min_{P ⪰ 0} ||V − P P^T||_F^2    (1)

where ||·||_F denotes the Frobenius norm.
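For concreteness, the objective in (1) is straightforward to evaluate with dense matrix operations. The following minimal MATLAB sketch (the sizes and variable names are illustrative assumptions, not taken from the paper) builds a completely positive V from a random nonnegative factor and evaluates the fit of a candidate P:

    % Build a synthetic completely positive matrix V = P0*P0' with P0 >= 0.
    N = 100; K = 5;                  % illustrative problem sizes
    P0 = rand(N, K);                 % nonnegative ground-truth factor
    V  = P0 * P0';                   % completely positive by construction
    P  = rand(N, K);                 % candidate nonnegative factor
    E  = norm(V - P * P', 'fro')^2;  % objective in (1)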
In this paper, we develop efficient algorithms for SNMF and apply them to probabilistic clustering.

The rest of this paper is organized as follows. In Section II, we overview the existing SNMF methods and discuss their differences in depth. We propose a new multiplicative update SNMF algorithm and prove its convergence in Section III. Based on it, two fast SNMF algorithms are further proposed in Section IV. Probabilistic clustering by the proposed algorithms is discussed in Section V. The experiments and analysis of results are given in Section VI. Finally, we conclude this paper in Section VII.

II. EXISTING SNMF METHODS AND THEIR UPDATE MODES

Nowadays, matrix-matrix multiplication is widely implemented in hardware [28]. Taking this into account, we can develop more efficient SNMF algorithms.

A. Performing Matrix-Matrix Multiplication Using Level 3 BLAS

The basic linear algebra subprograms (BLAS) are routines that provide a standard programming interface for libraries performing basic linear algebra operations such as vector and matrix multiplication [29], [30]. They comprise three kinds of basic routines: level 1, level 2, and level 3.
TABLE I
COMPARING THREE LEVELS OF BLAS FOR MATRIX-MATRIX MULTIPLICATION*

Scheme | Implementation and BLAS level                                        | Language | Runtime (s)
1      | Direct level 3 BLAS: C = A * A'                                      | MATLAB   | 0.14032
2      | One for-loop with level 2 BLAS: C(:, j) = A * A(j, :)'               | MATLAB   | 1.49131
3      | Two for-loops with level 1 BLAS: C(i, j) = A(i, :) * A(j, :)'        | MATLAB   | 16.21007
4      | Three for-loops with level 1 BLAS: C(i, j) = Σ_k A(i, k) * A(j, k)   | MATLAB   | 26.29738
5      | Three for-loops with level 1 BLAS: C[i][j] = Σ_k A[i][k] * A[j][k]   | C        | 12.18444

*for i, j, k = 1, . . . , 1000.

Algorithm 1 Coordinate Update Mapping M
while True do
  for n = 1 to N do
    for k = 1 to K do
      Update P^{nk} by (2);
      P_nk ← M_nk(P^{nk});
    end for
  end for
end while

Algorithm 2 Parallel Update Mapping M
while True do
  P ← M(P);
end while
The level 1 BLAS perform vector-vector multiplication, and the level 2 and level 3 BLAS perform vector-matrix and matrix-matrix multiplications, respectively. Because the BLAS are efficient, portable, and widely available, they have been optimized and implemented in hardware [28] by the dominant chip providers, e.g., Intel, AMD, and NEC. They are commonly encapsulated further so that they can be called directly from popular programming languages (e.g., MATLAB, C, and Fortran). For this reason, it is not advisable to replace level 3 BLAS with low-level BLAS (level 1 or level 2) for matrix-matrix multiplication in high-level programming languages [31], [32], although that is feasible for this task. Consider a level 3 BLAS example: given a randomly generated A ∈ R^{1000×1000}, we would like to compute C = A·A^T. The computing time for this task with different levels of BLAS on one computer is shown in Table I. We can see that direct level 3 BLAS is much faster than level 1 or level 2 BLAS with for-loops. SNMF algorithms typically involve a great many matrix-matrix multiplications. By using level 3 BLAS directly, we can largely avoid for-loops in SNMF implementations and greatly speed up computation.
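The timing comparison in Table I can be reproduced along the following lines; this is a minimal MATLAB sketch under the assumption of a 1000 × 1000 random A, and the measured times will of course vary with hardware and MATLAB version:

    % Scheme 1: direct level 3 BLAS.
    A = rand(1000);
    tic; C1 = A * A'; t1 = toc;

    % Scheme 3: two for-loops with level 1 BLAS (row inner products).
    tic;
    C3 = zeros(1000);
    for i = 1:1000
        for j = 1:1000
            C3(i, j) = A(i, :) * A(j, :)';   % dot product of rows i and j
        end
    end
    t3 = toc;
    fprintf('level 3: %.5f s, level 1 with loops: %.5f s\n', t1, t3);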
B. Performing Parallel Update Efficiently by Level 3 BLAS

There are two important update modes for multiplicative NMF algorithms: the coordinate/sequential update mode and the parallel update mode, which are fundamentally different. Consider an N × K mapping matrix M for updating P such that M_nk: P → P_nk (i.e., P_nk ← M_nk(P)), n = 1, . . . , N and k = 1, . . . , K.

Definition 1: The matrix M is said to be a coordinate update mapping if it updates P recursively by function composition as follows:

    P_nk ← M_nk(P^{nk})

where n = 1, . . . , N, k = 1, . . . , K, and P^{nk} ∈ R^{N×K} is given by

    P^{nk}_ij = { M_ij(P^{ij}),  (i − 1)K + j < (n − 1)K + k
                { P_ij,          (i − 1)K + j ≥ (n − 1)K + k    (2)

where i = 1, . . . , N, j = 1, . . . , K. As an example, for a 3 × 3 matrix P, its associated matrix P^{23} is

    P^{23} = [ M_11(P^{11})  M_12(P^{12})  M_13(P^{13})
               M_21(P^{21})  M_22(P^{22})  P_23
               P_31          P_32          P_33 ].

Definition 2: Differing from the coordinate update mapping, M is called a parallel update mapping if its elements are entrywise separable (or independent)

    P_nk ← M_nk(P)

where n = 1, . . . , N, k = 1, . . . , K. In this case, it can be simply represented in matrix notation as P ← M(P).

In programming, the coordinate and parallel updates can be implemented as in Algorithms 1 and 2, respectively. The computational complexities of the coordinate and parallel updates are mathematically comparable, but the latter is usually advantageous in a practical implementation because its parallel structure enables us to use level 3 BLAS directly, which is usually not possible for coordinate updates. The coordinate update is limited to low-level BLAS (at most level 2) with time-consuming for-loops (see Table II). Incidentally, the parallel update is more flexible: it can also work in a coordinate mode. The multiplicative update algorithm for nonnegative quadratic programming (NQP) in [9] is a typical parallel update method. However, not all algorithms work in the parallel mode. Some of them can work only in the coordinate mode, e.g., the multiplicative update NQP method proposed by Franc et al. [33].
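To make the distinction concrete, the following MATLAB sketch implements both modes for the multiplicative mapping M_nk(P) = P_nk (V P)_nk / (P P^T P)_nk, which appears as rule (6) below; it is an illustration only, with V and P assumed to be given:

    % Parallel update (Algorithm 2): one line of level 3 BLAS.
    P = P .* (V * P) ./ (P * (P' * P));

    % Coordinate update (Algorithm 1): entries refreshed one at a time,
    % so each update sees the entries already modified in this sweep.
    [N, K] = size(P);
    for n = 1:N
        for k = 1:K
            P(n, k) = P(n, k) * (V(n, :) * P(:, k)) / (P(n, :) * (P' * P(:, k)));
        end
    end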
TABLE II
UPDATE MODES AND BLAS LEVELS OF EXISTING METHODS

SNMF rule | Work mode  | Convergence proof | Implementation and BLAS level
(3)       | Coordinate | Already           | Two for-loops with level 2 BLAS
(4)       | Parallel   | Already           | Direct level 3 BLAS
(5)       | Parallel   | Already           | Direct level 3 BLAS
(6)       | Coordinate | Not yet           | Two for-loops with level 2 BLAS
(7)       | Parallel   | Not yet           | Direct level 3 BLAS

[Fig. 2 plots E(P) (×10^5) against iterations (0-70) for the multiplicative rule (6) with coordinate update and with parallel update.]

Fig. 2. Numerical experiments showing that the SNMF update rule (6) gradually decreases the cost function in the coordinate mode but fails under the parallel update mode. Moreover, under the parallel mode, the corresponding sequence {E(P^t)}_{t=1}^{+∞} oscillates rather than converges.
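An experiment in the spirit of Fig. 2 can be set up as below; this is a sketch with assumed sizes and a random completely positive V, not the authors' exact test data:

    % Compare rule (6) in parallel and coordinate modes on V = P0*P0'.
    N = 50; K = 4; P0 = rand(N, K); V = P0 * P0';
    Pp = rand(N, K); Pc = Pp;            % same initialization for both modes
    T = 70; Ep = zeros(T, 1); Ec = zeros(T, 1);
    for t = 1:T
        Pp = Pp .* (V * Pp) ./ (Pp * (Pp' * Pp));   % parallel update
        for n = 1:N                                 % coordinate update
            for k = 1:K
                Pc(n, k) = Pc(n, k) * (V(n, :) * Pc(:, k)) / (Pc(n, :) * (Pc' * Pc(:, k)));
            end
        end
        Ep(t) = norm(V - Pp * Pp', 'fro')^2;
        Ec(t) = norm(V - Pc * Pc', 'fro')^2;
    end
    plot(1:T, Ep, 1:T, Ec); xlabel('Iterations'); ylabel('E(P)');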
[Fig. 1 sketches the paraboloid F(P) and the plane f(P, P^t) touching it at P^t.]

Fig. 1. Paraboloid function F(P) and its corresponding auxiliary function f(P, P^t), where f(P, P^t) ≥ F(P) and f(P^t, P^t) = F(P^t).
C. Existing SNMF Methods and Their Update Modes

Recently, much attention has been paid to SNMF and its extension V = P Q P^T, known as weighted SNMF [34] or symmetric nonnegative tri-factorization [11], where Q is also a symmetric nonnegative matrix. Many methods have been developed for these problems. First, Zass and Shashua [10] pioneered the application of SNMF to probabilistic clustering and proposed the following multiplicative SNMF rule with rigorous justification:

    P_nk ← M_nk(P) = P_nk [(V P)_nk − V_nn P_nk] / [(P P^T P)_nk − (P P^T)_nn P_nk]
                   = P_nk [Σ_{r≠n} V_nr P_rk] / [Σ_{r≠n} P_rk Σ_s P_ns P_sr].    (3)

Long et al. [35] and Chen et al. [11] applied the weighted SNMF V = P Q P^T to relational clustering, presenting two multiplicative update rules based on the Euclidean distance and the Kullback-Leibler divergence, respectively, and proved that the corresponding cost functions are nonincreasing under them. Setting Q to the identity matrix (i.e., Q = I), we obtain the two SNMF update rules

    P_nk ← M_nk(P) = P_nk ⁴√[(V P)_nk / (P P^T P)_nk]    (4)

and

    P_nk ← P_nk [Σ_i (V_in / (P P^T)_in) P_ik] / [Σ_i P_ik].    (5)
Also, Long et al. empirically proposed the SNMF update rule [36]

    P_nk ← P_nk (V P)_nk / (P P^T P)_nk    (6)

but a rigorous derivation was not provided.
In addition, Ding, He, and Simon [34] proved that both kernel K-means clustering and Laplacian-based spectral clustering can be cast as an SNMF problem and proposed another empirical SNMF update rule

    P_nk ← P_nk [1 − β + β (V P)_nk / (P P^T P)_nk]    (7)

where 0 ≤ β ≤ 1, with β = 1/2 suggested in [34]. A derivation of update rule (7) is likewise not available.

Through extensive numerical experiments, we found that all the methods listed above work in either the coordinate or the parallel update mode (see Table II), and that the parallel update rules (4), (5), and (7) are faster in practical implementations than the coordinate ones (3) and (6) because of the efficient level 3 BLAS.
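As a minimal sketch (assuming V and P are already in memory), the parallel rules translate directly into level 3 BLAS calls; for instance, one iteration of (4) and one iteration of (7) with the suggested β = 1/2 read:

    % One parallel iteration of rule (4): entrywise fourth root.
    P = P .* ((V * P) ./ (P * (P' * P))).^(1/4);

    % One parallel iteration of rule (7) with beta = 1/2.
    beta = 0.5;
    P = P .* (1 - beta + beta * (V * P) ./ (P * (P' * P)));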
Next, we develop more efficient parallel SNMF algorithms using direct level 3 BLAS and justify them.

III. NEW MULTIPLICATIVE SNMF RULE USING PARALLEL UPDATE AND CONVERGENCE ANALYSIS

Denote

    F(P) = −Tr(V P P^T) = −Σ_ij V_ij (P P^T)_ij    (8)

and

    G(P) = Tr(P P^T P P^T) = Σ_ij (P P^T)²_ij    (9)
[Fig. 3 plots the average E(P) (×10^5) against iterations (0-300) for α ∈ {1/4, 1/3, 1/2, 1/1.5, 1/1.1, 0.99}.]

Fig. 3. Fifty Monte Carlo tests of the α-SNMF algorithm with different α values. The best results are achieved at α = 0.99. (a) Noise-free. (b) SNR = 20 dB.
where Tr(·) is the trace of a matrix. Then

    E(P) = ||V − P P^T||²_F
         = Tr[(V − P P^T)(V − P P^T)^T]
         = Tr(V V^T) − 2Tr(V P P^T) + Tr(P P^T P P^T)
         = Tr(V V^T) + 2F(P) + G(P).    (10)

Here, we need to recall the concept of an auxiliary function [25].

Definition 3: e(P, P^t) is an auxiliary function for E(P) if it satisfies the conditions

    e(P, P^t) ≥ E(P) and e(P^t, P^t) = E(P^t)

where P^t denotes the value of P at iteration t.

Lemma 1: Define a linear function with respect to P as

    f(P, P^t) = −2Tr(V P^t P^T) + Tr(V P^t P^tT)    (11)

where P^tT = (P^t)^T. If V is positive definite, f(P, P^t) is an auxiliary function for F(P) defined in (8).

Proof: Refer to Appendix A.

Note that F(P) is a non-positive-definite quadratic function with respect to P, which geometrically corresponds to a paraboloid. The linear function f(P, P^t) corresponds to a tangent plane to this paraboloid at the point P^t (see Fig. 1).

Lemma 2: Let A be a nonnegative symmetric matrix, i.e., A = A^T ⪰ 0. Then, for every integer k ∈ {1, . . . , K}, we have

    Σ_ij A_ij P²_ik P²_jk ≤ Σ_i [Σ_j A_ij (P^t_jk)² / (P^t_ik)²] P⁴_ik.    (12)

Proof: Refer to Appendix B.

Lemma 3: Let

    g(P, P^t) = Σ_ij [(P^t P^tT P^t)_ij / (P^t_ij)³] P⁴_ij.    (13)

It is an auxiliary function for G(P) given in (9).

Proof: See Appendix C.

Let

    e(P, P^t) = Tr(V V^T) + 2 f(P, P^t) + g(P, P^t).    (14)

From Lemmas 1 and 3, we have

    F(P) ≤ f(P, P^t), G(P) ≤ g(P, P^t)

and

    F(P^t) = f(P^t, P^t), G(P^t) = g(P^t, P^t)

from which we immediately obtain

    E(P) = Tr(V V^T) + 2F(P) + G(P) ≤ e(P, P^t)    (15)

and

    E(P^t) = e(P^t, P^t).    (16)

So e(P, P^t) is an auxiliary function for the cost function E(P).

Proposition 1: For the cost function E(P) = ||V − P P^T||²_F, we have E(P^t) ≥ E(P^{t+1}) under the update rule

    P_nk ← P_nk ³√[(V P)_nk / (P P^T P)_nk]    (17)

where n = 1, . . . , N, k = 1, . . . , K. For simplicity, we rewrite (17) in matrix notation as

    P ← P ⊙ ³√[(V P) ⊘ (P P^T P)]

where A ⊙ B and A ⊘ B are, respectively, the entrywise product (also called the Hadamard product [37]) and the entrywise division of matrices A and B, and ³√A denotes the entrywise cube root of A.
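In MATLAB-style code, one iteration of the proposed rule (17) is a single parallel, level 3 BLAS expression. Below is a minimal sketch (assuming V and an elementwise positive initialization P are given; the iteration cap and tolerance are illustrative choices):

    % Iterate the proposed rule (17) until P stabilizes.
    maxit = 300;
    for t = 1:maxit
        Pnew = P .* ((V * P) ./ (P * (P' * P))).^(1/3);  % entrywise cube root
        if norm(Pnew - P, 'fro') < 1e-8 * norm(P, 'fro')
            P = Pnew; break;
        end
        P = Pnew;
    end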
[Fig. 4 plots the average E(P) against iterations (0-300) for β ∈ {0.3, 0.5, 0.75, 0.9, 0.99}; panel (a) is on a ×10^4 scale and panel (b) on a ×10^5 scale.]
Fig. 4. Fifty Monte Carlo tests on the β-SNMF algorithm with different β values. The best results are achieved when β = 0.99. (a) Noise-free. (b) SNR = 20 dB.
Proof: The auxiliary function e(P, P^t) is convex with respect to P. The minimum of e(P, P^t) is determined by setting the gradient to zero

    ∂e(P, P^t)/∂P_nk = 2 ∂f(P, P^t)/∂P_nk + ∂g(P, P^t)/∂P_nk = 0
    ⇒ −4(V P^t)_nk + 4 [(P^t P^tT P^t)_nk / (P^t_nk)³] P³_nk = 0.    (18)

Only P_nk is unknown in (18). We get

    P³_nk = (P^t_nk)³ (V P^t)_nk / (P^t P^tT P^t)_nk
    ⇒ P^{t+1}_nk = arg min_{P_nk} e(P, P^t) = P^t_nk ³√[(V P^t)_nk / (P^t P^tT P^t)_nk]

where n = 1, . . . , N, k = 1, . . . , K

    ⇒ P^{t+1} = arg min_P e(P, P^t) = P^t ⊙ ³√[(V P^t) ⊘ (P^t P^tT P^t)].

Then

    E(P^t) = e(P^t, P^t) ≥ e(P^{t+1}, P^t).    (19)

Additionally, from (15), we have e(P^{t+1}, P^t) ≥ E(P^{t+1}). Combining it with (19), we derive

    E(P^t) = e(P^t, P^t) ≥ e(P^{t+1}, P^t) ≥ E(P^{t+1}) ≥ 0.    (20)

That is, E(P^t) ≥ E(P^{t+1}) ≥ 0. So Proposition 1 holds.

Remark 1: Similar to the update rule (4), the multiplicative rule (17) is also a parallel update method and can be implemented by level 3 BLAS directly. Furthermore, it is faster than rule (4), as demonstrated by the experiments in Section VI.

One of the important problems of an iterative algorithm is the convergence analysis [21], [38], [39]. Next, we discuss the convergence of the update rule (17).

Proposition 2: Starting from a positive initialization P^0 ≻ 0 (i.e., P^0_nk > 0, n = 1, . . . , N, k = 1, . . . , K), if the matrix V is completely positive, the sequence {P^t}_{t=1}^{+∞} generated by the update rule (17) converges to a Karush-Kuhn-Tucker (KKT) stationary point P* of (1).

Proof: We prove this proposition in two steps.

1) First, we prove that the sequence {P^t}_{t=1}^{+∞} is convergent. From Proposition 1 and (20), we can recursively obtain E(P^0) ≥ E(P^t) ≥ e(P^{t+1}, P^t) ≥ E(P^{t+1}) ≥ 0. Thus, {E(P^t)}_{t=1}^{+∞} is monotonically nonincreasing and bounded, so it must be convergent and

    lim_{t→+∞} E(P^t) = lim_{t→+∞} e(P^{t+1}, P^t) ≥ 0.    (21)

Besides, from (20) and (21), we have

    0 ≤ Δe = e(P^{t+1}, P^t) − E(P^{t+1}) → 0 ⇒ lim_{t→+∞} Δe = 0.

From Lemmas 1 and 3, denoting Δf = f(P^{t+1}, P^t) − F(P^{t+1}) and Δg = g(P^{t+1}, P^t) − G(P^{t+1}), we have Δf ≥ 0, Δg ≥ 0, and Δe = Δf + Δg. So

    0 ≤ Δf ≤ Δe → 0 ⇒ lim_{t→+∞} Δf = 0.

From (36) in Appendix A

    Δf = f(P^{t+1}, P^t) − F(P^{t+1}) = Tr[V (P^{t+1} − P^t)(P^{t+1} − P^t)^T] → 0.

Noting that V is positive definite because it is completely positive, we have

    lim_{t→+∞} Δf = 0 ⇒ lim_{t→+∞} |P^{t+1} − P^t| = 0.

In addition, {P^t}_{t=1}^{+∞} is bounded because {E(P^t)}_{t=1}^{+∞} is bounded. Therefore, the sequence {P^t}_{t=1}^{+∞} is convergent.

2) In the second step, we prove that the sequence {P^t}_{t=1}^{+∞} converges to a KKT stationary point of (1). Construct a KKT function, which is an extension of the Lagrange function, as follows:

    L(P, Λ) = ||V − P P^T||²_F − Tr(Λ^T P)

where the Lagrange multiplier Λ ∈ R^{N×K}. We have the KKT conditions

    ∂L/∂P = 4(P P^T P − V P) − Λ = 0
    P_nk ≥ 0
    Λ_nk P_nk = 0
    Λ_nk ≥ 0
    ∀n, k.    (22)

Without loss of generality, suppose {P^t}_{t=1}^{+∞} converges to P*, i.e., 0 ⪯ P* = lim_{t→+∞} P^t ≺ +∞.
Construct a nonnegative matrix R^t given by

    R^t_nk = (V P^t)_nk / (P^t P^tT P^t)_nk, n = 1, . . . , N, k = 1, . . . , K.

Then

    0 ≤ R*_nk = lim_{t→+∞} R^t_nk = (V P*)_nk / (P* P*^T P*)_nk < +∞

and

    P^{t+1}_nk = P^t_nk ³√[(V P^t)_nk / (P^t P^tT P^t)_nk] = P^t_nk ³√(R^t_nk).    (23)

From (23), we can recursively obtain

    P^{t+1}_nk = P^0_nk ∏_{r=0}^{t} ³√(R^r_nk) ⇒ P*_nk = P^0_nk lim_{t→+∞} ∏_{r=0}^{t} ³√(R^r_nk).    (24)

Since P* ⪰ 0, we have either P*_nk > 0 or P*_nk = 0. Next, we separately prove that P*_nk satisfies the KKT conditions in both cases.

Case 1: P*_nk > 0. Because P^0_nk > 0, P*_nk > 0, and the infinite product (24) is convergent, we get ³√(R*_nk) = lim_{t→+∞} ³√(R^t_nk) = 1

    ³√(R*_nk) = ³√[(V P*)_nk / (P* P*^T P*)_nk] = 1
    ⇒ (P* P*^T P*)_nk − (V P*)_nk = 0 and Λ*_nk = 0.

So P*_nk > 0 satisfies the KKT conditions (22).

Case 2: P*_nk = 0. Noting that P^0_nk > 0 and there are no zero rows in the matrix V, we have (V P^t)_nk > 0, t = 0, . . . , +∞. Then R^t_nk > 0, t = 0, . . . , +∞. So in this case, we obtain

    0 ≤ P^0_nk lim_{t→+∞} ∏_{r=0}^{t} ³√(R^r_nk) = P*_nk = 0
    ⇒ lim_{t→+∞} ³√(R^t_nk) = ³√(R*_nk) = ³√[(V P*)_nk / (P* P*^T P*)_nk] ≤ 1
    ⇒ Λ*_nk = 4[(P* P*^T P*)_nk − (V P*)_nk] ≥ 0.

Thus, P*_nk = 0 satisfies the KKT conditions (22), too. Accordingly, the sequence {P^t}_{t=1}^{+∞} converges to a KKT point of (1) if P^0 ≻ 0.

Remark 2: In many situations, the matrix V satisfies the condition of Proposition 2 that V is completely positive. If not, one can replace it with V + εI, where ε > 0 and I is an identity matrix. Also, Proposition 2 shows that a positive initialization P^0 is helpful in finding the optimal solution.

IV. FAST PARALLEL ALGORITHMS FOR SNMF

For convenience, rewrite (6) as P_nk ← P_nk R_nk, where R_nk = (V P)_nk / (P P^T P)_nk. As mentioned in Section II, the multiplicative rule (6) does not work under the parallel update mode (see Fig. 2). In this section, we solve this problem by combining it with the parallel update rule (17). For this purpose, we, respectively, use the α-weighted geometric mean between P_nk and the multiplicative rule (6) to obtain

    P_nk ← P^{1−α}_nk [P_nk R_nk]^α = P_nk (R_nk)^α = P_nk [(V P)_nk / (P P^T P)_nk]^α    (25)

where 0 ≤ α < 1, and use the β-weighted arithmetic mean between them to derive the update rule (7) as follows:

    P_nk ← (1 − β) P_nk + β (P_nk R_nk) = P_nk [1 − β + β (V P)_nk / (P P^T P)_nk]    (26)

where 0 ≤ β < 1. If α = 1/3, (25) is exactly the update rule (17).
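As a sketch (again assuming V and P are in memory, with the weights chosen by the user; the values below are only illustrative), one parallel step of (25) and one of (26) read:

    % One parallel step of the alpha-weighted geometric-mean rule (25).
    alpha = 0.99;
    R = (V * P) ./ (P * (P' * P));   % R_nk = (VP)_nk / (PP'P)_nk
    P = P .* R.^alpha;

    % One parallel step of the beta-weighted arithmetic-mean rule (26).
    beta = 0.99;
    P = P .* (1 - beta + beta * (V * P) ./ (P * (P' * P)));

These are only the bare rules (25) and (26); the α-SNMF and β-SNMF algorithms developed next combine them with the provably convergent rule (17).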
Proposition 3: For a KKT point P*, we have

    P*_nk [(V P*)_nk / (P* P*^T P*)_nk]^α = P*_nk    (27)

and

    P*_nk [1 − β + β (V P*)_nk / (P* P*^T P*)_nk] = P*_nk    (28)

where 0 ≤ α < 1, 0 ≤ β < 1, n = 1, . . . , N, and k = 1, . . . , K.

Proof: Refer to Appendix D.

Proposition 3 implies that the KKT stationary points exactly satisfy equations (25) and (26). However, if an initialization P^0 is far away from the KKT stationary points, then under the parallel mode the multiplicative rules (25) and (26) may lead to oscillations, as shown in Fig. 2, when α → 1 or β → 1. To deal with this difficulty, we combine them with our update rule (17) to further develop two fast parallel SNMF algorithms: the α-SNMF and β-SNMF algorithms.

A. α-SNMF Algorithm

Consider the following problem with respect to α:

    min E(P, α) = min ||V − P P^T||²_F
    0 ≤ α