Multiview Spectral Clustering via Structured Low-Rank Matrix Factorization
Yang Wang, Lin Wu, Xuemin Lin, Fellow, IEEE, and Junbin Gao
Abstract— Multiview data clustering attracts more attention than its single-view counterpart because leveraging the independent and complementary information carried by multiple feature spaces outperforms relying on any single one. Multiview spectral clustering aims at yielding a data partition that agrees with the local manifold structures of all views by seeking eigenvalue–eigenvector decompositions. Among such methods, low-rank representation (LRR) is effective, as it explores multiview consensus structures beyond low rankness to boost the clustering performance. However, as we observed, this classical paradigm still suffers from two stand-out limitations for multiview spectral clustering: it overlooks the flexible local manifold structure by aggressively enforcing low-rank data correlation agreement among all views, and such a strategy therefore cannot achieve satisfactory between-view agreement; worse still, LRR is not intuitively flexible enough to capture the latent data clustering structures. In this paper, first, we present a structured LRR by factorizing the representation into latent low-dimensional data-cluster representations, which characterize the data clustering structure for each view. Second, upon such representations, a Laplacian regularizer is imposed to preserve the flexible local manifold structure of each view. Third, we present an iterative multiview agreement strategy that minimizes a divergence objective among all factorized latent data-cluster representations during each iteration of the optimization process, where the latent representation from each view serves to regulate those from the other views; this intuitive process iteratively coordinates all views toward agreement. Fourth, we remark that such a data-cluster representation can flexibly encode the data clustering structure of any view with an adaptive input cluster number. To this end, finally, a novel nonconvex objective function is proposed and solved via an efficient alternating minimization strategy, and the complexity analysis is also presented. Extensive experiments conducted on real-world multiview data sets demonstrate the superiority over the state of the arts.

Index Terms— Iterative multiview clustering agreement, low-rank matrix factorization, low-rank representation (LRR), multiview spectral clustering.

Manuscript received August 4, 2016; revised April 10, 2017 and August 27, 2017; accepted November 6, 2017. (Corresponding author: Lin Wu.)
Y. Wang is with the School of Computer Science and Engineering, Dalian University of Technology, Dalian 116024, China, and also with The University of New South Wales, Sydney, NSW 2052, Australia (e-mail: [email protected]).
L. Wu is with the Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD 4072, Australia (e-mail: [email protected]).
X. Lin is with the School of Computer Science and Engineering, The University of New South Wales, Sydney, NSW 2052, Australia (e-mail: [email protected]).
J. Gao is with the Discipline of Business Analytics, The University of Sydney Business School, The University of Sydney, Sydney, NSW 2006, Australia (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNNLS.2017.2777489
I. INTRODUCTION

SPECTRAL clustering [1]–[4] aims at exploring the local nonlinear manifold (spectral graph)1 structure [5], [6], and has attracted great attention in recent years. With the development of information technology, multiview spectral clustering has drawn increasing interest, owing to the fact that it outperforms its single-view counterpart by leveraging the complementary information from multiview feature spaces. As implied by multiview research [7]–[9], an individual view is not sufficient for effective multiview learning. Therefore, exploring multiview information is necessary, as has been demonstrated by a wide spectrum of applications, e.g., similarity search [10] and human action recognition [11], [12]. Essentially, given the complementary information from multiple views, the critical issue of multiview clustering is to achieve a multiview clustering agreement/consensus [8], [13], [14] so as to yield a clustering performance substantially superior to the single-view paradigm.
Numerous multiview-based methods have been proposed for spectral clustering. References [15] and [16] incorporate multiview information into the clustering process by optimizing certain objective loss functions. An early fusion strategy can also be adopted by concatenating the multiview features into a single uniform representation [17], upon which the similarity matrix is calculated for subsequent multiview spectral clustering. As mentioned in [18] and [19], such a strategy is likely to destroy the inherent properties of the original feature representations within each view and hence results in worse performance; worse still, as indicated by the experimental reports from our previous research [18], it may sometimes even be inferior to clustering with a single view. In contrast, a late fusion strategy [20] conducts spectral clustering for each view and then combines the multiple results afterward, which, however, cannot achieve the multiview agreement since the views do not collaborate with each other. Canonical correlation analysis (CCA)-based methods [21], [22] for multiview spectral clustering project the data from the multiview feature spaces onto one common lower dimensional subspace, where spectral clustering is subsequently conducted. One limitation of such methods lies in the fact that a single common lower dimensional subspace cannot flexibly characterize the local manifold structures of heterogeneous views, resulting in inferior performance. Kumar et al. [23] proposed a state-of-the-art coregularized spectral clustering for multiview data.
1 In the rest of this paper, we use nonlinear manifold structure and spectral graph structure interchangeably.
Similarly, a cotraining [24], [25] model was proposed for this problem [26]. One assumption behind the above work [23], [26] is that each view is free of noise corruption, which, however, is not easily met in practice. To this end, low-rank representation (LRR) [18], [27]–[30] was proposed. As summarized in [9], its basic idea is to decompose the data representation into a view-dependent noise corruption term and a common LRR shared by all views, leading to a common data affinity matrix for clustering; the effectiveness of low-rank models has also led to numerous studies on multiview subspace learning [31], [32] applied to the pattern recognition field. LRR seeks a common multiview low-rank representation, but overlooks the distinct manifold structures of the individual views. To remedy this limitation, and inspired by the latest development of graph regularized LRR [33], [34], we recently proposed an iterative views agreement strategy [9] with a graph regularized LRR for multiview spectral clustering, named LRRGL. To characterize the nonlinear manifold structure of each view, LRRGL couples LRR with multigraph regularization, where each graph characterizes the view-dependent nonlinear local data manifold structure [35]. A novel iterative view agreement process is proposed for optimizing the objective, where, during each iteration, the LRR yielded from each view serves as a constraint to regulate the representation learning of the other views so as to achieve consensus, implemented by applying a linearized alternating direction method with adaptive penalty [36]. Despite the effectiveness of LRRGL, we identify the following nontrivial observations that are not addressed by LRRGL and that leave room for further improvement.
1) The Z_i yielded by the low-rank constraint is less flexible in capturing the latent data similarity; richer similarity information over X_i can be encoded by matrix factorization.
2) LRRGL mainly focuses on yielding the low-rank primal data similarity matrix Z_i derived from X_i. However, such a primal Z_i is less intuitive to understand and less effective in revealing the ideal data clustering structure for the ith view, as well as for all views, which prevents a better multiview spectral clustering performance. The structured consensus loss term imposed over Z_i (i ∈ V) may therefore not effectively achieve consensus for multiview spectral clustering.

A. Our Contributions

This paper is an extension of our recent work [9]; upon that, we deliver the following novel contributions to achieve further improvement for multiview spectral clustering.
1) Instead of focusing on the primal low-rank data similarity matrices Z_i (i = 1, . . . , V), we perform a symmetric matrix factorization of Z_i into a data-cluster indicator matrix, so that such a latent factorization provides a better chance to preserve the ideal cluster structure besides the flexible manifold structure for each view.
2) We impose the Laplacian regularizer over the factorized data-cluster representation to further characterize the nonlinear local manifold structure for each view.
Fig. 1. Visualization of the multiview affinity matrices (please refer to Section IV for the specific multiview features) of our method and LRRGL over the NUS data. The whiter the diagonal blocks, the more ideally the cluster characterizes data objects with large similarity; meanwhile, the blacker the nondiagonal blocks, the more reasonably dissimilar data objects are kept from clustering together. From this result, we can see that the diagonal blocks from the third to the eighth of our method are whiter than those of LRRGL, so that the surrounding black nondiagonal blocks of our method are more salient than those of LRRGL, which demonstrates the advantage of our latent factorized data-cluster representation over LRRGL.
We remark that the factorized data-cluster matrix can effectively encode the clustering structure, and we provide an example to illustrate this in Fig. 1. To reach the multiview clustering agreement, we set the same clustering number for the data-cluster representations of all views.
3) We impose a consensus loss term to minimize the divergence among all the latent data-cluster matrices, instead of among the Z_i, to achieve the multiview spectral clustering agreement.
4) To implement all the above insights, we propose a novel objective function and an efficient alternating optimization strategy, together with a complexity analysis, to solve it. Moreover, we deliver the intuition that the iterative multiview agreement over the factorized latent data-cluster representations during each iteration of our optimization strategy eventually leads to the multiview clustering agreement.
5) Extensive experiments over real-world multiview data sets demonstrate the advantages of our technique over the state of the arts, including our recently proposed LRRGL [9].
Recently, another elegant graph-based principal component analysis method [37] was proposed for spectral clustering with an out-of-sample case. Unlike this effective technique, we study the multiview case to address an effective consensus for spectral clustering. We summarize the main notations in Table I.

II. STRUCTURED LOW-RANK MATRIX FACTORIZATION TO SPECTRAL CLUSTERING

We get started from each single view, e.g., the ith view, as

$$\min_{Z_i, E_i}\; \frac{\theta}{2}\|X_i - X_i Z_i - E_i\|_F^2 + \|Z_i\|_* + \beta\|E_i\|_1 \qquad (1)$$
where θ and β are tradeoff parameters and, as aforementioned, we always adopt D_i to be X_i, so that X_i can be decomposed into a clean component X_i Z_i and a corrupted component E_i for the ith view.
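As a minimal numerical illustration of (1) (not from the paper; the toy data, sizes, and function name are assumptions), the following NumPy sketch evaluates the three terms of the objective for a candidate pair (Z_i, E_i): the weighted Frobenius reconstruction error, the nuclear norm computed from singular values, and the elementwise l1 noise term.

```python
import numpy as np

def lrr_objective(X, Z, E, theta=1.0, beta=1.0):
    """Evaluate the single-view LRR objective of Eq. (1):
    (theta/2)||X - XZ - E||_F^2 + ||Z||_* + beta*||E||_1."""
    recon = 0.5 * theta * np.linalg.norm(X - X @ Z - E, 'fro') ** 2
    nuclear = np.linalg.svd(Z, compute_uv=False).sum()   # nuclear norm = sum of singular values
    sparse = beta * np.abs(E).sum()                      # elementwise l1 norm
    return recon + nuclear + sparse

# toy check: d_i = 20 features, n = 50 samples
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))
Z = 0.01 * rng.standard_normal((50, 50))
E = np.zeros((20, 50))
print(lrr_objective(X, Z, E, theta=2.0, beta=0.5))
```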
TABLE I
NOTATIONS SUMMARIZATION
One can easily verify that rank(X_i Z_i) ≤ rank(Z_i); hence, minimizing rank(Z_i) is equivalent to bounding the low-rank structure of the clean component X_i Z_i. Now, we are ready to investigate ||Z_i||_* for the ith view in more depth. Following [38], we reformulate the nuclear norm ||Z_i||_* as

$$\|Z_i\|_* = \min_{U_i, V_i,\; Z_i = U_i V_i^T} \frac{1}{2}\big(\|U_i\|_F^2 + \|V_i\|_F^2\big) \qquad (2)$$

where U_i ∈ R^{n×d} and V_i ∈ R^{n×d}, and d is always less than d_i, since high-dimensional data objects always exhibit a low-rank structure.

A. Notes Regarding U_i and V_i for Multiview Spectral Clustering

Before discussing the low-rank matrix factorization further, one may consider the following notes that the factorized U_i and V_i may need to satisfy, in the context of both within-view data structure preservation and multiview spectral clustering agreement.
1) The low-rank data structure should be characterized by the factorized U_i or V_i for the ith view, especially the underlying data clustering structure.
2) The factorized latent factors should well encode the manifold structure of the ith view, which, as previously mentioned, is critical to the spectral clustering performance.
3) Should the row-based matrix U_i or the column-based matrix V_i be chosen to meet the above two notes, and if so, which one? One may claim that both should be considered, which, however, would inevitably raise more parameters to be tuned.
4) The factorized latent low-dimensional factors, e.g., U_i or V_i, should not only meet the above notes within each view, e.g., the ith view, but also be of the same scale so that all views can be unified to reach a possible agreement.
To address all the above notes, in what follows, we present our technique of data-cluster-based structured low-rank matrix factorization.

B. Data-Cluster (Landmark)-Based Structured Low-Rank Matrix Factorization

We aim at factorizing Z_i as an approximately symmetric low-rank data-cluster matrix that minimizes the reconstruction error

$$\min_{Z_i}\; \|X_i - X_i Z_i\|_F^2 \qquad (3)$$

where we assume the rank of Z_i to be k_i, such that k_i is related to the data cluster number of the ith view. As indicated in [30], minimizing (3) is equivalent to finding the optimal rank-k_i approximation relying on the skinny singular value decomposition of X_i, yielding the following optimal solution:

$$Z_i^* = U_i U_i^T \qquad (4)$$

where U_i ∈ R^{n×k_i} consists of the top k_i principal bases of X_i. Here, we follow the assumption in [39] and regard k_i as the cluster number of the data objects within the ith view; such data-cluster symmetric matrix factorization has been widely adopted by numerous existing works, including semisupervised learning [40], [41], metric fusion [19], and clustering [3]. We aim at solving the following equivalent low-rank minimization over Z_i via the clustered symmetric matrix factorization:

$$\|Z_i\|_* = \min_{U_i,\; Z_i = U_i U_i^T} \|U_i\|_F^2 \qquad (5)$$

where, for derivative convenience with respect to U_i, we often minimize

$$\|Z_i\|_* = \min_{U_i,\; Z_i = U_i U_i^T} \frac{1}{2}\|U_i\|_F^2. \qquad (6)$$
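The following sketch (an assumption-based toy illustration, not the paper's code) builds Z_i^* = U_i U_i^T from the top-k_i right singular vectors of a random X_i, as in (3) and (4), and checks numerically that ||Z_i^*||_* coincides with ||U_i||_F^2, the quantity minimized in (5) and (6).

```python
import numpy as np

rng = np.random.default_rng(1)
d_i, n, k_i = 30, 80, 5          # toy sizes; k_i plays the role of the cluster number
X = rng.standard_normal((d_i, n))

# skinny SVD: rows of Vt are the right singular vectors of X
_, _, Vt = np.linalg.svd(X, full_matrices=False)
U = Vt[:k_i].T                   # n x k_i, top-k_i right singular vectors (orthonormal columns)
Z_star = U @ U.T                 # rank-k_i data-cluster style solution of Eq. (4)

recon_err = np.linalg.norm(X - X @ Z_star, 'fro')          # reconstruction error of Eq. (3)
nuclear = np.linalg.svd(Z_star, compute_uv=False).sum()    # ||Z*||_*
print(recon_err)
print(nuclear, np.linalg.norm(U, 'fro') ** 2)               # both equal k_i here, cf. Eq. (5)
```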
Remark: Following (3) and (4), we initialize U_i ∈ R^{n×k_i} via k-means clustering over X_i and normalize it as U_i(j, k) = 1/√|C_k| provided that X_i(·, j), i.e., the jth data object, is assigned to cluster C_k. By such normalization, all the columns of U_i are orthonormal; moreover, they are of the same magnitude, which allows the agreement minimization to be performed. Furthermore, such a factorization can well address the aforementioned challenges, which we summarize as follows.
1) The data cluster structure can be well encoded by such a low-rank data-cluster representation within each view. Setting U_i = V_i avoids the additional parameters and importance-weight discussion that would arise if U_i ≠ V_i.
2) More importantly, we are inspired by the reasonable assumption held by all multiview clustering research [16], [18], [23], [26]: as indicated in [18], the ideal multiview clustering performance arises when a common underlying data clustering structure is shared by all the views. We therefore naturally set all the U_i to the same size by adopting the same value k_i = d (i = 1, . . . , V), i.e., the clustering number, upon the same data object number
n for all views, so that feasible loss functions can be developed to seek the multiview clustering agreement with the same clustering number for all views.
For spectral clustering within each view, we preserve the nonlinear local manifold structure of X_i via the low-rank data-cluster representation U_i for the ith view, which can be formulated as

$$\frac{1}{2}\sum_{j,k}^{n} \|u_j^i - u_k^i\|_2^2\, W_i(j,k) = \sum_{j=1}^{n} (u_j^i)^T u_j^i\, H_i(j,j) - \sum_{j,k} (u_k^i)^T u_j^i\, W_i(j,k) = \mathrm{Tr}\big(U_i^T H_i U_i\big) - \mathrm{Tr}\big(U_i^T W_i U_i\big) = \mathrm{Tr}\big(U_i^T L_i U_i\big) \qquad (7)$$

where u_k^i ∈ R^d is the kth row vector of U_i ∈ R^{n×d} representing the linear correlation between x_k and x_j (j ≠ k) in the ith view, W_i(j,k) encodes the similarity between x_j and x_k for the ith view, H_i is a diagonal matrix whose kth diagonal entry is the summation of the kth row of W_i, and L_i = H_i − W_i is the graph Laplacian matrix for the ith view. Following [9], we choose the Gaussian kernel to define W_i(j,k):

$$W_i(j,k) = e^{-\frac{\|x_j^i - x_k^i\|_2^2}{2\sigma^2}}. \qquad (8)$$
We aim to minimize the difference of the low-rank-based data-cluster representations of all views via a mutual consensus loss term, so as to coordinate all views toward a clustering agreement, while structuring each representation with the Laplacian regularizer to encode the local manifold structure of its view. Unlike traditional LRR, which enforces a common data similarity for all views, we propose to learn a separate factorized low-rank data-cluster representation for each view to preserve its flexible local manifold structure while capturing its data cluster structure; upon that, the consensus loss term is imposed to achieve the multiview consensus, leading to our iterative views agreement in Section II-C.

C. Objective Function With Structured Low-Rank Matrix Factorized Representation

We propose the objective function with the structured LRR U_i for each view, e.g., the ith view, with the low-rank term factorized via (6) and the data-cluster representation via (3). Then, we have the following:

$$\min_{U_i, E_i}\; \sum_{i \in V}\Bigg(\underbrace{\frac{1}{2}\|U_i\|_F^2}_{\text{minimize } \|Z_i\|_* \text{ via (6)}} + \underbrace{\lambda_1 \|E_i\|_1}_{\text{noise and corruption robustness}} + \underbrace{\lambda_2 \mathrm{Tr}\big(U_i^T L_i U_i\big)}_{\text{graph structured regularization}} + \underbrace{\frac{\beta}{2}\sum_{j\in V, j\neq i} \|U_i - U_j\|_F^2}_{\text{views agreement}}\Bigg)$$
$$\text{s.t. } i = 1, \ldots, V,\; X_i = X_i U_i U_i^T + E_i,\; U_i \geq 0 \qquad (9)$$

where we have the following.
1) U_i ∈ R^{n×d} denotes the factorized low-rank data-cluster representation of X_i for the ith view. Tr(U_i^T L_i U_i) makes U_i structured with the local manifold structure of the ith view. ||E_i||_1 accounts for possible noise within X_i. λ_1, λ_2, and β are all tradeoff parameters.
2) One reasonable assumption held by much multiview clustering research [16], [23], [26], [42] is that all the views should share a similar underlying clustering structure. The term Σ_{i,j∈V} ||U_i − U_j||_F^2 aims at achieving the view agreement regarding the factorized LRRs U_i from all |V| views; unlike the traditional LRR method, which enforces an identical representation, we construct a different U_i for each view and then minimize their divergence to generate a view agreement.
3) U_i ≥ 0 is a nonnegative constraint, imposed through X_i = X_i Z_i + E_i = X_i U_i U_i^T + E_i for the ith view.
Equation (9) is nonconvex; we, hence, alternately optimize each variable while fixing the others, that is, we update all the U_i and E_i (i ∈ {1, . . . , V}) in an alternating way until convergence is reached. As solving all the {U_i, E_i} (i ∈ V) pairs shares the same optimization strategy, only the ith view is presented. To this end, we introduce two auxiliary variables D_i and G_i; solving (9) with respect to U_i, E_i, D_i, and G_i can then be written as follows:

$$\min_{U_i, E_i, D_i, G_i}\; \frac{1}{2}\|U_i\|_F^2 + \lambda_1\|E_i\|_1 + \lambda_2 \mathrm{Tr}\big(U_i^T L_i U_i\big) + \frac{\beta}{2}\sum_{j\in V, j\neq i}\|U_i - U_j\|_F^2$$
$$\text{s.t. } X_i = D_i U_i^T + E_i,\; D_i = X_i U_i,\; G_i = U_i,\; G_i \geq 0 \qquad (10)$$

where D_i ∈ R^{d_i×d}. We will show the intuition behind the auxiliary variable relationship D_i = X_i U_i by introducing the augmented Lagrangian function based on (10) as follows:

$$\mathcal{L}\big(U_i, E_i, D_i, G_i, K_1^i, K_2^i, K_3^i\big) = \frac{1}{2}\|U_i\|_F^2 + \lambda_1\|E_i\|_1 + \lambda_2 \mathrm{Tr}\big(U_i^T L_i U_i\big) + \frac{\beta}{2}\sum_{j\in V, j\neq i}\|U_i - U_j\|_F^2 + \big\langle K_1^i, X_i - D_i U_i^T - E_i\big\rangle + \big\langle K_2^i, U_i - G_i\big\rangle + \big\langle K_3^i, D_i - X_i U_i\big\rangle + \frac{\mu}{2}\Big(\|X_i - D_i U_i^T - E_i\|_F^2 + \|U_i - G_i\|_F^2 + \|D_i - X_i U_i\|_F^2\Big) \qquad (11)$$

where K_1^i ∈ R^{d_i×n}, K_2^i ∈ R^{n×d}, and K_3^i ∈ R^{d_i×d} are Lagrange multipliers, and μ > 0 is a penalty parameter.
From (11), we can easily see the intuition behind D_i = X_i U_i: minimizing ||X_i − D_i U_i^T − E_i||_F^2 with respect to D_i resembles dictionary learning, with U_i^T playing the role of the corresponding representation learning, and both of them reconstruct X_i for the ith view. Besides the above intuition, it is also simpler to optimize only a single U_i^T by merging the other occurrence into D_i.
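To make the graph-based terms concrete before turning to the optimization, here is a small NumPy sketch (the toy data and helper names are assumptions, not the paper's code) that builds the Gaussian-kernel affinity of (8), the graph Laplacian L_i = H_i − W_i used in (7), and evaluates the per-view Laplacian regularizer Tr(U_i^T L_i U_i) together with the views-agreement term of (9).

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """W(j, k) = exp(-||x_j - x_k||^2 / (2 sigma^2)) for columns x_j of X, cf. Eq. (8)."""
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)   # pairwise squared distances
    return np.exp(-sq / (2.0 * sigma ** 2))

def graph_laplacian(W):
    """L = H - W with H the diagonal degree matrix, cf. Eq. (7)."""
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(2)
n, d = 60, 4                                   # n samples, d clusters
views = [rng.standard_normal((10, n)), rng.standard_normal((15, n))]
U = [np.abs(rng.standard_normal((n, d))) for _ in views]   # nonnegative, as U_i >= 0 in (9)

lap_terms = [np.trace(Ui.T @ graph_laplacian(gaussian_affinity(Xi)) @ Ui)
             for Xi, Ui in zip(views, U)]       # per-view Tr(U_i^T L_i U_i)
consensus = sum(np.linalg.norm(U[i] - U[j], 'fro') ** 2
                for i in range(len(U)) for j in range(len(U)) if j != i)  # views-agreement term
print(lap_terms, consensus)
```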
III. OPTIMIZATION STRATEGY

We minimize (11) by updating each variable while fixing the others.

A. Solve U_i

Minimizing with respect to U_i amounts to solving

$$\mathcal{L}_1 = \frac{1}{2}\|U_i\|_F^2 + \lambda_2 \mathrm{Tr}\big(U_i^T L_i U_i\big) + \frac{\beta}{2}\sum_{j\in V, j\neq i}\|U_i - U_j\|_F^2 + \big\langle K_1^i, X_i - D_i U_i^T - E_i\big\rangle + \big\langle K_2^i, U_i - G_i\big\rangle + \big\langle K_3^i, D_i - X_i U_i\big\rangle + \frac{\mu}{2}\Big(\|X_i - D_i U_i^T - E_i\|_F^2 + \|U_i - G_i\|_F^2 + \|D_i - X_i U_i\|_F^2\Big). \qquad (12)$$

We set the derivative of (12) with respect to U_i to the zero matrix, which yields the following equation:

$$\frac{\partial \mathcal{L}_1}{\partial U_i} = U_i + 2\lambda_2 L_i U_i + \beta\sum_{j\in V, j\neq i}(U_i - U_j) - \big(K_1^i\big)^T D_i + K_2^i - X_i^T K_3^i + \mu\, U_i D_i^T D_i + \mu\, E_i^T D_i + \mu\,(U_i - G_i) - \mu\, X_i^T X_i U_i = \mathbf{0} \qquad (13)$$

where 0 ∈ R^{n×d} shares the same size as U_i. Rearranging the terms further yields

$$U_i = \big[2\lambda_2 L_i + \big(1 + \beta(|V| - 1) + \mu\big)I_n - \mu X_i^T X_i\big]^{-1} S, \quad \text{with computational complexity } O(n^3) \qquad (14)$$

$$S = \beta\sum_{j\in V, j\neq i} U_j + \big(K_1^i\big)^T D_i - K_2^i + X_i^T K_3^i + \mu\, G_i - \mu\, U_i D_i^T D_i - \mu\, E_i^T D_i + \mu\, X_i^T X_i U_i. \qquad (15)$$
The bottleneck of computing (14) lies in the inverse computation over a matrix of size R^{n×n}, causing a computational complexity of O(n^3), which is prohibitive when n is large. Therefore, we turn to updating each row of U_i; without loss of generality, we present the derivative with respect to U_i(l, ·) as

$$\sum_{k=1}^{n}\big(2\lambda_2 L_i(k,l) - \mu (X_i^T X_i)(k,l)\big)\, U_i(l,\cdot) + U_i(l,\cdot) + \big(K_1^i\big)^T(l,\cdot)\, D_i + \mu\, U_i(l,\cdot)\, D_i^T D_i + K_2^i(l,\cdot) - X_i^T(l,\cdot)\, K_3^i + \beta\sum_{j\in V, j\neq i}\big(U_i(l,\cdot) - U_j(l,\cdot)\big) + \mu\big(U_i(l,\cdot) + E_i^T(l,\cdot)\, D_i - G_i(l,\cdot)\big) = \mathbf{0} \qquad (16)$$

where 0 ∈ R^d denotes the vector of size d with all entries equal to 0 and U_i(l, ·) ∈ R^d represents the lth row of U_i. We rearrange the terms to yield

$$U_i(l,\cdot) = \Big(T_i^l + \underbrace{\beta\sum_{j\neq i,\, j\in V} U_j(l,\cdot)}_{\text{influences from other views}}\Big)\Big[\Big(1 + \mu + \sum_{k=1}^{n}\big(2\lambda_2 L_i(k,l) - \mu (X_i^T X_i)(k,l)\big)\Big) I_d + D_i^T D_i\Big]^{-1}, \quad \text{with computational complexity } O(d^3) \qquad (17)$$

where T_i^l = X_i^T(l,·)K_3^i + μ(G_i(l,·) − E_i^T(l,·)D_i) − K_2^i(l,·) − (K_1^i)^T(l,·)D_i, and I_d ∈ R^{d×d} is the identity matrix.
1) Complexity Discussion for the Row Updating Strategy for U_i: Unlike the closed form (14) for the whole U_i, the major computational complexity now lies in the inversion of a matrix of size R^{d×d}, which leads to O(d^3) according to (17), much smaller than O(n^3). Besides, as d is set to the cluster number across all views and, as aforementioned, should be less than the inherent rank of X_i, it is a small value. Given these facts, it is tremendously efficient, at O(d^3) per row, to sequentially update each row of U_i.
2) Intuitions for Views Agreement: The iterative views clustering agreement can be immediately captured via the underbraced term in (17). Specifically, during each iteration, U_i(l, ·) is updated under the influence of the other views, while it also serves as a constraint when generating U_j(l, ·) (j ≠ i); the divergence among all the U_i(l, ·) thus decreases gradually toward an agreement among all views, and such a process repeats until convergence is reached. Unlike the existing LRR method, which directly imposes a common representation, our iterative multiview agreement can better preserve the flexible manifold structure of each view while achieving the multiview agreement, which is critical to finalize the multiview spectral clustering.
Remark: After the whole U_i is updated for the ith view, we simply perform a k-means clustering over it to assign each data object exclusively to one cluster, and then normalize each column of U_i to form an orthonormal matrix.

B. Solve D_i

The optimization process regarding D_i is equivalent to the following:

$$\min_{D_i}\; \big\langle K_1^i, X_i - D_i U_i^T - E_i\big\rangle + \big\langle K_3^i, D_i - X_i U_i\big\rangle + \frac{\mu}{2}\Big(\|X_i - D_i U_i^T - E_i\|_F^2 + \|D_i - X_i U_i\|_F^2\Big). \qquad (18)$$

We take the derivative with respect to D_i, which yields the following closed-form updating rule:

$$D_i = \big(K_1^i U_i - K_3^i + \mu(2X_i - E_i)U_i\big)\big[\mu\big(I_d + U_i^T U_i\big)\big]^{-1} \qquad (19)$$

where the major computational complexity lies in the inversion of the matrix (I_d + U_i^T U_i) ∈ R^{d×d}, resulting in O(d^3), which, as aforementioned, is the same as updating each row of U_i and hence quite efficient.

C. Solve E_i

It is equivalent to solving the following:

$$\min_{E_i}\; \lambda_1\|E_i\|_1 + \frac{\mu}{2}\Big\|E_i - \Big(X_i - D_i U_i^T + \frac{1}{\mu}K_1^i\Big)\Big\|_F^2 \qquad (20)$$

where the following closed-form solution can be obtained for E_i according to [43]:

$$E_i = \mathcal{S}_{\frac{\lambda_1}{\mu}}\Big(X_i - D_i U_i^T + \frac{1}{\mu}K_1^i\Big). \qquad (21)$$
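Both the D_i step in (19) and the E_i step in (21) are short closed forms; a minimal sketch (assumed function names, NumPy conventions) is given below.

```python
import numpy as np

def soft_threshold(A, tau):
    """Elementwise shrinkage S_tau(A) = sign(A) * max(|A| - tau, 0), cf. Eq. (21)."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def update_D(X, U, E, K1, K3, mu):
    """Closed-form D_i update of Eq. (19):
    D_i = (K_1^i U_i - K_3^i + mu (2 X_i - E_i) U_i) [mu (I_d + U_i^T U_i)]^{-1}."""
    d = U.shape[1]
    rhs = K1 @ U - K3 + mu * (2.0 * X - E) @ U
    return rhs @ np.linalg.inv(mu * (np.eye(d) + U.T @ U))

def update_E(X, D, U, K1, lam1, mu):
    """Closed-form E_i update of Eq. (21): E_i = S_{lam1/mu}(X_i - D_i U_i^T + K_1^i / mu)."""
    return soft_threshold(X - D @ U.T + K1 / mu, lam1 / mu)
```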
D. Solve G_i

It is equivalent to solving the following:

$$\min_{G_i}\; \big\langle K_2^i, U_i - G_i\big\rangle + \frac{\mu}{2}\|G_i - U_i\|_F^2 \qquad (22)$$

from which the following closed-form solution of G_i can be derived:

$$G_i = U_i + \frac{K_2^i}{\mu}. \qquad (23)$$

E. Updating K_1^i, K_2^i, K_3^i, and μ

We update the Lagrange multipliers K_1^i, K_2^i, and K_3^i via

$$K_1^i = K_1^i + \mu\big(X_i - D_i U_i^T - E_i\big) \qquad (24)$$
$$K_2^i = K_2^i + \mu\big(U_i - G_i\big) \qquad (25)$$
$$K_3^i = K_3^i + \mu\big(D_i - X_i U_i\big). \qquad (26)$$
Following [9], μ is tuned using the adaptive updating strategy of [36] to yield faster convergence. The optimization strategy alternately updates each variable while fixing the others until convergence, as summarized in Algorithm 1.

F. Notes Regarding Algorithm 1

It is worthwhile to highlight some critical notes regarding Algorithm 1.
1) We initialize U_i[0] ∈ R^{n×d} for all views, such that each entry of U_i[0] represents the similarity between a data object and one of the d anchors (cluster representatives), which can be seen as the centers of the clusters generated by k-means or spectral clustering.
2) For our initialization, we adopt the spectral clustering outcome with the clustering number set to d, where the similarity matrix is calculated via the original X_i feature representation within each view; the U_i[0](i, j) entry, i.e., the similarity between the ith data object and the jth anchor, is then yielded via (8). The Laplacian matrix L_i (i = 1, . . . , V) is computed once offline, also within the original X_i feature representation.
3) More importantly, we set the identical value of d (the cluster number) as the column size of U_i[0] (i = 1, . . . , V) ∈ R^{n×d} for all the views. We remark that the above initial setting for U_i[0] with the same d is reasonable: as stated before, all the views should share a similar underlying data clustering structure. This fact also implies that the initialized U_i[0] does not diverge much among the views.
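As a sketch of one possible initialization consistent with the notes above (it follows the indicator-style normalization of the Remark after (6); the kernel-similarity variant of note 2) would instead fill the entries via (8); the toy data and the use of scikit-learn's SpectralClustering are assumptions):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def init_U(X, d, random_state=0):
    """Initialize U_i[0] in R^{n x d}: cluster the n columns of X into d groups and set
    U(j, k) = 1/sqrt(|C_k|) if sample j falls in cluster C_k, so that the columns of U
    are orthonormal (cf. the Remark following Eq. (6))."""
    labels = SpectralClustering(n_clusters=d, affinity='rbf',
                                random_state=random_state).fit_predict(X.T)
    n = X.shape[1]
    U = np.zeros((n, d))
    for k in range(d):
        members = np.where(labels == k)[0]
        if len(members) > 0:
            U[members, k] = 1.0 / np.sqrt(len(members))
    return U

U0 = init_U(np.random.default_rng(3).standard_normal((20, 100)), d=5)
print(np.allclose(U0.T @ U0, np.eye(5)))   # columns are orthonormal when no cluster is empty
```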
Algorithm 1 Alternating Optimization Strategy for (9)
Input: X_i (i = 1, . . . , V), d, λ_1, λ_2, β
Output: U_i, D_i, E_i, G_i (i ∈ V)
Initialize: compute U_i[0] and L_i (i = 1, . . . , V); set all entries of K_1^i[0], G_i[0], K_2^i[0] to 0; initialize E_i[0] as sparse noise with 20% of its entries corrupted by uniformly distributed noise over [−5, 5]; μ[0] = 10^{−3}, ε_1 = 10^{−3}, ε_2 = 10^{−1}; k = 0.
for i ∈ V do
  Solve U_i: sequentially update each row of U_i according to (17); orthonormalize each column of U_i.
  Update E_i: E_i[k + 1] = S_{λ_1/μ[k]}(X_i − D_i U_i^T[k] + (1/μ[k]) K_1^i[k]).
  Update G_i: G_i[k + 1] = U_i[k] + K_2^i[k]/μ[k].
  Update K_1^i, K_2^i, K_3^i, and μ:
    K_1^i[k + 1] = K_1^i[k] + μ(X_i − D_i U_i^T[k] − E_i[k])
    K_2^i[k + 1] = K_2^i[k] + μ(U_i[k] − G_i[k])
    K_3^i[k + 1] = K_3^i[k] + μ(D_i[k] − X_i U_i[k])
    update μ according to [36].
  Check whether converged:
  if ||X_i − D_i U_i^T[k + 1] − E_i[k + 1]||/||X_i|| < ε_1 and max{ξ||U_i[k + 1] − U_i[k]||, μ[k]||G_i[k + 1] − G_i[k]||, μ[k]||E_i[k + 1] − E_i[k]||} < ε_2 then
    remove the ith view from the view set as V = V − i; U_i[N] = U_i[k + 1], s.t. N is any positive integer.
  else
    k = k + 1.
Return U_i[k + 1], D_i[k + 1], E_i[k + 1], G_i[k + 1] (i = 1, . . . , V).
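The per-view stopping test in Algorithm 1 can be written compactly as follows (a sketch; the thresholds ε_1, ε_2 and the scaling ξ follow the listing above, while the argument layout is an assumption):

```python
import numpy as np

def view_converged(X, D, U_new, U_old, E_new, E_old, G_new, G_old,
                   mu, xi=1.0, eps1=1e-3, eps2=1e-1):
    """Stopping test applied to one view in Algorithm 1."""
    rel_recon = np.linalg.norm(X - D @ U_new.T - E_new) / np.linalg.norm(X)
    change = max(xi * np.linalg.norm(U_new - U_old),
                 mu * np.linalg.norm(G_new - G_old),
                 mu * np.linalg.norm(E_new - E_old))
    return rel_recon < eps1 and change < eps2
```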
G. Convergence Discussion

The above alternating minimization strategy can be seen as a coordinate descent method. According to [44], the sequences (U_i, D_i, E_i, G_i) will eventually converge to a stationary point. However, we cannot guarantee that the converged stationary point is a global optimum, as the problem is not jointly convex in all the above variables.
H. Clustering

Following [9], once the converged U_i (i = 1, . . . , V) are ready, we take all the column vectors of U_i (i = 1, . . . , V) while setting small entries below a given threshold τ to 0. Afterward, the similarity matrix for the ith view between the jth and kth data objects is defined as

$$W_i(j,k) = \big(U_i U_i^T\big)(j,k). \qquad (27)$$

Following [9], the final data similarity matrix is defined as

$$W = \frac{\sum_{i}^{V} W_i}{|V|}. \qquad (28)$$

The clustering is carried out on W via (28) to yield the final outcome of d data groups.

IV. EXPERIMENTS

We adopt the data sets mentioned in [9] as follows.
1) UCI Handwritten Digit Set²: It consists of features for handwritten digits (0–9) with six feature sets and contains 2000 samples, 200 per category. Analogous to [9] and [36], we choose two views: the 76 Fourier coefficients of the character shapes and the 216 profile correlations.
2) Animal With Attribute³: It consists of 50 kinds of animals described by six features (views): color histogram (CQ, 2688-D), local self-similarity (2000-D), pyramid histogram of oriented gradients (HOG) (252-D), scale-invariant feature transform (SIFT) (2000-D), color SIFT (RGSIFT, 2000-D), and speeded up robust features (2000-D). Following [9], we use 80 images per category, 4000 images in total.
3) NUS-WIDE-Object (NUS) [45]: 30000 images from 31 categories. Five views are adopted using the five features provided by the website⁴: 65-D color histogram (CH), 226-D color moments, 145-D color correlation (CORR), 74-D edge estimation (EDH), and 129-D wavelet texture.
4) PASCAL VOC 2012⁵: We select 20 categories with 11530 images, and two views are constructed with color features (1500-D) and HOG features (250-D). Among them, 5600 images are selected by removing the images with multiple categories.
We summarize the above data sets in Table II.
²http://archive.ics.uci.edu/ml/datasets/Multiple+Features.
³http://attributes.kyb.tuebingen.mpg.de.
⁴lms.comp.nus.edu.sg/research/NUS-WIDE.html.
⁵http://host.robots.ox.ac.uk/pascal/VOC/voc2012/.

TABLE II
DATA SETS

A. Baselines

The following state-of-the-art baselines used in [9] are compared.
1) MFMSC: Concatenating the multiple features into a single multiview representation for the similarity matrix, upon which spectral clustering is then conducted [17].
2) Multifeature representation similarity aggregation for spectral clustering (MAASC) [15].
3) Canonical Correlation Analysis (CCA) Model [22]: Projecting the multiview data into a common subspace, where spectral clustering is then performed.
4) Coregularized Multiview Spectral Clustering [23]: It regularizes the eigenvectors of the view-dependent graph Laplacians to achieve consensus clusters across views.
5) Cotraining [26]: Alternately modifies one view's Laplacian eigenspace by learning from the other views' eigenspaces, after which spectral clustering is conducted.
6) Robust LRR Method (RLRR) [27]: After obtaining the data similarity matrix, spectral clustering is performed on it to produce the final multiview spectral clustering result.
7) LRRGL [9]: Imposes a graph regularizer over the nonfactorized LRRs, each of which corresponds to one view so as to preserve the individual manifold structure, while iteratively boosting all these LRRs to reach agreement. The final multiview spectral clustering is performed upon the resulting similarity representations.
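Returning to the clustering step in (27) and (28), a minimal sketch of forming the consensus affinity and clustering it is given below (assumed inputs; scikit-learn's spectral clustering with a precomputed affinity is one possible choice for the final step).

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def consensus_affinity(Us, tau=1e-3):
    """Per-view affinity W_i = U_i U_i^T with small entries thresholded (Eq. (27)),
    averaged over all views into the final similarity W (Eq. (28))."""
    Ws = []
    for U in Us:
        U = np.where(np.abs(U) < tau, 0.0, U)   # suppress entries below the threshold tau
        Ws.append(U @ U.T)
    return sum(Ws) / len(Ws)

def multiview_clusters(Us, d, tau=1e-3):
    W = np.abs(consensus_affinity(Us, tau))      # keep the affinity nonnegative
    return SpectralClustering(n_clusters=d, affinity='precomputed').fit_predict(W)
```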
B. Experimental Settings and Parameters Study

We implement these competitors under the experimental setting mentioned in [9]. Following [9], σ in (8) is learned via [2], and s = 20 is used to construct the s-nearest neighbors for (8). We adopt two standard metrics, clustering accuracy (ACC) and normalized mutual information (NMI). ACC is defined as

$$\mathrm{ACC} = \frac{\sum_{i=1}^{n}\delta\big(\mathrm{map}(r_i), l_i\big)}{n} \qquad (29)$$

where r_i denotes the cluster label of x_i, l_i denotes the true class label, n is the total number of images, δ(x, y) is the function that equals one if x = y and zero otherwise, and map(r_i) is the permutation mapping function that maps each cluster label r_i to the equivalent label from the database. Meanwhile, NMI is formulated as

$$\mathrm{NMI} = \frac{\sum_{i=1}^{c}\sum_{j=1}^{c} n_{i,j}\log\frac{n\, n_{i,j}}{n_i \hat{n}_j}}{\sqrt{\Big(\sum_{i=1}^{c} n_i \log\frac{n_i}{n}\Big)\Big(\sum_{j=1}^{c} \hat{n}_j \log\frac{\hat{n}_j}{n}\Big)}} \qquad (30)$$

where n_i is the number of samples in cluster C_i (1 ≤ i ≤ c), n̂_j is the number of samples from class L_j (1 ≤ j ≤ c), and n_{i,j} denotes the number of samples in the intersection between C_i and L_j.
Remark: Following [9], we repeat each run ten times and report the averaged value of the multiview spectral clustering metrics for all methods. For each method, including ours, we input the clustering number as the number of ground-truth classes of each data set.
Feature Noise Modeling for Robustness: Following [9] and [46], 20% of the feature elements are corrupted with uniform noise over the range [−5, 5], which is consistent with the practical setting and matches the setup of LRRGL, RLRR, and our method. Following [9], we set λ_1 = 2 in (9) for the sparse noise term. We test ACC and NMI over different values of λ_2 and β in (9) in Section IV-C.

Fig. 2. Study of λ_2 and β over the latent factorized data-cluster representation on three data sets.
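The two metrics in (29) and (30) can be computed directly from the predicted and ground-truth labels; the sketch below (assuming integer labels starting at 0) uses the Hungarian algorithm from SciPy for the mapping map(·) in ACC and evaluates NMI from the contingency counts.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_acc(pred, truth):
    """ACC of Eq. (29): best one-to-one mapping of cluster labels to classes."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    c = max(pred.max(), truth.max()) + 1
    cost = np.zeros((c, c))
    for p, t in zip(pred, truth):
        cost[p, t] += 1
    row, col = linear_sum_assignment(-cost)        # maximize the matched counts
    return cost[row, col].sum() / len(truth)

def nmi(pred, truth):
    """NMI of Eq. (30) from the cluster/class contingency table."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    n = len(truth)
    clusters, classes = np.unique(pred), np.unique(truth)
    table = np.array([[np.sum((pred == ci) & (truth == lj)) for lj in classes]
                      for ci in clusters], dtype=float)
    ni, nj = table.sum(axis=1), table.sum(axis=0)
    mask = table > 0
    mi = (table[mask] * np.log(n * table[mask] / np.outer(ni, nj)[mask])).sum()
    hz = np.sqrt((ni * np.log(ni / n)).sum() * (nj * np.log(nj / n)).sum())
    return mi / hz if hz > 0 else 0.0
```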
TABLE III
ACC RESULTS
C. Validation Over Factorized Low-Rank Latent Data-Cluster Representation

First, we would like to validate our method regarding the multigraph regularization and the iterative views agreement over the factorized latent data-cluster representation. Following [9], we test λ_2 and β within the interval [0.001, 10], varying one parameter while fixing the value of the other; the ACC results are shown in Fig. 2, from which we observe the following.
1) Increasing β improves the performance; similarly, increasing λ_2 improves the performance.
2) The clustering metric ACC increases when both λ_2 and β increase.
Based on the above, we choose a balanced pair of values, λ_2 = 0.7 and β = 0.2, for our method.

D. Results

According to Tables III and IV, the following observations can be drawn; note that we mainly deliver the analysis between our method and LRRGL, as the analysis of the other competitors has been detailed in [9].
1) First, our method outperforms LRRGL, implying the effectiveness of the factorized latent data-cluster representation, as it can better encode the data-cluster structure for each view as well as for all views. We provide more insights about this in Fig. 3.
2) Second, both our method and LRRGL outperform the models that learn a common low-dimensional subspace among the multiview data; as indicated in [9], a single subspace is incapable of encoding the local graph structures of all views.
3) Our method and LRRGL are more effective under noise corruptions than the other methods. More analysis can be found in our conference version [9].
TABLE IV
NMI RESULTS
4) Our method achieves the best performance over PASCAL VOC 2012 under the selected two views via the tuned parameters.
We present Fig. 3 to give more intuition on why our method, with the multiview affinity matrix yielded from the factorized data-cluster representation, outperforms the primal similarity matrix of LRRGL. In particular, we observe the following.
1) For the UCI data set, i.e., the multiview affinity matrices illustrated in Fig. 3(a) and (d), both the fourth and fifth diagonal blocks of our method in Fig. 3(d) are whiter than those of LRRGL illustrated in Fig. 3(a); meanwhile, the surrounding nondiagonal black blocks, e.g., the (4, 5)th and (5, 4)th, are blacker than those of LRRGL.
2) For the Animal with Attribute (AwA) data set, the diagonal blocks of our method from the second to the sixth are whiter than those of LRRGL, leading to a slightly deeper black color over the surrounding nondiagonal blocks than LRRGL.
3) Similar conclusions also hold for the NUS data set. We can see that the diagonal blocks from the third to the eighth of our method are whiter than those of LRRGL, so that the surrounding black nondiagonal blocks of our method are more salient than those of LRRGL.
Fig. 3. Recovered multiview-based consensus affinity matrices of our proposed method and LRRGL on three multiview data sets with noise corruption. For the UCI digit data set, we plot the affinity matrix over all data samples. For the AwA and NUS data sets, we randomly select 10 classes, with 80 samples randomly selected for each of them. The ten diagonal blocks represent the data samples within the ten clusters with respect to the ground-truth classes; the whiter the color, the larger the affinity value, better revealing data samples clustered within the same class. Meanwhile, for the nondiagonal blocks, the blacker the color, the smaller the affinity, better revealing data samples belonging to different clusters.
From the above observations, we can safely infer the advantage of the affinity matrix representation yielded by our factorized latent data-cluster representation over the primal affinity matrix of LRRGL for multiview spectral clustering.

V. CONCLUSION

In this paper, we propose to learn a clustered LRR via structured matrix factorization for multiview spectral clustering. Unlike the existing methods, we propose an iterative strategy that intuitively achieves the multiview spectral clustering agreement by minimizing the between-view divergences in terms of the factorized latent data-cluster representation of each view. Upon that, we impose the graph Laplacian regularizer over each low-dimensional data-cluster representation so as to adapt it to multiview spectral clustering, as demonstrated by the extensive experiments.
The future work includes the following directions. The graph regularized low-rank embedding for the out-of-sample case has been researched [4] and will be applied to the multiview out-of-sample scenario. Unlike using predefined graph similarity values, inspired by [47], we will simultaneously learn the consensus graph clustering result and the graph structure, i.e., the graph similarity. Besides, the latest nonparametric graph construction model [48] will also be incorporated for
multiview spectral clustering. The practice of our method can be improved by further reducing the number of parameters to be tuned. Upon that, we will also investigate the problem of learning a weight [47], [49], [50] for each view.

REFERENCES
[1] A. Y. Ng, M. I. Jordan, and Y. Weiss, “On spectral clustering: Analysis and an algorithm,” in Proc. Adv. Neural Inf. Process. Syst., 2001, pp. 849–856.
[2] L. Zelnik-Manor and P. Perona, “Self-tuning spectral clustering,” in Proc. Adv. Neural Inf. Process. Syst., 2004, pp. 1601–1608.
[3] D. Cai and X. Chen, “Large scale spectral clustering via landmark-based sparse representation,” IEEE Trans. Cybern., vol. 45, no. 8, pp. 1669–1680, Aug. 2015.
[4] F. Nie, Z. Zeng, I. W. Tsang, D. Xu, and C. Zhang, “Spectral embedded clustering: A framework for in-sample and out-of-sample spectral clustering,” IEEE Trans. Neural Netw., vol. 22, no. 11, pp. 1796–1808, Nov. 2011.
[5] C. Hou, F. Nie, D. Yi, and D. Tao, “Discriminative embedded clustering: A framework for grouping high-dimensional data,” IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 6, pp. 1287–1299, Jun. 2015.
[6] H. Tao, C. Hou, F. Nie, Y. Jiao, and D. Yi, “Effective discriminative feature selection with nontrivial solution,” IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 4, pp. 796–808, Apr. 2016.
[7] C. Xu, D. Tao, and C. Xu, “Multi-view intact space learning,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 12, pp. 2531–2544, Dec. 2015.
[8] C. Xu, D. Tao, and C. Yu. (2013). “A survey on multi-view learning.” [Online]. Available: https://arxiv.org/abs/1304.5634
[9] Y. Wang, W. Zhang, L. Wu, X. Lin, M. Fang, and S. Pan, “Iterative views agreement: An iterative low-rank based structured optimization method to multi-view spectral clustering,” in Proc. 25th Int. Joint Conf. Artif. Intell., New York, NY, USA, 2016, pp. 2153–2159.
[10] L. Liu, M. Yu, and L. Shao, “Multiview alignment hashing for efficient image search,” IEEE Trans. Image Process., vol. 24, no. 3, pp. 956–966, Mar. 2015. [11] S. Jones and L. Shao, “A multigraph representation for improved unsupervised/semi-supervised learning of human actions,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 820–826. [12] L. Shao, L. Liu, and M. Yu, “Kernelized multiview projection for robust action recognition,” Int. J. Comput. Vis., vol. 118, no. 2, pp. 115–129, 2016. [13] J. Gui, D. Tao, Z. Sun, Y. Luo, X. You, and Y. Y. Tang, “Group sparse multiview patch alignment framework with view consistency for image classification,” IEEE Trans. Image Process., vol. 23, no. 7, pp. 3126–3137, Jul. 2014. [14] H. Gao, F. Nie, X. Li, and H. Huang, “Multi-view subspace clustering,” in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 4238–4246. [15] H.-C. Huang, Y.-Y. Chuang, and C.-S. Chen, “Affinity aggregation for spectral clustering,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., Jun. 2012, pp. 773–780. [16] S. Bickel and T. Scheffer, “Multi-view clustering,” in Proc. IEEE Int. Conf. Data Mining, 2004, pp. 19–26. [17] Y. Huang, Q. Liu, S. Zhang, and D. N. Metaxas, “Image retrieval via probabilistic hypergraph ranking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit, Jun. 2010, pp. 3376–3383. [18] Y. Wang, X. Lin, L. Wu, W. Zhang, Q. Zhang, and X. Huang, “Robust subspace clustering for multi-view data by exploiting correlation consensus,” IEEE Trans. Image Process., vol. 24, no. 11, pp. 3939–3949, Nov. 2015. [19] Y. Wang, W. Zhang, L. Wu, X. Lin, and X. Zhao, “Unsupervised metric fusion over multiview data by graph random walk-based crossview diffusion,” IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 1, pp. 57–70, Jan. 2017. [20] D. Greene and P. Cunningham, “A matrix factorization approach for integrating multiple data views,” in Proc. Joint Eur. Conf. Mach. Learn. Knowl. Discovery Databases, 2009, pp. 423–438. [21] M. B. Blaschko and C. H. Lampert, “Correlational spectral clustering,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., Jun. 2008, pp. 1–8. [22] K. Chaudhuri, S. M. Kakade, K. Livescu, and K. Sridharan, “Multi-view clustering via canonical correlation analysis,” in Proc. 26th Int. Conf. Mach. Learn., 2009, pp. 129–136. [23] A. Kumar, P. Rai, and H. Daume, “Co-regularized multi-view spectral clustering,” in Proc. 25th Annu. Conf. Neural Inf. Process. Syst., 2011, pp. 1413–1421. [24] A. Blum and T. Mitchell, “Combining labeled and unlabeled data with co-training,” in Proc. 11th Annu. Conf. Comput. Learn. Theory, 1998, pp. 92–100. [25] W. Wang and Z.-H. Zhou, “A new analysis of co-training,” in Proc. 27th Int. Conf. Mach. Learn., 2010, pp. 1135–1142. [26] A. Kumar and H. Daume, III, “A co-training approach for multiview spectral clustering,” in Proc. 28th Int. Conf. Mach. Learn., 2011, pp. 393–400. [27] R. Xia, Y. Pan, L. Du, and J. Yin, “Robust multi-view spectral clustering via low-rank and sparse decomposition,” in Proc. 28th AAAI Conf. Artif. Intell., 2014, pp. 2149–2155. [28] G. Liu, Z. Lin, and Y. Yu, “Robust subspace segmentation by lowrank representation,” in Proc. 27th Int. Conf. Mach. Learn., 2010, pp. 663–670. [29] G. Liu and S. Yan, “Latent low-rank representation for subspace segmentation and feature extraction,” in Proc. IEEE Int. Conf. Comput. Vis., 2011, pp. 1615–1622. [30] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, “Robust recovery of subspace structures by low-rank representation,” IEEE Trans. 
Pattern Anal. Mach. Intell., vol. 35, no. 1, pp. 171–184, Jan. 2013. [31] Z. Ding and Y. Fu, “Robust multi-view subspace learning through dual low-rank decompositions,” in Proc. 13th AAAI Conf. Artif. Intell., 2016, pp. 1181–1187. [32] Z. Ding and Y. Fu, “Low-rank common subspace for multi-view learning,” in Proc. 14th IEEE Int. Conf. Data Mining, Dec. 2014, pp. 110–119. [33] M. Yin, J. Gao, and Z. Lin, “Laplacian regularized low-rank representation and its applications,” IEEE Trans. Pattern Anal. Mach. Intell, vol. 38, no. 3, pp. 504–517, Mar. 2016. [34] M. Yin, J. Gao, Z. Lin, Q. Shi, and Y. Guo, “Dual graph regularized latent low-rank representation for subspace clustering,” IEEE Trans. Image Process., vol. 24, no. 12, pp. 4918–4933, Dec. 2015.
[35] L. Zhuang, J. Wang, Z. Lin, A. Y. Yang, Y. Ma, and N. Yu, “Localitypreserving low-rank representation for graph construction from nonlinear manifolds,” Neurocomputing, vol. 175, pp. 715–722, Jan. 2016. [36] Z. Lin, R. Liu, and Z. Su, “Linearized alternating direction method with adaptive penalty for low-rank representation,” in Proc. 25th Annu. Conf. Neural Inf. Process. Syst., 2011, pp. 612–620. [37] M. Tang, F. Nie, and R. Jain, “A graph regularized dimension reduction method for out-of-sample data,” Neurocomputing, vol. 225, pp. 58–63, Feb. 2017. [38] B. Recht, M. Fazel, and P. A. Parrilo, “Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization,” SIAM Rev., vol. 52, no. 3, pp. 471–501, 2010. [39] D. Kuang, C. Ding, and H. Park, “Symmetric nonnegative matrix factorization for graph clustering,” in Proc. SDM, 2012, pp. 106–117. [40] M. Wang, W. Fu, S. Hao, D. Tao, and X. Wu, “Scalable semi-supervised learning by efficient anchor graph regularization,” IEEE Trans. Knowl. Data Eng., vol. 28, no. 7, pp. 1864–1877, Jul. 2016. [41] B. Xu et al., “Efficient manifold ranking for image retrieval,” in Proc. 34th Int. ACM SIGIR Conf. Res. Develop. Inf. Retr., 2011, pp. 525–534. [42] J. Gao, J. Han, J. Liu, and C. Wang, “Multi-view clustering via joint nonnegative matrix factorization,” in Proc. SDM, 2013, pp. 252–260. [43] J.-F. Cai, E. J. Candès, and Z. Shen, “A singular value thresholding algorithm for matrix completion,’ SIAM J. Optim., vol. 20, no. 4, pp. 1956–1982, 2010. [44] D. P. Bertsekas, Nonlinear Programming. Belmont, MA, USA: Athena Scientific, 1999. [45] T.-S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, and Y.-T. Zheng, “NUSWIDE: A real-world Web image database from National University of Singapore,” in Proc. 8th ACM Int. Conf. Image Video Retr., 2009, Art. no. 48. [46] F. Siyahjani, R. Almohsen, S. Sabri, and G. Doretto, “A supervised lowrank method for learning invariant subspace,” in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 4220–4228. [47] F. Nie, G. Cai, and X. Li, “Multi-view clustering and semisupervised classification with adaptive neighbours,” in Proc. AAAI, 2017, pp. 2408–2414. [48] F. Nie, X. Wang, M. I. Jordan, and H. Huang, “The constrained Laplacian rank algorithm for graph-based clustering,” in Proc. AAAI, 2016, pp. 1969–1976. [49] F. Nie, J. Li, and X. Li, “Parameter-free auto-weighted multiple graph learning: A framework for multiview clustering and semi-supervised classification,” in Proc. IJCAI, 2016, pp. 1881–1887. [50] X. Cai, F. Nie, and H. Huang, “Multi-view K -means clustering on big data,” in Proc. AAAI, 2013, pp. 2598–2604. Yang Wang received the Ph.D. degree from The University of New South Wales, Sydney, NSW, Australia, in 2015. He is currently a Research Fellow with The University of New South Wales. He has authored 40 research papers together with a book chapter, most of which have appeared at the competitive venues, including the IEEE T RANSACTIONS ON I MAGE P ROCESSING (TIP), the IEEE T RANSAC TIONS ON N EURAL N ETWORKS AND L EARNING S YSTEMS (TNNLS), the IEEE T RANSACTIONS ON C YBERNETICS , the Pattern Recognition, the Very Large Data Bases Journal, the ACM Multimedia, the ACM Special Interest Group on Information Retrieval, the International Joint Conference on Artificial Intelligence, the ACM Conference on Information and Knowledge Management, the IEEE International Conference on Data Mining, and the Knowledge and Information Systems. 
His current research interests include data mining and learning over visual data objects from multiview spaces. Dr. Wang served as the Program Committee Member for the Scientific Research Track of European Conference on Machine Learning & Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD) 2014, ECMLPKDD 2015, Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2018, PAKDD 2017, International Symposium on Methodologies for Intelligent Systems 2017, Asian Conference on Machine Learning 2016, and Australian Database Conference 2016. He was a recipient of the Best Research Paper Runner-up Award for PAKDD 2014. He is the Program Co-Chair for Big Data Analytics for Social Computing in conjunction with PAKDD 2018, Melbourne. He regularly served as an Invited Journal Reviewer for more than ten leading journals, such as the IEEE TIP, the IEEE TNNLS, the IEEE TKDE, and the Machine Learning (Springer). He is serving as a Guest Editor for the Pattern Recognition Letters (Elsevier), the Multimedia Tools and Application (Springer), and the Advances in multimedia.
Lin Wu received the Ph.D. degree from The University of New South Wales, Sydney, NSW, Australia, in 2014. She was an ARC Senior Research Fellow with the Australian Centre for Robotic Vision, The University of Adelaide, Adelaide, SA, Australia. She is currently a Research Fellow with the Information Technology and Electrical Engineering and the Institute for Social Science Research, The University of Queensland, Brisbane, QLD, Australia. Her current research interests include computer vision, machine learning, multimedia analytics, and data mining. She has published 40 academic papers at competitive venues, with 430 citations via Google Scholar, such as the Conference on Computer Vision and Pattern Recognition, the ACM Multimedia, the International Joint Conference on Artificial Intelligence, the ACM Special Interest Group on Information Retrieval, the IEEE TRANSACTIONS ON IMAGE PROCESSING (TIP), the IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (TNNLS), the IEEE TRANSACTIONS ON CYBERNETICS, the Pattern Recognition, the Computer Vision and Image Understanding, and the Applied Optics.
Dr. Wu served as a program committee member for numerous international conferences. She was a co-recipient of the Best Research Paper Runner-up Award at PAKDD 2014. She served as an Invited Journal Reviewer for the IEEE TIP, the IEEE TNNLS, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, the IEEE TRANSACTIONS ON MULTIMEDIA, and the Pattern Recognition.
Xuemin Lin (F’15) served as a Qianren Scheme Professor with East China Normal University, Shanghai, China. He is currently a Scientia Professor with the School of Computer Science and Engineering, The University of New South Wales, Sydney, NSW, Australia. Dr. Lin is currently an Editor-in-Chief of the IEEE T RANSACTIONS ON K NOWLEDGE AND D ATA E NGINEERING.
Junbin Gao was a Professor in computing from 2010 to 2016 and an Associate Professor from 2005 to 2010 with Charles Sturt University, Bathurst, NSW, Australia. He is currently a Full Professor of big data analytics with The University of Sydney, Sydney, NSW, Australia. His current research interests include machine learning, pattern recognition, and image analysis. He has extensively published the research results on the competitive venues, such as the IEEE T RANSACTIONS ON PATTERN A NALYSIS AND M ACHINE I NTELLIGENCE, the IEEE T RANS ACTIONS ON N EURAL N ETWORKS AND L EARNING S YSTEMS , the IEEE T RANSACTIONS ON I MAGE P ROCESSING, the IEEE T RANSACTIONS ON C YBERNETICS , the IEEE T RANSACTIONS ON C IRCUITS AND S YSTEMS FOR V IDEO T ECHNOLOGY , the Conference on Computer Vision and Pattern Recognition, the Association for the Advancement of Artificial Intelligence, the IEEE International Conference on Data Mining, the SIAM Journal on Discrete Mathematics, the Neural Computation, the Machine Learning, the Neural Networks, and the Pattern Recognition.