Partial Sum Minimization of Singular Values Representation on Grassmann Manifolds


Boyue Wang, Yongli Hu, Member, IEEE, Junbin Gao, Yanfeng Sun, Member, IEEE, and Baocai Yin, Member, IEEE

Abstract—As a significant subspace clustering method, low rank representation (LRR) has attracted great attention in recent years. To further improve the performance of LRR and extend its applications, several issues remain to be resolved. The nuclear norm in LRR does not sufficiently use the prior knowledge of the rank, which is known in many practical problems. In addition, LRR is designed for vectorial data from linear spaces and is thus not suitable for high dimensional data with intrinsic non-linear manifold structure. This paper proposes an extended LRR model for manifold-valued Grassmann data which incorporates prior knowledge by minimizing the partial sum of singular values instead of the nuclear norm, namely the Partial Sum minimization of Singular Values Representation (GPSSVR). The new model not only enforces a low rank global structure on the data, but also retains important information by minimizing only the smaller singular values. To further maintain the local structure among Grassmann points, we also integrate a Laplacian penalty with GPSSVR. An effective algorithm is proposed to solve the optimization problem based on the GPSSVR model. The proposed model and algorithms are assessed on some widely used human action video datasets and a real scenery dataset. The experimental results show that the proposed methods clearly outperform other state-of-the-art methods.

Index Terms—Low Rank Representation, Nuclear Norm, Subspace Clustering, Grassmann Manifold, Laplacian Matrix

• Boyue Wang, Yongli Hu, and Yanfeng Sun are with the Beijing Municipal Key Lab of Multimedia and Intelligent Software Technology, College of Metropolitan Transportation, Beijing University of Technology, Beijing 100124, China. E-mail: [email protected]; {huyongli,yfsun}@bjut.edu.cn
• Junbin Gao is with the Discipline of Business Analytics, The University of Sydney Business School, The University of Sydney, Camperdown, NSW 2006, Australia. E-mail: [email protected]
• Baocai Yin is with the School of Software Technology at Dalian University of Technology, Dalian 116620, China, and with the Beijing Municipal Key Lab of Multimedia and Intelligent Software Technology at Beijing University of Technology, Beijing 100124, China. E-mail: [email protected]


1 INTRODUCTION

Clustering is a fundamental problem in computer vision and machine learning, and a large number of methods have been proposed to solve it, such as conventional iterative methods [1], [2], statistical methods [3], [4], factorization-based algebraic approaches [5]–[7], and spectral clustering methods [8]–[15]. Among all the clustering methods, the Spectral Clustering (SC) algorithm is state-of-the-art, with good performance in many applications [9], [16], obtained by exploring the affinity information of data. In this framework, the final clustering is obtained by applying a spectral method such as Normalized Cuts (NCut) [17] to the affinity matrix learned from the data. As a result, how to construct an effective affinity matrix becomes the key problem. In this paradigm, two representative methods are Sparse Subspace Clustering (SSC) [16] and Low Rank Representation (LRR) [8], which are both based on the data self-expressive property. SSC uses the sparsest self-representation of data, produced by the l1 norm, to construct the affinity matrix, while LRR relies on the Rank Minimization regularization, inspired by Robust Principal Component Analysis (RPCA) [18]. Different from SSC, which focuses independently on the sparsest representation for each datum and ignores the relations among the data, LRR exploits the matrix rank to capture the underlying global structure hidden in the data set. It has been proven that, when a data set is actually drawn from a union of several low-dimensional subspaces, LRR can reveal this structure to facilitate subspace clustering [8]. LRR has been successfully applied in many clustering scenarios, such as face recognition [19], visual tracking [20] and saliency detection [15].

Although LRR shows good clustering performance, it also suffers some shortcomings. The core idea of the original LRR is the Rank Minimization principle, which results in a non-convex problem. To provide a practical implementation of LRR, one employs the nuclear norm as a surrogate of the Rank Minimization regularization, resulting in a convex optimization problem. Candès et al. [21] prove that, under a certain incoherence assumption on the singular values of the data matrix, solving the convex nuclear norm regularized problem leads to an approximate low-rank solution. However, indirectly minimizing the rank of the coefficient matrix by the nuclear norm is not a perfect approximation. The main argument is that the nuclear norm minimizes the sum of all the singular values of the affinity matrix and treats all the singular values equally in order to decrease the rank of the matrix. This strategy ignores the fact that different singular values of a matrix generally carry different importance. Additionally, in many applications the rank of the matrix is known, for example, 3 in photometric stereo and 1 in background subtraction, but in nuclear norm minimization this prior information is not well utilized.

To address the issues associated with the nuclear norm, researchers have proposed non-convex penalty functions which are better approximations to the rank minimization and easier to optimize. Gu et al. [22] propose the weighted nuclear norm minimization (WNNM) method. Jeong and Lee [23] use the Schatten p-norm of the singular values to fit the rank minimization of a matrix. Most interestingly, Oh et al. [24] propose minimizing only the smallest N − r singular values while keeping the largest r singular values unconstrained, where N is the number of singular values of the matrix and r is the targeted rank of the matrix, which can usually be estimated using prior knowledge. Intuitively, the larger a singular value is, the more energy the corresponding singular vector carries, so concentrating the energy into several larger singular values benefits clustering or classification while reducing the rank of the affinity matrix. Inspired by the Partial Sum minimization of Singular Values (PSSV) method [24] for matrix completion, we replace the nuclear norm in LRR by the PSSV norm to construct a new clustering model, called the Partial Sum Minimization of Singular Values Representation (PSSVR) model. Compared with the LRR model, PSSVR is not only able to capture global structures of the data, but also takes into account the prior knowledge of the practical application.

We also note that LRR and other clustering methods such as the aforementioned SSC are designed for vectorial data generated from linear spaces, with data similarities measured by Euclidean distance. This has limited the application of LRR in handling very high dimensional data, such as large scale image sets and video data, due to high computational cost. Additionally, it has been proven that such high dimensional data are often embedded in a nonlinear low dimensional manifold [25] and are difficult to represent with the current LRR method. In fact, with the wide use of cheap cameras in domains such as human action recognition, safety production detection and traffic jam detection, there are huge amounts of video data that need to be processed efficiently. However, it is impossible to deal with so many videos in the real world, even given a few labels; thus unsupervised video clustering algorithms have attracted increasing interest in recent years [26]–[28], and it is urgently desired to achieve good clustering performance on real-world videos. In particular, it is necessary to explore a proper representation of high dimensional data and incorporate their low dimensional manifold characteristics. To explore the nonlinear manifold structure existing in high dimensional data and obtain a proper representation, many manifold learning methods have been proposed, such as Locally Linear Embedding (LLE) [29], ISOMAP [30], Locality Preserving Projection (LPP) [31], and Local Tangent Space Alignment (LTSA) [32]. However, these methods are mainly for vectorial data, usually with high computational complexity, and are not suitable for processing videos from practical sensors in the wild. As each of those videos may contain a different number of frames, even without a correct temporal relation, simply vectorizing them may produce vectors of different dimensions. Considering these obstacles of vector representations of videos, the Grassmann manifold becomes a competitive tool. The Grassmann manifold has been widely used to represent videos in recent research [28], [33], [34], without relying on the number of frames of the videos.

Another key reason why the Grassmann manifold is popular for video representation is that it can be easily embedded into a linear space, the space of symmetric matrices. Therefore, all abstract Grassmann points are embedded into the symmetric matrix space, and clustering methods can be applied in the embedded space. Utilizing these advantages of the Grassmann manifold, we represent the high dimensional videos or image sets as Grassmann points for clustering. Combining the Grassmann manifold with the above PSSVR leads to a new clustering method, namely the Grassmann manifold PSSVR model (GPSSVR).

Fig. 1. (1) Image sets are represented as Grassmann points. (2) All Grassmann points are mapped into symmetric matrices. (3) The PSSVR model is formulated in the symmetric matrix space, and we constrain the coefficient matrix to maintain the inner structure of the original data. (4) Clustering by NCut.


The whole clustering procedure is illustrated in Fig. 1. The videos or image sets are first represented as Grassmann points. All Grassmann points are then embedded into the symmetric matrix space as mentioned above; thus we naturally extend the PSSVR model onto the Grassmann manifold. GPSSVR explores the intrinsic relations hidden in non-linear high dimensional data and implements clustering on the manifold. GPSSVR mainly reveals the global structure underlying the data, while the local structure of the data is not well considered. To address this limitation, we further introduce a local structure constraint based on the Laplacian matrix to model the local features of the data, obtaining a Laplacian GPSSVR, named LapGPSSVR.

The contributions of this paper are summarized as follows:
• Reforming the nuclear norm term in the classical LRR model into the partial sum minimization of singular values and constructing a novel self-representation based PSSV model, so-called PSSVR;
• Extending the PSSVR model onto the Grassmann manifold based on our previous works [27], [35], and giving a practical solution to the proposed GPSSVR model; and
• Introducing a Laplacian matrix based constraint into the GPSSVR model to represent the local geometry of the data.

The rest of the paper is organized as follows. In Section 2, we review the geometric properties of the Grassmann manifold and some basic knowledge of LRR and PSSV. In Section 3, we propose PSSVR on the Grassmann manifold and detail its solution. In Section 4, a Laplacian constraint is introduced into the proposed model to maintain the local structure of data. In Section 5, the performance of the proposed methods is evaluated on several public datasets. In the last section, we give the conclusion and discuss future work.

2 BACKGROUND THEORY

We review some concepts of the Grassmann manifold, Low Rank Representation and Partial Sum Minimization of Singular Values, which lay the groundwork for our proposed method.

2.1 Grassmann Manifold

Grassmann manifold G(p, d) [36] consists of the set of all linear p-dimensional subspaces of R^d (0 < p < d). A point on it can be represented by a d × p matrix with p orthonormal columns:

\[ \mathcal{G}(p,d) = \{ X \in \mathbb{R}^{d \times p} : X^T X = I_p \}. \]

There are two methods to measure distances on the Grassmann manifold. One is to map the Grassmann points into tangent spaces, where standard measures exist [37], [38]. The other is to embed the Grassmann manifold into the space of symmetric matrices, where the Euclidean distance is available. The latter is easier and effective in practice, and the mapping can be represented as [34]

\[ \Pi : \mathcal{G}(p,d) \to \mathrm{Sym}(d), \quad \Pi(X) = X X^T. \tag{1} \]

The embedding Π(X) is a diffeomorphism [39]. In this paper, we adopt the second strategy and define the following distance on the Grassmann manifold, inherited from the symmetric matrix space under this mapping:

\[ d_g^2(X, Y) = \frac{1}{2} \| \Pi(X) - \Pi(Y) \|_F^2. \tag{2} \]

A point on the Grassmann manifold is actually an equivalence class of tall orthonormal matrices in R^{d×p}, any one of which can be converted to another by a p × p orthogonal matrix. Thus the Grassmann manifold is naturally regarded as a good representation for video clips and is used to tackle the problem of video matching.
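As a concrete illustration of this representation and metric, the following is a minimal numpy sketch (function names are ours, not from the paper): an image set is orthogonalized into a Grassmann point via SVD, and the squared distance (2) is evaluated through traces of p × p products, so the d × d symmetric matrices never need to be formed explicitly.

```python
import numpy as np

def grassmann_point(Y, p):
    """Represent an image set Y (d x M, one vectorized frame per column)
    as a Grassmann point: the first p left singular vectors of Y."""
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return U[:, :p]  # X in G(p, d), satisfies X.T @ X = I_p

def grassmann_dist_sq(X1, X2):
    """Squared distance (2): 0.5 * ||X1 X1^T - X2 X2^T||_F^2.
    Expanding the Frobenius norm gives 0.5*(p1 + p2) - ||X1^T X2||_F^2."""
    cross = np.sum((X1.T @ X2) ** 2)  # equals tr[(X2^T X1)(X1^T X2)]
    return 0.5 * (X1.shape[1] + X2.shape[1]) - cross
```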

2.2 Low Rank Representation

Given a set of data drawn from an unknown union of subspaces, X = [x_1, x_2, ..., x_m] ∈ R^{d×m}, where d is the data dimension, the objective of subspace clustering is to assign each data sample to its underlying subspace. The basic assumption is that the data in X are drawn from a union of k subspaces {S_i}_{i=1}^k of dimensions {d_i}_{i=1}^k. Under the data self-representation principle, each data point in the dataset can be written as a linear combination of the other data points, i.e., X = XZ, where Z ∈ R^{m×m} is a matrix of similarity coefficients. The LRR model is formulated as [12]

\[ \min_{Z,E} \|E\|_F^2 + \lambda \|Z\|_*, \quad \text{s.t.} \quad X = XZ + E, \tag{3} \]

where E is the error resulting from the self-representation. The F-norm can be changed to other norms, e.g. the l_{1,2}-norm, as done in the original LRR model. When the data set does not contain many outliers, the final clustering accuracies differ little between the F-norm and the l_{1,2}-norm. What is more, problem (3) has a closed-form solution which is many times faster than the one with the l_{1,2}-norm. LRR takes a holistic view in favor of a coefficient matrix of the lowest rank, measured by the nuclear norm ‖·‖_*, which uses the sum of the singular values of the matrix to approximate the Rank Minimization regularization.

2.3 Partial Sum Minimization of Singular Values in RPCA (PSSV)

To recover a low rank matrix A from corrupted data X, RPCA [40] minimizes the rank of A by formulating the following problem:

\[ \min_{A,E} \|A\|_* + \lambda \|E\|_1, \quad \text{s.t.} \quad X = A + E, \]

where E ∈ R^{d×m} is the noise, assumed to be sparse in the RPCA model. If the data is corrupted by Gaussian noise, the l1 norm can be replaced by the Frobenius norm. However, in many practical problems where the rank can be estimated, the nuclear norm limits model performance by over-relaxing the rank minimization constraint and ignoring the individual importance of each singular value of A. Based on prior knowledge of the rank of A, Oh et al. [24] propose a new model which minimizes the partial sum of singular values of A while leaving the remaining singular values unconstrained, defined by the following problem:

\[ \min_{A,E} \|A\|_{>r} + \lambda \|E\|_1, \quad \text{s.t.} \quad X = A + E, \tag{4} \]

where \( \|A\|_{>r} = \sum_{i=r+1}^{\min(d,m)} \sigma_i(A) \) and σ_i(A) denotes the i-th singular value of A. Here r is the expected rank of A, which may be derived from prior knowledge of the problem at hand.
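The partial sum norm is straightforward to evaluate; the following one-liner (ours, for illustration) makes the definition concrete.

```python
import numpy as np

def partial_sum_norm(A, r):
    """||A||_{>r}: the sum of all but the r largest singular values."""
    s = np.linalg.svd(A, compute_uv=False)  # returned in descending order
    return s[r:].sum()
```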

3 PARTIAL SUM MINIMIZATION OF SINGULAR VALUES REPRESENTATION ON GRASSMANN MANIFOLD (GPSSVR)

In this section, we propose an improved Rank Minimization approximation-based subspace clustering method, namely the Partial Sum Minimization of Singular Values Representation, and extend it onto the Grassmann manifold. An effective solution to the proposed model on the Grassmann manifold is explored.

3.1 PSSVR on Grassmann Manifold

For a set of samples X = [x_1, x_2, ..., x_m] ∈ R^{d×m}, we adopt the same self-representation as LRR but replace the nuclear norm of LRR with the partial sum of singular values of the coefficient matrix, which gives the Partial Sum Minimization of Singular Values Representation (PSSVR) method:

\[ \min_{Z,E} \|Z\|_{>r} + \lambda \|E\|_F^2, \quad \text{s.t.} \quad X = XZ + E. \tag{5} \]

Here we use ‖·‖_F^2 instead of ‖·‖_1 in (4) to measure the reconstruction error E. By eliminating the variable E, we can rewrite an equivalent problem as follows:

\[ \min_Z \|Z\|_{>r} + \lambda \sum_{i=1}^m \Big\| x_i - \sum_{j=1}^m Z_{ij} x_j \Big\|_F^2, \tag{6} \]

where \( \|x_i - \sum_{j=1}^m Z_{ij} x_j\|_F^2 \) is the Euclidean distance between the point x_i and its linear combination of all the data points including x_i, and Z = [Z_{ij}].

Now let us consider the generalization of problem (6) to the Grassmann manifold. Given a set of Grassmann points X = {X_1, X_2, ..., X_m}, where X_i ∈ G(p, d) and m is the number of samples, intuitively translating the PSSVR model to non-flat Grassmann manifolds results in the following formulation:

\[ \min_Z \|Z\|_{>r} + \lambda \sum_{i=1}^m \Big\| X_i \ominus \Big( \biguplus_{j=1}^m Z_{ij} X_j \Big) \Big\|_{\mathcal{G}}^2, \tag{7} \]

where \( X_i \ominus (\biguplus_{j=1}^m Z_{ij} X_j) \) with the operator ⊖ represents the manifold distance between X_i and its reconstruction \( \biguplus_{j=1}^m Z_{ij} X_j \), and ⊎ denotes the combination operation on the manifold. So to establish the PSSVR model on the Grassmann manifold, one should define a proper distance and a proper combination operation on the manifold.

From the geometric property of the Grassmann manifold, we can use the Grassmann metric in (2) to replace the manifold distance in (7), i.e., \( \| X_i \ominus (\biguplus_{j=1}^m Z_{ij} X_j) \|_{\mathcal{G}} = d_g(X_i, \biguplus_{j=1}^m Z_{ij} X_j) \). Additionally, under the mapping in (1), the mapped points are positive semi-definite matrices in Sym(d), so they admit the natural linear combination operation of Euclidean space. Thus we can replace the Grassmann points by their mapped points to implement the combination in (7), i.e.,

\[ \biguplus_{j=1}^m Z_{ij} X_j = \sum_{j=1}^m Z_{ij} (X_j X_j^T), \quad \text{collected over all } i \text{ as } \mathcal{X} \times_3 Z, \]

where \( \mathcal{X} = \{X_1 X_1^T, X_2 X_2^T, ..., X_m X_m^T\} \subset \mathrm{Sym}(d) \) is a 3-order tensor stacking all the mapped symmetric matrices along the 3rd mode [41]. Up to now, we can construct the PSSVR model on the Grassmann manifold as follows:

\[ \min_{\mathcal{E},Z} \|Z\|_{>r} + \lambda \|\mathcal{E}\|_F^2, \quad \text{s.t.} \quad \mathcal{X} = \mathcal{X} \times_3 Z + \mathcal{E}, \tag{8} \]

where the reconstruction error \( \mathcal{E} \) is also a 3-order tensor and the coefficient matrix Z ∈ R^{m×m}. This formulation is named GPSSVR.

3.2 Algorithm for PSSVR on Grassmann Manifold

To solve the GPSSVR problem in (8), we first simplify the representation of the reconstruction error tensor \( \mathcal{E} \) to avoid the complex calculation between a 3-order tensor and a matrix in (8). Consider the i-th frontal slice E_i of the tensor \( \mathcal{E} \), i.e.,

\[ E_i = X_i X_i^T - \sum_{j=1}^m Z_{ij} (X_j X_j^T). \]

Denote \( \Delta_{ij} = \mathrm{tr}[(X_j^T X_i)(X_i^T X_j)] \), where Δ_{ij} = Δ_{ji}. Hence we can define an m × m symmetric matrix \( \Delta = (\Delta_{ij})_{i,j=1}^m \). Now it is straightforward to represent the reconstruction error \( \|\mathcal{E}\|_F^2 \) as

\[ \|\mathcal{E}\|_F^2 = \mathrm{tr}(\Delta) - 2\,\mathrm{tr}(Z\Delta) + \mathrm{tr}(Z\Delta Z^T). \tag{9} \]
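Since Δ fully determines the reconstruction error, a small sketch (ours, building on the Section 2.1 helpers) makes (9) concrete; note that tr[(X_j^T X_i)(X_i^T X_j)] is simply ‖X_i^T X_j‖_F^2.

```python
import numpy as np

def delta_matrix(points):
    """Delta[i, j] = tr[(Xj^T Xi)(Xi^T Xj)] = ||Xi^T Xj||_F^2, symmetric."""
    m = len(points)
    D = np.empty((m, m))
    for i in range(m):
        for j in range(i, m):
            D[i, j] = D[j, i] = np.sum((points[i].T @ points[j]) ** 2)
    return D

def recon_error_sq(Z, Delta):
    """||E||_F^2 via Eq. (9): tr(Delta) - 2 tr(Z Delta) + tr(Z Delta Z^T)."""
    return (np.trace(Delta) - 2.0 * np.trace(Z @ Delta)
            + np.trace(Z @ Delta @ Z.T))
```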


Therefore, substituting (9) into the objective function of (8) results in an equivalent and solvable optimization model:

\[ \min_Z\; -2\lambda\,\mathrm{tr}(Z\Delta) + \lambda\,\mathrm{tr}(Z\Delta Z^T) + \|Z\|_{>r}. \tag{10} \]

To tackle this problem, we use the alternating direction method (ADM) [42], [43], which is widely used for such problems [8], [44]. First, we introduce an auxiliary variable J = Z ∈ R^{m×m} to separate the terms involving Z and reformulate the optimization problem as follows:

\[ \min_{Z,J}\; -2\lambda\,\mathrm{tr}(Z\Delta) + \lambda\,\mathrm{tr}(Z\Delta Z^T) + \|J\|_{>r}, \quad \text{s.t.} \quad J = Z. \tag{11} \]

Thus the Augmented Lagrangian Multiplier (ALM) method can be applied to absorb the linear constraint into the objective function:

\[ f(Z, J, Y, \mu) = -2\lambda\,\mathrm{tr}(Z\Delta) + \lambda\,\mathrm{tr}(Z\Delta Z^T) + \|J\|_{>r} + \langle Y, Z - J \rangle + \frac{\mu}{2}\|Z - J\|_F^2, \tag{12} \]

where the matrix Y is a Lagrangian multiplier and µ is a weight tuning the error term ‖Z − J‖_F^2. The ALM formulation (12) can be naturally solved by alternately solving for Z, J and Y in an iterative procedure. The pseudo code of our proposed method is summarized in Algorithm 1. Below, we analyze how to update these variables in each iteration.

Algorithm 1 Solving problem (12) by ADM.
Input: The Grassmann sample set {X_i}_{i=1}^m, X_i ∈ G(p, d); the cluster number k; the expected rank r; and the balancing parameter λ.
Output: The GPSSVR representation Z.
1: Initialize: J = Z = 0, Y = 0, µ = 10^{-6}, µ_max = 10^{10}, ε = 10^{-8}
2: for i = 1 : m do
3:   for j = 1 : m do
4:     Δ_{ij} ← tr[(X_j^T X_i)(X_i^T X_j)];
5:   end for
6: end for
7: while not converged do
8:   Fix Z and update J by J ← argmin_J ‖J‖_{>r} + ⟨Y, Z − J⟩ + (µ/2)‖Z − J‖_F^2;
9:   Fix J and update Z by Z = (2λΔ + µJ − Y)(2λΔ + µI)^{-1};
10:  Update the multiplier: Y ← Y + µ(Z − J);
11:  Update the parameter µ by µ ← min(ρµ, µ_max);
12:  Check the convergence condition: ‖Z − J‖_∞ < ε.
13: end while
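For reference, here is a direct Python transliteration of Algorithm 1 (a sketch under our naming; `delta_matrix` was given above, and the closed-form steps `pssv_threshold` and `update_Z` are sketched in Sections 3.2.1 and 3.2.2 below; the paper does not specify ρ, so 1.1 is a typical placeholder).

```python
import numpy as np

def gpssvr(points, lam, r, rho=1.1, mu=1e-6, mu_max=1e10,
           eps=1e-8, max_iter=1000):
    """ADM loop of Algorithm 1; returns the GPSSVR representation Z."""
    Delta = delta_matrix(points)                      # steps 2-6
    m = Delta.shape[0]
    Z = np.zeros((m, m)); J = np.zeros((m, m)); Y = np.zeros((m, m))
    for _ in range(max_iter):                         # steps 7-13
        J = pssv_threshold(Z + Y / mu, r, 1.0 / mu)   # step 8
        Z = update_Z(Delta, J, Y, lam, mu)            # step 9
        Y = Y + mu * (Z - J)                          # step 10
        mu = min(rho * mu, mu_max)                    # step 11
        if np.max(np.abs(Z - J)) < eps:               # step 12
            break
    return Z
```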

3.2.1 Updating J

To update J^{k+1} at the (k+1)-th iteration, we fix Z and Y at their k-th iteration values and solve the following problem:

\[ J^{k+1} = \arg\min_J f(Z^k, J, Y^k, \mu^k) = \arg\min_J \|J\|_{>r} + \langle Y, Z - J \rangle + \frac{\mu}{2}\|Z - J\|_F^2 = \arg\min_J \|J\|_{>r} + \frac{\mu}{2}\Big\|J - \Big(Z + \frac{Y}{\mu}\Big)\Big\|_F^2. \tag{13} \]

For problem (13), a closed-form solution is suggested in [24] as the following theorem.

Theorem 1. Given U D V^T = SVD(Z + Y/µ) as defined above, the solution to (13) is given by

\[ J^* = U \big( D_r + S_{\mu^{-1}}[D'_r] \big) V^T, \]

where D_r and D'_r are diagonal matrices: diag(D_r) holds the r largest singular values and diag(D'_r) collects the remaining singular values from the SVD. The singular value thresholding operator is defined as S_τ[x] = sign(x) · max(|x| − τ, 0).

Proof: Please refer to the proof of Lemma 1 in [24].
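In code, Theorem 1 amounts to one SVD followed by soft-thresholding only the trailing singular values (a sketch; names are ours):

```python
import numpy as np

def pssv_threshold(M, r, tau):
    """Closed-form solution of (13): keep the r largest singular values of
    M = Z + Y/mu intact and soft-threshold the rest by tau = 1/mu."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s[r:] = np.maximum(s[r:] - tau, 0.0)  # S_tau (values are nonnegative)
    return (U * s) @ Vt                   # U @ diag(s) @ Vt
```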

3.2.2 Updating Z

To update Z^{k+1} at the (k+1)-th iteration, we minimize the ALM formulation (12) with J and Y fixed:

\[ Z^{k+1} = \arg\min_Z f(Z, J^k, Y^k, \mu^k) = \arg\min_Z -2\lambda\,\mathrm{tr}(Z\Delta) + \lambda\,\mathrm{tr}(Z\Delta Z^T) + \langle Y, Z - J \rangle + \frac{\mu}{2}\|Z - J\|_F^2. \tag{14} \]

This is a quadratic optimization problem in Z, whose closed-form solution is given by

\[ Z = (2\lambda\Delta + \mu J - Y)(2\lambda\Delta + \mu I)^{-1}. \tag{15} \]
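A sketch of this step: since Δ is a Gram matrix of the embedded points, the right factor in (15) is symmetric positive definite, so a linear solve is preferable to forming the inverse.

```python
import numpy as np

def update_Z(Delta, J, Y, lam, mu):
    """Closed-form Z-step, Eq. (15)."""
    m = Delta.shape[0]
    A = 2.0 * lam * Delta + mu * np.eye(m)      # symmetric right factor
    B = 2.0 * lam * Delta + mu * J - Y
    # Z A = B with A symmetric  =>  solve A Z^T = B^T
    return np.linalg.solve(A, B.T).T
```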

3.2.3 Updating Y

The matrix Y denotes the Lagrangian multiplier. After solving the two subproblems for J and Z in each iteration, Y is easily updated by the following rule:

\[ Y^{k+1} = Y^k + \mu^k (Z^k - J^k). \tag{16} \]

3.2.4 Adapting the Penalty Parameter µ

For the penalty parameter µ > 0, we update it by µ^{k+1} = min(ρµ^k, µ_max), where µ_max is the upper bound of µ^k.

3.2.5 Termination and Clustering

After obtaining the GPSSVR representation Z* by Algorithm 1, an affinity matrix can be constructed as W = (|Z*| + |Z*^T|)/2. This affinity matrix is then fed to a spectral clustering algorithm to obtain the final clustering. As a widely used spectral clustering algorithm in subspace segmentation problems, NCut is chosen in this paper. The whole clustering procedure of the proposed method is summarized in Algorithm 2.
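A sketch of this final step, assuming scikit-learn is available (its spectral clustering is used here merely as a stand-in for NCut):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_from_Z(Z_star, k):
    """Affinity W = (|Z*| + |Z*^T|)/2, then spectral clustering into k groups."""
    W = 0.5 * (np.abs(Z_star) + np.abs(Z_star).T)
    model = SpectralClustering(n_clusters=k, affinity='precomputed')
    return model.fit_predict(W)
```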


Algorithm 2 Clustering by PSSVR on the Grassmann manifold.
Input: The videos for clustering X.
Output: The clustering results of X.
1: Represent X as a set of Grassmann points.
2: Map the Grassmann points into the symmetric matrix space as in (1).
3: Obtain the GPSSVR representation Z* of X by Algorithm 1.
4: Compute the affinity matrix W = (|Z*| + |Z*^T|)/2.
5: Run NCut(W) to get the final clustering result of X.

3.3 Computational Complexity

The computational complexity of Algorithm 1 can be divided into two parts: the data representation (steps 2–6) and the solution of the problem (steps 7–13). In the data representation part, Δ is calculated with the trace operation; therefore, for m samples, the computational complexity of calculating Δ is O(m^2). In the second part, the major cost is the SVD of an m × m matrix when updating J in step 8, which takes O(m^3) time. However, there is no need to compute all the singular values, thanks to the thresholding operation; instead one can compute only the first singular values, e.g. up to 4r of them, using a partial SVD as in [12]. Thus, for s iterations, the total cost of the solution part is O(srm^2), and the overall computational complexity is O(m^2) + O(srm^2).
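A sketch of this trick with SciPy's truncated SVD (our variant of the J-step; it assumes the singular values beyond the k computed ones would be thresholded to zero anyway):

```python
import numpy as np
from scipy.sparse.linalg import svds

def pssv_threshold_partial(M, r, tau, k=None):
    """Partial-SVD variant of the J-step: compute only the top k (e.g. 4r)
    singular triplets; the remainder is treated as thresholded to zero."""
    k = k if k is not None else min(4 * r, min(M.shape) - 1)
    U, s, Vt = svds(M, k=k)              # svds returns ascending order
    idx = np.argsort(s)[::-1]
    U, s, Vt = U[:, idx], s[idx], Vt[idx]
    s[r:] = np.maximum(s[r:] - tau, 0.0)
    return (U * s) @ Vt
```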

4 LAPLACIAN PSSVR ON GRASSMANN MANIFOLD (LAPGPSSVR)

4.1 Laplacian PSSVR on Grassmann Manifold

For self-representation based methods, the i-th column of Z, denoted z_i, can be regarded as a new representation of the datum x_i, and Z_{ij} accordingly represents the similarity between x_i and x_j. In the proposed GPSSVR method (8), the global structure is enforced by the global rank minimization constraint on the matrix Z. To incorporate more local similarity information into Z, we consider imposing local geometrical structure on the model. For this purpose, Laplacian matrix regularization is a natural choice because it maintains the similarity between data. Thus a Laplacian Partial Sum Minimization of Singular Values Representation on Grassmann Manifold model, termed LapGPSSVR, can be formulated as

\[ \min_{Z,\mathcal{E}} \|Z\|_{>r} + \lambda\|\mathcal{E}\|_F^2 + \beta \sum_{i,j} \|z_i - z_j\|_2^2\, w_{ij}, \quad \text{s.t.} \quad \mathcal{X} = \mathcal{X} \times_3 Z + \mathcal{E}, \tag{17} \]

where w_{ij} denotes the local similarity between the Grassmann points X_i and X_j. There are many ways to define the w_{ij}. In this paper, we simply use the explicit neighborhood determined by the manifold distance measure. Let C be a neighborhood-size parameter, and define

\[ w_{ij} = \begin{cases} d_g(X_i, X_j), & X_i \in N_C(X_j), \\ 0, & X_i \notin N_C(X_j), \end{cases} \]

where N_C(X_j) denotes the C nearest elements of X_j on the Grassmann manifold. By introducing the Laplacian matrix L, problem (17) can easily be rewritten in its Laplacian form:

\[ \min_{\mathcal{E},Z} \|Z\|_{>r} + \lambda\|\mathcal{E}\|_F^2 + 2\beta\,\mathrm{tr}(ZLZ^T), \quad \text{s.t.} \quad \mathcal{X} = \mathcal{X} \times_3 Z + \mathcal{E}, \tag{18} \]

where the Laplacian matrix L ∈ R^{m×m} is defined as L = D − W, with W = [w_{ij}]_{i,j=1}^m and D = diag(d_{ii}), d_{ii} = Σ_j w_{ij}.
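A sketch of the graph construction (ours, reusing grassmann_dist_sq from Section 2.1; the symmetrization of W is our assumption, since the C-nearest-neighbor rule as stated is not symmetric):

```python
import numpy as np

def laplacian_from_grassmann(points, C):
    """Build W with w_ij = d_g(X_i, X_j) for the C nearest neighbors
    of X_j and zero elsewhere, then return L = D - W."""
    m = len(points)
    D2 = np.array([[grassmann_dist_sq(points[i], points[j]) for j in range(m)]
                   for i in range(m)])
    dist = np.sqrt(np.maximum(D2, 0.0))
    W = np.zeros((m, m))
    for j in range(m):
        nbrs = [i for i in np.argsort(dist[:, j]) if i != j][:C]
        W[nbrs, j] = dist[nbrs, j]
    W = 0.5 * (W + W.T)                 # enforce symmetry (our choice)
    return np.diag(W.sum(axis=1)) - W
```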

4.2 Algorithm for Laplacian PSSVR on Grassmann Manifold

Similar to the derivation of the algorithm for GPSSVR, problem (18) can easily be converted into the following model:

\[ \min_Z\; -2\lambda\,\mathrm{tr}(Z\Delta) + \lambda\,\mathrm{tr}(Z\Delta Z^T) + 2\beta\,\mathrm{tr}(ZLZ^T) + \|Z\|_{>r}. \tag{19} \]

Thus the alternating direction method (ADM) [42] can also be employed to solve this problem. Letting J = Z to separate the variable Z from the terms in the objective function, we formulate the following problem for (19):

\[ \min_{Z,J}\; -2\lambda\,\mathrm{tr}(Z\Delta) + \lambda\,\mathrm{tr}(Z\Delta Z^T) + 2\beta\,\mathrm{tr}(ZLZ^T) + \|J\|_{>r}, \quad \text{s.t.} \quad J = Z. \tag{20} \]

So its Augmented Lagrangian Multiplier formulation can be defined as the following unconstrained optimization:

\[ f'(Z, J, Y, \mu) = -2\lambda\,\mathrm{tr}(Z\Delta) + \lambda\,\mathrm{tr}(Z\Delta Z^T) + 2\beta\,\mathrm{tr}(ZLZ^T) + \|J\|_{>r} + \langle Y, Z - J \rangle + \frac{\mu}{2}\|Z - J\|_F^2. \tag{21} \]

This problem can be solved via the two subproblems (22) and (23) below:

\[ J^{k+1} = \arg\min_J f'(Z^k, J, Y^k, \mu^k) = \arg\min_J \|J\|_{>r} + \langle Y, Z - J \rangle + \frac{\mu}{2}\|Z - J\|_F^2 = \arg\min_J \|J\|_{>r} + \frac{\mu}{2}\Big\|J - \Big(Z + \frac{Y}{\mu}\Big)\Big\|_F^2, \tag{22} \]

and

\[ Z^{k+1} = \arg\min_Z f'(Z, J^k, Y^k, \mu^k) = \arg\min_Z -2\lambda\,\mathrm{tr}(Z\Delta) + \lambda\,\mathrm{tr}(Z\Delta Z^T) + 2\beta\,\mathrm{tr}(ZLZ^T) + \langle Y, Z - J \rangle + \frac{\mu}{2}\|Z - J\|_F^2. \tag{23} \]

Both (22) and (23) can be solved in the same way as (13) and (14), respectively. For example, the solution to (23) is given by

\[ Z = (2\lambda\Delta + \mu J - Y)(2\lambda\Delta + 2\beta L + \mu I)^{-1}. \tag{24} \]
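A sketch of the corresponding Z-step, mirroring Eq. (24) exactly as stated (only the Laplacian term is added to the right factor of update_Z from Section 3.2.2):

```python
import numpy as np

def update_Z_lap(Delta, L, J, Y, lam, beta, mu):
    """Laplacian Z-step, Eq. (24)."""
    m = Delta.shape[0]
    A = 2.0 * lam * Delta + 2.0 * beta * L + mu * np.eye(m)
    B = 2.0 * lam * Delta + mu * J - Y
    return np.linalg.solve(A, B.T).T    # solves Z A = B
```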

5 EXPERIMENTS

In this section, to test the effectiveness of our proposed methods, we conduct several unsupervised clustering experiments on different video datasets. The four video datasets used in our experiments are listed below:
• SKIG action clips: http://lshao.staff.shef.ac.uk/data/SheffieldKinectGesture.htm;
• Ballet video clips: https://www.cs.sfu.ca/research/groups/VML/semilatent/;
• UCF sports clips: http://crcv.ucf.edu/data/UCF Sports Action.php;
• Highway Traffic clips: http://www.svcl.ucsd.edu/projects/traffic/.

To demonstrate the performance of the GPSSVR and LapGPSSVR methods, we compare them with several state-of-the-art clustering methods. Since our methods are related to LRR and manifold models, we mainly select LRR based or manifold based methods as baselines, listed below:
• Sparse Subspace Clustering (SSC) [16]: The SSC model aims to find the sparsest representation for each datum using l1 regularization.
• Low Rank Representation (LRR) [8]: The LRR model represents the holistic correlation among the data by nuclear norm regularization.
• Low Rank Representation on Grassmann Manifold (GLRR-F) [35]: The GLRR-F model embeds the image sets onto the Grassmann manifold and extends the LRR model to the Grassmann manifold.
• Statistical computations on Grassmann and Stiefel manifolds (SCGSM) [28]: The SCGSM model explores statistical modeling methods derived from the Riemannian geometry of the manifold.
• Sparse Manifold Clustering and Embedding (SMCE) [45]: The SMCE model utilizes the local manifold structure to find a small neighborhood around each data point and connects each point to its neighbors with appropriate weights.
• Latent Space Sparse Subspace Clustering (LS3C) [46]: The LS3C model learns a projection of the data and finds the sparse coefficients in the low-dimensional latent space.

In all the experiments, the input raw data are image sets derived from video clips. To represent them as Grassmann points, for a video clip with M frames, denoted by {Y_i}_{i=1}^M, where Y_i is the i-th gray frame of dimension a × b, we construct a matrix Y = [vec(Y_1), vec(Y_2), ..., vec(Y_M)] of size (a × b) × M. A Grassmann point can then be generated by any orthogonalization of Y; for convenience, we use the SVD in our experiments, i.e., Y = UΣV^T.

Then we take the first p singular vectors of U as the representation of a Grassmann point X ∈ G(p, a × b).

To execute all the comparative methods, we prepare the proper data representation for each, since different methods demand different types of inputs for clustering. The baseline subspace clustering methods LRR and SSC take vectors as input and cannot be applied directly to data in the form of Grassmann points, so we have to vectorize each video clip. However, direct vectorization results in very high dimensional vectors which are hard to handle on a normal PC. Thus we apply PCA to reduce these vectors to a low dimension, equal to the number of PCA components retaining 95% of the variance energy. The PCA-projected vectors are taken as inputs for both SSC and LRR. Among the manifold related methods, SCGSM implements clustering directly, as it requires a Grassmann point X ∈ G(p, a × b) as input. Since the GPSSVR, LapGPSSVR and GLRR-F methods all embed Grassmann points into the symmetric matrix space, we construct the corresponding symmetric matrix XX^T ∈ R^{(a×b)×(a×b)} as their input. Although SMCE and LS3C belong to the manifold learning methods, they demand vectors as inputs. However, vectorizing the Grassmann point X ∈ G(p, a × b) would destroy the geometry of the data, so we vectorize the corresponding symmetric matrix instead; that is, SMCE and LS3C take vec(XX^T) as inputs for clustering.

To obtain good experimental performance, the model parameters λ, β, r and C should be assigned properly. First of all, we should give an exact estimate of the representation rank r. For a Grassmann point X ∈ G(p, d), the rank of X, and also of the mapped symmetric matrix in (1), equals p. Considering that a mapped symmetric matrix is linearly represented by the other mapped symmetric matrices, we may expect the rank r of the coefficient matrix Z to be not larger than p. Accordingly, in the experiments we tune the expected rank r around a small value, e.g. 4, to acquire the best result. λ and β are the most important penalty parameters, balancing the reconstruction error term, the low-rank term and the Laplacian regularization term. Empirically, the best value of λ depends on the application and ranges widely, from 0.01 to 20, across different applications. The value of β is usually very small, ranging from 1.0 × 10^{-4} to 1.0 × 10^{-2}, because the value of the Laplacian regularization term is usually many times larger than the other two terms. As for C, which defines the neighborhood size of each Grassmann point, extensive experimental results suggest that a size slightly larger than the average number of data per cluster is a good choice. We tune C around this initial value for different applications.
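The PCA step for the vectorial baselines can be sketched as follows (ours; it keeps the smallest number of components whose cumulative variance reaches 95%).

```python
import numpy as np

def pca_reduce(V, energy=0.95):
    """Rows of V are vectorized clips; project onto the top principal
    components that retain `energy` of the total variance."""
    Vc = V - V.mean(axis=0)
    _, s, Wt = np.linalg.svd(Vc, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(ratio, energy)) + 1
    return Vc @ Wt[:k].T
```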


In our experiments, the performance of the different algorithms is measured by the following clustering accuracy:

\[ \text{Accuracy} = \frac{\text{number of correctly classified points}}{\text{total number of points}} \times 100\%. \]

All the algorithms are coded in Matlab 2014a and run on an Intel Core i7-4770K 3.5GHz CPU machine with 32GB RAM.
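The paper does not spell out how cluster labels are matched to ground truth; a common convention, sketched here, finds the best one-to-one label assignment with the Hungarian algorithm.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """Best-map clustering accuracy in [0, 1]."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((k, k), dtype=int)
    for t, p in zip(y_true, y_pred):
        cost[t, p] += 1
    rows, cols = linear_sum_assignment(-cost)  # maximize matched counts
    return cost[rows, cols].sum() / len(y_true)
```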

5.1 Clustering on Human Action

The GPSSVR model leaves the first r singular values unconstrained to preserve the most dominant information as much as possible. At the same time, it minimizes the sum of the remaining singular values to seek a low rank global structure. Thus the proposed methods are generally well suited to video clustering.

In the first experiment on human actions, we select two challenging action video datasets, the Ballet dataset and the UCF sports dataset, to test the performance of the proposed method. With its simple backgrounds, the Ballet dataset is an appropriate benchmark to verify the capacity of the proposed method for action recognition in a rather ideal condition, while the UCF sports dataset, containing more variation in scenario and viewpoint, can be used to examine the robustness of the proposed method in noisy scenarios.

5.1.1 Ballet Action Dataset

This dataset [47] contains 44 video clips collected from an instructional ballet DVD. The dataset consists of 8 complex action patterns performed by 3 subjects. The eight actions include: 'left-to-right hand opening', 'right-to-left hand opening', 'standing hand opening', 'leg swinging', 'jumping', 'turning', 'hopping' and 'standing still'. The frame images are normalized and centered at a fixed size of 30 × 30. Some frame samples of the Ballet dataset are shown in Fig. 2.

Fig. 2. Some samples from the Ballet Action dataset. Each row denotes a kind of action.

We regard each video clip as an image set, so the number of frames M varies across video clips. We split each clip into subgroups of M = 12 images, and each subgroup is treated as an individual image set. As a result, we construct a total of 713 image sets, labeled as 8 clusters. The dimension of the subspace is set to p = 6, so a Grassmann point can be represented as X_i ∈ G(6, 900). For the setting of the rank r, we test values from 1 to 6 and find that the best expected rank is r = 3 in this experiment. The neighborhood size C is tuned to 90 according to the experimental results. For the LRR and SSC methods, the vectors of size 30 × 30 × 12 = 10800 are reduced to dimension 135 by PCA.

Table 1 presents the experimental results of all the algorithms on the Ballet dataset.

TABLE 1
Subspace clustering results on the Ballet dataset.

Methods      Ballet
GPSSVR       0.6059
LapGPSSVR    0.6255
GLRR-F       0.5905
LRR          0.2819
SSC          0.2903
SCGSM        0.5877
SMCE         0.5105
LS3C         0.4222

Although this dataset contains no complex background or illumination changes and can be regarded as clean human action data in an ideal condition, the accuracies in Table 1 are not very high for the case of eight clusters. The reason is that some actions are too similar to cluster, e.g. 'left-to-right hand opening' and 'right-to-left hand opening'. Compared with the GLRR-F method, which is based on nuclear norm regularization, the proposed methods give a higher clustering accuracy. This demonstrates the benefit of minimizing the partial sum of the smaller singular values and leaving the r largest singular values unconstrained, so as to preserve as much discriminative information as possible. Our proposed methods are also clearly superior to the other classical methods.

5.1.2 UCF Sports Action Dataset

This dataset [48] consists of a set of actions collected from various sport matches of the kind typically featured on broadcast television channels. The dataset includes a total of 150 sequences. The collection has a natural pool of actions with a wide range of scenes and viewpoints, which makes it difficult for clustering. There are 10 actions in this dataset: Diving, Golf Swing, Kicking, Lifting, Riding Horse, Running, Skate Boarding, Swing-Bench, Swing-Side, and Walking. Each sequence has 22 to 144 frames. The dataset is challenging due to the significant intra-class variations in speed, spatial and temporal scale, cloth texture and movement. We convert the video clips into gray images, and each image is resized to 30 × 30. Some samples are shown in Fig. 3.

We regard each video clip as an image set, so there are in total 150 image sets and 10 clusters in this experiment. We select p = 10 as the dimension of the subspace for each Grassmann point; therefore, a Grassmann point can be represented as X_i ∈ G(10, 900). The expected rank r is set to 4 and the neighborhood size C to 12. The PCA algorithm requires the image sets to have the same number of samples, but these sequences contain varying numbers of frames; throwing away too many frames of the longer sequences would be unfair to the LRR and SSC methods, so we give up comparing with LRR and SSC in this experiment.

Fig. 3. Some samples from the UCF sports dataset; each row represents a kind of action.

TABLE 2
Subspace clustering results on the UCF sport dataset.

Methods      UCF
GPSSVR       0.6800
LapGPSSVR    0.6800
GLRR-F       0.6533
SCGSM        0.5333
SMCE         0.6200
LS3C         0.4667

The experimental results are reported in Table 2. Although this dataset has complex backgrounds, viewpoint changes and scale variations and is regarded as a challenging dataset, the accuracy turns out to be higher than on the Ballet dataset. We conjecture that the bigger movements in sport actions help to distinguish the action clusters, resulting in higher accuracy.

5.2 Clustering on Gesture Action

The SKIG dataset [49] contains 1080 RGB-D sequences captured by a Kinect sensor. In this dataset, there are ten kinds of gestures performed by six persons: 'circle', 'triangle', 'up-down', 'right-left', 'wave', 'Z', 'cross', 'comehere', 'turn-around', and 'pat'. All the gestures are performed with fist, finger and elbow, respectively, under three backgrounds (wooden board, white plain paper and paper with characters) and two illuminations (strong light and poor light). Each RGB-D sequence contains 63 to 605 frames. Here the images are normalized to 24 × 32 with zero mean and unit variance. Fig. 4 shows some samples of the RGB images.

Fig. 4. Some samples from the SKIG Action dataset; each row shows a kind of gesture.

In our experiments, we use only the RGB sequences of the SKIG dataset. As in the previous experiments, each video clip is considered as an image set, giving a total of 540 image sets labeled as 10 clusters. We keep p = 10 as the dimension of the subspace, so a Grassmann point is represented as X_i ∈ G(10, 768). In this experiment, we empirically set the expected rank and the neighborhood size to r = 1 and C = 65, respectively. We did not conduct experiments for LRR and SSC, for the reason mentioned in the last experiment.

Table 3 presents all the experimental results on the SKIG dataset.

TABLE 3
Subspace clustering results on the SKIG dataset.

Methods      SKIG
GPSSVR       0.5500
LapGPSSVR    0.5981
GLRR-F       0.5056
SCGSM        0.3704
SMCE         0.4611
LS3C         0.4148

Compared with the human action datasets, the movements in this gesture dataset are smaller, and the illumination and background vary more, so the clustering task on this dataset is more challenging. Our proposed methods, especially LapGPSSVR, improve the clustering accuracy over all the other methods. Besides the discriminative information contributed by the r largest singular values, the Laplacian regularization also provides meaningful information for clustering.

5.3 Clustering on Natural Scene

In this experiment, we wish to inspect the proposed methods in practical applications with more complex conditions, such as the Traffic dataset. The traffic dataset [50] used in this experiment contains 253 video sequences of highway traffic captured under various weather conditions, such as sunny, cloudy and rainy. The sequences are labeled with three traffic levels: light, medium and heavy. There are 44 clips at the heavy level, 45 clips at the medium level and 164 clips at the light level. Each video sequence has 42 to 52 frames. The video sequences are converted to gray images, and each image is normalized to size 24 × 24 with zero mean and unit variance. Some samples of the Highway Traffic dataset are shown in Fig. 5.

The model parameter setting is as follows. Each video clip is regarded as an image set, and we generate a total of 253 image sets labeled with 3 clusters. The Grassmann points in this experiment are chosen as p = 10 dimensional subspaces, i.e., X_i ∈ G(10, 576). We choose the expected rank and the neighborhood size as r = 2 and C = 61. For the LRR and SSC methods, we vectorize the first 42 frames of each clip (discarding the remaining frames) and then use PCA to reduce the dimension from 24 × 24 × 42 = 24192 to 147.

Fig. 5. Some samples from the Highway Traffic dataset. The first row illustrates the heavy traffic level, the second row the medium traffic level and the last row the light traffic level.

TABLE 4
Subspace clustering results on the Traffic dataset.

Methods      Traffic
GPSSVR       0.8617
LapGPSSVR    0.8933
GLRR-F       0.8498
LRR          0.6838
SSC          0.6285
SCGSM        0.6443
SMCE         0.5613
LS3C         0.6364

Table 4 presents the clustering results of all the algorithms. Our proposed methods obtain the highest accuracy, 89.33%, which almost reaches the best classification accuracy of 92.18% reported in [51]. For real traffic applications, the proposed methods could be used to learn different thresholds for the traffic jam levels on specific roads from historical traffic data, which should classify the traffic jam levels of individual roads more accurately than using empirical uniform thresholds for all roads. In this sense, the method is meaningful for practical applications.

[9] [10] [11] [12]

6

C ONCLUSION

In this paper, we have proposed a novel PSSVR model on Grassmann manifold by embedding the manifold onto the space of symmetric matrices. Compared with the nuclear norm used in the LRR method, PSSV is proved a better approximation to the rank minimization problem, which is beneficial in exploring the global structure of data. We also provide efficient algorithms for the proposed methods. The computational complexity of the proposed GPSSVR method is presented, which proves that our algorithms are effective. In addition, to maintain the local structure hidden in data, we introduce soutthe a Laplacian regularization into our model. Several public video datasets are used to evaluate the performance of the proposed methods. The experimental

[13] [14] [15] [16]

[17] [18]

P. Tseng, “Nearest q-flat to m points,” Journal of Optimization Theory and Applications, vol. 105, no. 1, pp. 249–252, 2000. J. Ho, M. H. Yang, J. Lim, K. Lee, and D. Kriegman, “Clustering appearances of objects under varying illumination conditions,” in IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, 2003, pp. 11–18. M. Tipping and C. Bishop, “Mixtures of probabilistic principal component analyzers,” Neural Computation, vol. 11, no. 2, pp. 443– 482, 1999. A. Gruber and Y. Weiss, “Multibody factorization with uncertainty and missing data using the EM algorithm,” in IEEE Conference on Computer Vision and Pattern Recognition, vol. I, 2004, pp. 707–714. K. Kanatani, “Motion segmentation by subspace separation and model selection,” in IEEE International Conference on Computer Vision, vol. 2, 2001, pp. 586–591. Y. Ma, A. Yang, H. Derksen, and R. Fossum, “Estimation of subspace arrangements with applications in modeling and segmenting mixed data,” SIAM Review, vol. 50, no. 3, pp. 413–458, 2008. W. Hong, J. Wright, K. Huang, and Y. Ma, “Multi-scale hybrid linear models for lossy image representation,” IEEE Transactions on Image Processing, vol. 15, no. 12, pp. 3655–3671, 2006. G. Liu, Z. Lin, J. Sun, Y. Yu, and Y. Ma, “Robust recovery of subspace structures by low-rank representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 171–184, 2013. U. von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007. G. Chen and G. Lerman, “Spectral curvature clustering,” International Journal of Computer Vision, vol. 81, no. 3, pp. 317–330, 2009. E. Elhamifar and R. Vidal, “Sparse subspace clustering,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009. G. Liu, Z. Lin, and Y. Yu, “Robust subspace segmentation by low-rank representation,” in International Conference on Machine Learning, 2010, pp. 663–670. G. Liu and S. Yan, “Latent low-rank representation for subspace segmentation and feature extraction,” in IEEE International Conference on Computer Vision, 2011, pp. 1615–1622. P. Favaro, R. Vidal, and A. Ravichandran, “A closed form solution to robust subspace estimation and clustering,” in IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 1801–1807. C. Lang, G. Liu, J. Yu, and S. Yan, “Saliency detection by multitask sparsity pursuit,” IEEE Transactions on Image Processing, vol. 21, no. 1, pp. 1327–1338, 2012. E. Elhamifar and R. Vidal, “Sparse subspace clustering: Algorithm, Theory, and Applications,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 2765–2781, 2013. J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 888–905, 2000. E. J. Cand´es, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?” Journal of the ACM, vol. 58, no. 3, pp. 1– 37, 2011.


[19] Y. Zhang, Z. Jiang, and L. S. Davis, “Learning structured low-rank representations for image classification,” in IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 676–683.
[20] T. Zhang, B. Ghanem, S. Liu, and N. Ahuja, “Low-rank sparse learning for robust visual tracking,” in European Conference on Computer Vision, 2012, pp. 470–484.
[21] E. Candès and T. Tao, “The power of convex relaxation: Near-optimal matrix completion,” IEEE Transactions on Information Theory, vol. 56, no. 5, pp. 2053–2080, 2010.
[22] S. Gu, L. Zhang, W. Zuo, and X. Feng, “Weighted nuclear norm minimization with application to image denoising,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[23] I. Jeong and K. Lee, “Vocal separation using extended robust principal component analysis with Schatten p/lp-norm and scale compression,” in IEEE International Workshop on Machine Learning for Signal Processing, 2014, pp. 1–6.
[24] T. Oh, Y. Tai, J. Bazin, H. Kim, and I. Kweon, “Partial sum minimization of singular values in robust PCA: Algorithm and applications,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PP, 2015.
[25] R. Wang, S. Shan, X. Chen, and W. Gao, “Manifold-manifold distance with application to face recognition based on image set,” in IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1–8.
[26] S. Shirazi, M. Harandi, C. Sanderson, A. Alavi, and B. Lovell, “Clustering on Grassmann manifolds via kernel embedding with application to action analysis,” in IEEE International Conference on Image Processing, 2012, pp. 781–784.
[27] B. Wang, Y. Hu, J. Gao, Y. Sun, and B. Yin, “Product Grassmann manifold representation and its LRR models,” in American Association for Artificial Intelligence, 2016.
[28] P. Turaga, A. Veeraraghavan, A. Srivastava, and R. Chellappa, “Statistical computations on Grassmann and Stiefel manifolds for image and video-based recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 11, pp. 2273–2286, 2011.
[29] S. Roweis and L. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, pp. 2323–2326, 2000.
[30] J. Tenenbaum, V. Silva, and J. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, pp. 2319–2323, 2000.
[31] X. He and P. Niyogi, “Locality preserving projections,” in Advances in Neural Information Processing Systems, vol. 16, 2003.
[32] Z. Zhang and H. Zha, “Principal manifolds and nonlinear dimension reduction via local tangent space alignment,” SIAM Journal of Scientific Computing, vol. 26, no. 1, pp. 313–338, 2005.
[33] M. T. Harandi, C. Sanderson, S. A. Shirazi, and B. C. Lovell, “Graph embedding discriminant analysis on Grassmannian manifolds for improved image set matching,” in IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 2705–2712.
[34] M. T. Harandi, C. Sanderson, C. Shen, and B. Lovell, “Dictionary learning and sparse coding on Grassmann manifolds: An extrinsic solution,” in International Conference on Computer Vision, 2013, pp. 3120–3127.
[35] B. Wang, Y. Hu, J. Gao, Y. Sun, and B. Yin, “Low rank representation on Grassmann manifolds,” in Asian Conference on Computer Vision, 2014.
[36] P. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008.
[37] H. Cetingul, M. Wright, P. Thompson, and R. Vidal, “Segmentation of high angular resolution diffusion MRI using sparse Riemannian manifold clustering,” IEEE Transactions on Medical Imaging, vol. 33, no. 2, pp. 301–317, Feb 2014.
[38] A. Goh and R. Vidal, “Clustering and dimensionality reduction on Riemannian manifolds,” in IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1–7.
[39] J. T. Helmke and K. Hüper, “Newton's method on Grassmann manifolds,” Preprint: arXiv:0709.2205, Tech. Rep., 2007.
[40] J. Wright, A. Ganesh, S. Rao, Y. Peng, and Y. Ma, “Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization,” in Advances in Neural Information Processing Systems, vol. 22, 2009.
[41] G. Kolda and B. Bader, “Tensor decomposition and applications,” SIAM Review, vol. 51, no. 3, pp. 455–500, 2009.


[42] Z. Lin, R. Liu, and Z. Su, “Linearized alternating direction method with adaptive penalty for low rank representation,” in Advances in Neural Information Processing Systems, vol. 23, 2011.
[43] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
[44] G. Liu and S. Yan, “Active subspace: Toward scalable low-rank learning,” Neural Computation, vol. 24, no. 12, pp. 3371–3394, 2012.
[45] E. Elhamifar and R. Vidal, “Sparse manifold clustering and embedding,” in Advances in Neural Information Processing Systems, 2011.
[46] V. M. Patel, H. V. Nguyen, and R. Vidal, “Latent space sparse subspace clustering,” in IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 691–701.
[47] A. Fathi and G. Mori, “Action recognition by learning mid-level motion features,” in IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[48] M. Rodriguez, J. Ahmed, and M. Shah, “Action MACH: a spatio-temporal maximum average correlation height filter for action recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[49] L. Liu and L. Shao, “Learning discriminative representations from RGB-D video data,” in International Joint Conference on Artificial Intelligence, 2013.
[50] A. B. Chan and N. Vasconcelos, “Modeling, clustering, and segmenting video with mixtures of dynamic textures,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 5, pp. 909–926, 2008.
[51] A. Sankaranarayanan, P. Turaga, R. Baraniuk, and R. Chellappa, “Compressive acquisition of dynamic scenes,” in European Conference on Computer Vision, vol. 6311, 2010, pp. 129–142.

Boyue Wang received the B.Sc. degree from Hebei University of Technology, Tianjin, China, in 2012. He is currently pursuing the Ph.D. degree at the Beijing Municipal Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing. His current research interests include computer vision, pattern recognition, manifold learning and kernel methods.

Yongli Hu received his Ph.D. degree from Beijing University of Technology in 2005. He is a professor in the College of Metropolitan Transportation at Beijing University of Technology and a researcher at the Beijing Municipal Key Laboratory of Multimedia and Intelligent Software Technology. His research interests include computer graphics, pattern recognition and multimedia technology.


Junbin Gao graduated from Huazhong University of Science and Technology (HUST), China, in 1982 with a B.Sc. degree in Computational Mathematics and obtained his Ph.D. from Dalian University of Technology, China, in 1991. He is a Professor of Big Data Analytics in the University of Sydney Business School at the University of Sydney and was a Professor in Computer Science in the School of Computing and Mathematics at Charles Sturt University, Australia. He was a senior lecturer and lecturer in Computer Science from 2001 to 2005 at the University of New England, Australia. From 1982 to 2001 he was an associate lecturer, lecturer, associate professor and professor in the Department of Mathematics at HUST. His main research interests include machine learning, data analytics, Bayesian learning and inference, and image analysis.

Yanfeng Sun received her Ph.D. degree from Dalian University of Technology in 1993. She is a professor in the College of Metropolitan Transportation at Beijing University of Technology and a researcher at the Beijing Municipal Key Laboratory of Multimedia and Intelligent Software Technology. She is a member of the China Computer Federation. Her research interests are multi-functional perception and image processing.

Baocai Yin received his Ph.D. degree from Dalian University of Technology in 1993. He is a Professor in the School of Software Technology, Dalian University of Technology, and a researcher at the Beijing Municipal Key Laboratory of Multimedia and Intelligent Software Technology. He is a member of the China Computer Federation. His research interests cover multimedia, multi-functional perception, virtual reality and computer graphics.
