Compressive PCA for Low-Rank Matrices on Graphs
Nauman Shahid∗, Nathanael Perraudin, Gilles Puy†, Pierre Vandergheynst
Email: {nauman.shahid, nathanael.perraudin, pierre.vandergheynst}@epfl.ch, †[email protected]
Abstract—We introduce a novel framework for the approximate recovery of data matrices which are low-rank on graphs, from sampled measurements. The rows and columns of such matrices belong to the span of the first few eigenvectors of the graphs constructed between their rows and columns. We leverage this property to recover the non-linear low-rank structures efficiently from sampled data measurements, at a low cost (linear in n). First, a Restricted Isometry Property (RIP) condition is introduced for efficient uniform sampling of the rows and columns of such matrices, based on the cumulative coherence of the graph eigenvectors. Second, a state-of-the-art fast low-rank recovery method is suggested for the sampled data. Finally, several efficient, parallel and parameter-free decoders are presented, along with their theoretical analysis, for decoding the low-rank matrix and the cluster indicators for the full data matrix. Thus, we overcome the computational limitations of the standard linear low-rank recovery methods for big datasets. Our method can also be seen as a major step towards the efficient recovery of non-linear low-rank structures. On a single-core machine, our method gains a speed-up of p²/k over Robust PCA, where k ≪ p is the subspace dimension. Numerically, we can recover a low-rank matrix of size 10304 × 1000 in 15 seconds, which is 100 times faster than Robust PCA.

Index Terms—Robust PCA, graph Laplacian, spectral graph theory, compressive sampling
I. INTRODUCTION
In many applications in signal processing, computer vision and machine learning, the data has an intrinsic low-rank structure, and one desires to extract this structure efficiently from noisy observations. Robust PCA [4], a linear dimensionality reduction algorithm, can be used to exactly describe a dataset lying on a single linear low-dimensional subspace. Low-rank Representation (LRR) [15], on the other hand, can be used for data drawn from multiple linear subspaces. However, these methods suffer from two prominent problems: 1) they do not recover non-linear low-rank structures, and 2) they do not scale for big datasets Y ∈ ℝ^{p×n}.

Consider the graphs G_r and G_c constructed between the rows and the columns of Y, with Laplacians L_r ∈ ℝ^{p×p} and L_c ∈ ℝ^{n×n}. Let L_c = QΛ_cQ^⊤ = Q_{k_c}Λ_{c,k_c}Q_{k_c}^⊤ + Q̄_{k_c}Λ̄_{c,k_c}Q̄_{k_c}^⊤, where Λ_{c,k_c} ∈ ℝ^{k_c×k_c} is a diagonal matrix of the lower graph eigenvalues and Λ̄_{c,k_c} ∈ ℝ^{(n−k_c)×(n−k_c)} is a diagonal matrix of the higher graph eigenvalues. Similarly, let L_r = PΛ_rP^⊤ = P_{k_r}Λ_{r,k_r}P_{k_r}^⊤ + P̄_{k_r}Λ̄_{r,k_r}P̄_{k_r}^⊤. All the values in Λ_r and Λ_c are sorted in increasing order. For a K-nearest-neighbors graph constructed from k_c-clusterable data (along the columns) one can expect λ_{k_c}/λ_{k_c+1} ≈ 0, since λ_{k_c} ≈ 0 and λ_{k_c} ≪ λ_{k_c+1}. We refer to the ratio λ_{k_c}/λ_{k_c+1} as the spectral gap of L_c. The same holds for the Laplacian L_r. Then, low-rank matrices on graphs can be defined as follows and recovered by solving FRPCAG [29].

Definition 1. A matrix Y* is (k_r, k_c)-low-rank on the graphs L_r and L_c if (Y*)_j ∈ span(P_{k_r}) for all j = 1, ..., n and (Y*)_i^⊤ ∈ span(Q_{k_c}) for all i = 1, ..., p. The set of (k_r, k_c)-low-rank matrices on the graphs L_r and L_c is denoted by LR(P_{k_r}, Q_{k_c}).
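To make the setting concrete, here is a minimal Python sketch of the two graph Laplacians and the low-rank test of Definition 1. The K-nearest-neighbour construction via scikit-learn, the toy sizes and all variable names are illustrative assumptions, not the authors' code.

```python
import numpy as np
from scipy.sparse import csgraph
from scipy.sparse.linalg import eigsh
from sklearn.neighbors import kneighbors_graph   # any K-NN graph builder works

def knn_laplacian(Z, K=10):
    """Normalised Laplacian of a symmetrised K-NN graph between the rows of Z."""
    W = kneighbors_graph(Z, K, mode='connectivity', include_self=False)
    W = 0.5 * (W + W.T)                          # symmetrise the adjacency
    return csgraph.laplacian(W, normed=True)

Y = np.random.rand(64, 200)                      # toy data matrix, p x n
Lr = knn_laplacian(Y)                            # graph G_r between the p rows
Lc = knn_laplacian(Y.T)                          # graph G_c between the n columns

kr, kc = 5, 5
_, P_kr = eigsh(Lr, k=kr, which='SM')            # first k_r eigenvectors of L_r
_, Q_kc = eigsh(Lc, k=kc, which='SM')            # first k_c eigenvectors of L_c

# Y is (kr, kc)-low-rank on the graphs if projecting on span(P_kr), span(Q_kc) loses little
Y_proj = P_kr @ (P_kr.T @ Y @ Q_kc) @ Q_kc.T
print(np.linalg.norm(Y - Y_proj) / np.linalg.norm(Y))
```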
III. A GLIMPSE OF COMPRESSIVE PCA (CPCA)
In the sequel we present the full compressive PCA framework for two different cases.
1) Case I: When Y is available: Given a data matrix Y = X̄ + Ē ∈ ℝ^{p×n}, where X̄ ∈ LR(P_{k_r}, Q_{k_c}) and Ē models the noise, the main steps are to construct the graph Laplacians L_r, L_c, sample the rows and columns of Y to obtain Ỹ, and then solve

min_{X̃} φ(Ỹ − X̃) + γ_c tr(X̃L̃_cX̃^⊤) + γ_r tr(X̃^⊤L̃_rX̃),
where φ is a loss function (possibly an ℓ_p norm), X̃ = X̃* + Ẽ = M_rX̄M_c + Ẽ, Ẽ models the errors in the recovery of the subsampled low-rank X̃, and M_r, M_c are the row and column sampling matrices whose design is discussed in Section IV. Finally, use the decoders presented in Section VII to decode the low-rank X̄ = X̄* + E* (for the unsampled Y) on the graphs L_r, L_c if the task is low-rank recovery, or perform k-means on X̃ to get the cluster labels C̃ and use the semi-supervised label propagation (presented in Section X) to get the cluster labels C for the full X.
2) Case II: When Y is not available: We assume for this case that the sampled data matrix Ỹ and the prior information in the form of the Laplacians L_r, L_c for the unsampled Y are available. One can then directly apply the recovery and decoding steps above to recover the low-rank matrix.
We directly state the computational complexities of several problems due to space constraints; the details of the calculations are presented in Appendix G. In the sequel we also use the terms compression, sampling and downsampling interchangeably in the same context.

IV. HOW MUCH TO SAMPLE? RIP FOR LOW-RANK MATRICES ON GRAPHS

A. The Uniform Sampling Matrices
Let M_r ∈ ℝ^{ρ_r×p} and M_c ∈ ℝ^{n×ρ_c} denote the row and column subsampling matrices of (1), which select ρ_r rows and ρ_c columns of Y uniformly at random without replacement.

B. The Graph Cumulative Coherence
The quantities Q_{k_c}^⊤δ_i and P_{k_r}^⊤δ_j characterize the first k_c and k_r Fourier modes [31] of the Dirac deltas δ_i, δ_j on the graphs G_c and G_r respectively. Thus, the cumulative coherence (Definition 2) is a measure of how well the energy of the (k_r, k_c)-low-rank matrices spreads over the nodes of the graphs. These quantities exactly control the number of vertices ρ_c and ρ_r that need to be sampled from the graphs G_c and G_r such that the properties of the graphs are preserved [23]. Consider the example of a single node i: if the coherence of this node is high, then there exist some low-rank signals whose energy is highly concentrated on node i, and removing it would result in a loss of information in the data. If the coherence of this node is low, then removing it in the sampling process results in no loss of information. We already mentioned that we are interested in the case of uniform sampling. Therefore, in order to be able to sample a small number of nodes uniformly from the graphs, the cumulative coherence should be as low as possible.
C. Implications of the Graph Cumulative Coherence
We remind the reader that for our application we desire to sample the data matrix Y such that its low-rank structure is preserved under this sampling. How can we ensure this via the graph cumulative coherence? This follows directly from the fact that we are concerned with data matrices which are also low-rank with respect to the two graphs under consideration, Y ∈ LR(P_{k_r}, Q_{k_c}). In simple words, the columns of the data matrix Y belong to the span of the eigenvectors P_{k_r} and the rows to the span of Q_{k_c}. Thus, the coherence conditions for the graphs directly imply the coherence condition on the data matrix Y itself. Therefore, using these quantities to sample the data matrix Y will ensure the preservation of two properties under sampling: 1) the structure of the corresponding graphs and 2) the low-rankness of the data matrix Y. Given the above definitions, we are now ready to present the restricted-isometry theorem for low-rank matrices on graphs.

D. RIP for Low-rank Matrices on Graphs
Theorem 1 (Restricted isometry property (RIP) for low-rank matrices on graphs). Let M_c and M_r be two random subsampling matrices as constructed in (1). For any δ, ε ∈ (0, 1), with probability at least 1 − ε,

(1 − δ)‖Y‖²_F ≤ (np)/(ρ_rρ_c) ‖M_rYM_c‖²_F ≤ (1 + δ)‖Y‖²_F    (2)

for all Y ∈ LR(P_{k_r}, Q_{k_c}), provided that

ρ_c ≥ (27/δ²) ν²_{k_c} log(4k_c/ε)   and   ρ_r ≥ (27/δ²) ν²_{k_r} log(4k_r/ε),    (3)

where ν_{k_c}, ν_{k_r} characterize the graph cumulative coherence as in Definition 2 and np/(ρ_cρ_r) is just a normalization constant which quantifies the norm conservation in (2).
Proof: Please refer to Appendix A.
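A sketch of how the quantities in (3) and the sampling matrices of (1) could be computed, continuing with Y, P_kr, Q_kc, kr, kc from the earlier sketch. The coherence formula follows the uniform-sampling coherence of [23] (Definition 2 is not reproduced here), and δ, ε and the binary form of M_r, M_c are illustrative assumptions.

```python
import numpy as np

p, n = Y.shape

def cumulative_coherence(U_k, N):
    """Uniform-sampling coherence of [23]: nu_k = sqrt(N) * max_i ||U_k^T delta_i||_2."""
    return np.sqrt(N) * np.max(np.linalg.norm(U_k, axis=1))

def min_samples(nu_k, k, delta=0.5, eps=1e-3):
    """Right-hand side of (3): rho >= (27 / delta^2) * nu_k^2 * log(4 k / eps)."""
    return int(np.ceil(27.0 / delta ** 2 * nu_k ** 2 * np.log(4.0 * k / eps)))

def uniform_sampling(N, rho, rng):
    """rho x N binary matrix selecting rho vertices uniformly at random without replacement."""
    idx = rng.choice(N, size=rho, replace=False)
    M = np.zeros((rho, N))
    M[np.arange(rho), idx] = 1.0
    return M, idx

rng = np.random.default_rng(0)
rho_r = min(p, min_samples(cumulative_coherence(P_kr, p), kr))   # capped for this toy example
rho_c = min(n, min_samples(cumulative_coherence(Q_kc, n), kc))
Mr, rows = uniform_sampling(p, rho_r, rng)    # M_r in R^{rho_r x p}
McT, cols = uniform_sampling(n, rho_c, rng)   # rows of M_c^T; M_c = McT.T in R^{n x rho_c}
Y_tilde = Mr @ Y @ McT.T                      # sampled data, rho_r x rho_c
```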
Theorem 1 is a direct extension of the RIP for k-bandlimited signals on one graph [23]. It states that the information in Y ∈ LR(P_{k_r}, Q_{k_c}) is preserved with overwhelming probability if the sampling matrices (1) are constructed with a uniform sampling strategy satisfying (3). Note that ρ_r and ρ_c depend on the cumulative coherence of the graph eigenvectors: the better spread the eigenvectors are, the smaller the number of vertices that need to be sampled. As proved in [23], we have ν²_{k_r} ≥ k_r and ν²_{k_c} ≥ k_c. Therefore, k_c and k_r are the minimum numbers of columns and rows that we need to sample. Our proposed framework builds on the case of uniform sampling, but it can easily be extended to other sampling schemes [23].

V. GRAPHS FOR COMPRESSED DATA
To ensure the preservation of algebraic and spectral properties, one can construct the compressed Laplacians L̃_r ∈ ℝ^{ρ_r×ρ_r} and L̃_c ∈ ℝ^{ρ_c×ρ_c} for the sampled vertices, for instance via the Kron reduction of L_r and L_c (an alternative is to rebuild K-nearest-neighbor graphs directly on the compressed data; see Appendix G). Let L̃_r = P̃Λ̃_rP̃^⊤ = P̃_{k_r}Λ̃_{r,k_r}P̃_{k_r}^⊤ + P̄̃_{k_r}Λ̄̃_{r,k_r}P̄̃_{k_r}^⊤, and similarly for L̃_c. Furthermore, assume that all the values in Λ̃_r and Λ̃_c are sorted in increasing order.
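One standard way to realize the Kron reduction mentioned above is to eliminate the unsampled vertices by a Schur complement; a dense sketch follows (the paper's fast O(KO_l n) implementation is not reproduced here).

```python
import numpy as np

def kron_reduction(L, keep):
    """Schur complement of a Laplacian onto the vertices `keep`:
    L_red = L_aa - L_ab L_bb^{-1} L_ba  (a = kept, b = eliminated vertices)."""
    L = np.asarray(L.todense()) if hasattr(L, "todense") else np.asarray(L)
    drop = np.setdiff1d(np.arange(L.shape[0]), keep)
    if drop.size == 0:
        return L[np.ix_(keep, keep)]
    L_aa = L[np.ix_(keep, keep)]
    L_ab = L[np.ix_(keep, drop)]
    L_bb = L[np.ix_(drop, drop)]   # nonsingular for a connected graph
    L_ba = L[np.ix_(drop, keep)]
    return L_aa - L_ab @ np.linalg.solve(L_bb, L_ba)

L_r_tilde = kron_reduction(Lr, rows)   # rho_r x rho_r Laplacian for the sampled rows
L_c_tilde = kron_reduction(Lc, cols)   # rho_c x rho_c Laplacian for the sampled columns
```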
Assume Ỹ = Ỹ* + E, where E models the noise in the compressed data and Ỹ* ∈ LR(P̃_{k_r}, Q̃_{k_c}). The low-rank X̃ = X̃* + Ẽ can be recovered by solving the FRPCAG problem proposed in [29], re-written below:

min_{X̃} φ(Ỹ − X̃) + γ_c tr(X̃L̃_cX̃^⊤) + γ_r tr(X̃^⊤L̃_rX̃),    (4)
where φ is a proper, positive, convex and lower semi-continuous loss function (possibly an ℓ_p norm). From Theorem 1 in [29], the low-rank approximation error comprises the orthogonal projection of X̃* on the complement graph eigenvectors (Q̄̃_{k_c}, P̄̃_{k_r}) and depends on the spectral gaps λ̃_{k_c}/λ̃_{k_c+1}, λ̃_{k_r}/λ̃_{k_r+1} as follows:

‖X̃*Q̄̃_{k_c}‖²_F + ‖P̄̃_{k_r}^⊤X̃*‖²_F = ‖Ẽ‖²_F ≤ (1/γ) φ(E) + ‖Ỹ*‖²_F ( λ̃_{k_c}/λ̃_{k_c+1} + λ̃_{k_r}/λ̃_{k_r+1} ),    (5)

where γ depends on the signal-to-noise ratio. Clearly, if λ_{k_c}/λ_{k_c+1} ≈ 0 and λ_{k_r}/λ_{k_r+1} ≈ 0 and the compressed Laplacians are constructed using the Kron reduction, then λ̃_{k_c}/λ̃_{k_c+1} ≈ 0 and λ̃_{k_r}/λ̃_{k_r+1} ≈ 0. Thus, exact recovery is attained.

Let g(Z) = γ_c tr(ZL_cZ^⊤) + γ_r tr(Z^⊤L_rZ), so that ∇g(Z) = 2(γ_cZL_c + γ_rL_rZ). Also define prox_{λh}(Z) = Y + sgn(Z − Y) ∘ max(|Z − Y| − λ, 0), with step size λ = 1/β′, where β ≤ β′ = 2γ_c‖L_c‖_2 + 2γ_r‖L_r‖_2 and ‖L‖_2 is the spectral norm (maximum eigenvalue) of L. Let ε be the stopping tolerance and J the maximum number of iterations. Then FRPCAG can be solved by FISTA, as in Algorithm 1.

Algorithm 1 FISTA for FRPCAG
INPUT: Z_1 = Y, S_0 = Y, t_1 = 1, ε > 0
for j = 1, ..., J do
  S_j = prox_{λ_j h}(Z_j − λ_j ∇g(Z_j))
  t_{j+1} = (1 + √(1 + 4t_j²)) / 2
  Z_{j+1} = S_j + ((t_j − 1)/t_{j+1}) (S_j − S_{j−1})
  if ‖Z_{j+1} − Z_j‖²_F < ε‖Z_j‖²_F then BREAK
end for
OUTPUT: X̃ = S_j
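For concreteness, a minimal Python sketch of Algorithm 1 (ℓ1 data-fidelity, the step size 1/β′ and the stopping rule stated above; the helper names and the SciPy-based spectral-norm estimate are our choices, not the authors' implementation), continuing with Ỹ, L̃_r, L̃_c from the earlier sketches:

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def spectral_norm(L):
    """Largest eigenvalue of a symmetric PSD matrix (sparse or dense)."""
    return float(eigsh(L, k=1, which='LM', return_eigenvectors=False)[0])

def frpcag_fista(Y, Lr, Lc, gamma_r=1.0, gamma_c=1.0, max_iter=200, tol=1e-6):
    """FISTA for min_X ||Y - X||_1 + gamma_c tr(X Lc X^T) + gamma_r tr(X^T Lr X)."""
    beta = 2 * gamma_c * spectral_norm(Lc) + 2 * gamma_r * spectral_norm(Lr)
    lam = 1.0 / beta                                                  # step size = 1/beta'
    grad = lambda Z: 2 * (gamma_c * (Lc @ Z.T).T + gamma_r * (Lr @ Z))  # gradient of g
    prox = lambda Z: Y + np.sign(Z - Y) * np.maximum(np.abs(Z - Y) - lam, 0.0)
    Z, S_old, t = Y.copy(), Y.copy(), 1.0
    for _ in range(max_iter):
        S = prox(Z - lam * grad(Z))
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        Z_new = S + (t - 1.0) / t_new * (S - S_old)
        if np.linalg.norm(Z_new - Z) ** 2 < tol * np.linalg.norm(Z) ** 2:
            return S
        Z, S_old, t = Z_new, S, t_new
    return S

X_tilde = frpcag_fista(Y_tilde, L_r_tilde, L_c_tilde)  # low-rank estimate of the sampled data
```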
VII. DECODERS FOR LOW-RANK RECOVERY
Let X̃ ∈ ℝ^{ρ_r×ρ_c} be the low-rank solution of (4) with the compressed graph Laplacians L̃_r, L̃_c and sampled data Ỹ. It satisfies X̃ = M_rX̄M_c + Ẽ, where Ẽ ∈ ℝ^{ρ_r×ρ_c} models the recovery error. The low-rank matrix for the full data can then be decoded by solving the alternate decoder

min_X ‖M_rXM_c − X̃‖²_F + γ̄_c tr(XL_cX^⊤) + γ̄_r tr(X^⊤L_rX).    (8)

Theorem 3. Let M_r and M_c be such that (2) holds and γ > 0. Let also X* be the solution of (8) with γ̄_c = γ/λ_{k_c+1}, γ̄_r = γ/λ_{k_r+1}, and X̃ = M_rX̄M_c + Ẽ, where X̄ ∈ LR(P_{k_r}, Q_{k_c}) and Ẽ ∈ ℝ^{ρ_r×ρ_c}. Then X* satisfies the error bound derived in Appendix D, which decays with the spectral gaps λ_{k_c}/λ_{k_c+1} and λ_{k_r}/λ_{k_r+1}.

A similar guarantee (Theorem 4, proved in Appendix E) holds for the approximate subspace decoders, where U* is the solution of the corresponding subspace decoder and E* = U* − Ū*; the same inequalities, with slight modifications, also hold for V*, which we omit because of space constraints.
Proof: Please refer to Appendix E.
As X̄ = ŪΣ̄V̄^⊤, we can say that the error with this approximate decoder is upper bounded by the product of the errors of the individual subspace decoders. Also note that the error again depends on the spectral gaps defined by the ratios λ_{k_c}/λ_{k_c+1} and λ_{k_r}/λ_{k_r+1}. This decoder is less expensive than the alternate decoder and costs O(IKkn). Furthermore, each of the vectors of the subspaces U and V can be determined in parallel.
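For reference, the gradient needed by a plain gradient-descent implementation of the alternate decoder (8) is sketched below; the step size, initialisation and variable names are illustrative choices, not the authors' solver.

```python
import numpy as np

Mc = McT.T                                     # n x rho_c column-sampling matrix

def alternate_decoder_grad(X, X_tilde, Mr, Mc, Lr, Lc, gbar_r, gbar_c):
    """Gradient of ||Mr X Mc - X_tilde||_F^2 + gbar_c tr(X Lc X^T) + gbar_r tr(X^T Lr X)."""
    R = Mr @ X @ Mc - X_tilde                  # residual on the sampled entries
    return (2.0 * Mr.T @ R @ Mc.T
            + 2.0 * gbar_c * (Lc @ X.T).T      # = 2 gbar_c X Lc  (Lc is symmetric)
            + 2.0 * gbar_r * (Lr @ X))

X = Mr.T @ X_tilde @ Mc.T                      # crude zero-padded initialisation
tau = 1e-2                                     # step size (user choice)
for _ in range(100):
    X -= tau * alternate_decoder_grad(X, X_tilde, Mr, Mc, Lr, Lc, gbar_r=1.0, gbar_c=1.0)
```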
3) Step 2: Subspace Upsampling: Since our subspaces Ũ and Ṽ are determined by the SVD of X̃, which is noise and outlier free, we can directly upsample them without needing a margin for the noise. Thus, we propose to re-formulate the approximate decoder as follows:

min_U tr(U^⊤L_rU)  s.t.  M_rU = Ũ,   and   min_V tr(V^⊤L_cV)  s.t.  M_c^⊤V = Ṽ.    (15)
Note that now we have a parameter-free decode stage.

B. Solution of the Subspace Upsampling Decoder
The solution to the above problems is simply a graph upsampling operation, as explained in Lemma 1.

Lemma 1. Let S = [S_a^⊤ | S_b^⊤]^⊤ and let L be partitioned conformally into blocks L_aa, L_ab, L_ba, L_bb. The solution of min_S tr(S^⊤LS) s.t. S_b = R is given by S_a = −L_aa^{−1}L_abR (eq. (17); the proof is in Appendix F).

The resulting procedure (Algorithm 2) outputs the full low-rank X ∈ ℝ^{p×n}.

A. Approximate decoder 2
Here the graph upsampling is performed on U only:

min_U tr(U^⊤L_rU)  s.t.  M_rU = Ũ.

The solution for U is given by eq. (17). Then, we can write V as

V = √( ρ_cρ_r(1 − δ)/(np) ) Y^⊤UΣ̃^{−1}.

However, we do not need to explicitly determine V here. Instead, the low-rank X can be determined directly from U with the projection

X = √( np/(ρ_cρ_r(1 − δ)) ) UΣ̃V^⊤ = UU^⊤Y.

B. Approximate decoder 3
Similar to approximate decoder 2, we can propose another approximate decoder 3, which performs a graph upsampling on V and then determines U via a matrix multiplication:

min_V tr(V^⊤L_cV)  s.t.  M_c^⊤V = Ṽ.

The solution for V is given by eq. (17). Using the same trick as for approximate decoder 2, we can compute X without computing U: X = YVV^⊤.

For the proposed approximate decoders, we need one SVD to determine the singular values Σ̃. However, note that X̃ ∈ ℝ^{ρ_r×ρ_c}, so this SVD is inexpensive (O(ρ_r²ρ_c)).
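A sketch of the upsampling step behind (15) and (17), using the closed form of Lemma 1 with a dense solve (the paper uses PCG for the same linear system), followed by the projection X = UU^⊤Y of approximate decoder 2; the rank-selection rule shown in the comment is illustrative.

```python
import numpy as np

def graph_upsample(L, keep, R):
    """min_S tr(S^T L S) s.t. S[keep] = R  (Lemma 1):  S[drop] = -L_dd^{-1} L_dk R."""
    L = np.asarray(L.todense()) if hasattr(L, "todense") else np.asarray(L)
    drop = np.setdiff1d(np.arange(L.shape[0]), keep)
    S = np.zeros((L.shape[0], R.shape[1]))
    S[keep] = R
    if drop.size:
        S[drop] = -np.linalg.solve(L[np.ix_(drop, drop)], L[np.ix_(drop, keep)] @ R)
    return S

# one SVD of the small compressed low-rank matrix (rho_r x rho_c, hence cheap)
U_t, sig, Vt_t = np.linalg.svd(X_tilde, full_matrices=False)
k = int(np.sum(sig / sig[0] >= 0.1))         # e.g. keep singular values above 10% of the largest
U = graph_upsample(Lr, rows, U_t[:, :k])     # upsample the row subspace on G_r
X_rec = U @ (U.T @ Y)                        # approximate decoder 2: X = U U^T Y
# approximate decoder 3 would instead upsample V on G_c and use X = Y V V^T
```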
For clustering, the indicator matrix C̃ obtained by k-means on the compressed low-rank X̃ is upsampled to the full dataset by solving

min_C tr(C^⊤L_cC)  s.t.  M_c^⊤C = C̃.    (18)

According to Lemma 1, the solution is given by

C = [ −L_c(Ω̄_c, Ω̄_c)^{−1} L_c(Ω̄_c, Ω_c) C̃ ;  C̃ ],    (19)

where Ω_c denotes the set of sampled columns and Ω̄_c its complement.
The solution C obtained by solving the above problem is not binary, and some maximum-pooling thresholding needs to be done to get the cluster labels, i.e.,

C_ij ← 1 if C_ij = max{C_ij′ : j′ = 1, ..., k}, and C_ij ← 0 otherwise.

Algorithm 3 summarizes this procedure.
Table I: Summary of CPCA and its computational complexity for a dataset Y ∈ ℝ^{p×n}.

Algorithm 3 Approximate Decoder for Clustering
INPUT: C̃ ∈ ℝ^{ρ_c×k}, L_c, the set of sampled columns Ω_c
1. for i = 1, ..., k do
2.   solve L_c(Ω̄_c, Ω̄_c) c_i(Ω̄_c) = −L_c(Ω̄_c, Ω_c) c̃_i using PCG, and set c_i(Ω_c) = c̃_i
   end for
3. Set C_ij = 1 if C_ij = max{C_ij′ : j′ = 1, ..., k} and 0 otherwise.
OUTPUT: cluster indicators for X: C ∈ {0, 1}^{n×k}
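A sketch of Algorithm 3, reusing `graph_upsample` from above. The k-means call via scikit-learn, the number of clusters and the variable names are illustrative assumptions; the paper solves the same linear system with PCG.

```python
import numpy as np
from sklearn.cluster import KMeans   # any k-means implementation works here

n_clusters = 10
labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X_tilde.T)  # cluster the sampled columns
C_tilde = np.eye(n_clusters)[labels]            # rho_c x k binary indicators

C = graph_upsample(Lc, cols, C_tilde)           # upsample on G_c, i.e. solve (18)-(19)
labels_full = np.argmax(C, axis=1)              # maximum pooling over each row of C
C_full = np.eye(n_clusters)[labels_full]        # cluster indicators for all n columns of Y
```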
D. Proof of Theorem 3
Define h(Z) = ‖M_rZM_c − X̃‖²_F + γ̄_c tr(ZL_cZ^⊤) + γ̄_r tr(Z^⊤L_rZ). Since X* minimizes (8), h(X*) ≤ h(X̄), i.e.,

‖M_rX*M_c − X̃‖²_F + γ̄_c tr(X*L_c(X*)^⊤) + γ̄_r tr((X*)^⊤L_rX*) ≤ ‖M_rX̄M_c − X̃‖²_F + γ̄_c tr(X̄L_cX̄^⊤) + γ̄_r tr(X̄^⊤L_rX̄).    (27)

Write X̄ = P_{k_r}ŶQ_{k_c}^⊤ as in the proof of Theorem 2 in [29], where Ŷ ∈ ℝ^{k_r×k_c}, and recall that Q_{k_c}^⊤Q_{k_c} = I_{k_c}, P_{k_r}^⊤P_{k_r} = I_{k_r}, Q̄_{k_c}^⊤Q_{k_c} = 0, P̄_{k_r}^⊤P_{k_r} = 0. From the proof of Theorem 2 in [29] we also know that ‖Ŷ‖_F = ‖X̄‖_F and

tr(X̄L_cX̄^⊤) ≤ λ_{k_c}‖X̄‖²_F,
tr(X̄^⊤L_rX̄) ≤ λ_{k_r}‖X̄‖²_F,
tr(X*L_c(X*)^⊤) ≥ λ_{k_c+1}‖X*Q̄_{k_c}‖²_F,
tr((X*)^⊤L_rX*) ≥ λ_{k_r+1}‖P̄_{k_r}^⊤X*‖²_F.

Now using all this information in (27) we get

‖M_rX*M_c − X̃‖²_F + γ̄_cλ_{k_c+1}‖X*Q̄_{k_c}‖²_F + γ̄_rλ_{k_r+1}‖P̄_{k_r}^⊤X*‖²_F ≤ ‖Ẽ‖²_F + (γ̄_cλ_{k_c} + γ̄_rλ_{k_r})‖X̄‖²_F.

From the above we have

‖M_rX*M_c − X̃‖_F ≤ ‖Ẽ‖_F + √(γ̄_cλ_{k_c} + γ̄_rλ_{k_r}) ‖X̄‖_F    (28)

and

√( γ̄_cλ_{k_c+1}‖X*Q̄_{k_c}‖²_F + γ̄_rλ_{k_r+1}‖P̄_{k_r}^⊤X*‖²_F ) ≤ ‖Ẽ‖_F + √(γ̄_cλ_{k_c} + γ̄_rλ_{k_r}) ‖X̄‖_F.    (29)

Using γ̄_c = γ/λ_{k_c+1} and γ̄_r = γ/λ_{k_r+1}, and ‖E*‖²_F = ‖X*Q̄_{k_c}‖²_F = ‖P̄_{k_r}^⊤X*‖²_F, we get

‖M_rX*M_c − X̃‖_F ≤ ‖Ẽ‖_F + √γ √( λ_{k_c}/λ_{k_c+1} + λ_{k_r}/λ_{k_r+1} ) ‖X̄‖_F    (30)

and

√(2γ) ‖E*‖_F ≤ ‖Ẽ‖_F + √γ √( λ_{k_c}/λ_{k_c+1} + λ_{k_r}/λ_{k_r+1} ) ‖X̄‖_F,    (31)

which implies

‖E*‖_F ≤ ‖Ẽ‖_F/√(2γ) + (1/√2) √( λ_{k_c}/λ_{k_c+1} + λ_{k_r}/λ_{k_r+1} ) ‖X̄‖_F.    (32)

Focus on ‖M_rX*M_c − X̃‖²_F now, writing X* = X̄* + E* with X̄* ∈ LR(P_{k_r}, Q_{k_c}). As M_r, M_c are constructed by sampling without replacement, we have ‖M_rE*M_c‖_F ≤ ‖E*‖_F. Now using the above facts and the RIP we get

‖M_rX*M_c − X̃‖_F = ‖M_r(X̄* + E*)M_c − M_rX̄M_c − Ẽ‖_F ≥ √( ρ_rρ_c(1 − δ)/(np) ) ‖X̄* − X̄‖_F − ‖Ẽ‖_F − ‖E*‖_F,

which implies

‖X̄* − X̄‖_F ≤ √( np/(ρ_cρ_r(1 − δ)) ) [ (2 + 1/√(2γ)) ‖Ẽ‖_F + (1/√2 + √γ) √( λ_{k_c}/λ_{k_c+1} + λ_{k_r}/λ_{k_r+1} ) ‖X̄‖_F ].
Discussion
The subspace decoder can also be written with the data-fidelity term as a constraint:

min_U tr(U^⊤L_rU)  s.t.  U^⊤U = I_k,  ‖M_rU − Ũ‖²_F ≤ ε.

Let U′ be the zero-appended matrix of Ũ; then we can re-write it as

min_U tr(U^⊤L_rU)  s.t.  U^⊤U = I_k,  ‖M_r(U − U′)‖²_F ≤ ε.

The above problem is equivalent to (10), as the term ‖M_r(U − U′)‖²_F has been removed from the objective and introduced as a constraint. Note that the constant γ_r is not needed anymore; the new model parameter ε controls the radius of the ℓ_2 ball ‖M_r(U − U′)‖²_F ≤ ε. In simple words, it controls how much noise is tolerated by the projection of U onto the ball centered at U′. To solve the above problem one needs to split it into two sub-problems and alternate between:
1) the optimization min_U tr(U^⊤L_rU) s.t. U^⊤U = I_k, whose solution is given by the lowest k eigenvectors of L_r; solving both problems (10) in this way requires a complexity of O((n + p)k²); and
2) the projection onto the ℓ_2 ball ‖M_r(U − U′)‖²_F ≤ ε, whose complexity is O(ρ_c + ρ_r).
Thus the solution requires a double iteration with a complexity of O(Ink²) and is almost as expensive as FRPCAG.
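The projection in step 2) only touches the sampled rows of U; a minimal sketch (the ball radius ε and the zero-appended U′ follow the notation above):

```python
import numpy as np

def project_l2_ball(U, U0, keep, eps):
    """Project U onto { U : ||M_r (U - U0)||_F^2 <= eps }; U0 is the zero-appended U_tilde,
    so only the sampled rows (indices in `keep`) are shrunk towards U0."""
    D = U[keep] - U0[keep]
    nrm = np.linalg.norm(D)
    if nrm ** 2 > eps:
        U = U.copy()
        U[keep] = U0[keep] + D * np.sqrt(eps) / nrm
    return U
```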
E. Proof of Theorem 4
We can write (11) and (12) as

min_{u_1,...,u_p} Σ_{i=1}^p ‖M_ru_i − ũ_i‖²_2 + γ′_r u_i^⊤L_ru_i    (33)

and

min_{v_1,...,v_n} Σ_{i=1}^n ‖M_c^⊤v_i − ṽ_i‖²_2 + γ′_c v_i^⊤L_cv_i.    (34)

In this proof we only treat Problem (33) and the recovery of Ū; the proof for Problem (12) and the recovery of V̄ is identical. The above two problems can be solved independently for every i. From Theorem 3.2 of [23] we obtain

‖ū*_i − ū_i‖_2 ≤ √( p/(ρ_r(1 − δ)) ) [ (2 + 1/√(γ′_rλ_{k_r+1})) ‖ẽ^u_i‖_2 + ( √(λ_{k_r}/λ_{k_r+1}) + √(γ′_rλ_{k_r}) ) ‖ū_i‖_2 ]

and

‖e*_i‖_2 ≤ (1/√(γ′_rλ_{k_r+1})) ‖ẽ^u_i‖_2 + √(λ_{k_r}/λ_{k_r+1}) ‖ū_i‖_2,

which implies

‖ū*_i − ū_i‖²_2 ≤ 2 (p/(ρ_r(1 − δ))) [ (2 + 1/√(γ′_rλ_{k_r+1}))² ‖ẽ^u_i‖²_2 + ( √(λ_{k_r}/λ_{k_r+1}) + √(γ′_rλ_{k_r}) )² ‖ū_i‖²_2 ]

and

‖e*_i‖²_2 ≤ (2/(γ′_rλ_{k_r+1})) ‖ẽ^u_i‖²_2 + 2 (λ_{k_r}/λ_{k_r+1}) ‖ū_i‖²_2.

Summing the previous inequalities over all i yields

‖Ū* − Ū‖²_F ≤ 2 (p/(ρ_r(1 − δ))) [ (2 + 1/√(γ′_rλ_{k_r+1}))² ‖Ẽ^u‖²_F + ( √(λ_{k_r}/λ_{k_r+1}) + √(γ′_rλ_{k_r}) )² ‖Ū‖²_F ]

and

‖E*‖²_F ≤ (2/(γ′_rλ_{k_r+1})) ‖Ẽ^u‖²_F + 2 (λ_{k_r}/λ_{k_r+1}) ‖Ū‖²_F.

Taking the square root of both inequalities terminates the proof. Similarly, the expressions for V̄ can be derived:

‖V̄* − V̄‖_F ≤ √( 2n/(ρ_c(1 − δ)) ) [ (2 + 1/√(γ′_cλ_{k_c+1})) ‖Ẽ^v‖_F + ( √(λ_{k_c}/λ_{k_c+1}) + √(γ′_cλ_{k_c}) ) ‖V̄‖_F ]

and

‖E*‖_F ≤ √( 2/(γ′_cλ_{k_c+1}) ) ‖Ẽ^v‖_F + √( 2λ_{k_c}/λ_{k_c+1} ) ‖V̄‖_F.
F. Proof of Lemma 1
Let S = [S_a^⊤ | S_b^⊤]^⊤. Further, we split L into submatrices as follows:

L = [ L_aa  L_ab
      L_ba  L_bb ].

Now (16) can be written as

min_{S_a}  [S_a; S_b]^⊤ [ L_aa  L_ab ; L_ba  L_bb ] [S_a; S_b]   s.t.  S_b = R.

Expanding further, we get

min_{S_a}  S_a^⊤L_aaS_a + S_a^⊤L_abR + R^⊤L_baS_a + R^⊤L_bbR.
Setting ∇_{S_a} = 0, we get 2L_abR + 2L_aaS_a = 0, and hence

S_a = −L_aa^{−1}L_abR.

G. Computational Complexities & Additional Results
We present the computational complexity of all the models considered in this work. For a matrix X ∈ ℝ^{p×n}, computing the graph regularization terms tr(XL_cX^⊤) and tr(X^⊤L_rX) costs O(p|E_c| + n|E_r|) = O(pnK + npK), where E_r, E_c denote the sets of non-zero entries of L_r, L_c respectively. Note that we use K-nearest-neighbors graphs, so |E_r| ≈ Kp and |E_c| ≈ Kn. 4) The complexity of constructing L̃_c and L̃_r for the compressed data Ỹ is negligible if FLANN is used, i.e., O(ρ_cρ_r log(ρ_c)) and O(ρ_cρ_r log(ρ_r)). However, if the Kron reduction strategy of Section V is used, then the cost is O(KO_l(n + p)) ≈ O(KO_ln). 7) We use the approximate decoders for low-rank recovery in the complexity calculations (eq. (17) in Section VIII). All the decoders for low-rank recovery are summarized in Table X.
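The O(p|E_c| + n|E_r|) figure above corresponds to two sparse matrix-matrix products; a sketch with SciPy sparse Laplacians:

```python
import numpy as np

def graph_regularizers(X, Lr, Lc):
    """tr(X Lc X^T) and tr(X^T Lr X); with sparse Lr, Lc the cost is
    O(p * nnz(Lc) + n * nnz(Lr)) = O(pnK + npK) for K-NN graphs."""
    tr_c = float(np.sum(X * (Lc @ X.T).T))   # tr(X Lc X^T) = <X, X Lc>
    tr_r = float(np.sum(X * (Lr @ X)))       # tr(X^T Lr X) = <X, Lr X>
    return tr_c, tr_r
```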
Table X: Summary of the decoders for low-rank recovery.

| Decoder | Optimization / recovery step | Low-rank complexity | Algorithm | Parameter-free |
| alternate | min_X ‖M_rXM_c − X̃‖²_F + γ̄_c tr(XL_cX^⊤) + γ̄_r tr(X^⊤L_rX) | O(InpK) | gradient descent | no |
| approximate | min_U ‖M_rU − Ũ‖²_F + γ′_r tr(U^⊤L_rU) | O(InK) | gradient descent | yes |
|  | min_V ‖M_c^⊤V − Ṽ‖²_F + γ′_c tr(V^⊤L_cV) | O(IpK) | gradient descent | yes |
|  | X = UΣ̃V^⊤ | O(ρ_r²ρ_c) | SVD | |
| approximate 1 | min_U tr(U^⊤L_rU) s.t. M_rU = Ũ | O(pkO_lK) | PCG | yes |
|  | min_V tr(V^⊤L_cV) s.t. M_c^⊤V = Ṽ | O(nkO_lK) | PCG | yes |
|  | X = UΣ̃V^⊤ | O(ρ_r²ρ_c) | SVD | |
| approximate 2 | min_U tr(U^⊤L_rU) s.t. M_rU = Ũ, then X = UU^⊤Y | O(pkO_lK) | PCG | yes |
| approximate 3 | min_V tr(V^⊤L_cV) s.t. M_c^⊤V = Ṽ, then X = YVV^⊤ | O(nkO_lK) | PCG | yes |
Table XI: Computational complexity of all the models considered in this work.

| Model | G_c: O(np log(n)) | G_r: O(np log(p)) | Algorithm complexity (p ≪ n) | Low-rank: fast SVD | Low-rank: decoder | Low-rank: total (p ≪ n) | Clustering: k-means | Clustering: decoder | Clustering: total |
| LE [1] | + | – | O(n³) | – | – | – | O(Inkp) | – | O(n³) |
| LLE [26] | – | – | O((p + k)n²) | – | – | – | O(Inkp) | – | O(pn²) |
| PCA | – | – | O(p²n) | – | – | O(n(p² log(n) + p²)) | O(Inkp) | – | O(np² log(n) + np²) |
| GLPCA [11] | + | – | O(n³) | – | – | O(n(p log(n) + n²)) | O(Inkp) | – | O(n³) |
| NMF [13] | – | – | O(Inpk) | – | – | O(Inpk) | O(Inkp) | – | O(Inp(k + K)) |
| GNMF [3] | + | – | O(Inpk) | – | – | O(np log(n)) | O(Inkp) | – | O(np log(n)) |
| MMF [38] | + | – | O(((p + k)k² + pk)I) | – | – | O(np log(n)) | O(Inkp) | – | O(np log(n)) |
| RPCA [4] | – | – | O(Inp²) | – | – | O(np²I + n log(n)) | O(Inkp) | – | O(np²I + n log(n)) |
| RPCAG [28] | – | – | O(Inp²) | – | – | O(np²I + n log(n)) | O(Inkp) | – | O(np²I + n log(n)) |
| FRPCAG [29] | + | + | O(InpK) | – | – | O(np log(n)) | O(Inkp) | – | O(np log(n)) |
| CPCA | + | + | O(Iρ_cρ_rK) | O(ρ_cρ_r²) | O(nkO_lK) | O(np log(n)) | O(Iρ_ckρ_r) | O(nkO_l) | O(np log(n)) |
Table XII: Details of the datasets used for clustering experiments in this work.

| Dataset | Samples | Dimension | Classes |
| ORL large | 400 | 56 × 46 | 40 |
| ORL small | 300 | 28 × 23 | 30 |
| CMU PIE | 1200 | 32 × 32 | 30 |
| YALE | 165 | 32 × 32 | 11 |
| MNIST | 70000 | 28 × 28 | 10 |
| MNIST small | 1000 | 28 × 28 | 10 |
| USPS large | 10000 | 16 × 16 | 10 |
| USPS small | 3500 | 16 × 16 | 10 |
Table XIII: Range of parameter values for each of the models considered in this work. k is the rank (subspace dimension) or the number of clusters, λ is the weight associated with the sparse term in the Robust PCA framework [4], and γ, α are the parameters associated with the graph regularization term.

| Model | Parameters | Parameter range |
| LLE [26], PCA | k | k ∈ {2¹, 2², ..., min(n, p)} |
| LE [1] | k | k ∈ {2¹, 2², ..., min(n, p)} |
| GLPCA [11] | k, γ | k ∈ {2¹, 2², ..., min(n, p)}; γ ⇒ β using [11], β ∈ {0.1, 0.2, ..., 0.9} |
| MMF [38] | k, γ | k ∈ {2¹, 2², ..., min(n, p)}; γ ∈ {2⁻³, 2⁻², ..., 2¹⁰} |
| NMF [13] | k | k ∈ {2¹, 2², ..., min(n, p)} |
| GNMF [3] | k, γ | k ∈ {2¹, 2², ..., min(n, p)}; γ ∈ {2⁻³, 2⁻², ..., 2¹⁰} |
| RPCA [4] | λ | λ ∈ {2⁻³/√max(n, p) : 0.1 : 2³/√max(n, p)} |
| RPCAG [28] | λ, γ | λ as for RPCA; γ ∈ {2⁻³, 2⁻², ..., 2¹⁰} |
| FRPCAG [29] | γ_r, γ_c | γ_r, γ_c ∈ (0, 30) |
| CPCA | γ_r, γ_c, k (approximate decoder) | γ_r, γ_c ∈ (0, 30); k such that Σ̃_{k,k}/Σ̃_{1,1} < 0.1 |