Dictionary Learning by Nonnegative Matrix Factorization with ℓ1/2-Norm Sparsity Constraint

Zhenni Li, Zunyi Tang and Shuxue Ding
School of Computer Science and Engineering, The University of Aizu, Tsuruga, Ikki-Machi, Aizu-Wakamatsu City, Fukushima 965-8580, Japan
E-mail: [email protected]; [email protected]; [email protected]

Abstract—In this paper, we propose an overcomplete, nonnegative dictionary learning method for sparse representation of signals, which is based on nonnegative matrix factorization (NMF) with the ℓ1/2-norm as the sparsity constraint. By introducing the ℓ1/2-norm as the sparsity constraint into NMF, we show that the problem can be cast as sequential optimization problems of quadratic functions and quartic functions. The optimization problem of each quadratic function can be solved easily, since the problem has a closed-form unique solution. The optimization problem of each quartic function can be formulated as solving a cubic equation, which is solved efficiently by the Cardano formula together with a rule for selecting one of the roots. To implement this nonnegative dictionary learning, we develop an algorithm employing a coordinate-wise descent strategy, i.e., coordinate-wise descent based nonnegative dictionary learning (CDNDL). Numerical experiments show that the proposed algorithm performs better than the nonnegative K-SVD (NN-KSVD) and the other two compared algorithms.

Index Terms—Nonnegative dictionary learning, overcomplete dictionary, sparse representation, NMF

I. INTRODUCTION

In recent years, the topic of sparse representations of signals has received growing attention [1], [2], [3]. Extensive research in this field concentrates mainly on the study of pursuit algorithms, such as matching pursuit (MP) [4], basis pursuit (BP) [5], and orthogonal matching pursuit (OMP) [6], which can be used to realize sparse representations of signals with respect to a given, known dictionary. Using an overcomplete dictionary matrix W ∈ ℝ^{m×r} that contains r atoms of size m × 1 as its columns, signals Y ∈ ℝ^{m×n} can be described by sparse and efficient linear combinations of a few atoms, which is very effective in many signal and image processing applications. Overcomplete means m < r. Y = WH, or Y ≈ WH satisfying ∥Y − WH∥² ≤ ε, are the two ways to represent Y. The matrix H ∈ ℝ^{r×n} is termed the coefficient matrix and contains the coefficients of the representation of the signals Y. Recent work [7], [8] has shown that a learned dictionary is critical for achieving superior results in signal and image processing. On the other hand, in some applications nonnegativity of the signals and the dictionary is required, for example in spectral data analysis [9], [10] and nonnegative factorization for recognition [11], [12]. These requirements call for a dictionary learning method with nonnegativity imposed, namely, so-called nonnegative dictionary learning.


Taking into account the similarity, in terms of data dimensionality reduction, between sparse representation of nonnegative signals and nonnegative matrix factorization (NMF) [13], [14], we convert the overcomplete, nonnegative dictionary learning problem into an NMF problem with a sparsity constraint. NMF aims to factorize a nonnegative matrix into a product of two nonnegative matrices with different properties, in which one matrix is termed the base matrix and the other the coefficient matrix corresponding to the base matrix. The standard NMF algorithm does not place any constraints on the two matrices except for nonnegativity. In order to obtain a sparser representation, several sparsity-constrained NMF methods, with different constraints imposed on the matrix factors, have been proposed. In particular, the ℓ0-norm, ℓ1-norm and ℓ2-norm have usually been used as the sparsity constraints [15], [16], [17]. Since the ℓ0-norm optimization problem is generally NP-hard, it is commonly replaced by the ℓ1-norm for the convenience of optimization in real-world applications. Some authors also impose sparsity constraints by using the ℓ2-norm [18], owing to the particular structure of the sparse NMF problem.

In this paper, we propose a method for learning an overcomplete, nonnegative dictionary, which is accomplished by posing the sparse representation of nonnegative signals as an NMF problem with a different sparsity constraint, namely the ℓ1/2-norm. The ℓ1/2-norm, as a compromise between the ℓ0-norm and the ℓ1-norm, can yield sparser solutions while still admitting an efficient optimization method. The method is realized by adding a sparsity penalty term on the coefficient matrix to the standard, unconstrained NMF objective. We adopt a coordinate-wise descent strategy [19] to develop an algorithm termed the coordinate-wise descent based nonnegative dictionary learning algorithm (CDNDL). As a result, our algorithm performs well on dictionary learning. Numerical experiments show that the proposed algorithm can recover almost all of the aimed dictionary atoms from the training data even in a strongly noisy environment, and is superior to the other compared algorithms, including NMFSC [16], NN-KSVD [20], and NMFℓ0-H [21].

The remainder of the paper is organized as follows. In Section II, we describe the problem formulation. The proposed nonnegative dictionary learning algorithm is presented in Section III. In Section IV we give the results of numerical experiments for the proposed algorithm and compare them with those of several other algorithms. Finally, Section V concludes the paper and discusses future work.

II. PROBLEM FORMULATION

One common way of solving the NMF problem is to formulate it as an optimization problem. The NMF problem is formulated as follows. Given an input matrix Y ∈ ℝ^{m×n}, where each element is nonnegative, and r ≪ min(m, n), NMF aims to find nonnegative matrices W ∈ ℝ^{m×r} and H ∈ ℝ^{r×n} satisfying Y = WH or Y ≈ WH. The factors W and H can usually be found by posing the task as the following optimization problem, which minimizes the Euclidean distance between Y and WH:

  min f(W, H) = (1/2) ∥Y − WH∥_F²
  subject to W ≥ 0, H ≥ 0                  (1)

where ∥·∥_F represents the Frobenius norm. Here W is over-determined (if m > r) or determined (if m = r). Dictionary learning, by contrast, aims at sparse representations of a signal or a set of signals by a learned dictionary. In the case of m < r with W of full row rank (i.e., the under-determined situation), an infinite number of approximate solutions exist for problem (1) if no constraints are imposed on the factors W or H. In order to obtain a sparser representation, several kinds of constraints on the coefficient factor H, e.g., the ℓ0-norm, ℓ1-norm and ℓ2-norm, have been proposed. Here, we consider the ℓ1/2-norm as the sparsity constraint, which is a compromise between the ℓ0-norm and the ℓ1-norm. As will be seen in the following sections, it can also be handled by an efficient method. The nonnegative dictionary learning problem can therefore be formulated with the following objective:

  min f(W, H) = (1/2) ∥Y − WH∥_F² + λ ∥H∥_{1/2}^{1/2}
  subject to W ≥ 0, H ≥ 0                  (2)

where ∥H∥_{1/2} = (Σ_{i,j} |H_{ij}|^{1/2})² denotes the ℓ1/2-norm of the matrix H. The regularization parameter λ controls the trade-off between the fidelity of the NMF and the sparsity constraint term ∥H∥_{1/2}^{1/2}, and it can be calibrated off-line on a specified problem.
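For concreteness, a minimal NumPy sketch of the objective in (2); the function names `l12_penalty` and `objective` are ours, not from the paper:

```python
import numpy as np

def l12_penalty(H):
    # sum_{i,j} |H_ij|^{1/2}, i.e. ||H||_{1/2}^{1/2} in the paper's notation
    return np.sum(np.sqrt(np.abs(H)))

def objective(Y, W, H, lam):
    # 0.5 * ||Y - W H||_F^2 + lambda * ||H||_{1/2}^{1/2}, cf. Eq. (2)
    R = Y - W @ H
    return 0.5 * np.sum(R ** 2) + lam * l12_penalty(H)
```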

III. THE ALGORITHM

For solving the constrained NMF problem, many algorithms have been developed, and most of them are structured with an iterative strategy, exploiting the fact that the problem can be reduced to two sequential convex nonnegative subproblems, one in W and one in H, where the other factor is regarded as fixed and known. Our algorithm has a similar structure. However, the algorithm for each convex nonnegative least-squares subproblem differs from traditional optimization methods such as MU [22] and ANLS [23]. We adopt a coordinate-wise descent strategy to optimize the objective problem (2). Here, we present a summarized derivation of the update rules for H and W in (2). By the definition and properties of the Frobenius norm, for a matrix A ∈ ℝ^{m×n}, ∥A∥_F² = Tr(AAᵀ) = Tr(AᵀA), where Tr(·) denotes the trace of a square matrix. Thus, the objective function (2) can be decomposed as follows:

  J = (1/2) Σ_{j=1}^{n} Y_{:j}ᵀ Y_{:j} − Σ_{j=1}^{n} [YᵀW]_{j:} H_{:j}
      + (1/2) Σ_{j=1}^{n} H_{:j}ᵀ WᵀW H_{:j} + λ Σ_{i=1}^{r} Σ_{j=1}^{n} |H_{ij}|^{1/2}      (3)

If we fix H in (3), then (3) is a multivariable objective function of the W_{ij}. We now consider optimizing only one variable W_{ik}, while fixing the other components of W. Collecting the terms of (3) related to W_{ik} yields a quadratic function in W_{ik}:

  J_{W_{ik}} = (1/2) [HHᵀ]_{kk} W_{ik}² + W_{ik} ( Σ_{l=1, l≠k}^{r} W_{il} [HHᵀ]_{lk} − [YHᵀ]_{ik} )      (4)

where [HHᵀ]_{kk} denotes the entry in the k-th row and k-th column of HHᵀ. By the properties of a single-variable quadratic function, J_{W_{ik}} attains its minimum at

  W_{ik} = ( [YHᵀ]_{ik} − Σ_{l=1, l≠k}^{r} W_{il} [HHᵀ]_{lk} ) / [HHᵀ]_{kk}.

Considering the nonnegativity of the factor W, W_{ik} is set to 0 when this value is negative. Moreover, since the optimal value for a given entry of W does not depend on the other components of the same column, one can optimize a whole column of W at the same time. Thus, the update rule for the factor W becomes:

  W_{:k}* = max( 0, ( [YHᵀ]_{:k} − Σ_{l≠k} W_{:l} H_{l:} Hᵀ_{:k} ) / ( H_{k:} Hᵀ_{:k} ) )
          = max( 0, R_k Hᵀ_{:k} / ∥H_{k:}∥₂² )      (5)

where W_{:k}* denotes the k-th column of the matrix W, R_k = Y − Σ_{l≠k} W_{:l} H_{l:}, and ∥·∥₂ represents the ℓ2-norm. After W has been updated in a round, every column of W is normalized to have unit ℓ2-norm.
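Before turning to the H update, here is a minimal NumPy sketch of the column-wise update (5) with the subsequent normalization; the function name is ours, and no guard against a zero denominator (handled later via the ε floor of Section III) is included:

```python
import numpy as np

def update_W(Y, W, H):
    """One round of column-wise updates of W, cf. Eq. (5), followed by column normalization."""
    P = Y @ H.T          # YH^T
    Q = H @ H.T          # HH^T
    for k in range(W.shape[1]):
        # numerator: [YH^T]_{:k} - sum_{l != k} W_{:l} [HH^T]_{lk}
        numer = P[:, k] - W @ Q[:, k] + W[:, k] * Q[k, k]
        W[:, k] = np.maximum(0.0, numer / Q[k, k])
        norm = np.linalg.norm(W[:, k])
        if norm > 0:
            W[:, k] /= norm          # normalize the updated column to unit l2-norm
    return W
```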


Next, we fix W and derive the update rule for H. We first consider optimizing only one variable H_{kj}, while fixing the other components of H. We obtain an objective function in H_{kj} as follows:

  J_{H_{kj}} = (1/2) [WᵀW]_{kk} H_{kj}² + H_{kj} ( Σ_{l=1, l≠k}^{r} [WᵀW]_{kl} H_{lj} − [WᵀY]_{kj} ) + λ H_{kj}^{1/2}      (6)

where [WᵀW]_{kk} denotes the entry in the k-th row and k-th column of WᵀW. Letting Z_{kj} = H_{kj}^{1/2}, a real variable since H_{kj} ≥ 0, and substituting it into (6), one obtains a single-variable quartic function in Z_{kj}:

  J_{Z_{kj}} = (1/2) [WᵀW]_{kk} Z_{kj}⁴ + Z_{kj}² ( Σ_{l=1, l≠k}^{r} [WᵀW]_{kl} H_{lj} − [WᵀY]_{kj} ) + λ Z_{kj}      (7)

To minimize (7), one may find the extreme points of (7) and then determine the minimum point. The first derivative of (7) with respect to Z_{kj} is

  ∂J_{Z_{kj}}/∂Z_{kj} = 2 [WᵀW]_{kk} Z_{kj}³ + 2 Z_{kj} ( Σ_{l=1, l≠k}^{r} [WᵀW]_{kl} H_{lj} − [WᵀY]_{kj} ) + λ      (8)

The optimal solution is obtained by setting ∂J_{Z_{kj}}/∂Z_{kj} = 0 and solving for the roots. Since (8) is a cubic equation, it has three closed-form roots Z_{kj}^{(i)} (i = 1, 2, 3), possibly including complex ones, which can easily be obtained by the Cardano formula [24]. One then obtains the corresponding H_{kj}^{(i)} = (Z_{kj}^{(i)})², i = 1, 2, 3. Here we consider only the real roots H_{kj}^{(i)}, and the optimal H̃_{kj} attaining the minimum of (6) is found by comparing the function values at the real roots. As with the update rule for W, the optimal value for a given entry of H does not depend on the other components of the same row. Therefore, one can optimize a whole row of H at the same time. Additionally, H is required to be nonnegative. Thus, the update rule for H of (3) is expressed as follows:

  H_{k:}* = max( 0, H̃_{k:} )      (9)

where H_{k:}* denotes the k-th row of the matrix H.
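A minimal sketch of the per-entry H update, using numpy.roots in place of an explicit Cardano formula; the function name, the imaginary-part tolerance, and the inclusion of the boundary point z = 0 among the candidates are our choices, not the paper's:

```python
import numpy as np

def update_H_entry(a, b, lam):
    """Minimize J(z) = 0.5*a*z^4 + b*z^2 + lam*z over z >= 0 and return H_kj = z^2.

    a = [W^T W]_{kk},  b = sum_{l != k} [W^T W]_{kl} H_{lj} - [W^T Y]_{kj}  (cf. Eq. (7)).
    """
    # Stationary points: 2*a*z^3 + 2*b*z + lam = 0, cf. Eq. (8)
    roots = np.roots([2.0 * a, 0.0, 2.0 * b, lam])
    # Keep real, nonnegative roots; also compare against the boundary z = 0
    candidates = [z.real for z in roots if abs(z.imag) < 1e-10 and z.real >= 0.0] + [0.0]
    J = lambda z: 0.5 * a * z**4 + b * z**2 + lam * z
    z_best = min(candidates, key=J)
    return z_best ** 2   # H_kj = Z_kj^2
```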

A potential problem with CDNDL arises if one of the vectors W_{:k}* (or H_{k:}*) becomes equal to the zero vector, which leads to numerical instabilities. A possible way to overcome this problem is to replace the zero lower bounds on W_{:k}* and H_{k:}* by a small positive constant ε ≪ 1 (typically 10⁻⁸). Hence we obtain the following amended closed-form update rules:

  W_{:k}* = max( ε, R_k Hᵀ_{:k} / ∥H_{k:}∥₂² )
  H_{k:}* = max( ε, H̃_{k:} )      (10)

According to the analysis above, the proposed coordinate-wise descent based nonnegative dictionary learning algorithm, termed CDNDL, is summarized in Algorithm 1.

Algorithm 1 CDNDL
Require: Data matrix Y ∈ ℝ₊^{m×n}, initial matrices W ∈ ℝ₊^{m×r} and H ∈ ℝ₊^{r×n}; set ε = 10⁻⁸
 1: while stopping criterion not satisfied do
 2:   Compute P = YHᵀ and Q = HHᵀ
 3:   for k = 1 to r do
 4:     W_{:k} ← max( ε, (P_{:k} − Σ_{l=1, l≠k}^{r} W_{:l} Q_{lk}) / Q_{kk} )
 5:     Normalize W_{:k} ← W_{:k} / ∥W_{:k}∥₂
 6:   end for
 7:   for k = 1 to r do
 8:     (the following 'for' loop over j can be computed in parallel)
 9:     for j = 1 to n do
10:       Solve for {H_{kj}^{(i)}}, i = 1, 2, 3, using the Cardano formula
11:       H̃_{kj} ← the minimizer of (6) among {H_{kj}^{(i)}}
12:     end for
13:     H_{k:} ← max( ε, H̃_{k:} )
14:   end for
15: end while
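A self-contained NumPy sketch of the iteration in Algorithm 1, under the same assumptions as the snippets above (numpy.roots instead of Cardano, a fixed iteration count instead of a stopping criterion, and small numerical guards of our own):

```python
import numpy as np

def cdndl(Y, r, lam, n_iter=200, eps=1e-8, seed=0):
    """Sketch of CDNDL (Algorithm 1): alternating coordinate-wise updates of W and H."""
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        # --- W update, column by column, cf. Eq. (10), then column normalization ---
        P, Q = Y @ H.T, H @ H.T
        for k in range(r):
            numer = P[:, k] - W @ Q[:, k] + W[:, k] * Q[k, k]
            W[:, k] = np.maximum(eps, numer / max(Q[k, k], eps))
            W[:, k] /= np.linalg.norm(W[:, k])
        # --- H update, entry by entry, via the cubic stationarity condition (8) ---
        G = W.T @ W
        B = W.T @ Y
        for k in range(r):
            for j in range(n):
                a = G[k, k]
                b = G[k, :] @ H[:, j] - G[k, k] * H[k, j] - B[k, j]
                roots = np.roots([2.0 * a, 0.0, 2.0 * b, lam])
                cand = [z.real for z in roots if abs(z.imag) < 1e-10 and z.real >= 0.0] + [0.0]
                z = min(cand, key=lambda t: 0.5 * a * t**4 + b * t**2 + lam * t)
                H[k, j] = max(eps, z ** 2)
    return W, H
```

For the synthetic setting used in Section IV one would call, e.g., `W, H = cdndl(Y, r=50, lam=0.015)`.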

IV. NUMERICAL EXPERIMENTS

In this section, we present experimental results obtained with the CDNDL algorithm on synthetic signals. The results show that the proposed algorithm has strong learning capacity for a nonnegative dictionary and is robust in comparison with other representative algorithms, including NMFSC, NN-KSVD, and NMFℓ0-H. In addition, another experiment on a dataset of 10 decimal digits shows the applicability of the proposed algorithm to real-world signals.

A. Experiment on Synthetic Signals Generated with a Dictionary

For our experiments, we began by generating a stochastic nonnegative matrix W of size 20×50 with i.i.d. uniformly distributed entries [25]. Each column was normalized to unit ℓ2-norm. We then synthesized 1500 test signals of dimension 20, each produced as a linear combination of three different atoms of the generated dictionary W, with the three corresponding coefficients in random and independent locations. Uniformly distributed noise of varying signal-to-noise ratio (SNR) was added to analyze robustness against noise. We executed NMFSC, NN-KSVD, NMFℓ0-H, and CDNDL on the test signals, estimating W and evaluating its accuracy by comparison with the true W. For all four algorithms, the initial dictionary matrices of size 20×50 were composed of randomly selected parts of the test signals. Since NN-KSVD (the nonnegative variant of K-SVD), NMFSC, and NMFℓ0-H are three state-of-the-art algorithms for nonnegative dictionary learning, we compared our algorithm with them. The implementation of the NN-KSVD algorithm is available online (http://www.cs.technion.ac.il/~elad/software/), and we executed it for a total of 200 iterations. Matlab code for the NMFSC (http://www.cs.helsinki.fi/u/phoyer/contact.html) and NMFℓ0-H (http://www3.spsc.tugraz.at/people/robert-peharz) algorithms is also available online, and we used the same test data with them. The learning procedure with NMFSC was stopped after 3000 iterations because it converged considerably more slowly than the other algorithms, and the maximum number of iterations of NMFℓ0-H was set to 200. It is worth noting that, in the experiment, NN-KSVD and NMFℓ0-H required the exact fraction of non-zero elements in the coefficient matrix to be specified (3/50 = 0.06 in this case), while NMFSC was executed with a sparsity factor of 0.85 on the coefficients. For CDNDL, the sparsity of the coefficient matrix was adjusted via the regularization parameter λ, which can be determined by off-line calibration. We repeated the experiment with different values of λ and determined the optimal one according to the output results.


[Figure 1: recovery rates (%) of the learned atoms versus noise level (10 dB, 20 dB, 30 dB, no noise) for NMFSC, NN-KSVD, NMFℓ0-H and CDNDL.]
Fig. 1. Experiment results on the synthetic signals: for each of the tested algorithms and for each noise level, 15 trials were performed and their results were sorted. The averaged recovery rates of the learned atoms and the corresponding deviations of the recovery rates are displayed.

In the CDNDL algorithm, λ was set to 0.015. The learned dictionaries were compared with the true generating dictionary. The comparison was done as described in [25], by sweeping through the columns of the generating and the learned dictionaries and finding the closest column (in ℓ2-norm distance) between the two dictionaries; a distance of less than 0.01 was counted as a success. All trials were repeated 15 times. In the experiment, the CDNDL algorithm recovered on average 45.2%, 93.2%, 93% and 94.2% of the atoms under noise levels of 10 dB, 20 dB, and 30 dB, and in the noiseless case, respectively. Under the same conditions, NN-KSVD and NMFℓ0-H recovered on average 15.7%, 68.0%, 82.9% and 86.5%, and 23.7%, 80.8%, 84.9% and 84.0% of the atoms, respectively. NMFSC recovered only 0.4%, 13.5%, 38.4% and 49.3% of the atoms on average. The detailed results for these algorithms are shown in Fig. 1. The proposed CDNDL performed best on dictionary learning.

We also analyzed the relationship between the recovery rates and the number of iterations of the algorithms. The result (averaged over the 15 trials under the noise level of 20 dB) is shown in Fig. 2. It can be observed in Fig. 2 that CDNDL converged well and recovered about 92% of the atoms, while the NN-KSVD and NMFℓ0-H algorithms performed unsatisfactorily, although NMFℓ0-H did better at the beginning of the recovery process. Since the recovery rate of the NMFSC algorithm was much worse and the required number of iterations much larger than those of the other algorithms, NMFSC was not included in this comparison.
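A minimal sketch of the synthetic setup and the recovery-rate measurement described above; the 20×50 dictionary size, 3 atoms per signal, and the 0.01 matching threshold are from the paper, while the helper names and the exact matching loop are ours:

```python
import numpy as np

def make_synthetic(m=20, r=50, n=1500, k=3, seed=0):
    """Random nonnegative dictionary with unit-norm columns and k-sparse nonnegative coefficients."""
    rng = np.random.default_rng(seed)
    W_true = rng.random((m, r))
    W_true /= np.linalg.norm(W_true, axis=0, keepdims=True)
    H = np.zeros((r, n))
    for j in range(n):
        support = rng.choice(r, size=k, replace=False)
        H[support, j] = rng.random(k)
    return W_true, W_true @ H

def recovery_rate(W_true, W_learned, tol=0.01):
    """Fraction of true atoms matched by some learned atom within l2 distance tol."""
    W_learned = W_learned / np.linalg.norm(W_learned, axis=0, keepdims=True)
    hits = 0
    for k in range(W_true.shape[1]):
        d = np.linalg.norm(W_learned - W_true[:, [k]], axis=0)
        hits += int(d.min() < tol)
    return hits / W_true.shape[1]
```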


[Figure 2: recovery rates versus the iteration number (0–200) for NN-KSVD, NMFℓ0-H and CDNDL.]
Fig. 2. The relationship between the recovery rates and the iteration numbers of the algorithms under the noise level of 20 dB.

B. Synthetic Experiment with a Decimal Digits Dictionary

To further investigate the performance of the proposed nonnegative dictionary learning algorithm, we considered the dataset of 10 decimal digits originated in [20]. The dataset is composed of 90 images of size 8×8, representing the 10 decimal digits with various position shifts. First, 3000 training signals were generated by random linear combinations of 5 different atoms of the dataset with random positive coefficients. Due to space limitations, we show only the noiseless case of this experiment. For dictionary learning, the training signals were input to the four algorithms mentioned in the previous subsection. The NN-KSVD, NMFℓ0-H and CDNDL algorithms were all stopped after 200 iterations, and the NMFSC algorithm was run for 3000 iterations. The experiment was repeated 15 times with different initial matrices. Fig. 3 gives an example of the experiment under noiseless conditions, in which the four algorithms recovered 56, 68, 75 and 86 atoms, respectively. Over the repeated experiments, the four algorithms NMFSC, NN-KSVD, NMFℓ0-H and CDNDL recovered on average 54.8, 69.1, 75.5 and 85.6 of the 90 atoms, respectively. This shows that the CDNDL algorithm recovers almost all atoms, more than the other algorithms.

V. CONCLUSIONS

In this paper we have presented a novel and efficient nonnegative dictionary learning algorithm. Utilizing the similarity, in terms of data dimensionality reduction, between NMF and sparse representation of signals, we have converted the overcomplete, nonnegative dictionary learning problem into an NMF problem with a sparsity constraint. Results of dictionary-recovery experiments show that CDNDL can correctly learn an overcomplete, nonnegative dictionary on synthetic signals, and further show that the proposed algorithm is robust against noise in comparison with the other compared algorithms. We believe that this kind of dictionary learning can also perform well, compared with popular representation methods, in inpainting, image denoising and other applications, which remain as our future work.


Fig. 3. (a) The true dictionary composed of 90 atoms. (b) A part of the training data. (c)–(f) The dictionaries learned by the NMFSC, NN-KSVD, NMFℓ0-H and CDNDL algorithms; the numbers of correctly learned atoms are 56, 68, 75 and 86, respectively.

REFERENCES

[1] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
[2] Y. He, T. Gan, W. Chen, and H. Wang, "Multi-stage image denoising based on correlation coefficient matching and sparse dictionary pruning," Signal Processing, vol. 92, no. 1, pp. 139–149, 2012.
[3] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, Feb. 2009.
[4] S. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries," IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3397–3415, Dec. 1993.
[5] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM Review, vol. 43, no. 1, pp. 129–159, 2001.
[6] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, "Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition," in Proc. 27th Annu. Asilomar Conf. Signals, Systems and Computers, vol. 1, Nov. 1993, pp. 40–44.
[7] M. Plumbley, T. Blumensath, L. Daudet, R. Gribonval, and M. Davies, "Sparse representations in audio and music: From coding to source separation," Proceedings of the IEEE, vol. 98, no. 6, pp. 995–1005, Jun. 2010.
[8] M. Elad, M. Figueiredo, and Y. Ma, "On the role of sparse and redundant representations in image processing," Proceedings of the IEEE, vol. 98, no. 6, pp. 972–982, 2010.
[9] V. P. Pauca, J. Piper, and R. J. Plemmons, "Nonnegative matrix factorization for spectral data analysis," Linear Algebra and its Applications, vol. 416, no. 1, pp. 29–47, 2006.
[10] L. Miao and H. Qi, "Endmember extraction from highly mixed data using minimum volume constrained nonnegative matrix factorization," IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 3, pp. 765–777, 2007.
[11] S. Li, X. Hou, H. Zhang, and Q. Cheng, "Learning spatially localized, parts-based representation," in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), vol. 1, 2001, pp. 207–212.
[12] I. Kotsia, S. Zafeiriou, and I. Pitas, "A novel discriminant non-negative matrix factorization algorithm with applications to facial image characterization problems," IEEE Transactions on Information Forensics and Security, vol. 2, no. 3, pp. 588–595, Sep. 2007.
[13] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, pp. 788–791, 1999.
[14] A. Cichocki and A.-H. Phan, "Fast local algorithms for large scale nonnegative matrix and tensor factorizations," IEICE Trans. on Fundamentals of Electronics, vol. E92-A, no. 3, pp. 708–721, 2009.
[15] R. Peharz and F. Pernkopf, "Sparse nonnegative matrix factorization using ℓ0-constraints," Neurocomputing, vol. 80, pp. 38–46, Mar. 2012.
[16] P. O. Hoyer, "Non-negative matrix factorization with sparseness constraints," Journal of Machine Learning Research, vol. 5, pp. 1457–1469, 2004.
[17] V. P. Pauca, F. Shahnaz, M. W. Berry, and R. J. Plemmons, "Text mining using non-negative matrix factorizations," in Proc. Fourth SIAM International Conference on Data Mining, 2004, pp. 452–456.
[18] F. Shahnaz, M. W. Berry, V. P. Pauca, and R. J. Plemmons, "Document clustering using nonnegative matrix factorization," Information Processing & Management, vol. 42, no. 2, pp. 373–386, 2006.
[19] J. Friedman, T. Hastie, H. Höfling, and R. Tibshirani, "Pathwise coordinate optimization," Annals of Applied Statistics, vol. 1, no. 2, pp. 302–332, 2007.
[20] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD and its non-negative variant for dictionary design," vol. 5914, pp. 327–339, Jul. 2005.
[21] R. Peharz, M. Stark, and F. Pernkopf, "Sparse nonnegative matrix factorization using ℓ0-constraints," pp. 83–88, Sep. 2010.
[22] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," pp. 556–562, 2001.
[23] M. W. Berry, M. Browne, A. N. Langville, V. P. Pauca, and R. J. Plemmons, "Algorithms and applications for approximate nonnegative matrix factorization," Computational Statistics & Data Analysis, vol. 52, no. 1, pp. 155–173, 2007.
[24] F. C. Xing, "Investigation on solutions of cubic equations with one unknown," Natural Sci. Ed., vol. 12, no. 3, pp. 207–218, 2003.
[25] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Trans. on Signal Processing, vol. 54, no. 11, pp. 4311–4322, Nov. 2006.
