NON-NEGATIVE MATRIX FACTORISATION OF COMPRESSIVELY SAMPLED NON-NEGATIVE SIGNALS

Paul D. O'Grady

Scott T. Rickard

Complex & Adaptive Systems Laboratory,
University College Dublin,
Belfield, Dublin 4, Ireland.

(This material is based upon work supported by Science Foundation Ireland under Grant No. 05/YI2/I677.)

ABSTRACT

The emerging theory of Compressive Sampling has demonstrated that, by exploiting the structure of a signal, it is possible to sample a signal below the Nyquist rate and achieve perfect reconstruction. In this short note, we employ Non-negative Matrix Factorisation (NMF) in the context of Compressive Sampling and propose two NMF algorithms for signal recovery, one of which utilises Iteratively Reweighted Least Squares. The algorithms are applied to compressively sampled non-negative data, where a sparse non-negative basis and corresponding non-negative coefficients for the original uncompressed data are discovered directly in the compressively sampled domain.

1. INTRODUCTION

The Nyquist-Shannon sampling theorem states that, in order for a continuous-time signal to be represented without error from its samples, the signal must be sampled at a rate that is at least twice its bandwidth. In practice, signals are often compressed soon after sampling, trading off perfect recovery for some acceptable level of error; clearly, this is a waste of valuable sampling resources. In recent years, a new and exciting theory of Compressive Sampling (CS) [1, 2] (also known as compressed sensing, among other related terms) has emerged, in which a signal is sampled and compressed simultaneously, using sparse representations, at a greatly reduced sampling rate. The central idea is that the number of samples needed to recover a signal perfectly depends on the structural content of the signal, as captured by a sparse representation that parsimoniously represents the signal, rather than on its bandwidth.

More formally, CS is concerned with the solution, $x \in \mathbb{R}^N$, of an under-determined system of linear equations of the form $\Phi A x = \Phi y$, where the sensing matrix $\Phi \in \mathbb{R}^{M \times N}$ has fewer rows than columns, i.e., $M < N$. Critical to the theory of CS is the assumption that the solution $x$ is sparse, i.e., that $y$ has a parsimonious representation in a known fixed basis $A \in \mathbb{R}^{N \times N}$. The most natural norm constraint for this assumption is the $\ell_0$ (pseudo-)norm, as it counts the number of non-zero coefficients. However, minimisation of the $\ell_0$ norm is a non-convex optimisation, which is NP-complete and cannot be computed in polynomial time. For these reasons the $\ell_1$ norm is usually specified instead, as it is computationally tractable and also recovers sparse solutions:

$$\min_{x \in \mathbb{R}^N} \|x\|_1, \quad \text{subject to} \quad \Phi A x = \Phi y, \qquad (1)$$

where the recovered signal, $x$, is such a solution.
In order to specify the minimal number of measurements, $M$, required to achieve perfect recovery, $\Phi$ needs to be maximally incoherent with $A$, i.e., the rows of $\Phi$ should have a non-parsimonious representation in that basis, a notion which is contrary to sparseness. Typically, the entries of $\Phi$ are drawn from a random Gaussian distribution, as it is universally incoherent with sparse transformations and achieves exact recovery from the minimal number of measurements with high probability. Furthermore, Candès and Tao [3] present an important result that gives a lower bound on $M$ that reliably achieves perfect recovery for a $K$-sparse signal ($\|x\|_0 = K$): $M \geq C K \log(N)$, where $C$ depends on the desired probability of success, which tends to one as $N \to \infty$.

In this short note, we outline two Non-negative Matrix Factorisation algorithms that discover factors for uncompressed non-negative data in the compressively sampled domain. The first algorithm utilises Iteratively Reweighted Least Squares as an approximation to the $\ell_1$-norm objective, while the second algorithm is a modification of the standard least squares NMF algorithm of Lee and Seung [4]. This note is organised as follows: we discuss Iteratively Reweighted Least Squares in Section 2 and Non-negative Matrix Factorisation in Section 3. We overview Non-negative Under-determined Iteratively Reweighted Least Squares and demonstrate signal recovery in Section 4. Finally, we propose two algorithms for Non-negative Matrix Factorisation in the CS domain and present an image recovery example in Section 5, followed by a conclusion in Section 6.

2. ITERATIVELY REWEIGHTED LEAST SQUARES

We desire the minimum $\ell_1$-norm solution for systems of linear equations, and require an objective function that recovers such solutions. However, the $\ell_1$-norm objective has a discontinuity at the origin, and is therefore non-differentiable and cannot be minimised using standard gradient methods. Typically, the $\ell_1$-norm objective is approximated by Iteratively Reweighted Least Squares (IRLS), which reweights the differentiable Least Squares objective:

$$\min_{x \in \mathbb{R}^N} \|Q_k x\|_2, \quad \text{subject to} \quad \Phi x = y, \qquad (2)$$

where $A$ is assumed to be the canonical basis and $Q_0 = I$ (the identity matrix) at the first iteration, resulting in the initial solution being the least squares solution. For subsequent iterations, $Q_k$ is formed from the residuals, $r_k = y - \Phi x_k$, at each iteration $k$: $Q_k = \mathrm{diag}(f(r_{k,1}), \cdots, f(r_{k,N}))$, where $f(\cdot)$ is the specified weighting function. IRLS algorithms, unlike the pseudo-inverse, have no closed-form solution, as $Q$ depends on the previous $\ell_p$-norm solution estimate, $x$. Therefore, in order to improve the estimate of the $\ell_p$-norm solution, the procedure is repeated for a number of iterations. For over-determined ($M > N$) systems of equations the IRLS update is

$$x_{k+1} = (\Phi^T Q_k \Phi)^{-1} \Phi^T Q_k y. \qquad (3)$$

Typically, the weighting function is the Huber function [5]:

$$f(r_{k,i}) = \begin{cases} |r_{k,i}|^{p-2} & \text{if } |r_{k,i}| > \epsilon \\ \epsilon^{p-2} & \text{if } |r_{k,i}| \leq \epsilon \end{cases}, \quad 1 \leq p \leq 2, \qquad (4)$$

which penalises reconstruction error linearly for large errors and behaves quadratically when the error falls beneath $\epsilon$, which is close to the discontinuity.
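A minimal NumPy sketch of the over-determined IRLS update (Eq. 3) with the Huber-style weighting of Eq. 4 follows; here the residual is taken as $r_k = y - \Phi x_k$, and the values of p, eps and the iteration count are illustrative choices.

    import numpy as np

    def huber_weights(r, p=1.0, eps=1e-4):
        # Eq. 4: |r_i|^(p-2) for residuals above eps, eps^(p-2) beneath it
        a = np.abs(r)
        return np.where(a > eps, a ** (p - 2), eps ** (p - 2))

    def irls_overdetermined(Phi, y, p=1.0, n_iter=50):
        # Over-determined system: Phi has more rows than columns (M > N)
        x = np.linalg.lstsq(Phi, y, rcond=None)[0]  # Q_0 = I: least squares start
        for _ in range(n_iter):
            r = y - Phi @ x                         # residuals r_k
            Q = np.diag(huber_weights(r, p))        # Q_k = diag(f(r_k))
            x = np.linalg.solve(Phi.T @ Q @ Phi, Phi.T @ Q @ y)  # Eq. 3
        return x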

For under-determined systems of equations, the FOCUSS algorithm [6] may be used:

$$\min_{g \in \mathbb{R}^N} \|g\|_2, \quad \text{subject to} \quad \Phi Q g = y, \quad x = Qg, \qquad (5)$$

which performs non-convex $\ell_p$-norm minimisations, $0 \leq p \leq 1$, and recovers sparse solutions. The update equations for FOCUSS are:

$$g_{k+1} = Q_k \Phi^T (\Phi Q_k \Phi^T)^{-1} y, \qquad (6a)$$
$$x_{k+1} = Q_k g_{k+1}, \qquad (6b)$$
$$Q_k = \mathrm{diag}(|x_k|^{1-(p/2)}). \qquad (6c)$$
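A literal NumPy transcription of the FOCUSS iteration (Eq. 6) is sketched below; the small floor on $|x_k|$, which guards against singular weights, and the iteration count are illustrative choices.

    import numpy as np

    def focuss(Phi, y, p=1.0, n_iter=100, floor=1e-12):
        # Under-determined system: Phi has fewer rows than columns (M < N)
        x = np.linalg.pinv(Phi) @ y                # least squares start
        for _ in range(n_iter):
            Qk = np.diag(np.maximum(np.abs(x), floor) ** (1 - p / 2))  # Eq. 6c
            g = Qk @ Phi.T @ np.linalg.solve(Phi @ Qk @ Phi.T, y)      # Eq. 6a
            x = Qk @ g                                                 # Eq. 6b
        return x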

3. NON-NEGATIVE MATRIX FACTORISATION

A popular method for the analysis of non-negative matrices is Non-negative Matrix Factorisation (NMF) [7, 4], which approximates a non-negative matrix, $Y$, as a product of two non-negative matrices, $Y \approx AX$, decomposing the matrix into a non-negative basis, $A$, and associated coefficients, $X$. NMF is a parts-based approach that makes no statistical assumption about the data. Instead, it assumes that, for the domain at hand, negative numbers are physically meaningless, which is the foundation for the assumption that the search for a decomposition should be confined to the non-negative orthant, i.e., the non-negativity assumption. The lack of statistical assumptions makes it difficult to prove that NMF will give correct decompositions; however, it has been shown in practice to give correct results. NMF and its extensions have been applied to a wide variety of problems, including audio processing [8, 9] and automatic ASCII art conversion [10].

3.1. Standard NMF Algorithm

The NMF algorithm is as follows: given a non-negative matrix $Y \in \mathbb{R}_{\geq 0}^{N \times T}$, the goal is to approximate $Y$ as a product of two non-negative matrices $A \in \mathbb{R}_{\geq 0}^{N \times R}$ and $X \in \mathbb{R}_{\geq 0}^{R \times T}$, $Y \approx AX$. The parameter $R$, which specifies the number of columns in $A$ and rows in $X$, determines the rank of the approximation. Typically $R \leq N$, in which case $A$ is over-determined and NMF reveals low-rank features of the data. The selection of an appropriate value for $R$ usually requires prior knowledge, and is important to obtaining a satisfactory decomposition. An important consideration in the derivation of the NMF algorithm is the selection of the objective function; Lee and Seung [4] utilise the Least Squares (LS) objective,

$$D_{LS}(Y, A, X) = \frac{1}{2} \|Y - AX\|^2. \qquad (7)$$

Minimising Eq. 7 while enforcing a non-negativity constraint on the resulting factors yields a parts-based decomposition, where the basis vectors in $A$ resemble parts of the input data, which can only be summed together to approximate $Y$. The NMF objective (Eq. 7) is convex in $A$ and $X$ individually, but not in both together; therefore, NMF algorithms usually alternate updates of $A$ and $X$. The objective is minimised using a diagonally rescaled gradient descent algorithm [4], which enforces the non-negativity constraint and leads to the following multiplicative update equations:

$$A \leftarrow A \otimes \frac{Y X^T}{A X X^T}, \qquad X \leftarrow X \otimes \frac{A^T Y}{A^T A X}, \qquad (8)$$

where $\otimes$ denotes element-wise (also known as Hadamard or Schur product) multiplication, and division is also element-wise. As the NMF algorithm iterates, its factors converge to a local optimum of Eq. 7. For the interested reader, illustrative examples of the factors obtained by NMF when applied to synthetic data are presented in [9].
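A minimal NumPy sketch of the multiplicative updates of Eq. 8 is given below; the random initialisation, iteration count and the small constant guarding the element-wise divisions are illustrative choices.

    import numpy as np

    def nmf_ls(Y, R, n_iter=500, eps=1e-9, seed=0):
        # Least squares NMF via the multiplicative updates of Eq. 8
        rng = np.random.default_rng(seed)
        N, T = Y.shape
        A = rng.random((N, R))          # random non-negative initialisation
        X = rng.random((R, T))
        for _ in range(n_iter):
            A *= (Y @ X.T) / (A @ X @ X.T + eps)   # A update
            X *= (A.T @ Y) / (A.T @ A @ X + eps)   # X update
        return A, X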

4. NON-NEGATIVE UNDER-DETERMINED IRLS

Recently, we proposed an algorithm, referred to as Non-negative Under-determined Iteratively Reweighted Least Squares (NUIRLS) [11], for the recovery of non-negative sparse signals in the special case where compressive sampling is entirely non-negative, i.e., $\Phi, A, x, y \geq 0$. NUIRLS utilises under-determined IRLS (Eq. 5) and is derived within the framework of NMF, resulting in a multiplicative update equation that recovers non-negative minimum $\ell_p$-norm ($0 \leq p \leq 1$) solutions. The NUIRLS algorithm employs the following objective function:

$$D_{\mathrm{NUIRLS}}(y, \Phi, Q_k, g) = \frac{1}{2} \|y - \Phi Q_k g\|^2, \qquad (9)$$

which assumes $A$ is the canonical basis and performs the minimisation problem stated in Eq. 5. Unlike NMF, NUIRLS does not perform factorisation, but recovers non-negative $\ell_p$-norm solutions given a fixed sensing matrix, $\Phi$. However, NUIRLS can be used as an NMF update for $X$ when $R > M$.

The update equations for the standard NMF algorithm (Eq. 8) are derived using the diagonally rescaled gradient descent algorithm; the NUIRLS update equations are derived in the same way, resulting in the following algorithm:

$$g_{k+1} = g_k \otimes \frac{Q_k \Phi^T y}{Q_k \Phi^T \Phi Q_k g_k}, \qquad (10a)$$
$$x_{k+1} = Q_k g_{k+1}, \qquad (10b)$$
$$Q_k = \mathrm{diag}((g_k)^{1-(p/2)}). \qquad (10c)$$

NUIRLS is initialised with $Q_0 = I$, resulting in the initial estimate, $x_1$, being the minimum non-negative $\ell_2$-norm solution. Furthermore, the NUIRLS algorithm resembles Eq. 6, the key difference being the preservation of non-negativity through a multiplicative update equation.

As discussed in Section 3, NMF is an iterative algorithm; IRLS is also an iterative algorithm. Combining both results in NUIRLS being a two-step iterative algorithm, where the minimum $\ell_2$-norm solution at each NMF iteration is iteratively reweighted to recover a minimum $\ell_p$-norm solution, which is used in the next NMF iteration, and so on. As this process is repeated, NUIRLS converges to a local optimum of Eq. 9. Furthermore, unlike the standard NMF update equations (Eq. 8), which comprise an update for each of the matrices $A$ and $X$, the NUIRLS update is restricted to column vectors $x$, as $Q$ is specific to each $x$.
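A minimal NumPy sketch of NUIRLS for a single column vector follows; here the multiplicative update (Eq. 10a) is applied for a number of inner iterations with $Q_k$ fixed, after which the weights are recomputed (Eq. 10c). The exact nesting of the two loops, the flat start and the small floor are illustrative choices.

    import numpy as np

    def nuirls(Phi, y, p=1.0, n_outer=350, n_inner=100, floor=1e-12):
        # Recover a non-negative minimum l_p-norm solution of Phi x = y (Eq. 10)
        M, N = Phi.shape
        g = np.ones(N)                  # flat non-negative start
        q = np.ones(N)                  # Q_0 = I (q holds the diagonal of Q_k)
        for _ in range(n_outer):
            for _ in range(n_inner):
                num = q * (Phi.T @ y)                  # Q_k Phi^T y
                den = q * (Phi.T @ (Phi @ (q * g)))    # Q_k Phi^T Phi Q_k g_k
                g *= num / np.maximum(den, floor)      # Eq. 10a: stays non-negative
            q = np.maximum(g, floor) ** (1 - p / 2)    # Eq. 10c: reweight from g_k
        return q * g                                   # Eq. 10b: x = Q_k g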

4.1. Recovery Experiments

We present an illustrative example of the compressive sampling of non-negative signals using NUIRLS in Figure 1, where $x$ is non-negative and the entries of $\Phi$ are drawn from a rectified Gaussian distribution. We run NUIRLS for 350 NMF iterations, each having 100 IRLS iterations, and test NUIRLS using $p = 1$ and $p = 2$, where $p = 2$ results in solutions that correspond to least squares NMF, which may be considered a non-negative pseudo-inverse. Figure 1 demonstrates that it is possible to recover sparse solutions from the compressively sampled signals by selecting the minimum $\ell_2$-norm solution, $p = 2$. In contrast, for real-valued signals it is well known that the pseudo-inverse recovers dense solutions, which necessitates the use of sparse norms for standard CS. This result suggests that the non-negativity constraint of NUIRLS is enough to recover the original signal.

[Fig. 1: Signal recovery from a compressively sampled non-negative signal using NUIRLS. Panels: the original signal $x$ ($N = 128$); the compressively sampled signal $y = \Phi x$; signal recovery using NUIRLS, $p = 1$; signal recovery using NUIRLS, $p = 2$.]

To further investigate, we perform signal recovery using NUIRLS and compare the recovered signals to those recovered by the standard Lee and Seung update equations for Least Squares ($X$ update: Eq. 8) and Kullback-Leibler divergence NMF [4]. The experiment is as follows: we perform compressive sampling where $\Phi$ and $x$ are of fixed dimension, $M = 80$ and $N = 128$, and test for a number of signals of increasing $K$-sparseness, with $K = 60$ being the maximum. We run NUIRLS for 1500 NMF iterations, each having 150 IRLS iterations, and specify $p = \{0, 0.5, 1\}$. In order to compare NUIRLS and standard NMF in an even setting, NMF is run for 225,000 ($1500 \times 150$) iterations. The experiment is repeated for 40 Monte Carlo runs, where a new $\Phi$ is constructed for each run. The Signal-to-Noise Ratio (SNR) of the recovered signals is averaged over all Monte Carlo runs, and the probability of recovering a signal with SNR $\geq$ 60 dB as a function of $K$ is plotted in Figure 2.

[Fig. 2: Probability of recovering a signal with SNR $\geq$ 60 dB as a function of signal $K$-sparseness ($M = 80$, $N = 128$). Curves: $\ell_1$ NUIRLS, $\ell_{0.5}$ NUIRLS, $\ell_0$ NUIRLS, LS NMF, KLD NMF.]

The plot indicates that, for a required SNR of 60 dB, standard NMF successfully achieves the desired SNR for $K \leq 15$, while NUIRLS achieves the same SNR for $K \leq 19$, which demonstrates that if the compressively sampled signal is sufficiently sparse, in this case $K \leq 15$, a non-negativity constraint alone is enough to recover the signal [12]. Therefore, the least squares NMF update can be employed in the recovery of sufficiently sparse signals from compressively sampled non-negative data. However, as sparseness decreases, NUIRLS recovers better signals.

5. NMF IN THE CS DOMAIN

We propose two algorithms for the factorisation of compressively sampled non-negative matrices, where factors for the original uncompressed matrix are discovered in the CS domain. This problem is a special case of CS where a sparse basis for

the data is not known in advance, and is discovered indirectly from the compressively sampled data. The goal is to approximate an uncompressed non-negative matrix $Y \in \mathbb{R}_{\geq 0}^{N \times T}$ as a product of two non-negative matrices $A \in \mathbb{R}_{\geq 0}^{N \times R}$ and $X \in \mathbb{R}_{\geq 0}^{R \times T}$, where $\Phi$ is known and the factors are discovered from a compressively sampled non-negative matrix $Y^{cs} \in \mathbb{R}_{\geq 0}^{M \times T}$; $Y^{cs} = \Phi Y \approx \Phi A X$.

First, we propose an NMF algorithm based on IRLS: the NUIRLS algorithm (Eq. 10) is used to update the columns of $X$, where $\Phi$ is replaced by $A^{cs} = \Phi A$ (which is required to be under-determined), and an update based on over-determined IRLS (Eq. 3) is used to discover the rows, $a^{cs}$, of $A^{cs}$:

$$a^{cs}_{k+1} = a^{cs}_k \otimes \frac{X Q_k^T Q_k \, y^{cs\,T}}{X Q_k^T Q_k X^T a^{cs}_k}, \qquad (11a)$$
$$Q_k = \mathrm{diag}(f(y^{cs\,T} - X^T a^{cs}_k)), \qquad (11b)$$

where $y^{cs}$ is the corresponding row of $Y^{cs}$ and $f(\cdot)$ is the Huber function (Eq. 4). Since we know $\Phi$, we can recover $A$ from $A^{cs}$ using the NUIRLS algorithm.

Second, utilising the fact that the least squares NMF update can recover sufficiently sparse signals, as demonstrated in Section 4.1, we propose a modification of least squares NMF as a CS recovery algorithm:

$$\min_{A, X} \frac{1}{2} \|Y^{cs} - \Phi A X\|^2,$$

which results in the following updates:

$$X \leftarrow X \otimes \frac{(\Phi A)^T Y^{cs}}{(\Phi A)^T \Phi A X}, \qquad A \leftarrow A \otimes \frac{\Phi^T Y^{cs} X^T}{\Phi^T \Phi A X X^T}. \qquad (12)$$
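A minimal NumPy sketch of the second algorithm (Eq. 12) is given below; it assumes a non-negative sensing matrix $\Phi$ (e.g., rectified Gaussian), so that the multiplicative updates preserve non-negativity, and the initialisation, iteration count and division guard are illustrative choices.

    import numpy as np

    def cs_nmf(Ycs, Phi, R, n_iter=5000, eps=1e-9, seed=0):
        # Discover factors A, X for the uncompressed data from Ycs = Phi Y (Eq. 12)
        rng = np.random.default_rng(seed)
        M, T = Ycs.shape
        N = Phi.shape[1]
        A = rng.random((N, R))
        X = rng.random((R, T))
        for _ in range(n_iter):
            PhiA = Phi @ A
            X *= (PhiA.T @ Ycs) / (PhiA.T @ PhiA @ X + eps)               # X update
            A *= (Phi.T @ Ycs @ X.T) / (Phi.T @ Phi @ A @ X @ X.T + eps)  # A update
        return A, X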

5.1. Image Example

We present a simple example of image recovery using the proposed algorithms in Figure 3, where the image is partitioned into blocks, which are arranged as vectors, compressed, and placed in the columns of $Y^{cs}$. For the first algorithm the experiment is run for 150 NMF and 100 IRLS iterations, while the second is run for $150 \times 100$ iterations. Furthermore, $M = 49$, $N = 64$, $R = 64$ and $p = 1$. Both algorithms recover a good approximation (approximately 16 dB SNR) of the uncompressed image, demonstrating that an appropriate factorisation is discovered.

[Fig. 3: Image recovery from a compressed cameraman image using the proposed NMF algorithms, where both a sparse basis and coefficients are discovered. Panels: original image; recovered image 1; recovered image 2; compressed image.]
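For illustration, a sketch of the blocking pipeline described above follows, under the stated dimensions ($8 \times 8$ blocks, so $N = 64$, with $M = 49$); it assumes the image dimensions are divisible by the block size, and reuses the cs_nmf sketch from the previous listing.

    import numpy as np

    def compress_image_blocks(img, block=8, M=49, seed=0):
        # Partition the image into blocks, vectorise them as the columns of Y,
        # and compress with a rectified Gaussian sensing matrix: Ycs = Phi Y.
        # Assumes img dimensions are divisible by the block size.
        rng = np.random.default_rng(seed)
        N = block * block
        Phi = np.maximum(rng.standard_normal((M, N)), 0)   # rectified Gaussian
        H, W = img.shape
        cols = [img[i:i + block, j:j + block].reshape(N)
                for i in range(0, H, block) for j in range(0, W, block)]
        Y = np.stack(cols, axis=1)
        return Phi, Phi @ Y

    # e.g., Phi, Ycs = compress_image_blocks(img); A, X = cs_nmf(Ycs, Phi, R=64)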

6. CONCLUSION

In this note, we briefly outlined two Non-negative Matrix Factorisation algorithms that discover factors for uncompressed non-negative data in the compressively sampled domain. We proposed an NMF algorithm based on the NUIRLS algorithm and, in light of recent results presented by the authors, which indicate that standard NMF updates can be used for signal recovery, we proposed a modification of the standard least squares NMF algorithm. Further details and investigation of these algorithms will be presented in later work.

7. REFERENCES

[1] E. J. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489-509, 2006.
[2] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289-1306, 2006.
[3] E. Candès and T. Tao, "Near-optimal signal recovery from random projections: Universal encoding strategies?," IEEE Transactions on Information Theory, vol. 52, pp. 5406-5425, 2006.
[4] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in Advances in Neural Information Processing Systems 13, MIT Press, 2001, pp. 556-562.
[5] P. J. Huber, Robust Statistics, Wiley, 1981.
[6] B. D. Rao and K. Kreutz-Delgado, "An affine scaling methodology for best basis selection," IEEE Transactions on Signal Processing, vol. 47, no. 1, pp. 187-200, January 1999.
[7] P. Paatero and U. Tapper, "Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values," Environmetrics, vol. 5, pp. 111-126, 1994.
[8] P. Smaragdis, "Non-negative matrix factor deconvolution; extraction of multiple sound sources from monophonic inputs," in Fifth International Conference on Independent Component Analysis, Granada, Spain, Sept. 22-24, 2004, LNCS 3195, pp. 494-499, Springer-Verlag.
[9] P. D. O'Grady and B. A. Pearlmutter, "Discovering speech phones using convolutive non-negative matrix factorisation with a sparseness constraint," Neurocomputing, 2008, in press.
[10] P. D. O'Grady and S. T. Rickard, "Automatic ASCII art conversion of binary images using non-negative constraints," in Proceedings of the Irish Signals and Systems Conference, 2008, pp. 186-191.
[11] P. D. O'Grady and S. T. Rickard, "Compressive sampling of non-negative signals," in IEEE International Workshop on Machine Learning for Signal Processing, Cancún, Mexico, Oct. 16-19, 2008.
[12] A. M. Bruckstein, M. Elad, and M. Zibulevsky, "A non-negative and sparse enough solution of an underdetermined linear system of equations is unique," submitted to IEEE Transactions on Information Theory, 2008.
