
Row-action Projections for Nonnegative Matrix Factorization

ICANN2014, 177, v1 (final): ’Row-action Proj...’

Rafal Zdunek
Department of Electronics, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370 Wroclaw, POLAND

Abstract. Nonnegative Matrix Factorization (NMF) is increasingly used for analyzing large-scale nonnegative data, where the number of samples and/or the number of observed variables is large. In this paper, we discuss two applications of row-action projections in the context of learning latent factors from large-scale data. First, we show that they can be efficiently used for improving on-line learning in dynamic NMF. Second, they can considerably reduce the computational complexity of the optimization algorithms used for factor learning from strongly redundant data. The experiments demonstrate the high efficiency of the proposed methods.

Key words: NMF, Kaczmarz method, On-line NMF, Row-action projections, Feature extraction

1 Introduction

Recent advances in information technologies have led to a considerable increase in the amount of high-dimensional massive data that needs to be analyzed and interpreted. Nonnegative Matrix Factorization (NMF) [1,2] is a commonly used static tool for dimensionality reduction of nonnegative small-scale and medium-scale data. Processing large-scale or dynamic data with NMF is still a challenging and open problem. To tackle this problem, several computational strategies have recently been proposed in the literature.

When the data is large in both the number of samples and the number of observed variables, and the aim is to extract a few feature vectors from the dataset, the alternating minimization problems in NMF are highly over-determined. To decrease the computational complexity, the optimization tasks can be partitioned into blocks, and the computations can then be arranged in parallel or distributed modes [3–6]. The heavy computations can also be performed in the cloud [7] using the MapReduce framework. Efficient algorithms for factorizing large-scale multi-linear data in parallel can be found, e.g., in [8].

Another computational strategy assumes an approximate factorization, which is a typical case in practice. Highly over-determined problems usually come from huge redundancy. To alleviate the computational complexity of such problems, the solution can be updated using only partial but most relevant information extracted from the entire dataset. When the data is represented by a matrix, the feature vectors in NMF can be updated using only selected columns of the observation matrix, and the encoding vectors using only a subset of its rows. This concept was proposed in [2] and is referred to as Large-Scale NMF (LS-NMF). In this paper, we extend LS-NMF by using a different computational strategy for updating the factors. Using the row-action projection technique, we can update the factors using a smaller subset of data than in LS-NMF. Moreover, sequential projections behave better numerically on ill-conditioned least-squares problems than batch projections.

When the number of samples is large whereas the other dimensions are relatively small, Online NMF (ONMF) seems to be a good choice. This technique was proposed by Cao et al. [9] in the context of temporal data analysis and tracking latent factors from time-varying data streams. Another version of ONMF was proposed in [10] for document clustering. ONMF is also a key tool for analyzing long records of audio signals, especially for blind separation of real-time sources. An interesting property of ONMF is not only a considerable reduction in computational complexity, in both time and memory, but also the possibility of updating the feature vectors over time. When NMF is applied to magnitude spectrograms of audio signals, the feature vectors represent frequency profiles that are common to the whole observed signal. ONMF makes it possible to extract time-varying frequency profiles, which are most suitable for analyzing non-stationary stochastic signals. In this context, Lefevre et al. [11] proposed an ONMF that minimizes the Itakura-Saito distance. ONMF can also be solved with geometry-based algorithms [12] or stochastic approximation algorithms [13]. In this paper, we also extend ONMF by using row-action projection-based algorithms, which are very fast and are known to be efficient in other applications.
A good representative of the row-action technique is the Kaczmarz algorithm [14], which was proposed for solving an over-determined system of linear equations. Due to its row-action projections, it has already found numerous real-world applications, especially in tomographic image reconstruction [15, 16]. In the Kaczmarz algorithm, the whole unknown solution vector can be updated in one iterative step using only one equation, i.e. one row of the system matrix. This property can be efficiently used for updating a whole set of feature vectors using only one sample at a given time instant, which motivates the use of this algorithm for both ONMF and LS-NMF.

The paper is organized as follows: The row-action projections are discussed in Section 2. The application of the row-action projections to large-scale and on-line NMF is presented in Section 3. The numerical experiments are described in Section 4. Finally, the conclusions are drawn in Section 5.

2 Row-action projections

Let us consider the Linear Least-Squares (LLS) problem: $\min_{x_t} \|y_t - A x_t\|_2^2$, where $A = [a_{ij}] \in \mathbb{R}^{I \times J}$, $y_t = [y_{it}] \in \mathbb{R}^{I}$, $x_t = [x_{jt}] \in \mathbb{R}^{J}$ and $I \geq J$. Let $a_i \in \mathbb{R}^{1 \times J}$ denote the $i$-th row of the matrix $A$. In a geometrical approach, $a_i$ determines a hyperplane in $\mathbb{R}^{J}$. Assuming $a_i \neq 0$, an orthogonal projection of any point $x_t \in \mathbb{R}^{J}$ onto that hyperplane can be expressed by the linear mapping $P^{(i)}: \mathbb{R}^{J} \to \mathbb{R}^{J}$:

$$P^{(i)}(x_t) = x_t + \frac{y_{it} - a_i x_t}{\|a_i\|_2^2} \, a_i^T. \qquad (1)$$

Let $P(x_t) = (P^{(I)} \circ \ldots \circ P^{(1)})(x_t) \in \mathbb{R}^{J}$ be a composed mapping, and $x_t^{(0)} \in \mathbb{R}^{J}$ be an initial guess. The Kaczmarz method [14] iteratively generates the sequence $\{x_t^{(k)}\}$ for $k = 1, 2, \ldots$, by the following updating rule:

$$x_t^{(k+1)} = P(x_t^{(k)}). \qquad (2)$$

Tanabe [17] proved that the sequence $\{x_t^{(k)}\}$ converges and

$$\lim_{k \to \infty} x_t^{(k)} = P_{N(A)}(x_t^{(0)}) + x_{LS}, \qquad (3)$$

where $P_{N(A)}(x)$ is the projection of the point $x$ onto the nullspace of $A$, and $x_{LS}$ is the minimal-norm least-squares solution. If the LLS problem is consistent, i.e. $y_t \in R(A)$ (the column space of $A$), then the limit in (3) is its solution.

The composition $P(x_t)$ can take different forms: the projections may be applied in any order, including a random one. However, the convergence is the fastest if the successive hyperplanes are selected to be as orthogonal as possible. Since this method updates a solution in each iterative step using only one hyperplane, which corresponds to one row of the matrix $A$, it is called the row-action projection method. After sweeping all the rows, one full cycle is completed.

The original Kaczmarz method estimates an unconstrained solution vector given only one data vector. In the basic version of NMF, the aim is to simultaneously estimate a group of vectors, subject to nonnegativity constraints. To enforce a nonnegative solution, an additional projection of $P^{(i)}(x_t)$ onto the nonnegative orthant $\mathbb{R}_+^{J}$ can be applied to (1): $P_+^{(i)}(x_t) = [P^{(i)}(x_t)]_+$, where $[\xi]_+ = \max\{0, \xi\}$. Thus $P_+(x_t) = (P_+^{(I)} \circ \ldots \circ P_+^{(1)})(x_t) \in \mathbb{R}_+^{J}$, which is not equivalent to the projection $[P(x_t)]_+$. The constrained Kaczmarz algorithm has been successfully applied to tomographic image reconstruction from limited data [18]. Due to the series of constrained projections, it somewhat resembles the Hierarchical Alternating Least-Squares (HALS) algorithm [2], which is commonly used for solving NMF problems.

The estimation of several samples of a solution can be done in a sequential way or by vectorization. However, such approaches are usually computationally very expensive. To alleviate this problem, we propose an extended constrained Kaczmarz method for processing all the right-hand side vectors simultaneously. Let $Y = [y_1, \ldots, y_T] \in \mathbb{R}^{I \times T}$ and $X = [x_1, \ldots, x_T] \in \mathbb{R}^{J \times T}$, and let $\underline{y}_{i} \in \mathbb{R}^{1 \times T}$ denote the $i$-th row of $Y$. The extended constrained Kaczmarz method for the system $AX = Y$ can be expressed by the following rule:

$$X^{(k+1)} = \left[ X^{(k)} + a_{i_k}^T \, \frac{\underline{y}_{i_k} - a_{i_k} X^{(k)}}{\|a_{i_k}\|_2^2} \right]_+ , \qquad (4)$$


where $i_k \in \{1, \ldots, I\}$ is the index of the row selected in the $k$-th iterative step. Note that the computational cost of one iterative step in (4) is only $O(JT)$; consequently, one full cycle costs $O(IJT)$.
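The update rule (4) can be illustrated with a short NumPy sketch. It is only a minimal illustration under stated assumptions, not the implementation used in the paper: the function name `extended_constrained_kaczmarz`, the parameter `n_cycles`, the zero initial guess, and the plain cyclic row ordering are all choices made here for the example.

```python
import numpy as np


def extended_constrained_kaczmarz(A, Y, X0=None, n_cycles=50):
    """Cyclic sweeps of rule (4) for A X = Y subject to X >= 0.

    Hypothetical helper: the name, the zero initial guess and the cyclic
    row ordering are assumptions of this sketch.
    """
    A = np.asarray(A, dtype=float)
    Y = np.asarray(Y, dtype=float)
    I, J = A.shape
    T = Y.shape[1]
    X = np.zeros((J, T)) if X0 is None else np.array(X0, dtype=float)
    row_norms_sq = np.einsum("ij,ij->i", A, A)  # ||a_i||_2^2 for every row

    for _ in range(n_cycles):
        for i in range(I):  # one full cycle sweeps all I rows
            if row_norms_sq[i] == 0.0:
                continue  # projection (1) is undefined for a zero row
            residual = Y[i] - A[i] @ X  # row residual, shape (T,), O(JT)
            X += np.outer(A[i], residual) / row_norms_sq[i]  # rank-one step
            np.maximum(X, 0.0, out=X)  # projection onto the nonnegative orthant
    return X


# Small consistent test problem with a nonnegative ground truth.
rng = np.random.default_rng(0)
A = rng.random((40, 5))
X_true = rng.random((5, 8))
Y = A @ X_true
X_hat = extended_constrained_kaczmarz(A, Y, n_cycles=300)
print("max abs error:", np.max(np.abs(X_hat - X_true)))
```

Each inner step touches one row of $A$ and one row of $Y$ and performs a rank-one correction of $X$, which is where the $O(JT)$ cost per step comes from. Setting $T = 1$ recovers the constrained single-vector projection $P_+^{(i)}$.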

3 Nonnegative Matrix Factorization

The aim of NMF is to find lower-rank nonnegative matrices $A = [a_{ij}] \in \mathbb{R}_+^{I \times J}$ and $X = [x_{jt}] \in \mathbb{R}_+^{J \times T}$ such that $Y = [y_{it}] \cong AX \in \mathbb{R}_+^{I \times T}$, given the data matrix $Y$, the lower rank $J$, and possibly some prior knowledge on the matrices $A$ or $X$. Usually: J
