Nonnegative Matrix Factorization: Algorithms and Applications - SIAM

Nonnegative Matrix Factorization: Algorithms and Applications

Haesun Park
[email protected]
School of Computational Science and Engineering
Georgia Institute of Technology, Atlanta, GA, USA

SIAM International Conference on Data Mining, April 2011

This work was supported in part by the National Science Foundation.

Co-authors

- Jingu Kim, CSE, Georgia Tech
- Yunlong He, Math, Georgia Tech
- Da Kuang, CSE, Georgia Tech

Outline

- Overview of NMF
- Fast algorithms with Frobenius norm
  - Theoretical results on convergence
  - Multiplicative updating
  - Alternating nonnegativity-constrained least squares: active-set-type methods, ...
  - Hierarchical alternating least squares
- Variations/extensions of NMF: sparse NMF, regularized NMF, nonnegative PARAFAC
- Efficient adaptive NMF algorithms
- Applications of NMF; NMF for clustering
- Extensive computational results
- Discussions

Nonnegative Matrix Factorization (NMF) (Lee & Seung 99, Paatero & Tapper 94)

Given A ∈ R_+^{m×n} and a desired rank k, compute W ∈ R_+^{m×k} and H ∈ R_+^{k×n} such that A ≈ WH.

Adaptive rank update, given an NMF (W, H) of rank k and a new rank k̃:
- If k̃ > k: compute an NMF of the residual, A − WH ≈ ∆W ∆H, and set W̃ = [W ∆W] and H̃ = [H; ∆H].
- If k̃ < k: initialize W̃ and H̃ with the k̃ pairs (w_i, h_i) having largest ||w_i h_i^T||_F = ||w_i||_2 ||h_i||_2.
- Update W̃ and H̃ using the HALS algorithm.
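The HALS updates referenced here can be sketched in NumPy. This is a minimal illustration, not the authors' code: `hals_update` is a hypothetical helper name, and the rank-one formulas follow the standard HALS derivation, where each pair (w_i, h_i) is an exact nonnegative least squares minimizer with the other pairs held fixed.

```python
import numpy as np

def hals_update(A, W, H, eps=1e-10):
    """One HALS sweep: update each rank-1 pair (w_i, h_i) in turn,
    keeping all other pairs fixed, with projection onto >= 0."""
    k = W.shape[1]
    for i in range(k):
        # residual with the i-th rank-1 term removed
        R = A - W @ H + np.outer(W[:, i], H[i, :])
        # closed-form nonnegative LS updates for w_i, then h_i
        W[:, i] = np.maximum(R @ H[i, :] / max(H[i, :] @ H[i, :], eps), 0)
        H[i, :] = np.maximum(W[:, i] @ R / max(W[:, i] @ W[:, i], eps), 0)
    return W, H

rng = np.random.default_rng(0)
A = rng.random((30, 20))
W = rng.random((30, 4))
H = rng.random((4, 20))
err0 = np.linalg.norm(A - W @ H, "fro")
for _ in range(50):
    W, H = hals_update(A, W, H)
err1 = np.linalg.norm(A - W @ H, "fro")
```

Because each block update exactly minimizes the objective over that block, the Frobenius error is nonincreasing across sweeps.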


Model Selection in NMF Clustering (He, Kim, Cichocki, and Park, in preparation)

Consensus matrix based on A ≈ WH, for runs t = 1, ..., l: C^t_ij = 1 if columns i and j of H attain their maximum in the same row (i.e., belong to the same cluster), and C^t_ij = 0 otherwise.

Averaged consensus matrix: C̄ = (1/l) Σ_t C^t.

Dispersion coefficient: ρ(k) = (1/n²) Σ_{i=1}^n Σ_{j=1}^n 4(C̄_ij − 1/2)².
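The consensus matrix and dispersion coefficient translate directly into code. A minimal sketch, assuming the cluster of column i is the row index where H(:, i) attains its maximum; `consensus` and `dispersion` are hypothetical helper names:

```python
import numpy as np

def consensus(H):
    """Connectivity matrix of one NMF run: entry (i, j) is 1 when
    columns i and j of H attain their maximum in the same row."""
    labels = np.argmax(H, axis=0)
    return (labels[:, None] == labels[None, :]).astype(float)

def dispersion(C_bar):
    """rho(k) = (1/n^2) * sum_ij 4*(C_bar_ij - 1/2)^2; equals 1 iff
    every entry of the averaged consensus matrix is 0 or 1."""
    n = C_bar.shape[0]
    return np.sum(4.0 * (C_bar - 0.5) ** 2) / n**2

# average over several runs (random H's stand in for NMF outputs)
rng = np.random.default_rng(1)
runs = [rng.random((3, 50)) for _ in range(10)]
C_bar = np.mean([consensus(H) for H in runs], axis=0)
rho = dispersion(C_bar)
```

ρ(k) = 1 exactly when all runs agree on every pair of points, so the k with the largest dispersion coefficient is selected.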

[Figure: reordered consensus matrices for k = 3, 4, 5, 6 (color scale 0 to 1); dispersion coefficient vs. approximation rank k; execution time (seconds) of AdaNMF, Recompute, and Warm-restart.]

Clustering results on MNIST digit images (784 × 2000) by AdaNMF with k = 3, 4, 5, and 6: averaged consensus matrices, dispersion coefficient, and execution time.


Adaptive NMF for Varying Reduced Rank (He, Kim, Cichocki, and Park, in preparation)

[Figure: two panels of relative error vs. execution time for Recompute, Warm-restart, and AdaNMF.]

Relative error vs. execution time of AdaNMF, warm-restart, and "recompute". Given an NMF of a 600 × 600 synthetic matrix with k = 60, compute an NMF with k̃ = 50 and k̃ = 80.


Adaptive NMF for Varying Reduced Rank (He, Kim, Cichocki, and Park, in preparation)

Theorem: For A ∈ R_+^{m×n}, if rank_+(A) > k, then

min ||A − W^(k+1) H^(k+1)||_F < min ||A − W^(k) H^(k)||_F,

where W^(i) ∈ R_+^{m×i} and H^(i) ∈ R_+^{i×n}.
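The theorem's strict decrease of the residual with growing rank can be observed numerically. A small sketch using plain multiplicative updates (Lee & Seung) as the NMF solver rather than the BCD/ANLS algorithms of this talk; `nmf_mu` is a hypothetical name:

```python
import numpy as np

def nmf_mu(A, k, iters=200, seed=0):
    """Plain multiplicative-update NMF; returns the relative
    Frobenius residual achieved for a given rank k."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k)) + 0.1
    H = rng.random((k, A.shape[1])) + 0.1
    eps = 1e-12
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return np.linalg.norm(A - W @ H, "fro") / np.linalg.norm(A, "fro")

rng = np.random.default_rng(2)
A = rng.random((40, 30))
residuals = [nmf_mu(A, k) for k in (2, 5, 10)]   # decreasing in k
```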

[Figure: left, relative objective function value of NMF vs. approximation rank k (rank = 20, 40, 60, 80); right, classification error on training and testing sets vs. reduced rank k.]

Rank path on a synthetic data set: relative residual vs. k. ORL face image (10304 × 400) classification errors (by LMNN) on training and testing sets vs. k. The k-dimensional representation H_T of the training data T is computed by BPP: min_{H_T ≥ 0} ||W H_T − T||_F.


NMF for Dynamic Data (DynNMF) (He, Kim, Cichocki, and Park, in preparation)

Given an NMF (W, H) for A = [δA Â], how can we quickly compute an NMF (W̃, H̃) for Ã = [Â ∆A]? (Updating and downdating)

DynNMF (Sliding-Window NMF)
- Initialize H̃ as follows:
  - Let Ĥ be the remaining columns of H.
  - Solve min_{∆H ≥ 0} ||W ∆H − ∆A||_F² using block principal pivoting.
  - Set H̃ = [Ĥ ∆H].

- Run HALS on Ã with initial factors W̃ = W and H̃.
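The ∆H initialization can be sketched as follows. The talk uses block principal pivoting; here, purely for illustration, the nonnegative least squares subproblem is solved with simple multiplicative updates. `nnls_mu` and `slide_window` are hypothetical names, and the new columns are constructed to be exactly representable by W:

```python
import numpy as np

def nnls_mu(W, B, iters=2000, eps=1e-12):
    """min_{X>=0} ||W X - B||_F via multiplicative updates
    (a stand-in for the block-principal-pivoting solver)."""
    X = np.full((W.shape[1], B.shape[1]), 0.5)
    WtW, WtB = W.T @ W, W.T @ B
    for _ in range(iters):
        X *= WtB / (WtW @ X + eps)
    return X

def slide_window(W, H, dA, n_drop):
    """Drop the n_drop oldest columns of H and append nonnegative
    LS coefficients for the new data columns dA."""
    H_keep = H[:, n_drop:]          # remaining columns of H
    dH = nnls_mu(W, dA)             # solve min ||W dH - dA||
    return np.hstack([H_keep, dH])

rng = np.random.default_rng(3)
W = rng.random((50, 5))
H = rng.random((5, 40))
dA = W @ rng.random((5, 10))        # 10 new columns, consistent with W
H_new = slide_window(W, H, dA, n_drop=10)
rel_err = np.linalg.norm(W @ H_new[:, -10:] - dA) / np.linalg.norm(dA)
```

The resulting H̃ (and W̃ = W) then serve as the warm start for HALS on the new window.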


DynNMF for Dynamic Data (He, Kim, Cichocki, and Park, in preparation)

PET2001 data with 3064 images from a surveillance video. DynNMF is applied to a 110,592 × 400 data matrix at each step, with 100 new columns added and 100 obsolete columns dropped. The residual images track the moving vehicle in the video.


NMF as a Clustering Method (Kuang and Park, in preparation)

Clustering and lower-rank approximation are related. NMF for clustering: document (Xu et al. 03), image (Cai et al. 08), microarray (Kim & Park 07), etc.

Equivalence of objective functions between k-means and NMF: (Ding, et al., 05; Kim & Park, 08)

min Σ_{i=1}^n ||a_i − w_{S_i}||_2² = min ||A − WH||_F²

where S_i = j when the i-th point is assigned to the j-th cluster (j ∈ {1, ..., k}).
- k-means: W holds the k cluster centroids; H ∈ E.
- NMF: W holds basis vectors for the rank-k approximation; H is the representation of A in the W space.
(E: the set of matrices whose columns are columns of an identity matrix.)

NOTE: The equivalence of objective functions holds when H ∈ E and A ≥ 0.
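The objective equivalence can be checked numerically: with W the centroid matrix and H an indicator matrix in E, both sides agree exactly. A minimal sketch, where `kmeans_objective_two_ways` is a hypothetical helper and any labeling with nonempty clusters works:

```python
import numpy as np

def kmeans_objective_two_ways(A, labels, k):
    """Check sum_i ||a_i - w_{S_i}||^2 == ||A - W H||_F^2 when W
    holds the k cluster centroids and H has indicator columns."""
    m, n = A.shape
    W = np.column_stack([A[:, labels == j].mean(axis=1) for j in range(k)])
    H = np.zeros((k, n))
    H[labels, np.arange(n)] = 1.0      # H in E: a single 1 per column
    lhs = sum(np.sum((A[:, i] - W[:, labels[i]]) ** 2) for i in range(n))
    rhs = np.linalg.norm(A - W @ H, "fro") ** 2
    return lhs, rhs

rng = np.random.default_rng(4)
A = rng.random((10, 60))
labels = rng.integers(0, 3, size=60)
lhs, rhs = kmeans_objective_two_ways(A, labels, 3)
```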


NMF and K-means

min ||A − WH||_F² s.t. H ∈ E. Paths to a solution:
- K-means: expectation-maximization.
- NMF: relax the condition on H to H ≥ 0 with orthogonal rows, or H ≥ 0 with sparse columns (soft clustering).

TDT2 text data set (clustering accuracy averaged over 100 runs):

# clusters   2       6       10      14      18
K-means      0.8099  0.7295  0.7015  0.6675  0.6675
NMF/ANLS     0.9990  0.8717  0.7436  0.7021  0.7160

A sparsity constraint improves the clustering result (J. Kim and Park, 08):

min_{W≥0, H≥0} ||A − WH||_F² + η||W||_F² + β Σ_{j=1}^n ||H(:, j)||_1²

Number of times the optimal assignment was achieved (a synthetic data set with a clear cluster structure):

k      3    6    9    12   15
NMF    69   65   74   68   44
SNMF   100  100  100  100  97

NMF and SNMF are much better than k-means in general.
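The penalty β Σ_j ||H(:, j)||₁² in the sparse NMF objective can be folded into an ordinary nonnegative least squares subproblem by stacking a √β row of ones under W and a zero row under A, which is how ANLS-type solvers handle it. A minimal numerical check of that identity; `snmf_objective` is a hypothetical helper, and for H ≥ 0 the column l1 norm is just the column sum:

```python
import numpy as np

def snmf_objective(A, W, H, eta, beta):
    """f = ||A - WH||_F^2 + eta*||W||_F^2 + beta*sum_j ||H(:,j)||_1^2."""
    fit = np.linalg.norm(A - W @ H, "fro") ** 2
    return (fit + eta * np.linalg.norm(W, "fro") ** 2
                + beta * np.sum(np.sum(np.abs(H), axis=0) ** 2))

rng = np.random.default_rng(5)
A, W = rng.random((20, 30)), rng.random((20, 4))
H = rng.random((4, 30))
beta = 0.3
# augmented H-subproblem: sqrt(beta) row of ones under W, zeros under A
W_aug = np.vstack([W, np.sqrt(beta) * np.ones((1, 4))])
A_aug = np.vstack([A, np.zeros((1, 30))])
lhs = np.linalg.norm(A - W @ H, "fro")**2 + beta * np.sum(H.sum(axis=0)**2)
rhs = np.linalg.norm(A_aug - W_aug @ H, "fro")**2
```

Minimizing the augmented plain NNLS problem over H ≥ 0 therefore minimizes the l1-penalized fit term.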


NMF, K-means, and Spectral Clustering (Kuang and Park, in preparation)

Equivalence of objective functions is not enough to explain the clustering capability of NMF: NMF is more closely related to spherical k-means than to k-means, which is consistent with NMF working well in text data clustering.

Symmetric NMF: min_{S≥0} ||A − S Sᵀ||_F, where A ∈ R_+^{n×n} is an affinity matrix.
- Spectral clustering → eigenvectors (Ng et al. 01); A normalized if needed, Laplacian, ...
- Symmetric NMF (Ding et al.) → can handle nonlinear structure, and S ≥ 0 naturally captures a cluster structure in S.
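A symmetric NMF iteration can be sketched with the multiplicative rule S ← S ∘ (1/2 + (1/2)(AS)/(S SᵀS)), in the style of Ding et al.; the toy block affinity matrix here is an assumed example, not data from this talk:

```python
import numpy as np

# affinity matrix with two clear diagonal blocks (assumed toy example)
A = np.kron(np.eye(2), np.ones((5, 5)))

rng = np.random.default_rng(0)
S = rng.random((10, 2)) + 0.1
eps = 1e-12
err0 = np.linalg.norm(A - S @ S.T, "fro")
for _ in range(300):
    # Ding et al.-style multiplicative update for min ||A - S S^T||_F
    S *= 0.5 + 0.5 * (A @ S) / (S @ (S.T @ S) + eps)
err = np.linalg.norm(A - S @ S.T, "fro")
labels = np.argmax(S, axis=1)   # rows of S indicate cluster membership
```

The nonnegativity of S lets the dominant entry in each row be read directly as a cluster assignment, unlike signed eigenvectors in spectral clustering.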



Summary/Discussions

- Overview of NMF with Frobenius norm and algorithms
- Fast algorithms and convergence via the BCD framework
- Adaptive NMF algorithms
- Variations/extensions of NMF: nonnegative PARAFAC and sparse NMF
- NMF for clustering
- Extensive computational comparisons

Future directions:
- NMF for clustering and semi-supervised clustering
- NMF and probability-related methods
- NMF and geometric understanding
- NMF algorithms for large-scale problems; parallel implementation? GPU?
- Fast NMF with other divergences (Bregman and Csiszar divergences)
- NMF for blind source separation? Uniqueness?
- More theoretical study of NMF, especially foundations for computational methods

NMF Matlab codes and papers are available at http://www.cc.gatech.edu/∼hpark and http://www.cc.gatech.edu/∼jingu

Collaborators

Today's talk:
- Jingu Kim, CSE, Georgia Tech
- Yunlong He, Math, Georgia Tech
- Da Kuang, CSE, Georgia Tech

- Krishnakumar Balasubramanian, CSE, Georgia Tech
- Prof. Michael Berry, EECS, Univ. of Tennessee
- Prof. Moody Chu, Math, North Carolina State Univ.
- Dr. Andrzej Cichocki, Brain Science Institute, RIKEN, Japan
- Prof. Chris Ding, CSE, UT Arlington
- Prof. Lars Elden, Math, Linkoping Univ., Sweden
- Dr. Mariya Ishteva, CSE, Georgia Tech
- Dr. Hyunsoo Kim, Wistar Inst.
- Anoop Korattikara, CS, UC Irvine
- Prof. Guy Lebanon, CSE, Georgia Tech
- Liangda Li, CSE, Georgia Tech
- Prof. Tao Li, CS, Florida International Univ.
- Prof. Robert Plemmons, CS, Wake Forest Univ.
- Andrey Puretskiy, EECS, Univ. of Tennessee
- Prof. Max Welling, CS, UC Irvine
- Dr. Stan Young, NISS

Thank you!


Related Papers by H. Park's Group

- H. Kim and H. Park. Sparse Non-negative Matrix Factorizations via Alternating Non-negativity-constrained Least Squares for Microarray Data Analysis. Bioinformatics, 23(12):1495-1502, 2007.
- H. Kim, H. Park, and L. Eldén. Non-negative Tensor Factorization Based on Alternating Large-scale Non-negativity-constrained Least Squares. Proc. of the IEEE 7th International Conference on Bioinformatics and Bioengineering (BIBE), pp. 1147-1151, 2007.
- H. Kim and H. Park. Nonnegative Matrix Factorization Based on Alternating Non-negativity-constrained Least Squares and the Active Set Method. SIAM Journal on Matrix Analysis and Applications, 30(2):713-730, 2008.
- J. Kim and H. Park. Sparse Nonnegative Matrix Factorization for Clustering. Georgia Tech Technical Report GT-CSE-08-01, 2008.
- J. Kim and H. Park. Toward Faster Nonnegative Matrix Factorization: A New Algorithm and Comparisons. Proc. of the 8th IEEE International Conference on Data Mining (ICDM), pp. 353-362, 2008.
- B. Drake, J. Kim, M. Mallick, and H. Park. Supervised Raman Spectra Estimation Based on Nonnegative Rank Deficient Least Squares. Proc. of the 13th International Conference on Information Fusion, Edinburgh, UK, 2010.


Related Papers by H. Park's Group

- A. Korattikara, L. Boyles, M. Welling, J. Kim, and H. Park. Statistical Optimization of Non-Negative Matrix Factorization. Proc. of the Fourteenth International Conference on Artificial Intelligence and Statistics, JMLR: W&CP 15, 2011.
- J. Kim and H. Park. Fast Nonnegative Matrix Factorization: An Active-set-like Method and Comparisons. Submitted for review, 2011.
- J. Kim and H. Park. Fast Nonnegative Tensor Factorization with an Active-set-like Method. In High-Performance Scientific Computing: Algorithms and Applications, Springer, in preparation.
- Y. He, J. Kim, A. Cichocki, and H. Park. Fast Adaptive NMF Algorithms for Varying Reduced Rank and Dynamic Data. In preparation.
- L. Li, G. Lebanon, and H. Park. Fast Algorithm for Non-Negative Matrix Factorization with Bregman and Csiszar Divergences. In preparation.
- D. Kuang and H. Park. Nonnegative Matrix Factorization for Spherical and Spectral Clustering. In preparation.
