Nonnegative Matrix Factorization: Algorithms and Applications Haesun Park
[email protected] School of Computational Science and Engineering Georgia Institute of Technology Atlanta, GA, USA SIAM International Conference on Data Mining, April, 2011
This work was supported in part by the National Science Foundation.
Co-authors
Jingu Kim
CSE, Georgia Tech
Yunlong He
Math, Georgia Tech
Da Kuang
CSE, Georgia Tech
Outline
Overview of NMF
Fast algorithms with Frobenius norm
Theoretical results on convergence
  Multiplicative updating
  Alternating nonnegativity-constrained least squares: active-set type methods, ...
  Hierarchical alternating least squares (HALS)
Variations/extensions of NMF: sparse NMF, regularized NMF, nonnegative PARAFAC
Efficient adaptive NMF algorithms
Applications of NMF; NMF for clustering
Extensive computational results
Discussions
Nonnegative Matrix Factorization (NMF) (Lee & Seung 99, Paatero & Tapper 94)
Given A ∈ R_+^{m×n} and a desired rank k, find W ∈ R_+^{m×k} and H ∈ R_+^{k×n} such that A ≈ WH.

Adapting an existing NMF to a new desired rank k̃:
If k̃ > k, compute an NMF of the residual, A − WH ≈ ΔW ΔH, and set W̃ = [W ΔW] and H̃ = [H; ΔH].
If k̃ < k, initialize W̃ and H̃ with the k̃ pairs (w_i, h_i) having largest ‖w_i h_i^T‖_F = ‖w_i‖_2 ‖h_i‖_2.
Update W̃ and H̃ using the HALS algorithm.
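A minimal NumPy sketch of this rank-adjustment scheme. Here hals is a bare-bones HALS refinement and the random initialization of the new factors is illustrative; the function names are not from the original slides:

```python
import numpy as np

def hals(A, W, H, iters=100, eps=1e-10):
    """Bare-bones HALS: cyclically update each column of W and row of H."""
    for _ in range(iters):
        AH, HH = A @ H.T, H @ H.T
        for i in range(W.shape[1]):
            W[:, i] = np.maximum(
                W[:, i] + (AH[:, i] - W @ HH[:, i]) / max(HH[i, i], eps), 0)
        WA, WW = W.T @ A, W.T @ W
        for i in range(H.shape[0]):
            H[i, :] = np.maximum(
                H[i, :] + (WA[i, :] - WW[i, :] @ H) / max(WW[i, i], eps), 0)
    return W, H

def adjust_rank(A, W, H, k_new, rng=np.random.default_rng(0)):
    """Adapt an existing NMF A ~ W H of rank k to a new rank k_new."""
    m, n, k = A.shape[0], A.shape[1], W.shape[1]
    if k_new > k:
        # Grow: fit the extra terms to the nonnegative residual, A - W H ~ dW dH.
        R = np.maximum(A - W @ H, 0)
        dW, dH = hals(R, rng.random((m, k_new - k)),
                      rng.random((k_new - k, n)), iters=20)
        W, H = np.hstack([W, dW]), np.vstack([H, dH])
    elif k_new < k:
        # Shrink: keep the k_new pairs with largest ||w_i||_2 * ||h_i||_2.
        norms = np.linalg.norm(W, axis=0) * np.linalg.norm(H, axis=1)
        keep = np.argsort(norms)[-k_new:]
        W, H = W[:, keep], H[keep, :]
    return hals(A, W.copy(), H.copy())
```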
[email protected]
Nonnegative Matrix Factorization: Algorithms and Applications
Model Selection in NMF Clustering (He, Kim, Cichocki, and Park, in preparation)
Consensus matrix based on A ≈ WH, over l runs:
C^t_{ij} = 1 if H(:, i) and H(:, j) attain their largest entries in the same row (points i and j are assigned to the same cluster), and C^t_{ij} = 0 otherwise, for t = 1, ..., l.
Averaged consensus matrix: C̄ = (1/l) Σ_{t=1}^{l} C^t.
Dispersion coefficient: ρ(k) = (1/n²) Σ_{i=1}^{n} Σ_{j=1}^{n} 4(C̄_{ij} − 1/2)².
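A short NumPy sketch of these model-selection quantities, assuming each point's cluster is read off as the row index of the largest entry in its column of H:

```python
import numpy as np

def dispersion(H_list):
    """Averaged consensus matrix and dispersion coefficient over l NMF runs.

    H_list holds the k x n coefficient matrices H of l runs; point i is
    assigned to the cluster given by the largest entry of H(:, i).
    """
    n = H_list[0].shape[1]
    C_bar = np.zeros((n, n))
    for H in H_list:
        labels = np.argmax(H, axis=0)                  # cluster of each point
        C_bar += (labels[:, None] == labels[None, :])  # consensus matrix C^t
    C_bar /= len(H_list)                               # C-bar = (1/l) sum_t C^t
    rho = 4.0 * np.mean((C_bar - 0.5) ** 2)            # dispersion rho(k)
    return C_bar, rho
```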
[Figures: reordered consensus matrices for k = 3, 4, 5, 6 (2000 points, color scale 0 to 1); dispersion coefficient vs. approximation rank k; execution time (seconds) for AdaNMF, Recompute, and Warm-restart.]
Clustering results on MNIST digit images (784 × 2000) by AdaNMF with k = 3, 4, 5, and 6: averaged consensus matrices, dispersion coefficients, and execution times.
Adaptive NMF for Varying Reduced Rank (He, Kim, Cichocki, and Park, in preparation)
[Figure: relative error vs. execution time for Recompute, Warm-restart, and AdaNMF (two panels).]
Relative error vs. execution time of AdaNMF, warm-restart, and recompute: given an NMF of a 600 × 600 synthetic matrix with k = 60, compute an NMF with k̃ = 50 and k̃ = 80.
Adaptive NMF for Varying Reduced Rank (He, Kim, Cichocki, and Park, in preparation)
Theorem: For A ∈ R_+^{m×n}, if rank_+(A) > k, then
min ‖A − W^{(k+1)} H^{(k+1)}‖_F < min ‖A − W^{(k)} H^{(k)}‖_F,
where W^{(i)} ∈ R_+^{m×i} and H^{(i)} ∈ R_+^{i×n}.

[Figures: relative objective function value of NMF vs. approximation rank k (rank = 20, 40, 60, 80); classification error (training and testing) vs. reduced rank k of NMF.]
Rank path on a synthetic data set: relative residual vs. k. ORL face image (10304 × 400) classification errors (by LMNN) on training and testing sets vs. k. The k-dimensional representation H_T of the training data T is computed by BPP: min_{H_T ≥ 0} ‖W H_T − T‖_F.
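A quick numerical illustration of the theorem, using scikit-learn's NMF as a generic solver (these are approximate minima, so this is a sanity check rather than a proof):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
A = rng.random((200, 30)) @ rng.random((30, 150))  # nonnegative rank <= 30

# While k < rank+(A), the best achievable residual strictly decreases in k.
for k in (5, 10, 15, 20, 25):
    model = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=0)
    W = model.fit_transform(A)
    err = np.linalg.norm(A - W @ model.components_, "fro") / np.linalg.norm(A, "fro")
    print(f"k = {k:2d}  relative residual = {err:.4f}")
```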
[email protected]
Nonnegative Matrix Factorization: Algorithms and Applications
NMF for Dynamic Data (DynNMF) (He, Kim, Cichocki, and Park, in preparation)
Given an NMF (W, H) for A = [δA Â], how can we compute an NMF (W̃, H̃) for Ã = [Â ΔA] fast? (Updating and downdating: δA holds the obsolete columns, ΔA the new ones.)

DynNMF (sliding-window NMF):
Initialize H̃ as follows: let Ĥ be the columns of H corresponding to the remaining columns Â; solve min_{ΔH ≥ 0} ‖W ΔH − ΔA‖_F² using block principal pivoting; set H̃ = [Ĥ ΔH].
Run HALS on Ã with initial factors W̃ = W and H̃.
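A sketch of one DynNMF step in NumPy/SciPy. scipy.optimize.nnls stands in for the block principal pivoting solver, and hals is the refinement routine sketched earlier; names are illustrative:

```python
import numpy as np
from scipy.optimize import nnls

def dynnmf_step(A_hat, dA, W, H_hat):
    """Slide the window: A_hat = retained old columns, dA = new columns,
    H_hat = the columns of H corresponding to A_hat."""
    # Initialize the new coefficients, min_{dH >= 0} ||W dH - dA||_F^2,
    # solved column by column.
    dH = np.column_stack([nnls(W, dA[:, j])[0] for j in range(dA.shape[1])])
    A_tilde = np.hstack([A_hat, dA])
    H_tilde = np.hstack([H_hat, dH])
    # Refine with HALS, warm-started from W and the assembled H_tilde.
    return hals(A_tilde, W.copy(), H_tilde)
```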
[email protected]
Nonnegative Matrix Factorization: Algorithms and Applications
DynNMF for Dynamic Data (He, Kim, Cichocki, and Park, in preparation)
PET2001 data with 3064 images from a surveillance video. DynNMF is run on a 110,592 × 400 data matrix at each step, with 100 new columns entering and 100 obsolete columns leaving. The residual images track the moving vehicle in the video.
[email protected]
Nonnegative Matrix Factorization: Algorithms and Applications
NMF as a Clustering Method (Kuang and Park, in preparation)
Clustering and low-rank approximation are related. NMF for clustering: documents (Xu et al. 03), images (Cai et al. 08), microarrays (Kim & Park 07), etc.
Equivalence of objective functions between k-means and NMF (Ding et al. 05; Kim & Park 08):
min Σ_{i=1}^{n} ‖a_i − w_{S_i}‖_2² = min_{H ∈ E} ‖A − WH‖_F²,
where S_i = j when the i-th point is assigned to the j-th cluster (j ∈ {1, ..., k}).
k-means: W holds the k cluster centroids, H ∈ E.
NMF: W holds basis vectors for a rank-k approximation, H is the representation of A in the W space.
(E: the set of matrices whose columns are columns of an identity matrix.)
NOTE: the equivalence of objective functions holds when H ∈ E and A ≥ 0.
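A quick numerical check of this identity (any nonnegative A, centroid matrix W, and labeling will do; the function name is illustrative):

```python
import numpy as np

def objectives_match(A, W, labels):
    """Verify sum_i ||a_i - w_{S_i}||_2^2 == ||A - W H||_F^2 when column i
    of H is the indicator vector e_{S_i} (i.e., H is in E)."""
    k, n = W.shape[1], A.shape[1]
    H = np.zeros((k, n))
    H[labels, np.arange(n)] = 1.0                     # H(:, i) = e_{S_i}
    lhs = sum(np.linalg.norm(A[:, i] - W[:, labels[i]]) ** 2 for i in range(n))
    rhs = np.linalg.norm(A - W @ H, "fro") ** 2
    return np.isclose(lhs, rhs)

rng = np.random.default_rng(0)
print(objectives_match(rng.random((5, 12)), rng.random((5, 3)),
                       rng.integers(0, 3, size=12)))   # True
```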
NMF and K-means
min ‖A − WH‖_F² s.t. H ∈ E. Paths to a solution:
K-means: expectation-maximization.
NMF: relax the condition on H to H ≥ 0 with orthogonal rows, or H ≥ 0 with sparse columns (soft clustering).
TDT2 text data set (clustering accuracy averaged over 100 runs):

# clusters   2       6       10      14      18
K-means      0.8099  0.7295  0.7015  0.6675  0.6675
NMF/ANLS     0.9990  0.8717  0.7436  0.7021  0.7160

A sparsity constraint improves the clustering result (J. Kim and Park, 08), as sketched after the tables:
min_{W≥0, H≥0} ‖A − WH‖_F² + η‖W‖_F² + β Σ_{j=1}^{n} ‖H(:, j)‖_1²

Number of times the optimal assignment was achieved (a synthetic data set with a clear cluster structure):

k      3     6     9     12    15
NMF    69    65    74    68    44
SNMF   100   100   100   100   97

NMF and SNMF are much better than k-means in general.
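A sketch of this sparse NMF objective solved by ANLS, folding the penalties into augmented nonnegative least squares problems; scipy's nnls is a simple stand-in for the active-set/block-pivoting solvers used in the paper, and the function name is illustrative:

```python
import numpy as np
from scipy.optimize import nnls

def snmf(A, k, eta=0.1, beta=0.1, iters=30, rng=np.random.default_rng(0)):
    """Sparse NMF sketch: the l1 column penalty on H becomes an extra
    all-ones row appended to W; the Frobenius penalty on W becomes an
    appended sqrt(eta)*I block."""
    m, n = A.shape
    W = rng.random((m, k))
    for _ in range(iters):
        # H-step: min_{H>=0} ||W H - A||_F^2 + beta * sum_j ||H(:,j)||_1^2
        Wa = np.vstack([W, np.sqrt(beta) * np.ones((1, k))])
        Aa = np.vstack([A, np.zeros((1, n))])
        H = np.column_stack([nnls(Wa, Aa[:, j])[0] for j in range(n)])
        # W-step: min_{W>=0} ||H^T W^T - A^T||_F^2 + eta * ||W||_F^2
        Ha = np.vstack([H.T, np.sqrt(eta) * np.eye(k)])
        At = np.vstack([A.T, np.zeros((k, m))])
        W = np.column_stack([nnls(Ha, At[:, i])[0] for i in range(m)]).T
    return W, H
```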
[email protected]
Nonnegative Matrix Factorization: Algorithms and Applications
NMF, K-means, and Spectral Clustering (Kuang and Park, in preparation)
Equivalence of objective functions is not enough to explain the clustering capability of NMF: NMF is more closely related to spherical k-means than to k-means → this is consistent with NMF working well in text data clustering.
Symmetric NMF: min_{S≥0} ‖A − SS^T‖_F, where A ∈ R_+^{n×n} is an affinity matrix.
Spectral clustering → eigenvectors (Ng et al. 01); A normalized if needed, Laplacian, ...
Symmetric NMF (Ding et al.) → can handle nonlinear structure, and S ≥ 0 naturally captures a cluster structure in S.
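A sketch of symmetric NMF via the multiplicative update of Ding et al. (with step parameter 1/2); this is an illustrative implementation, not the exact code behind the results above:

```python
import numpy as np

def symnmf(A, k, iters=200, eps=1e-12, rng=np.random.default_rng(0)):
    """Symmetric NMF, min_{S>=0} ||A - S S^T||_F, for an n x n nonnegative
    affinity matrix A; row i of S scores the cluster memberships of point i."""
    n = A.shape[0]
    S = rng.random((n, k))
    for _ in range(iters):
        AS = A @ S
        SSS = S @ (S.T @ S)
        S *= 0.5 + 0.5 * AS / np.maximum(SSS, eps)   # Ding et al. update
    return S

# Hard cluster assignment: the largest entry in each row of S.
# labels = np.argmax(symnmf(A, k), axis=1)
```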
Summary/Discussions
Overview of NMF with Frobenius norm and algorithms
Fast algorithms and convergence via the BCD framework
Adaptive NMF algorithms
Variations/extensions of NMF: nonnegative PARAFAC and sparse NMF
NMF for clustering
Extensive computational comparisons
NMF for clustering and semi-supervised clustering
NMF and probability-related methods
NMF and geometric understanding
NMF algorithms for large-scale problems; parallel implementation? GPU?
Fast NMF with other divergences (Bregman and Csiszár divergences)
NMF for blind source separation? Uniqueness?
More theoretical study of NMF, especially foundations for computational methods
NMF Matlab codes and papers available at http://www.cc.gatech.edu/∼hpark and http://www.cc.gatech.edu/∼jingu
[email protected]
Nonnegative Matrix Factorization: Algorithms and Applications
Collaborators
Today's talk: Jingu Kim (CSE, Georgia Tech), Yunlong He (Math, Georgia Tech), Da Kuang (CSE, Georgia Tech)

Krishnakumar Balasubramanian (CSE, Georgia Tech)
Prof. Michael Berry (EECS, Univ. of Tennessee)
Prof. Moody Chu (Math, North Carolina State Univ.)
Dr. Andrzej Cichocki (Brain Science Institute, RIKEN, Japan)
Prof. Chris Ding (CSE, UT Arlington)
Prof. Lars Eldén (Math, Linköping Univ., Sweden)
Dr. Mariya Ishteva (CSE, Georgia Tech)
Dr. Hyunsoo Kim (Wistar Inst.)
Anoop Korattikara (CS, UC Irvine)
Prof. Guy Lebanon (CSE, Georgia Tech)
Liangda Li (CSE, Georgia Tech)
Prof. Tao Li (CS, Florida International Univ.)
Prof. Robert Plemmons (CS, Wake Forest Univ.)
Andrey Puretskiy (EECS, Univ. of Tennessee)
Prof. Max Welling (CS, UC Irvine)
Dr. Stan Young (NISS)

Thank you!
Related Papers by H. Park's Group
H. Kim and H. Park, Sparse Non-negative Matrix Factorizations via Alternating Non-negativity-constrained Least Squares for Microarray Data Analysis. Bioinformatics, 23(12):1495-1502, 2007.
H. Kim, H. Park, and L. Eldén, Non-negative Tensor Factorization Based on Alternating Large-scale Non-negativity-constrained Least Squares. Proc. of the IEEE 7th International Conference on Bioinformatics and Bioengineering (BIBE), pp. 1147-1151, 2007.
H. Kim and H. Park, Nonnegative Matrix Factorization Based on Alternating Non-negativity-constrained Least Squares and the Active Set Method. SIAM Journal on Matrix Analysis and Applications, 30(2):713-730, 2008.
J. Kim and H. Park, Sparse Nonnegative Matrix Factorization for Clustering. Georgia Tech Technical Report GT-CSE-08-01, 2008.
J. Kim and H. Park, Toward Faster Nonnegative Matrix Factorization: A New Algorithm and Comparisons. Proc. of the 8th IEEE International Conference on Data Mining (ICDM), pp. 353-362, 2008.
B. Drake, J. Kim, M. Mallick, and H. Park, Supervised Raman Spectra Estimation Based on Nonnegative Rank Deficient Least Squares. Proc. of the 13th International Conference on Information Fusion, Edinburgh, UK, 2010.
[email protected]
Nonnegative Matrix Factorization: Algorithms and Applications
Related Papers by H. Park's Group
A. Korattikara, L. Boyles, M. Welling, J. Kim, and H. Park, Statistical Optimization of Non-Negative Matrix Factorization. Proc. of the Fourteenth International Conference on Artificial Intelligence and Statistics, JMLR: W&CP 15, 2011.
J. Kim and H. Park, Fast Nonnegative Matrix Factorization: An Active-set-like Method and Comparisons. Submitted for review, 2011.
J. Kim and H. Park, Fast Nonnegative Tensor Factorization with an Active-set-like Method. In High-Performance Scientific Computing: Algorithms and Applications, Springer, in preparation.
Y. He, J. Kim, A. Cichocki, and H. Park, Fast Adaptive NMF Algorithms for Varying Reduced Rank and Dynamic Data, in preparation.
L. Li, G. Lebanon, and H. Park, Fast Algorithm for Non-Negative Matrix Factorization with Bregman and Csiszár Divergences, in preparation.
D. Kuang and H. Park, Nonnegative Matrix Factorization for Spherical and Spectral Clustering, in preparation.