Non-negative Matrix Factorization, Sparse Coding and Dictionary Learning Techniques on fMRI Images

Imran Ahmad Qureshi a, Syed Zubair b, Khurram Khurshid a
a Department of Electrical Engineering, Institute of Space Technology, Islamabad, Pakistan
b Department of Electronics Engineering, International Islamic University, Islamabad, Pakistan

Abstract—Matrix factorization techniques have proved successful for source separation (SS) of different types of data. Recent developments in matrix factorization have led to sparse representation of signals using learned dictionaries. In this research we apply Alternating Least Squares Non-negative Matrix Factorization (ALSNMF) and a Dictionary Learning (DL) technique for sparse representation to simulated and real Functional Magnetic Resonance Images (fMRI) to extract the corresponding sources and time courses. We analyze these techniques while varying the rank values (for ALSNMF) and dictionary sizes (for DL), and draw conclusions about the quality and efficiency of fMRI source separation with respect to these parameters. ALSNMF is the better of the two methods in terms of fast convergence and performance at high rank values, and can extract the best time courses and sources simultaneously. The K-SVD algorithm performed well particularly for real fMRI datasets; however, for small dictionary sizes, sources are extracted well while time-course extraction degrades, and vice versa for large dictionary sizes.

Index Terms—Blind Source Separation (BSS), Sparse Coding, Dictionary Learning, Non-negative Matrix Factorization (NMF).

1. INTRODUCTION

Magnetic Resonance Imaging (MRI) is a medical test used to obtain detailed pictures of organs, soft tissues, bones and internal body structures. Detailed MRI images provide information that allows physicians to evaluate various parts of the body and determine the presence of disease. Functional Magnetic Resonance Imaging (fMRI) is a modified form of MRI used to explore brain responses to human physical activities. fMRI comprises activated and non-activated regions, which are usually measured by Blood Oxygen Level Dependent (BOLD) [1] signals. BOLD signals correspond to tasks performed by the patient. When tasks are assigned to patients, their neurons are activated. The increase in neural activity corresponds to an increase in oxygen consumption, which increases blood flow, raising blood oxygenation and reducing deoxygenated hemoglobin. This change in blood oxygen concentration appears as activated regions of the brain, which are captured by the fMRI scanner. Each point in the multi-image data is called a voxel, which is generally a 3D point. Voxels in activated regions have a different oxygen level than those in non-activated regions, and thus act as spatial maps of brain activity. Voxels captured at different time instances

make up the fMRI time series. These time series are a mixture of activity, non-activity, noise and structural information of brain tissues. One component of fMRI shows the spatial activation of voxels, while the other shows the values of a particular voxel over time. We want to separate the voxel maps from the time courses to better understand fMRI data. This can be done by applying Blind Source Separation (BSS) methods to fMRI data. BSS is a technique used to extract sources from mixed data, and many BSS techniques have been applied to fMRI data. Different techniques impose different requirements on the data: Non-negative Matrix Factorization (NMF), for example, deals with non-negative datasets [1]. By removing the non-negativity limitation of NMF, an algorithm known as Matrix Factorization (MF) has been established, which narrows the search space compared to NMF and converges quickly [1]. It has been shown that the MF algorithm performs well in terms of correlation and convergence time for simulated and actual fMRI data, owing to its whitening process and weight update rules. The NMF algorithm faces a permutation problem due to the dependence of time courses on sources; hence MF has been further modified to avoid the permutation problem [2], and results showed that it extracted better sources. MF can be applied to many matrix factorization problems [3], while Principal Component Analysis (PCA) deals with datasets that are uncorrelated in both the temporal and spatial domains and thus needs low-dimensional data [2]. Independent Component Analysis (ICA) has been applied effectively to the analysis of fMRI data [4]. However, when different ICA methods were applied to fMRI to extract sources and time courses, the applicable conditions proved limited, especially as the number of sources increases. ICA captures characteristics of brain function that are not completely predictable [5]. Results have shown that higher-order ICA methods are consistent for fMRI data [6]: Infomax, FastICA, and Joint Approximate Diagonalization of Eigenmatrices (JADE) provided better results, while Eigenvalue Decomposition (EVD), another second-order method, does not perform well enough for fMRI data. Compressed dictionary learning (CDL) has been applied to real fMRI data to detect activations. A double-sparsity model was applied to solve the inverse problem induced by the general linear model, where sparsity was imposed both on the learnt dictionary and on the sparse representation of the BOLD signal. Compressed sensing measurements were used for DL instead of the entire BOLD signal, reducing the data volume to be processed. Results showed that CDL could successfully detect the activated voxels with fewer data samples [7].

DL takes considerable computational time, especially for large-scale data. The rank-1 DL (D-r1DL) model has been applied to fMRI big-data analysis [8]. The model estimates one rank-1 basis vector, with a sparsity constraint on its loading coefficient, from the input data through ALS updates at each learning step. Results showed that D-r1DL is efficient and scalable for big-data analysis of fMRI. The above contributions are novel DL methods for fMRI analysis; however, they do not show the strengths and weaknesses of K-SVD DL. In this research, we have applied ALSNMF and DL methods considering their effectiveness for fMRI unmixing. We have varied the rank values (for ALSNMF) and dictionary sizes (for DL) to compare the extraction of sources and time courses from fMRI data. The ALSNMF method shows varying SS quality: as we increase the size of the factorized matrices in ALSNMF, the quality of the extracted sources and time courses increases. However, using K-SVD DL, a reduced-rank factorized matrix gives better SS than the ALSNMF method, particularly for real fMRI data.

A. Organization of the Paper

This paper is organized as follows. After this introductory section, Section 2 describes the methods applied for source separation. Section 3 describes the experiments performed and the datasets used. Section 4 discusses the results, and the paper is concluded in Section 5.

2. MATRIX FACTORIZATION METHODS

To extract the sources and time courses of fMRI data, we apply one NMF method and one DL technique: a) Alternating Least Squares NMF (ALSNMF); b) K-SVD for sparse dictionary learning (DL).

2.1 NMF Theory. NMF is a matrix factorization method which decomposes a non-negative input matrix (Y) into two non-negative matrices: a time course matrix (W) and a source matrix (H). Like other constrained methods, NMF applies non-negativity constraints on the decomposed matrices. With NMF, we can also obtain matrices of reduced rank compared to the input data matrix, which simplifies the analysis of the decomposed matrices, particularly in the case of fMRI. For an input fMRI image Y ∈ R^{m×n}, NMF decomposes it as

Y = WH    (1)

where W ∈ R^{m×r} and H ∈ R^{r×n}.

2.1.1 Alternating Least Squares NMF (ALSNMF). ALSNMF decomposes the input data matrix Y into two matrices by minimizing the Euclidean distance between Y and WH. W and H are initialized with non-negative random values and are computed iteratively using least-squares solutions in alternating minimization steps, where one matrix is fixed while the other is updated [9]. The procedure stops when it converges to a local minimum of the objective function:

min_{W,H} f(W, H) = (1/2) ||Y − WH||_F^2   s.t.   w_ij ≥ 0, h_ij ≥ 0, ∀ ij    (2)

with the alternating updates, where [·]_+ projects negative entries to zero,

W = [Y H^T (H H^T)^{-1}]_+ ,   H = [(W^T W)^{-1} W^T Y]_+    (3)
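To make the update rules concrete, the following is a minimal NumPy sketch of the ALS iterations in Eqs. (2) and (3). It is an illustrative implementation, not the exact code used in our experiments; the fixed iteration count, random seed, and the small floor that keeps the Gram matrices invertible are our own choices.

```python
import numpy as np

def als_nmf(Y, r, n_iter=200, eps=1e-9):
    """ALS-NMF sketch per Eqs. (2)-(3): alternately solve each
    least-squares subproblem, then project onto the non-negative
    orthant ([.]_+ zeroes out negative entries)."""
    m, n = Y.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        # W = [Y H^T (H H^T)^{-1}]_+ : solve (H H^T) X = H Y^T, take X^T.
        # Clipping at eps rather than 0 keeps the Gram matrices invertible.
        W = np.maximum(np.linalg.solve(H @ H.T, H @ Y.T).T, eps)
        # H = [(W^T W)^{-1} W^T Y]_+
        H = np.maximum(np.linalg.solve(W.T @ W, W.T @ Y), eps)
    return W, H
```

For an fMRI mixture Y of size 100x3600 and rank r = 8, als_nmf(Y, 8) would return a 100x8 time-course matrix and an 8x3600 source matrix.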

2.2 Dictionary Learning (DL) for Sparse Representation. DL for sparse representation is a recent development in the area of matrix factorization, with constraints placed on the decomposed matrices. It is particularly applied to underdetermined systems, where sparsity constraints on one of the decomposed matrices yield a unique solution. For an input data matrix Y ∈ R^{m×n}, the objective function for DL is

min_{W,H} ||Y − WH||_F^2   s.t.   ||h||_0 ≤ K    (4)

where ||h||_0 denotes the l0 pseudo-norm of the vector h (its number of non-zero entries) and K is the maximum number of atoms used to represent each signal. W ∈ R^{m×p} is the dictionary matrix representing sources and H ∈ R^{p×n} is the sparse coefficient matrix representing time courses. DL works as a two-step process: in the first step, the dictionary W is learned while the coefficient matrix H is fixed; in the second step, H is learned using sparse coding algorithms while the dictionary W is fixed. This alternating minimization is repeated iteratively until a stopping criterion is reached. We have used K-SVD [10], which has shown improved performance over conventional methods in many fields [11].

2.2.1 K-SVD Algorithm. The K-SVD algorithm solves (4) iteratively in two stages: a sparse coding stage and a dictionary update stage. In the sparse coding stage, we find the coefficient matrix H using any pursuit method, keeping each coefficient vector to at most T0 non-zero elements. Then, each dictionary element is updated sequentially together with its corresponding coefficient row vector in H, using the signals that it represents. Some methods freeze H while finding a better W [12]; here the columns of W are updated sequentially and the relevant coefficients are updated as well:

min_{W,H} ||Y − WH||_F^2   s.t.   ∀i, ||h_i||_0 ≤ T0    (5)

||Y − Σ_{j=1}^{p} w_j h^j||_F^2 = ||(Y − Σ_{j≠k} w_j h^j) − w_k h^k||_F^2 = ||E_k − w_k h^k||_F^2    (6)

where h^j denotes the j-th row of H.

To update the k-th column vector w_k and its corresponding row vector h^k in the coefficient matrix H, we take the singular value decomposition (SVD) of the error matrix E_k:

E_k = U Δ V^T    (7)

The dictionary vector w_k is updated with the first column of U, and the sparse coefficient vector h^k is updated by multiplying the first diagonal value of Δ with the first row of V^T. However, to maintain the sparse structure of the coefficient vector, the error matrix E_k is calculated only for those input vectors which correspond to the support of h^k. The dictionary contains data (atoms) in its columns, while the sparse coefficient matrix contains data in its rows. We have learnt both over- and under-complete dictionaries to analyze their effect on performance. Sparse coefficients are calculated using Orthogonal Matching Pursuit (OMP) [11].
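The two stages can be sketched as follows, using scikit-learn's OMP solver for the sparse coding stage. This is a condensed single-iteration sketch of the procedure in Eqs. (5)-(7), assuming unit-norm dictionary atoms; a complete implementation would loop this until a stopping criterion is met and would handle unused or duplicate atoms.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def ksvd_iteration(Y, W, T0):
    """One K-SVD pass: OMP sparse coding, then sequential rank-1
    atom updates via SVD of the restricted error matrix (Eq. 7)."""
    # Sparse coding stage: each column of H gets at most T0 non-zeros.
    H = orthogonal_mp(W, Y, n_nonzero_coefs=T0)
    # Dictionary update stage.
    for k in range(W.shape[1]):
        support = np.nonzero(H[k, :])[0]        # signals using atom k
        if support.size == 0:
            continue                            # atom unused this round
        # Error matrix without atom k, restricted to its support (Eq. 6).
        E_k = (Y[:, support] - W @ H[:, support]
               + np.outer(W[:, k], H[k, support]))
        U, s, Vt = np.linalg.svd(E_k, full_matrices=False)
        W[:, k] = U[:, 0]                       # first left singular vector
        H[k, support] = s[0] * Vt[0, :]         # scaled first right vector
    return W, H
```

Restricting E_k to the support of h^k is what preserves sparsity: only the coefficients that were already non-zero are revised by the SVD step.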

3. EXPERIMENTS

We have applied ALSNMF and DL to fMRI images to extract the corresponding time courses and sources.

A. Simulated fMRI Datasets

The set of simulated sources consists of five highly super-Gaussian sources, one Gaussian source and two sub-Gaussian sources; the time courses represent sources that are task related (H1), transiently task related (H2, H6) and artifact related (H3, H4, H5, H7 and H8). The image data are reshaped into vectors by concatenating the columns of each image matrix to form the fMRI mixture. The source matrix and time course matrix are multiplied to obtain a mixture that simulates 100 scans of a single slice of fMRI data. The W matrix consists of 8 column vectors, each with 100 rows; each column vector represents a time course, so the dimension of W is 100x8, as shown in Fig. 1. The H matrix consists of 8 rows, each with 3600 columns; each row represents a source, so the dimension of matrix

Fig. 1. Simulated Sources and Time Courses

Fig. 2. Simulated fMRI Images

H is 8x3600. The dimension of each source image is 60x60. All images are converted into single vectors of size 3600, so the product of the 100x8 and 8x3600 matrices makes an input matrix Y of size 100x3600; some images of Y are shown in Fig. 2. We have used multiple rank values/numbers of atoms for the ALSNMF/DL algorithms respectively: 8, 80, 200 and 400. If we learn W and H during the decomposition of Y using a rank value/number of atoms of 400, then the 400 columns of W represent time courses while the 400 rows of H contain sources; the dimension of the dictionary W is 100x400, while the dimension of the sparse coefficient matrix H is 400x3600. From these multiple columns and rows we have selected the 8 best columns and rows of the factor matrices W and H respectively for each algorithm, since there are 8 simulated sources and time courses to compare against. The same procedure has been followed for the remaining rank values/numbers of atoms of 8, 80 and 200.

B. Real fMRI Datasets

The data are taken from the MAPAWAMO project ("fMRI scanning monkeys"), which was sponsored by the EU and is available on the web at http://cogsys.imm.dtu.dk/toolbox/ica/. Static and active visual stimulations were shown to monkeys for data collection, with equal periods of rest between the visual stimulations. The data consist of two 3D slices, each having 80 images of dimension 29x33; we have taken only one slice. After converting each image into vector form, the dimension of the data matrix Y is 80x957; some images of Y are shown in Fig. 3. The factor matrix W consists of 8 column vectors, each with 80 rows; each column vector represents a time course, so the dimension of W is 80x8, as shown in Fig. 4. The H matrix consists of 8 rows, each with 957 columns; each row represents a source of dimension 29x33, so the dimension of H is 8x957. We have used the same rank values/numbers of atoms for the ALSNMF/DL methods to extract the real sources and time courses as for the simulated ones. If

Fig. 3. Real fMRI Images

Fig. 4. Real Sources and Time Courses

we learn W and H during the decomposition of Y using a rank value/number of atoms of 400, the dimension of W would be 80x400 while the dimension of H would be 400x957. We have selected the 8 best columns and rows of the extracted W and H respectively for each algorithm. The same procedure has been followed for the remaining rank values/numbers of atoms.
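As a sketch of the overall experimental pipeline (it reuses the als_nmf sketch from Section 2; the random matrices below merely stand in for the actual simulated sources and time courses, which are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
W_true = rng.random((100, 8))     # stand-in: 8 time courses over 100 scans
H_true = rng.random((8, 3600))    # stand-in: 8 sources, each a 60x60 image
Y = W_true @ H_true               # 100x3600 simulated fMRI mixture

for r in (8, 80, 200, 400):       # rank values / dictionary sizes
    W_est, H_est = als_nmf(Y, r)  # ALSNMF; K-SVD is run analogously
    # From the r extracted components, keep the 8 that best match the
    # ground truth (selected by correlation; see the sketch in Section 4).
```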

4. RESULTS AND DISCUSSION

We have applied two methods, ALSNMF and DL, to simulated and real fMRI data to extract the corresponding time courses and sources, using rank values/dictionary sizes of 8, 80, 200 and 400 for ALSNMF/DL respectively. The algorithms behave differently with regard to source separation of fMRI data. For the ALSNMF algorithm, a small factor matrix W does not extract time courses and sources as well as the DL method, as shown in Fig. 5 and Table I.

Fig. 5. Simulated Sources and Time Courses extracted through ALSNMF with rank value of 8

Fig. 6. Simulated Sources and Time Courses extracted through ALSNMF with rank value of 400

As we increase the size of the W matrix in the ALSNMF method, the extraction of time courses and sources improves, as depicted in Fig. 6 and Table II. The results show that ALSNMF can extract the best time courses and sources among all methods. The DL method, however, has different extraction quality for sources and time courses at small and large dictionary sizes. For a small dictionary, good-quality sources are extracted while time courses are not extracted well, as shown in Fig. 7. Increasing the dictionary size gives better extraction of time courses while source quality is highly compromised, as can be seen in Fig. 8. Similar trends can be observed for the real fMRI datasets, as shown in Figs. 9 to 12 and Table II. In DL, source images are better for small dictionary sizes, and as the dictionary size increases, source quality decreases while time-course quality improves. This can be explained in light of sparse representation

Fig. 7. Simulated Sources and Time Courses extracted through DL with dictionary size of 8

Fig. 8. Simulated Sources and Time Courses extracted through DL with dictionary size of 400

Fig. 10. Real Sources and Time Courses extracted through ALSNMF with rank value of 400

theory. When the dictionary size is small, the dictionary atoms are nearly independent of each other. This causes the sparse coding stage of the DL algorithm to keep finding coefficient values until it reaches the threshold for the sparse representation. Hence, the sparse approximation stage calculates a large number of coefficients, which gives rise to better-quality source images. However, when the dictionary size is increased, the number of dependent atoms increases, which provides better resolution/representation in terms of sparsity [13]. Fewer atoms then suffice for the input fMRI signal, so a small number of coefficients is calculated, which yields degraded source images. A large dictionary, on the other hand, gives a high probability of finding time courses related to the input fMRI data.

We have also calculated the correlations of the simulated and real time courses and sources with those extracted by the applied methods, to quantify their performance. S1 and TC1 in Table I denote the individual correlation of the first simulated or real source and time course with the corresponding extracted first source and time course, and likewise up to S8 and TC8, while S and TC in Tables I and II denote the average of the individual correlations.
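As an illustration of how such scores can be computed, the sketch below matches each ground-truth component with its most strongly correlated extracted component. The paper's exact selection rule is not spelled out above, so this greedy matching is an assumption.

```python
import numpy as np

def best_match_correlations(truth, extracted):
    """Row-wise: for each ground-truth component, the absolute Pearson
    correlation with its best-matching extracted component."""
    scores = []
    for t in truth:
        scores.append(max(abs(np.corrcoef(t, e)[0, 1]) for e in extracted))
    return np.array(scores)

# e.g. the "S" entry of Table I would be the mean over the 8 sources:
# S = best_match_correlations(H_true, H_est).mean()
# and "TC" uses the time-course columns:
# TC = best_match_correlations(W_true.T, W_est.T).mean()
```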

TABLE I. EXECUTION TIME AND PERFORMANCE OF ALSNMF AND DL IN TERMS OF CORRELATION OF EXTRACTED SOURCES/TIME COURSES WITH SIMULATED AND REAL SOURCES/TIME COURSES FOR RANK/DICTIONARY SIZE OF 8

| Scheme | ALSNMF on Simulated Data | DL on Simulated Data | ALSNMF on Real Data | DL on Real Data |
|---|---|---|---|---|
| Execution time (sec) | 9 | 18 | 7 | 15 |
| S1/TC1 | 0.683 / 0.430 | 0.663 / 0.097 | 0.751 / 0.011 | 0.184 / 0.929 |
| S2/TC2 | 0.478 / 0.783 | 0.179 / 0.991 | 0.409 / 0.012 | 0.138 / 0.099 |
| S3/TC3 | 0.830 / 0.730 | 0.207 / 0.792 | 0.329 / 0.621 | 0.561 / 0.665 |
| S4/TC4 | 0.847 / 0.687 | 0.248 / 0.587 | 0.593 / 0.013 | 0.795 / 0.795 |
| S5/TC5 | 0.726 / 0.277 | 0.495 / 0.318 | 0.444 / 0.217 | 0.380 / 0.321 |
| S6/TC6 | 0.219 / 0.537 | 0.679 / 0.196 | 0.339 / 0.265 | 0.756 / 0.683 |
| S7/TC7 | 0.279 / 0.503 | 0.186 / 0.708 | 0.281 / 0.456 | 0.670 / 0.310 |
| S8/TC8 | 0.333 / 0.480 | 0.516 / 0.029 | 0.430 / 0.076 | 0.382 / 0.122 |
| Average correlation (S/TC) | 0.550 / 0.553 | 0.404 / 0.465 | 0.447 / 0.207 | 0.483 / 0.490 |

Fig. 9. Real Sources and Time Courses extracted through ALSNMF with rank value of 8

5. CONCLUSION

We have presented the application of ALSNMF and DL to simulated and real fMRI data while varying the rank values/dictionary sizes for ALSNMF/DL. The results show that the ALSNMF method is the best among the applied methods in terms of fast convergence and performance at high rank values, and can extract the best time courses and sources simultaneously, but it cannot extract good time courses and sources at small rank values. The K-SVD method performed well particularly for real fMRI data; however, for small dictionary sizes, sources are extracted well with degraded time-course extraction, and vice versa for large dictionary sizes. The computational complexity of the DL method is higher than that of the ALSNMF method.

REFERENCES

Fig. 11. Real Sources and Time Courses extracted through DL with dictionary size of 8

Fig. 12. Real Sources and Time Courses extracted through DL with dictionary size of 400

TABLE II. PERFORMANCE OF ALSNMF AND DL IN TERMS OF AVERAGE CORRELATION OF EXTRACTED SOURCES/TIME COURSES WITH SIMULATED AND REAL SOURCES/TIME COURSES FOR DIFFERENT RANK/DICTIONARY SIZES OF 8, 80, 200 AND 400

| Scheme | S/TC for Rank 8 | S/TC for Rank 80 | S/TC for Rank 200 | S/TC for Rank 400 |
|---|---|---|---|---|
| ALSNMF on Simulated Data | 0.550 / 0.553 | 0.672 / 0.616 | 0.524 / 0.790 | 0.571 / 0.811 |
| DL on Simulated Data | 0.404 / 0.465 | 0.296 / 0.650 | 0.251 / 0.698 | 0.202 / 0.730 |
| ALSNMF on Real Data | 0.447 / 0.207 | 0.340 / 0.330 | 0.542 / 0.357 | 0.650 / 0.410 |
| DL on Real Data | 0.483 / 0.490 | 0.402 / 0.475 | 0.313 / 0.648 | 0.202 / 0.660 |

1. A. A. Khaliq, "Unmixing functional magnetic resonance imaging data," Wiley Periodicals, Inc., vol. 22, p. 5, 2012.
2. A. A. Khaliq, "Detection of brain activity in functional magnetic resonance imaging data using matrix factorization," Research Journal of Applied Sciences, Engineering and Technology, p. 6, 2013.
3. D. R. Langers, "Blind source separation of fMRI data by means of factor analytic transformations," NeuroImage, 47: 77-87, 2009.
4. N. Correa, "Comparison of blind source separation algorithms for fMRI using a new Matlab toolbox," IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), 5, 2005.
5. X. Wang, "Detecting brain activations by constrained non-negative matrix factorization from task-related BOLD fMRI," Proceedings SPIE 5369, 1605-7422, 2004.
6. N. Correa, "Performance of blind source separation algorithms for fMRI analysis using a group ICA method," Journal of Magnetic Resonance Imaging, 684-694, 2007.
7. S. Li, "Compressed dictionary learning for detecting activations in fMRI using double sparsity," IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2014.
8. X. Li, "Scalable fast rank-1 dictionary learning for fMRI big data analysis," Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
9. X. Ding, "Performance evaluation of nonnegative matrix factorization algorithms to estimate task-related neuronal activities from fMRI data," Magnetic Resonance Imaging, 31: 466-476, 2013.
10. M. Aharon, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, p. 12, 2006.
11. M. Elad, "Image denoising via sparse and redundant representations over learned dictionaries," IEEE Transactions on Image Processing, 15:10, 2006.
12. K. Engan, "Multi-frame compression: Theory and design," Signal Processing, 80: 2121-2140, 2000.
13. S. Chen, "Basis Pursuit," Technical Report, Department of Statistics, Stanford University, 1994.
