Detecting brain activations by constrained non-negative matrix factorization from task-related BOLD fMRI

Xiaoxiang Wang a, Jie Tian* a, Xingfeng Li a, Jianping Dai b, Lin Ai b

a Medical Image Processing Group, Institute of Automation, Chinese Academy of Sciences
b Department of Radiology, Tiantan Hospital, Chinese Capital University of Medical Sciences

* Corresponding author: Jie Tian; Telephone: 8610-62532105; Fax: 8610-62527995; Email: [email protected], [email protected]; Website: http://www.3dmed.net
ABSTRACT

Non-negative Matrix Factorization (NMF) has previously been shown to be a useful decomposition for multivariate data. In this paper we introduce this technique to the field of fMRI data analysis. In order to make the representation suitable for task-related brain activation detection, we imposed additional constraints and defined an improved contrast function. We derived the update rules and proved the convergence of the algorithm. In the procedure, the number of factors was determined by visual assessment. We studied 8 healthy right-handed adult volunteers on a 3.0T GE Signa scanner. A block-design motor paradigm (bilateral finger tapping) stimulated the blood oxygenation level-dependent (BOLD) response, and a gradient-echo EPI sequence was used to acquire BOLD-contrast functional images. With this constrained NMF (cNMF) we obtained the major activation components and the corresponding time courses, which showed high correlation with the reference function (r > 0.7). The results show that our method is feasible for detecting brain activations in task-related fMRI series.

Keywords: Non-negative Matrix Factorization (NMF), fMRI, BOLD
1. INTRODUCTION
Functional Magnetic Resonance Imaging (fMRI) is a non-invasive technique that attempts to estimate the neural response to a stimulus by measuring small temporal changes in a sequence of MR images. Blood oxygenation level dependent (BOLD) contrast is the most commonly used; it depends on a decrease in local deoxy-hemoglobin concentration in an area of neuronal activity [1, 2]. This local decrease in paramagnetic material increases the apparent transverse relaxation time T2*, resulting in an increase of MR signal intensity in the affected area. However, the low signal-to-noise ratio and the many possible sources of variability (subject movement, respiratory and cardiac artifacts, temperature drift, machine noise) make the analysis of such data a most challenging task.

fMRI analysis approaches range from model-driven to data-driven. Model-based methods [3, 4] assume the hemodynamic response function (HRF) to be known, estimate the most likely voxel amplitude of the response given that HRF, and test the significance of this amplitude against the null hypothesis of no effect. Data-driven techniques, such as Principal Component Analysis (PCA), Independent Component Analysis (ICA) [5, 6] and clustering [7, 8], give an account of the data content with little prior knowledge, but it is necessary to check the validity of their underlying hypotheses. For example, PCA assumes a priori that the structures of interest in the data are uncorrelated in both the temporal and the spatial domain, while ICA assumes that they are independent in the spatial domain [9] or in the temporal domain [10, 11, 12]. Recently, ICA has been widely studied: Beckmann et al. [13, 14] proposed probabilistic ICA for fMRI data analysis and Calhoun et al. [15, 16, 17] extended ICA to the complex domain. However, some structures of interest may not fully satisfy these assumptions. For clustering, although the reduced signals will not be independent or orthogonal to one another, they will still be linear mixtures of the sources.

Non-negative Matrix Factorization (NMF) [18, 19, 20] is a relatively new technique proposed for dimensionality reduction. It uses Poisson statistics as a noise model and preserves much of the structure of the original data. Our main motivation for introducing NMF is that fMRI data are non-negative and NMF is built on non-negativity constraints, which makes it a natural candidate for this problem. In addition, the NMF computation is based on a simple iterative algorithm and provides a simple learning rule that is guaranteed to converge monotonically.
Furthermore, several useful properties follow from the constrained optimization problem of NMF [20]. The aim of this paper is to tailor NMF to task-related brain activation detection in fMRI data. Inspired by the original NMF, we imposed additional constraints and defined an improved contrast function to make the representation suitable for task-related brain activation detection. We derived the update rules and proved the convergence of the algorithm. In the procedure, the number of factors was determined by visual assessment, which is easy and reliable.

The remainder of this paper is organized as follows. In Section 2 we present the signal model of the fMRI data (2.1), review the three NMF algorithms used in practice and their update strategies (2.2), introduce the cNMF algorithm and its learning procedure (2.3), and prove its convergence (2.4). In Section 3 we show experimental results on real fMRI data sets. We end with conclusions and discussion in Section 4.
2. METHODS

2.1. fMRI data model

We start by presenting the signal model of the fMRI data before introducing the algorithm. Let X = X_n(t), t = 1...T, n = 1...N, be an fMRI dataset, where T is the length of the time series and N is the number of voxels. If we consider X to be a noisy version of an underlying mixture, the linear decomposition describes the data as

X_n(t) = \sum_{k=1}^{R} A_k(t) S_k^n + \omega_n(t),    (1)

where 1 \leq R \leq \min(T, N) is the rank of the decomposition. Arranging the fMRI time series of each pixel into the columns of a matrix X, this can be written in matrix form as

X = A \cdot S + W.    (2)

The matrix A is called the mixing matrix and contains as its columns the basis vectors (features) of the temporal signals. The rows of S contain the corresponding hidden components that give the contribution of each basis vector to the input vectors, and can be interpreted as images. W = \omega_n(t), t = 1...T, n = 1...N, is the residual of the decomposition and can be thought of as random noise.

2.2. NMF algorithms

NMF has previously been shown to be a useful decomposition for multivariate data [18, 19, 20]. It seeks non-negative factors A and S such that X \approx \hat{X} = A \cdot S. According to the above fMRI data model, we assume that the data X obey a Poisson distribution; the PDF of X is then

P(X_{i\mu} | (AS)_{i\mu}) = \frac{(AS)_{i\mu}^{X_{i\mu}} e^{-(AS)_{i\mu}}}{X_{i\mu}!}.    (3)

Intuitively, we can maximize the log probability of the PDF over A and S, leaving the relevant objective function to be

\log P(X | AS) = \sum_{ij} \left[ X_{ij} \log (AS)_{ij} - (AS)_{ij} \right].    (4)
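As a practical illustration (ours, not part of the original method), the following NumPy sketch shows one way to arrange a 4-D fMRI series into the non-negative T x N data matrix X of Eq. (2); the brain-mask handling and the clipping of negative values are assumptions of this sketch.

```python
import numpy as np

def fmri_to_matrix(volumes, mask):
    """Arrange a 4-D fMRI series (nx, ny, nz, T) into the T x N matrix X of Eq. (2).

    mask is a (nx, ny, nz) boolean array selecting the N voxels of interest
    (e.g., a brain mask); its use here is an assumption, not from the paper.
    """
    X = volumes[mask].T            # rows = time points, columns = voxel time series
    return np.maximum(X, 0.0)      # NMF requires non-negative entries
```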
There are three NMF algorithms used in practice in the literature. Next we overview the two traditional objective functions that produce the NMF algorithms [19], as well as a recently proposed refinement called Local NMF (LNMF) that leads to a third algorithm [20]. Each of the three algorithms seeks to minimize a different objective function (distance measure), and each of these objective functions could be minimized with several different iterative procedures. The particular update strategies given here are chosen for their ease of implementation and because they have been proven to monotonically decrease their respective objective functions [19, 20, 21].

Algorithm 1: Euclidean Distance Algorithm

To find an approximate factorization X \approx AS, the most obvious choice of objective function is the Euclidean distance between X and AS, i.e., the squared Frobenius norm:

D_1(X | AS) = \|X - AS\|^2 = \sum_{ij} \left( X_{ij} - (AS)_{ij} \right)^2.    (5)

We wish to find the factors A and S that minimize this objective function. Its lower bound of zero is attained only when the strict equality X = AS holds. There are many ways to minimize this objective function; however, because of the lack of convexity in the two variables A and S jointly, we can at best expect to reach a local minimum [19]. Researchers in this area have chosen to balance algorithm complexity and convergence speed by using the following update procedure:

S_{kl}^{(t+1)} = S_{kl}^{(t)} \frac{(A^T X)_{kl}}{(A^T A S^{(t)})_{kl}},    (6a)

A_{kl}^{(t+1)} = A_{kl}^{(t)} \frac{(X S^T)_{kl}}{(A^{(t)} S S^T)_{kl}}.    (6b)
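For reference, a minimal NumPy sketch of the multiplicative updates (6a)-(6b) follows; the random initialization, fixed iteration count, and small constant eps (to avoid division by zero) are our own choices rather than part of the algorithm as stated.

```python
import numpy as np

def nmf_euclidean(X, R, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative updates for ||X - AS||^2, Eqs. (6a)-(6b)."""
    rng = np.random.default_rng(seed)
    T, N = X.shape
    A = rng.random((T, R)) + eps
    S = rng.random((R, N)) + eps
    for _ in range(n_iter):
        S *= (A.T @ X) / (A.T @ A @ S + eps)      # Eq. (6a)
        A *= (X @ S.T) / (A @ S @ S.T + eps)      # Eq. (6b)
    return A, S
```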
Algorithm 2: Divergence Algorithm

The second objective function commonly used in practice is the divergence, or entropy, measure:

D_2(X | AS) = \sum_{ij} \left( X_{ij} \log \frac{X_{ij}}{(A \cdot S)_{ij}} - X_{ij} + (A \cdot S)_{ij} \right).    (7)

The objective function D_2(X | AS) is not a distance measure because, strictly speaking, it is not symmetric in X and AS. When \sum_{ij} X_{ij} = \sum_{ij} (AS)_{ij} = 1, D_2(X | AS) reduces to the Kullback-Leibler information measure used in probability theory [19]. This objective function is related to the likelihood of generating the columns of X from the basis A and encoding coefficients S. It equals its lower bound of zero only when we have strict equality, X = AS. To balance complexity and speed, the following update rules are commonly used:

S_{kl}^{(t+1)} = S_{kl}^{(t)} \frac{\sum_i A_{ik} X_{il} / (A S^{(t)})_{il}}{\sum_j A_{jk}},    (8a)

A_{kl}^{(t+1)} = A_{kl}^{(t)} \frac{\sum_i S_{li} X_{ki} / (A^{(t)} S)_{ki}}{\sum_j S_{lj}}.    (8b)
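A corresponding sketch of the divergence updates (8a)-(8b), under the same assumptions (initialization, iteration count, eps) as the Euclidean sketch above:

```python
import numpy as np

def nmf_divergence(X, R, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative updates for the divergence D2(X|AS), Eqs. (8a)-(8b)."""
    rng = np.random.default_rng(seed)
    T, N = X.shape
    A = rng.random((T, R)) + eps
    S = rng.random((R, N)) + eps
    for _ in range(n_iter):
        S *= (A.T @ (X / (A @ S + eps))) / (A.sum(axis=0)[:, None] + eps)   # Eq. (8a)
        A *= ((X / (A @ S + eps)) @ S.T) / (S.sum(axis=1)[None, :] + eps)   # Eq. (8b)
    return A, S
```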
Algorithm 3: Local NMF Algorithm

A recently proposed refinement [20] of NMF is a slight variation of the Divergence Algorithm detailed above. Local Non-negative Matrix Factorization (LNMF) adds terms to the objective function that impose spatial locality on the features of a data set:

D_3(X | AS) = \sum_{ij} \left( X_{ij} \log \frac{X_{ij}}{(A \cdot S)_{ij}} - X_{ij} + (A \cdot S)_{ij} \right) + \alpha \sum_{ij} u_{ij} - \beta \sum_{ij} v_{ij},    (9)

where \alpha, \beta > 0 are constants, U = A^T A and V = S S^T (written as U = W^T W and V = H H^T in the notation of [20]). This objective function is the divergence objective D_2(X | AS) with additional constraint terms. A set of update rules that minimize this objective function is:

S_{kl}^{(t+1)} = \sqrt{ S_{kl}^{(t)} \sum_i A_{ik} X_{il} / (A S^{(t)})_{il} },    (10a)

A_{kl}^{(t+1)} = A_{kl}^{(t)} \frac{\sum_i S_{li} X_{ki} / (A^{(t)} S)_{ki}}{\sum_j S_{lj}},    (10b)

A_{kl} = \frac{A_{kl}}{\sum_i A_{il}}.    (10c)
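For completeness, a compact sketch of one LNMF iteration as transcribed in Eqs. (10a)-(10c); the in-place style and the constant eps are our choices, and the rules themselves follow our reading of [20]:

```python
import numpy as np

def lnmf_step(X, A, S, eps=1e-9):
    """One LNMF iteration, Eqs. (10a)-(10c), per our transcription of [20]."""
    S = np.sqrt(S * (A.T @ (X / (A @ S + eps))))                        # Eq. (10a)
    A *= ((X / (A @ S + eps)) @ S.T) / (S.sum(axis=1)[None, :] + eps)   # Eq. (10b)
    A /= A.sum(axis=0, keepdims=True) + eps                             # Eq. (10c): column normalization
    return A, S
```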
Having introduced the existing NMF algorithms, we now turn to defining an improved contrast function, based on Algorithms 2 and 3, that makes the representation suitable for task-related brain activation detection.

2.3. Constrained NMF
An fMRI dataset is a time series of gray-level images, so it naturally consists entirely of non-negative elements; this is why we consider NMF for our purpose. In order to make the representation suitable for task-related brain activation detection, however, we must impose additional constraints: (1) the factor S should be sparse, which we encourage by minimizing \sum_{ij} S_{ij}; and (2) different basis vectors should be as uncorrelated as possible, which we encourage by minimizing \sum_{i \neq j} (A^T A)_{ij}. Based on the divergence D_2(X | AS), we then arrive at the following constrained objective function, which we term constrained NMF (cNMF), assuming \forall ij: X_{ij} \geq 0:

D_4(X | AS) = \sum_{ij} \left( X_{ij} \log \frac{X_{ij}}{(A \cdot S)_{ij}} - X_{ij} + (A \cdot S)_{ij} \right) + \alpha \sum_{ij} S_{ij} + \beta \sum_{i \neq j} (A^T A)_{ij} \rightarrow \min_{A,S},    (11)

subject to the constraints \forall ij: A_{ij} \geq 0, S_{ij} \geq 0, and \forall i: \|a_i\| = 1, where a_i denotes the i-th column of A. The constants \alpha, \beta > 0 control the trade-off between sparseness/redundancy and reconstruction accuracy. A solution to the above constrained minimization can be found with the following three-step update rules, whose convergence is proved in Section 2.4:

S_{kl}^{(t+1)} = S_{kl}^{(t)} \frac{\sum_i A_{ik} X_{il} / (A S^{(t)})_{il}}{\sum_i A_{ik} + \alpha},    (12a)

A_{kl}^{(t+1)} = A_{kl}^{(t)} \frac{\sum_j S_{lj} X_{kj} / (A^{(t)} S)_{kj}}{2\beta \sum_{j \neq l} A_{kj}^{(t)} + \sum_j S_{lj}},    (12b)

A_{kl} = \frac{A_{kl}}{\sum_i A_{il}}.    (12c)
In the procedure, the number of factors is determined by visual assessment.

2.4. Proof of convergence

The convergence of the learning procedure is proven using an auxiliary function, similar to the one used in the Expectation-Maximization algorithm. G(S, S^{(t)}) is an auxiliary function for D(S) if the conditions G(S, S^{(t)}) \geq D(S) and G(S, S) = D(S) are satisfied. If G is an auxiliary function, then D is non-increasing under the update S^{(t+1)} = \arg\min_S G(S, S^{(t)}), because D(S^{(t+1)}) \leq G(S^{(t+1)}, S^{(t)}) \leq G(S^{(t)}, S^{(t)}) = D(S^{(t)}). It has been proved [19] that the function

G_2(S, S^{(t)}) = \sum_{ij} (X_{ij} \log X_{ij} - X_{ij}) + \sum_{ij} (AS)_{ij} - \sum_{ij} X_{ij} \sum_k \frac{A_{ik} S_{kj}^{(t)}}{\sum_s A_{is} S_{sj}^{(t)}} \left[ \log(A_{ik} S_{kj}) - \log \frac{A_{ik} S_{kj}^{(t)}}{\sum_s A_{is} S_{sj}^{(t)}} \right]    (13)

is an auxiliary function for D_2(X | AS). Let

G_4(S, S^{(t)}) = G_2(S, S^{(t)}) + \alpha \sum_{ij} S_{ij} + \beta \sum_{i \neq j} (A^T A)_{ij};    (14)

then G_4(S, S^{(t)}) is an auxiliary function for D_4(X | AS) with respect to S.

Proof: Since G_2(S, S^{(t)}) is an auxiliary function for D_2(X | AS), we have

G_2(S, S^{(t)}) \geq D_2(X | AS),    (15a)
G_2(S, S) = D_2(X | AS).    (15b)

With D_4(X | AS) = D_2(X | AS) + \alpha \sum_{ij} S_{ij} + \beta \sum_{i \neq j} (A^T A)_{ij} and G_4(S, S^{(t)}) = G_2(S, S^{(t)}) + \alpha \sum_{ij} S_{ij} + \beta \sum_{i \neq j} (A^T A)_{ij}, we immediately obtain

G_4(S, S^{(t)}) \geq D_4(X | AS),    (16a)
G_4(S, S) = D_4(X | AS).    (16b)

So G_4(S, S^{(t)}) is an auxiliary function for D_4(X | AS) with respect to S. Similarly, let

G_4'(A, A^{(t)}) = \sum_{ij} (X_{ij} \log X_{ij} - X_{ij}) + \sum_{ij} (AS)_{ij} - \sum_{ij} X_{ij} \sum_k \frac{A_{ik}^{(t)} S_{kj}}{\sum_s A_{is}^{(t)} S_{sj}} \left[ \log(A_{ik} S_{kj}) - \log \frac{A_{ik}^{(t)} S_{kj}}{\sum_s A_{is}^{(t)} S_{sj}} \right] + \alpha \sum_{ij} S_{ij} + \beta \sum_{i \neq j} (A^T A)_{ij}.    (17)

We can prove in the same way that G_4'(A, A^{(t)}) is an auxiliary function for D_4(X | AS) with respect to A. Then D_4(X | AS) is non-increasing when A and S are updated by

S^{(t+1)} = \arg\min_S G_4(S, S^{(t)}),    (18a)
A^{(t+1)} = \arg\min_A G_4'(A, A^{(t)}).    (18b)

The minimum of G_4(S, S^{(t)}) with respect to S is determined by setting the gradient to zero:

\frac{\partial G_4(S, S^{(t)})}{\partial S_{kl}} = - \sum_i X_{il} \frac{A_{ik} S_{kl}^{(t)}}{\sum_j A_{ij} S_{jl}^{(t)}} \frac{1}{S_{kl}} + \sum_i A_{ik} + \alpha = 0.    (19)

Solving this equation, we obtain

S_{kl}^{(t+1)} = S_{kl}^{(t)} \frac{\sum_i A_{ik} X_{il} / (A S^{(t)})_{il}}{\sum_i A_{ik} + \alpha},

which is update rule (12a). The minimum of G_4'(A, A^{(t)}) with respect to A is determined by setting the gradient to zero:

\frac{\partial G_4'(A, A^{(t)})}{\partial A_{kl}} = - \sum_j X_{kj} \frac{A_{kl}^{(t)} S_{lj}}{\sum_m A_{km}^{(t)} S_{mj}} \frac{1}{A_{kl}} + \sum_j S_{lj} + 2\beta \sum_{j \neq l} A_{kj} = 0.    (20)

Solving this equation, we obtain

A_{kl}^{(t+1)} = A_{kl}^{(t)} \frac{\sum_j S_{lj} X_{kj} / (A^{(t)} S)_{kj}}{2\beta \sum_{j \neq l} A_{kj}^{(t)} + \sum_j S_{lj}},

which is update rule (12b). Therefore, repeated iteration of the update rules (12a, 12b, 12c) is guaranteed to converge to a locally optimal matrix factorization.
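To make the procedure of Section 2.3 concrete, the following NumPy sketch implements the cNMF iteration (12a)-(12c) and tracks the objective D_4 of Eq. (11), which the proof above guarantees to be non-increasing under updates (12a)-(12b); the choices of alpha, beta, initialization, iteration count, and the constant eps are illustrative assumptions, and in practice the number of components R would be chosen by visual assessment as described above.

```python
import numpy as np

def cnmf(X, R, alpha=0.1, beta=0.1, n_iter=200, eps=1e-9, seed=0):
    """Sketch of the cNMF updates (12a)-(12c); alpha and beta are illustrative values."""
    rng = np.random.default_rng(seed)
    T, N = X.shape
    A = rng.random((T, R)) + eps
    S = rng.random((R, N)) + eps
    A /= A.sum(axis=0, keepdims=True)                  # start with normalized columns, cf. (12c)
    history = []
    for _ in range(n_iter):
        S *= (A.T @ (X / (A @ S + eps))) / (A.sum(axis=0)[:, None] + alpha)   # Eq. (12a)
        off_diag = A.sum(axis=1, keepdims=True) - A    # sum_{j != l} A_kj for each (k, l)
        A *= ((X / (A @ S + eps)) @ S.T) / (2.0 * beta * off_diag + S.sum(axis=1)[None, :] + eps)  # Eq. (12b)
        A /= A.sum(axis=0, keepdims=True) + eps        # Eq. (12c): normalize columns of A
        AS = A @ S + eps
        d4 = np.sum(X * np.log((X + eps) / AS) - X + AS)                       # divergence part of D4
        d4 += alpha * S.sum() + beta * ((A.T @ A).sum() - np.trace(A.T @ A))   # penalty terms of Eq. (11)
        history.append(d4)  # non-increasing under (12a)-(12b); (12c) may perturb it slightly
    return A, S, history
```

In this arrangement the columns of A contain candidate component time courses and the rows of S the corresponding spatial maps.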
3. EXPERIMENTAL RESULTS
To demonstrate the effectiveness of our method, we studied 8 healthy right-handed adult volunteers (20-40 years, 4 males, 4 females) on a 3.0T GE Signa scanner (GE Medical Systems, Milwaukee, WI, USA). A block-design motor paradigm (bilateral finger tapping, consisting of 10 scans "on" and 10 scans "off", over 100 time points) stimulated the blood oxygenation level-dependent (BOLD) response. A gradient-echo EPI sequence was used to acquire BOLD-contrast functional images (TR = 3000 ms, TE = 30 ms, flip angle = 90°, FOV = 24 × 24 cm). Each volume has 64 × 64 × 24 voxels with a spatial resolution of 3.75 × 3.75 × 5 mm. T1-weighted anatomical images (512 × 512) were also acquired for reference. See Figure 1 for a diagram of the paradigm timing.
[Figure 1 diagram: timeline from 0:00 to 5:00 in 30 s increments, with lead-in, alternating finger-tapping and rest blocks, and lead-out.]
Figure 1. Experimental paradigm timing.

The fMRI data were analyzed using the method presented above. The number of sources was estimated by visual assessment as ten. With cNMF we obtained the major activation components and the corresponding time courses, which showed high correlation with the reference function (r > 0.7). The results demonstrate successful separation of activation sources in the motor cortices (Figure 2), suggesting that cNMF is a suitable technique for this problem. Figure 2 shows the brain activations superimposed on the T1-weighted images and the corresponding time courses obtained with cNMF for one of the volunteers. Pixels in the motor area are clearly activated. For all eight volunteers we obtained almost identical results.
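As an illustration of how the task-related components can be identified, the sketch below correlates each column of A (the component time courses) with a simple boxcar reference and keeps those with |r| > 0.7, the threshold quoted above; the exact reference function used in the study (e.g., whether it was convolved with a hemodynamic response, and whether the run started with a task or a rest block) is not restated here, so this boxcar is an assumption.

```python
import numpy as np

def task_related_components(A, block_len=10, r_thresh=0.7):
    """Correlate component time courses (columns of A) with a block-design reference."""
    T, R = A.shape
    period = np.r_[np.ones(block_len), np.zeros(block_len)]        # 10 scans "on", 10 scans "off"
    reference = np.tile(period, int(np.ceil(T / period.size)))[:T]
    r = np.array([np.corrcoef(A[:, k], reference)[0, 1] for k in range(R)])
    return np.flatnonzero(np.abs(r) > r_thresh), r
```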
4. CONCLUSIONS AND DISCUSSION

This paper introduced a new technique, NMF, to the field of fMRI data analysis. Inspired by the original NMF, we defined an improved contrast function to make the representation suitable for task-related brain activation detection, derived the update rules, and proved the convergence of the algorithm. The experimental results show that, given a proper number of factors, cNMF yields results at least comparable to those obtained by ICA. Although further verification is necessary, our preliminary study indicates that cNMF is feasible for detecting brain activations in task-related fMRI series.

Several aspects of the method can be improved or should be investigated further. The first is the initialization of the cNMF algorithm, which greatly influences the detection results. The second is the determination of the number of sources; automatically obtaining the optimal rank is a very difficult task, and in this paper we determine it by visual assessment. The third is a systematic comparison with other methods such as ICA and CCA.
ACKNOWLEDGEMENTS

This work is supported by the Project for National Science Fund for Distinguished Young Scholars of China under Grant No. 60225008, the Special Project of National Grand Fundamental Research 973 Program of China under Grant No. 2002CCA03900, the National High Technology Development Program of China under Grant No. 2002AA234051,
the National Natural Science Foundation of China under Grant Nos. 90209008, 30370418, 60302016, 60172057, 30270403.
Figure 2. Color-washed activation maps for 5 slices of fMRI data from one of the volunteers obtained with the cNMF method. (a) Brain activations superimposed on the T1-weighted images (from left to right: slices 19, 20, 21, 22, and 23); colored pixels show the locations of detected activations in the brain. (b) The corresponding activation time courses for each slice in (a), plotted against image number; solid lines show the detected time courses and dashed lines the reference function.
REFERENCES

1. S. Ogawa, T. M. Lee, A. R. Kay, and D. W. Tank, "Brain magnetic resonance imaging with contrast dependent on blood oxygenation", Proc. Natl. Acad. Sciences, 87, pp. 9868-9872, 1990.
2. K. K. Kwong, J. W. Belliveau, D. A. Chesler, I. E. Goldberg, R. M. Weisskoff, B. P. Poncelet, D. N. Kennedy, B. E. Hoppel, M. S. Cohen, R. Turner, H. Cheng, T. J. Brady, and B. R. Rosen, "Dynamic magnetic resonance imaging of human brain activity during primary sensory stimulation", Proc. Natl. Acad. Sciences, 89, pp. 5675-5679, 1992.
3. K. J. Friston, J. Ashburner, et al., SPM 97 course notes, Wellcome Department of Cognitive Neurology, University College, London, 1997.
4. K. J. Worsley, C. H. Liao, J. Aston, V. Petre, G. H. Duncan, F. Morales, and A. C. Evans, "A general statistical analysis for fMRI data", NeuroImage, 15, pp. 1-15, 2002.
5. J. V. Stone, "Independent component analysis: An introduction", Trends in Cognitive Sciences, 6(2), pp. 59-64, 2002.
6. V. Calhoun, T. Adali, L. K. Hansen, J. Larsen, and J. Pekar, "ICA of functional MRI data: an overview", Proc. IEEE Workshop on ICA, Nara, Japan, April 2003.
7. C. Goutte, P. Toft, E. Rostrup, F. A. Nielsen, and L. K. Hansen, "On clustering fMRI time series", NeuroImage, 9, pp. 298-310, 1999.
8. K. H. Chuang, M. J. Chiu, C. C. Lin, and J. H. Chen, "Model-free functional MRI analysis using Kohonen clustering neural network and fuzzy c-means", IEEE Trans. on Medical Imaging, 18(12), pp. 1117-1128, December 1999.
9. M. J. McKeown, S. Makeig, et al., "Analysis of fMRI data by blind separation into independent spatial components", Human Brain Mapping, 6, pp. 160-188, 1998.
10. V. D. Calhoun, T. Adali, G. D. Pearlson, and J. J. Pekar, "Spatial and temporal independent component analysis of functional MRI data containing a pair of task-related waveforms", Human Brain Mapping, 13, pp. 43-53, 2001.
11. J. Stone, J. Porrill, N. Porter, and I. Wilkinson, "Spatiotemporal independent component analysis of event-related fMRI data using skewed probability density functions", NeuroImage, 15, pp. 407-421, 2002.
12. V. D. Calhoun, T. Adali, J. J. Pekar, and G. D. Pearlson, "Latency (in)sensitive ICA: group independent component analysis of fMRI data in the temporal frequency domain", NeuroImage, 20, pp. 1661, 2003.
13. C. F. Beckmann and S. M. Smith, "Probabilistic extensions to independent component analysis for FMRI", NeuroImage, 16, presented at the 8th International Conference on Functional Mapping of the Human Brain, Sendai, Japan, June 2-6, 2002.
14. C. F. Beckmann and S. M. Smith, "Probabilistic ICA for FMRI -- noise and inference", Fourth Int. Symp. on Independent Component Analysis and Blind Signal Separation, 2003.
15. V. D. Calhoun, T. Adali, G. D. Pearlson, P. C. M. van Zijl, and J. J. Pekar, "Independent component analysis of fMRI data in the complex domain", Magnetic Resonance in Medicine, 48, pp. 180-192, 2002.
16. V. D. Calhoun, T. Adali, G. D. Pearlson, and J. J. Pekar, "A infomax method for performing ICA of fMRI data in the complex domain", NeuroImage, 16(2), pp. 349, 2002.
17. V. Calhoun and T. Adali, "Complex ICA for FMRI analysis: performance of several approaches", Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), Hong Kong, April 2003.
18. D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization", Nature, 401, pp. 788-791, 1999.
19. D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization", Advances in Neural Information Processing Systems, 13, pp. 556-562, 2000.
20. S. Z. Li, X. W. Hou, and H. J. Zhang, "Learning spatially localized parts-based representation", Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Hawaii, 1, pp. 207-212, 2001.
21. S. M. Wild, "Seeding Non-Negative Matrix Factorizations with the Spherical K-Means Clustering", Thesis, Department of Applied Mathematics, University of Colorado, April 2003.