AUTOMATED CLUSTERING OF ICA RESULTS FOR FMRI DATA ANALYSIS

I R Keck, F J Theis, P Gruber, E W Lang; K Specht, G R Fink; A M Tomé; C G Puntonet

Univ. Regensburg, Germany; RC Jülich, Germany; IEETA, Portugal; Univ. Granada, Spain
ABSTRACT

While independent component analysis (ICA) can be a fruitful method to analyze fMRI data, the manual work that is usually necessary to review the results is complex and time consuming and thus limits its clinical application. In this article we address this problem by presenting a new way to cluster the results of an ICA into a few, easy-to-classify activation maps using incomplete ICA. These maps are then the basis for a further in-depth analysis of the fMRI data. We demonstrate our approach on a real-world WCST example.

Keywords: BSS, fMRI, ICA, clustering, WCST

INTRODUCTION

Functional magnetic resonance imaging (fMRI) is one of the leading technologies for functional human brain research due to its high spatial resolution and noninvasiveness. fMRI is also well suited to spatial independent component analysis (sICA), as the functional segregation of the brain (see Frackowiak et al. (1)) closely matches the assumption of spatial statistical independence. However, a problem in the case of fMRI is still the mass of data that has to be analyzed to find interesting components: each fMRI session will typically yield hundreds of different components. Clustering can be part of the solution to this problem, as the regions of the brain that work together during the experiment will also form clusters of activation in the time domain, which can be exploited to cluster the components that represent the collaborating parts of the brain.

Surprisingly, for independent component analysis the literature so far concentrates on comparing the independent components (ICs) themselves. Two recently published algorithms for this problem are tree-dependent ICA by Bach and Jordan (2) and topographic ICA by Hyvärinen and Hoyer (3). In tree-dependent ICA the assumption of independence is weakened and a transformation of the data into a tree of independent clusters of dependent sources is sought. In topographic ICA the resulting components also do not have to be completely independent:
the variances corresponding to neighboring components have to be positively correlated, while the other variances remain independent. Both algorithms have been applied to fMRI data by Meyer-Bäse et al. (4) with varying results.

However, in the search for cooperating networks of brain areas in fMRI it is not the independent components themselves that need to be similar, but the time courses of their activations, i.e. their columns in the mixing matrix estimated by the ICA. In this article we therefore demonstrate a new algorithm that, instead of comparing the components themselves, clusters the components depending on their appearance in the mixtures. For the fMRI case this means that activation maps with similar time courses will be clustered together by this algorithm.

THEORY

First we will give a short overview of independent component analysis. Then we will describe the idea behind clustering with incomplete ICA and demonstrate our algorithm.

Spatial Independent Component Analysis

Let $s_1(t), \ldots, s_m(t)$ be $m$ independent signals with unit variance for simplicity, represented by a vector $\vec{s}(t) = (s_1(t), \ldots, s_m(t))^T$, where $T$ denotes the transpose. Let the mixing matrix $A$ generate $n$ linear mixtures $\vec{x}(t) = (x_1(t), \ldots, x_n(t))^T$ from these source signals according to

$$\vec{x}(t) = A\,\vec{s}(t) \qquad (1)$$

Note that each column of the mixing matrix $A$ then represents the contribution of one source to each mixture. Assume that only the mixtures $\vec{x}(t)$ can be observed. Then the task of recovering the original sources $\vec{s}(t)$ along with the mixing matrix $A$ is commonly referred to as "independent component analysis" (ICA). If the mixtures represent spatial vectors of data points, e.g. each $\vec{x}(t)$ one image of an fMRI session, the ICA is called "spatial" and the estimated sources will be spatially independent.
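As a concrete illustration of the mixing model (1), the following toy sketch (not taken from the paper; all sizes and variable names are illustrative) builds $m$ independent spatial sources as flattened maps over voxels and mixes them into $n$ observed images, one per time point:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: m spatial sources (flattened "activation maps" over voxels)
# mixed into n observed images, one per time point, as in Eq. (1).
m_sources, n_mixtures, n_voxels = 4, 6, 1000

# Independent, unit-variance spatial sources: each row is a flattened map.
S = rng.laplace(size=(m_sources, n_voxels))
S /= S.std(axis=1, keepdims=True)

# Mixing matrix A (n_mixtures x m_sources): column j holds the weights with
# which source j contributes to every observed image, i.e. its "time course".
A = rng.normal(size=(n_mixtures, m_sources))

# Observed data: each row of X is one image, a linear mixture of the maps.
X = A @ S          # Eq. (1), stacked over all time points t
print(X.shape)     # (6, 1000)
```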
For the complete case $n = m$ (i.e. as many mixtures as sources) many algorithms exist to tackle this problem, e.g. Infomax (based on entropy maximization; see Bell and Sejnowski (5) for details) and FastICA (based on negentropy and fixed-point iteration; see Hyvärinen (6) for details), to mention only some of the most popular ones. The other cases, the more difficult over-complete case ($n < m$) and the more trivial under-complete case ($n > m$), have also been widely studied in the literature; see e.g. Amari (7), Theis et al. (8).

Incomplete ICA

In this paper we concentrate on the incomplete case, where we deliberately extract fewer sources than could be extracted from the mixtures $\vec{x}(t)$. This incomplete case differs from the over-complete case in that we do not want to extract a subset of all independent sources. Rather, we try to cluster all sources into fewer components than could be extracted in principle. This amounts to a dimension reduction of the data set, so it is obviously not possible without sacrificing some information content. A common way to keep the loss of information as low as possible is to apply a principal component analysis (PCA) to the mixtures $\vec{x}(t)$ and to perform the dimension reduction by discarding the eigenvectors $\vec{e}_i$ that correspond to the smallest eigenvalues $\lambda_i$ of the data covariance matrix $C$; see e.g. Hyvärinen and Oja (9). This is also a convenient and often necessary preprocessing step ("whitening") for many ICA algorithms, as it reduces the degrees of freedom in the space of solutions by removing all second-order correlations of the data and setting the variances to unity:

$$\tilde{\vec{x}} = E \Lambda^{-\frac{1}{2}} E^T \vec{x}, \qquad (2)$$
where $E$ is the orthogonal matrix of eigenvectors of the covariance matrix of $\vec{x}$, $C(\vec{x}) = E\big((\vec{x} - E(\vec{x}))(\vec{x} - E(\vec{x}))^T\big)$, and $\Lambda$ is the diagonal matrix of its eigenvalues. It can easily be shown that this dimension reduction will cluster the independent components $s_i(t)$ based on their presence in the mixing matrix $A$, as the covariance matrix of $\vec{x}$ depends on $A$ (see also Belouchrani et al. (10)):

$$
\begin{aligned}
E(\vec{x}\vec{x}^T) &= E(A\vec{s}\vec{s}^T A^T) && (3)\\
&= A\,E(\vec{s}\vec{s}^T)\,A^T && (4)\\
&= A A^T && (5)
\end{aligned}
$$
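A minimal numerical sanity check of Eqs. (2)-(5) on synthetic data (again toy values, not the WCST data set): the `whiten` helper below uses the PCA form $\Lambda^{-1/2} E^T \vec{x}$, which differs from Eq. (2) only by the final rotation with $E$ and whitens the data in the same way; passing `k` discards the directions with the smallest eigenvalues, i.e. the dimension reduction discussed here.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, n_voxels = 4, 6, 100_000

# Unit-variance independent sources and a random mixing matrix (toy values).
S = rng.laplace(size=(m, n_voxels))
S /= S.std(axis=1, keepdims=True)
A = rng.normal(size=(n, m))
X = A @ S

# Eqs. (3)-(5): the covariance of the mixtures is approximately A A^T.
C = np.cov(X)
print(np.abs(C - A @ A.T).max() / np.abs(A @ A.T).max())   # small relative deviation

def whiten(X, k=None):
    """PCA whitening; with k set, keep only the k largest-eigenvalue directions."""
    C = np.cov(X)
    eigvals, E = np.linalg.eigh(C)               # eigenvalues in ascending order
    eigvals, E = eigvals[::-1], E[:, ::-1]       # reorder to descending
    if k is not None:
        eigvals, E = eigvals[:k], E[:, :k]       # dimension reduction
    Xc = X - X.mean(axis=1, keepdims=True)
    return np.diag(eigvals ** -0.5) @ E.T @ Xc

X_white = whiten(X)                              # no reduction: covariance ~ identity
print(np.allclose(np.cov(X_white), np.eye(n), atol=0.05))
```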
If two columns in the mixing matrix $A$ are almost identical up to a linear factor, i.e. almost linearly dependent, then the two sources represented by those columns are represented almost identically (up to a linear factor) in the mixtures. A matrix with two linearly dependent columns does not have full rank. The same holds for its transpose $A^T$, which has the same rank, and hence for the product $AA^T$, which therefore has at least one (close-to-)zero eigenvalue. Setting this close-to-zero eigenvalue to zero in the course of a dimension reduction will thus combine two almost identical columns of $A$ into a single one. This means that components that appear similarly in most of the mixtures will be clustered together into new components by the dimension reduction with PCA. After the dimension reduction, a standard ICA algorithm can be used to find the independent components in the dimension-reduced data set.

Figure 1: Incomplete ICA: if the ICA is forced to extract fewer sources than are present in the data, the algorithm will cluster together independent components that share similar columns (i.e. time courses in the fMRI case) in the mixing matrix.

Clustering with incomplete ICA

The idea behind clustering with an intentionally incomplete ICA is to compare ICA runs with different levels of dimension reduction applied beforehand. First a complete ICA is performed, extracting the maximal number of independent components from the data set. In a second run, an incomplete ICA is performed on a reduced data set which results from a dimension reduction during PCA preprocessing. The independent components (ICs) of the complete ICA without dimension reduction are then compared to the ICs of several incomplete ICA runs. Independent components which form part of a component of the incomplete ICA are then grouped into the cluster represented by that IC of the incomplete ICA. Hence the ICs of any incomplete ICA form a sort of prototype ICs of the clusters formed by the ICs of the complete set. Figure 1 shows a schematic example.
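To illustrate the clustering effect sketched in Figure 1, the following toy example (our own construction; it uses scikit-learn's FastICA as a stand-in for the ICA algorithm, and all sizes are illustrative) gives two of four synthetic sources nearly identical time courses and shows that an incomplete ICA with one component fewer merges them into a single "cluster" map:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(2)
n_voxels, n_time = 20_000, 60

# Four independent synthetic "spatial" sources; two of them are given almost
# identical time courses (columns of A), as in Figure 1.
S = rng.laplace(size=(4, n_voxels))
t = np.arange(n_time)
tc_shared = np.sin(2 * np.pi * t / 20)                  # shared block-like time course
A = np.column_stack([
    tc_shared,
    tc_shared + 0.1 * rng.normal(size=n_time),          # almost proportional column
    np.cos(2 * np.pi * t / 7),
    rng.normal(size=n_time),
])
X = A @ S                                               # mixtures: (n_time, n_voxels)

# Complete ICA (ICA1): voxels act as samples, time points as features.
ica1 = FastICA(n_components=4, random_state=0, max_iter=1000)
maps1 = ica1.fit_transform(X.T).T                       # four spatial maps

# Incomplete ICA (ICA2): the internal PCA/whitening step reduces to three
# dimensions and thereby merges the two sources that share a time course.
ica2 = FastICA(n_components=3, random_state=0, max_iter=1000)
maps2 = ica2.fit_transform(X.T).T                       # three "cluster" maps

# One row (one ICA2 map) should show two sizeable correlations with ICA1 maps,
# namely the merged pair; the other rows essentially match one ICA1 map each.
corr = np.corrcoef(np.vstack([maps2, maps1]))[:3, 3:]
print(np.round(np.abs(corr), 2))
```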
fMRI workflow

The workflow for an fMRI analysis using this clustering approach is as follows:

1. apply a standard ICA to the data set without dimension reduction: ICA1
2. apply a standard ICA to the data set with dimension reduction: ICA2
3. search for interesting "cluster" components in ICA2
4. repeat (2) and (3) for different levels of dimension reduction in ICA2 to find the best results
5. find the independent components in ICA1 that are similar to components of ICA2

These components and their time courses can then be analyzed further within the framework of normal brain research to understand their connectivity and collaboration. Figure 2 shows a diagram of this workflow, and a code sketch of the loop is given below.
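A compact driver for steps 1-5, kept deliberately abstract: `run_spatial_ica` and `overlap` are hypothetical callables that do not come from the paper, standing in for any spatial ICA implementation and for a map-similarity measure such as the overlap criterion used later in the analysis.

```python
# Hypothetical sketch of the workflow; run_spatial_ica(X, n_components) is assumed
# to return (spatial_maps, time_courses), and overlap(map_a, map_b) a similarity
# value in [0, 1].  Neither function is defined in the paper.
def cluster_with_incomplete_ica(X, reduction_levels, run_spatial_ica, overlap,
                                min_overlap=0.10):
    # Step 1: complete ICA1 without dimension reduction.
    maps1, tcs1 = run_spatial_ica(X, n_components=X.shape[0])

    results = {}
    for k in reduction_levels:                   # steps 2 and 4: repeated ICA2 runs
        maps2, tcs2 = run_spatial_ica(X, n_components=k)
        # Step 3 is a manual inspection of maps2 for interesting cluster components.
        # Step 5: group ICA1 components by their similarity to each ICA2 map.
        clusters = [[i for i, m1 in enumerate(maps1) if overlap(m1, m2) >= min_overlap]
                    for m2 in maps2]
        results[k] = (maps2, tcs2, clusters)
    return maps1, tcs1, results
```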
Figure 2: The workflow of an fMRI analysis using incomplete ICA for clustering (fMRI data set → ICA1 without dimension reduction and ICA2 with dimension reduction → search for interesting components → compare components with ICA1 → clusters of maps from ICA1).

Due to the data reduction the components will be altered within the incomplete ICA, so for the interpretation of the data set the results of the complete ICA1 should be used.

ANALYSIS OF A WCST FMRI EXAMPLE
The clustering algorithm was then applied to fMRI data of a modified Wisconsin Card Sorting Test of one subject. The subject has the task of sorting subsequently presented cards of symbols with respect to an attribute, like color, shape of the symbol or number of displayed symbols. In the beginning the subject does not know the sorting rule and has to figure it out by trial and error. The test was modified in such a way that the subject also had to look for the spatial position of the symbol. Control blocks were introduced in which the subject knew in advance the attribute that was searched for in the test. The analyzed data set, consisting of 467 scans, was created at the Institute of Medicine at the Research Center Jülich, Germany, preprocessed to remove motion artefacts, normalized and filtered with a Gaussian filter to increase the signal-to-noise ratio. Spatial ICA was used for the analysis, so that the independent components correspond to activation maps and the columns of the mixing matrix correspond to the time courses of these activations (for more information on ICA of fMRI data see e.g. McKeown and Sejnowski (11)). For the clustering with incomplete ICA the data was first reduced via PCA to 450 dimensions, so that almost all information was retained. Then the (spatial) ICA1 was calculated using the extended Infomax algorithm.
For the (spatial) ICA, multiple runs were made using different levels of dimension reduction. The results of these runs were then searched manually for the activation pattern that is typical for this modified Wisconsin Card Sorting Test and that has been found earlier through a classical general linear model analysis. It consists of a bilateral network of frontal and parietal areas that is relevant for the processing of spatial and object-oriented information and for the direction of attention.

Figure 3: The searched-for activation pattern resulting from ICA2 for 10 dimensions. The images appear flipped. In the lower right corner the time course of this activation is displayed.

Figure 3 shows the activation map that was the result of a reduction to 10 dimensions. The searched-for patterns appear only marginally, while the time course (lower right corner of the figure) already shows the frequency of the basic blocks of the experiment.

Figure 4: The searched-for activation pattern resulting from ICA2 for 20 dimensions.
Figure 5: The searched-for activation patterns resulting from ICA2 for 40 dimensions. The ICA splits the network into two patterns with roughly similar time courses.
Figure 4 shows the activation map that was the result of a reduction to 20 dimensions. Here the searched-for patterns already appear well formed. For 40 dimensions the ICA splits the activation into two activation maps with different time courses (see Figure 5). Obviously the network consists of two subnetworks that are used to process the information. For 50 dimensions the quality of the result improves further, as can be seen in Figure 6. This effect is expected, as more and more information is available to the analysis and will be used to construct the components.

We then searched for the components in ICA1 that correspond to the two activation maps found in ICA2 with 50 dimensions. To achieve this we compared the activation maps and looked for components in ICA1 and ICA2 whose maps overlap. Prior to the comparison the components were de-noised: all values of the activation maps that were below 4 times the standard deviation were neglected. This is also the de-noising scheme we used for the figures in this article.
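The de-noising and overlap test described above can be written down as in the sketch below; the 4-standard-deviation threshold and the 10% criterion come from the text, while the use of the absolute value and the normalization by the smaller suprathreshold region are our assumptions, as the paper does not spell them out.

```python
import numpy as np

def denoise(activation_map, k=4.0):
    """Keep only voxels whose (absolute) value reaches k standard deviations;
    the scheme used for the comparison and for the figures in the text."""
    mask = np.abs(activation_map) >= k * activation_map.std()
    return np.where(mask, activation_map, 0.0), mask

def overlap(map_a, map_b, k=4.0):
    """Fraction of suprathreshold voxels shared by two activation maps.
    Normalizing by the smaller suprathreshold region is our assumption."""
    _, a = denoise(map_a, k)
    _, b = denoise(map_b, k)
    smaller = min(a.sum(), b.sum())
    return (a & b).sum() / smaller if smaller > 0 else 0.0

# Components of ICA1 matching an ICA2 cluster map would then be selected as
# [i for i, m in enumerate(maps1) if overlap(m, cluster_map) >= 0.10].
```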
Figure 6: The searched-for activation patterns resulting from ICA2 for 50 dimensions.
Figure 7 shows a subset of those components of ICA1 that overlap at least 10% with the components of ICA2 shown in Figure 6. These components, which form the "cluster maps" of ICA2, are now open for a further study of their interplay using their time courses.

CONCLUSION

We have shown that clustering with incomplete ICA makes it possible to reduce the manual work necessary in an fMRI analysis with ICA. While this is only one part of the ongoing effort to enhance the use of independent component analysis in medicine, we expect that this work will lead to a better understanding of the interconnectivity within the human brain.

ACKNOWLEDGMENT

This work was supported by the German ministry of education and research BMBF (project ModKog) and by the Deutsche Forschungsgemeinschaft (DFG) (GRK 638).
References

1. Frackowiak, R., Friston, K., Frith, C., Dolan, R., and Mazziotta, J., 1997, Human Brain Function, Academic Press, San Diego, USA.
2. Bach, F. and Jordan, M., 2003, Journal of Machine Learning Research, 4, 1205–1233.
3. Hyvärinen, A. and Hoyer, P., 2001, Neural Computation, 13, 1527–1558.
4. Meyer-Bäse, A., Theis, F., Lange, O., and Puntonet, C., 2004, Lecture Notes in Computer Science, Vol. 3195, 782–789, Springer.
5. Bell, A. and Sejnowski, T., 1995, Neural Computation, 7(6), 1129–1159.
6. Hyvärinen, A., 1999, IEEE Transactions on Neural Networks, 10(3), 626–634.
7. Amari, S., 1999, Neural Computation, 11, 1875–1883.
8. Theis, F., Jung, A., Puntonet, C., and Lang, E., 2003, Neural Computation, 15, 419–439.
9. Hyvärinen, A. and Oja, E., 2000, Neural Networks, 13(4–5), 411–430.
10. Belouchrani, A., Abed-Meraim, K., Cardoso, J.-F., and Moulines, E., 1997, IEEE Transactions on Signal Processing, 45(2), 434–444.
11. McKeown, M. and Sejnowski, T., 1998, Human Brain Mapping, 6, 160–188.
Figure 7: A subset of the components of ICA1 that are related to the activation maps shown in Figure 6.