
To appear in conference proceedings of "Spatial-temporal Modelling and its applications", July 1999, Department of Statistics, University of Leeds.

Spatial, Temporal, and Spatiotemporal Independent Component Analysis of fMRI Data

JV Stone (1), J Porrill (1), C Büchel (2), and K Friston (2)

(1) Psychology Department, Sheffield University, Sheffield.
(2) Wellcome Department of Cognitive Neurology, Queen Square, London.

1 Introduction

The fMRI signal associated with a given voxel is affected by a subject's general arousal level, the experimental task being executed, drifting sensor outputs, and noise. Thus, the signal at each voxel consists of a mixture of underlying source signals. One method for separating signal mixtures into a set of statistically independent signals is independent component analysis (ICA) (Bell and Sejnowski, 1995). ICA depends critically on two observations. First, the central limit theorem ensures that a linear mixture of independent source signals tends to be more Gaussian than the sources themselves. Second, source signals tend to be statistically independent of each other. Together, these observations suggest that if a signal mixture can be decomposed into a set of statistically independent, non-Gaussian signals, then these signals are likely to be the source signals of that mixture.

ICA can be used in two complementary ways to decompose an image sequence into a set of images and a corresponding set of time-varying image amplitudes. Spatial ICA (sICA) (McKeown et al., 1998) finds a set of mutually independent component (IC) images and a corresponding (dual) set of unconstrained time courses, whereas temporal ICA (tICA) (Bell and Sejnowski, 1995) finds a set of IC time courses and a corresponding (dual) set of unconstrained images. Critically, for both sICA and tICA, the unconstrained nature of the dual signals permits them to adopt physically improbable forms in order to satisfy the constraint that their corresponding ICs are statistically independent. That is, ICA can find statistically independent signals by exploiting the extra degrees of freedom implicit in the unconstrained dual signals, even if the underlying sources are not statistically independent. ICA can therefore find independent signals that are not the underlying sources.
Thus, the independence of extracted (spatial or temporal) signals can be achieved at the cost of physically improbable forms for their unconstrained (temporal or spatial, respectively) dual signals. However, we can constrain the solutions found by ICA by placing constraints on these dual signals. One natural modification to ICA consists of also maximising the degree of independence between the dual signals. Accordingly, we introduce spatiotemporal ICA (stICA). stICA places the ICs and their dual signals on an equal footing, so that there are two sets of ICs (spatial and temporal). The critical difference between conventional ICA (e.g. sICA, tICA) and stICA is as follows. Conventional ICA achieves independence over space (sICA) or time (tICA) by `sacrificing' the dual temporal (sICA) or dual spatial (tICA) signals. In contrast, stICA maximises the degree of independence over space and time jointly, without necessarily producing complete independence in either space or time. That is, stICA permits a trade-off between the mutual independence of images and the mutual independence of their corresponding time courses.
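The central-limit argument in the introduction, that a mixture of independent sources is closer to Gaussian than the sources themselves, can be checked numerically. The following is a minimal NumPy sketch, not part of the original study, using two hypothetical Laplacian sources (which have the high kurtosis assumed for the fMRI sources in the Results); the helper name `excess_kurtosis` is illustrative. A Laplacian has excess kurtosis 3 and a Gaussian has 0; an equal-weight mixture of two independent Laplacians has excess kurtosis 1.5, so the mixture lies closer to Gaussian than either source.

```python
import numpy as np

def excess_kurtosis(v):
    """Sample excess kurtosis: 0 for a Gaussian, > 0 for peaky, heavy-tailed signals."""
    v = v - v.mean()
    return np.mean(v**4) / np.mean(v**2) ** 2 - 3.0

rng = np.random.default_rng(0)
n_samples = 200_000

# Two independent, hypothetical high-kurtosis (Laplacian) source signals.
s1 = rng.laplace(size=n_samples)
s2 = rng.laplace(size=n_samples)

# An equal-weight linear mixture of the two sources.
mixture = 0.5 * s1 + 0.5 * s2

k1, k2, k_mix = (excess_kurtosis(v) for v in (s1, s2, mixture))
print(f"sources: {k1:.2f}, {k2:.2f}; mixture: {k_mix:.2f}")
```

ICA runs this argument in reverse: it searches for an unmixing that makes the outputs as non-Gaussian (here, as kurtotic) and as independent as possible.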

2 Spatial, Temporal and Spatiotemporal ICA

Given an $m \times n$ data matrix $x = (x_1, \ldots, x_n)$ containing a sequence of $n$ images, each of $m$ pixels and stored as one column, ICA can be used to find either spatial or temporal independent components (ICs).

Spatial ICA: sICA embodies the assumption that $x^t$ can be decomposed as $x^t = A_T z_S$, where $A_T$ is an $n \times k$ mixing matrix in which each column is a temporal sequence, and $z_S$ is a $k \times m$ matrix of $k$ statistically independent images. sICA can be used to obtain the decomposition $y_S = W_T x^t$ with an unmixing matrix $W_T = A_T^{-1}$, where each row vector in $y_S$ is approximately equal to a scaled version of exactly one row vector in $z_S$. This is achieved by maximising the entropy $h_S = H(Y_S)$ of $Y_S = \sigma(y_S)$, where $\sigma$ is a non-linear monotonic function which approximates the cdf of each of the source signals in $z_S$.

Temporal ICA: tICA embodies the assumption that $x$ can be decomposed as $x = A_S z_T$, where $A_S$ is an $m \times k$ mixing matrix in which each column is an image, and $z_T$ is a $k \times n$ matrix of $k$ statistically independent temporal sequences. tICA can be used to obtain the decomposition $y_T = W_S x$ with an unmixing matrix $W_S = A_S^{-1}$, where each row vector in $y_T$ is approximately equal to a scaled version of exactly one row vector in $z_T$. This is achieved by maximising the entropy $h_T = H(Y_T)$ of $Y_T = \tau(y_T)$, where $\tau$ is a non-linear monotonic function which approximates the cdf of the source signals in $z_T$.

Spatiotemporal ICA: Spatiotemporal ICA is a novel technique, which will be described by analogy with singular value decomposition (SVD). Given a sequence of $n$ images $x = (x_1, \ldots, x_n)$ in which each image has $m$ pixels, SVD can be used to find a set of $k \le n$ eigenimages $U = (u_1, \ldots, u_k)$ and a corresponding set of $k$ eigensequences $V = (v_1, \ldots, v_k)$ such that $x \approx U D V^t$, where $k$ is the number of required eigenvectors and the diagonal matrix $D$ contains one singular value per eigenvector. For convenience, we define $\tilde{x} = U D V^t$. If we define $\tilde{U} = U D^{1/2}$ and $\tilde{V} = V D^{1/2}$, then $\tilde{x} = \tilde{U} \tilde{V}^t$.
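Both sICA and tICA reduce to the same entropy-maximisation step applied along different sample axes. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' implementation: it computes the rank-$k$ factors $\tilde{U} = UD^{1/2}$ and $\tilde{V} = VD^{1/2}$, assumes $\sigma = \tau = \tanh$ (the choice made in the Results), and swaps the conjugate gradient method used in the paper for a fixed-step natural-gradient ascent of the entropy. The function names `reduce_svd` and `infomax`, the step size and the iteration count are hypothetical. Following the sICA and tICA definitions, it uses $y = Wz$ with one extracted signal per row.

```python
import numpy as np

def reduce_svd(x, k):
    """Rank-k SVD of an m x n data matrix x (one image per column).

    Returns U_t = U D^(1/2) (m x k) and V_t = V D^(1/2) (n x k), so that
    U_t @ V_t.T is the rank-k approximation x~ = U D V^t.
    """
    U, d, Vt = np.linalg.svd(x, full_matrices=False)
    root_d = np.sqrt(d[:k])
    return U[:, :k] * root_d, Vt[:k].T * root_d

def infomax(z, n_iter=1000, lr=0.1):
    """Natural-gradient infomax ICA with a tanh cdf surrogate (hypothetical helper).

    z is a k x N array whose N columns are samples (time points for tICA,
    pixels for sICA). Returns a k x k unmixing matrix W such that the rows
    of y = W @ z are as independent as possible.
    """
    k, N = z.shape
    W = np.eye(k)
    for _ in range(n_iter):
        y = W @ z
        # d/dy log(1 - tanh(y)^2) = -2 tanh(y), so the natural gradient of
        # log|W| + mean(sum log(1 - tanh(y)^2)) is (I - 2 E[tanh(y) y^t]) W.
        W += lr * (np.eye(k) - 2.0 * (np.tanh(y) @ y.T) / N) @ W
    return W
```

For tICA one would call `infomax` on $\tilde{V}^t$ (samples are time points, so the rows of `W @ V_t.T` are time courses); for sICA, on $\tilde{U}^t$ (samples are pixels, so the rows are images). Because the columns of $\tilde{U}$ and $\tilde{V}$ have norms $\sqrt{d_i}$, this sketch assumes each row of its input has first been rescaled to roughly unit variance; with a fixed step size, very large inputs can otherwise make the update unstable.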
Critically, SVD decomposes $\tilde{x}$ into two matrices $\tilde{U}$ and $\tilde{V}$, and places constraints on the form of both (i.e. orthogonal columns). ICA also decomposes $\tilde{x} = A z$ into two matrices, but places constraints only on $z$ (i.e. independent row vectors). By analogy with SVD, we propose to constrain both the rows of $z$ and the columns of $A$ to be as independent as possible. Using a change of notation, the desired decomposition is defined as $\tilde{x} = S \Lambda T^t$, where $S$ is an $m \times k$ matrix of $k$ mutually independent images, $T$ is an $n \times k$ matrix of $k$ mutually independent sequences, and $\Lambda$ is a diagonal scaling matrix. $\Lambda$ is required to ensure that $S$ and $T$ have amplitudes appropriate to their respective cdfs $\sigma$ and $\tau$. If $\tilde{x} = \tilde{U} \tilde{V}^t$, then two $k \times k$ unmixing matrices $W_S$ and $W_T$ exist such that $S = \tilde{U} W_S$ and $T = \tilde{V} W_T$:

$$\tilde{x} = S \Lambda T^t = \tilde{U} W_S \Lambda (\tilde{V} W_T)^t = \tilde{U} W_S \Lambda W_T^t \tilde{V}^t \qquad (1)$$

Given that $\tilde{x} = \tilde{U} \tilde{V}^t = S \Lambda T^t$, and that $\tilde{U}$ and $\tilde{V}$ have full column rank, Equation (1) requires $W_S \Lambda W_T^t = I$, so that $W_T = (W_S^{-1})^t (\Lambda^{-1})^t$. Because $W_T$ is thus determined by $W_S$ and $\Lambda$, $W_S$ and $W_T$ can be found by simultaneously maximising a function $h$ of the spatial and temporal entropy of the extracted signals. The temporal entropy is $h_T = H(Y_T)$, where $Y_T = \tau(y_T)$ and $y_T = \tilde{V} W_T$ is a set of extracted temporal signals. The spatial entropy is $h_S = H(Y_S)$, where $Y_S = \sigma(y_S)$ and $y_S = \tilde{U} W_S$ is a set of extracted spatial signals. The function $h$ to be maximised is defined as $h(W_S, \Lambda) = \alpha h_S + (1 - \alpha) h_T$, where $\alpha$ defines the relative weighting afforded to spatial and temporal entropy, and where the latter are defined as:

$$h_S = \log |W_S| + \frac{1}{m} \sum_{j=1}^{m} \sum_{i=1}^{k} \log \sigma_i'(y^S_{ij}), \qquad h_T = \log |W_T| + \frac{1}{n} \sum_{j=1}^{n} \sum_{i=1}^{k} \log \tau_i'(y^T_{ij}) \qquad (2)$$

Here $|W|$ denotes the absolute value of the determinant of $W$, and $y^S_{ij}$ ($y^T_{ij}$) is the value of the $i$th extracted image (time course) at pixel (time point) $j$. The functions $\sigma_i$ and $\tau_i$ are assumed to be the cdfs of the individual spatial and temporal signals, respectively, and their derivatives $\sigma_i'$ and $\tau_i'$ are the corresponding pdfs.
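Equation (1), the expression for $W_T$, and the objective in Equation (2) can be combined in a few lines. The following NumPy sketch is an illustration, not the authors' code: it assumes $\sigma_i = \tau_i = \tanh$ for every component, as in the Results, so that $\log \sigma_i'(y) = \log(1 - \tanh^2 y)$, and it stores $\Lambda$ as the vector of its diagonal entries. The names `log_tanh_pdf` and `st_objective` are hypothetical. It returns $h(W_S, \Lambda)$ together with the implied $W_T$.

```python
import numpy as np

def log_tanh_pdf(y):
    """log sigma'(y) for sigma = tanh: log(1 - tanh(y)^2) = -2 log cosh(y).

    log cosh(y) is computed as logaddexp(y, -y) - log 2, which avoids
    taking log(0) when tanh(y)^2 rounds to 1 for large |y|.
    """
    return -2.0 * (np.logaddexp(y, -y) - np.log(2.0))

def st_objective(W_S, lam, U_t, V_t, alpha=0.5):
    """stICA objective h(W_S, Lambda) = alpha*h_S + (1 - alpha)*h_T (Eq. 2).

    W_S: k x k spatial unmixing matrix; lam: length-k diagonal of Lambda;
    U_t: m x k matrix U~; V_t: n x k matrix V~.
    Assumes tanh cdfs for all spatial and temporal signals.
    Returns (h, W_T).
    """
    m, n = U_t.shape[0], V_t.shape[0]
    # W_T = (W_S^{-1})^t (Lambda^{-1})^t; Lambda is diagonal, so its
    # inverse is diag(1/lam) and equals its own transpose.
    W_T = np.linalg.inv(W_S).T @ np.diag(1.0 / lam)
    y_S = U_t @ W_S   # m x k extracted images (one per column)
    y_T = V_t @ W_T   # n x k extracted time courses (one per column)
    # slogdet returns (sign, log|det|); index 1 is log|det|
    h_S = np.linalg.slogdet(W_S)[1] + log_tanh_pdf(y_S).sum() / m
    h_T = np.linalg.slogdet(W_T)[1] + log_tanh_pdf(y_T).sum() / n
    return alpha * h_S + (1.0 - alpha) * h_T, W_T

rng = np.random.default_rng(2)
U_t = rng.standard_normal((50, 4))   # hypothetical stand-in for U~
V_t = rng.standard_normal((30, 4))   # hypothetical stand-in for V~
W_S = np.eye(4) + 0.1 * rng.standard_normal((4, 4))
lam = rng.uniform(0.5, 2.0, size=4)
h, W_T = st_objective(W_S, lam, U_t, V_t)
# Eq. (1): S Lambda T^t reproduces x~ = U~ V~^t
S, T = U_t @ W_S, V_t @ W_T
print(np.allclose(S @ np.diag(lam) @ T.T, U_t @ V_t.T))  # True
```

Because $h$ depends only on $W_S$ and $\Lambda$, one way to maximise it would be to pass $-h$ to a general-purpose optimiser such as `scipy.optimize.minimize` with `method="CG"`, parameterising $\Lambda$ by its logarithm to keep it positive; this optimisation setup is an assumption rather than the procedure reported here.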

3 Results

We applied SVD, sICA, tICA and stICA to $n = 360$ horizontal fMRI slices, obtained from a visual stimulation experiment (Büchel and Friston, 1997). The rows of each ($53 \times 63$) image were concatenated to form one column in an $m \times n$ data matrix $x$. $x$ was pre-processed so that each row and each column had zero mean. SVD was applied to $x$ to obtain $\tilde{x} = \tilde{U} \tilde{V}^t$. It was assumed that the spatial and temporal sources have high kurtosis, so that their cdfs are approximated by $\sigma = \tau = \tanh$. We used $k = 4$ eigenvectors as input to sICA, tICA and stICA, and their respective solutions were initialised with the same (random) unmixing matrix. Rather than using the data matrix $x$, we used $\tilde{U}$ for sICA and $\tilde{V}$ for tICA. Solutions for sICA, tICA and stICA (with $\alpha = 0.5$) were obtained using a conjugate gradient method (each run required about 3 minutes on an SGI Indy R4400).

The different fMRI experimental conditions define a periodic visual stimulation. The ability to isolate this periodic pattern (and the corresponding activation of visual area V1) is used as a measure of success. Figure 1 displays the `best' periodic time course extracted by each of the four methods. Space limitations preclude inclusion of the corresponding images.

SVD: SVD was used to extract four eigenimages and their corresponding eigensequences. SVD failed to isolate the periodic pattern: the pattern is a prominent feature of three of the eigensequences, and weak V1 activation appears in the corresponding eigenimages.

tICA: The four ICs resulting from tICA are time courses. Results are qualitatively similar to those obtained with SVD.

sICA: The four ICs resulting from sICA are images, which contain non-overlapping regions of activity. These are: 1) visual area V1, with a dual time course that is a well-defined periodic signal; 2) localised frontal activity, with a dual high-frequency time course superimposed on two abrupt changes in amplitude, located at two of the three rest periods in the experiment; 3) the eye sockets (i.e. eye movements), with a dual high-frequency time course superimposed on a single abrupt change in amplitude, located at one of the two abrupt changes mentioned in 2); 4) spatially distributed activity, with a high-frequency time course superimposed on an underlying, approximately monotonic signal.

stICA: The components extracted by stICA consist of a set of IC images and a corresponding set of IC time courses. Results are broadly similar to those reported for sICA above. The main difference is that the abrupt changes in the amplitude of the time courses are now confined to one IC time course, so that the large amplitude changes in the eye socket activations produced by sICA are absent with stICA.

The similarity of the results from sICA and stICA can be explained as follows. If each source image has a cdf which is approximated by $\sigma$, then sICA extracts these source images; the image ICs found by sICA define a unique set of dual time courses, and these are correct if and only if the IC images extracted are the source images. The same type of reasoning can be used to explain why tICA failed to find the temporal ICs. If each source time course has a cdf which is poorly approximated by $\tau$, then tICA fails to extract these time courses. The time course ICs found by tICA define a unique set of dual images, which are incorrect if the IC time courses extracted are not the source time courses. stICA can be viewed as a constrained version of tICA. That is, the poor results of tICA can apparently be improved by imposing the constraint that the dual images of tICA must be as independent as possible. Such a constraint appears to prevent stICA from producing the spurious results of tICA. This suggests that stICA can work if either $\sigma$ or $\tau$ approximates the spatial or temporal cdf (respectively) of the sources.
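The construction of the data matrix described at the start of the Results can be sketched as follows. This is a minimal NumPy illustration under assumptions: the image stack is taken to be an array of shape (n, 53, 63), the function names `build_data_matrix` and `double_center` are hypothetical, and row and column means are removed in a single pass (subtracting the row means and then the column means leaves both sets of means at zero, because the second step removes the same overall mean, which is already zero, from every row).

```python
import numpy as np

def build_data_matrix(images):
    """Stack an image sequence into an m x n matrix, one image per column.

    images: array of shape (n, rows, cols). Each image's rows are
    concatenated (row-major flattening) to form one column.
    """
    n = images.shape[0]
    return images.reshape(n, -1).T

def double_center(x):
    """Remove row means, then column means; both end up zero."""
    x = x - x.mean(axis=1, keepdims=True)
    return x - x.mean(axis=0, keepdims=True)

rng = np.random.default_rng(1)
# Hypothetical stand-in for the 360 horizontal fMRI slices.
images = rng.standard_normal((360, 53, 63)) + 5.0
x = double_center(build_data_matrix(images))
print(x.shape)  # (3339, 360)
```

The centred matrix `x` would then be passed to a rank-4 SVD to obtain $\tilde{U}$ and $\tilde{V}$.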

Uniquely, stICA is based on the assumption that both spatial and temporal signals are independent. Since this assumption is rarely valid, stICA provides solutions in which the degree of spatial independence and the degree of temporal independence are maximised jointly, with their relative weighting set by $\alpha$, rather than one being maximised at the expense of the other. Essentially, stICA is based on the physically realistic assumption that both spatial and temporal sources are almost independent. This permits recovery of sources that are correlated over time and space. Such sources are likely to exist in a classically spatiotemporal medium such as brain tissue, in which correlations in activity between nearby points in space and time are the norm. These considerations make stICA a potentially powerful tool for the functional decomposition of brain activity.

Acknowledgements: J Stone is a Wellcome Mathematical Biology Research Fellow.

References

McKeown, M.J. et al. (1998). Proceedings of the National Academy of Sciences, 95:803-810.
Bell, A.J. and Sejnowski, T.J. (1995). Neural Computation, 7:1129-1159.
Büchel, C. and Friston, K. (1997). Cerebral Cortex, 7:768-778.

[Figure 1 panels: SVD time course; Temporal ICA time course; Spatial ICA time course; Spatiotemporal ICA time course]

Figure 1: Time course associated with 360 fMRI scans for each of four methods. The oscillations of the time course correspond to alternating experimental conditions of visual stimulation. Only the time courses of sICA and stICA are associated with a well-defined region of activity in the primary visual cortex (not shown here).