Biol Cybern (2007) 97:33–45 DOI 10.1007/s00422-007-0154-4
ORIGINAL PAPER
DWT–CEM: an algorithm for scale-temporal clustering in fMRI João Ricardo Sato · André Fujita · Edson Amaro Jr · Janaina Mourão Miranda · Pedro Alberto Morettin · Michal John Brammer
Received: 4 October 2006 / Accepted: 14 March 2007 / Published online: 30 May 2007 © Springer-Verlag 2007
Abstract The number of studies using functional magnetic resonance imaging (fMRI) has grown very rapidly since the first description of the technique in the early 1990s. Most published studies have utilized data analysis methods based on voxel-wise application of general linear models (GLM). On the other hand, temporal clustering analysis (TCA) focuses on the identification of relationships between cortical areas by measuring common temporal properties. In its most general form, TCA is sensitive to the low signal-to-noise ratio of BOLD and is dependent on subjective choices of filtering parameters. In this paper, we introduce a method for wavelet-based clustering of time-series data and show that it may be useful in data sets with low signal-to-noise ratios, allowing the automatic selection of the optimum number of clusters. We also provide examples of the technique applied to simulated and real fMRI datasets.

J. R. Sato (B) · A. Fujita · P. A. Morettin
Institute of Mathematics and Statistics, University of São Paulo, Rua do Matão, 1010, Cidade Universitária, CEP 05508-090, São Paulo, S.P., Brazil
e-mail: [email protected]

A. Fujita
e-mail: [email protected]

P. A. Morettin
e-mail: [email protected]

E. Amaro Jr
LIM44-NIF, Department of Radiology, University of São Paulo, Av. Dr. Enéas de Carvalho Aguiar, 255, 3o. andar, Cerqueira César, CEP 05403-001, São Paulo, S.P., Brazil
e-mail: [email protected]

J. M. Miranda · M. J. Brammer
Brain Image Analysis Unit, Institute of Psychiatry, King's College London, De Crespigny Park, London SE5 8AF, UK
e-mail: [email protected]

M. J. Brammer
e-mail: [email protected]
1 Introduction

Functional neuroimaging is making an increasingly important contribution to experimental neuroscience. One of the most popular imaging methods is fMRI, which infers changes in brain function from the temporal evolution of the BOLD signal (Ogawa et al. 1990), an indirect measure of brain activity. Following the initial description of the BOLD technique, a very large number of approaches for data analysis have been described. Despite this apparent variety in methodology, GLM-based (general linear model) methods, in which the fits between measured BOLD responses and estimated haemodynamic response functions (HRF) are evaluated voxel by voxel, are by far the most commonly used for producing images of brain regions related to the task performed. Data-driven approaches represent an attractive alternative when HRF predictions are either unavailable or imprecise (i.e., when the response model is not well established) and may also be a valuable adjunct to GLM-based methods when timing or waveshape uncertainties reduce goodness of fit. A large class of analysis methods is available (e.g., clustering and signal separation methods such as support vector machines (Mourao-Miranda et al. 2005), self-organizing maps (SOM), k-means (Dimitriadou et al. 2004) and ICA (Biswal et al. 1999)) for which an estimated HRF is not necessary. These methods focus on the extraction or identification of commonalities and patterns in BOLD signals across voxels. Temporal cluster analysis (Yee and Gao 2002; Gao and Yee 2003) does not require an input HRF model and has been used to identify neural modules and networks. Most temporal clustering algorithms group data based on similarity
measures between BOLD signals. Dimitriadou et al. (2004) have performed extensive comparisons of several different clustering algorithms currently used for fMRI analysis. The results suggest a better performance of k-means and neural gas networks than the other methods tested. Despite the apparent attractiveness of data-driven temporal clustering methods such as ICA, k-means and fuzzy clustering for fMRI analysis (see Jahanian et al. 2004; Baumgartner et al. 2000), they often require the specification of the expected number of clusters or components. In fMRI datasets, however, the number of clusters often cannot be determined objectively and frequently is either a subjective choice or is obtained using a heuristic approach (Fadili et al. 2001). The classification expectation maximization (CEM) clustering algorithm (Celeux and Govaert 1992, 1995) is a potentially appealing model-based solution that does not require a priori specification of the number of clusters. It is based on probabilistic mixture distribution theory and is an iterative procedure involving classification, maximum likelihood estimation and expectation–maximization steps. The main advantage of CEM, compared to independent component analysis or k-means clustering, is that the total number of clusters is not fixed (a priori or a posteriori), but determined from the data using the Akaike (AIC) or Bayesian information criterion (BIC), the properties of which are mathematically well established. Nevertheless, CEM does require the prior specification of the probability distribution of the clusters, since it is based on the maximum likelihood principle. Further, the low signal-to-noise ratio (SNR) of the BOLD signal is another potential obstacle to clustering time series with CEM, as the algorithm would cluster noise rather than signal in cases of low SNR. In this work, we introduce a combination of the discrete wavelet transform (DWT) and the CEM algorithm as a method of fMRI analysis that may be valuable when GLM model assumptions are questionable or the signal-to-noise ratio is low. The DWT step allows flexibility in response detection and also produces the probability density behavior necessary for model-based clustering. We present the theoretical basis of the wavelet transform and the CEM algorithm, show how both techniques can be combined, and illustrate the method by numerical simulations and analysis of real fMRI datasets.
2 Theory

2.1 The discrete wavelet transform (DWT)

Wavelet analysis has been widely and successfully applied in signal processing and image analysis. One common wavelet-based technique is multiresolution analysis of a time series, in which the latter is decomposed into a representation at
different temporal detail levels or resolutions (Daubechies 1992). This is superficially analogous to Fourier analysis, in which the time series energy is transformed into a frequency representation. However, the wavelet representation of the time series has the advantage of being localized in time as well as in scale, and offers advantages for the representation and analysis of signals in which local transients may be important features. Basically, the aim of multiresolution analysis is the approximation of a function f over many levels of resolution. Each subspace is composed of wavelet functions and the optimal approximation is given by the orthogonal projection of f on each subspace. Many authors have commented that the extensive application of wavelet analysis in fMRI reflects its flexibility and ability to address significant image processing issues (e.g., transient signals, colored noise). In fMRI studies, wavelets are commonly used for the evaluation of statistical significance of parameters by permutation and bootstrap resampling and/or by making use of the decorrelating properties of the wavelet transform (Bullmore et al. 2001, 2003; Fadili and Bullmore 2003). In addition, whereas these applications utilized wavelet transformation in the time domain of fMRI data, other studies have extended the application of wavelet analysis by using spatiotemporal wavelet analysis (Long et al. 2004; Van De Ville et al. 2004; Aston et al. 2005). Bullmore et al. (2004) and Shimizu et al. (2004) have also used wavelets to identify fractal structure in fMRI data. Details and formal definitions regarding wavelets can be found in Daubechies (1992) and Vidakovic (1999). A wavelet basis is generated by dilations and translations of a mother wavelet, i.e., if Z denotes the set of integers,

$$\psi_{j,k}(t) = 2^{j/2}\,\psi(2^{j}t - k), \quad j, k \in \mathbb{Z}, \qquad (1)$$

where $\psi(t)$ should satisfy certain properties (see Daubechies 1992). The discrete wavelet transform of the data $(X_0, \ldots, X_{T-1})$ is defined by

$$d_{j,k} = \sum_{t=0}^{T-1} X_t\, \psi_{j,k}(t). \qquad (2)$$
Note that by applying the DWT we obtain a set of coefficients for each scale of the decomposition. Hence, assuming that acquisition artifacts and haemodynamic responses lie in different scales, multiresolution analysis facilitates discrimination between signal and noise. To optimize computational efficiency, the discrete wavelet transform is commonly calculated by a cascade or pyramidal algorithm (Vidakovic 1999). The algorithm is based on down-sampling (decimation) and the application of a sequence of low- and high-pass filters (the filter coefficients are related to the chosen wavelet basis). Figure 1 illustrates the discrete wavelet transform of a signal
and of a signal with added Gaussian white noise, indicating that the significant effects of the noise lie mostly in the fine detail scales. An important theoretical property of the wavelet transform of stochastic time series is its asymptotic normality (Vidakovic 1999). This is a useful property in linking the wavelet transform to the CEM algorithm, which will be described in the next section.
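To make the multiresolution idea concrete, the following minimal Python sketch (not from the paper; the PyWavelets package, the 'db8' wavelet and the toy signal are illustrative assumptions) decomposes a smooth periodic signal and its noisy version with the pyramidal DWT and compares the mean absolute coefficient per scale: the coarse levels retain the smooth signal, while the fine detail levels are dominated by the added white noise.

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
T = 128
t = np.arange(T)

signal = np.sin(2 * np.pi * t / 32.0) ** 3          # toy smooth periodic signal
noisy = signal + rng.normal(scale=0.5, size=T)      # same signal + Gaussian white noise

# Pyramidal DWT: wavedec returns [approximation, detail level 3, detail level 2, detail level 1].
c_signal = pywt.wavedec(signal, wavelet='db8', level=3)
c_noisy = pywt.wavedec(noisy, wavelet='db8', level=3)

labels = ['approximation', 'detail level 3', 'detail level 2', 'detail level 1']
for name, cs, cn in zip(labels, c_signal, c_noisy):
    # Coarse levels retain the smooth signal; fine detail levels are noise-dominated.
    print(f'{name:>15}: mean |d| signal = {np.abs(cs).mean():.3f}, '
          f'noisy = {np.abs(cn).mean():.3f}')
```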
2.2 CEM algorithm (classification expectation maximization)

Clustering consists of rearranging data by categorizing or grouping similar items together. Clustering methods are divided into two basic types: supervised and unsupervised. Supervised clustering uses well-characterized datasets to train an algorithm before clustering an unknown dataset. Information from outside the experiment can help to structure the clusters, which is useful when pattern similarity is difficult to define. Unsupervised clustering does not use prior information. Commonly used methods are hierarchical clustering, k-means (McQueen 1967) and self-organizing map (SOM) (Kohonen 1995) analyses. The k-means and SOM analyses are partition clustering methods that account for common characteristics (features) of the elements in a dataset. In k-means clustering, a pre-determined value k (the number of expected clusters) is needed. The algorithm calculates the distance between each item and the centroid of each cluster. If an item is closer to the centroid of a cluster other than the one to which it is currently assigned, it is reassigned to that cluster. After assigning all items to their closest clusters, the centroids are recalculated. The algorithm stops after a preset number of iterations or when the cluster centroids no longer change. The SOM analysis is similar to k-means, but instead of allowing centroids to move freely in multidimensional space, they are constrained to a two-dimensional grid. The algorithm then organizes itself to best accommodate the data in this grid. The end result is a clustering of the data in a grid of whatever size is specified; the grid structure implies relationships between neighboring clusters. A problem with these clustering methods is that, in the main, they prefer certain cluster shapes, and the algorithms will assign the data to clusters of such shapes even if no clusters are present. Another potential problem is that the choice of the number of clusters may be critical and subjective, i.e., different kinds of clusters may emerge in response to the prior specification of different numbers of clusters. In order to address these problems, CEM (classification expectation maximization) was developed by Celeux and Govaert (1992) and has been applied successfully in molecular biology by Yeung et al. (2001). Further, the stability of the CEM algorithm in processing large datasets is very important, as fMRI scanning involves the acquisition of data at many voxels and time points.

Consider the case in which we have n observations of a d-dimensional variable x. Let the observation matrix be

$$\mathbf{x} = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n), \qquad (3)$$

and the unknown associated label variables (for K groups)

$$\mathbf{z} = (\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_n), \qquad (4)$$

where

$$\mathbf{z}_i = (z_{i1}, z_{i2}, \ldots, z_{iK}), \qquad (5)$$

and $z_{ig} = 1$ if observation $\mathbf{x}_i$ belongs to group g and 0 otherwise. The log-likelihood is given by

$$l(\theta\,|\,\mathbf{x}, \mathbf{z}) = \sum_{i=1}^{n}\sum_{g=1}^{K} z_{ig} \log\left[p_g\, f(\mathbf{x}_i\,|\,\lambda_g)\right], \qquad (6)$$

where θ is a vector containing the mixture parameters $p_i, \lambda_i$ ($i = 1, 2, \ldots, K$), and $f(\mathbf{x})$ is the probability density function of x. A diagram illustrating the steps of the CEM algorithm is shown in Fig. 2. The first step is to set initial values for the mixture parameters $\theta^{(0)}$, which contain the proportions, means and variances of the Gaussian mixture. Then, the current conditional probabilities of each observation belonging to group g are estimated considering the current estimates of the vector of mixture parameters $\theta^{(m-1)}$, where $m = 1, 2, \ldots, M$ denotes the iteration number. The conditional probabilities (expectation step) are estimated using the formula

$$w_{ig}^{(m)} = \frac{p_g^{(m-1)}\, f(\mathbf{x}_i\,|\,\lambda_g^{(m-1)})}{\sum_{l=1}^{K} p_l^{(m-1)}\, f(\mathbf{x}_i\,|\,\lambda_l^{(m-1)})}, \quad \text{for } g = 1, 2, \ldots, K. \qquad (7)$$

The following step (classification) is to allocate each observation to the group with the highest probability of containing it, i.e., to label the data according to

$$\hat{z}_{ig}^{(m)} = \begin{cases} 1, & \text{if } g = \arg\max_{l=1,\ldots,K} w_{il}^{(m)}, \\ 0, & \text{otherwise.} \end{cases} \qquad (8)$$

The maximization step is then achieved by obtaining maximum likelihood estimates of the mixture parameters in $\theta^{(m)}$. The algorithm then returns to the expectation step until convergence of all parameters and labels has been achieved within allowed limits. Note that, as CEM is based on likelihood maximization, it requires additional information about the probability distribution of the data, which in most cases is assumed to be multivariate Gaussian. For automatic selection of the number of clusters K, the Bayesian information criterion (BIC) introduced by Schwarz (1978) is commonly employed.
Fig. 1 The discrete wavelet transform of a function and function + Gaussian white noise, respectively
The asymptotic properties and consistency of the BIC are well known and have been described extensively in the literature (Hannan and Quinn 1979; Haughton 1988). The optimum number of clusters minimizes the quantity

$$\mathrm{BIC}_K = -2\, l(\mathbf{x}\,|\,K, \hat{\theta}) + v_K \log(n), \qquad (9)$$

where $v_K$ is the number of parameters in a model with K clusters. As CEM involves maximum likelihood estimation and the calculation of conditional probabilities at each iteration, it is computationally intensive. For very large datasets, this may lead to convergence problems and singularities. These issues have received considerable attention, and a very efficient (O(n)) and stable implementation of the CEM algorithm for large datasets using the BIC criterion has been described by Harris et al. (2000) and can be found at http://klustakwik.sourceforge.net/.
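As an illustration of the iteration described by Eqs. (7)–(9), the sketch below gives a minimal NumPy implementation of the E-, C- and M-steps for a diagonal-covariance Gaussian mixture, with K selected by minimizing the BIC. It is an assumed re-implementation for exposition only, not the authors' code nor the KlustaKwik software cited above; the helper name cem and all parameter values are hypothetical.

```python
import numpy as np

def cem(x, K, n_iter=50, eps=1e-6, seed=0):
    """Classification EM for a K-component diagonal Gaussian mixture (sketch)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    z = rng.integers(0, K, size=n)                 # random initial hard labels
    p = np.full(K, 1.0 / K)
    mu = x[rng.choice(n, K, replace=False)]
    var = np.tile(x.var(axis=0) + eps, (K, 1))
    for _ in range(n_iter):
        # E-step: log[p_g f(x_i | lambda_g)] for every observation and component (Eq. 7 numerator).
        logdens = np.empty((n, K))
        for g in range(K):
            logdens[:, g] = (np.log(p[g])
                             - 0.5 * np.sum(np.log(2 * np.pi * var[g]))
                             - 0.5 * np.sum((x - mu[g]) ** 2 / var[g], axis=1))
        # C-step: allocate each observation to its most probable component (Eq. 8).
        z_new = logdens.argmax(axis=1)
        if np.array_equal(z_new, z):
            break
        z = z_new
        # M-step: ML estimates of proportions, means and variances per component.
        for g in range(K):
            members = x[z == g]
            if len(members) == 0:                  # guard against empty clusters
                members = x[rng.choice(n, 1)]
            p[g] = members.shape[0] / n
            mu[g] = members.mean(axis=0)
            var[g] = members.var(axis=0) + eps
        p = p / p.sum()
    loglik = logdens.max(axis=1).sum()             # classification log-likelihood
    v_K = K * (1 + 2 * d) - 1                      # proportions + means + variances
    bic = -2.0 * loglik + v_K * np.log(n)          # Eq. (9)
    return z, bic

# Choose K by minimizing the BIC over a range of candidate cluster numbers.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0, 1, (200, 4)), rng.normal(3, 1, (100, 4))])
results = {K: cem(x, K) for K in range(1, 6)}
best_K = min(results, key=lambda K: results[K][1])
print('selected K =', best_K)
```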
3 Methods

3.1 DWT–CEM in fMRI

The aim of this work is, given a set of time series corresponding to the BOLD signal in all voxels within the brain, to cluster the voxels using the similarities in the signal waveshapes. The direct application of the CEM algorithm to fMRI data is not effective due to the low signal-to-noise ratio of the BOLD response, especially in lower-field magnets or in experiments where the magnitude of the BOLD change is low, an increasingly common situation as fine changes in brain function are probed using fMRI. Some authors (Biswal et al. 1995) suggest a low-pass filtering procedure, but this solution requires assumptions regarding the properties of the filter. An objective criterion would be the use of information present in the paradigm design, which suggests the expected degree of smoothness of the haemodynamic response. Furthermore, CEM is a model-based clustering algorithm and requires knowledge of the nature of the probability density function in each cluster (e.g., Gaussian).
Fig. 2 CEM algorithm

This assumption is too restrictive for fMRI data, as the BOLD signal is not stationary, the observations are not independent and the scanner noise components are not Gaussian. As seen in the previous section, wavelet analysis is potentially a useful approach to confronting signal-to-noise problems, as it provides a multiresolution analysis (Vidakovic 1999) of the time series, in which the signal may be analyzed at a number of different levels of temporal resolution. Furthermore, using the central limit theorem, it can be shown that the wavelet transform of a stochastic time series is asymptotically Gaussian (Vidakovic 1999). These properties suggest that a direct combination of the DWT and the CEM algorithm can be appropriate for clustering fMRI time series. Thus, we propose the following scale-clustering algorithm (see the flowchart in Fig. 3 and the illustrative sketch that follows it):

Step 1: Extract the wavelet coefficients of the detail level in the scale(s) of interest.

Step 2: Apply the CEM algorithm to the wavelet coefficients in this (these) scale(s). Each wavelet coefficient is a dimension of the multivariate observations, i.e., the observation matrix is

$$\mathbf{x} = \begin{bmatrix} d_{j,0}^{(1)} & d_{j,0}^{(2)} & \cdots & d_{j,0}^{(n)} \\ d_{j,1}^{(1)} & d_{j,1}^{(2)} & \cdots & d_{j,1}^{(n)} \\ \vdots & \vdots & \ddots & \vdots \\ d_{j,2^{j}-1}^{(1)} & d_{j,2^{j}-1}^{(2)} & \cdots & d_{j,2^{j}-1}^{(n)} \end{bmatrix},$$

where $d_{j,k}^{(l)}$ denotes the k-th wavelet coefficient at scale j of the l-th observation.

Step 3: After obtaining the labels, extract the average or representative time series corresponding to each cluster.
Fig. 3 Wavelets clustering algorithm for fMRI
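A minimal sketch of Steps 1–3 is given below, under the assumption that the masked and normalized BOLD data are available as an (n_voxels × T) array; the function name dwt_cem_pipeline and all parameter values are hypothetical, and scikit-learn's GaussianMixture (standard EM) is used only as a stand-in for the CEM step (the CEM sketch of Sect. 2.2 could be substituted).

```python
import numpy as np
import pywt
from sklearn.mixture import GaussianMixture

def dwt_cem_pipeline(bold, scale=3, wavelet='db8', n_clusters=12):
    """bold: (n_voxels, T) array of z-scored BOLD time series (hypothetical input)."""
    n_voxels, T = bold.shape
    # Step 1: decompose each voxel down to the scale of interest and keep the
    # detail coefficients at that level as the feature vector.
    feats = np.vstack([pywt.wavedec(bold[v], wavelet=wavelet, level=scale)[1]
                       for v in range(n_voxels)])
    # Step 2: model-based clustering of the coefficient vectors
    # (standard EM via GaussianMixture, used here as a stand-in for CEM).
    gmm = GaussianMixture(n_components=n_clusters, covariance_type='diag',
                          random_state=0).fit(feats)
    labels = gmm.predict(feats)
    # Step 3: representative (average) time series for each cluster.
    means = []
    for g in range(n_clusters):
        members = bold[labels == g]
        means.append(members.mean(axis=0) if len(members) else np.zeros(T))
    return labels, np.array(means)

# Toy usage with random data standing in for a masked, normalized fMRI volume.
rng = np.random.default_rng(1)
bold = rng.standard_normal((500, 128))
labels, cluster_series = dwt_cem_pipeline(bold, scale=3, n_clusters=4)
print(labels.shape, cluster_series.shape)   # (500,), (4, 128)
```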
Fig. 4 Some wavelet functions: the Haar, D4, D8 and D16 wavelets, plotted as ψ(x) against x
These time series describe the cluster BOLD signal and can be used to identify clusters of interest, i.e., those related to the experimental paradigm. This last step is similar to the strategy used in ICA to identify independent components of interest. Considering the smoothness of the expected haemodynamic response, we suggest the use of the D8 or D16 wavelet bases (see Fig. 4; Daubechies 1992). The wavelet with the degree of smoothness most similar to the expected signal variation should be used. However, as a property of multiresolution analysis, the effects of an incorrect specification will be diluted through all scales, and the results will be approximately the same regardless of the wavelet chosen. On the other hand, the selection of the scale of interest from the wavelet decomposition of the data is of great importance. If the scale is too fine, a high level of noise will be included; in contrast, low-frequency global trends that are not related to experimental responses will be included in the cluster analysis if too coarse a scale of wavelet coefficients is chosen. An objective criterion for the selection is to identify the scale of the wavelet decomposition with the largest mean absolute value (energy) of the wavelet coefficients obtained by applying the DWT to the experimental design. In the case of two or more types of stimuli with energy in different scales, one may use the two or more respective scales. Computationally, prior specification of the detail level is relatively unimportant using modern high-speed computers, as the number of voxels is much greater than the number of time points. However, as the interest is in obtaining information related to the design paradigm, the wavelet coefficients in other scales would act as confounding variables.
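The scale-selection criterion just described can be sketched as follows, assuming a 0/1 block-design regressor; the block length, the wavelet ('db4' in PyWavelets naming) and the decomposition depth are illustrative assumptions.

```python
import numpy as np
import pywt

# Illustrative 0/1 block-design regressor: eight cycles of 8 volumes off / 8 volumes on.
design = np.tile(np.r_[np.zeros(8), np.ones(8)], 8)        # length 128

# DWT of the design; coeffs = [approximation, detail level 4, ..., detail level 1].
coeffs = pywt.wavedec(design, wavelet='db4', level=4)
energy = {level: np.abs(c).mean()                           # mean absolute coefficient
          for level, c in zip(range(4, 0, -1), coeffs[1:])}
scale_of_interest = max(energy, key=energy.get)
print(energy, '-> scale of interest:', scale_of_interest)
```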
This dimension reduction and the minimization of the effects of confounding or misleading variables are desirable, as CEM is an unsupervised clustering algorithm and will cluster voxels on the basis of similar noise as well as similar signal characteristics. Although the use of experimental design information to select the scale of interest for analysis may appear to contradict the basic aim of using exploratory data analysis (EDA) rather than a GLM approach, this view is probably oversimplistic. The main goal of using EDA is often to avoid the bias introduced by incorrect assumptions of parametric models. However, valid assumptions will lead to higher sensitivity in detecting responses. If information about the experimental design is available and reliable, we argue that its inclusion will improve the power and performance of the DWT–CEM method, as it will inform our analysis without imposing the more rigid (and possibly incorrect) assumptions required for GLM-based methods. Thus, the clustering technique described here can also be used with a priori information regarding the scale selection for the wavelet decomposition. We will illustrate this possibility in the next sections. The connection between the wavelet transform and the CEM algorithm should be highlighted: the wavelet transform is not only important for discriminating signal from artifacts (noise, drifts, etc.) but also results in an approximately Gaussian distribution of coefficients. This is an important step, as multivariate normality is an assumption underlying the application of the CEM algorithm. Clustering time series in multisubject experiments is also an interesting possibility, allowing the extraction of common components not only among voxel time series but also across subjects. We suggest clustering the
voxel BOLD signals of all subjects jointly. In other words, the wavelet coefficients of the BOLD signals of all subjects are the inputs to the CEM algorithm.

3.2 Simulations

Computational simulations were performed in order to evaluate the proposed combination of the DWT and the CEM algorithms. The first point of interest is to verify whether the assumption of a Gaussian distribution of wavelet coefficients is still valid in the case of autocorrelated time series, which is the situation for most fMRI datasets. For all simulations we used the D16 wavelet (Daubechies 1992) in the DWT. Different types of noise (white, AR(1) and long memory) were added to a simulated HRF (T = 128 time points) consisting of a linear combination of two Poisson functions with peaks at 4 and 8 s, replicated six times in a block design (see Fig. 1, top). The AR(1) parameter was 0.8, and for the long-memory time series fractionally integrated white noise (d = 0.4) was considered (Hosking 1981). Furthermore, the accuracy of the combination of the wavelet transform and the CEM algorithm was evaluated in three simulated situations. These were (a) a first response, HRF-1, consisting of a linear combination of two Poisson functions with peaks at 4 and 8 s replicated six times in a block-design fashion (T = 128 time points); (b) a second, out-of-phase response, HRF-2 (see Fig. 6 for illustrative examples); and (c) no haemodynamic response. Considering that only a small percentage of voxels in fMRI datasets normally respond to any given stimulus, the simulated data were composed of 4,096 time series: 128 (3.125%) using HRF-1, 128 (3.125%) using HRF-2 and 3,840 (93.75%) with no response. Gaussian white noise was then added to these curves (SNR = 0.5 and 1, SNR here indicating the ratio of signal to noise standard deviations). The DWT–CEM algorithm was then applied to the simulated data (200 simulations). In order to compare the CEM algorithm with other clustering methods, we also applied k-means and fuzzy clustering (FCM) methods (Jahanian et al. 2004; Baumgartner et al. 2000) to the wavelet-transformed simulated data. For each simulation, the maximum number of clusters was selected using the BIC.
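The sketch below illustrates how simulated data of this kind can be generated; the Poisson weighting, the phase shift and the random seed are assumptions for illustration, not the precise values used in the paper.

```python
import numpy as np
from scipy.stats import poisson

T, n_series, snr = 128, 4096, 0.5
t_block = np.arange(T // 6 + 1)

# HRF-1: a linear combination of two Poisson functions peaking near 4 and 8 s,
# replicated six times over the run; HRF-2 is an out-of-phase (shifted) copy.
# The 0.5 weight and the shift of T//12 volumes are illustrative assumptions.
block = poisson.pmf(t_block, 4) - 0.5 * poisson.pmf(t_block, 8)
hrf1 = np.tile(block, 6)[:T]
hrf2 = np.roll(hrf1, T // 12)

rng = np.random.default_rng(2)
responses = np.vstack([np.tile(hrf1, (128, 1)),            # 3.125% of series: HRF-1
                       np.tile(hrf2, (128, 1)),            # 3.125% of series: HRF-2
                       np.zeros((n_series - 256, T))])     # 93.75%: no response
noise_sd = hrf1.std() / snr                                 # SNR = signal sd / noise sd
data = responses + rng.normal(scale=noise_sd, size=responses.shape)
print(data.shape)                                           # (4096, 128)
```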
3.3 Data acquisition and image preprocessing

In order to illustrate the usefulness and performance of the proposed approach, DWT–CEM was also applied to two different datasets. The first is a simple visual–auditory stimulation fMRI experiment for which the HRF model and the expected responding brain regions are well established. The second dataset is from a more complicated study related to the limbic system, involving responses to pleasant and unpleasant stimuli. Here the expected responses are less well defined, both in terms of timing and shape, as the task requires complex cognitive processing. The images were preprocessed by realignment to minimize the effects of subject motion, slice-time correction and spatial smoothing. GLM activation maps in individual native space were obtained using the software XBAM, written in-house at the Institute of Psychiatry (Brammer et al. 1997).
3.4 Visual and auditory experiment

Four healthy volunteers were scanned for the experiment involving visual and auditory stimulation, with the approval of the Ethics Committee of the Hospital das Clínicas, University of São Paulo (Brazil). All images were collected using a 1.5 Tesla GE Signa scanner (TR = 2 s, TE = 40 ms, 24 slices oriented to the AC/PC line, 128 volumes acquired). The visual stimulus consisted of an AB periodic block design (block duration = 24 s, six cycles), alternating an 8 Hz flashing checkerboard with a fixation cross in the centre of an average gray-level background. The auditory stimulus was delivered in the same run, based on an AB periodic block design (block duration = 36 s, four cycles), alternating between silence and passive listening to words via MR-compatible headphones (background scanner noise was present in both conditions). In summary, the subject was exposed to visual and auditory stimulation, both presented in block designs with different cycle durations and out of phase with each other. For this experiment, the DWT–CEM computation took approximately 3 h on a Pentium 4 2.66 GHz.
3.5 Pleasant vs. unpleasant experiment

A healthy right-handed US male college student participated in this fMRI study (carried out in accordance with the local ethics committee of the University of North Carolina). The images were acquired at the Magnetic Resonance Imaging Research Center of the University of North Carolina using a 3 Tesla Allegra head-only scanner (Siemens, Germany). The acquisition was performed using a T2* sequence, 43 slices (thickness of 3 mm, no gap), TR = 3 s, TE = 30 ms, FA = 80°, FOV = 192 × 192 mm, 64 × 64 matrix (3 × 3 mm), 254 volumes. Three different active visual stimuli and a baseline were presented in a block design. The conditions were: unpleasant (dermatological diseases), neutral (people), pleasant (pretty women in swimsuits) and baseline (fixation). The run was designed as six blocks of active conditions (seven volumes each, contents in random order) alternating with fixation blocks (seven volumes). The DWT–CEM computation took approximately 45 min using a quad-core MacPro computer.
Fig. 5 Histogram, kernel estimates (solid) and theoretical Gaussian density (dashed) for a wavelet coefficient (first coefficient, scale 3), under white noise (SNR = 0 and SNR = 1), AR(1) noise (SNR = 1) and long-memory noise (SNR = 1); horizontal axes show the wavelet coefficient, vertical axes the density. The results are analogous for other coefficients in any scale
4 Results

4.1 Simulations

The histograms, kernel estimates and theoretical Gaussian densities (for the first wavelet coefficient in scale 3) are plotted in Fig. 5 (1,000 simulations). The results are similar for all wavelet coefficients at any chosen scale. Note that the normality assumption seems to be valid for the three types of signal investigated. This provides strong evidence that the wavelet transform is an adequate preprocessing step for achieving approximate multivariate normality, a necessary property of the data, prior to the application of the CEM algorithm. The results for mean classification accuracy are presented in Fig. 7. The simulations suggest a good performance of the DWT–CEM algorithm, and indicate that it performs better overall than the k-means and FCM (fuzzy C-means) approaches, particularly in cases of low SNR.
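An illustrative check of this normality result (not the code used to produce Fig. 5) can be written as follows: repeat the AR(1)-noise simulation, keep one wavelet coefficient per replication and apply a Shapiro–Wilk test; the wavelet name, decomposition depth and number of replications are assumptions.

```python
import numpy as np
import pywt
from scipy.stats import shapiro

rng = np.random.default_rng(3)
T, n_sim, phi = 128, 1000, 0.8
coef = np.empty(n_sim)

for s in range(n_sim):
    # AR(1) noise with parameter 0.8, as in the simulations of Sect. 3.2.
    e = rng.standard_normal(T)
    x = np.empty(T)
    x[0] = e[0]
    for u in range(1, T):
        x[u] = phi * x[u - 1] + e[u]
    # First detail coefficient at scale 3 of a D16-type ('db8') decomposition.
    coef[s] = pywt.wavedec(x, wavelet='db8', level=3)[1][0]

stat, p = shapiro(coef)
print(f'Shapiro-Wilk p-value = {p:.3f}')   # a large p gives no evidence against normality
```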
4.2 Visual and auditory experiment

The GLM activation maps for the visual and auditory experiment are shown in Figs. 8 and 9. For the cluster analysis, the time series in all voxels were normalized to zero mean and unit standard deviation. This procedure is important in order to prevent clustering based on structural image features (reflecting the average image intensity at each voxel) rather than the intended clustering of BOLD responses.
Fig. 6 Illustrative simulated time series for HRF-1, HRF-2 and white noise
The DWT of the experimental design was computed in order to find the decomposition level with the largest mean absolute value of the wavelet coefficients (the scale of interest). The detail levels identified were the third and the fourth, for the visual and auditory stimuli respectively (the zero level being the finest scale), resulting in 24 wavelet coefficients (predictors) to be considered in the CEM analysis. Using the Bayesian information criterion, we obtained 12 clusters in the whole volume.
The clusters in individual native space are presented in Fig. 10. The results of the DWT–CEM analysis show a clear pattern of clusters in both auditory and visual primary areas, which is consistent across subjects and similar to the GLM results presented in Figs. 8 and 9. In addition, some clusters are also identified in areas where artifacts are commonly described, but this pattern is not recurrent across subjects. These results are obtained without any prior specification of the HRF; only the wavelet scales of the stimulation are supplied. The time series of the visual and auditory cortical clusters are presented in Fig. 11.

4.3 Pleasant vs. unpleasant experiment
Fig. 7 Classification mean accuracy for simulated data (200 simulations). The error bars describe two standard errors
The GLM activation maps (active conditions vs. baseline) for the second experiment, designed to evoke emotional responses to images, are presented in Fig. 12 (left). The DWT–CEM analysis produced 13 clusters using the BIC. The highest energy of the DWT decomposition was detected at scale 2. The cluster whose average BOLD signal was most correlated with the experimental design is shown in Fig. 12 (right), and the associated time series in Fig. 13.
Fig. 8 Visaud experiment: auditory task brain activation maps obtained using the GLM (cluster p-value < 0.01)
Fig. 9 Visaud experiment: visual task brain activation maps obtained using the GLM (cluster p-value < 0.01)
The GLM maps and the cluster with the highest BOLD correlation show activations in primary visual cortex (V1) and bilateral pre-frontal cortex (PFC). The pre-frontal cortex is possibly involved in making decisions about the image contents. However, the cluster maps are more informative, showing clusters also in white matter, auditory cortex, orbito-ocular areas and bilateral and medial parietal regions. The auditory cortex is not involved in the paradigm stimulation, but scanner noise is inherent to image acquisition. The DWT–CEM analysis also detected a cluster in posterior cingulate/parietal areas often described in so-called resting state or default network experiments (Damoiseaux et al. 2006; De Luca et al. 2006).
5 Discussion

General linear model (GLM)-based analysis is a very popular approach in brain activation mapping. However, temporal clustering may also be an interesting tool, providing an alternative way to identify activation foci. Temporal clustering,
unlike most GLM approaches, is also naturally multivariate. In this paper, we propose a wavelet cluster analysis method (DWT–CEM), which automatically selects the optimum number of clusters and works well at low SNR. The clusters are based on temporal similarities of the signals that are not highlighted by purely model-driven GLM analysis, indicating possible neural modules or networks. Furthermore, the analysis suggests functional regions of interest, which could be objective candidates for subsequent ROI-based analysis. DWT–CEM can identify auditory and visual responses in real fMRI data without the need for a detailed prior specification of the expected HRF or of the total number of clusters. Provided the response is contained within the chosen scale(s) of interest, DWT–CEM will also be insensitive to phase shifts of the response and to variations of response amplitude between blocks. In addition to real experimental responses, clustered artefactual components can also be identified in the chosen scale. The second dataset examined, involving responses to emotionally salient stimuli, is more complex. In such cases the combined cognitive/HRF response form is often
Fig. 10 Visaud experiment: DWT–CEM analysis. The time series of the clusters indicated by the numbers are presented in Fig. 11
Fig. 11 Visaud experiment: BOLD signals of clusters at visual and auditory cortex. The green dotted lines describe the respective stimulation
not well established, and GLM analysis may not provide all the relevant information related to brain dynamics. The main obstacle to applying parametric models in experiments involving the limbic system is that there is often no control
or description of the effective time course of the response. Although the time of visual presentation is known, the times for internal processing and task accomplishment are often completely unknown. In addition to aiding with these
Fig. 12 Pleasant vs. unpleasant experiment: Left GLM activation map (cluster p-value < 0.01); Right DWT–CEM cluster most correlated with paradigm design ( p < 0.001)
Fig. 13 Pleasant vs. unpleasant experiment: all clusters obtained in DWT–CEM
issues, DWT–CEM analysis allocates all intracerebral voxels to clusters, revealing features of the brain's activity, e.g., in this case, temporal clusters in auditory cortex and also the existence of a brain default network, which would not be revealed by model-driven GLM-type analysis. The DWT–CEM analysis shows groups of voxels clustered according to their scale-temporal waveshape similarities, independently of model assumptions. However, if we choose to do so, by limiting the resolution using the design, we can focus on phenomena occurring on the approximate time-scale of the experimental response. Our main aim in applying DWT–CEM to fMRI, however, is not the simple identification of activated areas as a replacement for standard GLM methodology, but the unsupervised grouping of voxels according to their temporal similarities at scales of interest. Additionally, the ability to separate different clusters within the same functionally identified region, based on discriminating different waveshapes, represents a further avenue of possible use for wavelet clustering. We conclude that DWT–CEM may provide a powerful tool for the identification of brain areas related in terms of their time-series behavior. Because it can accommodate phase shifts and variations in response amplitude and represents a multivariate approach to data analysis, our suggested approach may complement existing methods of identifying functionally similar brain components.
Acknowledgements We are grateful to Maria G. M. Martin for providing the visual–auditory datasets. This research was supported by FAPESP (03/10105-2) and CNPq (142616/2005-2), Brazil.
References

Aston JA, Gunn RN, Hinz R, Turkheimer FE (2005) Wavelet variance components in image space for spatiotemporal neuroimaging data. Neuroimage 25(1):159–168
Baumgartner R, Ryner L, Richter W, Summers R, Jarmasz M, Somorjai R (2000) Comparison of two exploratory data analysis methods for fMRI: fuzzy clustering vs. principal component analysis. Magn Reson Imaging 18:89–94
Biswal B, Yetkin FZ, Haughton VM, Hyde JS (1995) Functional connectivity in the motor cortex of resting human brain using echo-planar MRI. Magn Reson Med 34:537–541
Biswal B, Ulmer JL (1999) Blind source separation of multiple signal sources of fMRI data sets using independent component analysis. J Comput Assist Tomogr 23(2):265–271
Brammer MJ, Bullmore ET, Simmons A, Williams SC, Grasby PM, Howard RJ, Woodruff PW, Rabe-Hesketh S (1997) Generic brain activation mapping in functional magnetic resonance imaging: a nonparametric approach. Magn Reson Imaging 15(7):763–770
Bullmore E, Long C, Suckling J, Fadili J, Calvert G, Zelaya F, Carpenter TA, Brammer M (2001) Colored noise and computational inference in neurophysiological (fMRI) time series analysis: resampling methods in time and wavelet domains. Hum Brain Mapp 12(2):61–78
Bullmore E, Fadili J, Breakspear M, Salvador R, Suckling J, Brammer M (2003) Wavelets and statistical analysis of functional magnetic resonance images of the human brain. Stat Methods Med Res 12(5):375–399
Bullmore E, Fadili J, Maxim V, Sendur L, Whitcher B, Suckling J, Brammer M, Breakspear M (2004) Wavelets and functional magnetic resonance imaging of the human brain. Neuroimage 23(Suppl 1):S234–S249
Celeux G, Govaert G (1992) A classification EM algorithm for clustering and two stochastic versions. Comput Statist Data Anal 14(3):315–332
Celeux G, Govaert G (1995) Gaussian parsimonious clustering models. Pattern Recognition 28:781–793
Damoiseaux JS, Rombouts SA, Barkhof F, Scheltens P, Stam CJ, Smith SM, Beckmann CF (2006) Consistent resting-state networks across healthy subjects. Proc Natl Acad Sci USA 103(37):13848–13853
Daubechies I (1992) Ten lectures on wavelets. SIAM, Philadelphia PA
De Luca M, Beckmann CF, De Stefano N, Matthews PM, Smith SM (2006) fMRI resting state networks define distinct modes of long-distance interactions in the human brain. Neuroimage 29(4):1359–1367
Dimitriadou E, Barth M, Windischberger C, Hornik K, Moser E (2004) A quantitative comparison of functional MRI cluster analysis. Artif Intell Med 31(1):57–71
Fadili MJ, Bullmore ET (2003) Wavelet-generalized least squares: a new BLU estimator of linear regression models with 1/f errors. Neuroimage 15(1):217–232
Fadili MJ, Ruan S, Bloyet D, Mazoyer B (2001) On the number of clusters and the fuzziness index for unsupervised FCA application to BOLD fMRI time series. Med Image Anal 5(1):55–67
Gao JH, Yee SH (2003) Iterative temporal clustering analysis for the detection of multiple response peaks in fMRI. Magn Reson Imaging 21(1):51–53
Hannan EJ, Quinn BG (1979) The determination of the order of an autoregression. J Roy Statist Soc Ser B 41:190–195
Harris KD, Henze DA, Csicsvari J, Hirase H, Buzsaki G (2000) Accuracy of tetrode spike separation as determined by simultaneous intracellular and extracellular measurements. J Neurophysiol 84(1):401–414
Haughton D (1988) On the choice of model to fit data from an exponential family. Ann Statist 16:342–355
Hosking JRM (1981) Fractional differencing. Biometrika 68:165–176
Jahanian H, Hossein-Zadeh GA, Soltanian-Zadeh H, Ardekani BA (2004) Controlling the false positive rate in fuzzy clustering using randomization: application to fMRI activation detection. Magn Reson Imaging 22:631–638
Jia Z, Xu S (2005) Clustering expressed genes on the basis of their association with a quantitative phenotype. Genet Res 86(3):193–207
Kohonen T (1995) Self-organizing maps. Springer, Berlin
Long C, Brown EN, Manoach D, Solo V (2004) Spatiotemporal wavelet analysis for functional MRI. Neuroimage 23(2):500–516
McQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proc Fifth Berkeley Symposium on Math Stat and Prob, vol 1, pp 281–296
Mourao-Miranda J, Bokde AL, Born C, Hampel H, Stetter M (2005) Classifying brain states and determining the discriminating activation patterns: Support Vector Machine on functional MRI data. Neuroimage 28(4):980–995
Ogawa S, Lee TM et al (1990) Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc Natl Acad Sci USA 87(24):9868–9872
Pan W (2007) Incorporating gene functions as priors in model-based clustering of microarray gene expression data. Bioinformatics (in press)
Shimizu Y, Barth M, Windischberger C, Moser E, Thurner S (2004) Wavelet-based multifractal analysis of fMRI time series. Neuroimage 22(3):1195–1202
Schwarz G (1978) Estimating the dimension of a model. Ann Statist 6:461–464
Strainer JC, Ulmer JL, Yetkin FZ, Haughton VM, Daniels DL, Millen SJ (1997) Functional MR of the primary auditory cortex: an analysis of pure tone activation and tone discrimination. AJNR Am J Neuroradiol 18(4):601–610
Tjaden B (2006) An approach for clustering gene expression data with error information. BMC Bioinformatics 7:17
Van De Ville D, Blu T, Unser M (2004) Integrated wavelet processing and spatial statistical testing of fMRI data. Neuroimage 23(4):1472–1485
Vidakovic B (1999) Statistical modeling by wavelets. Wiley Series in Probability and Statistics. ISBN: 0471293652
Yee SH, Gao JH (2002) Improved detection of time windows of brain responses in fMRI using modified temporal clustering analysis. Magn Reson Imaging 20(1):17–26
Yeung KY, Fraley C, Murua A, Raftery AE, Ruzzo WL (2001) Model-based clustering and data transformations for gene expression data. Bioinformatics 17(10):977–987