
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 53, NO. 9, SEPTEMBER 2005


Multiple Hypothesis Mapping of Functional MRI Data in Orthogonal and Complex Wavelet Domains

Levent Şendur, Member, IEEE, Voichiţa Maxim, Brandon Whitcher, and Ed Bullmore

Abstract—We are interested in methods for multiple hypothesis testing that optimize power to refute the null hypothesis while controlling the false discovery rate (FDR). The wavelet transform of a spatial map of brain activation statistics can be tested in two stages to achieve this objective: First, a set of possible wavelet coefficients to test is reduced, and second, each hypothesis in the remaining subset is formally tested. We show that a Bayesian bivariate shrinkage operator (BaybiShrink) for the first step provides a powerful and expedient alternative to a subband adaptive chi-squared test or an enhanced FDR algorithm based on the generalized degrees of freedom. We also investigate the dual-tree complex wavelet transform (CWT) as an alternative basis to the orthogonal discrete wavelet transform (DWT). We design and validate a test for activation based on the magnitude of the complex wavelet coefficients and show that this confers improved specificity for mapping spatial signals. The methods are applied to simulated and experimental data, including a pharmacological magnetic resonance imaging (MRI) study. We conclude that using BaybiShrink to define a reduced set of complex wavelet coefficients, and testing the magnitude of each complex pair to control the FDR, represents a competitive solution for multiple hypothesis mapping in fMRI.

Index Terms—Bayes, brain mapping, multiple comparisons, neuroimaging.

Manuscript received September 29, 2004; revised April 6, 2005. This work was supported by a Human Brain Project grant from the National Institute of Biomedical Imaging and Bioengineering and the National Institute of Mental Health. The Wolfson Brain Imaging Centre is supported by an MRC Co-operative Group grant. The pharmacological MRI data were acquired as part of a study funded by GlaxoSmithKline plc. The associate editor coordinating the review of this manuscript and approving it for publication was Guest Editor Keith Worsley. L. Şendur and E. Bullmore are with the Brain Mapping Unit, Addenbrooke's Hospital, University of Cambridge, Cambridge CB2 2QQ, U.K. (e-mail: [email protected]; [email protected]). V. Maxim is with the Department of Mathematiques, INSA de Toulouse, 31077 Toulouse Cedex, France (e-mail: [email protected]). B. Whitcher is with Translational Medicine and Genetics, GlaxoSmithKline, Greenford, U.K. (e-mail: [email protected]). Digital Object Identifier 10.1109/TSP.2005.853098

I. INTRODUCTION

A. Multiple Hypothesis Testing and fMRI

THE detection of signals based on noisy observations is a problem common to many real-life image processing problems. Here, we are specifically concerned with the detection of spatially extended brain activation signals based on two- or three-dimensional (2-D or 3-D) maps of noisy statistics estimated by linear modeling of functional magnetic resonance imaging (fMRI) time series. This problem is complicated by the inherently low signal-to-noise ratio (SNR) of fMRI, by the modest numbers of subjects typically studied in an experimental sample, and by the common practice of independently testing for significant activation at each one of several thousand voxels comprising a map of the entire brain, which mandates conservative thresholds for significance to control type 1 (false positive) error in the context of multiple comparisons. The combination of low SNR, small sample size, and stringent criteria for significance is a recipe for type 2 (false negative) error, and there is, consequently, considerable interest in methods for multiple testing that confer enhanced power to reject the null hypothesis without sacrificing stringency of type 1 error control or restricting the anatomical scope of analysis to a few "regions of interest."

We can formalize this problem in terms of a number $N$ of null hypotheses $H_{0,i}$ and alternative hypotheses $H_{1,i}$, $i = 1, 2, \ldots, N$, where the search volume or number of hypotheses to be tested $N$ is typically in the order of 20 000. In the context of multiple comparisons, the conventional voxel threshold for refuting an individual null hypothesis (typically $\alpha = 0.05$) is clearly liable to yield a large number of false positives; specifically, the expected number of false positive tests is $N\alpha$. One well-known technique for achieving stronger control over the type 1 error rate is to set the voxel threshold such that the family-wise error rate (FWER) is controlled at some acceptable level $\alpha$ [1], i.e.,
$$\mathrm{FWER} = P\left(\text{reject at least one null hypothesis} \;\middle|\; H_{0,i} \text{ is true for all } i\right) \le \alpha. \qquad (1)$$
This is effectively controlling the type 1 error for testing $H_0$: "$H_{0,i}$ is true for all $i$" versus $H_1$: "$H_{1,i}$ is true for some $i$." The Bonferroni correction is valid whether or not the tests are independent and exact in the case of independence. For $N$ tests, the Bonferroni correction defines the voxel threshold for significance as the FWER divided by the number of tests, i.e., $\alpha_{\mathrm{Bon}} = \alpha/N$. It can be shown that the Bonferroni correction has strong control of the type 1 error and will typically lead to highly conservative thresholds (e.g., assuming $\alpha = 0.05$ and $N = 20\,000$, then $\alpha_{\mathrm{Bon}} = 2.5 \times 10^{-6}$), with subsequent loss of power to reject the null hypothesis when the alternative is true. This can be particularly problematic for the analysis of statistic maps derived from fMRI studies where power to detect activation on average over a group of subjects is often constrained a priori by the small number of subjects studied. Under the circumstances of low degrees of freedom for group activation statistics, methods for controlling FWER based on Gaussian random field theory, which do not assume that the tests are independent, may be even more conservative than the Bonferroni correction [2]. A less conservative alternative to the Bonferroni correction, which achieves the same degree of control over the FWER [3],




is to set the voxel threshold such that the false discovery rate (FDR) is controlled at some acceptable level, e.g., $q = 0.05$ [4]. The false discovery rate is defined as
$$\mathrm{FDR} = E\!\left[\frac{\mathrm{FP}}{\mathrm{FP}+\mathrm{TP}}\right] \qquad (2)$$
where FP is the number of false positive tests, and TP is the number of true positive tests. To find the appropriate voxel threshold for controlling the FDR, the algorithm originally proposed in [4] has been widely applied in fMRI [5]; see Box 1 for algorithmic details.

Box 1—False discovery rate control algorithm
1. Select the FDR bound $q$.
2. Compute the individual $p$-values of each voxel statistic under the null hypothesis, i.e., assuming a Normal distribution for the noise and using the Gaussian cumulative distribution function $\Phi$.
3. Compute the order statistics of the $p$-values, $p_{(1)} \le p_{(2)} \le \cdots \le p_{(N)}$.
4. Choose $k$ to be the largest $i$ satisfying the following condition: $p_{(i)} \le \dfrac{i}{N}\,q$.
5. Reject all of the null hypotheses corresponding to $p_{(1)}, \ldots, p_{(k)}$.
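To make the procedure in Box 1 concrete, the following Python sketch applies the step-up rule to a map of voxel statistics; the function name `fdr_reject` and the use of one-sided Normal $p$-values are illustrative choices, not part of the algorithm specification in [4].

```python
import numpy as np
from scipy.stats import norm

def fdr_reject(stats, q=0.05):
    """Box 1 sketch: Benjamini-Hochberg FDR control for a map of
    (approximately) standard Normal voxel statistics."""
    x = np.asarray(stats, dtype=float).ravel()
    p = norm.sf(x)                       # step 2: one-sided p-values under H0
    order = np.argsort(p)                # step 3: order statistics
    p_sorted = p[order]
    n = p.size
    crit = q * np.arange(1, n + 1) / n   # step 4: compare p_(i) with (i/N)q
    below = np.nonzero(p_sorted <= crit)[0]
    reject = np.zeros(n, dtype=bool)
    if below.size:                       # step 5: reject H0 for p_(1)..p_(k)
        k = below[-1]
        reject[order[: k + 1]] = True
    return reject.reshape(np.shape(stats))

# Example: 20 000 null statistics plus a few strong "activations"
rng = np.random.default_rng(0)
stats = rng.standard_normal(20_000)
stats[:50] += 4.0
print(fdr_reject(stats, q=0.05).sum(), "voxels declared active")
```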

Fig. 1. General schematic of wavelet-based, two-stage multiple hypothesis testing algorithms.

B. Wavelet Methods for Multiple Hypothesis Testing

Wavelets provide a relatively new mathematical framework for multiresolution decomposition and hypothesis testing of images; see Laine [6] for an introduction to wavelets in the context of biomedical image analysis, Bullmore et al. [7] for a review of recent applications of wavelets to fMRI data analysis, and Mallat [8] for a comprehensive technical introduction to multiresolution analysis. The main argument in favor of using wavelets for probabilistic signal detection is that the discrete wavelet transform (DWT) tends to concentrate signals in a few large coefficients, whereas noise in the image is represented by a large number of much smaller wavelet coefficients. Moreover, the DWT is inherently a multiresolution analysis; therefore, the power to detect signals of variable spatial extent is not biased toward the arbitrary size of a single Gaussian smoothing kernel. Finally, wavelet coefficients of correlated processes are often substantially decorrelated [9]–[12], and tests on multiple coefficients can therefore often be regarded as approximately independent.
Ruttimann et al. [13] made the first effort to apply wavelets to the problem of multiple hypothesis testing in the analysis of fMRI statistic maps. They applied the orthogonal DWT to 2-D statistic maps and devised a strategy for probabilistic thresholding of the resulting wavelet coefficients.

They assumed that the coefficients were uniformly and independently contaminated with Gaussian white noise, and therefore, the sum of squared, standardized wavelet coefficients at each scale and orientation of the 2-D wavelet transform was distributed approximately as $\chi^2$ on $N_j$ degrees of freedom, where $N_j$ is the number of coefficients in the $j$th level of the decomposition. Scales of the DWT for which the null hypothesis could not be rejected by an appropriate chi-squared test were not examined further, whereas in levels that passed this omnibus test for significance, each standardized coefficient was then individually tested against the standard Normal distribution. The independence both between scales and between coefficients within scales that is implied by the assumption of independent and identically distributed (i.i.d.) Gaussian noise in the data encouraged control of the FWER by Bonferroni correction for both stages of the hypothesis testing algorithm. An activation map was finally constructed by the inverse DWT using only those coefficients that had survived both tests for significance. This approach has the merit of reducing the overall number of tests to be conducted, but the validity of the $\chi^2$ and Normal approximations, and the exactness of the Bonferroni correction, all depend on the assumption that the errors in the imaging data are i.i.d. Gaussian. (Although we regard this assumption as unrealistic, we will continue to use it in this paper as a basis for comparing Ruttimann's method as originally described to alternative wavelet-based methods for hypothesis testing.) We can reformulate Ruttimann's method more generally, as shown in Fig. 1, as a two-stage testing procedure whereby the total number of hypotheses to be tested is first reduced by a



preliminary thresholding operation in the wavelet domain; then, the smaller subset of surviving coefficients is tested, one hypothesis at a time, in a way that controls the FWER (or the FDR) over all the tests. Finally, the signal is reconstructed in the spatial domain by the inverse DWT using only those coefficients that survive both tests (all other coefficients being set to zero). Shen et al. [14] recently introduced an analogous method, called the enhanced false discovery rate (EFDR) control algorithm, which differs in some important details from Ruttimann's implementation. In the first stage of Shen's algorithm, the generalized degrees of freedom [15] is used to define the reduced subset of wavelet coefficients for further testing, and in the second stage, they tested the significance of statistics informed by the local neighborhood around each wavelet coefficient (rather than treating each coefficient in isolation), setting the significance threshold for each test such that the FDR was controlled at an acceptable level. They showed that this algorithm achieved the same degree of type 1 error control as a standard FDR algorithm (considering the ordered $p$-values for all wavelet coefficients) but was less conservative because the number of hypothesis tests had been substantially reduced by the first stage.
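As a minimal illustration of the two-stage framework sketched in Fig. 1, the following Python fragment screens wavelet coefficients with a crude hard threshold and then applies FDR control to the survivors only; the hard-threshold screen is a placeholder standing in for the more sophisticated first-stage rules discussed in this paper (generalized degrees of freedom, BaybiShrink), and the function names and screening constant are ours.

```python
import numpy as np
from scipy.stats import norm

def bh_threshold(p, q=0.05):
    """Largest p-value satisfying the Benjamini-Hochberg condition (Box 1)."""
    ps = np.sort(p)
    ok = ps <= q * np.arange(1, ps.size + 1) / ps.size
    return ps[np.nonzero(ok)[0][-1]] if ok.any() else 0.0

def two_stage_test(coeffs, sigma_n=1.0, screen=2.0, q=0.05):
    """Stage 1: keep coefficients exceeding a crude screening threshold.
    Stage 2: FDR control over the reduced set only; everything else -> 0."""
    w = np.asarray(coeffs, dtype=float)
    survivors = np.abs(w) > screen * sigma_n           # stage 1 (placeholder rule)
    p = 2.0 * norm.sf(np.abs(w[survivors]) / sigma_n)  # two-sided p-values
    p_thr = bh_threshold(p, q)                         # stage 2 on survivors only
    keep = np.zeros_like(w, dtype=bool)
    keep[survivors] = p <= p_thr
    return np.where(keep, w, 0.0)

# With ~20 000 coefficients, stage 1 leaves far fewer hypotheses for stage 2,
# which is why the second-stage FDR threshold is less conservative.
rng = np.random.default_rng(1)
w = rng.standard_normal(20_000); w[:100] += 5.0
print(np.count_nonzero(two_stage_test(w)), "coefficients retained")
```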

C. Design Criteria for New Wavelet-Based Multiple Hypothesis Testing Algorithms

In this paper, we are primarily concerned with developing and validating alternative algorithms for wavelet-based hypothesis testing of fMRI statistic maps that adopt the same general framework as developed in [13] and [14] but differ in terms of the wavelet transforms used and the way in which a reduced subset of hypotheses is defined for the second stage of testing. In doing so, we are motivated by three issues in particular.
1) Power to reject the null hypothesis is crucial. For reasons of modest SNR and sample sizes in fMRI, power of statistical testing is at a premium. In what follows, we systematically compare four alternative multiple hypothesis testing algorithms, each designed to control the FDR and implemented using the orthogonal DWT, in terms of their power in analyzing simulated images; we also define rigorously what we mean by power to refute the null hypothesis.
2) The orthogonal DWT may not be optimal for representing higher dimensional signals. Existing methods for wavelet analysis of spatial maps have generally used a separable 2-D or 3-D orthogonal DWT. It is well known that orthogonal wavelets are optimal for a 1-D signal, including point singularities; however, spatial processes that include line or curve singularities (edges) will not be so well represented by a separable DWT. Recent research has addressed the development of 2-D multiscale transforms representing edges more efficiently than the orthogonal DWT [16]–[18]. Here, we will consider one such transform [the dual-tree complex wavelet transform (CWT)] as an alternative basis to the DWT for representing fMRI statistic maps since the CWT strikes an attractive balance between flexibility, computational complexity, and redundancy, compared with the orthogonal DWT and more specialized multiscale transforms such as ridgelets.
3) Computational complexity should be minimized. An attractive aspect of Shen's algorithm is that the first-stage thresholding operation acknowledges the expected dependencies between wavelet coefficients under the alternative hypothesis [19], [20]. For example, among the wavelet coefficients representing many natural images, we can expect that large or small coefficients will tend to propagate across scales: A large "parent" coefficient tends to have a large "child" in the same location on the next finest scale of the hierarchy. In addition, large or small coefficients tend to cluster together within scale: If a coefficient is large or small, its immediate neighbors tend also to be large/small. Test statistics that are tuned to detect features of the image expected under the alternative hypothesis, such as the local cluster coefficient proposed in [14], are likely to provide greater power to reject the null hypothesis. However, the first-stage thresholding algorithm for the particular statistic proposed in [14], which involves a Monte Carlo numerical integration to estimate the generalized degrees of freedom [15], is extremely inefficient to compute. Partly for this reason, we explore the use of a Bayesian bivariate shrinkage function as an alternative and computationally more efficient way of identifying the reduced set of wavelet coefficients for formal significance testing.

D. Organization of the Rest of This Paper

In Section II-A, we introduce the dual-tree complex wavelet transform (CWT) as an alternative to the orthogonal DWT; in Section II-B, we describe a Bayesian bivariate shrinkage operator as a method for defining the reduced set of hypotheses for formal testing, and in Section II-C, we define algorithms for controlling FDR over multiple tests of orthogonal and complex wavelet coefficients; this entails validating some key assumptions concerning the decorrelating properties of the CWT (Section II-D). In Section III-A, we compare our methods based on the orthogonal DWT to alternative schemes proposed in [13] and [14] using simulated data to evaluate the conventional and FDR power of the methods. We also extend this comparison by considering an fMRI statistic map contaminated by various degrees of Gaussian noise. In Section III-B, we evaluate the performance of the dual-tree CWT by analyzing both simulated and experimental fMRI data from a single subject study of visual stimulation and a larger study of antidepressant drug effects on brain activation elicited by sad facial affect processing in patients with depression. We finish with some brief conclusions in Section IV.

II. METHODS AND MATERIALS

A. Orthogonal and Complex Wavelet Transforms

Although the orthogonal, separable wavelet transform has been used as the basis for two-step hypothesis testing [13], [14], this approach is not without its limitations. In particular, the orthogonal DWT is not shift invariant; in other words, the wavelet


coefficients of a shifted signal are not the same as the shifted coefficients of the original signal. Furthermore, although the orthogonal DWT is an optimal basis for representing a one-dimensional (1-D) signal including point singularities, it is less than optimal for representation of 2-D images, which may include line or curve singularities (edges). Recent research has led to the development of various 2-D multiscale transforms which address these issues, including curvelets [18], complex-valued transforms [17], [21], and the steerable pyramid [16]. However, we will focus here on the dual-tree complex wavelet transform (CWT), which has been proposed by Kingsbury [17], [21] to address problems of shift invariance and directional selectivity associated with image processing by the DWT.
The dual-tree CWT is a wavelet frame obtained by concatenating two wavelet bases. The coefficients of the transform are estimated by two wavelet filter banks or "trees," which are denoted $a$ and $b$, operating iteratively in parallel (see Fig. 2).

Fig. 2. (Left) Image plots of 12 wavelets associated with the real and imaginary parts of the dual-tree complex wavelet basis functions are illustrated. The complex wavelets provide six directional filters, each oriented in a distinct direction, providing superior directional selectivity over the discrete wavelet transform. (Right) The dual-tree complex wavelet transform also provides perfect reconstruction with short support filters while providing limited redundancy (four times for 2-D images), and the coefficients are estimated by two iterated filterbank structures operating in parallel. For tree $a$, the filters $g_a[j,l]$ and $h_a[j,l]$ are the lowpass and highpass wavelet filters corresponding to scale $j$ and $l = 0, \ldots, L_j$, where $L_j$ is the filter length at scale $j$ (the filters $g_b[j,l]$ and $h_b[j,l]$ are for tree $b$) [17], [21], [25].

What makes the transform so powerful is the specification of these two wavelet bases: One wavelet basis is approximately the Hilbert transform pair of the other [22], [23]. Hence, the filters are designed in order to satisfy this property. The initial design used alternate odd- and even-length biorthogonal filters for all but the finest scale of the decomposition and the biorthogonal Antonini sets for the finest scale (level 1). Kingsbury [24] proposed new sets of filters beyond the first level in order to improve the orthogonality and symmetry properties. These filters are similar to Daubechies orthonormal filters with an additional design constraint (the filters in the two trees are just the time reverse of each other, as are the analysis and reconstruction filters). Selesnick replaced the first-level biorthogonal Antonini sets with nearly symmetric filters for orthogonal two-channel perfect reconstruction [25], and we have used software developed at the Polytechnic University, Brooklyn, NY [26], to implement this version of the dual-tree CWT. The resulting coefficients can be thought of as complex pairs. In other words, each tree of iterated wavelet filters produces a


valid set of real DWT coefficients, which are denoted $w_a$ and $w_b$, respectively, at each scale and location of the decomposition, and together, they form the complex coefficient $w = w_a + \mathrm{i}\,w_b$ (note that the scale and location indices will be omitted for convenience when it is not necessary to reference specific wavelet coefficients). Although perfect reconstruction of the signal can be obtained using either set of coefficients, what makes the transform so powerful is that the magnitudes of the complex coefficients, $|w| = \sqrt{w_a^2 + w_b^2}$, are nearly shift invariant, i.e., small signal shifts do not affect the magnitudes, although they do affect the real and imaginary coefficients considered separately. Therefore, the magnitude of the combined coefficients is a more reliable measure than either the real $w_a$ or the imaginary $w_b$ coefficients.
Additionally, the basis functions of the dual-tree CWT have superior directional selectivity compared with the basis functions of the orthogonal DWT. As shown in Fig. 2, the dual-tree CWT basis has directional selectivity at $\pm 15^\circ$, $\pm 45^\circ$, and $\pm 75^\circ$, whereas the DWT basis has directional selectivity only at $0^\circ$, $45^\circ$, and $90^\circ$. This makes the dual-tree CWT naturally more adaptive to the analysis of 2-D images containing edges (line singularities) and more invariant under angular rotation of the edges. For $d$-dimensional signals, the dual-tree CWT has $2^d$ times redundancy. For example, it is two times redundant for 1-D signals and four times redundant for 2-D signals.

B. Bayesian Bivariate Shrinkage: BaybiShrink

In the wavelet domain, if we use an orthogonal DWT, the denoising problem can be formulated as
$$y = w + n \qquad (3)$$
where $y$ is the observed wavelet coefficient, $w$ is the true coefficient, and $n$ is independent Gaussian noise. Simple denoising algorithms, using the orthogonal DWT, consist of three steps.
1) Calculate the wavelet transform of the noisy signal.
2) Modify the noisy wavelet coefficients according to some rule.
3) Compute the inverse transform using the modified coefficients.
One of the most well-known rules for the second step is soft thresholding [27]. Due to its effectiveness and simplicity, it is frequently used in the literature. The main idea is to subtract the threshold value $\lambda$ from all coefficients whose magnitude is larger than $\lambda$ and to set all other coefficients to zero. It can be defined as
$$\mathrm{soft}(y, \lambda) = \mathrm{sign}(y)\,(|y| - \lambda)_+ \qquad (4)$$
where $(g)_+$ is defined as
$$(g)_+ = \begin{cases} g, & \text{if } g > 0 \\ 0, & \text{otherwise.} \end{cases} \qquad (5)$$
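A direct Python transcription of (4) and (5) (the function name is illustrative):

```python
import numpy as np

def soft_threshold(y, lam):
    """Soft thresholding, (4)-(5): shrink coefficients toward zero by lam,
    setting those with |y| <= lam exactly to zero."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

y = np.array([-3.0, -0.5, 0.2, 1.8])
print(soft_threshold(y, 1.0))   # [-2.  -0.   0.   0.8]
```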

Although the soft thresholding operator does not take into account the dependencies between the wavelet coefficients of natural images, very effective image denoising algorithms have


been proposed by estimating threshold values accurately, such as VisuShrink [28], SureShrink [29], and BayesShrink [30].
The fundamental idea behind the Bayesian bivariate shrinkage operator, here dubbed "BaybiShrink" for short, is that we can generally assume the existence of dependencies between the wavelet coefficients representing a natural image or signal [19], [20]. In particular, if a wavelet coefficient is large/small, the adjacent coefficients are likely to be large/small, and large/small coefficients tend to propagate across the scales. Shrinkage algorithms that exploit the dependency between coefficients under the alternative hypothesis can provide improved results compared with those that ignore this dependency structure [19], [20], [31]–[33]. For example, a new framework to capture the statistical dependencies by using wavelet-domain hidden Markov trees (HMTs) was developed in [19]. In [16] and [20], the wavelet coefficients were assumed to be Gaussian, and a joint shrinkage function that used neighboring wavelet coefficients was proposed. In [31] and [34], the class of Gaussian scale mixtures for modeling natural images was proposed and applied to the image denoising problem.
Şendur and Selesnick [35] recently proposed a simple non-Gaussian bivariate probability distribution function (PDF) to model the bivariate statistics of wavelet coefficients of natural images. The model captures the dependence between a wavelet coefficient and its parent. Using Bayesian estimation theory, they derived from this model a simple nonlinear shrinkage function for wavelet denoising, which generalizes the classical soft thresholding approach. The new shrinkage function, which depends on both the coefficient and its parent, yields improved results for wavelet-based image denoising.
Let $w_2$ represent the parent of $w_1$, i.e., $w_2$ is the wavelet coefficient at the same location as $w_1$ but at the immediately coarser scale. We formulate the problem in the wavelet domain as $y_1 = w_1 + n_1$ and $y_2 = w_2 + n_2$ to take into account the statistical dependencies between a coefficient and its parent. The coefficients $y_1$ and $y_2$ are noisy observations of $w_1$ and $w_2$, and $n_1$ and $n_2$ are realizations from the noise process. The proposed non-Gaussian bivariate PDF for the coefficient and its parent (the prior for $w = (w_1, w_2)$) can be written as
$$p_w(w) = \frac{3}{2\pi\sigma^2}\exp\!\left(-\frac{\sqrt{3}}{\sigma}\sqrt{w_1^2 + w_2^2}\right) \qquad (6)$$
where $\sigma^2$ is the marginal variance of the wavelet coefficients. Using Bayesian estimation theory [36], the maximum a posteriori (MAP) estimate of $w_1$ is

$$\hat{w}_1 = \frac{\left(\sqrt{y_1^2+y_2^2}-\lambda\right)_+}{\sqrt{y_1^2+y_2^2}}\; y_1 \qquad (7)$$
which can be interpreted as a bivariate shrinkage function with threshold value $\lambda = \sqrt{3}\,\sigma_n^2/\sigma$. Fig. 3 illustrates the plot of this bivariate shrinkage function. A simple, computationally efficient denoising algorithm that incorporates this function, and defines estimation of the threshold value, is summarized in Box 2. The key point to note is that interscale dependencies between wavelet coefficients representing the true image are addressed by construction of the bivariate PDF, and intrascale dependencies are addressed by using a threshold value that is informed by the variance of coefficients in neighboring locations and at the same scale as the index coefficient. This choice of threshold can be regarded as a refinement of subband adaptive or level-specific thresholds that are informed by the variance of all coefficients in a given scale. This shrinkage operator is appropriate for denoising both DWT coefficients and the magnitude of a pair of CWT coefficients and can therefore serve as a method for the first step in hypothesis testing on either basis.

Fig. 3. BaybiShrink: Bayesian bivariate shrinkage function specified by (7).

Box 2—BaybiShrink: Bayesian bivariate shrinkage algorithm
1. Estimate the noise variance $\sigma_n^2$ by the robust median estimator applied to the finest-scale ($j = 1$) wavelet coefficients
$$\hat{\sigma}_n = \frac{\mathrm{median}(|y_i|)}{0.6745}, \qquad y_i \in \text{finest-scale subband}.$$
2. For each coefficient, do the following.
—Estimate the noisy signal variance $\hat{\sigma}_y^2$ using the neighboring coefficients $N(k)$
$$\hat{\sigma}_y^2 = \frac{1}{M}\sum_{y_i \in N(k)} y_i^2$$
where $M$ is the size of the neighborhood $N(k)$. We use a rectangular-shaped window for our implementation.
—Estimate the signal variance using $\hat{\sigma} = \sqrt{\left(\hat{\sigma}_y^2 - \hat{\sigma}_n^2\right)_+}$.
—Estimate the threshold value using $\hat{\lambda} = \sqrt{3}\,\hat{\sigma}_n^2/\hat{\sigma}$.
—Estimate each coefficient using the Bayesian bivariate shrinkage function [see (7)] with the estimated threshold value.
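The following Python sketch implements Box 2 for a single subband and its parent; the 7 x 7 neighborhood size, the function names, and the use of random arrays in place of real wavelet subbands are illustrative assumptions (the paper specifies only a rectangular window).

```python
import numpy as np
from scipy.ndimage import uniform_filter

def baybishrink(y, y_parent, sigma_n):
    """Box 2 sketch: bivariate MAP shrinkage (7) of a child subband y given
    its parent subband y_parent (brought to the same shape) and the noise
    standard deviation sigma_n."""
    # Local estimate of the noisy signal variance from a 7x7 neighbourhood.
    sigma_y2 = uniform_filter(y * y, size=7)
    # Signal std: sqrt of the positive part of (noisy variance - noise variance).
    sigma = np.sqrt(np.maximum(sigma_y2 - sigma_n ** 2, 0.0))
    lam = np.sqrt(3.0) * sigma_n ** 2 / np.maximum(sigma, 1e-12)  # threshold
    r = np.sqrt(y ** 2 + y_parent ** 2)                           # bivariate magnitude
    return np.maximum(r - lam, 0.0) / np.maximum(r, 1e-12) * y    # shrinkage rule (7)

def mad_noise_sigma(finest_subband):
    """Robust median estimator of the noise standard deviation (Box 2, step 1)."""
    return np.median(np.abs(finest_subband)) / 0.6745

# Illustrative use with random data standing in for wavelet subbands.
rng = np.random.default_rng(2)
child, parent = rng.standard_normal((2, 64, 64))
print(baybishrink(child, parent, mad_noise_sigma(child)).std())
```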



C. FDR Control for Orthogonal and Complex Wavelet Coefficients

If the orthogonal DWT has been adopted as the basis for analysis, then the wavelet coefficients divided by the noise standard deviation will have a standard Normal distribution under the null hypothesis, and each can be tested using the FDR control algorithm defined in Box 1. However, if the dual-tree CWT has been used, we will want to test the magnitude of the complex coefficients, which is not Normally distributed. Since the dual-tree filters used in the CWT are mutually orthogonal, it seems reasonable initially to assume that the real and imaginary components of each complex coefficient, i.e., the coefficients estimated by trees $a$ and $b$ at the same scales and locations, will be i.i.d. standard Normal. More formally, we expect
$$w_a \sim N(0,1) \;\;\text{i.i.d.} \quad \text{and} \quad w_b \sim N(0,1) \;\;\text{i.i.d.} \qquad (8)$$
Therefore, the squared magnitude of complex coefficients under the null hypothesis has a chi-squared distribution on two degrees of freedom, i.e.,
$$|w|^2 = w_a^2 + w_b^2 \sim \chi_2^2. \qquad (9)$$
On this assumption, it is straightforward to calculate the $p$-value for the squared magnitude of each complex pair using the $\chi_2^2$ distribution and then apply the FDR algorithm to the series of ranked $p$-values, as in Box 1. The validity of this test will clearly depend on the assumption of independence between CWT coefficients for arbitrary scales and locations, which we investigate in the following section.

D. Covariance Structure of the Dual-Tree CWT

Decorrelation is a basic property of the orthogonal DWT and discrete wavelet packet transform (DWPT) for stationary, and potentially nonstationary, time series models [9], [11], [12], [37]. A quantitative description of the autocovariance or cross-covariance sequences (ACVS and CCVS, respectively) of CWT coefficients for a given signal is easily obtained using basic properties of the covariance. (The details can be found in the Appendix.) Let $w_a[j,t]$ and $w_b[j',t']$ be two nonboundary wavelet coefficients corresponding to tree $a$ at scale $j$ and spatial location $t$ and tree $b$ at scale $j'$ and spatial location $t'$. Fig. 4 displays the correlation between the dual-tree CWT coefficients $w_a[j,t]$ and $w_b[j',t']$ for a fractional difference (FD) process with $d = 0.4$ [38]. A long-memory process is chosen as a relatively extreme example of correlation in a time series model and has been detected in fMRI studies [7]. Note that the vertical scale across all plots in Fig. 4 is $(-0.55, 0.55)$, and hence, there is substantial correlation between the two trees for specific lags. One possible lag of interest is lag zero for wavelet coefficients at the same scale (plots along the diagonal). This would be of interest when combining wavelet coefficients for denoising [35], e.g., $|w[j,t]|^2 = w_a[j,t]^2 + w_b[j,t]^2$. The squared magnitude of the wavelet coefficient will have a $\chi_2^2$ distribution when both the dual-tree CWT coefficients are i.i.d. standard Normal [see (9)].

Fig. 4. Correlation between the dual-tree complex wavelet transform coefficients $w_a[j,t]$ and $w_b[j',t']$, formed from an FD(0.4) process, for levels $1 \le j \le j' \le 4$. The vertical scale is set at $(-0.55, 0.55)$ for all plots. Note that zero-lag, within-scale correlation between coefficients from different trees is generally small, apart from the finest scale (level 1), where the between-tree correlation is $> 0.5$.

Assuming Normality and mean zero, the variance of the dual-tree CWT coefficients is straightforward to compute and only depends on basic properties of the wavelet filters and the underlying spectral density function of the signal. Rescaling the dual-tree CWT coefficients may be achieved using an appropriate estimator of the variance (Box 2). From Fig. 4, independence is not guaranteed when $j = j'$, but it may be assumed for scales $j \ge 2$, since the zero-lag correlation coefficients for an FD(0.4) process are small at levels 2, 3, and 4. The first scale ($j = 1$) exhibits substantial correlation, greater than 0.5, between $w_a[1,t]$ and $w_b[1,t]$ at lag zero. Testing the squared magnitude of coefficients from the first level of the dual-tree CWT using critical values from the $\chi_2^2$ distribution was found to be slightly liberal when compared to its exact distribution. To determine the exact distribution of $|w[j,t]|^2$ for an arbitrary scale of the dual-tree CWT, one can rewrite (9) as
$$|w[j,t]|^2 = \sum_i \lambda_i u_i \qquad (10)$$
where the $u_i$ are independent $\chi^2$ random variables with one degree of freedom each, and the $\lambda_i$ are the eigenvalues of the correlation matrix between the random variables [39]. The matrix is band diagonal and computed efficiently using the discrete Fourier transform or by applying (14)–(16). The distribution function of $|w[j,t]|^2$ may be obtained using [40] and [41], where the characteristic function of $|w[j,t]|^2$ is numerically inverted. The upper 95% critical value for the $\chi_2^2$ distribution is approximately 5.99. Plugging this value into the exact distribution of $|w[1,t]|^2$ at level one, again from an FD process with $d = 0.4$, produces a quantile of 0.939, which is only slightly less than the nominal value. A critical value of 6.5 was found to produce the appropriate quantile of 0.95 in the exact distribution.
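The sketch below illustrates both halves of this section in Python: it computes $\chi_2^2$ $p$-values for squared complex magnitudes as in (9), and it uses a Monte Carlo draw from a correlated bivariate Normal, cf. (10), to show how the 95% critical value moves away from 5.99 when the two trees are correlated at lag zero. The correlation value 0.5 is an illustrative stand-in for the level-1 behavior seen in Fig. 4, and the function names are ours.

```python
import numpy as np
from scipy.stats import chi2

def chi2_magnitude_pvalues(w_a, w_b, sigma=1.0):
    """p-values for the squared magnitudes of rescaled complex pairs under (9)."""
    return chi2.sf((w_a ** 2 + w_b ** 2) / sigma ** 2, df=2)

# Nominal chi^2_2 critical value used when the two trees are uncorrelated.
print("chi^2_2 95% critical value:", chi2.ppf(0.95, df=2))          # approx. 5.99

# Level-1 style check: draw correlated (w_a, w_b) pairs, cf. (10), and compare
# the nominal test with the empirical 95% quantile of the squared magnitude.
rho = 0.5                                   # assumed zero-lag between-tree correlation
cov = np.array([[1.0, rho], [rho, 1.0]])
rng = np.random.default_rng(3)
wa, wb = rng.multivariate_normal([0.0, 0.0], cov, size=200_000).T
print("empirical 95% quantile:", np.quantile(wa ** 2 + wb ** 2, 0.95))
print("fraction with p < 0.05:", (chi2_magnitude_pvalues(wa, wb) < 0.05).mean())
```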





Fig. 5. Simulated signals of variable radius r (top row) were embedded in Gaussian noise to give noisy signals of variable SNR (second row). Subsequent rows show representative examples of images recovered by the following multiple hypothesis testing algorithms in the wavelet domain: the Bonferroni correction, standard FDR control algorithm, Ruttimann’s algorithm, the enhanced FDR algorithm by Shen et al., and the BaybiShrink-EFDR algorithm. All methods were implemented using the orthogonal DWT and thresholded to control FDR or FWER at 5%.

III. RESULTS

A. Simulated Data

1) Conventional and FDR Power Curves: Our first set of results is based on simulated data identical to those previously reported in [14]. As shown in Fig. 5, a circular signal of variable radius $r$ and intensity is embedded in a $64 \times 64$ matrix of i.i.d. Gaussian noise with unit variance; the signal-to-noise ratio (SNR) of the simulations varies as a function of the signal intensity. We have used a range of values of $r$ and SNR to explore the performance of the following multiple hypothesis testing algorithms in the context of realistically low SNR (all procedures utilize the orthogonal DWT):
• the Bonferroni correction applied to all 2-D DWT coefficients;
• the standard FDR control algorithm applied to all 2-D DWT coefficients;
• Ruttimann's two-stage testing method: Bonferroni correction applied to a subset of 2-D DWT coefficients defined by a $\chi^2$ test at each scale and orientation;

• Shen's two-stage testing method: FDR correction applied to a subset of 2-D DWT coefficients defined by the generalized degrees of freedom;
• BaybiShrink-EFDR: FDR correction applied to a subset of 2-D DWT coefficients defined by the BaybiShrink rule.
Each method was implemented using symmlet wavelets as the basis for the DWT [42], and the FWER or FDR was controlled at the 5% level. Performance of the algorithms in detecting the signal was quantified in terms of both the conventional power and FDR power. Conventional power is defined as the number of times one or more null hypotheses are rejected in multiple realizations of noisy signals divided by the number of realizations. FDR power is defined as the expected number of correctly rejected null hypotheses divided by the number of true alternative hypotheses in the image. Both power functions were estimated on the basis of 200 realizations for each value of $r$ and SNR.
As shown in Fig. 6, increasing SNR was predictably associated with increasing conventional and FDR power for signals of all sizes. The Bonferroni and classical FDR algorithms had very similar (relatively low) power to detect signals, compared with all two-stage algorithms. Shen's algorithm was consistently more powerful than Ruttimann's algorithm; the new method, incorporating Bayesian bivariate shrinkage, was overall the most sensitive method; it was incrementally more powerful than Shen's algorithm, especially at low SNR. Representative images of the recovered signal, as a result of using each algorithm at various values of $r$ and SNR, are shown in Fig. 5.
A second set of results was based on a more realistic simulation of an fMRI signal. A geometrically irregular region of $t$-statistics, corresponding to a region of significant activation by visual stimulation in a single subject fMRI experiment, was embedded in a field of i.i.d. Gaussian noise with different standard deviations. FDR power curves were estimated for all five multiple hypothesis testing algorithms, as shown in Fig. 7. Power to reject the null hypothesis predictably declined as a function of increasing noise standard deviation. The rank order of methods was similar to that seen in the analysis of the simpler simulations: Bonferroni correction, Ruttimann's algorithm, and the standard FDR algorithm were less powerful than Shen's method and the BaybiShrink-EFDR algorithm, which had indistinguishable FDR power curves at low SNR.
2) Recovery of Simulated Activation in a 2-D Image Section: Our third set of results is based on a more realistic simulation of both the signal and noise structure. A simple periodic signal was located in geometrically irregular regions of differing size and embedded in a 2-D section of real fMRI noise (typically exhibiting correlation in time) obtained from an fMRI dataset of a single human subject lying "at rest" in the scanner. The strength of activation at each voxel was estimated by fitting a general linear model (GLM), using autoregressive least squares to account for physiological autocorrelation of the fMRI noise [43]. The resulting maps of noisy $t$-statistics, as shown in Fig. 8, were then tested for significance using the five methods previously considered, as well as the BaybiShrink-EFDR algorithm incorporating the dual-tree CWT as the basis for analysis instead of the orthogonal DWT.
3) Specificity of Signal Detection in a 2-D Image Section: One important distinction of hypothesis testing in the



Fig. 6. Conventional and FDR power curves estimated by analysis of simple noisy signals with variable radius and SNR (see Fig. 5) using the Bonferroni correction, standard FDR control algorithm, the enhanced FDR algorithm by Shen et al., and the Baybishrink–EFDR algorithm. Two-stage testing algorithms generally outperform the Bonferroni and standard FDR methods; the Baybishrink–EFDR algorithm has incrementally greater power than Shen’s EFDR method at low SNR.

wavelet domain is that a single false positive test involving a coarse scale coefficient could lead to a spatially extensive area of false positive “activation” after reconstructing the signal in the spatial domain by the inverse transform. It is therefore important to evaluate alternative methods in terms of specificity

as well as sensitivity of signal detection. As illustrated in Fig. 8, all methods succeeded reasonably well in recovering the simulated signal. However, visual inspection of these results suggests that the BaybiShrink algorithm based on the CWT more completely and “cleanly” recovers the activated regions


Fig. 7. t-statistics estimated in a single subject fMRI experiment (top left) were embedded in Gaussian white noise (top right), and FDR power curves were estimated for multiple hypothesis testing algorithms over multiple realizations at different levels of noise variance (bottom). The BaybiShrink-EFDR algorithm and Shen’s EFDR method have comparable power (superior to other methods) over all levels of noise variance.

with minimal evidence for false positive activation in pure noise voxels. This demonstrates that signal reconstruction based on complex wavelets leads to fewer false positive artefacts than methods based on the orthogonal DWT. This impression is substantiated quantitatively by the estimated mean square error (MSE) over all truly nonactivated regions of the images
$$\mathrm{MSE} = \frac{1}{M}\sum_{i \notin A} \hat{x}_i^2 \qquad (11)$$
where $\hat{x}_i$ is the $i$th voxel value of the denoised image, and $M$ is the number of voxels outside the set of truly activated voxels $A$. MSE values for two simulations, using different levels of noise variance, are summarized in Table I; larger values of MSE correspond to larger values of the noise variance. It is clear that in both simulations the bivariate Bayesian shrinkage algorithm based on the CWT has the overall best performance. Indeed, it yields maps that are comparable in specificity to the considerably less sensitive Bonferroni and Ruttimann algorithms.
4) Computational Complexity: Computational cost is one of the major drawbacks of the EFDR algorithm. Basically, the reduced number of hypotheses to be tested is estimated by minimizing a cost function with respect to this number using an optimization algorithm; a simple search over all possible values can also be used. Since the calculation of the optimal quadratic cost function is not feasible, the optimal estimator of the cost function is chosen among members of a family of cost functions that penalize an increase in the number of hypotheses tested while controlling the degree of penalization [14], [15]. For this purpose, a quantity called the generalized degrees of freedom (GDF) needs to be estimated by a Monte Carlo numerical integration, and a reasonable number of Monte Carlo


simulations is required for each possible value of the reduced set size in order to compute the cost function accurately; the recommended number is 200 simulations [14]. As a result, the computational complexity depends on the optimization algorithm and the number of Monte Carlo simulations. For example, if we use a simple minimization procedure searching all possible values of the reduced set size, with 200 Monte Carlo simulations each, the EFDR algorithm requires a very large number of standard FDR procedure computations. For the 2-D section of fMRI test statistics considered here, computing 200 simulations at each voxel, and using a simple search algorithm to consider all possible values of the reduced set size, the total time required from the initial wavelet transform to signal reconstruction in space by the inverse transform for the EFDR algorithm [14] is 5613 s; for the BaybiShrink-EFDR with DWT, it is 0.12 s, and for the BaybiShrink-EFDR with CWT, it is 0.42 s. Note that these are crudely comparable processing times, which might be improved for Shen's algorithm by better optimization strategies or more efficient coding; nonetheless, they reflect the computational simplicity of the BaybiShrink approach to defining the reduced set of hypotheses.

B. Experimental Data

On the basis of these results from analysis of simulated data, we conclude that the BaybiShrink-EFDR algorithm, using the dual-tree CWT, is a promising method for multiple hypothesis testing in fMRI. In this section, we provide some illustrative examples of its application to two experimental datasets: a single subject study of visual and auditory stimulation and a factorially designed, multisubject study of antidepressant drug effects on brain function [44].
1) Analysis of Single-Subject fMRI Experiment: A healthy, right-handed, male volunteer was studied using blocked periodic stimulation in both visual and auditory modalities. Visual stimulation consisted of 30-s epochs of 8-Hz photic flash alternating with 30-s epochs of darkness. Auditory stimulation consisted of 39-s epochs of hearing spoken words (common nouns) alternating with 39-s epochs of silence. There were five cycles of alternation between activation (A) and baseline (B) conditions in the visual modality and 3.8 cycles of AB alternation in the auditory modality. This experiment was designed to activate (at different frequencies) primary visual and auditory cortices. The gradient echo echoplanar imaging data were acquired using a 1.5 T system at the Maudsley Hospital, London, U.K.
The power of the periodic response was estimated for each modality of stimulation at each voxel using a GLM fitted by autoregressive least squares [43], resulting in two sets of noisy maps of the $t$-statistic representing strength of activation at each voxel. The $t$-statistic maps were estimated on 98 degrees of freedom. Sections of the noisy maps and the results of multiple hypothesis testing using the BaybiShrink algorithm applied to the dual-tree CWT are shown in Fig. 9. It is clear by visual inspection that the foci of significant activation resulting from this analysis are specifically located in areas of the occipital (visual) and temporal (auditory) cortex that are known to be areas of primary sensory cortex specialized for processing stimuli in these modalities.
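For orientation, the sketch below forms a voxelwise $t$-statistic map for a blocked periodic design by ordinary least squares; the paper itself fits the GLM by autoregressive least squares [43] to allow for autocorrelated noise, so plain OLS is a deliberate simplification here, and the TR value and all names are assumptions of the sketch.

```python
import numpy as np

def boxcar_glm_tmap(data, tr=3.0, epoch_s=30.0):
    """OLS sketch of a voxelwise GLM for a blocked periodic design.

    data: array of shape (n_scans, n_voxels). The design is an intercept plus
    a 30-s on/off boxcar sampled at an assumed TR of `tr` seconds. Returns one
    t-statistic per voxel. (The paper fits this model by autoregressive least
    squares [43]; plain OLS is used here only to keep the sketch short.)"""
    n, _ = data.shape
    t = np.arange(n) * tr
    boxcar = ((t // epoch_s) % 2 == 1).astype(float)   # alternating rest/stimulation blocks
    X = np.column_stack([np.ones(n), boxcar])
    beta, *_ = np.linalg.lstsq(X, data, rcond=None)
    resid = data - X @ beta
    sigma2 = (resid ** 2).sum(axis=0) / (n - X.shape[1])
    c = np.array([0.0, 1.0])                           # contrast on the boxcar regressor
    var_c = c @ np.linalg.inv(X.T @ X) @ c
    return beta[1] / np.sqrt(sigma2 * var_c)

rng = np.random.default_rng(4)
tmap = boxcar_glm_tmap(rng.standard_normal((100, 500)))
print(tmap.shape)   # one t-statistic per voxel: (500,)
```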



Fig. 8. Set of activated voxels (top left) was embedded in real fMRI noise (top right), and the signal was recovered by various multiple hypothesis testing algorithms, including the BaybiShrink-EFDR algorithm, implemented on the basis of the dual-tree complex wavelet transform (bottom right). In this example, all methods recover the signal reasonably well, but the BaybiShrink-EFDR method using the CWT has visibly superior specificity to other two-stage testing algorithms; in fact, its specificity is comparable to the Bonferroni correction, as quantified in terms of MSE (see Table I).

TABLE I
MEAN SQUARE ERROR FOR REALISTICALLY SIMULATED t-MAPS OF BRAIN ACTIVATION AFTER MULTIPLE HYPOTHESIS TESTING IN THE WAVELET DOMAIN BY BONFERRONI CORRECTION, STANDARD FDR CONTROL, RUTTIMANN'S METHOD, SHEN'S METHOD OF ENHANCED FDR CONTROL, AND BAYBISHRINK-EFDR ALGORITHM BASED ON THE ORTHOGONAL DWT AND DUAL-TREE CWT

This example serves as proof of concept that our new methods of multiple hypothesis testing can recover the expected pattern of activation in analysis of relatively simple data where the true nature of the image is relatively well known.
2) Analysis of a Factorially Designed fMRI Experiment: Two groups of participants (19 patients with depression and 19 healthy comparison subjects) were each scanned twice: at baseline and eight weeks later. Immediately following the baseline scan, patients with depression received antidepressant treatment (fluoxetine hydrochloride 20 mg/day) for eight weeks; comparison subjects were not treated. In each scanning session, fMRI data were acquired while participants performed an event-related paradigm that entailed incidental recognition of human faces expressing variable intensities of sadness. The

main aim of the study was to identify treatment-related changes in brain function of the depressed patients; further experimental details are provided in [44]. The data were analyzed in two stages: First, the differential response to sad faces (compared to a low-level visual fixation condition) was estimated at each voxel of each individual image; second, a mixed-effects analysis of variance (ANOVA) model was fitted at each voxel, treating the 38 individually estimated statistics as dependent variables, to estimate test statistics corresponding to the main effect of group (patients versus comparison subjects), the main effect of time (baseline versus week eight), and the group × time interaction (indicative of an antidepressant treatment effect) [44].



Fig. 9. Experimental fMRI data: single-subject visual and auditory stimulation experiment. Noisy 2-D maps of t-statistics estimated as a measure of brain activation by blocked periodic visual and auditory stimulation (top row) were tested by the BaybiShrink-EFDR algorithm using the dual-tree CWT to recover foci of significant activation. As expected, these were located mainly in occipital cortex for the visual stimulus and in bilateral temporal cortex for the auditory stimulus.

These three statistic maps were previously tested for significance using a permutation test of suprathreshold voxel cluster "mass" [45]. Briefly, this method involves applying a preliminary probability threshold to the statistic maps and setting to zero all voxels whose $p$-value exceeds this threshold, thereby creating a set of suprathreshold voxel clusters that are contiguous in 3-D. The sum of suprathreshold voxel statistics in each cluster, i.e., its mass, is then tested against a permutation distribution using a cluster-wise probability threshold for significance such that the expected number of false positive clusters over the whole map is less than 1. As shown in Fig. 10, the factorial effect maps produced by this cluster-based technique correspond quite closely to the maps produced by BaybiShrink-EFDR on a complex wavelet basis. For example, both analyses identify a main effect of time in the left hippocampus and fusiform gyrus, and a group × time effect in the left amygdala and ventral striatum. These effects are consistent not only between these analyses of the same data but also with prior and independent neuroimaging studies of repetition (time) effects [46] and antidepressant drug effects [47], [48] on brain function. However, the two sets of maps are not identical in every detail. Broadly speaking, the maps obtained by BaybiShrink-EFDR show more precisely localized foci of differential activation, which often follow quite exactly the local distribution of gray matter in the cortex and subcortical nuclei, whereas the foci of activation generated by the clustering method are relatively extensive and occasionally show anatomically implausible features, such as the apparent connection across the cerebral midline between foci of activation in right and left ventral striatum in the group × time effect map. These differences in spatial resolution and location of effects probably reflect the fact that the wavelet-based method is multiresolution, whereas the cluster-based test is biased toward detection of spatially more extensive features by the size of the preliminary voxel threshold used to generate the clusters.
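A sketch of the cluster "mass" statistic described above, using `scipy.ndimage` to label suprathreshold voxels that are contiguous in 3-D; the threshold value and array names are illustrative, and the permutation distribution against which the masses are tested in [45] is omitted for brevity.

```python
import numpy as np
from scipy import ndimage

def cluster_masses(stat_map, threshold):
    """Zero sub-threshold voxels, label 3-D contiguous suprathreshold clusters,
    and return the 'mass' (sum of suprathreshold statistics) of each cluster."""
    supra = np.where(stat_map > threshold, stat_map, 0.0)
    labels, n_clusters = ndimage.label(supra > 0)
    return ndimage.sum(supra, labels, index=np.arange(1, n_clusters + 1))

rng = np.random.default_rng(5)
stat_map = rng.standard_normal((16, 16, 16))
stat_map[4:8, 4:8, 4:8] += 3.0                        # an artificial activated block
print(np.sort(cluster_masses(stat_map, 2.0))[-3:])    # three largest cluster masses
```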

This example also illustrates that the BaybiShrink-EFDR algorithm can be applied to 3-D maps of statistics, as well as 2-D maps of statistics.

IV. CONCLUSIONS AND FUTURE WORK

In this paper, we have set out a general framework for two-stage testing of multiple hypotheses in the wavelet domain, and we have comparatively evaluated three specific implementations of this approach. Our conclusion is that a Bayesian bivariate shrinkage operator provides a computationally efficient and powerful method to define a reduced set of hypotheses for formal testing by a false discovery rate control algorithm. Analysis of simulated data suggests that this approach is clearly preferable to the Bonferroni correction, the standard FDR algorithm, or Ruttimann's algorithm, and of comparable power to the computationally more expensive method of Shen et al. We have additionally defined and validated a new statistical test, based on the squared magnitude of complex coefficients estimated by the dual-tree complex wavelet transform, and shown that running the BaybiShrink-EFDR algorithm on this basis imparts superior specificity of activation mapping compared with results obtained by the same algorithm applied to coefficients of the orthogonal DWT. Finally, we have shown that these new wavelet-based methods are applicable to experimental fMRI data and provide results from a factorially designed pharmacological MRI experiment that are comparable with those previously obtained by permutation-based hypothesis testing in the spatial domain. We note that wavelet hypothesis testing algorithms have previously been compared rigorously to parametric hypothesis testing after monoresolution Gaussian smoothing (implemented in SPM software), and Bayesian wavelet methods were shown to have superior sensitivity [49].
In future work, we plan to extend this study in two key ways. First, we wish to evaluate critically the assumption of Gaussian i.i.d. noise that is common to this implementation of



Fig. 10. Factorially designed pharmacological MRI experiment [44] was analyzed to localize main effects of group (patients with depression versus healthy volunteers), main effects of time (baseline versus a second scan after eight weeks), and a group × time interaction (indicative of antidepressant drug effects on brain function). The results obtained by a permutation test based on suprathreshold voxel clusters in the spatial domain (clustering) can be directly compared to results obtained by BaybiShrink-EFDR using the dual-tree CWT. For all maps, the right side of the image represents the left side of the brain; the crosshairs indicate the origin of the x and y coordinates, and the numeral indicates the z coordinate of each slice in the stereotactic system of Talairach.

BaybiShrink-EFDR, as well as the comparable algorithms by Ruttimann and Shen et al. We consider that the true distribution of spatial noise in fMRI is unlikely to be Gaussian i.i.d., implying that the noise variance of the wavelet coefficients will not be identical at all scales and, thus, not generally well estimated by the MAD estimator based on the finest scale coefficients. We intend to develop subband adaptive estimators of noise variance that will provide thresholds for BaybiShrink that do not depend on assuming Gaussian white noise in the image. Second, we wish to explore the extension of these methods to higher dimensional (3-D or 4-D) complex wavelet transforms of spatial maps and spatiotemporal datasets.
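As a pointer to the first of these extensions, the sketch below estimates a separate noise standard deviation for each subband by the median absolute deviation rather than relying on the finest-scale estimate alone; this is only one possible form of a subband adaptive estimator, and the dictionary-of-subbands interface is an assumption of the sketch.

```python
import numpy as np

def subband_mad_sigmas(subbands):
    """Median-absolute-deviation noise estimate per subband.

    `subbands` maps a (level, orientation) key to an array of wavelet
    coefficients; returning one sigma per subband drops the assumption that
    the noise is white (identical variance at every scale)."""
    return {key: np.median(np.abs(w)) / 0.6745 for key, w in subbands.items()}

# Toy example: coarser levels are given larger noise variance than the finest level.
rng = np.random.default_rng(6)
toy = {(j, o): (1.0 + 0.5 * j) * rng.standard_normal((32 >> j, 32 >> j))
       for j in range(1, 4) for o in ("LH", "HL", "HH")}
print(subband_mad_sigmas(toy)[(3, "HH")])
```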


APPENDIX
COVARIANCE SEQUENCES FOR THE DUAL-TREE COMPLEX WAVELET TRANSFORM

Fig. 2 illustrates the iterated filterbank structure of the dual-tree CWT and the set of filters used in each stage (the filters in each stage are different, as opposed to the orthogonal DWT). Let $h_a[j,l]$ and $h_b[j,l]$ be two sets of wavelet filter coefficients from the dual-tree CWT corresponding to tree $a$ and tree $b$, respectively (for example, $h_a[2,l]$ is the high-frequency filter corresponding to tree $a$ at scale 2, and $h_b[3,l]$ is the high-frequency filter corresponding to tree $b$ at scale 3). $L_j$ is the length of the filter corresponding to scale $j$, which is the same for each tree. Let $w_a[j,t]$ and $w_b[j',t']$ be two nonboundary wavelet coefficients, i.e., filter outputs of the form $w_a[j,t] = \sum_{l=0}^{L_j-1} h_a[j,l]\, X[2^j(t+1)-1-l]$ [50]. Then, the covariance between them for a given signal $X$ can be written as
$$\mathrm{cov}\{w_a[j,t],\, w_b[j',t']\} = \mathrm{cov}\!\left\{\sum_{l=0}^{L_j-1} h_a[j,l]\, X[2^j(t+1)-1-l],\; \sum_{l'=0}^{L_{j'}-1} h_b[j',l']\, X[2^{j'}(t'+1)-1-l']\right\} \qquad (12)$$
$$= \sum_{l=0}^{L_j-1}\sum_{l'=0}^{L_{j'}-1} h_a[j,l]\, h_b[j',l']\, \mathrm{cov}\!\left\{X[2^j(t+1)-1-l],\, X[2^{j'}(t'+1)-1-l']\right\} \qquad (13)$$
$$= \sum_{l=0}^{L_j-1}\sum_{l'=0}^{L_{j'}-1} h_a[j,l]\, h_b[j',l']\, s_X\!\left(2^{j'}(t'+1) - 2^{j}(t+1) + l - l'\right) \qquad (14)$$
where $s_X(\tau)$ is the ACVS of $X$ at lag $\tau$ [50]. This equation may be simplified when computing covariances within the same scale ($j = j'$), and letting $\nu = t' - t$, via
$$\mathrm{cov}\{w_a[j,t],\, w_b[j,t+\nu]\} = \sum_{l=0}^{L_j-1}\sum_{l'=0}^{L_j-1} h_a[j,l]\, h_b[j,l']\, s_X\!\left(2^{j}\nu + l - l'\right). \qquad (15)$$
In addition, the cross-correlation sequence (CCS) may be obtained by normalizing the covariances computed above
$$\rho_{a,b}[j,\nu] = \frac{\mathrm{cov}\{w_a[j,t],\, w_b[j,t+\nu]\}}{\mathrm{var}\{w_a[j,t]\}}. \qquad (16)$$
Note that the denominator only uses $\mathrm{var}\{w_a[j,t]\}$ since the wavelet filter for tree $b$ is the time-reversed version of the wavelet filter for tree $a$ (also remember that $\mathrm{var}\{w_b[j,t]\} = \mathrm{var}\{w_a[j,t]\}$).

REFERENCES
[1] K. J. Worsley, "Detecting activation in fMRI data," Statist. Methods Med. Res., vol. 12, pp. 401–418, 2003.
[2] T. E. Nichols and A. P. Holmes, "Nonparametric permutation tests for functional neuroimaging: A primer with examples," Human Brain Mapp., vol. 15, pp. 1–25, 2001.
[3] R. J. Simes, "An improved Bonferroni procedure for multiple tests of significance," Biometrika, vol. 73, pp. 751–754, 1986.
[4] Y. Benjamini and Y. Hochberg, "Controlling the false discovery rate: A practical and powerful approach to multiple testing," J. R. Statist. Soc. B, vol. 57, no. 1, pp. 289–300, 1995.
[5] C. R. Genovese, N. A. Lazar, and T. Nichols, "Thresholding of statistical maps in functional neuroimaging using the false discovery rate," NeuroImage, vol. 15, pp. 870–878, 2002.

[6] A. Laine, "Wavelets in temporal and spatial processing of biomedical images," Annu. Rev. Biomed. Eng., vol. 2, pp. 511–550, 2000.
[7] E. Bullmore, J. Fadili, V. Maxim, L. Şendur, B. Whitcher, J. Suckling, M. Brammer, and M. Breakspear, "Wavelets and functional magnetic resonance imaging of the human brain," NeuroImage, vol. 23, no. 1, pp. 234–249, 2004.
[8] S. Mallat, "A theory for multiresolution signal decomposition: The wavelet representation," IEEE Trans. Pattern Anal. Machine Intell., vol. 11, no. 7, pp. 674–693, Jul. 1989.
[9] A. H. Tewfik and M. Kim, "Correlation structure of the discrete wavelet coefficients of fractional Brownian motion," IEEE Trans. Inf. Theory, vol. 38, no. 2, pp. 904–909, Mar. 1992.
[10] R. W. Dijkerman and R. R. Mazumdar, "On the correlation structure of the wavelet coefficients of fractional Brownian motion," IEEE Trans. Inf. Theory, vol. 40, no. 5, pp. 1609–1612, Sep. 1994.
[11] E. J. McCoy and A. T. Walden, "Wavelet analysis and synthesis of stationary long-memory processes," J. Comput. Graphical Statist., vol. 5, no. 1, pp. 26–56, 1996.
[12] Y. Fan, "On the approximate decorrelation property of the discrete wavelet transform for fractionally differenced processes," IEEE Trans. Inf. Theory, vol. 49, no. 2, pp. 516–521, Feb. 2003.
[13] U. E. Ruttimann, M. Unser, R. R. Rawlings, D. Rio, N. F. Ramsey, V. S. Mattay, D. W. Hommer, J. A. Frank, and D. R. Weinberger, "Statistical analysis of functional MRI data in the wavelet domain," IEEE Trans. Med. Imag., vol. 17, no. 2, pp. 142–154, Apr. 1998.
[14] X. Shen, H.-C. Huang, and N. Cressie, "Nonparametric hypothesis testing for a spatial signal," J. Amer. Statist. Assoc., vol. 97, no. 460, pp. 1122–1140, 2002.
[15] J. Ye, "On measuring and correcting the effects of data mining and model selection," J. Amer. Statist. Assoc., vol. 93, pp. 120–131, 1998.
[16] E. P. Simoncelli, "Modeling the joint statistics of images in the wavelet domain," in Proc. SPIE Wavelet Applications Signal Image Processing VII, vol. 3813, M. A. Unser, A. Aldroubi, and A. F. Laine, Eds., 1999, pp. 181–195.
[17] N. Kingsbury, "Image processing with complex wavelets," Proc. R. Soc. London, ser. A, vol. 357, pp. 2543–2560, 1999.
[18] E. J. Candes and D. L. Donoho, "Curvelets, multiresolution representation, and scaling laws," in Proc. SPIE Wavelet Applications Signal Image Processing VIII, vol. 4119, A. Aldroubi, A. F. Laine, and M. A. Unser, Eds., 2000.
[19] M. S. Crouse, R. D. Nowak, and R. G. Baraniuk, "Wavelet-based statistical signal processing using hidden Markov models," IEEE Trans. Signal Process., vol. 46, no. 4, pp. 886–902, Apr. 1998.
[20] E. P. Simoncelli, "Bayesian denoising of visual images in the wavelet domain," in Bayesian Inference in Wavelet-Based Models, P. Müller and B. Vidakovic, Eds. New York: Springer Verlag, 1999, vol. 141, Lecture Notes in Statistics, pp. 291–308.
[21] N. Kingsbury, "Complex wavelets for shift invariant analysis and filtering of signals," Appl. Comput. Harmonic Anal., vol. 10, no. 3, pp. 234–253, 2001.
[22] I. W. Selesnick, "Hilbert transform pairs of wavelet bases," IEEE Signal Process. Lett., vol. 8, no. 6, pp. 170–173, Jun. 2001.
[23] ——, "The design of approximate Hilbert transform pairs of wavelet bases," IEEE Trans. Signal Process., vol. 50, no. 5, pp. 1144–1152, May 2002.
[24] N. G. Kingsbury, "A dual-tree complex wavelet transform with improved orthogonality and symmetry properties," in Proc. IEEE Int. Conf. Image Process., 2000.


[25] A. F. Abdelnour and I. W. Selesnick, “Design of 2-band orthogonal near-symmetric CQF,” in Proc. Int. Conf. Acoust., Speech, Signal Process., 2001.
[26] S. Cai, K. Li, and I. Selesnick. Matlab Programs for Real and Complex Wavelet Transforms for 1-D, 2-D, and 3-D Signals. [Online]. Available: http://taco.poly.edu/WaveletSoftware/
[27] D. L. Donoho, “De-noising by soft-thresholding,” IEEE Trans. Inf. Theory, vol. 41, no. 3, pp. 613–627, May 1995.
[28] D. L. Donoho and I. M. Johnstone, “Ideal spatial adaptation by wavelet shrinkage,” Biometrika, vol. 81, no. 3, pp. 425–455, 1994.
[29] D. L. Donoho and I. M. Johnstone, “Adapting to unknown smoothness by wavelet shrinkage,” J. Amer. Statist. Assoc., vol. 90, pp. 1200–1224, 1995.
[30] S. G. Chang, B. Yu, and M. Vetterli, “Spatially adaptive wavelet thresholding with context modeling for image denoising,” IEEE Trans. Image Process., vol. 9, no. 9, pp. 1522–1531, Sep. 2000.
[31] J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli, “Adaptive Wiener denoising using a Gaussian scale mixture model in the wavelet domain,” in Proc. Int. Conf. Image Process., 2001.
[32] Z. Cai, T. H. Cheng, C. Lu, and K. R. Subramanian, “Efficient wavelet-based image denoising algorithm,” Electron. Lett., vol. 37, no. 11, pp. 683–685, 2001.
[33] J.-C. Pesquet and D. Leporini, “Bayesian wavelet denoising: Besov priors and non-Gaussian noises,” Signal Process., vol. 81, pp. 55–66, 2001.
[34] M. J. Wainwright and E. P. Simoncelli, “Scale mixtures of Gaussians and the statistics of natural images,” in Advances in Neural Information Processing Systems, S. A. Solla, T. K. Leen, and K.-R. Müller, Eds. Cambridge, MA: MIT Press, 2000, vol. 12, pp. 855–861.
[35] L. Şendur and I. W. Selesnick, “Bivariate shrinkage functions for wavelet-based denoising exploiting interscale dependency,” IEEE Trans. Signal Process., vol. 50, no. 11, pp. 2744–2756, Nov. 2002.
[36] H. L. Van Trees, Detection, Estimation, and Modulation Theory. New York: Wiley, 1968.
[37] G. W. Wornell, Signal Processing with Fractals: A Wavelet Based Approach. Englewood Cliffs, NJ: Prentice Hall, 1996.
[38] J. Beran, Statistics for Long-Memory Processes. New York: Chapman & Hall, 1994, vol. 61, Monographs on Statistics and Applied Probability.
[39] N. L. Johnson and S. Kotz, Continuous Multivariate Distributions. New York: Wiley, 1972.
[40] J. P. Imhof, “Computing the distribution of a quadratic form in normal variables,” Biometrika, vol. 48, pp. 419–426, 1961.
[41] R. W. Farebrother, “The distribution of a quadratic form in normal variables,” Applied Statist., vol. 39, no. 2, pp. 294–309, 1990.
[42] I. Daubechies, Ten Lectures on Wavelets. Philadelphia, PA: SIAM, 1992, vol. 61, CBMS-NSF Regional Conference Series in Applied Mathematics.
[43] E. T. Bullmore, M. J. Brammer, S. C. R. Williams, S. Rabe-Hesketh, N. Janot, A. S. David, J. D. C. Mellers, R. Howard, and P. Sham, “Statistical methods of estimation and inference for functional MR image analysis,” Magn. Resonance Med., vol. 35, pp. 261–277, 1996.
[44] C. H. Y. Fu, S. C. R. Williams, A. J. Cleare, M. J. Brammer, N. D. Walsh, J. Kim, C. M. Andrew, E. M. Pich, P. M. Williams, L. J. Reed, M. T. Mitterschiffthaler, J. Suckling, and E. T. Bullmore, “Attenuation of the neural response to sad faces in major depression by antidepressant treatment: A prospective, event-related functional magnetic resonance imaging study,” Arch. Gen. Psych., vol. 61, pp. 877–888, 2004.
[45] J. Suckling and E. T. Bullmore, “Permutation tests for factorially designed neuroimaging experiments,” Human Brain Mapp., vol. 22, pp. 193–205, 2004.
[46] H. C. Breiter, N. L. Etcoff, P. J. Whalen, W. A. Kennedy, S. L. Rauch, R. L. Buckner, M. M. Strauss, S. E. Hyman, and B. R. Rosen, “Response and habituation of the human amygdala during visual processing of facial expression,” Neuron, vol. 17, pp. 875–887, 1996.
[47] R. J. Davidson, W. Irwin, M. J. Anderle, and N. H. Kalin, “The neural substrates of affective processing in depressed patients treated with venlafaxine,” Amer. J. Psych., vol. 160, pp. 64–75, 2003.
[48] Y. I. Sheline, D. M. Barch, J. M. Donnelly, J. M. Ollinger, A. Z. Snyder, and M. A. Mintun, “Increased amygdala response to masked emotional faces in depressed subjects resolves with antidepressant treatment: An fMRI study,” Biol. Psych., vol. 50, pp. 651–658, 2001.
[49] J. Fadili and E. Bullmore, “A comparative evaluation of wavelet-based methods for hypothesis testing of brain activation maps,” NeuroImag., vol. 23, no. 3, pp. 1112–1128, 2004.
[50] D. B. Percival and A. T. Walden, Wavelet Methods for Time Series Analysis. Cambridge, U.K.: Cambridge Univ. Press, 2000.


Levent Şendur (M’04) received the B.S. and M.S. degrees in electrical engineering in 1996 and 1999, respectively, from Middle East Technical University, Ankara, Turkey. He received the Ph.D. degree from the Department of Electrical Engineering, Polytechnic University, Brooklyn, NY, in 2003. Since September 2003, he has been a postdoctoral research associate at the Brain Mapping Unit, Department of Psychiatry, University of Cambridge, Cambridge, U.K. His current research interests are in the area of digital signal processing, wavelet-based signal processing, and the statistical analysis of functional magnetic resonance images of the brain. Dr. Şendur received a full research fellowship as a Ph.D. student, and his Ph.D. dissertation received the Alexander Hessel Award for the outstanding Ph.D. dissertation in Electrical Engineering, Polytechnic University, 2003.

Voichiţa Maxim was born in Romania and received the Ph.D. degree in signal processing from the Université de Grenoble, Grenoble, France. She was a post-doctoral researcher with the University of Cambridge, Cambridge, U.K., and has published on methods for estimating the Hurst exponent in fMRI time series. She is currently with the Département de Mathématiques, INSA de Toulouse, Toulouse, France.

Brandon Whitcher received the B.S. degree in applied mathematics from Carnegie Mellon University, Pittsburgh, PA, in 1993 and the M.S. and Ph.D. degrees in statistics from the University of Washington, Seattle, in 1995 and 1998, respectively. He was a post-doctoral researcher at EURANDOM, Eindhoven, the Netherlands, a European research institute for the study of stochastic phenomena, from 1998 to 2000, and a visiting scientist with the Geophysical Statistics Project, National Center for Atmospheric Research (NCAR), Boulder, CO, from 2000 to 2002. Since September 2002, he has been a Clinical Imaging Scientist at GlaxoSmithKline, Greenford, U.K. His research interests include wavelet methodology, stationary and nonstationary time series analysis, long-memory processes, diffusion-tensor and dynamic contrast-enhanced magnetic resonance imaging, computational statistics, and applications in the physical sciences, finance, economics, and medicine. He currently serves as an Associate Editor of the International Journal of Wavelets, Multiresolution and Information Processing. Dr. Whitcher is a member of the American Statistical Association, the Royal Statistical Society, the Institute of Mathematical Statistics, and the International Society for Magnetic Resonance in Medicine.

Ed Bullmore received the B.A., MBBS, MRCP, MRCPsych., and Ph.D. degrees in medicine from the University of Oxford, Oxford, U.K., and the University of London, London, U.K. He completed specialist training in psychiatry at the Maudsley Hospital and Institute of Psychiatry, London. From 1993 to 1999, he was supported by research fellowships awarded by the Wellcome Trust and received the Ph.D. degree from the University of London for his thesis on statistical analysis of functional magnetic resonance images of the brain. Since 1999, he has been a Professor of psychiatry and Director of the Brain Mapping Unit in the Department of Psychiatry, University of Cambridge, Cambridge, U.K. Since 2005, he has been Deputy Director of the MRC/Wellcome Trust Behavioural and Clinical Neurosciences Institute, Cambridge. He has published more than 150 peer-reviewed articles on various aspects of MRI, including several papers on issues in fMRI time series modeling, nonparametric inference, and multivariate analysis of brain functional connectivity. A consistent theme in his recent methodological work has been the application of wavelets to diverse data analytic problems in MRI.