On the Performance of Autocorrelation Estimation Algorithms for fMRI Analysis

Brian Lenoski, Member, IEEE, Leslie C. Baxter, Lina J. Karam, Senior Member, IEEE, José Maisog, and Josef Debbins
Abstract—Pre-whitening of fMRI time-series is commonly performed to address temporal autocorrelations. The pre-whitening procedure requires knowledge of the spatially dependent autocorrelations, which in turn must be estimated from the observed data. The accuracy of the autocorrelation estimation algorithm is important because biased autocorrelation estimates result in biased test statistics, thereby increasing the expected false-positive and/or false-negative rates. Thus, a methodology for testing the accuracy of autocorrelation estimates and for assessing the performance of today’s state-of-the-art autocorrelation estimation algorithms is needed. To address these problems, we propose an evaluation framework that tests for significant autocorrelation bias in the model residuals of a general linear model analysis. We apply the proposed testing framework to 18 pre-surgical fMRI mapping datasets from ten patients and compare the performance of popular fMRI autocorrelation estimation algorithms. We also identify five consistent spectral patterns representative of the encountered autocorrelation structures and show that they are well described by a second-order/two-pole model. We subsequently show that a nonregularized, second-order autoregressive model, AR(2), is sufficient for capturing the range of temporal autocorrelations found in the considered fMRI datasets. Finally, we explore the bias versus predictability tradeoff associated with regularization of the autocorrelation coefficients. We find that the increased bias from regularization outweighs any gains in predictability. Based on the obtained results, we expect that a second-order, nonregularized AR algorithm will provide the best performance in terms of producing white residuals and achieving the best possible tradeoff between maximizing predictability and minimizing bias for most fMRI datasets.
Index Terms—Autocorrelations, bias, fMRI, pre-whitening, regularization.
Manuscript received March 15, 2008; revised October 02, 2008. Current version published January 23, 2009. This work was supported in part by the Barrow Neurological Foundation. The Guest Editor coordinating the review of this manuscript and approving it for publication was Martin McKeown.
B. Lenoski is with the Department of Electrical Engineering, Arizona State University, Tempe, AZ 85287 USA, and also with Medical Numerics, Inc., Germantown, MD 20876 USA (e-mail: brian.lenoski@asu.edu).
L. J. Karam is with the Department of Electrical Engineering, Arizona State University, Tempe, AZ 85287 USA (e-mail: [email protected]).
L. C. Baxter is with the Neuropsychology Neuroimaging Laboratory, Barrow Neurological Institute, Phoenix, AZ 85013 USA (e-mail: leslie.baxter@chw.edu).
J. Maisog is with Medical Numerics, Inc., Germantown, MD 20876 USA (e-mail: [email protected]).
J. Debbins is with The Keller Center for Imaging Innovation, Barrow Neurological Institute, Phoenix, AZ 85013 USA, and also with the Department of Electrical Engineering, Arizona State University, Tempe, AZ 85287 USA (e-mail: [email protected]).
Digital Object Identifier 10.1109/JSTSP.2008.2007819
I. INTRODUCTION

The general linear model (GLM) is a flexible and widely used tool for analyzing fMRI datasets [1]-[3]. One component of the GLM involves specifying the statistics of the additive noise. A common assumption is that the noise is temporally uncorrelated; violation of this assumption biases the test statistic used to test for an effect. Functional MRI time-series are known to contain temporally correlated noise from both physical and physiological processes [4]-[6]. The existence of colored noise has been reported in the fMRI literature, and numerous algorithms have been proposed to account for this nonwhite structure when solving the GLM equations [5], [7]-[16].

Autocorrelations in fMRI time-series are thought to originate from a variety of sources. Low-frequency correlations, e.g., scanner drift, are often attributed to equipment and other physical sources and have been observed in patients, cadavers, and phantoms [6]. Wide-band correlations are thought to result from the undersampling of higher-frequency physiological processes such as heavily aliased cardiac and respiratory cycles [17]. Additional autocorrelation sources likely exist but are not yet completely understood [18].

Explicit modeling of autocorrelations is difficult because they are spatially dependent and variable [19]. In [3], the autocorrelation problem was mitigated by adjusting the effective residual degrees of freedom to reflect the overestimation of test statistics resulting from positive autocorrelations. In [5], the autocorrelation bias was handled by bandpass filtering the observed time-series, thereby imposing a known autocorrelation structure that was assumed to "swamp" the unknown correlations. A pre-whitening solution to the autocorrelation problem was proposed by Bullmore et al. [13]. Pre-whitening, compared with alternative algorithms, has the distinct advantage of returning the best linear unbiased estimates (BLUEs) of the GLM regression coefficients and thus is the preferred method for most GLM-based fMRI processing applications.

The pre-whitening procedure requires knowledge of the unknown noise autocorrelations, which must be estimated from the observed fMRI data. Various parametric autocorrelation estimation models have been proposed in the fMRI literature, including a pth-order autoregressive, AR(p), model [9], a first-order autoregressive, AR(1), model [13], a first-order autoregressive moving-average, ARMA(1,1), model [14], a contrast ARMA model of variable order [15], a frequency domain model [20], and an AR(1) plus white noise model [17]. A nonparametric solution is provided in [7]. The choice of autocorrelation estimation algorithm is important because pre-whitening is known to be sensitive to the accuracy of the autocorrelation estimates [5]. Inaccuracies result in
biased test statistics, which can in turn produce false positives and/or false negatives in terms of detected activations. A methodology for testing the accuracy of autocorrelation estimates and for assessing the performance of today's state-of-the-art autocorrelation estimation algorithms is therefore needed. While several studies to date have considered these problems, previous performance comparisons of fMRI autocorrelation estimation algorithms share three limitations: 1) the tests were applied to either simulated or null datasets rather than the activation datasets obtained in practice; 2) only a subset of the commonly used autocorrelation algorithms were tested; and/or 3) the results did not provide insight as to why one algorithm performed better than another.

To address these limitations, we propose an extension of the testing framework described in [6], [22]. The proposed testing framework is based on the Durbin-Watson (DW) and cumulative periodogram (CP) tests [22]. We apply the proposed testing framework to 18 pre-surgical fMRI mapping activation datasets and compare the performance of several popular fMRI autocorrelation algorithms: the voxel-wise nonparametric algorithm in [7], the global AR(1) algorithm in [8], and the voxel-wise AR(1) algorithm in [9]. Variants of these autocorrelation estimation algorithms are implemented in FSL [23], SPM2/5 [8], and fMRIStat [9], respectively. In addition, a nonregularized AR(2) autocorrelation estimation algorithm is tested and is shown to exhibit superior performance in terms of producing white residuals. More generally, we show that nonregularized autocorrelation estimation algorithms achieve superior performance compared with their regularized counterparts, motivating a need to revisit the issue of regularization of the autocorrelation coefficients.

The remainder of this paper is organized as follows. In Section II, we describe the general fMRI signal model. In Section III, we review several autocorrelation estimation algorithms employed by popular fMRI analysis packages. Section IV describes the fMRI experiments, data processing, and evaluation framework used to compare the autocorrelation estimation algorithms. In Section V, we provide results, followed by a discussion of the results in Section VI.
Fig. 1. fMRI signal model.

II. FMRI SIGNAL MODEL

Fig. 1 depicts a general fMRI signal model. The discrete-time fMRI signal, $y[t]$, at a particular location is modeled as a linear admixture of signal, low-frequency drifts, and noise. The signal is generated by passing the known input functions through a linear time-invariant (LTI) filter characterized by the hemodynamic (impulse) response function (HRF). The input functions are typically boxcar/square-wave functions (possibly nonperiodic) for block experimental designs and variable-length impulse trains for event-related designs. The internal structure of the HRF filter is shown in Fig. 2. Each input function, $u_i[t]$, is individually convolved with each basis function, $h_j[t]$, and then downsampled to match the sampling period of the fMRI signal, resulting in the signals $s_{ij}[t]$. Each signal is then multiplied by a scalar coefficient, $\beta_{ij}$. Common basis functions used for fMRI analysis include the canonical hemodynamic response function and its time and dispersion derivatives [24], half-cosines [25], Gaussians [26], and lagged gamma functions [26]. The low-frequency drifts are modeled as a linear combination of weighted discrete cosine transform (DCT) basis functions [8], polynomials [9], or splines [27]. Drifts can alternatively be pre-filtered from the statistical analysis using the canonical correlation technique in [28], the nonlinear running lines smoother technique in [10], or residual DCT filtering [29]. The noise, $n[t]$, is modeled as a stationary, multivariate Gaussian process with variance $\sigma^2$ and autocorrelation matrix $\mathbf{V}$. The observed fMRI signal can be expressed as follows:

$$\mathbf{y} = \begin{bmatrix} \mathbf{S} & \mathbf{D} \end{bmatrix} \begin{bmatrix} \boldsymbol{\beta} \\ \boldsymbol{\gamma} \end{bmatrix} + \mathbf{n} \qquad (1)$$

where $\mathbf{S}$ is a matrix whose columns are the signal vectors $\mathbf{s}_{ij}$, $\mathbf{D}$ is a matrix whose columns are the sampled drift functions, $\mathbf{n}$ is a vector of random noise, and $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ are the corresponding unknown weights. Expressing (1) in the general linear model form yields

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{n} \qquad (2)$$

where $\mathbf{X} = [\mathbf{S} \;\; \mathbf{D}]$ is the observation or design matrix and $\mathbf{b} = [\boldsymbol{\beta}^{\mathsf T} \;\; \boldsymbol{\gamma}^{\mathsf T}]^{\mathsf T}$ is the vector of regression coefficients. From the Gauss-Markov theorem [30], the best linear unbiased estimates (BLUEs) of the GLM regression coefficients are equal to the linear least-squares estimates of the transformed model

$$\mathbf{W}\mathbf{y} = \mathbf{W}\mathbf{X}\mathbf{b} + \mathbf{W}\mathbf{n} \qquad (3)$$

where the whitening matrix $\mathbf{W}$ satisfies

$$\mathbf{W}^{\mathsf T}\mathbf{W} = \mathbf{V}^{-1} \qquad (4)$$

and $\mathbf{V}$ is the positive-definite autocorrelation matrix. Given $\mathbf{V}$, the whitening matrix can be found using standard matrix decomposition techniques, e.g., Cholesky decomposition. Given the contrast vector, $\mathbf{c}$, which tests for the desired effect [1], the whitened design matrix, whitened observation vector, BLUEs, model residuals, unbiased variance estimate, and t-test statistic are given by [31]

$$\begin{aligned}
\mathbf{X}_w &= \mathbf{W}\mathbf{X}, \qquad \mathbf{y}_w = \mathbf{W}\mathbf{y} \\
\hat{\mathbf{b}} &= (\mathbf{X}_w^{\mathsf T}\mathbf{X}_w)^{-1}\mathbf{X}_w^{\mathsf T}\mathbf{y}_w \\
\mathbf{r} &= \mathbf{y}_w - \mathbf{X}_w\hat{\mathbf{b}} \\
\hat{\sigma}^2 &= \frac{\mathbf{r}^{\mathsf T}\mathbf{r}}{N - \operatorname{rank}(\mathbf{X})} \\
t &= \frac{\mathbf{c}^{\mathsf T}\hat{\mathbf{b}}}{\sqrt{\hat{\sigma}^2\,\mathbf{c}^{\mathsf T}(\mathbf{X}_w^{\mathsf T}\mathbf{X}_w)^{-1}\mathbf{c}}}
\end{aligned} \qquad (5)$$
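To make the whitening recipe in (3)-(5) concrete, the following minimal sketch (our illustration, not the paper's code; the toy design, the AR(1) value, and the true coefficients are arbitrary assumptions) builds a whitening matrix from a known V via Cholesky decomposition and computes the BLUEs and the t-statistic:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100                                    # number of volumes

# Toy design: one boxcar task regressor plus a column of ones for the mean.
X = np.column_stack([np.tile(np.r_[np.ones(10), np.zeros(10)], 5),
                     np.ones(N)])

# Assumed AR(1) autocorrelation matrix V (rho = 0.3 is illustrative).
rho = 0.3
V = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

# Whitening matrix satisfying (4): W^T W = V^{-1}.
W = np.linalg.cholesky(np.linalg.inv(V)).T

# Simulated data with correlated noise (true b = [1, 100] is arbitrary).
y = X @ np.array([1.0, 100.0]) + np.linalg.cholesky(V) @ rng.standard_normal(N)

# Whitened quantities and estimates, as in (5).
Xw, yw = W @ X, W @ y
b_hat = np.linalg.lstsq(Xw, yw, rcond=None)[0]     # BLUEs
r = yw - Xw @ b_hat                                # model residuals
sigma2 = (r @ r) / (N - np.linalg.matrix_rank(X))  # unbiased variance estimate
c = np.array([1.0, 0.0])                           # contrast for the task effect
t = (c @ b_hat) / np.sqrt(sigma2 * c @ np.linalg.inv(Xw.T @ Xw) @ c)
print(f"b_hat = {b_hat}, t = {t:.2f}")
```

Here `np.linalg.lstsq` plays the role of the least-squares solve in (5); any decomposition satisfying (4) would serve equally well as the whitening matrix.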
TABLE I
AUTOCORRELATION ESTIMATION ALGORITHMS AND FMRI SOFTWARE

Software       Autocorrelation estimation algorithm
AFNI [32]      None (noise assumed uncorrelated; OLS)
FSL [23]       Regularized, nonparametric [7]
SPM2/5 [8]     Global, linearized AR(1) [8]
fMRIStat [9]   Voxel-wise, regularized AR(p) [9], [36]
Fig. 2. Structure of the HRF filter.

If $\mathbf{n}$ is assumed to be white noise, both $\mathbf{V}$ and $\mathbf{W}$ reduce to the identity matrix and (5) returns the so-called ordinary least-squares (OLS) estimates. In practice, $\mathbf{V}$ is unknown, spatially dependent, and varies considerably over both subjects and voxels. An estimate of the autocorrelation matrix is consequently substituted into (4) for the true but unknown autocorrelations. In general, the particular combination of HRF basis functions, drift functions, and autocorrelation estimation algorithm employed will impact the final value of the test statistic in (5). Here we restrict our focus to bias in the test statistic resulting from inaccurate or incorrect estimation of the autocorrelations.

III. EXISTING AUTOCORRELATION ESTIMATION ALGORITHMS

Table I lists the autocorrelation estimation algorithms employed by four popular fMRI software packages. The AFNI software [32] assumes uncorrelated noise, and thus no autocorrelation correction is performed; the resulting regression analysis therefore reduces to OLS. The FSL software [23] utilizes a regularized, nonparametric autocorrelation estimation algorithm [7]: unbiased sample autocorrelation coefficients are computed from the OLS residuals, regularized, and then windowed with a Hann window [33] to produce the final autocorrelation estimates. The SPM2/5 software [8] employs a global, linearized first-order autoregressive, AR(1), autocorrelation algorithm [8]: the weighted sample covariances of the voxels identified as most likely to contain activation are first pooled, and the pooled covariance is then used to find the restricted maximum likelihood (ReML) estimates [34], [35] of the two parameters of the linearized AR(1) model. The fMRIStat software [9] employs a voxel-wise, regularized AR(p) autocorrelation algorithm [9], [36]: sample autocorrelation coefficients are computed from the OLS residuals, bias corrected, and then regularized, and the autoregressive coefficients $a_1, \ldots, a_p$ are found from the Yule-Walker equations. The whitening matrix, ignoring end effects, can be constructed directly from the autoregressive coefficients as follows:

$$\mathbf{W} = \begin{bmatrix}
1      &        &        &        &        &   \\
-a_1   & 1      &        &        &        &   \\
\vdots & \ddots & \ddots &        &        &   \\
-a_p   & \cdots & -a_1   & 1      &        &   \\
       & \ddots &        & \ddots & \ddots &   \\
       &        & -a_p   & \cdots & -a_1   & 1
\end{bmatrix} \qquad (6)$$

with zeros in the unmarked entries.
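As a rough sketch of the construction just described (a Yule-Walker fit plus the banded form of (6); the simulated AR(2) process below is an assumption for illustration, and [9] additionally bias-corrects and regularizes the sample autocorrelations, which this sketch omits):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def ar_whitening_matrix(resid, p=2):
    """Fit AR(p) by Yule-Walker and build the whitening matrix of (6)."""
    N = len(resid)
    r = resid - resid.mean()
    # Sample autocorrelation coefficients up to lag p.
    acf = np.array([r[:N - k] @ r[k:] for k in range(p + 1)]) / (r @ r)
    # Yule-Walker: solve the symmetric Toeplitz system for the AR coefficients.
    a = solve_toeplitz(acf[:p], acf[1:p + 1])
    # Banded lower-triangular W: ones on the diagonal, -a_k on subdiagonal k.
    W = np.eye(N)
    for k in range(1, p + 1):
        W -= np.diag(np.full(N - k, a[k - 1]), -k)
    return a, W

rng = np.random.default_rng(1)
# Illustrative AR(2) noise: x_t = 0.5 x_{t-1} - 0.3 x_{t-2} + e_t.
e = rng.standard_normal(500)
x = np.zeros(500)
for t in range(2, 500):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + e[t]

a, W = ar_whitening_matrix(x, p=2)
print("AR coefficients:", a)          # roughly [0.5, -0.3]
w = W @ x                             # approximately white, up to end effects
print("lag-1 autocorr after whitening:", (w[:-1] @ w[1:]) / (w @ w))
```

The first p rows of this W are only approximate, which is what "ignoring end effects" refers to above.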
IV. METHODS

The fMRI experiments, acquisition, data processing, and testing framework used to assess the relative performance of the autocorrelation estimation algorithms are described below.

A. Experiments

Data from three types of tasks were obtained from ten patients. All tasks were block paradigms targeting visual, language comprehension, or motor activation. The numbers of acquired volumes for the vision, language, and motor tasks were 100, 70, and 90, respectively. During the vision task, patients looked at either pictures or a crosshair in alternating 30-s blocks for a total of five cycles. For the language comprehension task, patients read short paragraphs for 24 s followed by 18 s of a baseline task for a total of five cycles. For the motor task, patients tapped their right index finger, tapped their left index finger, or remained at rest for periods of 18 s, with a total of five occurrences of each condition. A total of 18 datasets were acquired: five patients performed both the vision and motor tasks, three patients performed both the vision and language comprehension tasks, one patient performed only the motor task, and one patient performed only the language comprehension task.
Fig. 3. Spectral patterns identified in OLS residuals. (a) Positively correlated, 1-pole spectrum; (b) negatively correlated, 1-pole spectrum; (c) negatively correlated, 2-pole spectrum; (d) positively correlated, 2-pole spectrum; (e) approximately white spectrum.

Fig. 4. Results for OLS residuals. (a) DW statistic; (b) DW p-value; (c) RW% for each dataset.

B. Imaging Parameters

Scans were acquired at the Barrow Neurological Institute on a GE Excite 3-T MRI scanner. Functional scans were acquired using an eight-channel head coil and a 2-D gradient-echo, echo planar imaging sequence with interleaved acquisition and no slice gap.

C. Data Analysis

Preprocessing of the functional datasets included rigid-body motion correction using the SPM5 software [8]. The columns of the design matrix were constructed using filter inputs modeled by boxcar functions corresponding to the experimental timings (cf. Fig. 2). The canonical hemodynamic response function [24] and its temporal derivative were chosen as the HRF basis functions; the temporal derivative was included to account for differences in both slice acquisition times and peak response delay. Drifts were modeled in the design matrix using the SPM5 [8] discrete cosine transform basis functions, with the cut-off parameter set to 120, 84, and 108 s for the visual, language, and motor tasks, respectively. A column of ones was included to model the mean.

Statistical processing began with an initial OLS fit, yielding the OLS residual time-series. The autocorrelation matrices of the OLS residual time-series were then estimated using six algorithms: regularized and nonregularized versions of the nonparametric algorithm in [7] with the window parameter set to 15, regularized and nonregularized versions of the AR(1) algorithm in [9], and the global AR(1) algorithm in [8]. In addition, a nonregularized AR(2) algorithm was implemented following [9]. For the regularized algorithms, the autocorrelation coefficients were spatially smoothed using an isotropic Gaussian filter of fixed full width at half maximum (FWHM). The whitening matrix for each of the six autocorrelation algorithms was found [using (4) or (6)], and the pre-whitened residuals for each algorithm were computed using (5).
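For illustration, a sketch of a DCT drift basis in the SPM convention (our code, not SPM's; the cosine definition and the cutoff-to-order rule follow SPM's documented behavior, and the TR value below is an assumption, since the acquisition parameters are not restated here):

```python
import numpy as np

def dct_drift_basis(n_scans, tr, cutoff_s):
    """DCT low-frequency drift regressors; the order K retains all
    cosines with period >= cutoff_s seconds (SPM's rule)."""
    K = int(2.0 * n_scans * tr / cutoff_s) + 1
    t = np.arange(n_scans)
    C = np.zeros((n_scans, K))
    C[:, 0] = 1.0 / np.sqrt(n_scans)                  # constant term
    for k in range(1, K):
        C[:, k] = np.sqrt(2.0 / n_scans) * np.cos(np.pi * k * (2 * t + 1)
                                                  / (2 * n_scans))
    return C

# Vision task: 100 volumes, 120-s cutoff; TR = 2 s is an assumed value.
D = dct_drift_basis(100, tr=2.0, cutoff_s=120.0)
# The constant column would be dropped before appending D to the design
# matrix, since the design described above already includes a column of ones.
```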
D. Evaluation Framework

Evaluating the accuracy of autocorrelation estimation algorithms on activation fMRI datasets is difficult because the ground truth is not known. Luo and Nichols [22] recently described the application of two tests for colored structure in the residuals of a GLM analysis: 1) the Durbin-Watson (DW) test [37], [38] of an assumption of independence against an alternative of first-order autocorrelations, and 2) the cumulative periodogram (CP) test of an assumption of independence against an alternative of general-order autocorrelations. These tests are equivalent to testing the GLM residuals for an assumption of white noise against an alternative of colored noise. If the autocorrelation estimate is accurate for a particular voxel, then the model residuals will be approximately white; if the autocorrelation estimate is significantly biased, then the model residuals will be colored. The DW and CP tests thus provide a mechanism for measuring the relative performance of fMRI autocorrelation estimation algorithms. We introduce and define the relative whitening performance metric (RW%) for a particular autocorrelation estimation algorithm as follows.

1) Apply the DW and CP tests separately to each GLM residual time-series.
2) If the null hypothesis of white noise is accepted for both tests, then classify the voxel's spectrum as white.
3) If the null hypothesis is rejected for either test, then conclude that a statistically significant autocorrelation structure is present and classify the voxel's spectrum as colored.
Fig. 5. DW statistic histogram (left) and DW p-value histogram (right). (a) nonregularized AR(2); (b) global AR(1); (c) regularized NP; (d) nonregularized NP; (e) regularized AR(1); (f) nonregularized AR(1).
4) Compute the relative whitening performance metric of the autocorrelation estimation algorithm as follows:

$$\mathrm{RW\%} = \frac{\text{number of voxels classified as white}}{\text{total number of voxels tested}} \times 100\% \qquad (7)$$

In addition to the RW% performance metric, we find it useful to analyze histograms of the DW test statistic and, in particular, of the DW probability (p-)values accumulated across voxels. The DW test statistic is defined as

$$\mathrm{DW} = \frac{\sum_{t=2}^{N} (e_t - e_{t-1})^2}{\sum_{t=1}^{N} e_t^2} \approx 2(1 - \hat{\rho}_1) \qquad (8)$$

where $e_t$ are the GLM model residuals (given in (5)) and $\hat{\rho}_1$ is the lag-1 normalized autocorrelation coefficient. The range of DW is zero to four. Values near two indicate no evidence of autocorrelations, values less than two indicate evidence of positive autocorrelations, and values greater than two indicate evidence of negative autocorrelations [37]. The computed DW p-value properly accounts for dependencies in the residuals with respect to the column space of the design matrix. DW p-values near zero indicate a high probability of positive autocorrelations, while DW p-values near one indicate a high probability of negative autocorrelations. The reader is referred to [22] for further details on the DW and CP tests in the context of fMRI analysis.
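A minimal sketch of this whiteness classification follows (our stand-ins: the paper computes exact DW p-values that account for the design matrix, whereas this sketch uses a crude DW acceptance interval and a large-sample cumulative periodogram test):

```python
import numpy as np

def durbin_watson(e):
    """DW statistic of (8): values near 2 show no evidence of autocorrelation."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def cp_test_white(e, ks_const=1.36):
    """Cumulative periodogram test: under white noise the normalized
    cumulative periodogram hugs a straight line; reject when the maximum
    deviation exceeds a Kolmogorov-Smirnov-style bound (~5% level)."""
    P = np.abs(np.fft.rfft(e - e.mean()))[1:] ** 2   # periodogram, DC removed
    cp = np.cumsum(P) / np.sum(P)
    m = len(P)
    dev = np.max(np.abs(cp - np.arange(1, m + 1) / m))
    return dev <= ks_const / np.sqrt(m)

def is_white(e, dw_lo=1.7, dw_hi=2.3):
    """Steps 1-3: classify a residual series as white only if both tests
    accept (the DW interval is an illustrative stand-in for exact p-values)."""
    return dw_lo < durbin_watson(e) < dw_hi and cp_test_white(e)

def rw_percent(residuals):
    """RW% of (7): percentage of voxels whose residuals pass both tests."""
    return 100.0 * np.mean([is_white(e) for e in residuals])

rng = np.random.default_rng(2)
print(rw_percent(rng.standard_normal((200, 100))))   # white noise: high RW%
```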
V. RESULTS

A. Spectral Identification of Autocorrelations

Prior to applying our proposed testing framework, we wanted to qualitatively explore the autocorrelation structures, or equivalently the spectral patterns, of the residual time-series for which autocorrelation estimates are sought. For this purpose, we developed an interactive Matlab GUI allowing us to examine voxel-wise power spectral density (PSD) plots of the OLS residual time-series from all 18 datasets in the study. The OLS residual time-series provide a good first-order approximation of fMRI noise [7] and are given by

$$\mathbf{r}_{\mathrm{OLS}} = \mathbf{y} - \mathbf{X}\hat{\mathbf{b}}_{\mathrm{OLS}} \qquad (9)$$

where the OLS regression estimates are given by

$$\hat{\mathbf{b}}_{\mathrm{OLS}} = (\mathbf{X}^{\mathsf T}\mathbf{X})^{-1}\mathbf{X}^{\mathsf T}\mathbf{y} \qquad (10)$$

Across all datasets, we identified five consistent spectral patterns (Fig. 3): positively and negatively correlated spectra well described by one-pole models, positively and negatively correlated spectra well described by two-pole models, and approximately white spectra. The five spectral patterns were identified as follows: 1) load the 4-D OLS residual time-series into the developed Matlab GUI; 2) explore the spectra for each slice of the dataset and note consistent spectral patterns (approximately 20-30 s per slice); 3) repeat Steps 1 and 2 for all datasets in the study.
Fig. 6. (a), (b) Misclassification bias and (c) underestimation bias in the OLS spectral estimates (top); the pre-whitened residual spectra are subsequently nonwhite (bottom). (a) Reg. NP; (b) reg. AR(1); (c) reg. NP.
Fig. 7. OLS residual spectrum (solid, top), estimated residual spectrum (dashed, top) and pre-whitened residual spectrum (bottom) for positively correlated, single-pole spectrum. All algorithms successfully removed autocorrelations. (a) AR(2); (b) gl. AR(1); (c) reg. NP; (d) NP; (e) reg. AR(1); (f) AR(1).
In general, we did not observe the periodic patterns associated with spectra requiring three or more poles to describe their behavior. Based on these observations, we hypothesize that an AR(2) autocorrelation model is sufficient for capturing the range of temporal autocorrelation structures in the considered fMRI time-series.
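As an illustration of why two poles span these patterns (our own sketch, not the paper's GUI procedure; the tolerance and labels are arbitrary choices), one can fit an AR(2) model and read the pattern off the pole locations: poles near the origin give an approximately flat spectrum, one dominant real pole gives the one-pole low-pass (positive pole) or high-pass (negative pole) shapes, and two substantial poles give the two-pole shapes:

```python
import numpy as np

def ar2_poles(e):
    """Fit AR(2) by Yule-Walker and return the roots of
    1 - a1 z^{-1} - a2 z^{-2}, i.e., the model's poles."""
    r = e - e.mean()
    rho = [r[:-k] @ r[k:] / (r @ r) for k in (1, 2)]
    A = np.array([[1.0, rho[0]], [rho[0], 1.0]])
    a1, a2 = np.linalg.solve(A, rho)
    return np.roots([1.0, -a1, -a2])

def spectral_pattern(e, tol=0.15):
    """Crude labeling of the five patterns via pole positions
    (the tolerance is an illustrative choice)."""
    p = ar2_poles(e)
    if np.max(np.abs(p)) < tol:
        return "approximately white"
    if np.max(np.abs(np.imag(p))) > tol:
        return "two-pole (peaked) spectrum"
    lead = np.real(p[np.argmax(np.abs(p))])
    order = "two-pole" if np.min(np.abs(p)) > tol else "one-pole"
    sign = "positively" if lead > 0 else "negatively"
    return f"{sign} correlated, {order} spectrum"

rng = np.random.default_rng(4)
e = rng.standard_normal(2000)
x = np.zeros(2000)
for t in range(1, 2000):
    x[t] = 0.4 * x[t - 1] + e[t]
print(spectral_pattern(x))   # expect: positively correlated, one-pole spectrum
```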
B. Analysis of OLS Residuals

We applied the DW and CP tests [22] to the OLS residual time-series from all 18 datasets in the study and computed histograms of both the DW statistic and the DW p-value accumulated across all voxels [Fig. 4(a) and (b)]. The mean of the DW statistic histogram is shifted below 2.0, indicating a strong presence of positive autocorrelations in the OLS residuals. The DW p-value histogram shows that 69% of the OLS residual time-series have significant positive autocorrelations (p-values less than 0.05) and 1.9% have significant negative autocorrelations (p-values greater than 0.95). The null hypothesis of white noise is accepted for only 29.1% of the OLS residual time-series with respect to the DW test. Overall, our results agree with previous findings that most OLS residuals have significant positive autocorrelations (nearly 7 in 10 in our analysis) [7]-[9], [21]. Finally, we computed the OLS RW% performance metric for each of the 18 datasets [Fig. 4(c)]. The RW% was highly variable across the 18 datasets; the average performance across all OLS voxels is reported in Fig. 10 (category "A").
Fig. 8. OLS residual spectrum (solid, top), estimated residual spectrum (dashed, top) and pre-whitened residual spectrum (bottom) for negatively correlated, single-pole spectrum. Only the AR(2), NP, and AR(1) algorithms successfully removed autocorrelations. (a) AR(2); (b) gl. AR(1); (c) reg. NP; (d) NP; (e) reg. AR(1); (f) AR(1).
Fig. 9. OLS residual spectrum (solid, top), estimated residual spectrum (dashed, top) and pre-whitened residual spectrum (bottom) for two-pole, parabolic spectrum. Only the AR(2) and NP algorithms successfully removed autocorrelations. (a) AR(2); (b) gl. AR(1); (c) reg. NP; (d) NP; (e) reg. AR(1); (f) AR(1).
C. Evaluation of Autocorrelation Estimation Algorithms

A total of six autocorrelation estimation algorithms were tested: regularized and nonregularized implementations of the nonparametric (NP) algorithm in [7], regularized and nonregularized implementations of the AR(1) algorithm in [9], the global AR(1) algorithm in [8], and a nonregularized AR(2) algorithm based on [9].

Global AR(1) algorithm: The global AR(1) algorithm had the lowest performance of the tested autocorrelation estimation algorithms (Fig. 10, category "A"). The mean of the DW statistic histogram is shifted above 2.0, indicating a global negative autocorrelation bias [Fig. 5(b)]. Roughly 25% of the global AR(1) model residuals have significant negative correlations, nearly a 12-fold increase compared with the OLS residuals. The increase in negative correlations is explained by the algorithm's spatial inflexibility: a positive autocorrelation model is fit at every voxel regardless of the true underlying spectrum. Consequently, both white spectra and negatively correlated spectra [Fig. 8(b), top] are incorrectly fit with a positive autocorrelation model, and negative autocorrelations are subsequently introduced in the pre-whitened residuals [Fig. 8(b), bottom].
The global AR(1) algorithm did successfully model many positively correlated, one-pole spectra [Fig. 7(b)], and the number of significant positive autocorrelations was reduced by nearly 76% compared with OLS.

In [8], it is argued that the global AR(1) autocorrelation estimation algorithm is most likely to be accurate for activated/interesting voxels. To test this assumption, we computed the RW% for each autocorrelation algorithm using only the subset of voxels pooled in the global AR(1) estimation procedure. The performance of the global AR(1) algorithm improved by nearly 7% over this subset of voxels and slightly outperformed the regularized NP and regularized AR(1) algorithms (Fig. 10, category "F"). Nevertheless, 40% of the pre-whitened residuals within the subset still contained significant autocorrelations.

Regularized NP and AR(1) Algorithms: The regularized NP and AR(1) algorithms provided superior performance compared with the global AR(1) algorithm, and the percentage of white residuals increased for both (Fig. 10, category "A"). The DW test results show that significant positive correlations were greatly reduced, while significant negative correlations increased slightly relative to OLS [Fig. 5(c) and (e)].
Fig. 10. Relative whitening performance metric (RW%) accumulated across voxels. (A) = all datasets, (L) = language datasets, (V) = vision datasets, (M) = motor datasets, (F) = subset of voxels defined in global AR(1) algorithm.
Fig. 11. Bias versus predictability tradeoff and regularization of autocorrelation coefficients. (a) Percentage of biased residuals (100-RW%) versus FWHM of regularization kernel. (b) Percent gain in predictability (relative to OLS) versus FWHM of regularization kernel.
Both algorithms successfully modeled many positively correlated spectra [Fig. 7(c) and 7(e)], causing the decrease in positive autocorrelations. The increased negative correlations and the remaining positive correlations are attributable to the regularization operation, which causes systematic misclassification and/or underestimation bias in the OLS spectral estimates (Fig. 6). Misclassification bias prevents the regularized NP and AR(1) algorithms from successfully modeling many negatively correlated and approximately white spectra [Fig. 6(a) and (b), top]. The bias is a direct consequence of the regularization operation which, because of the predominance of positive autocorrelations, tends to down-weight the spectral estimates at high frequencies. Both white and negatively correlated spectra are then incorrectly fit with positive autocorrelation models, and negative autocorrelations are consequently introduced in the pre-whitened residuals [Fig. 6(a) and (b), bottom]. Underestimation bias occurs when residual time-series with positively correlated spectra are bordered by residuals with weakly correlated or white spectra. The regularization operation causes the magnitudes of the autocorrelations to be underestimated [Fig. 6(c), top], so significant positive correlations may still remain after the autocorrelation correction [Fig. 6(c), bottom].

Nonregularized NP, AR(1), and AR(2) Algorithms: The nonregularized NP, AR(1), and AR(2) algorithms provided the best performance of the tested algorithms, and the percentage of white residuals increased for all three (Fig. 10, category "A"). The DW test results for the AR(1) algorithm [Fig. 5(f)] show that positive autocorrelations were not reduced to nominal levels, resulting in its inferior performance compared with the NP and AR(2) algorithms. As expected, this finding is explained by the AR(1) algorithm's inability to sufficiently model two-pole spectra [Fig. 9(f)]. All three nonregularized algorithms successfully modeled both positively and negatively correlated single-pole spectra (Figs. 7 and 8). Additionally, both the NP and AR(2) algorithms successfully modeled two-pole spectra (Fig. 9).

The DW p-value histogram for the AR(2) algorithm [Fig. 5(a)] shows that nearly all of the residuals have been successfully whitened. The mean of the DW statistic histogram is slightly greater than two, consistent with Luo and Nichols' [22] reasoning that a slight negative correlation is expected in appropriately pre-whitened residuals because of their required orthogonality to the column space of the design matrix. The DW and RW% results for the AR(2) algorithm support our hypothesis that a nonregularized AR(2) autocorrelation model is sufficient for capturing the range of temporal autocorrelation structures found in the considered fMRI time-series.

For the NP algorithm, the mean of the DW p-value histogram [Fig. 5(d)] is shifted below 0.5, indicating the presence of weak positive autocorrelations. An explanation for this finding is the use of biased autocorrelation coefficients in the NP algorithm [21]. The bias reduction step in [9] effectively removes this bias source from both the AR(2) and AR(1) DW p-value histograms [Fig. 5(a) and 5(f)].
VI. DISCUSSION

Pre-whitening is known to be sensitive to the accuracy of the autocorrelation estimates [5], and inaccurate estimates result in biased test statistics. On a global scale, biased test statistics increase the expected false positive rate (FPR) and/or false negative rate (FNR) relative to unbiased test statistics. For positive autocorrelation bias, test statistics are overestimated and the expected FPR increases. Locally, the degree of overestimation is spatially dependent so that, even among true activations, certain voxels and/or clusters could be assigned an artifactually higher significance than neighbors having possibly greater neurological significance. Conversely, for negative autocorrelation bias, test statistics are underestimated and the expected FNR increases. Again, the degree of underestimation is spatially dependent so that, even among true activations, certain voxels and/or clusters could be assigned an artifactually lower significance than neighbors having possibly less neurological significance. Although not usually emphasized, minimizing false negative findings is a critical factor in pre-surgical mapping applications because of the risk of negative outcomes from resection of viable tissue. Comparisons of autocorrelation estimation algorithms should therefore consider their impact on both false-positive and false-negative rates.

The majority of the OLS residuals in our analysis had significant autocorrelations. The autocorrelation bias was primarily positive, resulting in systematically inflated test statistics; the actual FPR is thus almost certainly higher than expected for a given detection threshold. We also observed nontrivial levels of negative autocorrelations in the OLS residuals, which should not be overlooked when considering certain fMRI applications such as pre-surgical mapping.

The global AR(1), regularized NP, and regularized AR(1) algorithms provided similar performance and approximately doubled the percentage of white residuals compared with OLS. All three algorithms successfully modeled many positively correlated, single-pole spectra, resulting in a lower expected FPR compared with OLS. Positive correlations were not, however, reduced to nominal levels. Furthermore, significant negative autocorrelations increased for all three algorithms, resulting in a higher expected FNR compared with OLS. For this reason, we do not recommend these algorithms for pre-surgical mapping applications. Autocorrelations in the model residuals were not reduced to nominal levels because, while the majority of voxels within the brain have positive autocorrelations, a substantial number of voxels are either approximately white or negatively correlated. Averaging/regularizing the autocorrelations across voxels consequently causes systematic misclassification and/or underestimation of the estimated autocorrelations, and voxels may become or remain significantly biased after the autocorrelation correction.

The nonregularized NP, AR(1), and AR(2) algorithms provided superior performance in terms of producing white residuals. All three algorithms reduced significant negative autocorrelations to nominal levels, and the NP and AR(2) algorithms additionally reduced significant positive autocorrelations to nominal levels. While many of the spectra we encountered were well described by single-pole models, two-pole spectra existed in sufficient numbers that the AR(1) algorithm was unable to reduce positive autocorrelations to nominal levels.

Regularization of the autocorrelation coefficients is motivated by the desire to increase the predictability of the GLM parameter estimates [36]. Our results, however, showed that regularization appreciably biases the autocorrelation estimates; consequently, many of the model residuals are significantly different from white noise, thereby biasing the GLM parameter estimates. There is thus a tradeoff between maximizing predictability and minimizing bias when considering spatial regularization of autocorrelation coefficients. To further explore this tradeoff in the context of fMRI analyses, we applied the NP, AR(1), and AR(2) algorithms with regularization kernels of 0 (no regularization), 2, 4, 6, and 8 mm to the six motor datasets previously described. We then computed the percentage of biased residuals (100-RW%) and the gain in predictability (computed as the mean reduction in residual variance relative to OLS) across all voxels from all six datasets (Fig. 11). As expected, the percentage of biased residuals increased with kernel width for all three algorithms. For the NP and AR(1) algorithms, predictability initially increased with kernel width, reaching a maximum at 4 mm (about the size of a single voxel in our analysis), and decreased thereafter. For the AR(2) algorithm, the gain in predictability was maximal with no regularization and decreased with increasing kernel width. For all three algorithms, any gains in predictability were greatly outweighed by increasing percentages of biased residuals. Placing an upper bound of 10% on the acceptable bias, the AR(2) algorithm with no regularization clearly provides the optimal tradeoff between maximizing predictability and minimizing bias for the considered datasets. Naively choosing the algorithm that maximizes predictability (NP with a 4-mm regularization kernel) would result in significant autocorrelation bias in nearly 19% of the residual time-series.

The performance of the autocorrelation estimation algorithms presented here may differ depending on the chosen HRF basis functions, drift functions, pre-processing options, experimental design, regularization filter, and image acquisition parameters. The observed autocorrelations may also be task, patient, and/or scanner dependent. Additional research is required to identify the extent to which each of these variables impacts the results presented here. However, for most fMRI datasets, we expect that a second-order, nonregularized AR algorithm will provide the best performance in terms of producing white residuals and achieving the best possible tradeoff between maximizing predictability and minimizing bias.
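For readers who want to reproduce the shape of this tradeoff, a sketch of the regularization step itself follows (the FWHM-to-sigma conversion is standard; the use of scipy's Gaussian filter and the toy coefficient map are our assumptions, with the 4-mm voxel size taken from the remark above):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def regularize_ar_coeffs(coeff_map, fwhm_mm, voxel_mm=4.0):
    """Spatially smooth a 3-D map of AR coefficients with an isotropic
    Gaussian kernel; fwhm_mm = 0 means no regularization."""
    if fwhm_mm == 0:
        return coeff_map
    sigma_vox = fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / voxel_mm
    return gaussian_filter(coeff_map, sigma=sigma_vox)

# Toy AR(1) coefficient map; for each kernel width one would re-whiten the
# data, re-run the DW/CP tests to get 100 - RW%, and measure the mean
# reduction in residual variance relative to OLS, as in Fig. 11.
coeffs = 0.3 + 0.1 * np.random.default_rng(3).standard_normal((16, 16, 8))
for fwhm in (0, 2, 4, 6, 8):
    sm = regularize_ar_coeffs(coeffs, fwhm)
    print(fwhm, float(sm.std()))   # wider kernels shrink coefficient variability
```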
REFERENCES

[1] K. J. Worsley and K. J. Friston, "Analysis of fMRI time-series revisited-again," NeuroImage, vol. 2, pp. 173–181, Sep. 1995.
[2] K. Worsley, S. Marrett, P. Neelin, A. Vandal, K. Friston, and A. Evans, "A unified statistical approach for determining significant signals in images of cerebral activation," Hum. Brain Mapp., vol. 4, pp. 58–73, 1996.
[3] K. J. Friston, A. P. Holmes, K. J. Worsley, J. B. Poline, C. D. Frith, and R. S. Frackowiak, "Statistical parametric maps in functional imaging: A general linear approach," Hum. Brain Mapp., vol. 2, pp. 189–210, 1995.
[4] E. Bullmore, C. Long, J. Suckling, J. Fadili, G. Calvert, F. Zelaya, T. A. Carpenter, and M. Brammer, "Colored noise and computational inference in neurophysiological (fMRI) time series analysis: Resampling methods in time and wavelet domains," Hum. Brain Mapp., vol. 12, pp. 61–78, Feb. 2001.
[5] K. J. Friston, O. Josephs, E. Zarahn, A. P. Holmes, S. Rouquette, and J. Poline, "To smooth or not to smooth? Bias and efficiency in fMRI time-series analysis," NeuroImage, vol. 12, pp. 196–208, Aug. 2000.
[6] T. E. Lund, K. H. Madsen, K. Sidaros, W. L. Luo, and T. E. Nichols, "Non-white noise in fMRI: Does modelling have an impact?," NeuroImage, vol. 29, pp. 54–66, Jan. 2006.
[7] M. W. Woolrich, B. D. Ripley, M. Brady, and S. M. Smith, "Temporal autocorrelation in univariate linear modeling of FMRI data," NeuroImage, vol. 14, pp. 1370–1386, Dec. 2001.
[8] R. S. Frackowiak, K. J. Friston, C. Frith, R. J. Dolan, C. J. Price, S. Zeki, J. Ashburner, and W. Penny, Human Brain Function, 2nd ed. San Diego, CA: Elsevier Science, 2004.
[9] K. J. Worsley, C. H. Liao, J. Aston, V. Petre, G. H. Duncan, F. Morales, and A. C. Evans, "A general statistical analysis for fMRI data," NeuroImage, vol. 15, pp. 1–15, Jan. 2002.
[10] J. L. Marchini and B. D. Ripley, "A new statistical approach to detecting significant activation in functional MRI," NeuroImage, vol. 12, pp. 366–380, Oct. 2000.
[11] M. A. Burock and A. M. Dale, "Estimation and detection of event-related fMRI signals with temporally correlated noise: A statistically efficient and unbiased approach," Hum. Brain Mapp., vol. 11, pp. 249–260, Dec. 2000.
[12] W. Penny, S. Kiebel, and K. Friston, "Variational Bayesian inference for fMRI time series," NeuroImage, vol. 19, pp. 727–741, Jul. 2003.
[13] E. Bullmore, M. Brammer, S. C. Williams, S. Rabe-Hesketh, N. Janot, A. David, J. Mellers, R. Howard, and P. Sham, "Statistical methods of estimation and inference for functional MR image analysis," Magn. Reson. Med., vol. 35, pp. 261–277, Feb. 1996.
[14] P. L. Purdon, V. Solo, R. M. Weisskoff, and E. N. Brown, "Locally regularized spatiotemporal modeling and model comparison for functional MRI," NeuroImage, vol. 14, pp. 912–923, Oct. 2001.
[15] J. J. Locascio, P. J. Jennings, C. I. Moore, and S. Corkin, "Time series analysis in the time domain and resampling methods for studies of functional magnetic resonance brain imaging," Hum. Brain Mapp., vol. 5, pp. 168–193, 1997.
[16] T. Gautama and M. M. Van Hulle, "Optimal spatial regularization of autocorrelation estimates in fMRI analysis," NeuroImage, vol. 23, pp. 1203–1216, 2004.
[17] P. L. Purdon and R. M. Weisskoff, "Effect of temporal autocorrelation due to physiological noise and stimulus paradigm on voxel-level false-positive rates in fMRI," Hum. Brain Mapp., vol. 6, pp. 239–249, 1998.
[18] E. Zarahn, G. K. Aguirre, and M. D'Esposito, "Empirical analyses of BOLD fMRI statistics. I. Spatially unsmoothed data collected under null-hypothesis conditions," NeuroImage, vol. 5, no. 3, pp. 179–197, 1997.
[19] V. D. Calhoun and T. Adali, "Unmixing fMRI with independent component analysis," IEEE Eng. Med. Biol. Mag., vol. 25, no. 2, pp. 79–90, Mar.–Apr. 2006.
[20] E. Zarahn, G. K. Aguirre, and M. D'Esposito, "Empirical analyses of BOLD fMRI statistics. I. Spatially unsmoothed data collected under null-hypothesis conditions," NeuroImage, vol. 5, pp. 179–197, Apr. 1997.
[21] J. L. Marchini and S. M. Smith, "On bias in the estimation of autocorrelations for fMRI voxel time-series analysis," NeuroImage, vol. 18, pp. 83–90, Jan. 2003.
[22] W. L. Luo and T. E. Nichols, "Diagnosis and exploration of massively univariate neuroimaging models," NeuroImage, vol. 19, pp. 1014–1032, Jul. 2003.
[23] S. M. Smith, M. Jenkinson, M. W. Woolrich, C. F. Beckmann, T. E. J. Behrens, H. Johansen-Berg, P. R. Bannister, M. De Luca, I. Drobnjak, D. E. Flitney, R. Niazy, J. Saunders, J. Vickers, Y. Zhang, N. De Stefano, J. M. Brady, and P. M. Matthews, "Advances in functional and structural MR image analysis and implementation as FSL," NeuroImage, vol. 23, pp. 208–219, 2004.
[24] K. J. Friston, A. Mechelli, R. Turner, and C. J. Price, "Nonlinear responses in fMRI: The Balloon model, Volterra kernels, and other hemodynamics," NeuroImage, vol. 12, pp. 466–477, Oct. 2000.
[25] M. W. Woolrich, T. E. J. Behrens, and S. M. Smith, "Constrained linear basis sets for HRF modeling using Variational Bayes," NeuroImage, vol. 21, no. 4, pp. 1748–1761, 2004.
[26] H. Chen, D. Yao, and Z. Liu, "A comparison of Gamma and Gaussian dynamic convolution models of the fMRI BOLD response," Magn. Reson. Imag., vol. 1, pp. 83–88, 2005.
[27] J. Tanabe, D. Miller, J. Tregellas, R. Freedman, and F. G. Meyer, "Comparison of detrending methods for optimal fMRI preprocessing," NeuroImage, vol. 15, no. 4, pp. 902–907, 2002.
[28] O. Friman, M. Borga, P. Lundberg, and H. Knutsson, "Exploratory fMRI analysis by autocorrelation maximization," NeuroImage, vol. 16, no. 2, pp. 454–464, 2002.
[29] A. P. Holmes, O. Josephs, C. Büchel, and K. J. Friston, "Statistical modeling of low-frequency confounds in fMRI," NeuroImage, vol. 5, p. 480, 1997.
[30] S. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Upper Saddle River, NJ: Prentice-Hall, 1993.
[31] J. A. Mumford and T. E. Nichols, "Modeling and inference of multisubject fMRI data," IEEE Eng. Med. Biol. Mag., vol. 25, no. 2, pp. 42–51, Mar.–Apr. 2006.
[32] R. W. Cox, "AFNI: Software for analysis and visualization of functional magnetic resonance neuroimages," Comput. Biomed. Res., vol. 29, pp. 162–173, Jun. 1996.
[33] P. Stoica and R. Moses, Spectral Analysis of Signals. Upper Saddle River, NJ: Pearson, 2005.
[34] D. A. Harville, "Bayesian inference for variance components using only error contrasts," Biometrika, vol. 61, pp. 383–385, 1974.
[35] T. P. Speed, "Restricted maximum likelihood (ReML)," in Encyclopedia of Statistical Science. New York: Wiley-Interscience, 1997, pp. 472–481.
[36] K. J. Worsley, "Spatial smoothing of autocorrelations to control the degrees of freedom in fMRI analysis," NeuroImage, vol. 26, pp. 635–641, 2005.
[37] J. Durbin and G. S. Watson, "Testing for serial correlation in least squares regression. I," Biometrika, vol. 37, pp. 409–428, Dec. 1950.
[38] J. Durbin and G. S. Watson, "Testing for serial correlation in least squares regression. II," Biometrika, vol. 38, pp. 159–178, Jun. 1951.
Brian Lenoski (M’08) received the B.S.E. degree in biomedical engineering and the M.S. degree in electrical engineering from Arizona State University, Tempe, in 2005 and 2007, respectively, where he is currently pursuing the Ph.D. degree in the Department of Electrical Engineering. He is an Imaging Scientist at Medical Numerics, Inc., Germantown, MD. His research interests include the processing and analysis of functional neuroimaging datasets for presurgical brain mapping, diffusion tensor imaging and tractography, statistical signal processing, and scientific programming. Mr. Lenoski was awarded the Science Foundation of Arizona (SFAz) graduate research fellowship in 2007.
Leslie C. Baxter received the B.S. degree in biology from the University of Michigan, Ann Arbor, in 1986, and the Ph.D. degree in clinical neuropsychology from the Chicago Medical School, Chicago, IL, in 1999. She completed a clinical internship at Long Island Jewish Medical Center, New York, and a postdoctoral fellowship in neuropsychology at Dartmouth University. She is currently a Clinical Neuropsychologist and Staff Scientist at Barrow Neurological Institute, Phoenix, AZ. Her specialty is functional MRI mapping and she performs pre-surgical mapping of cognitive abilities for a wide variety of patients undergoing resection of tumors and aneurysm as well as epilepsy surgery. Her research interests include structural and functional imaging changes associated with aging and Alzheimer’s Disease.
Lina J. Karam (S’91–M’95–SM’03) received the B.E. degree in computer and communications engineering from the American University of Beirut in 1989 and the M.S. and Ph.D. degrees in electrical engineering from the Georgia Institute of Technology, Atlanta, in 1992 and 1995, respectively. She is currently an Associate Professor in the Electrical Engineering Department, Arizona State University, Tempe. Her research interests are in the areas of image and video processing, image and video coding, error-resilient source coding, human visual perception, biomedical imaging, and digital filtering. From 1991 to 1995, she was a Graduate Research Assistant in the Graphics, Visualization, and Usability (GVU) Center and then in the Department of Electrical Engineering at Georgia Institute of Technology, Atlanta. She was with Schlumberger Well Services, working on problems related to data modeling and visualization, and in the Signal Processing Department of AT&T Bell Labs, working on problems in video coding during 1992 and 1994, respectively. Dr. Karam is the recipient of a NSF CAREER Award. She served as the Chair of the IEEE Communications and Signal Processing Chapters in Phoenix in 1997 and 1998. She also served as an associate editor of the IEEE TRANSACTIONS ON IMAGE PROCESSING from 1999 to 2003 and of the IEEE SIGNAL PROCESSING LETTERS from 2004 to 2006, and as a member of the IEEE Signal Processing Society’s Conference Board from 2003 to 2005. She is an Associate Editor of the IEEE TRANSACTIONS ON IMAGE PROCESSING and the Technical Program Chair of the 2009 IEEE International Conference on Image Processing. She is an elected member of the IEEE Circuits and Systems Society’s DSP Technical Committee, the IEEE Signal Processing Society’s IMDSP Technical Committee, and a member of the editorial board of the Foundation Trends in Signal Processing journal. She is a member of the Signal Processing and Circuits and Systems societies of the IEEE.
José Maisog received the B.S. degree in electrical engineering from Princeton University, Princeton, NJ, in 1986 and the M.D. degree from the University of Maryland, Baltimore, in 1990. He is currently pursuing the Master's degree in biostatistics at Georgetown University. Following a one-year internship at the University of Maryland Hospital, he spent six years as a Postdoctoral Fellow at the National Institutes of Health (NIH), where he studied statistics and data analysis, especially with respect to functional neuroimaging. He then joined Massachusetts General Hospital for a residency in radiology. In 1998, he became a Systems Engineer at Sensor Systems, Sterling, VA, where he served as the Lead Engineer for Sensor Systems' first clinical product, a software system for presurgical brain mapping using functional MRI. He joined the Center for the Study of Learning at Georgetown University as the Director of the Methods Core in April 2003. In June 2007, he returned to Sensor Systems' Medical Image Processing Division, Medical Numerics, where he is a Senior Imaging Scientist. His research interests include statistics, processing and analysis of neuroimages, presurgical brain mapping, and scientific programming.

Josef Debbins received the B.S. degree in electrical engineering from the University of Minnesota, Minneapolis, in 1986, the B.A. degree in applied mathematics and the M.S. degree in electrical engineering from Michigan Technological University, Houghton, in 1992, and the Ph.D. degree in biophysics/biomedical imaging from the Mayo Clinic and Foundation, Rochester, MN, in 1997, and he completed a year of international research in Paris, France. He is currently a Staff Scientist at the Barrow Neurological Institute, St. Joseph's Hospital and Medical Center, following eight years with GE Healthcare's MRI Division in Milwaukee, WI. He is also an Adjunct Professor of Electrical Engineering at Arizona State University. His MR research interests include novel k-space trajectories for rapid MR imaging, diffusion and diffusion tensor imaging and processing, functional MR imaging including data acquisition and processing strategies, and developing MR coil prototypes for specialty applications. Dr. Debbins is a registered professional electrical engineer (MN) and a member of ISMRM.