Electronic Journal of Statistics
Vol. 7 (2013) 1913–1956
ISSN: 1935-7524
DOI: 10.1214/13-EJS825

Diffusion tensor smoothing through weighted Karcher means

Owen Carmichael
Department of Neurology and Computer Science, University of California, Davis
e-mail: [email protected]

and

Jun Chen, Debashis Paul, Jie Peng
Department of Statistics, University of California, Davis
e-mail: [email protected]; [email protected]; [email protected]

Abstract: Diffusion tensor magnetic resonance imaging (MRI) quantifies the spatial distribution of water diffusion at each voxel on a regular grid of locations in a biological specimen by diffusion tensors: 3 × 3 positive definite matrices. Removal of noise from DTI is an important problem due to the high scientific relevance of DTI and the relatively low signal to noise ratio it provides. Leading approaches to this problem amount to estimation of weighted Karcher means of diffusion tensors within spatial neighborhoods, under various metrics imposed on the space of tensors. However, it is unclear how the behavior of these estimators varies with the magnitude of DTI sensor noise (the noise resulting from the thermal effects of MRI scanning) as well as the geometric structure of the underlying diffusion tensor neighborhoods. In this paper, we combine theoretical analysis, empirical analysis of simulated DTI data, and empirical analysis of real DTI scans to compare the noise removal performance of three kernel-based DTI smoothers that are based on Euclidean, log-Euclidean, and affine-invariant metrics. The results suggest, contrary to conventional wisdom, that imposing a simplistic Euclidean metric may in fact provide comparable or superior noise removal, especially in relatively unstructured regions and/or in the presence of moderate to high levels of sensor noise. In contrast, log-Euclidean and affine-invariant metrics may lead to better noise removal in highly structured anatomical regions, especially when the sensor noise is of low magnitude. These findings emphasize the importance of considering the interplay of sensor noise magnitude and tensor field geometric structure when assessing diffusion tensor smoothing options. They also point to the necessity for continued development of smoothing methods that perform well across a large range of scenarios.

AMS 2000 subject classifications: Primary 62G08; secondary 62P10.
Keywords and phrases: Tensor space, diffusion MRI, Karcher mean, kernel smoothing, perturbation analysis.

Received October 2012.

∗ The authors thank the anonymous reviewers for their helpful comments.
† Carmichael's research is partially supported by the NIH grants AG-10220, AG-10129, AG-030514, AG-031252 and AG-021028 and the NSF grant DMS-1208917.
‡ Chen and Peng's research is partially supported by the NSF grants DMS-0806128 and DMS-1007583.
§ Paul's research is partially supported by the NSF grants DMS-0806128, DMR-1035468 and DMS-1106690.
1. Introduction

Diffusion magnetic resonance imaging (MRI) has emerged as a prominent technique for using magnetic field gradients to measure the directional distribution of water diffusion in a biological specimen (Bammer et al., 2009; Beaulieu, 2002; Chanraud et al., 2010; Mukherjee et al., 2008a,b) because the diffusion measurements are useful as non-invasive proxy measures for the structural organization of the underlying tissue. In diffusion MRI of the brain, the head of a person or animal is placed inside a stationary magnetic field, and the tissue of the brain is excited by applying direction-specific magnetic field gradients and pulses of radio frequency energy. Measuring the frequency characteristics of energy emitted from the excited tissue allows us to estimate, at each location in the brain, the bulk amount of water diffusion occurring along each of the magnetic field gradient directions. Directional distributions of water diffusion across all possible 3D directions are then estimated by extrapolating from these direction-specific diffusion measurements. Diffusion tensor imaging (DTI) is the most widespread version of diffusion MRI, resulting in a representation of the directional distribution through a 3 × 3 positive definite matrix (diffusion tensor) at each spatial location. For a more detailed description of diffusion tensor imaging measurements and models, see Section 2.

However, DTI data include a substantial amount of noise, which results in noisy estimation of the diffusion tensors (Gudbjartsson and Patz, 1995; Hahn et al., 2006, 2009; Zhu et al., 2009). Consequently, noise adversely affects subsequent tracing of anatomical structures and mapping of inter-regional brain connectivity (Basser and Pajevic, 2000). For this reason, a variety of methods have been developed to filter noise from DTI data by enforcing spatial smoothness. One prominent approach replaces the tensor at each voxel by a weighted Karcher mean of tensors in a local spatial neighborhood under a Riemannian metric imposed on the space of tensors (Arsigny et al., 2005, 2006; Castaño Moraga et al., 2007). This amounts to kernel smoothing of the tensor field. In a notable development, Yuan et al. (2012) extended the weighted Karcher-mean approach to local polynomial smoothing in the tensor space with respect to different geometries, when the observations are taken at random locations. They also derived asymptotic mean squared errors of the estimated tensors. While other alternatives exist, including spatially regularizing the estimation of diffusion tensors from the raw diffusion weighted imaging (DWI) data (Tabelow et al., 2008), in this paper, we focus on the more common Karcher mean approach, with an emphasis on theoretical and numerical evaluation of smoothing results under various conditions.

In the existing literature on Karcher mean approaches, methods that impose a Euclidean metric on the space of tensors (hereafter "Euclidean smoothers") are generally considered inferior to those that impose a log-Euclidean metric or
an affine-invariant metric (hereafter, "geometric smoothers"). The former gives rise to a swelling effect (Arsigny et al., 2005, 2006), meaning that smoothed tensors have larger determinants than those of the original tensors, which results in artificially inflated estimates of local water diffusion. However, in other important respects the performance characteristics of these smoothers are not well understood. For example, it is not clear how the geometric structure of the tensor neighborhoods impacts smoothing performance. Because this geometric structure is highly variable across the human brain, and because it provides the primary cue for the structural organization of the underlying brain tissue, understanding its effects on tensor smoothing is highly relevant. Secondly, it is not known how smoother performance varies with noise magnitude, which can vary substantially across DTI scanners. Clarifying the performance tradeoffs of competing smoothing algorithms could help guide end users toward advantageous smoothers based on the biological sample and the operating characteristics of the MRI scanner. Given that the choice of DTI smoothing algorithm has a significant impact on scientific studies of DTI properties in various brain diseases, a deeper understanding of DTI smoothing performance has high practical importance (Viswanath et al., 2012).

We first study tensor estimation at each voxel based on the raw DWI data. The estimated tensors are then used as input for the smoothing process. We derive asymptotic expansions for a nonlinear regression estimator and a linear regression estimator under the small noise asymptotic regime. We show that, compared with the more widely used linear estimator, the nonlinear estimator is more efficient. We also show that the additive noise resulting from thermal effects of MRI scanning (sensor noise) on the raw DWI data leads to approximately additive noise on the estimated tensors.

We then study the properties of the Euclidean smoothers and geometric smoothers applied to these estimated tensors for the purpose of further denoising. The main goal of this study is to demonstrate the effect of different metrics on the performance of local constant smoothing under various local structures of the tensor field and various sensor noise levels. The major finding of this paper is that, contrary to conventional wisdom, Euclidean smoothers may in fact perform better than geometric smoothers, depending on the interplay of the aforementioned two factors. More specifically, we use perturbation analysis to show that when sensor noise levels are relatively low, either Euclidean or geometric smoothers may have smaller bias depending on whether the tensor field is spatially homogeneous or inhomogeneous. Here we focus on asymptotic bias rather than variance since, regardless of the choice of the metric, the variance is essentially inversely proportional to the neighborhood size and proportional to the noise level. However, even when the tensor field is constant within a neighborhood, the smoothers corresponding to different metrics can exhibit different bias characteristics (see Section 4.2 for a detailed discussion). We then use simulated tensor fields to show empirically that when sensor noise levels are relatively high, Euclidean smoothers tend to perform better regardless of the geometric structure of the tensor neighborhoods. Finally, we perform validation experiments on real DTI scans that confirm the theoretical and simulation findings.
Together, these findings suggest that we may need to revisit the conventional wisdom that Euclidean smoothers generally provide inferior smoothing performance.

The rest of the paper is organized as follows. In Section 2, we study tensor estimation from raw DWI data. In Section 3 we describe the three tensor smoothers, followed by Section 4, which presents a perturbation analysis comparing these smoothers under the small noise regime. Section 5 presents simulation studies and Section 6 is an application to human brain DTI data. The paper concludes with a discussion in Section 7.

2. Tensor estimation

In this section, we derive asymptotic expansions of two regression based tensor estimators under the additive noise model for the raw DWI data. We consider the small noise asymptotic regime with a fixed number of gradient directions, which is different from the conventional statistical paradigm where the sample size increases to infinity. We choose to work under this setting because in DTI experiments, the number of gradient directions is usually small and fixed, whereas the signal to noise ratio may be increased by increasing the magnetic field strength and scan time. In contrast, asymptotic analysis under the framework where the number of gradient directions grows to infinity has been conducted by Zhu et al. (2007) and Zhu et al. (2009). Zhu et al. (2007) proposed a weighted least squares estimate of the diffusion tensor and quantified the effects of noise on this estimator and its eigenvalues and eigenvectors, as well as on the subsequent morphological classification of the tensors. Zhu et al. (2009) studied a number of different regression models for characterizing stochastic noise in DWI and functional MRI (fMRI) and developed a diagnostic procedure for systematically exploring MR images to identify noise components other than simple stochastic noise.

DWI measurements and additive noise model

Proton nuclear magnetic resonance measures signals from the ¹H nuclei, the majority of which in biological tissues belong to water molecules. In diffusion magnetic resonance (DT-MR), the signals are sensitized to water diffusion by applying a set of magnetic gradient fields to the tissue. The raw data obtained by MRI scanning are complex numbers representing the Fourier transformation of a magnetization distribution of a tissue at a certain point in time. The observed DWI data are the amplitudes of the DT-MR signals corresponding to a set of magnetic gradients denoted by Q, a set of unit norm vectors in R³ referred to as gradient directions. Assuming Gaussian diffusion of water molecules at a given voxel, the noiseless diffusion weighted signal intensity in direction q = (q_1, q_2, q_3)^T ∈ Q is given by (Mori, 2007)

$$S_q = S_0 \exp(-b\, q^T D q), \quad (1)$$
where S_0 is the baseline intensity determined by the background constant field, b is a fixed experimental constant, and D is a 3 × 3 positive definite matrix which is referred to as the diffusion tensor at that voxel. For simplicity of exposition, throughout this section S_0 is assumed to be known and fixed. We also absorb b into the tensor D and ignore it hereafter.

The so called sensor noise in the observed (complex) signal at each voxel is mainly attributed to thermal noise in the MRI scanner and is modeled as independent and additive white noise on the real and imaginary parts of the signal (Gudbjartsson and Patz, 1995). Consequently, the actually observed, noise corrupted DWI data are

$$\hat S_q = \| S_q u_q + \sigma \varepsilon_q \|, \quad q \in Q, \quad (2)$$

where u_q, a unit vector in R², is the phase of the signal, and the random vectors ε_q have two independent coordinates with zero mean and unit variance and are also assumed to be independent across gradient directions q. The parameter σ > 0 controls the noise level. This model is referred to as the additive noise model for DWI data. If in model (2) the noise is further assumed to be Gaussian, i.e., the ε_q's are independent N(0, I₂), then the Ŝ_q's follow the Rician distribution with probability density function p_{S_q,σ}, where for ζ > 0,

$$p_{\zeta,\sigma}(x) = \frac{x}{\sigma^2}\exp\Big(-\frac{x^2+\zeta^2}{2\sigma^2}\Big)\, I_0\Big(\frac{x\zeta}{\sigma^2}\Big)\, 1(x>0), \quad (3)$$

where I_0 denotes the zero-th order modified Bessel function of the first kind.

Tensor estimators

Here, we consider estimating the tensor D based on the observed DWI signal intensities {Ŝ_q : q ∈ Q}. We use vec(D) to denote the 6 × 1 vector (D₁₁, D₂₂, D₃₃, D₁₂, D₁₃, D₂₃)^T, and define x_q = (q₁², q₂², q₃², 2q₁q₂, 2q₁q₃, 2q₂q₃)^T. With a slight abuse of notation, in this section we use D to mean vec(D). Then the quadratic form q^T D q can be written as x_q^T D. In the following, we also assume that the matrix $\sum_{q\in Q} x_q x_q^T$ is well-conditioned, which is guaranteed by an appropriate choice of the gradient directions (in practice, an approximately uniform design on a sphere is often used).

The first method uses the log transformed DWI's to estimate D by a linear regression. The resulting estimator is referred to as the linear regression estimator and denoted by $\hat D_{LS}$:

$$\hat D_{LS} := \arg\min_{D\in\mathbb{R}^6} \sum_{q\in Q} \big(\log \hat S_q - \log S_0 + x_q^T D\big)^2. \quad (4)$$
The second method uses nonlinear regression based on the original DWI's. The resulting estimator is referred to as the nonlinear regression estimator and denoted by $\hat D_{NL}$:

$$\hat D_{NL} := \arg\min_{D\in\mathbb{R}^6} \sum_{q\in Q} \big(\hat S_q - S_0 \exp(-x_q^T D)\big)^2. \quad (5)$$
Note that, in the above two estimators, we did not restrict D to be positive definite, but only required that it be a symmetric matrix. Accordingly, Propositions 2.1 and 2.2 below, dealing with the asymptotic behavior of the estimated tensors, do not explicitly require positive-definiteness, even though they point to the positive-definiteness of the estimated tensors with high probability. Imposing the positive definite constraint explicitly for tensor estimation at each voxel could be done, e.g., through a logarithm parametrization of the tensor.

Let X be the |Q| × 6 matrix with rows x_q^T, for q ∈ Q, where |Q| denotes the number of gradient directions. Also let ℓ_S = (log Ŝ_q − log S_0)_{q∈Q}. For future use, we also define D_S = diag(S_q : q ∈ Q). Then we have an explicit formula for $\hat D_{LS}$, namely,

$$\hat D_{LS} = -\Big(\sum_{q\in Q} x_q x_q^T\Big)^{-1} \sum_{q\in Q} (\log \hat S_q - \log S_0)\, x_q = -(X^T X)^{-1} X^T \ell_S.$$
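For concreteness, here is a minimal sketch (not the authors' code; the gradient set, S_0, σ and the true tensor are hypothetical choices echoing the simulation settings of Section 5.1) of generating Rician DWI data from model (2) and computing the two estimators (4) and (5), the latter via Levenberg-Marquardt warm-started at the linear estimate:

```python
# Sketch: simulate Rician DWI data and compute the linear (4) and
# nonlinear (5) tensor estimators. All settings are illustrative.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

def design_row(q):
    """Map a unit gradient q to x_q so that q^T D q = x_q^T vec(D)."""
    q1, q2, q3 = q
    return np.array([q1**2, q2**2, q3**2, 2*q1*q2, 2*q1*q3, 2*q2*q3])

S0, sigma = 1000.0, 50.0
dirs = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [3, 2, 1], [0.9, 0.45, 0.2],
                 [1, 0, 0], [0, 1, 0], [0, 0, 1], [2, 1, 1.3]], dtype=float)
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
X = np.vstack([design_row(q) for q in np.tile(dirs, (2, 1))])  # 9 dirs x 2 reps

# True tensor in vec form (D11, D22, D33, D12, D13, D23), b absorbed into D.
d_true = np.array([1.7, 0.4, 0.4, 0.0, 0.0, 0.0])

# Model (2): amplitude of the complex signal plus white noise on both parts.
S = S0 * np.exp(-X @ d_true)
S_obs = np.abs(S + sigma * rng.standard_normal(len(S))
                 + 1j * sigma * rng.standard_normal(len(S)))

# Linear regression estimator (4): closed form least squares on log signals.
d_ls, *_ = np.linalg.lstsq(X, -(np.log(S_obs) - np.log(S0)), rcond=None)

# Nonlinear regression estimator (5): Levenberg-Marquardt from the LS estimate.
d_nl = least_squares(lambda d: S_obs - S0 * np.exp(-X @ d),
                     d_ls, method='lm').x
print(np.round(d_ls, 3), np.round(d_nl, 3))
```

Consistent with the discussion below, the warm start typically leaves only a few Levenberg-Marquardt iterations to convergence.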
Though no explicit formula is available for $\hat D_{NL}$, it can be numerically solved by a nonlinear least squares solver. We used the Levenberg-Marquardt method in the numerical study of this paper. With the linear regression estimator as the initial estimate, the algorithm usually converges in just a few iterations. In the literature, it has been shown numerically that the nonlinear regression estimator often outperforms the linear regression estimator (Basser and Pajevic, 2000; Hahn et al., 2006). Also, when the noise associated with DWI measurements is Rician, it is shown in Polzehl and Tabelow (2008) that in the low signal-to-noise-ratio setting, the linear regression estimator is biased. In the following, we analyze the behavior of the two regression estimators under the additive noise model (2) as the noise parameter σ → 0 and show that the nonlinear regression estimator is asymptotically more efficient. The proofs of the stated asymptotic results can be found in the Appendix. Throughout, we use D_0 to denote the true diffusion tensor.

Asymptotic expansions

Proposition 2.1. Under the additive noise model, as σ → 0, $\hat D_{LS} = D_0 + \sigma D_{1,LS} + \sigma^2 D_{2,LS} + O(\sigma^3 |Q|^{-1} \sum_{q\in Q} \|\varepsilon_q\|^3)$, where the random vectors D_{1,LS} and D_{2,LS} satisfy E(D_{1,LS}) = 0, E(D_{2,LS}) = 0, and

$$\mathrm{Var}(D_{1,LS}) = \Big(\sum_{q\in Q} x_q x_q^T\Big)^{-1}\Big(\sum_{q\in Q} S_q^{-2} x_q x_q^T\Big)\Big(\sum_{q\in Q} x_q x_q^T\Big)^{-1} = (X^T X)^{-1}(X^T D_S^{-2} X)(X^T X)^{-1}. \quad (6)$$

Thus, the bias in $\hat D_{LS}$ is of the order O(σ³). Moreover, under the Rician noise model, D_{1,LS} is a normal random vector, whereas the coordinates of D_{2,LS} are weighted sums of differences of independent χ²₁ random variables.
Proposition 2.2. Under the additive noise model, as σ → 0, $\hat D_{NL} = D_0 + \sigma D_{1,NL} + \sigma^2 D_{2,NL} + O(\sigma^3 |Q|^{-1} \sum_{q\in Q} \|\varepsilon_q\|^3)$, where

$$E(D_{1,NL}) = 0, \qquad \mathrm{Var}(D_{1,NL}) = \Big(\sum_{q\in Q} S_q^2 x_q x_q^T\Big)^{-1} = (X^T D_S^2 X)^{-1}, \quad (7)$$

and

$$E(D_{2,NL}) = -\frac{1}{2}\Big(\sum_{q\in Q} S_q^2 x_q x_q^T\Big)^{-1} \sum_{q\in Q}\Big(1 - S_q^2 x_q^T \Big(\sum_{q'\in Q} S_{q'}^2 x_{q'} x_{q'}^T\Big)^{-1} x_q\Big)\, x_q. \quad (8)$$

In (8), at least a few of the coefficients of x_q are nonzero if |Q| > 6, and thus the mean of D_{2,NL} is non-vanishing. Therefore, the bias in $\hat D_{NL}$ is of order O(σ²). Moreover, under the Rician noise model, D_{1,NL} is a normal random vector, whereas the coordinates of D_{2,NL} are weighted sums of (dependent) χ²₁ random variables.

In terms of the first order terms, both regression estimators are unbiased and $\hat D_{NL}$ has a smaller variance than $\hat D_{LS}$, in the sense that

$$a^T (X^T D_S^2 X)^{-1} a \le a^T (X^T X)^{-1}(X^T D_S^{-2} X)(X^T X)^{-1} a, \quad \text{for all } a \in \mathbb{R}^6. \quad (9)$$

Indeed, the difference between the two asymptotic covariance matrices can be quite substantial if the S_q's vary a lot. This situation may arise when the true diffusion tensor D_0 is highly anisotropic. Under such a situation, only a few gradient directions are likely to be aligned with the leading eigen-direction, which results in small S_q, whereas other gradient directions lead to large S_q. The above analysis suggests that, at least when the noise level is low, the nonlinear regression estimator is preferable due to its smaller variance. In the Supplementary Material (Carmichael et al.), we present a numerical study which shows that for both small and large noise levels, the nonlinear estimator performs better than the linear estimator.

In terms of the second order terms, the linear regression estimator is also unbiased. If the number of gradient directions is not too small, we have the following approximation for the second order bias term (8) of $\hat D_{NL}$:

$$E(D_{2,NL}) \approx -\frac{1}{2}\Big(\frac{1}{|Q|}\sum_{q\in Q} S_q^2 x_q x_q^T\Big)^{-1} \frac{1}{|Q|}\sum_{q\in Q} x_q = -\frac{1}{2}(X^T D_S^2 X)^{-1} X^T 1_Q. \quad (10)$$

Notice that in this approximation, only the matrix inverse term involves the true tensor (through the S_q's). So, as long as the design is uniform, no single gradient direction has a dominating influence on this bias.

3. Kernel smoothing in tensor space
In this section, we consider kernel smoothing on the space of N × N positive definite matrices (hereafter referred to as the tensor space P_N). We first briefly review the kernel smoothing idea in a Euclidean space (Fan and Gijbels, 1996). Let f be a function defined on D ⊂ R^d and taking values in R^p. The data consist of pairs {(s_i, X_i)}_{i=1}^n with the s_i's being spatial positions in D, and the X_i ∈ R^p's being noise-corrupted versions of the f(s_i)'s. One way to reconstruct f is to use weighted averages:

$$\hat f(s) = \sum_{i=1}^n w_i(s) X_i \Big/ \sum_{i=1}^n w_i(s), \quad s \in D.$$
Let ‖·‖ denote a Euclidean norm. A common scheme for the weights is

$$\omega_i(s) := K_h(\| s_i - s \|), \quad i = 1, \ldots, n, \quad (11)$$

where K_h(·) := (1/h)K(·/h), K(·) is a nonnegative, integrable kernel and h > 0 is the bandwidth which determines the size of the smoothing neighborhood. Note that $\hat f(s)$ minimizes $\sum_{i=1}^n \omega_i(s)\, d^2(X_i, c)$ with respect to c, where d(X_i, c) = ‖X_i − c‖ is a Euclidean distance on R^p. Thus kernel smoothing can be immediately generalized to a Riemannian manifold M by replacing the Euclidean distance with the geodesic distance d_M(·, ·). Specifically, to reconstruct a function f : D ⊂ R^d → M, we define

$$\hat f(s) := \arg\min_{c\in\mathcal{M}} \sum_{i=1}^n \omega_i(s)\, d_{\mathcal{M}}^2(X_i, c), \quad s \in D. \quad (12)$$
Kernel smoothing thus takes the form of a weighted Karcher mean of the X_i's (Karcher, 1977). From (12), it is obvious that different distance metrics on the manifold M may lead to different kernel smoothers.

Since P_N is the interior of a cone in the Euclidean space R^{N×N}, one may simply impose a Euclidean metric on P_N, for example, d_E(X, Y) := {trace(X − Y)²}^{1/2}. Under Euclidean distances, (12) can be easily solved by a weighted average

$$\hat f_E(s) = \sum_{i=1}^n \omega_i(s) X_i \Big/ \sum_{i=1}^n \omega_i(s). \quad (13)$$

Since the tangent space of P_N is the space of N × N symmetric matrices, as an alternative to the Euclidean metrics, logarithmic Euclidean (henceforth log-Euclidean) metrics have been proposed (Arsigny et al., 2005, 2006):

$$d_{LE}(X, Y) := \| \log X - \log Y \|. \quad (14)$$

Under d_{LE}, (12) can also be explicitly solved as

$$\hat f_{LE}(s) = \exp\Big(\sum_{i=1}^n \omega_i(s) \log(X_i) \Big/ \sum_{i=1}^n \omega_i(s)\Big). \quad (15)$$

Here exp(·) and log(·) denote the matrix exponential and matrix logarithm functions. The log-Euclidean metric has also been used by Schwartzman (2006) to develop nonparametric test procedures in the context of DTI.
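As an illustration, the following sketch (ours, not the authors' implementation; the eigendecomposition-based matrix log/exp helpers are an implementation choice of this sketch) computes the closed-form weighted means (13) and (15). The toy example at the end also previews the swelling effect discussed below: the Euclidean mean of two commuting tensors has a larger determinant than their log-Euclidean (geometric) mean.

```python
# Sketch: the closed-form Euclidean (13) and log-Euclidean (15) smoothers.
import numpy as np

def eig_fun(S, f):
    """Apply a scalar function f to the eigenvalues of a symmetric matrix."""
    lam, V = np.linalg.eigh(S)
    return (V * f(lam)) @ V.T

def euclidean_smoother(tensors, weights):
    """Weighted Euclidean Karcher mean, equation (13)."""
    w = np.asarray(weights, float); w /= w.sum()
    return np.einsum('i,ijk->jk', w, np.asarray(tensors))

def log_euclidean_smoother(tensors, weights):
    """Weighted log-Euclidean Karcher mean, equation (15)."""
    w = np.asarray(weights, float); w /= w.sum()
    L = sum(wi * eig_fun(D, np.log) for wi, D in zip(w, tensors))
    return eig_fun(L, np.exp)

# Two commuting tensors with equal weights: the Euclidean mean averages
# eigenvalues arithmetically, the log-Euclidean mean geometrically.
A, B = np.diag([1.0, 1.0, 1.0]), np.diag([4.0, 1.0, 1.0])
print(euclidean_smoother([A, B], [1, 1]))      # diag(2.5, 1, 1), det = 2.5
print(log_euclidean_smoother([A, B], [1, 1]))  # diag(2.0, 1, 1), det = 2.0
```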
Moreover, since P_N can be identified with a naturally reductive homogeneous space (Absil et al., 2008), one may use a bi-invariant metric:

$$d_{Aff}(X, Y) := \Big\{\mathrm{tr}\big[\big(\log(X^{-1/2} Y X^{-1/2})\big)^2\big]\Big\}^{1/2}, \quad (16)$$

where tr(·) is the trace operator. This metric is affine-invariant, i.e., for any g ∈ GL⁺(N, R), where GL⁺(N, R) is the group of matrices (defined on R) with positive determinant, we have d_{Aff}(g X g^T, g Y g^T) = d_{Aff}(X, Y). Therefore, we refer to d_{Aff} as the affine-invariant metric. The affine-invariant geometry on P_N has been extensively studied (Fletcher and Joshi, 2004, 2007; Förstner and Moonen, 1999) and has been applied to DTI data (Arsigny et al., 2005, 2006; Pennec et al., 2006). While a closed form solution for f̂(s) is not available under d_{Aff}, f̂(s) may be computed using gradient descent methods (Pennec et al., 2006), Newton-Raphson methods (Ferreira et al., 2006; Fletcher and Joshi, 2007), or an approximate recursive procedure which is computationally much faster (outlined in Section S-1 of the Supplementary Material). Note that all three Karcher means considered here are scale-equivariant.

Euclidean smoothing is often criticized for its swelling effect, whereby the determinant of the "average" tensor is larger than those of the tensors being averaged (Arsigny et al., 2005). Since the determinant of the tensor quantifies the overall diffusivity of water within the voxel, this property contradicts principles of physics. On the other hand, under both geometric metrics, averaging of tensors results in an averaging of their determinants, thus precluding swelling artifacts. However, as we shall see in the following sections, the relative merits of these smoothers in terms of estimation accuracy are rather complicated, in that no single smoother performs best under all situations. Indeed, the performance of these smoothers depends heavily on the nature of the noise and local geometric structures in the data.

4. Comparison of smoothers under different geometries

In this section, we first use perturbation analysis to quantify the differences among the Euclidean mean and the two geometric means for an arbitrary target tensor. Arsigny et al. (2005) compared the log-Euclidean and affine-invariant means; however, their analysis was restricted to the setting where the target tensor is the identity matrix. We use the perturbation analysis results to compare the three tensor smoothers defined in Section 3. In particular, we study the effects of the noise in raw DWI data and the spatial heterogeneity of the tensor field on the bias associated with the smoothers. Yuan et al. (2012) considered local polynomial smoothing in tensor space using log-Euclidean and affine-invariant geometries when the observations are taken at random locations. In their setting, the conditional mean and variance of the tensors are specified and the asymptotic MSE is derived under the asymptotic regime where the number of sampling points in the neighborhood goes to infinity. In contrast, in this paper,
the tensor field is observed on a grid determined by voxelization. The noise in the observed tensors (which are used for smoothing) originates from the sensor noise in the diffusion weighted MRI data based on which the observed tensors are estimated. Local constant smoothing is applied to these noisy tensors and asymptotic analysis is carried out under a regime where the noise level converges to zero. Moreover, the focus of our asymptotic analysis is primarily on studying the bias characteristics of the local constant smoother under different geometries.

4.1. Asymptotic expansions of Karcher means

Let {D(ω) ∈ P_N : ω ∈ Ω} be a set of random tensors, where Ω is an arbitrary index set with a Borel σ-algebra. Let P_Ω be a probability measure on Ω. Let D̄_E, D̄_LE and D̄_Aff denote the weighted Karcher means of D:

$$\arg\min_{c\in\mathcal{P}_N} \int d^2(c, D(\omega))\, dP_\Omega(\omega) \quad (17)$$

with respect to the distance metrics d_E, d_LE and d_Aff, respectively. We also use E_Ω to denote expectation with respect to the measure P_Ω under the Euclidean metric d_E, i.e., for any random symmetric matrix K, $E_\Omega(K) := \int K(\omega)\, dP_\Omega(\omega)$. Thus D̄_E = E_Ω(D).

We take a matrix perturbation analysis approach to study the differences among these means under a small noise limit regime. Let D_0 denote the underlying "true" or target tensor. Let λ_j denote the j-th largest distinct eigenvalue of D_0, and let P_j denote the corresponding eigen-projection. For simplicity of exposition, we consider only two scenarios: (a) when the eigenvalues of D_0 are all distinct (anisotropic tensor); and (b) when all the eigenvalues of D_0 are equal to λ_1 (isotropic tensor). Let

$$C := \begin{cases} \max\{\max_j \lambda_j^{-1},\ \max_{k\neq j} |\lambda_k - \lambda_j|^{-1}\}, & \text{if } D_0 \neq \lambda_1 I,\\ \lambda_1^{-1}, & \text{if } D_0 = \lambda_1 I,\end{cases}$$

where I denotes the N × N identity matrix. We assume that the probability measure P_Ω satisfies

$$P_\Omega\Big(\sup_{\omega\in\Omega} \| \Delta(\omega) \| < C^{-1} t\Big) = 1, \quad (18)$$

for some t > 0, where Δ(ω) := D(ω) − D_0, and ‖·‖ denotes the operator norm of matrices. Roughly speaking, C^{-1} indicates the scale of the signal and t is a parameter that controls the degree of deviation of the tensors from the target tensor. Also, we denote the difference between the Euclidean mean and the target tensor by

$$\bar\Delta_E := E_\Omega(\Delta(\omega)) = \bar D_E - D_0.$$
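To make the comparison concrete, the following illustrative sketch (all settings are our assumptions) computes the three weighted Karcher means of a cloud of small symmetric perturbations of an anisotropic target, with the affine-invariant mean obtained by a standard fixed-point iteration on the manifold. The printed log-determinant gaps are all small, but differ across metrics, as the theorems below quantify.

```python
# Sketch: numerically compare the Euclidean, log-Euclidean and
# affine-invariant weighted Karcher means of perturbed tensors.
import numpy as np

def eig_fun(S, f):
    """Apply a scalar function f to a symmetric matrix via eigh."""
    lam, V = np.linalg.eigh(S)
    return (V * f(lam)) @ V.T

def euclidean_mean(tensors, w):
    return np.einsum('i,ijk->jk', w, tensors)

def log_euclidean_mean(tensors, w):
    L = sum(wi * eig_fun(D, np.log) for wi, D in zip(w, tensors))
    return eig_fun(L, np.exp)

def affine_invariant_mean(tensors, w, n_iter=30):
    M = euclidean_mean(tensors, w)          # start at the Euclidean mean
    for _ in range(n_iter):
        Mh = eig_fun(M, np.sqrt)
        Mhi = eig_fun(M, lambda x: 1.0 / np.sqrt(x))
        T = sum(wi * eig_fun(Mhi @ D @ Mhi, np.log) for wi, D in zip(w, tensors))
        M = Mh @ eig_fun(T, np.exp) @ Mh    # geodesic step toward the mean
    return M

rng = np.random.default_rng(0)
D0 = np.diag([2.0, 1.0, 0.5])               # anisotropic target tensor
E = rng.standard_normal((200, 3, 3)) * 0.03
tensors = D0 + (E + np.swapaxes(E, 1, 2)) / 2   # small symmetric perturbations
w = np.full(len(tensors), 1.0 / len(tensors))   # uniform kernel weights

for name, M in [('Euclidean', euclidean_mean(tensors, w)),
                ('log-Euclidean', log_euclidean_mean(tensors, w)),
                ('affine-invariant', affine_invariant_mean(tensors, w))]:
    print(name, np.log(np.linalg.det(M)) - np.log(np.linalg.det(D0)))
```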
The following theorems give expansions of the three means around the target D_0 when t → 0. For simplicity of exposition, in the anisotropic case, expansions are in terms of the logarithm of the determinant of the mean. The expansions of the logarithm of the mean are given in Propositions 8.1-8.3 in the Appendix. Proofs are also given in the Appendix.

Theorem 4.1. Suppose that the tensors {D(ω) : ω ∈ Ω} and the probability distribution P_Ω satisfy (18) and t → 0.

(a) If D_0 = λ_1 I, then

$$\log \bar D_E - \log D_0 = \frac{1}{\lambda_1}\bar\Delta_E - \frac{1}{2\lambda_1^2}\bar\Delta_E^2 + O(t^3). \quad (19)$$

(b) If the eigenvalues of D_0 are all distinct,

$$\log\det(\bar D_E) - \log\det(D_0) = \sum_{j=1}^N \frac{1}{\lambda_j}\mathrm{tr}(P_j\bar\Delta_E) - \sum_{j=1}^N\sum_{k:k\neq j} \frac{1}{\lambda_j(\lambda_k-\lambda_j)}\,\mathrm{tr}(P_j\bar\Delta_E P_k\bar\Delta_E) - \frac{1}{2}\sum_{j=1}^N \frac{1}{\lambda_j^2}\,[\mathrm{tr}(P_j\bar\Delta_E)]^2 + O(t^3). \quad (20)$$
Theorem 4.2. Assume that the conditions of Theorem 4.1 hold.

(a) If D_0 = λ_1 I, then

$$\log \bar D_{LE} - \log D_0 = \frac{1}{\lambda_1}\bar\Delta_E - \frac{1}{2\lambda_1^2}E_\Omega(\Delta^2) + O(t^3). \quad (21)$$

(b) If the eigenvalues of D_0 are all distinct,

$$\log\det(\bar D_{LE}) - \log\det(D_0) = \sum_{j=1}^N \frac{1}{\lambda_j}\mathrm{tr}(P_j\bar\Delta_E) - \sum_{j=1}^N\sum_{k:k\neq j} \frac{1}{\lambda_j(\lambda_k-\lambda_j)}\,E_\Omega[\mathrm{tr}(P_j\Delta P_k\Delta)] - \frac{1}{2}\sum_{j=1}^N \frac{1}{\lambda_j^2}\,E_\Omega[\mathrm{tr}(P_j\Delta)]^2 + O(t^3). \quad (22)$$
Theorem 4.3. Assume that the conditions of Theorem 4.1 hold.

(a) If D_0 = λ_1 I, then

$$\log \bar D_{Aff} - \log D_0 = \frac{1}{\lambda_1}\bar\Delta_E - \frac{1}{2\lambda_1^2}E_\Omega(\Delta^2) + O(t^3). \quad (23)$$
(b) If the eigenvalues of D_0 are all distinct,

$$\log\det(\bar D_{Aff}) - \log\det(D_0) = \sum_{j=1}^N \frac{1}{\lambda_j}\mathrm{tr}(P_j\bar\Delta_E) - \frac{1}{2}\sum_{j=1}^N\sum_{k=1}^N \frac{1}{\lambda_j\lambda_k}\,E_\Omega[\mathrm{tr}(P_j\Delta P_k\Delta)]$$
$$\qquad + \sum_{j=1}^N\sum_{k:k\neq j}\Big(\frac{1}{\lambda_j} - \frac{1}{2\lambda_k}\Big)\frac{1}{\lambda_k-\lambda_j}\,\mathrm{tr}(P_j\bar\Delta_E P_k\bar\Delta_E)$$
$$\qquad + \frac{1}{2}\sum_{j=1}^N \frac{1}{\lambda_j^2}\Big(\mathrm{tr}\big((P_j\bar\Delta_E)^2\big) - [\mathrm{tr}(P_j\bar\Delta_E)]^2\Big) + O(t^3). \quad (24)$$
These theorems show that the three means are the same up to the first order terms. If D_0 is isotropic, then the first order difference between these means and the truth (in logarithm scale) is (1/λ_1)Δ̄_E. Moreover, D̄_LE and D̄_Aff are the same even up to the second order terms. If D_0 is anisotropic, then the first order difference in log-determinant between these means and the truth is $\sum_{j=1}^N \frac{1}{\lambda_j}\mathrm{tr}(P_j\bar\Delta_E)$. Also, in this case, all three means differ in second order terms.

4.2. Comparison of smoothers

In this subsection, we utilize the expansions in Section 4.1 to compare the three tensor smoothers described in Section 3. We assume that the tensors being smoothed are derived from the raw DWI data by using one of the regression methods discussed in Section 2. For comparison purposes we focus on the asymptotic bias of the smoothed tensors, as the variance is inversely proportional to the size of the smoothing neighborhood and is not very sensitive to the choice of the smoother. The main objective here is to demonstrate the effect of different metrics on the performance of local constant smoothing under different local structures of the tensor field. Since we are dealing with DTI, throughout this subsection we have N = 3, where N is the dimension of the tensor space.

We first relate the notations in Section 4.1 to the context of tensor smoothing. Now, ω is the voxel index and D(ω) is the estimated tensor based on the raw DWI data at that voxel, while Ω denotes the smoothing neighborhood of the target voxel, and P_Ω is the measure determined by the kernel weights. Thus, the mean tensor defined in (17) is the local constant estimate given in (12). In the following, we assume that a compact kernel is used. Note that, as the bandwidth of the kernel goes to zero, the parameter t in (18) goes to zero if the tensor field is sufficiently smooth and the noise in the raw DWI data goes to zero as well. We adopt an infill asymptotic framework in which, as the bandwidth goes to zero, the number of data points in the neighborhood increases. Specifically, we assume that |Ω| goes to infinity and $r_\Omega := \sum_{\omega\in\Omega} (p(\omega))^2 = o(1)$ as t → 0, where {p(ω) : ω ∈ Ω} denotes the probability mass function of P_Ω. This is true for example if P_Ω is the uniform measure on Ω, since then r_Ω = 1/|Ω|. More generally, this holds if the kernel is a continuous density that is compactly supported. We further assume that |Ω| ≤ t^{-K}, for some K > 0, which puts an upper bound on the growth rate of |Ω| as t → 0.

We denote the underlying true tensor at each voxel ω by D_0(ω) and denote the true tensor at the target voxel for smoothing by D_0. The observed tensors D(ω)'s may be viewed as noisy versions of the D_0(ω)'s. The analysis in Section 2 shows that if a regression estimator is used, then the noise in D(ω) is approximately additive. Moreover, spatial heterogeneity is another factor influencing the bias of the kernel smoothers. In the analysis of Case 2 described below, we treat the D_0(ω)'s as random perturbations of D_0. This perspective is in agreement with approaches in spatial statistics where the spatial field is treated as a random process. We look into the effects of these two factors separately by considering the following two cases.

Case 1. There is no spatial inhomogeneity, i.e., D_0(ω) = D_0 for all ω ∈ Ω. In this case, the sensor noise from the DTI scanner is the sole source of variation in the D(ω)'s leading to the bias in the kernel estimators.

Case 2. The noise in the DWI data is small, such that the spatial inhomogeneity is the dominant source of variation in the D(ω)'s. Moreover, we study a particular case where the geometric mean of the tensors {D_0(ω) : ω ∈ Ω} is approximately equal to the target tensor D_0.

For the subsequent analysis, we assume that, in the additive noise model (2) for DWI data, the ε_q(ω)'s are i.i.d. across voxels ω ∈ Ω and gradients q ∈ Q, and the moment generating function of ‖ε_q(ω)‖² is finite in a neighborhood of zero, for example, when the coordinates of ε_q(ω) have sub-Gaussian tails. By Chebyshev's inequality, this implies that for every c > 0, there is a c′ > 0 such that, for any δ ∈ (0, 1),

$$P\Big(\max_{\omega\in\Omega} \sum_{q\in Q} \|\varepsilon_q(\omega)\|^3 \ge c' |Q| (\log(1/\delta) + \log|Q|)^{3/2}\Big) \le 2|\Omega|\delta^c. \quad (25)$$
This allows us to impose a uniform tail bound on the residual terms in the expansions of the regression estimators in Propositions 2.1 and 2.2. Since Q is fixed, we ignore the terms involving log |Q| when using the bound (25) with a sequence δ → 0.

Case 1. Let σ be the standard deviation of the noise in the raw DWI data (see equation (2)). By Propositions 2.1 and 2.2, and the fact that D_0(ω) = D_0 for all ω ∈ Ω, we have

$$\Delta(\omega) = D(\omega) - D_0 = [\sigma D_1(\omega) + \sigma^2 D_2(\omega)] + R_\Delta(\omega) := \Delta^*(\omega) + R_\Delta(\omega), \quad (26)$$

where R_Δ(ω) = O(σ³(log(1/σ))^{3/2}) with high probability¹, and D_1(ω) and D_2(ω) denote the first order and second order terms in the expansion of D(ω) around D_0 (in matrix form). Thus, the parameter t may be taken as $\sigma\sqrt{\log(1/\sigma)}$, since this ensures that equation (18) is satisfied with high probability. Then

$$\bar\Delta_E = E_\Omega(\Delta) = \bar\Delta_E^* + E_\Omega(R_\Delta), \quad (27)$$

where E_Ω(R_Δ) = O(σ³(log(1/σ))^{3/2}) with high probability and Δ̄*_E := E_Ω(Δ*) = σ E_Ω(D_1) + σ² E_Ω(D_2).

We first consider the case when the D(ω)'s are the nonlinear regression estimates. By Proposition 2.2, E(D_1) = E(E_Ω(D_1)) = 0 and hence

$$E(\bar\Delta_E^*) = E(E_\Omega(\Delta^*)) = \sigma^2 E(D_2). \quad (28)$$

¹ We say that an event holds with high probability if the complement of that event has probability O(δ^c) for a given c > 0, where δ is a positive sequence converging to zero.
Moreover, since $\mathrm{Var}(\mathrm{vec}(\bar\Delta_E^*)) = r_\Omega \mathrm{Var}(\mathrm{vec}(\Delta^*))$, we have

$$E[\mathrm{vec}(\bar\Delta_E^*)\mathrm{vec}(\bar\Delta_E^*)^T] = r_\Omega \mathrm{Var}(\mathrm{vec}(\Delta^*)) + E(\mathrm{vec}(\bar\Delta_E^*))E(\mathrm{vec}(\bar\Delta_E^*))^T$$
$$= r_\Omega E(\mathrm{vec}(\Delta^*)\mathrm{vec}(\Delta^*)^T) + (1 - r_\Omega)E(\mathrm{vec}(\Delta^*))E(\mathrm{vec}(\Delta^*))^T$$
$$= r_\Omega \sigma^2 \mathrm{Var}(\mathrm{vec}(D_1)) + r_\Omega O(\sigma^3) + O(\sigma^4), \quad (29)$$

where, in the last step, we have used the fact that

$$E(\mathrm{vec}(\Delta^*)\mathrm{vec}(\Delta^*)^T) = \sigma^2 \mathrm{Var}(\mathrm{vec}(D_1)) + \sigma^3[E(\mathrm{vec}(D_1)\mathrm{vec}(D_2)^T) + E(\mathrm{vec}(D_2)\mathrm{vec}(D_1)^T)] + \sigma^4 E(\mathrm{vec}(D_2)\mathrm{vec}(D_2)^T)$$
$$= \sigma^2 E(\mathrm{vec}(D_1)\mathrm{vec}(D_1)^T) + O(\sigma^3). \quad (30)$$
Note that E(vec(D_2)) and Var(vec(D_1)) are given in Proposition 2.2.

Asymptotic biases of the logarithm of the smoothers with respect to log D_0, denoted generically by ABias(log D̄; log D_0), are defined as expectations of the leading order terms (i.e., the terms that are linear or quadratic in Δ* or Δ̄*_E) in the asymptotic expansions of log D̄ (see Theorems 4.1, 4.2 and 4.3). Note that, by the discussion above, the remainder terms in these expansions are of the order O(σ³(log(1/σ))^{3/2}) with high probability. If D_0 = λ_1 I_3, by part (a) of Theorems 4.1, 4.2 and 4.3, and equations (26)-(29), we conclude the following:

Corollary 4.1. If D_0 is isotropic, then

$$\mathrm{ABias}(\log\bar D_E;\log D_0) = \frac{1}{\lambda_1}E(\bar\Delta_E^*) - \frac{1}{2\lambda_1^2}E((\bar\Delta_E^*)^2) = \frac{1}{\lambda_1}E(\bar\Delta_E^*) - r_\Omega\cdot\frac{1}{2\lambda_1^2}E((\Delta^*)^2) + O(\sigma^4),$$
$$\mathrm{ABias}(\log\bar D_{LE};\log D_0) = \frac{1}{\lambda_1}E(\bar\Delta_E^*) - \frac{1}{2\lambda_1^2}E((\Delta^*)^2),$$
$$\mathrm{ABias}(\log\bar D_{Aff};\log D_0) = \frac{1}{\lambda_1}E(\bar\Delta_E^*) - \frac{1}{2\lambda_1^2}E((\Delta^*)^2).$$
Also, Lemma 8.1 in the Appendix shows that if the gradient directions are uniform on the sphere, then $E(\bar\Delta_E^*) = \sigma^2 E(D_2)$ is a negative definite matrix. Since E((Δ*)²) is positive definite and is of order O(σ²), under the assumption that r_Ω = o(1), the two geometric smoothers are more biased than the Euclidean smoother when D_0 is isotropic and the gradient design is uniform.

If D_0 is anisotropic, asymptotic expansions for the smoothers can be obtained from Propositions 8.1, 8.2 and 8.3 in the Appendix. Using these and equations (26)-(29), we have the following:

Corollary 4.2. If D_0 is anisotropic, then

$$\mathrm{ABias}(\log\bar D_E;\log D_0) = \sigma^2(T_1 - r_\Omega T_2 + r_\Omega T_3) + O(r_\Omega\sigma^3) + O(\sigma^4),$$
$$\mathrm{ABias}(\log\bar D_{LE};\log D_0) = \sigma^2(T_1 - T_2 + T_3) + O(\sigma^3),$$
$$\mathrm{ABias}(\log\bar D_{Aff};\log D_0) = \sigma^2(T_1 - r_\Omega T_2 + r_\Omega T_3 + (1 - r_\Omega)T_4) + O(\sigma^3),$$

where

$$T_1 = \sum_{j=1}^3 \frac{1}{\lambda_j}\mathrm{tr}(P_j E(D_2))P_j - \sum_{j=1}^3\sum_{k\neq j} \frac{\log\lambda_j}{\lambda_k-\lambda_j}\,[P_j E(D_2)P_k + P_k E(D_2)P_j],$$

$$T_2 = \sum_{j=1}^3\sum_{k:k\neq j} \frac{1}{\lambda_j(\lambda_k-\lambda_j)}\,E[\mathrm{tr}(P_j D_1 P_k D_1)]P_j + \frac{1}{2}\sum_{j=1}^3 \frac{1}{\lambda_j^2}\,E[\mathrm{tr}(P_j D_1)]^2 P_j,$$

$$T_3 = \sum_{j=1}^3\sum_{k:k\neq j}\sum_{l:l\neq j} \frac{\log\lambda_j}{(\lambda_k-\lambda_j)(\lambda_l-\lambda_j)}\,E[P_j D_1 P_k D_1 P_l + P_k D_1 P_j D_1 P_l + P_k D_1 P_l D_1 P_j]$$
$$\qquad - \sum_{j=1}^3\sum_{k:k\neq j} \frac{\log\lambda_j}{(\lambda_k-\lambda_j)^2}\,E[P_j D_1 P_j D_1 P_k + P_j D_1 P_k D_1 P_j + P_k D_1 P_j D_1 P_j]$$
$$\qquad - \sum_{j=1}^3\sum_{k:k\neq j} \frac{1}{\lambda_j(\lambda_k-\lambda_j)}\,E[\mathrm{tr}(P_j D_1)(P_j D_1 P_k + P_k D_1 P_j)],$$

$$T_4 = -\frac{1}{2}\sum_{j=1}^3\sum_{k=1}^3 \frac{1}{\lambda_j\lambda_k}\,E[\mathrm{tr}(P_j D_1 P_k D_1)]P_j + \frac{1}{2}\sum_{j=1}^3\sum_{k=1}^3\sum_{l:l\neq j} \frac{\log\lambda_j}{\lambda_k(\lambda_l-\lambda_j)}\,E[P_j D_1 P_k D_1 P_l + P_l D_1 P_k D_1 P_j].$$
Corollary 4.2 shows that the asymptotic biases of all three estimators are of the order O(σ²). Note that each summand in the T_i's can be expressed as a product of two terms: one is a polynomial in the inverses of the λ_j's and their differences (multiplied by log λ_j's in some instances), and the other involves linear functions of Var(vec(D_1)) and E(vec(D_2)). By (7) and (10), the latter quantities depend on D_0 essentially only through the matrix $(\sum_{q\in Q} S_q^2 x_q x_q^T)^{-1}$, as long as the number of gradient directions is not too small.

Now consider the case where D_0 is highly anisotropic, such that its smallest eigenvalue approaches zero while the overall diffusivity, i.e., det(D_0), remains constant. Under this setting, the terms T_2, T_3 and T_4 diverge faster than T_1 due to the presence of additional multipliers of the form 1/λ_j or 1/(λ_k − λ_j) in T_2, T_3 and T_4. This, and the fact that r_Ω = o(1), imply that ABias(log D̄_LE; log D_0) grows faster than ABias(log D̄_E; log D_0). For comparing the asymptotic biases of log D̄_Aff and log D̄_E, first it can be checked that T_4 diverges faster than r_Ω T_2. We further suppose that λ_{j−1} − λ_j ≥ c_0 λ_{j−1} for some fixed c_0 > 0 for j = 2, 3. This condition ensures that the eigenvalues of D_0 do not coalesce as D_0 becomes more anisotropic, and is needed to show that T_4 diverges faster than r_Ω T_3. Specifically, as long as r_Ω max{|log λ_1|, |log λ_3|} = o(1), under the above setting ABias(log D̄_Aff; log D_0) grows faster than ABias(log D̄_E; log D_0). These observations show that when the true tensor is highly anisotropic, the Euclidean smoother tends to have smaller second order bias than the geometric smoothers.

If linear regression estimates are used as input for smoothing, then it is easy to see that the Euclidean smoother has a bias of order O(r_Ω σ²) = o(σ²), while the biases of the two geometric smoothers are of the order O(σ²). This is because, from Proposition 2.1, we can deduce that E(Δ̄*_E) = 0, and hence the terms involving E(Δ̄*_E) do not contribute to the expansions of the asymptotic biases. In summary, when the tensor field is locally homogeneous, the Euclidean smoother tends to have smaller second order bias than the geometric smoothers.

Case 2. We assume that the underlying true tensors D_0(ω)'s are perturbed versions of the target tensor D_0, where the perturbation is additive on the logarithm of the eigenvalues of D_0, but does not alter the eigenvectors. Specifically, let the spectral decomposition of D_0 be D_0 = GΛG^T, where Λ is a diagonal matrix with elements being the ordered eigenvalues of D_0: λ_1 ≥ λ_2 ≥ λ_3 > 0. Then, we assume

$$D_0(\omega) = G\,\mathrm{diag}\big(\lambda_1 e^{\tau Z_1(\omega)}, \lambda_2 e^{\tau Z_2(\omega)}, \lambda_3 e^{\tau Z_3(\omega)}\big)\, G^T, \quad (31)$$
where τ > 0 is a scale parameter and the random variables {(Z_j(ω))_{j=1}^3 : ω ∈ Ω} satisfy the following condition.

Condition M: E_Ω(Z_j) = 0 for j = 1, ..., 3; the Z_j(ω)'s are uniformly bounded in ω.

Note that, under this model, the perturbations are larger along the dominant eigen-directions of D_0. This can be seen as an additive noise structure under the log-Euclidean geometry. Since, by (31), the D_0(ω)'s commute with each other, the log-Euclidean and affine-invariant means of {D_0(ω) : ω ∈ Ω} are the same. Moreover, under Condition M, the log-Euclidean mean is easily seen to be equal to D_0. However, by Jensen's inequality, E_Ω(e^{Z_j}) > 1 unless Z_j is degenerate at zero, which implies that the Euclidean mean is not equal to D_0. Indeed, the difference between the two means is of order O(τ²).

Assume that σ = o(τ), where σ is the standard deviation of the additive noise associated with the DWI data. This means that the noise in DWI data is
small compared with the degree of spatial variation. As in Case 1, we compare the asymptotic biases of the three smoothers under the assumption that $r_\Omega := \sum_{\omega\in\Omega} (p(\omega))^2 = o(1)$ as τ → 0. For simplicity of exposition, we only state the result when D(ω) is the nonlinear regression estimator and D_0 is anisotropic (i.e., its eigenvalues are all distinct).

Corollary 4.3. Suppose that D_0(ω) is given by (31) and D(ω) denotes the nonlinear regression estimate. If σ = o(τ) and r_Ω = o(1) as τ → 0, and Condition M holds, then if the eigenvalues of D_0 are all distinct,

$$\mathrm{ABias}(\log\bar D_E,\log D_0) = \frac{\tau^2}{2}\sum_{j=1}^3 E_\Omega(Z_j^2)P_j + \sigma^2(T_1 - r_\Omega T_2 + r_\Omega T_3) + O(\tau^3),$$
$$\mathrm{ABias}(\log\bar D_{LE},\log D_0) = \sigma^2(T_1 - T_2 + T_3) + O(\tau^3),$$
$$\mathrm{ABias}(\log\bar D_{Aff},\log D_0) = \sigma^2(T_1 - r_\Omega T_2 + r_\Omega T_3 + (1 - r_\Omega)T_4) + O(\tau^3),$$

where the terms T_i, i = 1, ..., 4, are as in Corollary 4.2.
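As a quick numerical illustration of the corollary (the eigenvector frame, τ, and the distribution of the Z_j's are assumptions of this sketch), one can simulate the perturbation model (31) with negligible DWI noise and observe the O(τ²) inflation of the Euclidean mean:

```python
# Sketch: under model (31), the log-Euclidean mean of the field recovers D0
# up to Monte Carlo error, while the Euclidean mean inflates the eigenvalues
# by O(tau^2). All settings here are illustrative.
import numpy as np

rng = np.random.default_rng(1)
lam = np.array([2.0, 1.0, 0.5])                  # eigenvalues of D0
G, _ = np.linalg.qr(rng.standard_normal((3, 3))) # a fixed eigenvector frame
D0 = G @ np.diag(lam) @ G.T

tau, n = 0.3, 100000
Z = rng.uniform(-1.0, 1.0, size=(n, 3))          # mean zero, bounded (Condition M)
# field[i] = G diag(lam * exp(tau * Z[i])) G^T
field = np.einsum('ab,nb,cb->nac', G, lam * np.exp(tau * Z), G)

D_E = field.mean(axis=0)                         # Euclidean mean of the field
# Since all D0(w) share the eigenvector frame G, the log-Euclidean mean
# reduces to averaging the log eigenvalues:
D_LE = G @ np.diag(lam * np.exp(tau * Z.mean(axis=0))) @ G.T

print(np.sort(np.linalg.eigvalsh(D_E)))   # inflated relative to (0.5, 1, 2)
print(np.sort(np.linalg.eigvalsh(D_LE)))  # approximately (0.5, 1, 2)
```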
Corollary 4.3 shows that the asymptotic biases of the geometric smoothers are smaller than that of the Euclidean smoother in terms of the logarithm of the mean. Corollary 4.3 can be proved by using arguments similar to those used in proving Corollary 4.2, but with a different expansion for Δ(ω). The details are given in the Appendix.

Remark 4.1. Qualitatively similar statements hold under a more general perturbation model, with the spectral decomposition of D_0(ω) given by

$$D_0(\omega) = \exp(\tau X(\omega))\, G\,\mathrm{diag}\big(\lambda_1 e^{\tau Z_1(\omega)}, \lambda_2 e^{\tau Z_2(\omega)}, \lambda_3 e^{\tau Z_3(\omega)}\big)\, G^T \exp(-\tau X(\omega)), \quad (32)$$

where the X(ω)'s are skew-symmetric random matrices which are uniformly bounded with respect to ω and are independent of {Z_j}_{j=1}^3. Thus, under this model the perturbation is in terms of both eigenvalues and eigenvectors, and the log-Euclidean mean of {D_0(ω) : ω ∈ Ω} is only approximately the same as D_0. As long as the magnitude of the eigenvector perturbations is of a smaller order than that of the eigenvalue perturbations (specifically, E_Ω(X) = o(τ) and E_Ω(X²) = o(1) as τ → 0), the geometric smoothers have smaller bias than the Euclidean smoother.

Remark 4.2. Propositions 8.2 and 8.3 (in the Appendix) provide clues as to when the affine-invariant smoother is likely to be more efficient than the log-Euclidean smoother. Suppose that the noise structure is such that, in the expansions of log D̄_LE and log D̄_Aff, the terms that are quadratic in Δ̄_E can be ignored. Then the dominant terms in the asymptotic biases are T̂_1 − T̂_2 + T̂_3 for log D̄_LE and T̂_1 + T̂_4 for log D̄_Aff, where the T̂_i's have the same functional form as the T̃_i's in (59) with Δ̄*_E replaced by Δ̄_E and Δ* replaced by Δ. Since the terms in the expression for T̂_3 involve inverses of quadratic terms in the spacings between the eigenvalues, while all the other terms involve inverses of at most linear terms
in the spacings, this suggests that the affine-invariant smoother tends to have smaller bias than the log-Euclidean smoother provided that λ_1, λ_2 and λ_3 are of similar magnitude while the spacing λ_{j−1} − λ_j is of smaller order than λ_{j−1} for at least one j ∈ {2, 3}.

The analysis in this subsection suggests that under the small noise setting, the Euclidean smoother may or may not outperform the geometric smoothers in terms of bias, depending on whether raw DWI sensor noise or spatial heterogeneity of the tensor field dominates.

5. Simulation studies

In this section, we conduct simulation studies that explore how the differences among the three kernel smoothers extend to scenarios with relatively larger levels of sensor noise. Besides the choice of metric, one also needs to choose a scheme to assign the weights w_i(s) in (12). The conventional approach is to simply set the weights as in (11), where K is a fixed kernel. This is referred to as isotropic smoothing. However, the tensor field often shows various degrees of anisotropy in different regions, and the tensors tend to be more homogeneous along the leading anisotropic directions (e.g., along the fiber tracts). Thus it makes sense to set the weights larger if the vector s_i − s is aligned with the leading diffusion direction at s. Therefore, we propose the following anisotropic weighting scheme (a code sketch of this scheme is given in Section 5.2 below, after the simulation setup):

$$\omega_i(s) := K_h\Big(\sqrt{\mathrm{tr}(\hat D)\,(s_i - s)^T \hat D^{-1}(s_i - s)}\Big), \quad (33)$$

where D̂ is the current estimate of the tensor at voxel location s, and K_h(·) := K(·/h) is a nonnegative, integrable kernel with h > 0 being the bandwidth. The use of tr(D̂) in (33) makes the weights scale-free with respect to D̂. There are other schemes for anisotropic weights. For example, in Tabelow et al. (2008), the term tr(D̂) is replaced by det(D̂) in (33), which is supposed to capture not only the directionality of the local tensor field, but also the degree of anisotropy. Chung et al. (2003, 2005) also propose kernel smoothing under Euclidean geometry with a different scheme of anisotropic kernel weights.

In the following, we conduct two simulation studies. Simulation design I corresponds to Case 1 studied in Section 4.2, where the true tensor field is locally homogeneous and the dominant source of variability in the tensor field comes from the raw DWI data. In the second simulation study, we consider a case with substantial spatial variation in the underlying tensor field, which is generated based on a real human brain diffusion MRI data set. Thus this setting may be seen as a realistic generalization of Case 2. For both simulations, we consider different levels of observational noise in the raw DWI data.

As a performance measure, we use the median (across a region of the tensor field) of the squared affine-invariant distances between true and smoothed tensors at a variety of combinations of preliminary isotropic smoothing bandwidth and
secondary anisotropic smoothing bandwidth. We choose the median distance over the mean distance because of its robustness and the fact that the log-Euclidean and affine-invariant metrics occasionally produce extreme estimates in scenarios where the true tensor is near singular and/or the noise level is high. Moreover, using the Euclidean distance as the error measure leads to qualitatively similar results.

5.1. Simulation I

Here we construct a simulated tensor field on a 128 × 128 × 4 three-dimensional grid consisting of background regions with identical isotropic tensors and banded regions with three parallel vertical bands and three horizontal bands (for each of the four slices), where within each band tensors are identical and aligned in either the x- or the y-direction. The bands are of various widths and degrees of anisotropy (see Figure S-1 and Table S-1 of the Supplementary Materials for details). For a clearer comparison of different smoothers, we examine their performance on four sets of tensors: (i) the "whole set", the entire set of tensors; (ii) the "crossings", where pairs of bands intersect or bands intersect with the background; (iii) the "background interior" regions that are within the background and are at least four voxels away from any crossing; and (iv) the "band interior" regions that are within a band and are at least four voxels away from the background. The purpose is to compare smoother performance on homogeneous regions where diffusion is isotropic (background interior) and anisotropic (band interior), as well as on heterogeneous regions where the diffusion direction and the degree of anisotropy vary within individual neighborhoods (crossings).

At each voxel, we simulate the raw DWI data Ŝ_q using the true tensor at that voxel and the Rician noise model (equations (1) to (3)). Specifically, we set S_0 = 1,000 and use 9 gradient directions each repeated twice, which are normalized versions of the following vectors: (1, 0, 1), (1, 1, 0), (0, 1, 1), (3, 2, 1), (0.9, 0.45, 0.2), (1, 0, 0), (0, 1, 0), (0, 0, 1), (2, 1, 1.3). This gradient design is from a real DT-MRI study performed at UC Davis. We consider three different values of the noise parameter, σ = 10, 50, 100, which correspond to SNR := S_0/σ = 100, 20 and 10. These are referred to as "low", "moderate" and "high" noise levels, respectively. Such signal to noise ratios are typical for real DTI studies, with SNR = 100 at the higher end and SNR = 10 at the lower end (Farrell et al., 2007; Parker et al., 2000; Tuch et al., 2002). Finally, at each voxel, a nonlinear regression procedure is applied to the DWI data to derive the observed tensors as inputs for the smoothing procedure.

Errors in terms of affine-invariant distance between kernel-smoothed and ground-truth tensors are summarized in Figures 1, 2 and 3. At the low noise level (σ = 10), the Euclidean smoother works marginally better than the geometric smoothers on the isotropic homogeneous region (the "background interior"), and its advantage is more pronounced on the anisotropic homogeneous region (the "bands interior"). This is consistent with the analysis in Section 4.2. Moreover, smoothing is not beneficial on the heterogeneous "bands crossing" regions at low noise level. At higher noise levels (σ = 50, 100), Euclidean smoothing substantially outperforms geometric smoothing in anisotropic regions ("bands interior" and "bands crossing"), and is slightly better for the isotropic region ("background interior"). The two geometric smoothers perform comparably regardless of noise levels and regional heterogeneity, although affine-invariant smoothing is slightly better at higher noise levels. Also, anisotropic smoothing is seen to be beneficial in the anisotropic "bands interior" regions when the noise level is low.

Fig 1. Simulation I: σ = 10. Comparison of median errors measured by affine invariant distance over different regions for "observed" tensors (obtained by nonlinear regression) and Euclidean, log-Euclidean and Affine smoothers. In "bandwidth combination": the first number denotes the isotropic bandwidth and the second number denotes the anisotropic bandwidth.

5.2. Simulation II

Here, we first smoothed four axial slices of a real DTI scan for one human subject from the data set described in Section 6 using affine-invariant smoothing. We then used the resulting tensors as the underlying true tensors and simulated the raw DWI data using the 18 gradient directions described in Section 5.1 and adding Rician noise (Figure 4). We set the baseline signal strength S_0 = 1000 and consider two noise levels, namely "low noise" (σ = 10) and "moderate noise" (σ = 50).
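For reference, here is the promised minimal sketch of the anisotropic weighting scheme (33); the grid offsets, Gaussian kernel and bandwidth are illustrative assumptions, not choices documented in the paper:

```python
# Sketch: anisotropic kernel weights of equation (33).
import numpy as np

def anisotropic_weights(offsets, D_hat, h, kernel=lambda u: np.exp(-u**2 / 2)):
    """Weights K_h(sqrt(tr(D_hat) (s_i - s)^T D_hat^{-1} (s_i - s))), eq. (33).

    offsets : (n, 3) array of displacements s_i - s within the neighborhood.
    D_hat   : current 3x3 tensor estimate at the target voxel s.
    """
    Dinv = np.linalg.inv(D_hat)
    d2 = np.trace(D_hat) * np.einsum('ij,jk,ik->i', offsets, Dinv, offsets)
    return kernel(np.sqrt(d2) / h)

# Example: a 3x3x3 voxel neighborhood and a tensor elongated along x.
# Voxels displaced along the leading diffusion direction get larger weights.
g = np.arange(-1.0, 2.0)
offsets = np.stack(np.meshgrid(g, g, g, indexing='ij'), axis=-1).reshape(-1, 3)
w = anisotropic_weights(offsets, np.diag([1.5, 0.3, 0.3]), h=2.0)
print(w.reshape(3, 3, 3)[:, 1, 1])  # weights along x through the center voxel
```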
[Figure 2: four panels (whole set, bands crossing, background interior, bands interior) plotting median error against bandwidth combination for no smoothing and the E, logE and Affine smoothers; σ = 50.]
Fig 2. Simulation I: σ = 50. Comparison of median errors measured by affine invariant distance over different regions for “observed” tensors (obtained by nonlinear regression) and Euclidean, log-Euclidean and Affine smoothers. In “bandwidth combination”: the first number denotes the isotropic bandwidth and the second number denotes the anisotropic bandwidth.
Errors in terms of affine-invariant distance between kernel-smoothed and ground-truth tensors are shown in Figure 5 (for σ = 10) and Figure 6 (for σ = 50). As can be seen from these figures, for each isotropic bandwidth smaller than 0.9 for σ = 10 and smaller than 1.3 for σ = 50, there is an optimal anisotropic smoothing bandwidth, and bandwidths larger and smaller than the optimal one suffer from over- and under-smoothing, respectively. For σ = 10, the geometric smoothers outperform the Euclidean smoother, especially at larger bandwidths. In contrast, when σ = 50, the Euclidean smoother has smaller error than the geometric smoothers. This is consistent with the findings in Section 4: when spatial heterogeneity is dominant over the sensor noise in the DWI data, the geometric smoothers may be more advantageous, while the Euclidean smoother is more advantageous when sensor noise is dominant. We also observe that at the low noise level, the affine-invariant smoother generally performs better than the log-Euclidean smoother. It is conjectured that the presence of the features described in Remark 4.2 of Section 4.2 in a significant portion of the tensor field may be partly responsible for the observed phenomenon.
[Figure 3: four panels (whole set, bands crossing, background interior, bands interior) plotting median error against bandwidth combination for no smoothing and the E, logE and Affine smoothers; σ = 100.]
Fig 3. Simulation I: σ = 100. Comparison of median errors measured by affine invariant distance over different regions for “observed” tensors (obtained by nonlinear regression) and Euclidean, log-Euclidean and Affine smoothers. In “bandwidth combination”: the first number denotes the isotropic bandwidth and the second number denotes the anisotropic bandwidth.
In summary, the results in Sections 4 and 5 show that the choice of the best kernel smoother depends on the sensor noise level in the raw DWI data and on the degree and nature of spatial variation of the underlying tensor field. When the noise from the raw DWI data dominates, the Euclidean smoother is less biased and tends to perform better. On the other hand, if spatial variation of the tensor field is dominant, geometric smoothers may perform better. Moreover, the simulation results also show that anisotropic smoothing is often beneficial in anisotropic and heterogeneous regions.

6. Application to DT-MRI scans of human brain

A third evaluation of the kernel smoothers was performed on a set of 33 real DTI scans of elderly individuals who volunteered for research at the UC Davis Alzheimer's Disease Center. The purpose of the experiments was to evaluate the degree to which the smoothers enhanced or diminished the biological plausibility of white matter integrity and inter-regional connectivity measures calculated from the diffusion tensors.
Fig 4. Simulation II: Upper left panel: raw tensors in slice 10; Upper right panel: smoothed tensors used as ground truth in the simulation in the rectangular region shown in the upper left panel; Lower left panel: noisy tensors after adding Rician noise (σ = 10); Lower right panel: smoothed tensors by the Affine smoother.
[Figure 5: four panels (isotropic smoothing bandwidth 0.6, 0.66, 0.7, 0.9) plotting median error against the anisotropic smoothing bandwidth for no smoothing and the E, logE and Affine smoothers; σ = 10.]
Fig 5. Simulation II: σ = 10. Comparison of median errors measured by affine-invariant distance across the whole tensor field for “observed” tensors (obtained by nonlinear regression) and Euclidean, log-Euclidean and Affine smoothers.
Fig 6. Simulation II: σ = 50. Comparison of median errors measured by affine-invariant distance across the whole tensor field for “observed” tensors (obtained by nonlinear regression) and Euclidean, log-Euclidean and Affine smoothers.
We assessed the plausibility of the DTI-based measures in terms of a few bedrock neuroanatomical principles: (a) in the cerebrospinal fluid (CSF), water is free to diffuse in any direction, and therefore fractional anisotropy (FA, defined in equation (34) below), which measures the degree to which a diffusion tensor suggests an anisotropic water diffusion distribution, should be near zero in the CSF; (b) the corpus callosum is a highly organized white matter tract, and therefore its FA should be relatively high; (c) white matter tracts do not travel through CSF compartments, and therefore the number of fibers traced by DTI tractography that intersect CSF spaces should be low; and (d) the parieto-temporo-occipital subregion of the corpus callosum connects the left and right parietal, temporal, and occipital lobes to each other, and therefore the number of tractography fibers that connect those lobar regions via the parieto-temporo-occipital subregion of the corpus callosum should be relatively high. Noise in DTI acquisition may cause the collected DTI data to violate these basic principles. Our goal is to evaluate the degree to which the removal of noise via kernel smoothing reduces the instances of such violations.

6.1. Data

Imaging was performed at the UC Davis Imaging Research Center on a 1.5T GE Signa Horizon LX Echospeed system. Subjects were scanned in a supine position with an 8-channel head coil placed surrounding the head. After placement of the head coil, subjects were inserted into the MRI scanner magnetic field and two MRI sequences were acquired: a three-dimensional T1-weighted coronal spoiled gradient-recalled echo acquisition (T1) used for parcellating the brain into tissue classes and regions of interest; and a single-shot spin-echo echo planar imaging DTI sequence used for estimating diffusion tensors.
The T1-weighted sequence was an axial-oblique 3D Fast Spoiled Gradient Recalled Echo sequence with the following parameters: TE: 2.9 ms (minimum), TR: 9 ms (minimum), flip angle: 15 degrees, slice thickness: 1.5 mm, number of slices: 128, FOV: 25 cm × 25 cm, matrix: 256 × 256. Data from the T1 sequence gave an indication of the tissue type at each location in the brain: white matter appeared brightest, cerebrospinal fluid (CSF) appeared darkest, and gray matter appeared as an intermediate gray. This contrast between tissue types enabled an expert rater to manually trace the corpus callosum (a white matter structure), along with a subdivision of the corpus callosum into four sub-regions, using established protocols on a population-averaged brain called the T1-weighted Minimum Deformation Template (MDT) (Kochunov et al., 2001). The subdivision partitioned the corpus callosum into four zones that carry interhemispheric axonal connections within prefrontal cortex, premotor and supplementary motor cortex, sensory-motor cortex, and parieto-temporo-occipital cortex (excluding the post-central gyrus), respectively. The MDT represents the brain of a prototypical elderly individual whose brain anatomy has been warped to represent the approximate average anatomy of a large group of cognitively-healthy elderly individuals. In addition, an established method was used to segment the MDT into gray, white, and CSF tissue compartments (Rajapakse et al., 1996). The MDT was nonlinearly warped to the skull-stripped T1-weighted scan of each individual (Rueckert et al., 1999), thus allowing the delineation of the corpus callosum and its sub-regions to be transferred to the brain of each individual in the study.

Relevant DTI acquisition parameters include: TE: 94 ms, TR: 8000 ms, flip angle: 90 degrees, slice thickness: 5 mm, slice spacing: 0.0 mm, FOV: 22 cm × 22 cm, matrix: 128 × 128, b-value: 1000 s/mm². Each acquisition included collection of 2 B0 images and 4 diffusion-weighted images acquired along each of 6 gradient directions. The direction vectors were: (1, 0, 1), (1, 1, 0), (0, 1, 1), (−1, 0, 1), (−1, 1, 0), (0, −1, 1). Geometric distortions were removed from the diffusion-weighted images (De Crespigny and Moseley, 1998), and diffusion tensors were estimated using a linear least squares estimator (Basser and Pierpaoli, 2005). The eigenvalues of the diffusion tensors were then estimated, and FA was calculated at each voxel as
\[ \mathrm{FA} = \frac{1}{\sqrt{2}} \sqrt{\frac{(\lambda_1 - \lambda_2)^2 + (\lambda_1 - \lambda_3)^2 + (\lambda_2 - \lambda_3)^2}{\lambda_1^2 + \lambda_2^2 + \lambda_3^2}}, \tag{34} \]
where the λ_j's are the eigenvalues of the tensor.

The T1-weighted scan was affinely aligned to the image of FA values derived from the corresponding DTI data. This allowed the CSF, corpus callosum, and corpus callosum subdivision labels to be transferred from the space of the MDT image to the space of the subject T1-weighted scan, and then to the space of the corresponding DTI data. Label maps in the DTI space were eroded by one voxel to remove spurious labels generated by partial volume effects.
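As a concrete illustration of equation (34), the following minimal sketch (our own, assuming numpy; the diffusivity values are hypothetical, not taken from the scans) computes FA from the eigenvalues of a tensor:

```python
import numpy as np

def fractional_anisotropy(D):
    """FA of a 3x3 symmetric positive definite tensor, per equation (34)."""
    lam = np.linalg.eigvalsh(D)  # eigenvalues lambda_1, lambda_2, lambda_3
    num = (lam[0] - lam[1])**2 + (lam[0] - lam[2])**2 + (lam[1] - lam[2])**2
    den = np.sum(lam**2)
    return np.sqrt(0.5 * num / den)

# A strongly anisotropic tensor has FA near 1; an isotropic one has FA = 0.
D_aniso = np.diag([1.7e-3, 0.2e-3, 0.2e-3])  # hypothetical diffusivities
D_iso = 1.0e-3 * np.eye(3)
print(fractional_anisotropy(D_aniso))  # ~0.87
print(fractional_anisotropy(D_iso))    # 0.0
```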
6.2. Experimental design

For each of the 33 DTI scans, we calculated FA at all voxels labeled as CSF, prior to smoothing. FA is a commonly used measure of the degree of anisotropy of diffusion: if diffusion is isotropic, then FA = 0; if diffusion is highly anisotropic, then FA is near one. Then, for each smoother, we calculated FA at CSF voxels after smoothing. We used the same two-stage isotropic-anisotropic smoothing procedure described in the simulation study (Section 5), with an isotropic bandwidth of 0.8 and an anisotropic bandwidth of 1.8. Analogous results with a smaller anisotropic bandwidth of 1.2 are very similar. For each scan, we summarized smoothing-induced shifts in the distribution of CSF FA values by calculating the difference in the median CSF FA values before and after smoothing (results are similar for smoothing-induced differences in the 75th and 90th percentiles of CSF FA values). Greater reductions in median CSF FA caused by smoothing would indicate better smoothing performance in terms of encouraging biological plausibility. Similarly, for each scan and smoother, we calculated the difference in median corpus callosum FA before and after smoothing; greater reductions in callosal FA caused by smoothing are considered detrimental, since relatively higher FA there is more plausible.

We then used the MedINRIA software package (Toussaint et al., 2007), and its implementation of the TensorLines algorithm (Weinstein et al., 1999), to trace white matter tract fibers throughout the brain based on the raw diffusion tensors and on tensors that were smoothed using each of the three smoothers. All tract fibers were stored as trajectories of 3D points in the DTI space. For each individual, in-house software was used to select only those fibers that intersect selected regions of interest defined by our anatomical labels. We first isolated the tract fibers that intersected voxels labeled as CSF, and counted the number of such CSF-intersecting tract fibers before and after smoothing. Greater reduction of such spurious tract fibers is an indicator of higher performance of the smoother. We then isolated tract fibers that intersected both the parieto-temporo-occipital portion of the corpus callosum and the white matter of the occipital lobe; greater addition of such plausible fibers is an indicator of higher performance. However, we guarded against the possibility that a smoother increased the number of such plausible fibers simply by increasing the total number of fibers, both plausible and implausible, connecting the parieto-temporo-occipital corpus callosum to any and all parts of the brain. To do this, we calculated the number of implausible fibers connecting the parieto-temporo-occipital corpus callosum to the prefrontal cortex before and after smoothing, to be sure that this number did not increase with smoothing.
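To make the two-stage procedure above concrete, the following is a minimal sketch (ours, assuming numpy and scipy; the toy field, bandwidth, and helper names are for illustration only, not the implementation used in these experiments) of a single isotropic smoothing pass with weighted Karcher means under the Euclidean and log-Euclidean metrics. The anisotropic second stage would replace the Euclidean voxel distance in the kernel by a distance adapted to the local tensor field, and the affine-invariant mean requires an iteration (a companion sketch appears in the Appendix).

```python
import numpy as np
from scipy.linalg import logm, expm

def kernel_weights(center, neighbors, h):
    """Isotropic Gaussian kernel weights over voxel coordinates, bandwidth h."""
    d2 = np.sum((neighbors - center)**2, axis=1)
    w = np.exp(-d2 / (2.0 * h**2))
    return w / w.sum()

def euclidean_smooth(tensors, w):
    """Weighted Karcher mean under the Euclidean metric: a weighted average."""
    return np.einsum('i,ijk->jk', w, tensors)

def log_euclidean_smooth(tensors, w):
    """Weighted Karcher mean under the log-Euclidean metric:
    average the matrix logarithms, then map back with the matrix exponential."""
    logs = np.array([logm(D) for D in tensors])
    return expm(np.einsum('i,ijk->jk', w, logs))

# Toy neighborhood: five voxels on a line with mildly varying SPD tensors.
coords = np.array([[i, 0.0, 0.0] for i in range(5)])
tensors = np.array([np.diag([1.0 + 0.1 * i, 0.5, 0.5]) for i in range(5)])
w = kernel_weights(coords[2], coords, h=0.8)
print(euclidean_smooth(tensors, w))
print(log_euclidean_smooth(tensors, w))
```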
Fig 7. Application. Upper left panel: Boxplots of the reduction in median CSF FA before and after smoothing for the three smoothers across 33 scans. Upper right panel: Boxplots of the reduction in median callosal FA before and after smoothing for the three smoothers. Lower left panel: Boxplots of reduction in the number of spurious tract fibers intersecting the CSF before and after smoothing for the three smoothers.
6.3. Results

A boxplot of smoothing-induced reductions in median CSF FA, across all 33 individuals for the three smoothers, is shown in the upper left panel of Figure 7. As in the simulation study, all three smoothers performed similarly in this highly isotropic region. However, both geometric smoothers succeeded in maintaining diffusion anisotropy in the corpus callosum (CC) much better than the Euclidean smoother (Figure 7, upper right), with Euclidean smoothing erroneously reducing the high FA inherent in this structure by approximately 0.05 in terms of median FA. Analogous plots for differences in other per-individual FA summary measures are similar. This result reinforces the findings of the simulation study: the geometric smoothers may be more effective at maintaining the structure of real-world, highly-organized tensor fields while removing low levels of noise.

A boxplot of smoothing-induced reductions in the number of CSF-intersecting fibers is shown in the lower left panel of Figure 7. The three methods perform similarly, although the Euclidean smoother may be superior for removing such spurious fibers. The left panel of Figure 8 shows how many additional fibers between the occipital cortex and the occipital corpus callosum were traced by tractography after smoothing compared to before.
Fig 8. Application. Left panel: Boxplots of the increase in the number of plausible tract fibers intersecting the occipital corpus callosum and occipital lobe from before to after smoothing, for the three smoothers across 33 scans. Right panel: Boxplots of the differences in the number of implausible tract fibers intersecting the occipital corpus callosum and prefrontal lobe before and after smoothing, for the three smoothers.
While the performance of the three smoothers is similar, the number of fibers added by the geometric smoothers is slightly higher, suggesting greater performance in encouraging such biologically plausible fibers. Meanwhile, the three methods are nearly identical in their ability to prevent the number of spurious fibers intersecting the occipital corpus callosum and prefrontal cortex from increasing (right panel of Figure 8).

7. Discussion

Based on both theoretical and numerical analysis, the key finding of this study is that the performance of diffusion tensor smoothers depends heavily on the characteristics of both the noise from the MRI sensor and the geometric structure of the underlying tensor field. The asymptotic expansion of the linear and nonlinear regression estimators in Section 2 quantifies the advantage of the latter, which suggests that the nonlinear estimator should be incorporated in standard software packages for DTI analysis. The perturbation analysis in Sections 2 and 4 shows that under the small noise regime, if the tensors in the smoothing neighborhood are homogeneous, then the Euclidean smoother tends to have a smaller bias than the geometric smoothers. On the other hand, if the local heterogeneity of the tensor field is the dominant component of variation of the tensors, and the noiseless tensors follow a certain geometric regularity, then the geometric smoothers may lead to smaller bias. In addition, the simulation studies show that if the sensor noise in the DWI data is the dominant source of variation of the tensors, the Euclidean smoother outperforms the geometric smoothers, as the latter tend to fit the spurious structure induced by the noise in such cases. Together, these findings suggest that DTI users may need to revisit the conventional wisdom that geometric smoothers are generally superior to Euclidean smoothers. In fact, optimal DTI smoothing may need to be spatially adaptive, applying different metrics depending on local geometric structure and the level of sensor noise. The simulation results also point to the benefits of anisotropic smoothing in highly anisotropic regions.
These results are mostly in agreement with the qualitative features of our findings for the real DT-MRI data (Section 6), even though there the metrics for performance and comparison are different, being based on the biological plausibility of the resulting FA maps and tractography results. As in the simulations, the geometric smoothers appear to perform better in highly structured, anisotropic regions such as the corpus callosum and the fiber tract pathways in the occipital lobe; meanwhile, all smoothers perform similarly in fairly isotropic regions, such as the CSF. In addition, the two geometric smoothers perform similarly in all respects. The Euclidean smoother does an exceptional job of removing spurious tract fibers that travel through both unstructured regions (CSF) and structured regions. An explanation for this, suggested by the Euclidean smoother's reduction of callosal FA and occipital fiber counts, is that it discourages highly structured tensor regions globally, including those that accidentally occur in the CSF. The geometric smoothers, meanwhile, may tend to preserve such structure.

In this paper, we have not addressed the issue of choosing smoothing parameters, as the primary goal here is to demonstrate the comparative performance of different smoothers. In practice, one may use an AIC-type criterion or a generalized cross-validation criterion, as proposed in Yuan et al. (2012). However, when the tensor field is inhomogeneous, any "global" choice of the bandwidth may not be very effective. How to choose the bandwidth adaptively, while accounting for spatial inhomogeneity, is a future direction of study. The simulation results also demonstrate some degree of boundary effect. For example, at the boundary between a band of anisotropic tensors and a background of isotropic tensors, the performance of the tensor smoothers differs from that in the interior of the band. Such boundary effects may be mitigated by a careful choice of the anisotropic neighborhoods (for example, using a multi-stage scheme as in Tabelow et al. (2008)), or by employing higher order polynomial approximation of the tensor field as in Yuan et al. (2012).

In this paper, we did not explicitly specify a probabilistic model for the tensor distribution. Rather, it is implicitly determined by the noise model of the DWI data and the tensor estimation procedure at each voxel. In Section 4.2, under "Case 2", we considered a specific local probabilistic structure while comparing the bias characteristics of the geometric smoothers with that of the Euclidean smoother. This structure is based on a spectral decomposition of the tensor, with independent random variations in the eigenvalues and eigenvectors. Possible extensions of such a probabilistic framework are a future direction of research.

8. Appendix

8.1. Proofs of Proposition 2.1 and Proposition 2.2

Let the phase vector for the DWI signal corresponding to gradient direction q be denoted by u_q. Then u_q is a unit vector in R². Let v_q be a unit vector in R² such that v_q^T u_q = 0, so that u_q u_q^T + v_q v_q^T = I₂.
Next, for an arbitrary tensor D (written in vectorized form), an arbitrary gradient direction q, and an arbitrary w ∈ R², define
\[ f_q(w, D) := \| S_0 e^{-x_q^T D} u_q + w \|. \tag{35} \]
Then f_q(0, D^0) = S_q and f_q(σε_q, D^0) = Ŝ_q. We denote the partial derivatives with respect to the first and second arguments by ∇_i f_q and ∇_{ij} f_q, where 1 ≤ i, j ≤ 2. Then
\[ \nabla_1 f_q(w, D^0) = \frac{1}{f_q(w, D^0)}\,(S_q u_q + w) \;\Longrightarrow\; \nabla_1 f_q(0, D^0) = \frac{1}{S_q}\, S_q u_q = u_q. \tag{36} \]
Thus, using (36),
\[ \nabla_{11} f_q(w, D^0) = \frac{1}{f_q(w, D^0)}\, I_2 - \frac{1}{(f_q(w, D^0))^3}\,(S_q u_q + w)(S_q u_q + w)^T \;\Longrightarrow\; \nabla_{11} f_q(0, D^0) = \frac{1}{S_q}(I_2 - u_q u_q^T) = \frac{1}{S_q}\, v_q v_q^T. \tag{37} \]
We also have
\[ \nabla_2 f_q(0, D^0) = -S_0 \exp(-x_q^T D^0)\, x_q = -S_q x_q \tag{38} \]
and
\[ \nabla_{22} f_q(0, D^0) = S_0 \exp(-x_q^T D^0)\, x_q x_q^T = S_q x_q x_q^T. \tag{39} \]
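Before turning to the proofs, a minimal simulation sketch (ours; all numerical values are hypothetical) of this noise model may be helpful: the observed magnitude Ŝ_q is the norm of the noiseless signal, carried along the phase direction u_q, plus two-dimensional Gaussian sensor noise, which makes Ŝ_q Rician-distributed, as in (35).

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_dwi_magnitude(D, q, S0=1.0, sigma=0.05):
    """One draw of S_hat_q = || S_q u_q + sigma * eps_q ||, cf. f_q in (35);
    S_q = S0 * exp(-q^T D q) is the noiseless signal (b-value absorbed in D)."""
    Sq = S0 * np.exp(-q @ D @ q)
    u = np.array([1.0, 0.0])           # phase vector; any unit vector in R^2
    eps = rng.standard_normal(2)       # 2-D Gaussian sensor noise
    return np.linalg.norm(Sq * u + sigma * eps)

D0 = np.diag([1.5, 0.4, 0.4])          # hypothetical true tensor
q = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)
print(noisy_dwi_magnitude(D0, q))
```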
Proof of Proposition 2.1

We show that, as σ → 0, D̂_LS = D^0 + σD_{1,LS} + σ²D_{2,LS} + O(σ³|Q|⁻¹ Σ_{q∈Q} ‖ε_q‖³), where the random vectors D_{1,LS} and D_{2,LS} are given by
\[ D_{1,LS} = -\Big(\sum_{q \in Q} x_q x_q^T\Big)^{-1} \Big(\sum_{q \in Q} \frac{1}{S_q}(u_q^T \varepsilon_q)\, x_q\Big) \tag{40} \]
and
\[ D_{2,LS} = -\Big(\sum_{q \in Q} x_q x_q^T\Big)^{-1} \Big(\frac{1}{2} \sum_{q \in Q} \frac{1}{S_q^2}\big((v_q^T \varepsilon_q)^2 - (u_q^T \varepsilon_q)^2\big)\, x_q\Big). \tag{41} \]
Proposition 2.1 then follows from (40) and (41) by taking appropriate expectations and using the independence of the ε_q's. We use the representation
\begin{align*}
\hat D_{LS} &= -\Big(\sum_{q \in Q} x_q x_q^T\Big)^{-1} \Big(\sum_{q \in Q} (\log \hat S_q - \log S_0)\, x_q\Big)\\
&= -\Big(\sum_{q \in Q} x_q x_q^T\Big)^{-1} \sum_{q \in Q} \big(\log f_q(\sigma\varepsilon_q, D^0) - \log f_q(0, D^0) - x_q^T D^0\big)\, x_q\\
&= D^0 - \sigma \Big(\sum_{q \in Q} x_q x_q^T\Big)^{-1} \sum_{q \in Q} \frac{(\nabla_1 f_q(0, D^0))^T \varepsilon_q}{f_q(0, D^0)}\, x_q\\
&\quad - \frac{\sigma^2}{2} \Big(\sum_{q \in Q} x_q x_q^T\Big)^{-1} \sum_{q \in Q} \Big[\frac{\varepsilon_q^T \nabla_{11} f_q(0, D^0)\, \varepsilon_q}{f_q(0, D^0)} - \frac{(\varepsilon_q^T \nabla_1 f_q(0, D^0))^2}{(f_q(0, D^0))^2}\Big] x_q + O\Big(\sigma^3 |Q|^{-1} \sum_{q \in Q} \|\varepsilon_q\|^3\Big).
\end{align*}
By invoking (36) and (37), we obtain (40) and (41).

Proof of Proposition 2.2

Unlike D̂_LS, there is no explicit expression for D̂_NL. Instead, it satisfies the normal equation
\[ \sum_{q \in Q} \big(\hat S_q - S_0 \exp(-x_q^T \hat D_{NL})\big)\, S_0 \exp(-x_q^T \hat D_{NL})\, x_q = 0. \tag{42} \]
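A minimal numerical sketch (ours; all values are hypothetical toy data) of the two estimators under comparison may help fix ideas: D̂_LS regresses −(log Ŝ_q − log S_0) on x_q, while D̂_NL minimizes the residual sum of squares on the signal scale, whose stationarity condition is exactly (42). The direction set below matches the acquisition protocol of Section 6.1.

```python
import numpy as np
from scipy.optimize import least_squares

def design_vector(q):
    """x_q with x_q^T vec(D) = q^T D q, where
    vec(D) = (D11, D22, D33, D12, D13, D23)."""
    return np.array([q[0]**2, q[1]**2, q[2]**2,
                     2*q[0]*q[1], 2*q[0]*q[2], 2*q[1]*q[2]])

def tensor_ls(S_hat, S0, X):
    """Log-linearized least squares estimator (D_LS)."""
    y = -(np.log(S_hat) - np.log(S0))
    return np.linalg.lstsq(X, y, rcond=None)[0]

def tensor_nls(S_hat, S0, X, d_init):
    """Nonlinear least squares estimator (D_NL) on the signal scale."""
    resid = lambda d: S_hat - S0 * np.exp(-X @ d)
    return least_squares(resid, d_init).x

dirs = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1],
                 [-1, 0, 1], [-1, 1, 0], [0, -1, 1]], dtype=float)
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
X = np.vstack([design_vector(q) for q in dirs])
d_true = np.array([1.5, 0.4, 0.4, 0.0, 0.0, 0.0])   # vec(D^0), hypothetical
S0 = 1.0
noise = 0.01 * np.random.default_rng(1).standard_normal(6)
S_hat = S0 * np.exp(-X @ d_true) + noise             # small additive noise
d_ls = tensor_ls(S_hat, S0, X)
d_nls = tensor_nls(S_hat, S0, X, d_init=d_ls)
```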
We prove that, as σ → 0, D̂_NL = D^0 + σD_{1,NL} + σ²D_{2,NL} + O(σ³|Q|⁻¹ Σ_{q∈Q} ‖ε_q‖³), where
\[ D_{1,NL} = -\Big(\sum_{q \in Q} S_q^2 x_q x_q^T\Big)^{-1} \Big(\sum_{q \in Q} S_q (u_q^T \varepsilon_q)\, x_q\Big) \tag{43} \]
and
\[ D_{2,NL} = \Big(\sum_{q \in Q} S_q^2 x_q x_q^T\Big)^{-1} \Big[\sum_{q \in Q} S_q^2 (x_q^T D_{1,NL})^2\, x_q + \frac{1}{2} \sum_{q \in Q} \big(S_q x_q^T D_{1,NL} + u_q^T \varepsilon_q\big)^2\, x_q - \frac{1}{2} \sum_{q \in Q} \|\varepsilon_q\|^2\, x_q\Big]. \tag{44} \]
From (42) and the definition of D̂_NL, it can be shown using standard arguments that ‖D̂_NL − D^0‖ = O(σ|Q|⁻¹ Σ_{q∈Q} ‖ε_q‖) as σ → 0. Thus, expanding the LHS of (42) in a Taylor series, we have
\begin{align*}
\sum_{q \in Q} &\Big[\sigma\, \varepsilon_q^T \nabla_1 f_q(0, D^0) + \frac{\sigma^2}{2}\, \varepsilon_q^T \nabla_{11} f_q(0, D^0)\, \varepsilon_q - (\nabla_2 f_q(0, D^0))^T (\hat D_{NL} - D^0)\\
&\quad - \frac{1}{2}(\hat D_{NL} - D^0)^T \nabla_{22} f_q(0, D^0)(\hat D_{NL} - D^0)\Big] \cdot \Big[\nabla_2 f_q(0, D^0) + \nabla_{22} f_q(0, D^0)(\hat D_{NL} - D^0)\Big]\\
&= O\Big(\sigma^3 |Q|^{-2}\Big(\sum_{q \in Q} \|\varepsilon_q\|\Big)^3\Big).
\end{align*}
We can express D̂_NL as D̂_NL = D^0 + σD_{1,NL} + σ²D_{2,NL} + O(σ³|Q|⁻¹ Σ_{q∈Q} ‖ε_q‖³), where D_{1,NL} and D_{2,NL} involve terms that are only linear and only quadratic in the ε_q's, respectively. Substituting this in the above expression and equating terms with multiplier σ, we have
\[ D_{1,NL} = \Big(\sum_{q \in Q} \nabla_2 f_q(0, D^0)(\nabla_2 f_q(0, D^0))^T\Big)^{-1} \Big(\sum_{q \in Q} \varepsilon_q^T \nabla_1 f_q(0, D^0)\, \nabla_2 f_q(0, D^0)\Big), \]
which equals (43) by virtue of (38). Also, collecting terms with multiplier σ²,
\begin{align*}
D_{2,NL} = \Big(\sum_{q \in Q} \nabla_2 f_q(0, D^0)(\nabla_2 f_q(0, D^0))^T\Big)^{-1} \cdot \Big[&\sum_{q \in Q} \frac{1}{2}\big(\varepsilon_q^T \nabla_{11} f_q(0, D^0)\, \varepsilon_q - D_{1,NL}^T \nabla_{22} f_q(0, D^0)\, D_{1,NL}\big)\, \nabla_2 f_q(0, D^0)\\
&+ \sum_{q \in Q} (\varepsilon_q^T \nabla_1 f_q(0, D^0))\, \nabla_{22} f_q(0, D^0)\, D_{1,NL}\\
&- \sum_{q \in Q} (D_{1,NL}^T \nabla_2 f_q(0, D^0))\, \nabla_{22} f_q(0, D^0)\, D_{1,NL}\Big].
\end{align*}
Now, using (36)-(39) and (43), we can simplify the expression for D_{2,NL} as
\[ D_{2,NL} = \Big(\sum_{q \in Q} S_q^2 x_q x_q^T\Big)^{-1} \Big[\frac{3}{2}\sum_{q \in Q} S_q^2 (x_q^T D_{1,NL})^2\, x_q - \frac{1}{2}\sum_{q \in Q} (v_q^T \varepsilon_q)^2\, x_q + \sum_{q \in Q} S_q (u_q^T \varepsilon_q)(x_q^T D_{1,NL})\, x_q\Big], \]
which can be rearranged to get (44) by using the identity v_q v_q^T + u_q u_q^T = I₂.

Bias in D̂_NL

The following shows that when the tensor D^0 is isotropic, the second order bias in D̂_NL is negative definite and does not depend on D^0.

Lemma 8.1. If D^0 is isotropic and the gradient directions {q : q ∈ Q} are uniformly distributed on S², then under the framework of Proposition 2.2, E(D̃_{2,NL}) = N, where N is a 3 × 3 negative definite matrix, so that the second order bias in D̂_NL is σ²N; here D̃_{2,NL} denotes D_{2,NL} written as a 3 × 3 symmetric matrix.

Proof. First note that D_{2,NL} is the vectorization of D̃_{2,NL}, so that D̃_{2,NL}(i, i) = D_{2,NL}(i) for i = 1, 2, 3, and D̃_{2,NL}(1, 2) = D_{2,NL}(4), D̃_{2,NL}(1, 3) = D_{2,NL}(5), D̃_{2,NL}(2, 3) = D_{2,NL}(6).
Suppose that D^0 = λ₁I₃ and the design is uniform. Without loss of generality, let S₀ = 1. Thus, S_q = e^{−λ₁} for all q ∈ Q. Also, the uniform design implies that the terms 1 − x_q^T(Σ_{q′∈Q} x_{q′} x_{q′}^T)⁻¹ x_q are the same for all q ∈ Q. In order to prove this, we use the following facts about the uniform design: (i) |Q|⁻¹ Σ_{q∈Q} q_j⁴ = 1/5 for j = 1, 2, 3; (ii) |Q|⁻¹ Σ_{q∈Q} q_j q_k = 0 for j ≠ k; (iii) |Q|⁻¹ Σ_{q∈Q} q_j² q_k² = 1/15 for j ≠ k; (iv) |Q|⁻¹ Σ_{q∈Q} q_j³ q_k = 0 for j ≠ k; and (v) |Q|⁻¹ Σ_{q∈Q} q_j² q_k q_l = 0 for distinct j, k, l. (Note that this imposes a restriction on the minimum value of |Q|.) Then it can be checked that x_q^T(Σ_{q′∈Q} x_{q′} x_{q′}^T)⁻¹ x_q = 6/|Q|. Thus, from Proposition 2.2, we obtain that E(D_{2,NL}) is a positive multiple (with multiplicative factor (1 − 6/|Q|)) of
\[ -\frac{1}{2}\Big(\sum_{q \in Q} S_q^2 x_q x_q^T\Big)^{-1}\Big(\sum_{q \in Q} x_q\Big) = -\frac{1}{2}\, e^{2\lambda_1}\Big(\sum_{q \in Q} x_q x_q^T\Big)^{-1}\Big(\sum_{q \in Q} x_q\Big). \tag{45} \]
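The design moments (i)-(v) are fourth-order moments of the uniform distribution on S²; a quick Monte Carlo check (a sketch, ours, approximating the exact design sums by population moments over random unit vectors) recovers the constants 1/5 and 1/15 and the vanishing odd moments:

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.standard_normal((200_000, 3))
q /= np.linalg.norm(q, axis=1, keepdims=True)   # approximately uniform on S^2

print(np.mean(q[:, 0]**4))                      # fact (i):   ~1/5
print(np.mean(q[:, 0] * q[:, 1]))               # fact (ii):  ~0
print(np.mean(q[:, 0]**2 * q[:, 1]**2))         # fact (iii): ~1/15
print(np.mean(q[:, 0]**3 * q[:, 1]))            # fact (iv):  ~0
print(np.mean(q[:, 0]**2 * q[:, 1] * q[:, 2]))  # fact (v):   ~0
```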
Let y_q = (q₁², q₂², q₃²)^T and z_q = (2q₁q₂, 2q₁q₃, 2q₂q₃)^T. Define
\[ L_{11} = |Q|^{-1}\sum_{q \in Q} y_q y_q^T, \qquad L_{12} = |Q|^{-1}\sum_{q \in Q} y_q z_q^T, \qquad L_{22} = |Q|^{-1}\sum_{q \in Q} z_q z_q^T. \]
Then
\[ \sum_{q \in Q} x_q x_q^T = |Q| \begin{pmatrix} L_{11} & L_{12} \\ L_{21} & L_{22} \end{pmatrix}, \]
where L_{21} = L_{12}^T. If the design is uniform, then L_{12} = O (the zero matrix) and
\[ \sum_{q \in Q} x_q = \frac{|Q|}{3}\begin{pmatrix} 1_3 \\ 0_3 \end{pmatrix}. \]
Using these, we have
\[ \Big(\sum_{q \in Q} x_q x_q^T\Big)^{-1}\Big(\sum_{q \in Q} x_q\Big) = \frac{1}{3}\begin{pmatrix} L_{11} & L_{12} \\ L_{21} & L_{22} \end{pmatrix}^{-1}\begin{pmatrix} 1_3 \\ 0_3 \end{pmatrix} = \frac{1}{3}\begin{pmatrix} L_{11}^{-1} 1_3 \\ 0_3 \end{pmatrix}. \]
Note that the diagonal entries of L_{11} are all equal, and the off-diagonal entries are also equal among themselves, due to the uniformity of the design. Thus, 1₃ is an eigenvector of L_{11}, and hence also of L_{11}⁻¹. Consequently, L_{11}⁻¹ 1₃ is a positive multiple of 1₃, since L_{11} is positive definite. We now conclude the proof of Lemma 8.1 by invoking (45).

8.2. Proofs of Theorem 4.1, Theorem 4.2 and Theorem 4.3

In the case that D^0 is anisotropic, define H_j to be the matrix
\[ H_j := \sum_{k \neq j} \frac{1}{\lambda_k - \lambda_j}\, P_k. \tag{46} \]
Note that H_j P_j = P_j H_j = 0 for all j.
Proposition 8.1. Suppose that the tensors {D(ω) : ω ∈ Ω} and the probability distribution P_Ω satisfy (18), and that t → 0. If the eigenvalues of D^0 are all distinct, then
\begin{align*}
\log \bar D_E - \log D^0 &= \sum_{j=1}^N \frac{1}{\lambda_j}\,\mathrm{tr}(P_j \bar\Delta_E)\, P_j - \sum_{j=1}^N \frac{1}{\lambda_j}\,\mathrm{tr}(P_j \bar\Delta_E H_j \bar\Delta_E)\, P_j - \frac{1}{2}\sum_{j=1}^N \frac{1}{\lambda_j^2}\,[\mathrm{tr}(P_j \bar\Delta_E)]^2\, P_j\\
&\quad - \sum_{j=1}^N \log\lambda_j\, \big(P_j \bar\Delta_E H_j + H_j \bar\Delta_E P_j\big)\\
&\quad + \sum_{j=1}^N \log\lambda_j\, \big(P_j \bar\Delta_E H_j \bar\Delta_E H_j + H_j \bar\Delta_E P_j \bar\Delta_E H_j + H_j \bar\Delta_E H_j \bar\Delta_E P_j\\
&\qquad\qquad - P_j \bar\Delta_E P_j \bar\Delta_E H_j^2 - P_j \bar\Delta_E H_j^2 \bar\Delta_E P_j - H_j^2 \bar\Delta_E P_j \bar\Delta_E P_j\big)\\
&\quad - \sum_{j=1}^N \frac{1}{\lambda_j}\,\big[\mathrm{tr}(P_j \bar\Delta_E)\,(P_j \bar\Delta_E H_j + H_j \bar\Delta_E P_j)\big] + O(t^3). \tag{47}
\end{align*}
Taking the trace on both sides of (47), and using the facts that P_j H_j = 0 and P_j² = P_j, we get (20) when the eigenvalues of D^0 are all distinct.

Proposition 8.2. Assume that the conditions of Theorem 4.1 hold. If the eigenvalues of D^0 are all distinct, then
\begin{align*}
\log \bar D_{LE} - \log D^0 &= \sum_{j=1}^N \frac{1}{\lambda_j}\,\mathrm{tr}(P_j \bar\Delta_E)\, P_j - \sum_{j=1}^N \frac{1}{\lambda_j}\, E_\Omega[\mathrm{tr}(P_j \Delta H_j \Delta)]\, P_j - \frac{1}{2}\sum_{j=1}^N \frac{1}{\lambda_j^2}\, E_\Omega\{[\mathrm{tr}(P_j \Delta)]^2\}\, P_j\\
&\quad - \sum_{j=1}^N \log\lambda_j\, \big(P_j \bar\Delta_E H_j + H_j \bar\Delta_E P_j\big)\\
&\quad + \sum_{j=1}^N \log\lambda_j\, E_\Omega\big[P_j \Delta H_j \Delta H_j + H_j \Delta P_j \Delta H_j + H_j \Delta H_j \Delta P_j\\
&\qquad\qquad - P_j \Delta P_j \Delta H_j^2 - P_j \Delta H_j^2 \Delta P_j - H_j^2 \Delta P_j \Delta P_j\big]\\
&\quad - \sum_{j=1}^N \frac{1}{\lambda_j}\, E_\Omega\big[\mathrm{tr}(P_j \Delta)\,(P_j \Delta H_j + H_j \Delta P_j)\big] + O(t^3). \tag{48}
\end{align*}
Taking the trace on both sides of (48), we get (22) when the eigenvalues of D^0 are all distinct.

Proposition 8.3. Assume that the conditions of Theorem 4.1 hold.
(a)
\[ \bar D_{Aff} - D^0 = \bar\Delta_E - \frac{1}{2}\, E_\Omega\big[\Delta (D^0)^{-1} \Delta\big] + \frac{1}{2}\, \bar\Delta_E (D^0)^{-1} \bar\Delta_E + O(t^3). \tag{49} \]
(b) If the eigenvalues of D^0 are all distinct, then
\begin{align*}
\log \bar D_{Aff} - \log D^0 &= \sum_{j=1}^N \frac{1}{\lambda_j}\,\mathrm{tr}(P_j \bar\Delta_E)\, P_j - \sum_{j=1}^N \frac{1}{\lambda_j}\,\mathrm{tr}(P_j \bar\Delta_E H_j \bar\Delta_E)\, P_j - \frac{1}{2}\sum_{j=1}^N \frac{1}{\lambda_j^2}\,[\mathrm{tr}(P_j \bar\Delta_E)]^2\, P_j\\
&\quad - \frac{1}{2}\sum_{j=1}^N \frac{1}{\lambda_j}\, E_\Omega\big[\mathrm{tr}(P_j \Delta (D^0)^{-1} \Delta)\big]\, P_j + \frac{1}{2}\sum_{j=1}^N \frac{1}{\lambda_j}\,\mathrm{tr}(P_j \bar\Delta_E (D^0)^{-1} \bar\Delta_E)\, P_j\\
&\quad - \sum_{j=1}^N \log\lambda_j\, \big(P_j \bar\Delta_E H_j + H_j \bar\Delta_E P_j\big)\\
&\quad + \frac{1}{2}\sum_{j=1}^N \log\lambda_j\, E_\Omega\big[P_j \Delta (D^0)^{-1} \Delta H_j + H_j \Delta (D^0)^{-1} \Delta P_j\big]\\
&\quad - \frac{1}{2}\sum_{j=1}^N \log\lambda_j\, \big(P_j \bar\Delta_E (D^0)^{-1} \bar\Delta_E H_j + H_j \bar\Delta_E (D^0)^{-1} \bar\Delta_E P_j\big)\\
&\quad + \sum_{j=1}^N \log\lambda_j\, \big(P_j \bar\Delta_E H_j \bar\Delta_E H_j + H_j \bar\Delta_E P_j \bar\Delta_E H_j + H_j \bar\Delta_E H_j \bar\Delta_E P_j\\
&\qquad\qquad - P_j \bar\Delta_E P_j \bar\Delta_E H_j^2 - P_j \bar\Delta_E H_j^2 \bar\Delta_E P_j - H_j^2 \bar\Delta_E P_j \bar\Delta_E P_j\big)\\
&\quad - \sum_{j=1}^N \frac{1}{\lambda_j}\,\big[\mathrm{tr}(P_j \bar\Delta_E)\,(P_j \bar\Delta_E H_j + H_j \bar\Delta_E P_j)\big] + O(t^3). \tag{50}
\end{align*}
Taking the trace of both sides of (50), we get (24) when the eigenvalues of D^0 are all distinct.

Proof of Proposition 8.2 and Theorem 4.2

First suppose that the eigenvalues of D^0 are all distinct. Let μ_j(ω) denote the j-th largest eigenvalue of D(ω) and Q_j(ω) denote the corresponding eigen-projection. Then under (18), for t sufficiently small, with probability 1, μ_j(ω) is of multiplicity 1, and so Q_j(ω) is a rank 1 matrix. We can then use the following matrix perturbation analysis results (Kato, 1980):
\[ \mu_j(\omega) = \lambda_j + \mathrm{tr}(P_j \Delta(\omega)) - \mathrm{tr}(P_j \Delta(\omega) H_j \Delta(\omega)) + O(\|\Delta(\omega)\|^3) \tag{51} \]
and
\begin{align*}
Q_j(\omega) = P_j &- \big(P_j \Delta(\omega) H_j + H_j \Delta(\omega) P_j\big)\\
&+ P_j \Delta(\omega) H_j \Delta(\omega) H_j + H_j \Delta(\omega) P_j \Delta(\omega) H_j + H_j \Delta(\omega) H_j \Delta(\omega) P_j\\
&- P_j \Delta(\omega) P_j \Delta(\omega) H_j^2 - P_j \Delta(\omega) H_j^2 \Delta(\omega) P_j - H_j^2 \Delta(\omega) P_j \Delta(\omega) P_j + O(\|\Delta(\omega)\|^3). \tag{52}
\end{align*}
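The expansion (51) is easy to verify numerically. In the following sketch (ours; the eigenvalues and perturbation are arbitrary toy values), the gap between the exact eigenvalues of D^0 + Δ and the second-order expansion is third order in ‖Δ‖:

```python
import numpy as np

rng = np.random.default_rng(2)
lam = np.array([3.0, 2.0, 1.0])        # distinct eigenvalues of D^0
D0 = np.diag(lam)
I = np.eye(3)
P = [np.outer(I[:, j], I[:, j]) for j in range(3)]          # eigen-projections
H = [sum(P[k] / (lam[k] - lam[j]) for k in range(3) if k != j)
     for j in range(3)]                                      # H_j as in (46)

E = rng.standard_normal((3, 3))
Delta = 1e-3 * (E + E.T)               # small symmetric perturbation
mu = np.linalg.eigvalsh(D0 + Delta)[::-1]   # descending, matching lambda_j

for j in range(3):
    approx = lam[j] + np.trace(P[j] @ Delta) \
             - np.trace(P[j] @ Delta @ H[j] @ Delta)
    print(mu[j] - approx)              # residuals are O(||Delta||^3)
```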
We consider an asymptotic expansion of log D(ω) around log D^0 as t → 0. By definition,
\begin{align*}
\log D(\omega) - \log D^0 &= \sum_{j=1}^N \big(\log\mu_j(\omega)\, Q_j(\omega) - \log\lambda_j\, P_j\big)\\
&= \sum_{j=1}^N (\log\mu_j(\omega) - \log\lambda_j)\, P_j + \sum_{j=1}^N \log\lambda_j\, (Q_j(\omega) - P_j) + \sum_{j=1}^N (\log\mu_j(\omega) - \log\lambda_j)(Q_j(\omega) - P_j). \tag{53}
\end{align*}
Now, from (51) and (18), we have
\[ \log\mu_j(\omega) - \log\lambda_j = \log\Big(1 + \frac{\mu_j(\omega) - \lambda_j}{\lambda_j}\Big) = \frac{1}{\lambda_j}\big[\mathrm{tr}(P_j \Delta(\omega)) - \mathrm{tr}(P_j \Delta(\omega) H_j \Delta(\omega))\big] - \frac{1}{2\lambda_j^2}[\mathrm{tr}(P_j \Delta(\omega))]^2 + O(\|\Delta(\omega)\|^3), \]
where we have used the series expansion \(\log(1 + x) = \sum_{n=1}^\infty (-1)^{n-1} x^n / n\). Substituting in (53), using (52), and finally taking expectation with respect to P_Ω and noticing that log D̄_LE = E_Ω(log D), we obtain (48). This concludes the proof of Proposition 8.2. Taking the trace on both sides of (48) and using the fact that P_j H_j = H_j P_j = O (the zero matrix), we have part (b) of Theorem 4.2.

Now, for part (a) of Theorem 4.2, we have D^0 = λ₁I. Thus, from the expansion
\[ \log D(\omega) = (\log\lambda_1)\, I + \log\Big(I + \frac{1}{\lambda_1}\Delta(\omega)\Big) = \log D^0 + \frac{1}{\lambda_1}\Delta(\omega) - \frac{1}{2\lambda_1^2}(\Delta(\omega))^2 + O(\|\Delta(\omega)\|^3), \]
we obtain (21) by taking expectation with respect to P_Ω.

Proof of Proposition 8.1 and Theorem 4.1

The proof is almost identical to the proof of Proposition 8.2 and Theorem 4.2, the only difference being that we replace Δ(ω) by Δ̄_E in every step.

Proof of Proposition 8.3 and Theorem 4.3

There is no closed form expression for D̄_Aff, but it satisfies the barycentric equation (Arsigny et al., 2005)
\[ E_\Omega\Big[\log\big(\bar D_{Aff}^{-1/2}\, D\, \bar D_{Aff}^{-1/2}\big)\Big] = O. \tag{54} \]
Define
\[ L(\omega) = \log\big(\bar D_{Aff}^{-1/2}\, D(\omega)\, \bar D_{Aff}^{-1/2}\big), \qquad \omega \in \Omega. \]
Note that (54) implies that E_Ω(L) = O. First, we show that
\[ \|L\|_\infty := \sup_{\omega \in \Omega} \|L(\omega)\| = O(t). \tag{55} \]
This implies that
\[ \bar D_E = E_\Omega(D) = E_\Omega\big(\bar D_{Aff}^{1/2}\, e^L\, \bar D_{Aff}^{1/2}\big) = \bar D_{Aff} + \bar D_{Aff}^{1/2}\, E_\Omega(L)\, \bar D_{Aff}^{1/2} + \frac{1}{2}\, \bar D_{Aff}^{1/2}\, E_\Omega(L^2)\, \bar D_{Aff}^{1/2} + O(t^3), \]
from which, after invoking the fact that E_Ω(L) = O, we get
\[ \bar D_{Aff} = \bar D_E - \frac{1}{2}\, \bar D_E^{1/2}\, E_\Omega(L^2)\, \bar D_E^{1/2} + O(t^3). \tag{56} \]
Now we express E_Ω(L²) in terms of an expectation involving Δ̃(ω) = D(ω) − D̄_E. In order to do this, observe that
\begin{align*}
\tilde\Delta(\omega) = D(\omega) - \bar D_E &= \bar D_{Aff}^{1/2}\, e^{L(\omega)}\, \bar D_{Aff}^{1/2} - \bar D_{Aff} + \bar D_{Aff} - \bar D_E\\
&= \bar D_{Aff}^{1/2}\, L(\omega)\, \bar D_{Aff}^{1/2} + \frac{1}{2}\, \bar D_{Aff}^{1/2}\big((L(\omega))^2 - E_\Omega(L^2)\big)\bar D_{Aff}^{1/2} + O(t^3)\\
&= \bar D_E^{1/2}\, L(\omega)\, \bar D_E^{1/2} + \frac{1}{2}\, \bar D_E^{1/2}\big((L(\omega))^2 - E_\Omega(L^2)\big)\bar D_E^{1/2} + O(t^3),
\end{align*}
where in the second and third steps we have used (55) and (56). Therefore, again using (55),
\[ E_\Omega(L^2) = \bar D_E^{-1/2}\, E_\Omega\big(\tilde\Delta\, \bar D_E^{-1}\, \tilde\Delta\big)\, \bar D_E^{-1/2} + O(t^3), \tag{57} \]
which, together with (56), leads to the representation
\begin{align*}
\bar D_{Aff} &= \bar D_E - \frac{1}{2}\, E_\Omega\big(\tilde\Delta\, \bar D_E^{-1}\, \tilde\Delta\big) + O(t^3)\\
&= D^0 + \bar\Delta_E - \frac{1}{2}\, E_\Omega\big(\Delta (D^0)^{-1} \Delta\big) + \frac{1}{2}\, \bar\Delta_E (D^0)^{-1} \bar\Delta_E + O(t^3), \tag{58}
\end{align*}
where, in the last step, we use the facts that Δ̃(ω) = Δ(ω) − Δ̄_E, E_Ω(Δ̃) = O, and ‖Δ̄_E‖ = O(t).

For part (a) of Theorem 4.3, since D^0 = λ₁I, (23) immediately follows by using the Taylor series expansion of log(I + K), where K is the sum of the terms after D^0 on the RHS of (58) and is a symmetric matrix. For the proof of Proposition 8.3, using a perturbation analysis similar to that in the proof of Proposition 8.2, with Δ(ω) replaced by the expression for D̄_Aff − D^0 obtained from (58), we obtain (50). Part (b) of Theorem 4.3 now follows by taking the trace.
Proof of (55)

From the fact that ‖Δ̃‖_∞ = O(t), it easily follows that E_Ω(d²(D̄_E, D)) = O(t²). Then, by the definition of D̄_Aff, it follows that E_Ω(d²_Aff(D̄_Aff, D)) = O(t²), which implies E_Ω(d_Aff(D̄_Aff, D)) = O(t). Now, writing
\[ \log\big(\bar D_{Aff}^{-1/2}\, D(\omega)\, \bar D_{Aff}^{-1/2}\big) = \log\big(\bar D_{Aff}^{-1/2}\, \bar D_E\, \bar D_{Aff}^{-1/2} + \bar D_{Aff}^{-1/2}\, \tilde\Delta(\omega)\, \bar D_{Aff}^{-1/2}\big) \]
and using the Baker-Campbell-Hausdorff formula (Varadarajan, 1984), which gives an expansion of log A − log B for positive definite matrices A and B, together with the fact that sup_{ω∈Ω} ‖Δ̃(ω)‖ = O(t), we conclude that d_Aff(D̄_Aff, D̄_E) = O(t), so that ‖D̄_Aff − D̄_E‖ = O(t). From this it is easy to deduce (55).

8.3. Proofs of Corollary 4.2 and Corollary 4.3

Proof of Corollary 4.2

It can be seen from Propositions 8.1, 8.2 and 8.3, and equations (26) and (27), that
\begin{align*}
\mathrm{ABias}(\log \bar D_E;\, \log D^0) &= T_1 - r_\Omega T_2 + r_\Omega T_3 + O(\sigma^4),\\
\mathrm{ABias}(\log \bar D_{LE};\, \log D^0) &= T_1 - T_2 + T_3,\\
\mathrm{ABias}(\log \bar D_{Aff};\, \log D^0) &= T_1 - r_\Omega T_2 + r_\Omega T_3 + (1 - r_\Omega)\, T_4 + O(\sigma^4),
\end{align*}
where
\begin{align*}
\tilde T_1 &= \sum_{j=1}^3 \frac{1}{\lambda_j}\,\mathrm{tr}\big(P_j E(\bar\Delta^*_E)\big)\, P_j - \sum_{j=1}^3 \sum_{k \neq j} \frac{\log\lambda_j}{\lambda_k - \lambda_j}\,\big[P_j E(\bar\Delta^*_E)\, P_k + P_k E(\bar\Delta^*_E)\, P_j\big],\\
\tilde T_2 &= \sum_{j=1}^3 \sum_{k \neq j} \frac{1}{\lambda_j(\lambda_k - \lambda_j)}\, E[\mathrm{tr}(P_j \Delta^* P_k \Delta^*)]\, P_j + \frac{1}{2}\sum_{j=1}^3 \frac{1}{\lambda_j^2}\, E\{[\mathrm{tr}(P_j \Delta^*)]^2\}\, P_j,\\
\tilde T_3 &= \sum_{j=1}^3 \sum_{k: k \neq j} \sum_{l: l \neq j} \frac{\log\lambda_j}{(\lambda_k - \lambda_j)(\lambda_l - \lambda_j)}\, E\big[P_j \Delta^* P_k \Delta^* P_l + P_k \Delta^* P_j \Delta^* P_l + P_k \Delta^* P_l \Delta^* P_j\big]\\
&\quad - \sum_{j=1}^3 \sum_{k: k \neq j} \frac{\log\lambda_j}{(\lambda_k - \lambda_j)^2}\, E\big[P_j \Delta^* P_j \Delta^* P_k + P_j \Delta^* P_k \Delta^* P_j + P_k \Delta^* P_j \Delta^* P_j\big]\\
&\quad - \sum_{j=1}^3 \sum_{k: k \neq j} \frac{1}{\lambda_j(\lambda_k - \lambda_j)}\, E\big[\mathrm{tr}(P_j \Delta^*)\,(P_j \Delta^* P_k + P_k \Delta^* P_j)\big],\\
\tilde T_4 &= -\frac{1}{2}\sum_{j=1}^3 \sum_{k=1}^3 \frac{1}{\lambda_j \lambda_k}\, E[\mathrm{tr}(P_j \Delta^* P_k \Delta^*)]\, P_j + \frac{1}{2}\sum_{j=1}^3 \sum_{k=1}^3 \sum_{l: l \neq j} \frac{\log\lambda_j}{\lambda_k(\lambda_l - \lambda_j)}\, E\big[P_j \Delta^* P_k \Delta^* P_l + P_l \Delta^* P_k \Delta^* P_j\big]. \tag{59}
\end{align*}
Now, the result is obtained by applying (28), (29) and (30).
Proof of Corollary 4.3

We use the following analog of (26):
\[ \Delta(\omega) = \sigma D_1(\omega) + \sigma\tau D_Z(\omega) + \sigma^2 D_2(\omega) + \sum_{j=1}^3 \lambda_j\Big(\tau Z_j(\omega) + \frac{\tau^2}{2}(Z_j(\omega))^2\Big) P_j + \tilde R_\Delta(\omega), \tag{60} \]
where D_1(ω) and D_2(ω) are exactly as in (26),
\begin{align*}
\mathrm{vec}(D_Z(\omega)) &= A^{-1}\, \frac{1}{|Q|}\sum_{q \in Q} S_q \Big(\sum_{j=1}^3 \lambda_j Z_j(\omega)\, q^T P_j q\Big)(u_q^T \varepsilon_q(\omega))\, x_q\\
&\quad - A^{-1}\Big[\frac{1}{|Q|}\sum_{q \in Q} S_q^2 \Big(\sum_{j=1}^3 \lambda_j Z_j(\omega)\, q^T P_j q\Big) x_q x_q^T\Big] A^{-1}\, \frac{1}{|Q|}\sum_{q \in Q} S_q (u_q^T \varepsilon_q(\omega))\, x_q, \tag{61}
\end{align*}
with A = |Q|⁻¹ Σ_{q∈Q} S_q² x_q x_q^T, and R̃_Δ(ω) = O(τ³(log(1/σ))^{3/2}) with high probability. We only prove (60), from which Corollary 4.3 follows by calculations similar to those used in proving Corollary 4.2.

To see why (60) holds, first define Δ_0(ω) = D^0(ω) − D^0. Then we can write
\[ \Delta_0(\omega) = \sum_{j=1}^3 \lambda_j\Big(\tau Z_j(\omega) + \frac{\tau^2}{2} Z_j^2(\omega)\Big) P_j + O(\tau^3). \]
Let us denote the first order term in the expansion of the nonlinear regression estimator, when D^0(ω) is the true tensor, by D_1(D^0(ω); ω), and the corresponding term, when D^0 is the true tensor, by D_1(D^0; ω) (= D_1(ω)). Similarly, we define D_2(D^0(ω); ω) and D_2(D^0; ω) (= D_2(ω)). Our strategy is to first find the expansion of the terms D_j(D^0(ω); ω) around D_j(D^0; ω), and then use Proposition 2.2 to deal with the remainder terms.
Define S_q(ω) = S_0 exp(−x_q^T D^0(ω)) (where, by convention, D^0(ω) in the exponent means vec(D^0(ω))). Then, from (43), we have
\[ \mathrm{vec}(D_1(D^0(\omega); \omega)) = -\Big(\sum_{q \in Q} (S_q(\omega))^2\, x_q x_q^T\Big)^{-1} \sum_{q \in Q} S_q(\omega)(u_q^T \varepsilon_q(\omega))\, x_q. \tag{62} \]
We observe that, for each ω ∈ Ω,
\begin{align*}
S_q(\omega) &= S_q - S_q\Big[q^T \Delta_0(\omega)\, q - \frac{1}{2}(q^T \Delta_0(\omega)\, q)^2\Big] + O(\|\Delta_0(\omega)\|^3),\\
(S_q(\omega))^2 &= S_q^2 - 2 S_q^2\Big[q^T \Delta_0(\omega)\, q - (q^T \Delta_0(\omega)\, q)^2\Big] + O(\|\Delta_0(\omega)\|^3),
\end{align*}
so that
\[ \frac{1}{|Q|}\sum_{q \in Q} (S_q(\omega))^2\, x_q x_q^T = A - B(\omega), \]
where A = |Q|⁻¹ Σ_{q∈Q} S_q² x_q x_q^T as above, and
\[ B(\omega) = \frac{2}{|Q|}\sum_{q \in Q} S_q^2 (q^T \Delta_0(\omega)\, q)\, x_q x_q^T - \frac{2}{|Q|}\sum_{q \in Q} S_q^2 (q^T \Delta_0(\omega)\, q)^2\, x_q x_q^T + O(\|\Delta_0(\omega)\|^3); \]
and
\[ \frac{1}{|Q|}\sum_{q \in Q} S_q(\omega)(u_q^T \varepsilon_q(\omega))\, x_q = W(\omega) - C(\omega), \]
where
\[ W(\omega) = \frac{1}{|Q|}\sum_{q \in Q} S_q (u_q^T \varepsilon_q(\omega))\, x_q \]
and
\[ C(\omega) = \frac{1}{|Q|}\sum_{q \in Q} S_q (q^T \Delta_0(\omega)\, q)(u_q^T \varepsilon_q(\omega))\, x_q - \frac{1}{2|Q|}\sum_{q \in Q} S_q (q^T \Delta_0(\omega)\, q)^2 (u_q^T \varepsilon_q(\omega))\, x_q + O(\|\Delta_0(\omega)\|^3). \]
Substituting these in (62) and simplifying,
\begin{align*}
\mathrm{vec}(D_1(D^0(\omega); \omega)) &= -(A - B(\omega))^{-1}(W(\omega) - C(\omega))\\
&= -A^{-1} W(\omega) + A^{-1} C(\omega) - A^{-1} B(\omega) A^{-1} W(\omega) + O(\|A^{-1}\|^2\, \|B(\omega)\|\, \|C(\omega)\|)\\
&\quad + O(\|A^{-1}\|^3\, \|B(\omega)\|^2\, (\|W(\omega)\| + \|C(\omega)\|)).
\end{align*}
Now observe that −A⁻¹W(ω) = vec(D_1(D^0; ω)) and
\[ A^{-1} C(\omega) - A^{-1} B(\omega) A^{-1} W(\omega) = \mathrm{vec}(D_Z(\omega)) + O\Big(\tau^2\Big(|Q|^{-1}\sum_{q \in Q} \|\varepsilon_q(\omega)\|^2\Big)^{1/2}\Big), \]
where vec(D_Z(ω)) is defined in (61). Moreover,
\begin{align*}
\|B(\omega)\| &= O(\|\Delta_0(\omega)\|\, \|A\|),\\
\|C(\omega)\| &= O\Big(\|\Delta_0(\omega)\|\, \|A\|^{1/2}\Big(|Q|^{-1}\sum_{q \in Q} \|\varepsilon_q(\omega)\|^2\Big)^{1/2}\Big),\\
\|W(\omega)\| &= O\Big(\|A\|^{1/2}\Big(|Q|^{-1}\sum_{q \in Q} \|\varepsilon_q(\omega)\|^2\Big)^{1/2}\Big).
\end{align*}
Combining these, together with a first order expansion of D_2(D^0(ω); ω) around D_2(D^0; ω), and recalling the definition of Δ(ω), we have
\begin{align*}
\Delta(\omega) &= \sigma D_1(D^0(\omega); \omega) + \sigma^2 D_2(D^0(\omega); \omega) + \Delta_0(\omega) + R_1(\omega)\\
&= \sigma\big(D_1(D^0; \omega) + \tau D_Z(\omega)\big) + \sigma^2 D_2(D^0; \omega) + R_2(\omega) + \sum_{j=1}^3 \lambda_j\Big(\tau Z_j(\omega) + \frac{\tau^2}{2} Z_j^2(\omega)\Big) P_j + R_3(\omega) + R_1(\omega),
\end{align*}
where R_1(ω) = O(σ³(log(1/σ))^{3/2}) and R_2(ω) = O(στ²(log(1/σ))^{3/2}) with high probability, and R_3(ω) = O(τ³). Collecting terms, we obtain (60).

Supplementary Material

Supplement to "Diffusion tensor smoothing through weighted Karcher means" (doi: 10.1214/00-EJS825SUPP; .pdf).

References

Absil, P.-A., Mahony, R., and Sepulchre, R. (2008). Optimization Algorithms on Matrix Manifolds. Princeton University Press. MR2364186
Arsigny, V., Fillard, P., Pennec, X., and Ayache, N. (2005). Fast and simple computations on tensors with log-Euclidean metrics. Technical report, Institut National de Recherche en Informatique et en Automatique.
Arsigny, V., Fillard, P., Pennec, X., and Ayache, N. (2006). Log-Euclidean metrics for fast and simple calculus on diffusion tensors. Magnetic Resonance in Medicine, 56:411-421.
Bammer, R., Holdsworth, S. J., Veldhuis, W. B., and Skare, S. (2009). New methods in diffusion-weighted and diffusion tensor imaging. Magnetic Resonance Imaging Clinics of North America, 17:175-204.
Basser, P. and Pajevic, S. (2000). Statistical artifacts in diffusion tensor MRI (DT-MRI) caused by background noise. Magnetic Resonance in Medicine, 44:41-50.
Basser, P. J. and Pierpaoli, C. (2005). A simplified method to measure the diffusion tensor from seven MR images. Magnetic Resonance in Medicine, 39(6):928-934.
Beaulieu, C. (2002). The basis of anisotropic water diffusion in the nervous system: a technical review. NMR in Biomedicine, 15:435-455.
Carmichael, O., Chen, J., Paul, D., and Peng, J. Supplement to "Diffusion tensor smoothing through weighted Karcher means". DOI: 10.1214/00-EJS825SUPP.
Castaño Moraga, C. A., Lenglet, C., Deriche, R., and Ruiz-Alzola, J. (2007). A Riemannian approach to anisotropic filtering of tensor fields. Signal Processing, 87:263-276.
Chanraud, S., Zahr, N., Sullivan, E. V., and Pfefferbaum, A. (2010). MR diffusion tensor imaging: a window into white matter integrity of the working brain. Neuropsychology Review, 20:209-225.
Chung, M. K., Lazar, M., Alexander, A. L., Lu, Y., and Davidson, R. (2003). Probabilistic connectivity measure in diffusion tensor imaging via anisotropic kernel smoothing. Technical report, University of Wisconsin, Madison.
Chung, M. K., Lee, J. E., and Alexander, A. L. (2005). Anisotropic kernel smoothing in diffusion tensor imaging: theoretical framework. Technical report, University of Wisconsin, Madison.
De Crespigny, A. and Moseley, M. (1998). Eddy current induced image warping in diffusion weighted EPI. In Proceedings of the 6th Meeting of the International Society for Magnetic Resonance in Medicine. Berkeley, Calif., ISMRM.
Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman & Hall/CRC. MR1383587
Farrell, J. A. D., Landman, B. A., Jones, C. K., Smith, S. A., Prince, J. L., van Zijl, P. C. M., and Mori, S. (2007). Effects of signal-to-noise ratio on the accuracy and reproducibility of diffusion tensor imaging-derived fractional anisotropy, mean diffusivity, and principal eigenvector measurements at 1.5T. Journal of Magnetic Resonance Imaging, 256:756-767.
Ferreira, R., Xavier, J., Costeira, J. P., and Barroso, V. (2006). Newton method for Riemannian centroid computation in naturally reductive homogeneous spaces. In Proceedings of ICASSP 2006, IEEE International Conference on Acoustics, Speech and Signal Processing, volume 3, pages 77-85.
Fletcher, P. T. and Joshi, S. (2004). Principal geodesic analysis on symmetric spaces: statistics of diffusion tensors. In Computer Vision and Mathematical Methods in Medical and Biomedical Image Analysis, pages 87-98. Springer.
Fletcher, P. T. and Joshi, S. (2007). Riemannian geometry for the statistical analysis of diffusion tensor data. Signal Processing, 87:250-262.
Förstner, W. and Moonen, B. (1999). Qua vadis geodesia...? In Krumm, F. and Schwarze, V. S., editors, Festschrift for Erik W. Grafarend, pages 113-128.
Gudbjartsson, H. and Patz, S. (1995). The Rician distribution of noisy MRI data. Magnetic Resonance in Medicine, 34:910-914.
Hahn, K. R., Prigarin, S., Heim, S., and Hasan, K. (2006). Random noise in diffusion tensor imaging, its destructive impact and some corrections. In Weickert, J. and Hagen, H., editors, Visualization and Processing of Tensor Fields, pages 107-117. Springer. MR2210513
Hahn, K. R., Prigarin, S., Rodenacker, K., and Hasan, K. (2009). Denoising for diffusion tensor imaging with low signal to noise ratios: method and Monte Carlo validation. International Journal for Biomathematics and Biostatistics, 1:83-81.
Karcher, H. (1977). Riemannian center of mass and mollifier smoothing. Communications in Pure and Applied Mathematics, 30:509-541. MR0442975
Kato, T. (1980). Perturbation Theory of Linear Operators. Springer-Verlag.
Kochunov, P., Lancaster, J. L., Thompson, P., Woods, R., Mazziotta, J., Hardies, J., and Fox, P. (2001). Regional spatial normalization: toward an optimal target. Journal of Computer Assisted Tomography, 25:805-816.
Mori, S. (2007). Introduction to Diffusion Tensor Imaging. Elsevier.
Mukherjee, P., Berman, J. I., Chung, S. W., Hess, C., and Henry, R. (2008a). Diffusion tensor MR imaging and fiber tractography: theoretic underpinnings. American Journal of Neuroradiology, 29:632-641.
Mukherjee, P., Chung, S. W., Berman, J. I., Hess, C. P., and Henry, R. G. (2008b). Diffusion tensor MR imaging and fiber tractography: technical considerations. American Journal of Neuroradiology, 29:843-852.
Parker, G. J. M., Schnabel, J. A., Symms, M. R., Werring, D. J., and Barker, G. J. (2000). Nonlinear smoothing for reduction of systematic and random errors in diffusion tensor imaging. Journal of Magnetic Resonance Imaging, 11:702-710.
Pennec, X., Fillard, P., and Ayache, N. (2006). A Riemannian framework for tensor computing. International Journal of Computer Vision, 66:41-66.
Polzehl, J. and Tabelow, K. (2008). Structural adaptive smoothing in diffusion tensor imaging: the R package dti. Technical report, Weierstrass Institute, Berlin.
Rajapakse, J. C., Giedd, J. N., DeCarli, C., Snell, J. W., McLaughlin, A., Vauss, Y. C., Krain, A. L., Hamburger, S., and Rapoport, J. L. (1996). A technique for single-channel MR brain tissue segmentation: application to a pediatric sample. Magnetic Resonance in Medicine, 14:1053-1065.
Rueckert, D., Sonoda, L. I., Hayes, C., Hill, D. L., Leach, M. O., and Hawkes, D. J. (1999). Nonrigid registration using free-form deformations: application to breast MR images. IEEE Transactions on Medical Imaging, 18:712-721.
Schwartzman, A. (2006). Random Ellipsoids and False Discovery Rates: Statistics for Diffusion Tensor Imaging Data. PhD thesis, Stanford University. MR2708811
Tabelow, K., Polzehl, J., Spokoiny, V., and Voss, H. U. (2008). Diffusion tensor imaging: structural adaptive smoothing. NeuroImage, 39:1763-1773.
Toussaint, N., Souplet, J. C., and Fillard, P. (2007). MedINRIA: medical image navigation and research tool by INRIA. In Proceedings of the MICCAI'07 Workshop on Interaction in Medical Image Analysis and Visualization, Brisbane, Australia.
Tuch, D. S., Reese, T. G., Wiegell, M. R., Makris, N., Belliveau, J. W., and Wedeen, V. J. (2002). High angular resolution diffusion imaging reveals intravoxel white matter fiber heterogeneity. Magnetic Resonance in Medicine, 48:577-582.
Varadarajan, V. S. (1984). Lie Groups, Lie Algebras, and Their Representations. Springer. MR0746308
Viswanath, V., Fletcher, E., Singh, B., Smith, N., Paul, D., Peng, J., Chen, J., and Carmichael, O. T. (2012). Impact of DTI smoothing on the study of brain aging. In Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2012).
Weinstein, D. M., Kindlmann, G. L., and Lundberg, E. C. (1999). Tensorlines: advection-diffusion based propagation through diffusion tensor fields. In Proceedings of IEEE Visualization, pages 249-253.
Yuan, Y., Zhu, H., Lin, W., and Marron, J. (2012). Local polynomial regression for symmetric positive definite matrices. Journal of the Royal Statistical Society: Series B (Statistical Methodology). MR2965956
Zhu, H. T., Li, Y., Ibrahim, J. G., Shi, X., An, H., Chen, Y., Gao, W., Lin, W., Rowe, D. B., and Peterson, B. S. (2009). Regression models for identifying noise sources in magnetic resonance images. Journal of the American Statistical Association, 104:623-637. MR2751443
Zhu, H. T., Zhang, H. P., Ibrahim, J. G., and Peterson, B. (2007). Statistical analysis of diffusion tensors in diffusion-weighted magnetic resonance image data. Journal of the American Statistical Association, 102:1081-1110. MR2412533