Riemannian Bayesian Estimation of Diffusion Tensor Images
Kai Krajsek, Marion I. Menzel, Hanno Scharr
ICG-3, Forschungszentrum Jülich, Germany
{k.krajsek,m.i.menzel,h.scharr}@fz-juelich.de
Abstract

Diffusion tensor magnetic resonance imaging (DT-MRI) is a non-invasive imaging technique that allows estimating the molecular self-diffusion tensors of water within surrounding tissue. Due to the low signal-to-noise ratio of magnetic resonance images, reconstructed tensor images usually require some sort of regularization in a postprocessing step. Previous approaches are suboptimal with respect to either the reconstruction or the regularization step. This paper presents a Bayesian approach for simultaneous reconstruction and regularization of DT-MR images that resolves the disadvantages of previous approaches. To this end, estimation theoretical concepts are generalized to tensor valued images that are considered as Riemannian manifolds. Doing so allows us to derive a maximum a posteriori estimator of the tensor image that accounts for both the statistical characteristics of the Rician noise occurring in MR images and the nonlinear structure of tensor valued images. Experiments on synthetic data as well as real DT-MRI data validate the advantage of considering both the statistical and the geometrical characteristics of DT-MRI.
1. Introduction

Diffusion tensor magnetic resonance imaging (DT-MRI) delivers a tensor image, i.e. each pixel/voxel contains a second order tensor represented by a 3 × 3 symmetric positive definite matrix. A physical model, the Stejskal-Tanner (ST) equation, allows reconstructing the diffusion tensor from the observed nuclear magnetic resonance (NMR) data, the so called diffusion weighted (DW) images. Current methods for estimating diffusion tensor images focus either on the reconstruction step or on the regularization step, and they either do not consider the actual statistical properties of the DW data or do not consider the nonlinear structure of DTI data. In this paper, Bayesian estimation theoretical concepts are adapted in order to estimate diffusion tensors from MRI measurements in a way that respects the Riemannian geometry of the diffusion tensors as well as the statistical properties of the measurements. To this end, we generalize estimation theoretical concepts, originally developed for linear spaces, to Riemannian manifolds, accounting for the nonlinear structure of diffusion tensor images.

In recent years, several image analysis concepts originally derived for linear spaces, like regularization, nonlinear diffusion or local M-smoothing, have been generalized to 𝒫(𝑛), the space of positive definite tensors. In particular, classical statistical concepts have been generalized within the Riemannian framework of 𝒫(𝑛). However, to the best of our knowledge, Bayesian estimation on manifolds has only been treated on a rather general and abstract level [12]. In particular, no Bayesian estimation concepts have so far been generalized and applied to the Riemannian space of positive definite tensors. We develop these concepts and use them to derive a point estimator (maximum a posteriori (MAP)) as well as a covariance estimator (as a reliability measure) which simultaneously reconstruct and regularize the diffusion tensor image. We validate our approach on both synthetic and real data.
1.1. Related work

As our approach combines reconstruction with regularization, it is related to every method that has been proposed in this field since the seminal work of Bihan et al. [2]. Perhaps the most common way to reconstruct diffusion tensors is based on a linearized version of the Stejskal-Tanner equation combined with a least squares estimator (see e.g. [3]). This approach assumes independent Gaussian distributed noise in the log-ratios of the DW images and the reference image. For high signal-to-noise ratios the Gaussian assumption has been shown to be satisfied. However, the independence assumption does not hold, as the reference image is the same for several DW images. This inevitably introduces correlations between the log-ratios of different measurements, resulting in a bias. Moreover, the linear estimator does not account for the nonlinear structure of the space of diffusion tensors. Therefore it may lead to a non-positive definite diffusion tensor estimate. A simultaneous reconstruction and regularization approach assuring positive definiteness has been proposed in [23], but neither a Rician noise model nor the Riemannian geometry of MRI has been considered. There are reconstruction
methods considering Rician noise and/or positive definiteness. Landman et al. [13] proposed a maximum likelihood estimator considering the Rician noise occurring in DW images. Parameters of a diffusion tensor model are estimated to assure positive definiteness. From a differential geometric point of view such a treatment is extrinsic, as it embeds the space of positive definite tensors 𝒫(𝑛) in the space of symmetric tensors. A more suitable intrinsic approach has been proposed in [14]. Intrinsic means that it considers 𝒫(𝑛) as a Riemannian manifold in order to assure positive definiteness. Unfortunately, no Rician noise model has been assumed, leading to a biased estimate, as shown in this paper. Both approaches, [13] and [14], do not incorporate any denoising or regularization strategy for either the reconstructed diffusion tensor images or the measured DW images. A combination of reconstruction and regularization has been proposed within the computationally efficient log-Euclidean metric [1]. However, tensor reconstruction within this metric still cannot exploit its full power: computationally expensive matrix operations need to be applied, as the Stejskal-Tanner equation relates the DW images to the diffusion tensors and not to their logarithms. In [10] local coordinates within the affine invariant metric have been exploited for simultaneous reconstruction and regularization of diffusion tensor images. Unfortunately, the Riemannian geometry can be assigned only to the regularization term but not to the data term, such that positive definiteness is not guaranteed. Both approaches, [10] and [1], do not consider the Rician noise model.

Several approaches have been proposed for denoising DTI in a post-processing step. Regularization techniques for DTI can be classified into two groups: those using an extrinsic view [22, 24, 4] and those using an intrinsic view [16, 8, 15, 7, 14, 18]. All these approaches either make no assumption about the noise in the DTI or assume Gaussian distributed noise. However, the assumption of Gaussian noise in DTI is satisfied for linear reconstruction methods only, due to the central limit theorem. It does not hold for nonlinear methods (e.g. [13, 14]). Extrinsic approaches need to ensure positive definiteness of the regularized diffusion tensors. Although the tensors then stay positive definite, the use of a flat metric is not appropriate. For instance, applying regularization using the flat Euclidean metric, the processed tensors become deformed [18]. This is known as the eigenvalue swelling effect [22, 6, 18, 5]. Intrinsic approaches consider DTIs as Riemannian symmetric spaces (see [11]) equipped with an invariant metric on the tangent bundle. Using this metric avoids the eigenvalue swelling effect. For instance, in [7, 18] a 'Riemannian framework for tensor computing' has been proposed. It generalizes several well established image processing approaches to tensor valued images in an intrinsic way, including interpolation, restoration and nonlinear diffusion filtering. In principle, extrinsic denoising and regularization
approaches are compatible with the extrinsic reconstruction approach of Landman et al., i.e. they can be combined into a simultaneous reconstruction and regularization method. However, extrinsic regularization methods suffer from the eigenvalue swelling effect, as they do not consider the Riemannian geometry of MRI. On the other hand, intrinsic denoising and regularization approaches are compatible with the intrinsic reconstruction approach proposed in [14]. However, this reconstruction scheme leads to a bias, as it does not consider the Rician distributed noise in the DW images. Therefore, to the best of the authors' knowledge, an intrinsic denoising and regularization scheme for the simultaneous reconstruction of DTI from DW images considering both the statistical and the geometric properties of DW images and DTIs has not been presented so far.
2. Diffusion Tensor Imaging

2.1. Physical Background

The most commonly used model for estimating the diffusion tensor is the Stejskal-Tanner (ST) equation [20]. It relates the diffusion tensor Σ to the measured signal amplitudes S_j, the reference signal S_0 measured without any diffusion sensitizing gradient, and the so called 'b-value' b_j, which contains a few material constants and experimental parameters, such as the diffusion encoding duration and diffusion encoding gradient strength, as well as the unit vector g_j indicating the direction of the diffusion encoding and thus which tensor elements are measured:

S_j = S_0 exp(−b_j g_j^T Σ g_j) .   (1)
The 'b-values' of a set of diffusion experiments, including the diffusion encoding directions, are usually determined by the experimental design and are known (including their uncertainties) before any measurement is made. Six or more non-collinear measured signals per image pixel can thus be used to estimate the diffusion tensor by minimizing a cost functional of the residua of the corresponding ST equations. For simplicity, in order to turn the nonlinear estimation problem into a linear one, usually the logarithm of the ST equation is considered, combined with a quadratic cost functional. Nonetheless, all these methods implicitly or explicitly assume an additive noise model on the original or transformed signal, which does not reflect the usual statistical characteristics of real MRI data.
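To make the linearized reconstruction concrete, the following minimal numpy sketch (ours, not part of the original method; all function and variable names are our own) fits a single tensor per voxel by ordinary least squares on the log-ratios. As discussed above, the result is symmetric but not guaranteed to be positive definite.

```python
import numpy as np

def fit_tensor_lls(S, S0, bvals, gdirs):
    """Linearized least-squares fit of one diffusion tensor per voxel:
    solves -log(S_j/S0)/b_j = g_j^T Sigma g_j (Eq. (1), log-linearized)
    for the six unique elements of the symmetric 3x3 tensor."""
    # Each row maps the unknowns (Dxx, Dyy, Dzz, Dxy, Dxz, Dyz)
    # to the quadratic form g^T Sigma g.
    G = np.array([[gx*gx, gy*gy, gz*gz, 2*gx*gy, 2*gx*gz, 2*gy*gz]
                  for gx, gy, gz in gdirs])
    y = -np.log(np.asarray(S) / S0) / np.asarray(bvals)
    d, *_ = np.linalg.lstsq(G, y, rcond=None)
    return np.array([[d[0], d[3], d[4]],
                     [d[3], d[1], d[5]],
                     [d[4], d[5], d[2]]])  # symmetric, not necessarily PD
```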
2.2. Noise in NMR images

NMR images are measured in the Fourier domain. The observed complex valued NMR images are well described by Gaussian distributions with the same standard deviation in the real as well as the imaginary image amplitude. The Fourier coefficient images are combined in a nonlinear way such that the resulting DW images S_i follow a Rician distribution

p(S_i | μ, σ) = (S_i / σ²) exp(−(S_i² + μ²)/(2σ²)) I_0(S_i μ / σ²) ,   (2)

where μ denotes the noise free image amplitude and I_0 the zeroth-order modified Bessel function of the first kind. σ denotes the standard deviation in the real as well as the imaginary part of the measured Fourier coefficient images. For high signal-to-noise ratio (SNR) the Rician distribution is quite well approximated by a Gaussian distribution. For low SNR the distribution is highly skewed and clearly distinct from a Gaussian distribution. In particular, in both cases the observed DW image is not related to the true underlying noise free signal μ by an additive noise term.
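The non-additive nature of Rician noise is easy to verify numerically. The following small simulation (our illustration, with made-up parameter values) builds Rician samples as magnitudes of a complex Gaussian measurement and shows the upward bias of the sample mean relative to the noise-free amplitude μ:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.5                      # noise-free amplitude, noise level
noise = rng.normal(0.0, sigma, size=(100_000, 2))
# Magnitude of a complex Gaussian measurement around mu -> Rician sample
S = np.hypot(mu + noise[:, 0], noise[:, 1])
print(S.mean() - mu)   # clearly > 0: the magnitude operation biases upward
```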
3. Mathematical Structure of DTI

3.1. Riemannian Manifolds

A manifold ℳ is an abstract mathematical space that locally looks like Euclidean space. A local chart (ϑ, U) is a one to one map ϑ : U → ℝ^n of an open set U ⊂ ℳ to an open set of Euclidean space. The piecewise one to one mapping to Euclidean space allows generalizing concepts developed for the Euclidean space to manifolds. At each point p ∈ ℳ a vector space, the tangent space T_p ℳ, can be attached that contains all directions in which one can pass through p. More precisely, let γ(t) : ℝ → ℳ denote a curve on the manifold going through γ(0) = p ∈ ℳ and γ_ϑ(t) = ϑ ∘ γ(t) its representation in a local chart (ϑ, U). A representation of a tangent vector \vec{pv}_ϑ is then given by the instantaneous speed \vec{pv}_ϑ := ∂_t γ_ϑ(t)|_{t=0} of the curve, and the speed vectors of all possible curves constitute the tangent space at p. A Riemannian manifold possesses additional structure that allows defining distances between different points on the manifold. Each tangent space is equipped with a scalar product

⟨\vec{px} | \vec{py}⟩_p = \vec{px}_ϑ^T G_p^ϑ \vec{py}_ϑ ,   (3)

defined by the Riemannian metric G_p : T_p ℳ × T_p ℳ → ℝ (with its matrix representation G_p^ϑ in the local chart) that smoothly varies from point to point on the manifold. The gradient of a function in a local coordinate system is expressed by the inverse metric tensor

∇f = (G_p^ϑ)^{−1} ∇_⊥ f ,   (4)

with the vector ∇_⊥ f = (∂_1 f, ..., ∂_n f)^T built from the partial derivatives ∂_i f. The scalar product induces the norm ||\vec{px}||_p = sqrt(⟨\vec{px}, \vec{px}⟩_p) on the manifold. The shortest path between two points is denoted as a geodesic. An important tool for working on manifolds are the Riemannian logarithmic and exponential maps. The exponential map exp_p : T_p ℳ → ℳ is a mapping between the tangent space T_p ℳ and the corresponding manifold ℳ. It maps the tangent vector \vec{px} to the element exp_p(\vec{px}) = x of the manifold that is reached by the geodesic at time step one, x = γ(1). The space of positive definite tensors is equipped with additional structure, namely, it is a so called homogeneous space with non-positive curvature, with the consequence that the exponential map is one to one. In particular there exists an inverse, the logarithmic map log_p(x) = \vec{px}. The logarithmic map preserves distances, i.e. the distance between the point x mapped onto the tangent space and the point p to which the tangent space is attached is equal to the length of the tangent vector \vec{px}.

3.2. DTI as Riemannian manifolds

An intrinsic treatment requires the formulation of 𝒫(n) as a Riemannian manifold, i.e. at each position Σ ∈ 𝒫(n) a tangent space T_Σ ℳ is attached that is equipped with an inner product ⟨Λ_1, Λ_2⟩_Σ, Λ_1, Λ_2 ∈ T_Σ ℳ, that smoothly varies from point to point¹. Assuming the inner product to be invariant under any affine transformation W ∗ Σ := W^T Σ W, the scalar product at any point can be expressed by the standard matrix scalar product at the identity:

⟨Λ_1, Λ_2⟩_Σ = ⟨Σ^{−1/2} ∗ Λ_1, Σ^{−1/2} ∗ Λ_2⟩_I   (5)
            = trace(Σ^{−1/2} Λ_1^T Σ^{−1} Λ_2 Σ^{−1/2}) .   (6)

The tangent vectors in 𝒫(n) are symmetric n × n matrices, but not necessarily positive definite. They transform in the same way as the elements of the manifold. The affine invariance assumption allows deriving analytical expressions of the exponential and logarithmic map by relating any geodesic to the corresponding tangent spaces along the path. A geodesic Γ_Λ(t), parameterized by 'time' t and passing through the tensor Γ_Λ(0) = Σ at time t = 0, is uniquely defined by its tangent vector Λ at Σ:

Γ_Λ(t) = Σ^{1/2} exp(t Σ^{−1/2} Λ Σ^{−1/2}) Σ^{1/2} .   (7)

For t = 1 this map is denoted as the exponential map, which is one to one in the case of the space of positive definite tensors. Its inverse, denoted as the logarithmic map, reads

Λ = Σ^{1/2} log(Σ^{−1/2} Γ_Λ(1) Σ^{−1/2}) Σ^{1/2} .   (8)

¹ If not otherwise stated we use the standard matrix elements as global coordinates such that the corresponding chart becomes the identity. In favor of uncluttered notation we make no difference between manifold elements and their matrix representation.
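For concreteness, a minimal numpy transcription of Eqs. (5)-(8) under the affine-invariant metric follows; the helper names (spd_exp, spd_log, spd_dist) are ours, and fractional matrix powers are computed by eigendecomposition, which is valid for symmetric positive definite matrices:

```python
import numpy as np

def _spd_pow(A, p):
    """Fractional power of a symmetric positive definite matrix
    via eigendecomposition (used for A^{1/2} and A^{-1/2})."""
    w, V = np.linalg.eigh(A)
    return (V * w**p) @ V.T

def spd_exp(Sigma, Lam):
    """Exponential map at Sigma (Eq. (7) with t = 1): tangent vector
    (symmetric matrix) Lam -> point on the manifold P(n)."""
    s, si = _spd_pow(Sigma, 0.5), _spd_pow(Sigma, -0.5)
    w, V = np.linalg.eigh(si @ Lam @ si)
    return s @ (V * np.exp(w)) @ V.T @ s

def spd_log(Sigma, Gamma):
    """Logarithmic map at Sigma (Eq. (8)): point Gamma -> tangent vector."""
    s, si = _spd_pow(Sigma, 0.5), _spd_pow(Sigma, -0.5)
    w, V = np.linalg.eigh(si @ Gamma @ si)
    return s @ (V * np.log(w)) @ V.T @ s

def spd_dist(A, B):
    """Affine-invariant geodesic distance: ||log(A^{-1/2} B A^{-1/2})||_F."""
    w = np.linalg.eigvalsh(_spd_pow(A, -0.5) @ B @ _spd_pow(A, -0.5))
    return np.sqrt(np.sum(np.log(w)**2))
```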
Based on the geodesic equation and the logarithmic map, a gradient descent algorithm for functions defined on the manifold, the so called geodesic marching scheme, can be designed. The gradient ∇f is the product of the inverse metric tensor and the partial derivatives (see Equations (3) and (4)). Reformulating the affine invariant scalar product as

⟨Λ_1, Λ_2⟩_Σ = trace(Λ_1^T Σ^{−1} Λ_2 Σ^{−1})   (9)
            = ⟨Λ_1, Σ^{−1} Λ_2 Σ^{−1}⟩_I ,   (10)

reveals that, in matrix notation, the product of the metric tensor and a tangent vector Λ reads Σ^{−1} Λ^T Σ^{−1}. The gradient ∇f thus reads ∇f = Σ (∇_⊥ f) Σ, where (∇_⊥ f)_{ij} = ∂_{Σ_{ij}} f denotes the partial derivative with respect to the ij-th matrix element. As the gradient is an element of the tangent space and indicates the direction of steepest ascent, we can find an argument leading to a lower value of f by going a sufficiently small step δt in the negative direction of the gradient, −δt ∇f, and mapping this point back onto the manifold using the geodesic equation (7). The next gradient is then computed with respect to the new point, which in turn can be used for finding the next tensor, and so on.
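As a sketch, the geodesic marching scheme described above can be written in a few lines, assuming the spd_exp helper from the previous snippet and a user-supplied function returning the matrix of partial derivatives; step size and iteration count are arbitrary choices for illustration:

```python
def geodesic_marching(Sigma0, euclidean_grad, step=0.1, iters=100):
    """Gradient descent on P(n): euclidean_grad(Sigma) returns the matrix
    of partial derivatives (grad_perp f)_ij = d f / d Sigma_ij; the
    Riemannian gradient is then Sigma (grad_perp f) Sigma."""
    Sigma = Sigma0
    for _ in range(iters):
        riem_grad = Sigma @ euclidean_grad(Sigma) @ Sigma
        # walk a small step along the geodesic in the descent direction
        Sigma = spd_exp(Sigma, -step * riem_grad)
    return Sigma
```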
3.3. Probabilities on Riemannian manifolds

A random vector describes a map from the probability space Ω to the state space, i.e. the Euclidean space or a subset of it. The corresponding probabilistic entities on manifolds are denoted as random primitives [17], describing a map from the probability space onto a state space which is a subset of a Riemannian manifold. To each subset 𝒜 ⊂ ℳ a probability P(x ∈ 𝒜) can then be assigned, describing the chance of finding a realization of the random primitive x within this area, with the normalization constraint P(x ∈ ℳ) = 1. Generalizing the concept of probability density functions (pdfs) requires a measure dℳ on the manifold which, in the case of Riemannian manifolds, is induced in a natural way by the metric G(x) for a given local coordinate system x = (x_1, x_2, ..., x_n):

dℳ = sqrt(|G(x)|) dx .   (11)

A function² p : ℳ → ℝ⁺₀ is then denoted as a probability density function (pdf) with respect to the volume form dℳ if for all 𝒜 ⊆ ℳ

P(x ∈ 𝒜) = ∫_𝒜 p dℳ .   (12)

The definition of the pdf above implies invariance under coordinate transformations, i.e. it transforms as a scalar and not as a density, for which reason we denote it as a volumetric probability function (vpf) in accordance with [21]. This invariance property of the vpf is important when generalizing estimation theoretical concepts such as Bayes' law and the MAP estimation. For nonlinear manifolds, the vpf does not allow defining the mean or other moments by means of an integral over random primitives, e.g. because addition is not defined on nonlinear spaces. Therefore the mean is defined by the minimum (if it exists) of the variance, denoted as the Fréchet mean, or as Karcher mean(s) if local minima are considered. In the case of Riemannian manifolds, one can map (via the exponential map) vpfs onto corresponding tangent spaces, i.e. onto a linear space, which allows the definition of statistical moments like the covariance by an integral over the probability measure

C = ∫ \vec{xz} \vec{xz}^T p_x(z) sqrt(|G_x(z)|) dz ,   (13)

where we have defined p_x(z) = p(exp_x(z)) and G_x(z) = G(exp_x(z)), and x denotes the Fréchet mean of the vpf. Thus we can work on the tangent space as if we were in a linear space, keeping in mind that the tangent space depends on the point where it is attached to the manifold and that only radial distances are preserved by the exponential map.

² More formally: p is the Radon-Nikodym derivative of P with respect to dℳ.

3.4. Estimation on Riemannian manifolds

In estimation theory on Euclidean spaces, a Bayesian estimator of a random vector f given some observation g is commonly defined by the minimum of the Bayesian risk 𝔼_{f|g}[L(ε)], i.e. the expectation of a loss function L(ε) which is a symmetric function of the difference ε = f − f̂ between the estimate f̂ and the current realization f of the random vector to be estimated. Commonly used are the quadratic loss function, leading to the minimum mean squared error estimator f̂ = 𝔼_{f|g}[f], and the hit and miss loss function, leading to the maximum a posteriori (MAP) estimator f̂ = arg max_f {p(f|g)}. If we generalize Bayesian estimators to nonlinear manifolds, we have to assure that they are invariant with respect to the chosen chart. The minimum mean squared error estimator is an expectation value and thus is invariant with respect to any chosen coordinate system. It can directly be generalized to 𝒫(n). On the other hand, the MAP estimator is not invariant, as the probability distribution transforms as a density, i.e. it has to be multiplied by the corresponding Jacobian factor. As an appropriate generalization of the hit and miss loss function that penalizes all non-zero errors, we propose the negative delta distribution L(ε) = −δ(ε). Minimizing the corresponding Bayesian risk leads to maximizing the corresponding vpf. This is a coordinate invariant generalization of the MAP estimator usually defined by the hit and miss loss function in Euclidean statistics. As shown in [21], Bayes' rule holds for the vpf, such that we can derive the posterior vpf by means of a corresponding likelihood function and prior distribution as in the case of the Euclidean space.

It is good scientific practice to provide some measure of reliability with each parameter estimate. From a Bayesian perspective, the vpf p(f|g) contains all information about the desired parameter vector f. However, full probability distributions are difficult to interpret as well as cumbersome to propagate from measurement space to parameter space (cmp. [19]). Consequently, one commonly aims at compressing the information contained in the posterior distribution into two entities. The first entity is a point estimate of the parameter vector, e.g. the most probable value in the case of the MAP estimate. The second entity, containing information about the reliability of the point estimate, is the covariance of the posterior vpf. As the covariance is difficult to obtain for non-Gaussian distributions, it is often approximated by the inverse Hessian of the negative log posterior evaluated at the maximum mode (Laplace approximation). Consequently, the inverse of the Hessian of the posterior around its maximum serves as a useful reliability quantity for the parameter estimate f̂ also for non-Gaussian distributions.
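As a small illustration of the Fréchet/Karcher mean from Section 3.3, the usual fixed-point iteration alternates between averaging in the tangent space at the current estimate and mapping back to the manifold; this sketch assumes the spd_exp/spd_log helpers defined earlier:

```python
def karcher_mean(tensors, iters=20, tol=1e-10):
    """Fréchet/Karcher mean of a list of SPD tensors: average the
    log-mapped tensors in the tangent space at the current estimate,
    then map the average back via the exponential map."""
    mean = tensors[0]
    for _ in range(iters):
        tangent_avg = sum(spd_log(mean, T) for T in tensors) / len(tensors)
        mean = spd_exp(mean, tangent_avg)
        if np.linalg.norm(tangent_avg) < tol:
            break
    return mean
```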
4. Riemannian Bayesian Estimation of DTI

4.1. Likelihood function

As in the Euclidean case, the Riemannian sampling distribution of the DW image values at one spatial position³ is obtained by inserting the ST equation of the j-th measurement into the Rician noise model (2):

p(S_j | S_0, Σ, σ) = p(S_j | S_0 exp(−b_j g_j^T Σ g_j), σ) ,   (14)

where S_0 denotes the noise free reference signal. The noise contributions of the DW signals are mutually independent, as they are acquired in independent measurement steps, and are assumed to have the same standard deviation for all measurements, independent of the spatial position. The overall sampling distribution of L DW images is consequently given by the product p(S | S_0, Σ, σ) = ∏_j p(S_j | S_0, Σ, σ) with S = (S_1, ..., S_L). Assuming statistical independence of the signal magnitudes at different spatial positions allows expressing the overall likelihood function (the sampling distribution considered as a function of its parameters) as a product of likelihood functions at different spatial positions.

³ We omit the index for the spatial position in favor of an uncluttered notation.

4.2. Prior distribution

Deriving the posterior vpf requires a prior distribution over all unknown parameters: the diffusion tensor image Σ(x_i), the reference image S_0(x_i), as well as the noise parameter σ. In a full Bayesian approach one aims at deriving the posterior vpf of all unknown parameters and subsequently marginalizing all nuisance parameters, i.e. S_0(x_i) and σ. On nonlinear manifolds marginalization is not analytically tractable, not even in the case of Gaussian vpfs, so we decided to estimate the nuisance parameters in a separate estimation step (cmp. Section 5). Consequently, we model the prior distribution of the tensor valued image, p(Σ), where Σ = {Σ(x_i)} denotes the set of all tensors in the image. A desired property of prior or regularization terms in the restoration of scalar valued images is the preservation of characteristic features like image edges⁴. This property is fulfilled by the nonlinear isotropic regularizer, first proposed in [7] in conjunction with the affine invariant metric. We choose the statistical counterpart of the nonlinear isotropic regularizer as a model for our prior, p(Σ) = exp{−J_p(Σ)}, with the energy

J_p = (1/4) ∑_i ∑_{k ∈ 𝒩_i} ψ(dist²(Σ_i, Σ_k))   (15)

and potential function ψ. We would like to stress that other priors also work within our framework. However, in order to keep the focus on the overall concept of a Bayesian estimator, we decided to choose a well known prior term that has already been applied in deterministic regularization approaches [7, 18].

⁴ In analogy to gray value images, we define edges by the gradient magnitude ||∇Σ||_Σ, i.e. 'large' gradient magnitudes indicate image boundaries.
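A direct transcription of Eq. (15) might look as follows; the data layout (a dict of tensors and a neighborhood map) and the use of the ψ from Section 5 are our own choices for this sketch, and spd_dist is the affine-invariant distance from the Section 3.2 snippet:

```python
import numpy as np

def psi(x, beta):
    """Edge-preserving potential used in the experiments (Sec. 5)."""
    return beta * np.log(1.0 + x**2 / beta)

def prior_energy(tensors, neighbors, beta):
    """Prior energy J_p of Eq. (15) for a tensor image given as a dict
    {site: Sigma} and a neighborhood map {site: [sites]}."""
    J = 0.0
    for i, Sig_i in tensors.items():
        for k in neighbors[i]:
            J += psi(spd_dist(Sig_i, tensors[k])**2, beta)
    return 0.25 * J
```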
4.3. Riemannian MAP estimation

Applying the geodesic marching scheme to maximize our posterior vpf requires the gradient of its negative logarithm (which is equal to the gradient of the energy J = J_L + J_p) with respect to each tensor in the image. In our matrix notation, this reads ∇_Σ J_L = Σ (∇_⊥ J_L) Σ. Applying this scheme to the energy of the likelihood yields

∇_Σ J_L = ∑_j (I_1/I_0) (b_j Ŝ_0 S_j e^{−b_j g_j^T Σ g_j} / σ̂²) Σ g_j g_j^T Σ
        − ∑_j (b_j Ŝ_0² e^{−2 b_j g_j^T Σ g_j} / σ̂²) Σ g_j g_j^T Σ ,   (16)

where we omit the index for the spatial position in favor of an uncluttered notation, i.e. Σ := Σ_m. Let us compare this gradient with the gradient of the log likelihood model we would obtain for additive Gaussian noise in the DW signals S_i. Applying the same steps as above to the additive Gaussian noise model would lead to the same expression for the gradient, except that the correction term I_1/I_0 would be set to one. The correction term becomes nearly one for large arguments, which correspond to high signal-to-noise ratios, validating the Gaussian noise approximation for high signal-to-noise ratios. However, one should keep in mind that the signal strength depends on the diffusion process, i.e. even a high SNR for S_0 does not guarantee high SNRs for the DW images (cmp. Equation (1)).
For low SNR the Rician noise model introduces a correction term reducing the weight of the measured DW signal, S_i → S_i I_1/I_0, to account for the skewed probability density function with its heavy right tail. Thus, modeling data contaminated by Rician noise as additive Gaussian noise leads to an overestimation of the true underlying signal and consequently to an underestimation of the diffusion process (cmp. Equation (16)). Computing the gradient of the negative log prior distribution (cmp. (15)) requires the gradients of squared distances of the form dist²(Σ_i, Σ_j), which have been derived in [16], yielding ∇_{Σ_j} dist²(Σ_i, Σ_j) = −2 \vec{Σ_j Σ_i}, such that the gradient of the prior energy reads

∇_{Σ_m} J_p = − ∑_{j ∈ 𝒩_m} ((ψ'_m + ψ'_j)/2) \vec{Σ_m Σ_j} .   (17)
Having the gradients of the objective function allows us to compute a (local) minimum using the geodesic marching scheme by cycling through all tensors in the image.
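Putting Eqs. (16) and (17) together, one sweep of the geodesic marching MAP iteration could be sketched as follows (our own variable names and data layout; for ψ'_m and ψ'_j we simply use ψ' evaluated at the pairwise squared distance, a simplification for illustration). The Bessel ratio I_1/I_0 is computed with scipy's exponentially scaled functions to avoid overflow:

```python
import numpy as np
from scipy.special import i0e, i1e

def likelihood_grad_perp(Sigma, S, S0_hat, sigma_hat, bvals, gdirs):
    """Euclidean part grad_perp J_L of Eq. (16); the Riemannian gradient
    is Sigma @ grad @ Sigma."""
    grad = np.zeros((3, 3))
    for S_j, b_j, g in zip(S, bvals, gdirs):
        mu_j = S0_hat * np.exp(-b_j * g @ Sigma @ g)   # ST model, Eq. (1)
        z = S_j * mu_j / sigma_hat**2
        ratio = i1e(z) / i0e(z)                        # I1/I0, overflow-safe
        w = b_j * mu_j * (ratio * S_j - mu_j) / sigma_hat**2
        grad += w * np.outer(g, g)
    return grad

def map_sweep(tensors, neighbors, data, beta, step,
              S0_hat, sigma_hat, bvals, gdirs):
    """One pass of geodesic marching over all sites, combining the
    likelihood gradient (Eq. (16)) and the prior gradient (Eq. (17)).
    Assumes spd_exp, spd_log, spd_dist from the earlier sketches."""
    def dpsi(x):               # psi'(x) for psi(x) = beta*log(1 + x^2/beta)
        return 2.0 * x / (1.0 + x**2 / beta)
    for m, Sig in tensors.items():
        g_lik = Sig @ likelihood_grad_perp(Sig, data[m], S0_hat, sigma_hat,
                                           bvals, gdirs) @ Sig
        g_pri = np.zeros((3, 3))
        for j in neighbors[m]:
            d2 = spd_dist(Sig, tensors[j])**2
            # Eq. (17) with psi'_m ~ psi'_j ~ psi'(d2) (simplification)
            g_pri -= dpsi(d2) * spd_log(Sig, tensors[j])
        tensors[m] = spd_exp(Sig, -step * (g_lik + g_pri))
    return tensors
```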
4.4. Covariance estimation

We apply the Laplace approximation to the posterior vpf in order to estimate the posterior covariance. That is, we expand the negative log posterior in a second order Taylor series and afterwards use relation (13) to estimate the covariance matrix by means of a Gauss-Markov random sampling algorithm. To this end, we consider (13) as the expectation C = 𝔼[\vec{xz} \vec{xz}^T sqrt(|G_x(z)|)] of the outer product of the tangent vectors times the Jacobian factor sqrt(|G_x(z)|) with respect to a Gaussian distribution in a linear space. The Hessian H_{ij}(f) = ∂_i ∂_j f − Γ^k_{ij} ∂_k f of a function f defined on a nonlinear manifold in general requires the evaluation of the Christoffel symbols Γ^k_{ij} = (1/2) g^{km} (∂_i g_{jm} + ∂_j g_{im} − ∂_m g_{ij}). However, in the case of Riemannian manifolds, the exponential/logarithmic map allows defining a chart φ := ξ ∘ log_p, where ξ : T_p ℳ → ℝ^{3×3} is the usual matrix coordinate chart. In this chart all Christoffel symbols vanish, i.e. Γ^k_{ij} = 0. In order to see this, recall (cmp. Section 3.1) that geodesics γ(t) are mapped onto straight lines by the logarithmic map, i.e. φ ∘ γ(t) = z t with z = ξ(\vec{pz}). Furthermore, a geodesic in local coordinates needs to fulfill the geodesic equation d²z^k/dt² + Γ^k_{ij} (dz^i/dt)(dz^j/dt) = 0, which in our chart φ reduces to Γ^k_{ij} z^i z^j = 0. This is in general only fulfilled for vanishing Christoffel symbols, and consequently the Hessian simplifies to H_{ij}(f) = ∂_i ∂_j f. Second order derivatives of the likelihood term are straightforward to compute in φ using basic matrix calculus. Samples from 𝒩(x, H^{−1}) are obtained via standard Gauss-Markov Monte Carlo sampling. To this end, let p_k ∼ 𝒩(0, I), k ∈ 1, ..., K, be K samples from the zero mean multivariate Gaussian distribution with identity covariance matrix I.
Further, let H = L L^T be the Cholesky decomposition of H, with L lower triangular. The distribution of the solutions q_k of the linear equation system L^T q_k = p_k then has the desired covariance matrix C_0(q) = C_0(L^{−T} p) = (L L^T)^{−1} = H^{−1}, and thus C ≈ (1/K) ∑_k sqrt(|G_x(q_k)|) q_k q_k^T. Fig. 3 illustrates our covariance estimation scheme on real data.
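The sampling scheme just described is compact enough to state directly; in this sketch (our names, with a hypothetical jacobian_factor callback standing in for sqrt(|G_x(q)|)) numpy's Cholesky factorization H = L L^T is used, matching the derivation above:

```python
import numpy as np

def sample_covariance(H, jacobian_factor, K=1000, rng=None):
    """Monte Carlo estimate of Eq. (13) under the Laplace approximation:
    draw q_k ~ N(0, H^{-1}) via a Cholesky solve, then average the
    Jacobian-weighted outer products."""
    rng = rng or np.random.default_rng()
    L = np.linalg.cholesky(H)            # H = L L^T, L lower triangular
    n = H.shape[0]
    C = np.zeros((n, n))
    for _ in range(K):
        p = rng.standard_normal(n)
        q = np.linalg.solve(L.T, p)      # L^T q = p  =>  cov(q) = H^{-1}
        C += jacobian_factor(q) * np.outer(q, q)
    return C / K
```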
5. Experiments

The performance of our Bayesian Riemannian tensor reconstruction approach is demonstrated on synthetic and real DT-MRI data. As reference methods we consider the extrinsic maximum likelihood approach proposed by Landman et al. [13], in its original form as well as combined with the nonlinear isotropic regularizer proposed in [24], and the intrinsic maximum likelihood approach proposed in [14], with and without the intrinsic nonlinear isotropic regularizer presented in Section 4.2. The noise level σ is estimated by means of a standard variance estimator [9] directly from the complex valued NMR signal. In order to estimate the reference image S_0(x_i), we take the average of the complex valued NMR images measured several times with zero 'b' value. As a model for the potential function we choose ψ(x) = β log(1 + x²/β) and set the contrast parameter β by minimizing the distance between estimate and ground truth for each noise level in the case of synthetic data; the corresponding parameter is afterwards used for the real data.
5.1. Synthetic data

In this experiment we apply the different reconstruction approaches to simple synthetic NMR data. To this end, we generate a 32 × 32 tensor image (see Figure 1, lower right; in order to visualize details more precisely, only a cutout of the image is shown). Each tensor is represented by an ellipsoid, and the in-plane orientation of its main axis is additionally color coded. As the tensor valued image is known, we can derive complex valued NMR images using the ST equation (1) with the 'b-values' as well as the gradient orientations from the real data experiment below. We corrupt the ground truth NMR signals with five different noise levels (σ = 0.002, 0.004, 0.006, 0.008, 0.01). Figure 1 shows the ground truth as well as the different reconstructed diffusion tensor images from noisy images with σ = 0.006. Reconstruction without regularization, [13] and [14], leads to highly noisy tensor images. Moreover, in the case of the additive Gaussian noise model we can observe the bias, resulting in significantly smaller diffusion ellipsoids. On the other hand, the combination of the Rician noise model with nonlinear isotropic regularization [24] within the flat Euclidean metric suffers from the eigenvalue swelling effect. The intrinsic approach with the Gaussian noise model [14] shows less swelling of eigenvalues but suffers from the bias, which shows in too small ellipsoids as well as in a change of the average tensor direction. Our approach, on the other hand, most closely resembles the ground truth. However, no approach is able to preserve the high eccentricity of the tensors in the line structure. A quantitative analysis of two error measures confirms the visual impression (see Figure 2). Our approach results in the lowest error measures. Only for the highest SNR (σ = 0.002) does the regularized Rician noise model within the flat Euclidean metric lead to comparable results.

Figure 1. Synthetic tensor image reconstruction. Upper row: [13] (left), [14] (right); middle row: [13] with regularization (left), [14] with regularization (right); lower row: our approach (left), ground truth (right).

Figure 2. Error norms (average distance between estimate and ground truth (left), average error of the determinant (right)) vs. the noise level (σ has to be multiplied by 10^{−2}) for different reconstruction methods: additive noise model [14], additive noise model [14] with regularization, Rician noise model [13], Rician noise model [13] with regularization, as well as our approach.
5.2. Real data

In this experiment, we apply our approach as well as the reference approaches to DW images measured from a human brain in vivo. A single-shot diffusion-weighted twice-refocused spin-echo echo planar imaging sequence was used with the following measurement parameters: TR = 6925 ms, TE = 104 ms, 192 matrix with 6/8 phase partial Fourier, 23 cm field of view, and 36 2.4-mm-thick contiguous axial slices. The in-plane resolution was 1.2 mm/pixel. For each slice, diffusion weighted images were acquired with diffusion gradients at eight different b-values (250, 500, 750, 1000, 1250, 1500, 1750, 2000 s/mm²), each being applied along six non-collinear directions [the (x, y, z) gradient directions were (1, 0, 1), (−1, 0, 1), (0, 1, 1), (0, 1, −1), (1, 1, 0), (−1, 1, 0)], and an individual set without diffusion weighting (b = 0 s/mm²). As no ground truth is available, we cannot make any quantitative comparison; however, a closer look at the reconstructed tensor images (cutouts are shown in Fig. 3) shows qualitatively similar behavior to that observed in the study on synthetic data.
6. Conclusion

We derived a Bayesian estimation framework for the reconstruction of diffusion tensor images from DW images using an intrinsic view. We briefly reviewed the necessary background, i.e. the physical basis of DTI and the Rician noise distribution of DW images, as well as the basic concepts of Riemannian manifolds and of probabilities and estimation on such manifolds. The explicit formulation of a likelihood term respecting Rician noise and a prior for regularization, together with suitable MAP and covariance estimation procedures respecting the non-flat nature of 𝒫(n), are the core contributions of this paper. Overall, we conclude that the consistent use of the intrinsic view for model derivation, in conjunction with suitable Bayesian estimation schemes for value and covariance, yields the most accurate and reliable approach to DTI reconstruction from DW images presented so far.
Figure 3. Estimation of real tensor images. Upper row: Rician noise model [13] (left), Gaussian noise model [14] with regularization (right); lower row: our approach (left), estimated variance of our approach, i.e. the trace of the covariance at each position (right).
References

[1] V. Arsigny, P. Fillard, X. Pennec, and N. Ayache. Log-Euclidean metrics for fast and simple calculus on diffusion tensors. Magnetic Resonance in Medicine, 56(2):411-421, 2006.
[2] D. L. Bihan, E. Breton, D. Lallemand, P. Grenier, E. Cabanis, and M. Laval-Jeantet. MR imaging of intravoxel incoherent motions: application to diffusion and perfusion in neurologic disorders. Radiology, 161(2):401-407, 1986.
[3] D. L. Bihan, J.-F. Mangin, C. Poupon, C. A. Clark, S. Pappata, N. Molko, and H. Chabriat. Diffusion tensor imaging: Concepts and applications. Journal of Magnetic Resonance Imaging, 13(4):534-546, 2001.
[4] B. Burgeth, S. Didas, L. Florack, and J. Weickert. A generic approach to the filtering of matrix fields with singular PDEs. In F. Sgallari, A. Murli, and N. Paragios, editors, Scale-Space and Variational Methods in Image Processing, volume 4485, pages 556-567, 2007.
[5] C. A. Castano-Moraga, C. Lenglet, R. Deriche, and J. Ruiz-Alzola. A Riemannian approach to anisotropic filtering of tensor fields. Signal Processing, 87(2):263-276, 2007.
[6] C. Chefd'hotel, D. Tschumperlé, R. Deriche, and O. Faugeras. Regularizing flows for constrained matrix-valued images. J. Math. Imaging Vis., 20(1-2):147-162, 2004.
[7] P. Fillard, V. Arsigny, N. Ayache, and X. Pennec. A Riemannian framework for the processing of tensor-valued images. In DSSCV, pages 112-123, 2005.
[8] P. Fletcher and S. Joshi. Principal geodesic analysis on symmetric spaces: Statistics of diffusion tensors. In Computer Vision and Mathematical Methods in Medical and Biomedical Image Analysis, ECCV 2004 Workshops CVAMIA and MMBIA, pages 87-98, 2004.
[9] W. Förstner. Image preprocessing for feature extraction in digital intensity, color and range images. In Proc. Int'l Summer School on Data Analysis and the Statistical Foundation of Geomatics, 1998.
[10] Y. Gur, O. Pasternak, and N. Sochen. Fast GL(n)-invariant framework for tensors regularization. Intern. Journal of Computer Vision, published online, 2008.
[11] S. Helgason. Differential Geometry, Lie Groups and Symmetric Spaces. Academic Press, 1978.
[12] I. H. Jermyn. Invariant Bayesian estimation on manifolds. Ann. Statist., 33(2):583-605, 2005.
[13] B. A. Landman, P.-L. Bazin, and J. L. Prince. Diffusion tensor estimation by maximizing Rician likelihood. In ICCV, pages 1-8, 2007.
[14] C. Lenglet, M. Rousson, R. Deriche, and O. Faugeras. Statistics on the manifold of multivariate normal distributions: Theory and application to diffusion tensor MRI processing. J. Math. Imaging Vis., 25(3):423-444, 2006.
[15] C. Lenglet, M. Rousson, R. Deriche, O. D. Faugeras, S. Lehericy, and K. Ugurbil. A Riemannian approach to diffusion tensor images segmentation. In IPMI, pages 591-602, 2005.
[16] M. Moakher. A differential geometric approach to the geometric mean of symmetric positive-definite matrices. SIAM J. Matrix Anal. Appl., 2003.
[17] X. Pennec. Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements. J. Math. Imaging Vis., 25(1):127-154, 2006.
[18] X. Pennec, P. Fillard, and N. Ayache. A Riemannian framework for tensor computing. International Journal of Computer Vision, 66(1):41-66, 2006.
[19] T. Preusser, H. Scharr, K. Krajsek, and R. M. Kirby. Building blocks for computer vision with stochastic partial differential equations. Int. J. Comput. Vision, 80(3):375-405, 2008.
[20] E. Stejskal and J. Tanner. Spin diffusion measurements: spin echoes in the presence of a time-dependent field gradient. Journal of Chemical Physics, 42:288-292, 1965.
[21] A. Tarantola. Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM, 2005.
[22] D. Tschumperlé and R. Deriche. Diffusion tensor regularization with constraints preservation. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'01), pages 948-953, 2001.
[23] Z. Wang, B. C. Vemuri, Y. Chen, and T. Mareci. Simultaneous smoothing and estimation of the tensor field from diffusion tensor MRI. In CVPR (1), pages 461-466, 2003.
[24] J. Weickert and T. Brox. Diffusion and regularization of vector- and matrix-valued images. In Inverse Problems, Image Analysis, and Medical Imaging, Contemporary Mathematics, pages 251-268, 2002.