Anatomical Regularization on Statistical Manifolds for the Classification of Patients with Alzheimer’s Disease

Rémi Cuingnet1,2, Joan Alexis Glaunès1,3, Marie Chupin1, Habib Benali2, and Olivier Colliot1, the Alzheimer’s Disease Neuroimaging Initiative

1 Université Pierre et Marie Curie-Paris 6, CNRS UMR 7225, Inserm UMR S 975, Centre de Recherche de l’Institut Cerveau-Moelle (CRICM), Paris, France
2 Inserm, UMR S 678, LIF, Paris, France
3 MAP5, Université Paris 5 - René Descartes, Paris, France

Abstract. This paper introduces a continuous framework to spatially regularize support vector machines (SVM) for brain image analysis based on the Fisher metric. We show that, by considering the images as elements of a statistical manifold, one can define a metric that integrates various types of information. Based on this metric, replacing the standard SVM regularization with a Laplace-Beltrami regularization operator makes it possible to integrate into the classifier various types of constraints based on spatial and anatomical information. The proposed framework is applied to the classification of magnetic resonance (MR) images based on gray matter concentration maps from 137 patients with Alzheimer’s disease and 162 elderly controls. The results demonstrate that the proposed classifier generates less noisy and consequently more interpretable feature maps with no loss of classification performance.

1 Introduction

Brain image analyses have widely relied on univariate voxel-wise analyses, such as voxel-based morphometry (VBM) for structural MRI [1]. In such analyses, brain images are first spatially registered to a common stereotaxic space, and then mass univariate statistical tests are performed in each voxel to detect significant group differences. However, the sensitivity of these approaches is limited when the differences are spatially complex and involve a combination of different voxels or brain structures [2]. Recently, there has been a growing interest in support vector machine (SVM) methods [3, 4] to overcome the limits of these univariate analyses. These approaches allow capturing complex multivariate relationships in the data and have been successfully applied to the individual classification of a variety of neurological and psychiatric conditions, such as Alzheimer’s disease [5–9], fronto-temporal dementia [5], schizophrenia [10] and Parkinsonian syndromes [11]. Moreover, the output of the SVM can also be analyzed to localize spatial patterns of discrimination, for example by drawing the coefficients of the optimal margin hyperplane (OMH), which, in the case of a linear SVM, live in the same space as the MRI data [6, 7].

However, voxel-based comparisons are subject to registration errors and interindividual variability. Therefore, one of the problems with directly analyzing the OMH coefficients is that the corresponding maps are scattered and lack spatial coherence. This makes it difficult to give a meaningful interpretation of the maps, for example to localize the brain regions altered by a given pathology. This is because the regularization term of the standard linear SVM is not a spatial regularization. To overcome this limitation, Cuingnet et al. [12] proposed to directly enforce spatial consistency in the SVM by using the Laplacian of a regularization graph. They proposed a regularization graph which takes into consideration both spatial information (the location) and anatomical information (the tissue types). They combined spatial and anatomical information by modifying the local topology induced by the spatial information with respect to some given anatomical priors (tissue types). Since the images are discrete, they used a discrete framework to model local behaviors: graphs. Nevertheless, as the brain is intrinsically a continuous object, it seems more natural to describe local behaviors from the continuous viewpoint.

This paper extends this spatial regularization framework to the continuous case. In particular, we show that considering images as statistical manifolds equipped with the Fisher metric makes it possible to take into account various types of prior information, such as tissue, atlas information and spatial proximity. We then apply the proposed framework to the classification of MR images based on gray matter concentration maps and cortical thickness measures from patients with Alzheimer’s disease and elderly controls. The results demonstrate that the proposed approach yields spatially and anatomically coherent discrimination patterns. It generates more interpretable feature maps with an increase, or at least with no loss, of classification performance.

1 Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu). A complete listing of ADNI investigators can be found at: http://adni.loni.ucla.edu/wp-content/uploads/how_to_apply/ADNI_Authorship_List.pdf

2 Spatially Regularized SVM on Riemannian Manifold

2.1 Background

In this contribution, we consider the case of brain images which are spatially normalized to a common stereotaxic space, as in many group studies or classification methods [6, 7, 9, 10, 13]. These images can be any characteristics extracted from the MRI, such as tissue concentration maps (in VBM). Let $(x_s)_{s\in[1,N]}$ be the images of $N$ subjects and $(y_s)_{s\in[1,N]} \in \{\pm 1\}^N$ their group labels (e.g. diagnosis). For each subject $s$, $x_s$ can be considered as a square-integrable real-valued function defined on a compact subset, $\mathcal{V}$, of $\mathbb{R}^3$ or, more generally, on a compact subset of a 3D Riemannian manifold. Let $\mathcal{V}$ be the domain of the 3D images. SVMs search for the hyperplane for which the margin between groups is maximal. The standard linear SVM solves the following optimization problem [3, 4]:

$$\left(w^{\mathrm{opt}}, b^{\mathrm{opt}}\right) = \underset{w \in L^2(\mathcal{V}),\, b \in \mathbb{R}}{\arg\min}\; \frac{1}{N}\sum_{s=1}^{N} \ell_{\mathrm{hinge}}\left(y_s\left[\langle w, x_s\rangle_{L^2} + b\right]\right) + \lambda \|w\|_{L^2}^2 \qquad (1)$$

where $\lambda \in \mathbb{R}^+$ is the regularization parameter and $\ell_{\mathrm{hinge}}$ the hinge loss function defined as: $\ell_{\mathrm{hinge}} : u \in \mathbb{R} \mapsto (1 - u)_+$. With a linear SVM, the feature space is the same as the input space. Thus, when the input features are images, the weight map $w^{\mathrm{opt}}$ is also an image. This map qualitatively informs us about the role of the different brain regions in the classifier [9]. Therefore, since two neighboring regions should have a similar role in the classifier, $w^{\mathrm{opt}}$ should be smooth with respect to the topology of $\mathcal{V}$. However, this is not guaranteed with the standard linear SVM because the regularization term is not a spatial regularization.

2.2 Regularization operator

By considering the SVM from the regularization viewpoint [4], one can constrain $w^{\mathrm{opt}}$ to be smooth with respect to the topology of $\mathcal{V}$. This is done through the definition of a regularization operator, $P$, defined as a linear map from a space $\mathcal{U} \subset L^2(\mathcal{V})$ into $L^2(\mathcal{V})$. When $P$ is bijective and symmetric,

$$\min_{u \in \mathcal{U},\, b \in \mathbb{R}}\; \frac{1}{N}\sum_{s=1}^{N} \ell_{\mathrm{hinge}}\left(y_s\left[\langle u, x_s\rangle_{L^2} + b\right]\right) + \lambda \|Pu\|_{L^2}^2 \qquad (2)$$

is equivalent to a linear SVM on the data $(P^{-1}x_s)_s$. Similarly, it can be seen as an SVM minimization problem on the raw data with kernel $K$ defined by $K(x_1, x_2) = \langle P^{-1}x_1, P^{-1}x_2\rangle_{L^2}$. One has to define the regularization operator $P$ to obtain the suitable regularization for the problem.

2.3 Spatial Regularization on Compact Riemannian Manifold

Spatial regularization requires the notion of proximity between elements of $\mathcal{V}$. In this paper, $\mathcal{V}$ is considered as a 3-dimensional compact Riemannian manifold $(\mathcal{M}, g)$ with boundaries. The metric, $g$, then models the notion of proximity. On such spaces, the heat kernel exists [14, 15]. Therefore, the Laplacian regularization presented in [12] can be extended to compact Riemannian manifolds. Let $\Delta_g$ denote the Laplace-Beltrami operator (see footnote 4). Let $(e_n)_{n\in\mathbb{N}}$ be an orthonormal basis of $L^2(\mathcal{V})$ of eigenvectors of $\Delta_g$ (with homogeneous Dirichlet boundary conditions) [14, 16] and $(\mu_n)_{n\in\mathbb{N}}$ the corresponding eigenvalues. We define $\mathcal{U}_\beta$ as:

$$\mathcal{U}_\beta = \left\{ u = \sum_{n\in\mathbb{N}} u_n e_n \;\middle|\; (u_n)_{n\in\mathbb{N}} \in \ell^2 \text{ and } \left(e^{\frac{1}{2}\beta\mu_n} u_n\right)_{n\in\mathbb{N}} \in \ell^2 \right\}$$

where $\ell^2$ denotes the set of square-summable sequences. We chose the regularization operator $P_\beta : \mathcal{U}_\beta \to L^2(\mathcal{V})$ defined as:

$$P_\beta : u = \sum_{n\in\mathbb{N}} u_n e_n \;\mapsto\; e^{\frac{1}{2}\beta\Delta_g} u = \sum_{n\in\mathbb{N}} e^{\frac{1}{2}\beta\mu_n} u_n e_n \qquad (3)$$

This penalizes the high-frequency components with respect to the topology of $\mathcal{V}$.

4 Note that, with the convention used in this paper, in Euclidean space, $\Delta_g = -\Delta$ where $\Delta$ is the Laplacian operator.
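As an illustration of Sections 2.2 and 2.3, the following minimal sketch builds $P_\beta^{-1} = e^{-\frac{1}{2}\beta\Delta_g}$ from an explicit eigendecomposition and uses it to form the kernel $K(x_1, x_2) = \langle P^{-1}x_1, P^{-1}x_2\rangle$. It assumes a small 1D Euclidean grid with a finite-difference Laplacian, where the eigendecomposition is tractable; the function names (`dirichlet_laplacian_1d`, `inverse_regularizer`, `gram_matrix`) and the random test signals are hypothetical and not part of the original method.

```python
import numpy as np

def dirichlet_laplacian_1d(n, h=1.0):
    """Finite-difference version of -d^2/dx^2 (positive definite) with
    homogeneous Dirichlet boundary conditions on n interior grid points."""
    main = 2.0 * np.ones(n)
    off = -1.0 * np.ones(n - 1)
    return (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / h**2

def inverse_regularizer(L, beta):
    """Return P_beta^{-1} = exp(-beta/2 * L) from the eigendecomposition of L."""
    mu, E = np.linalg.eigh(L)  # L = E diag(mu) E^T
    # Scaling column n of E by exp(-beta*mu_n/2) damps high-frequency modes.
    return (E * np.exp(-0.5 * beta * mu)) @ E.T

def gram_matrix(X, L, beta):
    """K[s, t] = <P^{-1} x_s, P^{-1} x_t>, with one signal per row of X."""
    Z = X @ inverse_regularizer(L, beta).T  # row s is P^{-1} x_s
    return Z @ Z.T

rng = np.random.default_rng(0)
n, beta = 64, 4.0
L = dirichlet_laplacian_1d(n)
X = rng.standard_normal((5, n))  # five toy "images"
K = gram_matrix(X, L, beta)
```

Because $P_\beta^{-1}$ is symmetric, the same Gram matrix can also be written as $\langle x_s, e^{-\beta\Delta_g} x_t\rangle$, which is the form used when the operator is applied through the heat equation instead of an eigendecomposition.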

3 Spatial Proximity

When the proximity is encoded by a Euclidean distance, the regularization is equivalent to preprocessing the data with a Gaussian smoothing kernel with standard deviation $\sigma = \sqrt{\beta}$. However, such a metric does not take into account anatomical information. In this section, the goal is to define a metric that takes into account various types of prior information, such as tissue, atlas and location information. We first show that this can be done by considering the images as elements of a statistical manifold and using the Fisher metric. We then give some details about the computation of the Gram matrix.

3.1 Fisher metric

The images are registered to a common space. Therefore, when considering some location $v \in \mathbb{R}^3$, the true location is known only up to the registration errors. Such spatial information can be modeled by a probability density function: $x \in \mathbb{R}^3 \mapsto p_{loc}(x|v)$. A simple example would be $p_{loc}(\cdot|v) \sim \mathcal{N}(v, \sigma_{loc}^2)$. It can be seen as a confidence index about the spatial localization at voxel $v$. We further assume that we are given an anatomical or a functional atlas $\mathcal{A}$ composed of $R$ regions: $\{A_r\}_{r=1\cdots R}$. Therefore, at each point $v \in \mathcal{V}$, we have a probability distribution $p_{atlas}(\cdot|v) \in \mathbb{R}^{\mathcal{A}}$ which informs us about the atlas region at $v$. As a result, at each point $v \in \mathbb{R}^3$, we have some information about the spatial location and some anatomical information through the atlas. Such information can be modeled by a probability density function $p(\cdot|v) \in \mathbb{R}^{\mathcal{A}\times\mathbb{R}^3}$. Therefore, we consider the parametric family of probability distributions:

$$\mathcal{M} = \left\{ p(\cdot|v) \in \mathbb{R}^{\mathcal{A}\times\mathbb{R}^3} \right\}_{v\in\mathcal{V}}$$

In the following, we further assume that $p_{loc}$ and $p_{atlas}$ are independent. Thus, $p$ satisfies: $p((A_r, x)|v) = p_{atlas}(A_r|v)\,p_{loc}(x|v)$, $\forall (A_r, x) \in \mathcal{A}\times\mathbb{R}^3$. We also assume that $p$ is sufficiently smooth in $v \in \mathcal{V}$ and that the Fisher information matrix is definite at each $v \in \mathcal{V}$. Then the parametric family of probability distributions $\mathcal{M}$ can be considered as a differential manifold [17]. A natural way to encode proximity on $\mathcal{M}$ is to use the Fisher metric, since such a metric is invariant under reparametrization of the manifold. $\mathcal{M}$ with the Fisher metric is a compact Riemannian manifold [17]. The metric tensor $g$ is then given for all $v \in \mathcal{V}$ by:

$$g_{ij}(v) = E_v\left[\frac{\partial \log p(\cdot|v)}{\partial v_i}\,\frac{\partial \log p(\cdot|v)}{\partial v_j}\right], \quad 1 \le i, j \le 3$$

When $p_{loc}(\cdot|v) \sim \mathcal{N}(v, \sigma_{loc}^2 I_3)$, we have: $g_{ij}(v) = g^{atlas}_{ij}(v) + \dfrac{\delta_{ij}}{\sigma_{loc}^2}$.
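The decomposition $g_{ij} = g^{atlas}_{ij} + \delta_{ij}/\sigma_{loc}^2$ can be sketched numerically. For a discrete atlas distribution, the Fisher definition above gives $g^{atlas}_{ij}(v) = \sum_r \partial_i p_r(v)\,\partial_j p_r(v) / p_r(v)$ with $p_r(v) = p_{atlas}(A_r|v)$. The snippet below is only a sketch under assumptions that are not in the paper: derivatives are approximated by finite differences on the voxel grid, a small `eps` guards against division by zero, and the function name `fisher_metric` and the toy two-region atlas are hypothetical.

```python
import numpy as np

def fisher_metric(prob_maps, sigma_loc, spacing=1.0, eps=1e-8):
    """Fisher metric tensor at every voxel.

    prob_maps: array of shape (R, X, Y, Z), one probability map per atlas
               region, summing to 1 over the first axis.
    Returns an array of shape (X, Y, Z, 3, 3).
    """
    prob_maps = np.asarray(prob_maps, dtype=float)
    g = np.zeros(prob_maps.shape[1:] + (3, 3))
    for p in prob_maps:
        grads = np.gradient(p, spacing)  # finite-difference dp/dv_i, i = 0, 1, 2
        for i in range(3):
            for j in range(3):
                # Atlas term for one region: (d_i p_r)(d_j p_r) / p_r
                g[..., i, j] += grads[i] * grads[j] / (p + eps)
    # Gaussian location term: delta_ij / sigma_loc^2
    g += np.eye(3) / sigma_loc ** 2
    return g

# Toy two-region atlas whose first region probability increases along x.
p1 = np.linspace(0.2, 0.8, 5)[:, None, None] * np.ones((5, 4, 4))
atlas = np.stack([p1, 1.0 - p1])
g = fisher_metric(atlas, sigma_loc=10.0)
```

In this toy example the metric is larger along x, where the atlas probabilities change, than along y and z, where only the location term contributes; diffusion is therefore slowed across the region boundary.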

3.2 Computing the Gram matrix

The computation of the kernel matrix requires the computation of $e^{-\beta\Delta_g}x_s$ for all the subjects of the training set. The eigendecomposition of the Laplace-Beltrami operator is intractable since the number of voxels in a brain image is about $10^6$. Hence $e^{-\beta\Delta_g}x_s$ is computed as the solution at time $t = \beta$ of the heat equation with homogeneous Dirichlet boundary conditions, with unknown $u$:

$$\frac{\partial u}{\partial t} + \Delta_g u = 0; \qquad u(t = 0) = x_s \qquad (4)$$

To solve equation (4), one can use a variational approach [18]. We used rectangular finite elements $\phi^{(i)}$ in space and the explicit finite difference scheme for the time discretization. $\Delta_x$ and $\Delta_t$ denote the space step and the time step respectively. Let $U(t)$ denote the coordinates of $u(t)$. Let $U^n$ denote the coordinates of $u(t = n\Delta_t)$. This leads to:

$$M\frac{dU(t)}{dt} + KU(t) = 0; \qquad U(t = 0) = U^0 \qquad (5)$$

with

$$K_{i,j} = \int_{\mathcal{V}} \left\langle \nabla_{\mathcal{M}}\phi^{(i)}, \nabla_{\mathcal{M}}\phi^{(j)} \right\rangle_{\mathcal{M}} d\mu_{\mathcal{M}} \quad\text{and}\quad M_{i,j} = \int_{\mathcal{V}} \phi^{(i)}\phi^{(j)}\, d\mu_{\mathcal{M}} \qquad (6)$$

where $K$ is the stiffness matrix and $M$ is the mass matrix. With the explicit scheme, $U^{n+1}$ is given by: $MU^{n+1} = (M - \Delta_t K)\,U^n$. The step $\Delta_x$ is fixed by the MRI spatial resolution. The time step, $\Delta_t$, is then chosen so as to respect the Courant-Friedrichs-Lewy (CFL) condition: $\Delta_t \le 2(\max_i \lambda_i)^{-1}$, where $\lambda_i$ are the eigenvalues of the generalized eigenproblem $KU = \lambda MU$. Therefore, the computational complexity is $O(N\beta(\max_i \lambda_i)d)$. To compute the optimal time step $\Delta_t$, we estimated the largest eigenvalue with the power iteration method. In our experiments, for $\sigma_{loc} = 5$, $\lambda_{max} \approx 15.4$ and for $\sigma_{loc} = 10$, $\lambda_{max} \approx 46.5$.

3.3 Setting the diffusion parameter β

Our method requires tuning two parameters: $\sigma_{loc}$ and $\beta$. The parameter $\sigma_{loc}$ was chosen a priori. As evaluating the spectrum of the Laplacian operator is intractable considering the images’ sizes, $\beta$ was chosen to be equivalent to the diffusion parameter of the Gaussian smoothing, $\beta = \sigma^2$, where $\sigma$ is the standard deviation of the Gaussian smoothing kernel. To be comparable with the Euclidean case, we first normalized $g$ with:

$$\left( \frac{1}{|\mathcal{V}|} \int_{u\in\mathcal{V}} \frac{1}{3}\,\mathrm{tr}\left(g^{\frac{1}{2}}(u)\right) du \right)^{2}$$
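The choices of Sections 3.2 and 3.3 can be put together in a short sketch: convert a FWHM to $\beta = \sigma^2$ (using the standard relation FWHM $= 2\sqrt{2\ln 2}\,\sigma$), normalize a metric field, estimate the largest eigenvalue by power iteration, and integrate the heat equation with explicit time steps that respect the CFL bound. This is an illustration under simplifying assumptions that differ from the paper: a 1D Euclidean grid, finite differences with the mass matrix $M$ taken as the identity instead of rectangular finite elements, and a safety factor of 0.5 on the time step. The integration time $\beta/2$ corresponds to $P_\beta^{-1} = e^{-\frac{1}{2}\beta\Delta_g}$, i.e. Gaussian smoothing with standard deviation $\sqrt{\beta}$. All function names are hypothetical.

```python
import numpy as np

def fwhm_to_beta(fwhm):
    """beta = sigma^2, with sigma the std of a Gaussian of the given FWHM."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return sigma ** 2

def normalize_metric(g):
    """Divide g by (mean over voxels of tr(g^(1/2)) / 3)^2.

    g has shape (..., 3, 3) and is symmetric positive definite at each voxel.
    """
    sqrt_trace = np.sqrt(np.linalg.eigvalsh(g)).sum(axis=-1)  # tr(g^(1/2))
    return g / np.mean(sqrt_trace / 3.0) ** 2

def stiffness_1d(n, dx=1.0):
    """Finite-difference stand-in for K (M = identity), Dirichlet boundaries."""
    K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    return K / dx ** 2

def largest_eigenvalue(K, n_iter=200, seed=0):
    """Power iteration estimate of the largest eigenvalue of a symmetric K."""
    u = np.random.default_rng(seed).standard_normal(K.shape[0])
    for _ in range(n_iter):
        u = K @ u
        u /= np.linalg.norm(u)
    return float(u @ K @ u)  # Rayleigh quotient

def heat_explicit(K, u0, t_end, safety=0.5):
    """Integrate dU/dt + K U = 0 up to t_end with explicit Euler steps."""
    dt_max = safety * 2.0 / largest_eigenvalue(K)  # stays inside the CFL bound
    n_steps = max(1, int(np.ceil(t_end / dt_max)))
    dt = t_end / n_steps
    u = u0.astype(float).copy()
    for _ in range(n_steps):
        u = u - dt * (K @ u)
    return u

n, fwhm = 201, 8.0
beta = fwhm_to_beta(fwhm)
K = stiffness_1d(n)
delta = np.zeros(n)
delta[n // 2] = 1.0
# P_beta^{-1} = exp(-beta/2 * Delta_g): integrate up to t = beta / 2.
smoothed = heat_explicit(K, delta, t_end=beta / 2.0)
x = np.arange(n) - n // 2
gaussian = np.exp(-x**2 / (2.0 * beta)) / np.sqrt(2.0 * np.pi * beta)
print(np.abs(smoothed - gaussian).max())
```

On this Euclidean toy grid the diffused impulse stays close to the Gaussian of standard deviation $\sqrt{\beta}$, and `normalize_metric` maps the isotropic metric $I_3/\sigma_{loc}^2$ to the identity, which is what makes the Fisher and Euclidean settings comparable for a given $\beta$.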

4 Experiments and Results

In this section, the proposed framework is applied to the analysis of MR images using gray matter concentration maps from patients with Alzheimer’s disease and elderly controls.

4.1 Materials

Subjects and MRI acquisition. Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. The Principal Investigator of this initiative is Michael W. Weiner, MD, VA Medical Center and University of California, San Francisco. ADNI is the result of efforts of many co-investigators from academic institutions and private corporations. For up-to-date information, see www.adni-info.org. We used the same study population as in [9]. As a result, 299 subjects were selected: 162 cognitively normal elderly controls (76 males, 86 females, age ± SD [range] = 76.3 ± 5.4 [60–90] years, and mini-mental score (MMS) = 29.2 ± 1.0 [25–30]) and 137 patients with AD (67 males, 70 females, age = 76.0 ± 7.3 [55–91] years, and MMS = 23.2 ± 2.0 [18–27]). The T1-weighted MR images described in [19] were used in this study.

Feature extraction. All images were segmented into gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF) using the SPM5 unified segmentation routine [20] and spatially normalized using the DARTEL diffeomorphic registration algorithm [21] with the default parameters. The features are the modulated GM probability maps in the MNI space.

4.2 Classification experiments

We tested the spatial regularization for both the Euclidean metric and the Fisher metric. In the following, they will be referred to as Regul-Euclidean and Regul-Fisher respectively. The atlas information used was only the tissue types (GM, WM and CSF templates). To assess the impact of the regularization, we also performed the classification experiments with no regularization: Direct.

Optimal coefficient maps. The optimal SVM weights $w^{\mathrm{opt}}$ for different values of $\beta$ are shown in Figure 1. When no spatial regularization has been carried out (a), the $w^{\mathrm{opt}}$ maps are noisy and scattered. With Euclidean spatial regularization (b-c), they become smoother and more spatially consistent. However, this regularization mixes tissues and does not respect the topology of the cortex. With the Fisher metric (d-e), the obtained map is much more consistent with the brain anatomy. Compared to the Euclidean regularization, it better respects the topology of the cortex (Fig. 2). The main regions in which atrophy increases the likelihood of being classified as AD (regions in red) are: the medial temporal lobe, the inferior and middle temporal gyri, the posterior cingulate and the posterior middle frontal gyri.

[Figure 1: panels (a)-(e); color scale from -0.5 to +0.5]
Fig. 1. Normalized $w^{\mathrm{opt}}$ coefficients for: (a) Direct, (b-c) Regul-Euclidean with FWHM = 4 mm and FWHM = 8 mm respectively, (d-e) Regul-Fisher with FWHM ∼ 4 mm and FWHM ∼ 8 mm respectively ($\sigma_{loc} = 10$). In all experiments, C = 1.

[Figure 2: panels (a)-(e)]
Fig. 2. Gray matter probability map ((a) original map) of a control subject preprocessed with: (b) a 4 mm FWHM Gaussian kernel, (c) an 8 mm FWHM Gaussian kernel, (d)-(e) $e^{-\frac{\beta}{2}\Delta_g}$, where $\beta$ corresponds to a 4 mm and to an 8 mm FWHM respectively.

Classification performances. In order to obtain unbiased estimates of the performances, the set of participants was randomly split into two groups of the same size: a training set and a testing set. On the training set, a grid search with leave-one-out cross-validation was used to estimate the optimal values of the hyperparameters: the cost parameter C ($\lambda = \frac{1}{2NC}$) of the linear C-SVM ($10^{-5}, 10^{-4.5}, \cdots, 10^{3}$), FWHM (0, 2, ···, 8 mm) and $\sigma_{loc}$ (5, 10 mm). The performances of the resulting classifiers were then evaluated on the testing set. Classification performances in terms of accuracies were slightly improved by spatially regularizing the SVM with the Fisher metric: Direct: 89%, Regul-Euclidean: 89%, Regul-Fisher: 91%, COMPARE [10]: 86%, STAND-Score [7]: 81%.
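The hyperparameter grid described above can be enumerated as follows. This is only a sketch of the search space and of the conversion $\lambda = 1/(2NC)$ between the C-SVM cost parameter and the regularization parameter of equation (1); the helper name `c_to_lambda` and the training-set size used in the last line are hypothetical, and the cross-validation loop itself is not shown.

```python
import itertools
import numpy as np

def c_to_lambda(C, n_train):
    """Regularization parameter of equation (1) for a C-SVM cost C."""
    return 1.0 / (2.0 * n_train * C)

# Grid values as listed in the text.
C_grid = 10.0 ** np.arange(-5.0, 3.0 + 0.25, 0.5)  # 10^-5, 10^-4.5, ..., 10^3
fwhm_grid = np.arange(0, 9, 2)                     # 0, 2, ..., 8 mm
sigma_loc_grid = (5, 10)                           # mm

grid = list(itertools.product(C_grid, fwhm_grid, sigma_loc_grid))
print(len(grid), "hyperparameter combinations")

# Example conversion for an illustrative (hypothetical) training-set size.
lam = c_to_lambda(C=1.0, n_train=150)
```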

5 Conclusion

In conclusion, this paper presents a continuous framework to spatially regularize SVM for brain image analysis based on the Fisher metric. By considering the images as elements of a statistical manifold, one can define a metric that integrates various types of information. Based on this metric, replacing the standard SVM regularization with a Laplace-Beltrami regularization operator makes it possible to integrate into the classifier various types of constraints based on spatial and anatomical information. The proposed approach makes the results more consistent with the anatomy, making their interpretation more meaningful. Finally, it should be noted that the proposed approach is not specific to structural MRI, and can be applied to other pathologies and other types of data (e.g. functional or diffusion-weighted MRI).


Acknowledgements. This work was supported by ANR (project HM-TC, number ANR-09-EMER006). Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904).

References

1. Ashburner, J., Friston, K.J.: Voxel-based morphometry–the methods. NeuroImage 11(6) (2000) 805–21
2. Davatzikos, C.: Why voxel-based morphometric analysis should be used with great caution when characterizing group differences. NeuroImage 23(1) (2004) 17–20
3. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer-Verlag (1995)
4. Schölkopf, B., Smola, A.J.: Learning with Kernels. MIT Press (2001)
5. Davatzikos, C., et al.: Individual patient diagnosis of AD and FTD via high-dimensional pattern classification of MRI. NeuroImage 41(4) (2008) 1220–27
6. Klöppel, S., et al.: Automatic classification of MR scans in Alzheimer’s disease. Brain 131(3) (2008) 681–9
7. Vemuri, P., et al.: Alzheimer’s disease diagnosis in individual subjects using structural MR images: Validation studies. NeuroImage 39(3) (2008) 1186–97
8. Gerardin, É., et al.: Multidimensional classification of hippocampal shape features discriminates Alzheimer’s disease and mild cognitive impairment from normal aging. NeuroImage 47(4) (2009) 1476–86
9. Cuingnet, R., et al.: Automatic classification of patients with Alzheimer’s disease from structural MRI: A comparison of ten methods using the ADNI database. NeuroImage 56(2) (2011) 766–781
10. Fan, Y., et al.: COMPARE: classification of morphological patterns using adaptive regional elements. IEEE Transactions on Medical Imaging 26(1) (2007) 93–105
11. Duchesne, S., et al.: Automated computer differential classification in Parkinsonian syndromes via pattern analysis on MRI. Academic Radiology 16(1) (2009) 61–70
12. Cuingnet, R., et al.: Spatially regularized SVM for the detection of brain areas associated with stroke outcome. In: MICCAI. Volume 6361 of LNCS. (2010) 316–23
13. Querbes, O., et al.: Early diagnosis of Alzheimer’s disease using cortical thickness: impact of cognitive reserve. Brain 132(8) (2009) 2036–47
14. Jost, J.: Riemannian geometry and geometric analysis. Springer Verlag (2008)
15. Lafferty, J., Lebanon, G.: Diffusion kernels on statistical manifolds. JMLR 6 (2005) 129–63
16. Hebey, E.: Sobolev spaces on Riemannian manifolds. Springer-Verlag (1996)
17. Amari, S.I., et al.: Differential Geometry in Statistical Inference. Volume 10. Institute of Mathematical Statistics (1987)
18. Druet, O., Hebey, E., Robert, F.: Blow-up theory for elliptic PDEs in Riemannian geometry. Princeton Univ Press (2004)
19. Jack, C.R., et al.: The Alzheimer’s disease neuroimaging initiative (ADNI): MRI methods. Journal of Magnetic Resonance Imaging 27(4) (2008)
20. Ashburner, J., Friston, K.J.: Unified segmentation. NeuroImage 26(3) (2005) 839–51
21. Ashburner, J.: A fast diffeomorphic image registration algorithm. NeuroImage 38(1) (2007) 95–113
