IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 59, NO. 7, JULY 2012
Improved Labeling of Subcortical Brain Structures in Atlas-Based Segmentation of Magnetic Resonance Images Siamak Yousefi*, Nasser Kehtarnavaz, and Ali Gholipour
Abstract—Precise labeling of subcortical structures plays a key role in functional neurosurgical applications. Labels from an atlas image are propagated to a patient image using atlas-based segmentation. Atlas-based segmentation is highly dependent on the registration framework used to guide the atlas label propagation. This paper focuses on atlas-based segmentation of subcortical brain structures and the effect of different registration methods on the generated subcortical labels. A single-step and three two-step registration methods appearing in the literature based on affine and deformable registration algorithms in the ANTS and FSL algorithms are considered. Experiments are carried out with two atlas databases of IBSR and LPBA40. Six segmentation metrics consisting of Dice overlap, relative volume error, false positive, false negative, surface distance, and spatial extent are used for evaluation. Segmentation results are reported individually and as averages for nine subcortical brain structures. Based on two statistical tests, the results are ranked. In general, among four different registration strategies investigated in this paper, a two-step registration consisting of an initial affine registration followed by a deformable registration applied to subcortical structures provides superior segmentation outcomes. This method can be used to provide an improved labeling of the subcortical brain structures in MRIs for different applications. Index Terms—Atlas, MRI, registration, segmentation.
I. INTRODUCTION
Many studies in neuroscience involve the delineation of brain anatomical structures for diagnostic purposes, treatment planning, neurosurgical applications, and precise functional localization. Distinguishing various neurological diseases is dependent on knowing the morphometry of brain structures. In order to get the morphometry of brain structures and thus a better understanding of brain diseases, neuroscientists use labels in magnetic resonance (MR) brain images. Since manual labeling methods are very time consuming and subject to various errors including inter- and intra-rater variations, the use of automatic or semiautomatic methods for brain labeling is of much interest.
Manuscript received November 30, 2010; revised February 6, 2011; accepted February 22, 2011. Date of publication March 3, 2011; date of current version June 20, 2012. Asterisk indicates corresponding author. *S. Yousefi is with the Department of Electrical Engineering, University of Texas at Dallas, Richardson, TX 75080 USA (e-mail: sxy072100@utdallas.edu). N. Kehtarnavaz is with the Department of Electrical Engineering, University of Texas at Dallas, Richardson, TX 75080 USA (e-mail: [email protected]). A. Gholipour is with the Department of Radiology, Children’s Hospital Boston, and Harvard Medical School, Boston, MA 02115 USA (e-mail: [email protected]). Digital Object Identifier 10.1109/TBME.2011.2122306
Fig. 1. Registration and label propagation components in atlas-based segmentation.
The approach of atlas-based segmentation originated with the pioneering work in [1]. In that work, an atlas was modeled as a deformable object and elastically matched to CT brain images. Within a short period of time, other researchers [2]–[5] used atlases to perform segmentation. As an automatic way of labeling brain images, the atlas-based segmentation approach is simple and easy to use, and thus is widely used.
It helps to first define what is meant here by atlas and atlas-based segmentation. An atlas is defined as a pairing of two images, an MR anatomical image and its corresponding manual labels. Segmentation assigns a label to each voxel in the MR anatomical image. Atlas-based segmentation is done as follows: given an atlas, segmentation of an unseen query image is carried out via image registration. The atlas MR anatomical image is first registered to the query image, yielding a transformation that allows the atlas labels to be propagated to the query image and thus a segmentation of the query image.
Fig. 1 shows the components of atlas-based segmentation. The first block contains the registration component, which is the core component of any atlas-based segmentation. The second block is the label propagation component, which applies the transformation obtained by the registration component to the atlas labels in order to get an estimate of the labels on the query image.
Several datasets consisting of MR brain atlases are publicly available. Easy access to such atlases has resulted in advancing atlas-based segmentation approaches [6]. Single-atlas-based segmentation techniques [7]–[9] rely on a single atlas to propagate labels to the query image. If a database of atlases is available, the labels from multiple atlases can be propagated to the query image and then, via fusion, an improved label selection can be achieved. This approach is referred to as multiatlas-based segmentation and has been applied to bee
0018-9294/$31.00 © 2012 IEEE
brain segmentation databases [10] as well as to human brain images [11]–[14]. In [15], [16], atlas selection methods and their effects on multiatlas-based segmentation are discussed. Regardless of the type of atlas-based segmentation (single or multi) or the atlases used, the accuracy of segmentation is highly dependent on the registration methods utilized. Registration methods typically involve an initial affine registration followed by a nonlinear deformable (nonrigid) registration to map the corresponding structures on the atlas to the subject’s anatomy [17]. A thorough evaluation and comparison of deformable registration algorithms was recently presented in [18]. That work suggests a ranking of different registration algorithms and, thus, provides a reference point to compare and select among different registration tools as well as a motivation for evaluating different registration methods for more specific tasks, such as atlas-based segmentation of subcortical brain structures [19], [20]. Subcortical brain structures have been the focus of many neuroimaging studies to better understand brain structural and functional connectivity and its variations in mental brain disorders, aging, and brain development [5], [21]–[24]. As high-resolution atlases of subcortical brain structures, such as the hippocampus, are generated [25], atlas-based segmentation is expected to become even more popular, and thus the effect of registration on atlas-based segmentation becomes increasingly important. Prior studies related to this study are as follows. In [18], an evaluation of 14 nonlinear deformation algorithms for human brain MRI registration was presented. In [26], several linear, piece-wise linear, and nonlinear warping algorithms were compared in the subcortical nuclei of the brain. The focus of these two papers was on the comparison of the algorithms or tools used in registration, providing a good reference for selecting an algorithm (tool) for registration.
In [27], a quantitative evaluation of registration algorithms was performed for the medial temporal lobe region-of-interest (ROI) of MR images. In [28], a FreeSurfer-initiated fully automated subcortical brain segmentation was discussed. What makes this study different from the aforementioned studies is as follows. 1) Klein et al. [18] compare different registration algorithms; however, the focus here is on comparing single-atlas-based segmentation when using different registration methods (combinations of registration steps applied to the entire brain area and to an ROI). Two different algorithms are used here based on the ranking of the tools presented in [18] to support our findings on the multistep registration approach. 2) In [26], just one registration method using several algorithms is discussed. From a dataset point of view, a single bilateral version of a new high-resolution atlas for the purpose of surgical applications in subnuclei of the thalamus is used; however, in this paper, four different registration methods are used together with two publicly available datasets consisting of 58 MRI atlases. 3) In [27], a study is performed based on the Dice overlap metric to assess the accuracy of segmentation using a single database of 20 MR structural images while comparing
Fig. 2. Registration methods used in atlas-based segmentation.
two registration methods applied to the entire brain and to an ROI. 4) In [28], a new method for a fully automatic segmentation of the subcortical structures is proposed and no comparison between different methods is provided. A more detailed study for nine subcortical structures using additional segmentation metrics has been our motivation in this paper noting the variability of registration methods in atlas-based segmentation. Atlas-based segmentation outcomes are examined here using six segmentation metrics consisting of Dice overlap, relative volume error, false positive (FP) error, false negative (FN) error, surface distance (SD) of structures, and spatial extent (SE) variability. Two different publicly available datasets of atlases that offer the manually segmented images of the brain are used to serve as the gold standards for assessing the accuracy of the outcomes.
II. METHODOLOGY
In this section, we describe different registration methods and algorithms for the registration component or for getting the transformation function. The labels from the atlas are then propagated to the query image based on the transformation obtained. Regardless of the atlas selection strategy used (single or multi), the accuracy of the registration component affects the segmentation outcome. Fig. 2 shows two different registration algorithms and four different registration methods that can be utilized within the registration component or framework. From now on, we refer to the first row in Fig. 2 as method 1, second row as method 2, third row as method 3, and finally the fourth row as method 4. Finally, the accuracy of the segmentation is examined based on the registration method utilized.
A. Registration Algorithms
The registration process, finding a spatial transformation to map the pixels from the atlas MR image to the homologous pixels of a query image, is an essential component of many medical image analysis algorithms [17], [30].
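As a minimal sketch of how a transformation obtained by registration can be used to propagate labels, the following Python example resamples an atlas label volume onto a query grid with nearest-neighbor lookup. It assumes the transformation is available as a 4 x 4 homogeneous matrix that maps query voxel coordinates to atlas voxel coordinates; the function name `propagate_labels` and the parameter `query_to_atlas` are hypothetical and are not part of ANTS or FSL.

```python
import numpy as np


def propagate_labels(atlas_labels, query_to_atlas, query_shape):
    """Resample an integer label volume onto the query grid.

    query_to_atlas is a hypothetical 4x4 homogeneous matrix that maps
    query voxel coordinates to atlas voxel coordinates.
    """
    # Homogeneous coordinates of every query voxel, shape (4, n_voxels).
    grid = np.indices(query_shape).reshape(3, -1)
    homog = np.vstack([grid, np.ones((1, grid.shape[1]))])

    # Map into atlas space and round to the nearest voxel so that label
    # values are copied, never blended.
    mapped = np.rint((query_to_atlas @ homog)[:3]).astype(int)

    # Query voxels that fall outside the atlas volume get background (0).
    upper = np.array(atlas_labels.shape).reshape(3, 1)
    inside = np.all((mapped >= 0) & (mapped < upper), axis=0)

    out = np.zeros(grid.shape[1], dtype=atlas_labels.dtype)
    out[inside] = atlas_labels[tuple(mapped[:, inside])]
    return out.reshape(query_shape)


if __name__ == "__main__":
    atlas = np.zeros((4, 4, 4), dtype=np.int16)
    atlas[1, 1, 1] = 5
    shift = np.eye(4)
    shift[0, 3] = 1.0  # query voxel x maps to atlas voxel x + 1
    print(np.argwhere(propagate_labels(atlas, shift, atlas.shape) == 5))
```

Nearest-neighbor lookup is used because label values are categorical; interpolating between two label codes would produce values that correspond to no structure.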
There are several registration algorithms appearing in the literature including automated nonlinear image matching and anatomical labeling (ANIMAL) [31], automated image registration (AIR) algorithm [32], image registration toolkit (IRTK) [30], statistical parametric mapping (SPM) [33], FMRIB’s linear registration tool (FLIRT) [35], automated registration tool (ART) [36], diffeomorphic demons [37], SPM diffeomorphic anatomical registration through exponentiated Lie algebra (SPM DARTEL) [33], FMRIB’s nonlinear image registration tool (FNIRT) [38], and the symmetric normalization (SyN) [34] algorithm. Several linear, piece-wise linear, and nonlinear registration algorithms are compared in [26]. Klein et al. [18] evaluated 14 deformable registration algorithms in the literature and ranked them based on different metrics and criteria. The ranking presented by Klein et al. showed that using different algorithms for registration resulted in different registration accuracy. Based on [18], SyN from advanced normalization tools (ANTS) (available: http://www.picsl.upenn.edu/ANTS) was selected as the best-performing algorithm, and FNIRT from the widely used MR/DT analysis software package FMRIB Software Library (FSL) (available: http://www.fmrib.ox.ac.uk/fsl/index.html) was selected as a mid-performing but widely used deformable registration algorithm. To support the accuracy of the comparison between different registration methods as discussed in the previous section, SyN and FNIRT were used to perform the deformable registration step, and the ANTS affine algorithm and FLIRT (from FSL) were used to perform the affine registration step.
Fig. 3. Methodology used for examining atlas-based segmentation.
B. Registration Methods (Strategies)
One of the main sources of error in atlas-based segmentation is a registration that fails to align the anatomy of the atlas with the corresponding anatomy of the query image. Different atlas-based segmentation approaches in the literature use different registration methods (see Fig. 2) based on different registration algorithms. Consequently, using different registration methods (method 1, 2, 3, or 4) in atlas-based segmentation results in different segmentation outcomes. In the registration part of the FIRST (FMRIB’s integrated registration segmentation toolkit) tool, which is part of the FSL package and performs subcortical registration and segmentation based on shape and appearance models, an affine transformation is applied to the entire brain area followed by another affine transformation applied just to the subcortical brain structures [29]. This approach corresponds to the second row of the registration framework depicted in Fig. 2. In other words, the registration method in FSL is method 2 and the algorithm is FLIRT. As another example, in the atlas-based segmentation approach in [11], a combination of an affine transformation followed by a deformable transformation applied to the entire brain area is used. This method is very popular in the literature and is widely used. This approach corresponds to the third row of the registration framework depicted in Fig. 2. Another approach is the registration method discussed in [16], which uses a two-step method composed of an affine registration to the Montreal Neurological Institute (MNI) space followed by a deformable registration applied to the entire brain area. Single nonrigid registration is excluded from these methods since, for the algorithms to be effective, a rigid registration should be applied first in order to place the images in global spatial alignment. Multistep nonlinear registration methods (more than 2 steps) are also excluded from this study due to the fact that nonlinear registration tools (e.g., FNIRT and SyN) inherently involve local deformations and include multiple stages of local nonlinear registration (typically more than five stages). It is also found that a single global affine registration provides the sufficient and necessary operation before nonlinear registration. In essence, the methods considered here are generic and cover both global and local nonlinear methods in multiple scales. In addition, such multistep nonlinear methods do not appear in the literature. Note that the thrust of this paper is on comparing outcomes of single-atlas-based segmentation when applying different registration methods appearing in the literature.
As shown in Fig. 2, all the registration methods are categorized here into four groups which are identified by the method adopted: method 1, method 2, method 3, and method 4. In the first method (the first row of Fig. 2), a single-step affine registration is applied to the entire brain area. In method 2 (the second row of Fig. 2), a two-step registration is used; initially an affine transformation is applied to the entire brain area and then an affine registration is applied to the subcortical structures as ROI. In method 3 (the third row of Fig. 2), initially an affine transformation is applied to the entire brain area and then a deformable transformation is applied to the entire brain area. This method is popular among atlas-based segmentation approaches. In method 4, the deformable transformation in the second step of registration is applied only to specific ROIs or to the subcortical brain structures.
C. Data and Experiments
To compare the outcomes of atlas-based segmentation based on different registration methods and algorithms, single-atlas-based segmentation was considered using two different databases. The experiments were done based on the FLIRT and FNIRT algorithms (FSL package) and affine and SyN algorithms (ANTS package) for registration steps. For the label propagation component, single-atlas-based segmentation was considered. Experiments were performed based on the atlas images in the two publicly available databases mentioned earlier. Fig. 3 shows the scenario used to evaluate different registration pipelines for atlas-based segmentation. In this scenario, the process starts by selecting two images from one of the databases; one is used as the atlas, the other as the query image. Then, the atlas is registered to the query image using one of the registration methods and one of the algorithms to obtain the transformation T. At the next step, the
Fig. 4. Example images from the IBSR and LONI databases: the first row corresponds to T1-weighted image from the IBSR database; the second row T1-weighted image from the LONI database; subcortical segmentations in the ROI boxes are overlaid on the anatomy.
TABLE I. DATABASE NUMBERS, AGE OF SUBJECTS, AND VOLUMES
labels of the atlas image are propagated to the query image based on the saved transformation T. For each database image and for each algorithm, the segmentation outcomes are compared to see which pipeline provides the most accurate atlas-based segmentation. The comparison is done based on four widely used segmentation metrics.
D. Atlas Databases and Standard Space
The first dataset used in our experiments consists of T1-weighted MR brain images of 18 male and female subjects from the internet brain segmentation repository (IBSR) database (http://www.cma.mgh.harvard.edu/ibsr/). Each atlas in this database comprises 84 labeled structures. Since the thrust of this paper is on subcortical brain structures, 17 labeled structures were considered that included left accumbens, right accumbens, left amygdala, right amygdala, left caudate, right caudate, left hippocampus, right hippocampus, left lateral ventricles, right lateral ventricles, left pallidum, right pallidum, left putamen, right putamen, left thalamus, right thalamus, and brainstem. The second database consists of 40 T1-weighted MR brain images of male and female subjects from a probabilistic brain atlas made available by the University of California Los Angeles Laboratory of Neuro Imaging (LONI) [6] known as LPBA40. The atlases in this database are available online (http://www.loni.ucla.edu/Atlases/LPBA40). The number of labeled structures in each atlas is 56. The labeled subcortical structures in this dataset are left caudate, right caudate, left hippocampus, right hippocampus, left putamen, and right putamen. Fig. 4 shows sample images and their subcortical segmentations from these two datasets. The main advantage of these two datasets is that manually delineated structures are available that can be used as the gold standard to examine the accuracy of the segmentation outcomes. The demographics for the IBSR and LONI databases are indicated in Table I.
E. Implementation
We used the leave-one-out cross-validation strategy in our experiments by interchanging the roles of query images and atlases in a repetitive fashion. The simulated MNI brain image (nonlinear MNI152) provided by MNI [32] was used to define the standard template in the preprocessing steps. To generate masks for the subcortical structures, the MNI152 subcortical mask was dilated using a cubic structuring element (3 × 3 × 3) and then the smallest cube that contained all of the voxels was selected as the subcortical mask. Fig. 4 shows the atlases and cubic subcortical masks used in these experiments. The preprocessing consisted of transforming all the IBSR and LPBA40 images (MR image and labels) to the MNI152 standard space using the FLIRT algorithm with the following settings: nine degrees of freedom and normalized mutual information as the similarity measure, trilinear interpolation for atlas MR images, and nearest neighbor interpolation and zero padding size for atlas label images. Then, in the first step of each method, the FLIRT algorithm was used to apply an affine transformation with the aforementioned settings and using correlation ratio as the similarity measure. In the second step of methods 3 and 4 for deformable registration, the FNIRT and ANTS algorithms were utilized. For the FNIRT algorithm, the standard (default) parameters with a hierarchical coarse-to-fine subsampling scheme of 8, 4, 2, 2 were used (general recommendations in the FNIRT documentation). The ANTS algorithm was used with the SyN transformation and Gaussian regularization with a sigma of 2. The optimization was performed with (30 × 99 × 11) iterations at each resolution step using histogram matching and cross correlation with a window radius of 2 and a weight of 1 as the similarity measure. These parameters are considered standard parameters of ANTS and were also used in [18].
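The mask-generation step described above (dilation with a 3 × 3 × 3 cubic structuring element followed by selection of the smallest enclosing cube) can be sketched in Python with NumPy as follows. This is an illustrative sketch only: it interprets "smallest cube" as the smallest axis-aligned box around the dilated mask, which is an assumption, and the function names `dilate_3x3x3` and `bounding_box_mask` are hypothetical.

```python
import numpy as np


def dilate_3x3x3(mask):
    """Binary dilation with a 3 x 3 x 3 cubic structuring element."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    nx, ny, nz = mask.shape
    out = np.zeros(mask.shape, dtype=bool)
    # OR together the 27 shifted copies of the mask.
    for dx in range(3):
        for dy in range(3):
            for dz in range(3):
                out |= padded[dx:dx + nx, dy:dy + ny, dz:dz + nz]
    return out


def bounding_box_mask(mask):
    """Smallest axis-aligned box containing every nonzero voxel.

    Assumption: this box stands in for the 'smallest cube' in the text.
    """
    mask = np.asarray(mask, dtype=bool)
    box = np.zeros(mask.shape, dtype=bool)
    coords = np.argwhere(mask)
    if coords.size == 0:
        return box  # empty input gives an empty mask
    lo = coords.min(axis=0)
    hi = coords.max(axis=0) + 1
    box[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] = True
    return box


if __name__ == "__main__":
    subcortical = np.zeros((7, 7, 7), dtype=bool)
    subcortical[3, 3, 3] = True
    roi = bounding_box_mask(dilate_3x3x3(subcortical))
    print(int(roi.sum()), "voxels in the ROI mask")
```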
In method 1, there was just a single step affine transformation and in methods 2 and 4, the registration was applied just to the cubic mask using the mask option of these algorithms while in method 3 the deformable registration was applied to the entire brain area.
F. Segmentation Metrics
An effective validation mechanism is an important part of any segmentation approach. Most of the commonly used metrics are based on the overlap concept and actually compare the degree of overlap of two binary labels. The most popular metric in this family is Dice overlap metric [39] defined as follows:
$$D = \frac{2 \times \mathrm{vol}(A \cap Q)}{\mathrm{vol}(A) + \mathrm{vol}(Q)}, \qquad 0 \le D \le 1 \tag{1}$$
where vol indicates the volume of the structure. Actually, Dice is a specific metric from a more general family of overlap metrics. It provides the amount of overlap in the intersection of the atlas label and query label. In a perfect situation, this metric becomes 1. The other widely used metric is relative volume difference (RV) that indicates the difference in volume of the atlas label and query label and is defined as follows:
$$RV = \frac{2 \times |\mathrm{vol}(A) - \mathrm{vol}(Q)|}{\mathrm{vol}(A) + \mathrm{vol}(Q)}, \qquad 0 \le RV \le 1 \tag{2}$$
where |vol(A) − vol(Q)| is the absolute difference between the two volumes. In a perfect situation, RV becomes 0. FN and FP are other metrics that are used to assess segmentation outcomes. FN for a given volume is an indication of how many of the voxels in that volume are incorrectly labeled. It is defined as follows:
$$FN = \frac{\mathrm{vol}(A) \setminus \mathrm{vol}(Q)}{\mathrm{vol}(A)}, \qquad 0 \le FN \le 1 \tag{3}$$
where vol(A)\vol(Q) indicates the set of voxels in vol(A) but not in vol(Q). In a perfect situation, FN is 0. FP for a given volume is a measure of how many of the voxels outside that volume are labeled incorrectly as compared to the voxels inside the volume. It is defined as follows:
$$FP = \frac{\mathrm{vol}(A) \setminus \mathrm{vol}(Q)}{\mathrm{vol}(Q)}, \qquad 0 \le FP \le 1. \tag{4}$$
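The first two metrics, (1) and (2), can be illustrated with a short Python sketch. It assumes that both labels are binary arrays on the same voxel grid with equal voxel size, so that a voxel count can stand in for vol; the function names are hypothetical, and raising an error when both labels are empty is a choice made here, not something stated in the text.

```python
import numpy as np


def dice_overlap(a, q):
    """Dice overlap of two binary labels, following (1)."""
    a = np.asarray(a, dtype=bool)
    q = np.asarray(q, dtype=bool)
    total = int(a.sum()) + int(q.sum())
    if total == 0:
        raise ValueError("both labels are empty")
    return 2.0 * int(np.logical_and(a, q).sum()) / total


def relative_volume_difference(a, q):
    """Relative volume difference of two binary labels, following (2)."""
    vol_a = int(np.asarray(a, dtype=bool).sum())
    vol_q = int(np.asarray(q, dtype=bool).sum())
    if vol_a + vol_q == 0:
        raise ValueError("both labels are empty")
    return 2.0 * abs(vol_a - vol_q) / (vol_a + vol_q)


if __name__ == "__main__":
    atlas_label = np.zeros((4, 4, 4), dtype=bool)
    query_label = np.zeros((4, 4, 4), dtype=bool)
    atlas_label[0:2] = True   # 32 voxels
    query_label[1:3] = True   # 32 voxels, 16 of them shared
    print(dice_overlap(atlas_label, query_label))
    print(relative_volume_difference(atlas_label, query_label))
```

The example labels have equal volume but only half of their voxels in common, which shows why a volume-based metric alone cannot detect a misplaced label and is reported alongside an overlap metric.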
In a perfect situation, FP is 0. Another useful metric in comparing the boundary discrepancies between segmented structures is the average SD error. SD is equal to the minimum distance of each point of the surface boundary of an atlas structure to all boundary voxels of a query image structure averaged across all the boundary voxels as follows:
$$SD = \frac{1}{n} \sum_{i=1}^{n} \operatorname{min\,dist}(\mathit{Ab}_i, \mathit{Qb}), \qquad 0 \le SD \le D \tag{5}$$
where n is the number of voxels in the atlas boundary, min dist is a function providing the minimum distance between a voxel and a set of voxels, and Ab and Qb represent the atlas and query surface boundary voxels, respectively. Since this metric is not symmetric, it was made symmetric by using the method described in [41]. In a perfect situation, SD is zero and in a worst case it is D (greatest dimension of the image). The surface boundary here is considered to be the neighboring voxels within two voxels based on the city block distance. For each method, each algorithm and each structure, this process is done separately. The last metric used to evaluate the results is SE. The probability maps of the labels can be computed using each method. SE reveals the spatial variability between different atlases and a query using different registration methods. All of the labels of atlases are propagated to a query image based on each method and the probability maps are computed based on the segmentation outcomes as follows:
$$P(x, y, z) = \frac{1}{N} \sum_{i=1}^{N} L_i(x, y, z) \tag{6}$$
where P indicates the probability map for each structure, (x, y, z) represents the voxel coordinate, N is the number of images, and L is a binary image representing one of the structures. Then,
SE is computed based on the probability maps as follows:
$$SE = \mathrm{vol}(P > t), \qquad 0 \le SE \le V \tag{7}$$
where t is a specified threshold. Under a perfect situation, SE becomes V (volume of the structure), meaning that the registration method perfectly aligns two structures leaving no variability between structures. In the worst situation, it becomes zero. This metric was also used in [40] to evaluate the registration step in the FIRST algorithm. For each database, a query image is selected and the labels of the other atlases are propagated to that image based on each method. Then, the probability map of each structure is computed for each method. Based on the leave-one-out strategy, this process is iterated for each atlas in the dataset. Then, the SE results are computed and averaged for each method. The SE metric is computed here for P > 0.15 to exclude outliers.
G. Statistical Analysis
Testing for significance in the performance of the discussed segmentation methods is not straightforward because of the dependence of observations caused by reusing the images several times in the leave-one-out strategy. We design two tests for statistical analysis. First, we examine the dependence problem and compute the cross correlation (CC) between samples. To determine the degree of correlation across source-target registration pairs in each dataset, we generate four groups, and in each group we generate two columns each consisting of an image pair. The first group has the no-dependence configuration (a → b, c → d), meaning that each brain image is used only once, which results in four rows for the IBSR and ten rows for the LPBA datasets. The second group has the source and target dependence configuration (a → b, b → a), resulting in nine rows for the IBSR and 20 rows for the LPBA datasets. The third group has the source dependence configuration (a → b, a → c), resulting in 6 rows for the IBSR and 13 rows for the LPBA datasets, and finally the fourth group has the target dependence configuration (a → b, c → b), resulting in 6 rows for the IBSR and 13 rows for the LPBA datasets.
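The row counts quoted above are consistent with one simple reading, sketched here in Python: every image appears in at most one row, so a row consumes four, two, or three distinct images depending on the configuration, and the number of rows is the integer quotient of the dataset size by that number. This reading, the group keys, and the function `sample_rows` are assumptions made for illustration.

```python
import random

# Distinct images consumed by one row (two image pairs) in each group.
IMAGES_PER_ROW = {
    "none": 4,           # a -> b, c -> d
    "source_target": 2,  # a -> b, b -> a
    "source": 3,         # a -> b, a -> c
    "target": 3,         # a -> b, c -> b
}


def sample_rows(n_images, group, seed=None):
    """Draw rows of distinct image indices so that no image is reused."""
    rng = random.Random(seed)
    order = list(range(n_images))
    rng.shuffle(order)
    k = IMAGES_PER_ROW[group]
    return [tuple(order[i:i + k]) for i in range(0, n_images - k + 1, k)]


if __name__ == "__main__":
    for name, n_images in (("IBSR", 18), ("LPBA40", 40)):
        counts = {g: len(sample_rows(n_images, g, seed=0)) for g in IMAGES_PER_ROW}
        print(name, counts)
```

Calling `sample_rows` with a new seed yields a new pair set, which matches the idea of repeating the correlation computation over many random pair sets.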
For each group, we compute the Dice overlap metric for each registration method and compute the correlation between the left and right column. We repeat this procedure 1000 times using a new pair set each time and average the results. We refer to this test as our first statistical analysis test. In our second test, we apply two different statistical analysis methods including classical one-way ANOVA test and permutation test on the samples in the no-dependence group to avoid dependence of the results. Based on these tests we rank different registration methods. We apply the classical one-way ANOVA to the Dice overlap metric of each registration method of the no-dependence group and subject these results to a multiple comparison test using Bonferroni correction to determine which pairs of means are significantly different. We consider a disjoint 95% confidence interval about the mean based on the critical value from t-distribution. The ANOVA test is repeated 20 times, each time randomly selecting samples from the no-dependence group.
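A permutation test of the kind named above can be sketched as follows in Python. The sketch assumes two equally long vectors of Dice values obtained with two registration methods on the same image pairs, swaps a random subset of elements between the vectors in each round, and counts how often the new mean difference is strictly greater than the original one. The function name, the one-sided comparison, and the 50% swap probability are assumptions for illustration and are not taken from [18].

```python
import numpy as np


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Fraction of random swaps whose mean difference exceeds the observed one."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = x.mean() - y.mean()
    count = 0
    for _ in range(n_perm):
        # Choose a random subset of positions and swap them across vectors.
        swap = rng.random(x.size) < 0.5
        new_x = np.where(swap, y, x)
        new_y = np.where(swap, x, y)
        if new_x.mean() - new_y.mean() > observed:
            count += 1
    return count / n_perm


if __name__ == "__main__":
    # Hypothetical Dice values for two methods on the same four image pairs.
    method_a = [0.81, 0.78, 0.84, 0.80]
    method_b = [0.76, 0.77, 0.79, 0.75]
    print(permutation_p_value(method_a, method_b))
```

Unlike the ANOVA, this procedure makes no assumption about the shape of the distribution of Dice values, which is the usual reason for preferring it with small samples.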
There is one issue with the ANOVA test in our analysis: because of the limited number of samples in the datasets, the overlap values sometimes have a skewed distribution, making the p-values in the test inexact. In order to address this issue, we apply a permutation test following the permutation algorithm mentioned in [18] as follows: the subset of brain pairs from the no-dependence group from each registration method is selected. Then, for each pair of methods (two vectors of total Dice overlap values), first the mean difference of the Dice overlap values between the vectors is computed. Then, a subset of the elements from one of the vectors is selected and swapped across the other vector and the mean difference between the two new vectors is computed. This process is repeated 1000 times and the number of times (n) that the new mean difference is greater than the first mean difference is counted. The exact p-value is calculated as n/1000. The entire process is repeated 10 000 times and the fraction of times that p < 0.05 is noted. This test generates exact p-values and is more accurate than the ANOVA test. We refer to these ANOVA and permutation tests as our second statistical analysis tests.
III. RESULTS AND DISCUSSION
Experiments were carried out using leave-one-out cross-validation per dataset to assess the accuracy of single-atlas-based segmentation based on different registration methods using two registration algorithms. For each algorithm (ANTS or FSL) and each registration method, a total of 306 single-atlas-based segmentations were performed for the IBSR dataset and a total of 1560 for the LONI dataset. Fig. 5 shows the results based on the atlases in the IBSR dataset and using the FSL algorithm (FLIRT for affine and FNIRT for deformable registration) for nine different subcortical structures. Avg in the last column of each plot denotes the average of a metric for all the subcortical structures.
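The segmentation counts quoted above agree with using every ordered (atlas, query) pair of distinct images once, which is what interchanging the roles of atlas and query in a leave-one-out fashion produces. A small Python check, with a hypothetical helper name, is:

```python
from itertools import permutations


def atlas_query_pairs(n_images):
    """All ordered (atlas, query) pairs of distinct image indices."""
    return list(permutations(range(n_images), 2))


if __name__ == "__main__":
    print(len(atlas_query_pairs(18)))  # IBSR: 18 x 17
    print(len(atlas_query_pairs(40)))  # LPBA40: 40 x 39
```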
In Fig. 5(a), the Dice overlap metric is shown. Using method 4, all the substructures consistently showed better segmentation accuracy in terms of this metric. Fig. 5(b) shows the results for the relative volume error and Fig. 5(c) and (d) displays the FN and FP errors, respectively. On average, method 4 produced the best accuracy or the least amounts of FN and FP errors. To support the effects of different registration methods regardless of the algorithm used for the registration steps, the same process was repeated using a different algorithm for the registration steps. Fig. 6 shows the results for the previous metrics based on the atlases in the IBSR dataset and using the ANTS algorithm (affine and deformable registration) for nine different subcortical structures. Fig. 6(a) shows the Dice overlap metric results. As in the previous experiments, with the ANTS tool, method 4 generated better overlap metrics. Fig. 6(b)–(d) illustrates the relative volume error, FN and FP metrics, respectively. Method 4 produced better results consistently. Fig. 7 represents the results based on the atlases in the LPBA dataset using the FSL algorithm (FLIRT for affine and FNIRT for deformable) for three different subcortical structures. In Fig. 7(a), the Dice overlap metric is displayed. Fig. 7(b) shows the relative volume
Fig. 5. Segmentation metrics: (a) Dice, (b) relative volume error, (c) FN, and (d) FP for single-atlas based segmentation based on four registration methods (FSL algorithm) on the IBSR dataset with 306 brain image labels.
Fig. 6. Segmentation metrics: (a) Dice, (b) relative volume error, (c) FN, and (d) FP for single-atlas based segmentation based on four registration methods (ANTS algorithm) on the IBSR dataset with 306 brain image labels.
Fig. 7. Segmentation metrics: (a) Dice, (b) relative volume error, (c) FN, and (d) FP for single-atlas based segmentation based on four registration methods (FSL algorithms) for the LPBA dataset with 1560 brain image labels.
Fig. 8. Segmentation metric average surface distance. (a) Based on FSL algorithms. (b) Based on ANTS algorithms for the IBSR dataset.
error results. On average, method 3 generated better accuracy in terms of this metric. Fig. 7(c) and (d) shows the FN and FP errors, respectively. On average, method 3 produced better outcomes. Figs. 8 and 9 present the outcomes for the average SD and SE metrics, respectively. As can be seen from these figures, methods 3 and 4 generate less variability in SD and SE, indicating that they remain effective across the image variability in the datasets. Fig. 10 gives a more detailed statistical comparison of registration methods 3 and 4 in the form of box-and-whisker plots of the Dice metric, using the IBSR dataset and the FNIRT algorithm. Each box represents values obtained by atlas-based segmentation and has lines at the lower quartile, median, and upper quartile values; whiskers extend from each end of the box to the most extreme values within 1.5 times the interquartile range from the box. Outliers (+) have values beyond the ends of the whiskers. As can be seen, for nearly every subcortical structure, the median and 75th percentile of the segmentation accuracy obtained using registration method 4 exceed those obtained using registration method 3. This shows that applying the second deformable registration to the subcortical structures results in better registration and consequently better segmentation. These results are in agreement with the results presented in [27], where it was reported that the temporal lobe segmentation outcome was more accurate when the registration was
Fig. 9. Segmentation metric spatial extent. (a) Based on FSL algorithms. (b) Based on ANTS algorithms for the IBSR dataset.
Fig. 10. Atlas-based segmentation based on registration methods 3 and 4: For every structure the whisker plot to the left of the dotted line corresponds to method 3 and the one to the right to method 4 (IBSR dataset).
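The box-and-whisker convention used in Fig. 10 (a box at the quartiles, whiskers reaching the most extreme values within 1.5 times the interquartile range, outliers beyond the whiskers) can be sketched as follows. This is a minimal illustration rather than the plotting code used in the paper: the function name `box_stats`, the inclusive quartile convention, and the sample Dice values are all assumptions made for the example.

```python
import statistics


def box_stats(values):
    """Summarize scores the way a box-and-whisker plot displays them.

    Assumption: quartiles use the 'inclusive' interpolation rule of
    statistics.quantiles; plotting tools may use a slightly different rule.
    """
    data = sorted(values)
    # With n=4 the function returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data values inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical Dice values for one structure (not taken from the paper).
example_dice = [0.70, 0.72, 0.74, 0.75, 0.76, 0.78, 0.80, 0.40]
print(box_stats(example_dice))
```

Comparing the `median` and `q3` entries of two such summaries, one per registration method, corresponds to the per-structure comparison read off the plots.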
applied to the brain temporal lobe. For further illustration, Fig. 11 shows the Dice overlap metric for registration method 4 based on the ANTS and FSL algorithms using the IBSR dataset. As can be seen from this figure, in particular for the hippocampus and lateral ventricle structures, the median segmentation accuracy using the ANTS algorithm exceeds the 75th percentile of the segmentation outcomes using the FSL algorithm. In fact, for all the subcortical structures, the use of the ANTS algorithm leads to better Dice overlap, which clearly demonstrates the effect of the registration algorithm on atlas-based segmentation. These results also agree with the registration results presented in [18] regarding the accuracy of deformable registration algorithms. We applied the two statistical analysis tests discussed in Section II-G to examine the statistical significance
Fig. 11. Atlas-based segmentation based on the FSL and ANTS algorithms: for every structure, the whisker plot to the left corresponds to FSL and the one to the right to ANTS (IBSR dataset).
TABLE II CC BETWEEN THE SAMPLES OF FOUR GROUPS FOR TESTING THE INDEPENDENCE OF OBSERVATIONS ON DICE METRIC
TABLE III CLASSICAL ONE-WAY ANOVA TEST RANKING OF THE REGISTRATION METHODS (μ = MEAN, STDEV = STANDARD DEVIATION) ON DICE METRIC
TABLE IV PERMUTATION TEST RANKING OF THE REGISTRATION METHODS (μ = MEAN, STDEV = STANDARD DEVIATION) ON DICE METRIC
of the differences in segmentation results. Table II shows the outcome of our first statistical analysis test. As can be seen, for the no-dependence condition (group 1), the correlation values were close to zero, as expected. For the other conditions, there were high correlations between the results. Two conclusions can be drawn from this table. First, there was no symmetry between groups 3 and 4, showing that the correlations under source dependence and target dependence were not the same; moreover, for both datasets, source dependence produced a higher correlation than target dependence. Second, the correlation of the dependent groups was lower for the LPBA dataset than for the same groups of the IBSR dataset, denoting lower dependence between the brain images in the LPBA dataset than between those in the IBSR dataset. The results of our second statistical analysis tests, i.e., the rankings based on ANOVA and the permutation test, are shown in Tables III and IV, respectively.
As can be seen from this table, method 3 lies within the first standard deviation of method 4 for the IBSR dataset, whereas for the LPBA dataset the difference is relatively significant, which indicates a significantly improved Dice overlap metric when using method 4. For the other metrics, the difference between methods 3 and 4 was not significant; however, the differences between methods 1 or 2 and methods 3 or 4 were significant.
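To make the idea behind a permutation-based comparison concrete, the following sketch runs a generic two-sample permutation test on the mean Dice scores of two registration methods. It is not necessarily the exact procedure of Section II-G: the function name `permutation_test`, the number of permutations, and the sample values are assumptions for illustration, and the test presumes exchangeable observations, which the correlations in Table II suggest holds only for independent samples.

```python
import random


def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Returns an estimated p-value for the null hypothesis that the two
    groups of scores come from the same distribution.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign the pooled scores to the two groups.
        rng.shuffle(pooled)
        perm_a = pooled[:len(a)]
        perm_b = pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        # Small tolerance guards against floating-point summation order.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical Dice scores for two methods (not taken from the paper).
dice_method3 = [0.60, 0.62, 0.61, 0.63, 0.59, 0.64]
dice_method4 = [0.80, 0.82, 0.81, 0.83, 0.79, 0.84]
print(permutation_test(dice_method4, dice_method3))
```

A small p-value indicates that a difference in mean Dice as large as the observed one rarely arises when the method labels are shuffled, which is the sense in which a difference between two methods is called significant.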
IV. CONCLUSION
The outcomes of atlas-based segmentation of subcortical brain structures based on four different registration strategies were examined in this paper. The evaluation was performed on two datasets using six segmentation metrics. The results obtained from a large number of experiments, using statistical tests with a leave-one-out cross-validation strategy, have shown that for segmentation of subcortical brain structures, it is more effective to use a two-step registration consisting of an affine transformation applied to the entire brain area followed by a deformable transformation applied only to the subcortical structures. The results indicate that the FLIRT algorithm performed slightly better than ANTS when a single affine transformation was applied to the entire brain area, while ANTS performed better than FLIRT when applying an affine transformation to the subcortical structures. The deformable registration of ANTS also performed better than FNIRT. On average, ANTS also performed better than FSL in terms of the relative volume error, FN, FP, SD, and SE metrics. Finally, a ranking of the registration methods has been provided based on the statistical tests on the segmentation results.
In summary, the affine registrations of the different algorithms (FLIRT and ANTS) perform more or less the same, while ANTS performs better in deformable registration. Thus, it is recommended to use ANTS for atlas-based segmentation of subcortical structures. Overall, based on a large number of experiments involving different methods, algorithms, datasets, and metrics, this study reports the segmentation results and indicates that a two-step registration consisting of an affine transformation applied to the entire brain area followed by a deformable transformation applied only to the subcortical structures is more effective than the other registration methods and generates the most accurate labels for the subcortical brain structures.
REFERENCES
[1] C. Broit, “Optimal registration of deformed images,” Ph.D. dissertation, Dept. of Comput. and Informat. Sci., Univ. of Pennsylvania, Philadelphia, 1981.
[2] J. C. Gee, M. Reivich, and R. Bajcsy, “Elastically deforming a three dimensional atlas to match anatomical brain images,” J. Comput. Assist. Tomogr., vol. 17, no. 2, pp. 225–236, 1993.
[3] M. I. Miller, G. E. Christensen, Y. Amit, and U. Grenander, “Mathematical textbook of deformable neuroanatomies,” Proc. Natl. Acad. Sci. USA, vol. 90, no. 24, pp. 11944–11948, 1993.
[4] D. L. Collins, C. J. Holmes, T. M. Peters, and A. C. Evans, “Automatic 3-D model-based neuroanatomical segmentation,” Hum. Brain Mapp., vol. 3, no. 3, pp. 190–208, 1995.
[5] W. R. Crum, R. I. Scahill, and N. C. Fox, “Automated hippocampal segmentation by regional fluid registration of serial MRI: validation and application in Alzheimer’s disease,” NeuroImage, vol. 13, no. 5, pp. 847–855, 2001.
[6] D. W. Shattuck, M. Mirza, V. Adisetiyo, C. Hojatkashani, G. Salamon, K. L. Narr, R. A. Poldrack, R. M. Bilder, and A. W. Toga, “Construction of a 3D probabilistic atlas of human cortical structures,” NeuroImage, vol. 39, pp. 1064–1080, 2007.
[7] D. V. Iosifescu, M. E. Shenton, S. K. Warfield, R. Kikinis, J. Dengler, F. A. Jolesz, and R. W. McCarley, “An automated registration algorithm for measuring MRI subcortical brain structures,” NeuroImage, vol. 6, no. 1, pp. 13–25, 1997.
[8] P. D’Haese, V. Duay, T. Merchant, B. Macq, and B. Dawant, “Atlas-based segmentation of the brain for 3-dimensional treatment planning in children with infratentorial ependymoma,” in Proc. 6th Int. Conf. Med. Image Comput. Comput.-Assisted Intervention (MICCAI), 2003, pp. 627–634.
[9] C. Svarer, K. Madsen, S. Hasselbalch, L. Pinborg, S. Haugbol, V. Frokjaer, S. Holm, O. Paulson, and G. Knudsen, “MR-based automatic delineation of volumes of interest in human brain PET images using probability maps,” NeuroImage, vol. 24, no. 4, pp. 969–979, 2005.
[10] T. Rohlfing, R. Brandt, R. Menzel, and C. Maurer, “Evaluation of atlas selection strategies for atlas-based image segmentation with application to confocal microscopy images of bee brains,” NeuroImage, vol. 21, no. 4, pp. 1428–1442, 2004.
[11] R. Heckemann, J. Hajnal, P. Aljabar, D. Rueckert, and A. Hammers, “Automatic anatomical brain MRI segmentation combining label propagation and decision fusion,” NeuroImage, vol. 33, no. 1, pp. 115–126, 2006.
[12] A. Klein and J. Hirsch, “Mindboggle: A scatterbrained approach to automate brain labeling,” NeuroImage, vol. 24, no. 2, pp. 261–280, 2005.
[13] S. Warfield, K. Zou, and W. Wells, “Simultaneous truth and performance level estimation (STAPLE): An algorithm for the validation of image segmentation,” IEEE Trans. Med. Imag., vol. 23, no. 7, pp. 903–921, 2004.
[14] T. Rohlfing, D. Russakoff, and C. Maurer, “Performance-based classifier combination in atlas-based image segmentation using expectation-maximization parameter estimation,” IEEE Trans. Med. Imag., vol. 23, no. 8, pp. 983–994, 2004.
[15] P. Aljabar, R. Heckemann, A. Hammers, J. Hajnal, and D. Rueckert, “Classifier selection strategies for label fusion using large atlas databases,” in Proc. 10th Int. Conf. Med. Image Comput. Comput.-Assisted Intervention (MICCAI), Lecture Notes in Computer Science, 2007, vol. 4791, pp. 523–531.
[16] P. Aljabar, R. A. Heckemann, A. Hammers, J. V. Hajnal, and D. Rueckert, “Multi-atlas based segmentation of brain images: Atlas selection and its effect on accuracy,” NeuroImage, vol. 46, pp. 726–738, 2009.
[17] A. Gholipour, N. Kehtarnavaz, R. Briggs, M. Devous, and K. Gopinath, “Brain functional localization: A survey of image registration techniques,” IEEE Trans. Med. Imag., vol. 26, pp. 427–451, 2007.
[18] A. Klein, J. Andersson, B. A. Ardekani, J. Ashburner, B. Avants, M. C. Chiang, G. E. Christensen, D. L. Collins, J. Gee, P. Hellier, J. H. Song, M. Jenkinson, C. Lepage, D. Rueckert, P. Thompson, T. Vercauteren, R. P. Woods, J. J. Mann, and R. V. Parsey, “Evaluation of 14 nonlinear deformation algorithms applied to human brain MRI registration,” NeuroImage, vol. 46, no. 3, pp. 786–802, 2009.
[19] S. Yousefi, N. Kehtarnavaz, K. Gopinath, and R. Briggs, “Two-stage registration of substructures in magnetic resonance brain images,” in Proc. IEEE Int. Conf. Image Process. (ICIP), 2009, pp. 1729–1732.
[20] S. Yousefi, N. Kehtarnavaz, A. Gholipour, K. Gopinath, and R. Briggs, “Comparison of atlas-based segmentation of subcortical structures in magnetic resonance brain images,” in Proc. IEEE Southwest Symp. Image Anal. Interpretation, Austin, TX, 2010, pp. 1–4.
[21] M. Chupin, A. R. Mukuna-Bantumbakulu, D. Hasboun, E. Bardinet, S. Baillet, S. Kinkingnéhun, L. Lemieux, B. Dubois, and L. Garnero, “Anatomically constrained region deformation for the automated segmentation of the hippocampus and the amygdala: Method and validation on controls and patients with Alzheimer’s disease,” NeuroImage, vol. 34, pp. 996–1019, 2007.
[22] B. Avants, P. Yushkevich, J. Pluta, D. Minkoff, M. Korczykowski, J. Detre, and J. Gee, “The optimal template effect in hippocampus studies of diseased populations,” NeuroImage, vol. 49, pp. 2457–2466, 2009.
[23] M. Ystad, T. Eichele, A. J. Lundervold, and A. Lundervold, “Subcortical functional connectivity and verbal episodic memory in healthy elderly: A resting state fMRI study,” NeuroImage, vol. 52, pp. 379–388, 2010.
[24] J. U. Blackford, J. W. Buckholtz, S. N. Avery, and D. H. Zald, “A unique role for the human amygdala in novelty detection,” NeuroImage, vol. 50, pp. 1188–1193, 2010.
[25] P. A. Yushkevich, B. B. Avants, J. Pluta, S. R. Das, D. Minkoff, D. Mechanic-Hamilton, S. Glynn, S. Pickup, W. Liu, J. C. Gee, M. Grossman, and J. A. Detre, “A high-resolution computational atlas of the human hippocampus from postmortem magnetic resonance imaging at 9.4 T,” NeuroImage, vol. 44, pp. 385–398, 2009.
[26] M. M. Chakravarty, A. F. Sadikot, J. Germann, P. Hellier, G. Bertrand, and D. L. Collins, “Comparison of piece-wise linear, linear, and nonlinear atlas-to-patient warping techniques: Analysis of the labeling of subcortical nuclei for functional neurosurgical applications,” Hum. Brain Mapp., vol. 30, pp. 3574–3595, 2009.
[27] M. A. Yassa and C. E. L. Stark, “A quantitative evaluation of cross participant registration techniques for MRI studies of the medial temporal lobe,” NeuroImage, vol. 44, pp. 319–327, 2009.
[28] A. R. Khan, L. Wang, and M. F. Beg, “FreeSurfer-initiated fully-automated subcortical brain segmentation in MRI using large deformation diffeomorphic metric mapping,” NeuroImage, vol. 41, no. 3, pp. 735–746, 2008.
[29] B. Patenaude, S. Smith, D. Kennedy, and M. Jenkinson, “Bayesian shape and appearance models,” FMRIB, Oxford, U.K., Tech. Rep. TR07BP1, 2007, pp. 1–23.
[30] D. Rueckert, L. Sonoda, C. Hayes, D. Hill, M. Leach, and D. Hawkes, “Non-rigid registration using free-form deformations: Application to breast MR images,” IEEE Trans. Med. Imag., vol. 18, no. 8, pp. 712–721, 1999.
[31] D. L. Collins, P. Neelin, T. M. Peters, and A. C. Evans, “Automatic 3D intersubject registration of MR volumetric data in standardized Talairach space,” J. Comput. Assist. Tomogr., vol. 18, pp. 192–205, Apr. 1994.
[32] R. P. Woods, S. T. Grafton, C. J. Holmes, S. R. Cherry, and J. C. Mazziotta, “Automated image registration: I. General methods and intrasubject, intramodality validation,” J. Comput. Assist. Tomogr., vol. 22, pp. 139–152, 1998.
[33] J. Ashburner, “A fast diffeomorphic image registration algorithm,” NeuroImage, vol. 38, pp. 95–113, 2007.
[34] B. B. Avants, C. L. Epstein, M. Grossman, and J. C. Gee, “Symmetric diffeomorphic image registration with cross-correlation: Evaluating automated labeling of elderly and neurodegenerative brain,” Med. Image Anal., vol. 12, pp. 26–41, 2008.
[35] M. Jenkinson and S. Smith, “A global optimisation method for robust affine registration of brain images,” Med. Image Anal., vol. 5, pp. 143–156, 2001.
[36] B. Ardekani, M. Braun, B. F. Hutton, I. Kanno, and H. Iida, “A fully automatic multimodality image registration algorithm,” J. Comput. Assist. Tomogr., vol. 19, pp. 615–623, Aug. 1995.
[37] T. Vercauteren, X. Pennec, A. Perchant, and N. Ayache, “Non-parametric diffeomorphic image registration with the demons algorithm,” Med. Image Comput. Comput.-Assisted Intervention (MICCAI), vol. 4792, pp. 319–326, 2007.
[38] J. Andersson, S. Smith, and M. Jenkinson, “FNIRT: FMRIB’s nonlinear image registration tool,” in Proc. 14th Annual Meeting of the Organization for Human Brain Mapping, Victoria, Australia, 2008.
[39] L. Dice, “Measures of the amount of ecologic association between species,” Ecology, vol. 26, p. 297, 1945.
[40] B. Patenaude, “Bayesian statistical models of shape and appearance for subcortical brain segmentation,” D.Phil. thesis, Dept. Neurosci., Univ. of Oxford, Oxford, U.K., 2007.
[41] G. Gerig, M. Jomier, and M. Chakos, “Valmet: A new validation tool for assessing and improving 3D object segmentation,” Med. Image Comput. Comput.-Assisted Intervention (MICCAI), pp. 516–523, 2001.
Authors’ photographs and biographies not available at the time of publication.