IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, VOL. 12, NO. 5, SEPTEMBER 2008
A 3-D Active Shape Model Driven by Fuzzy Inference: Application to Cardiac CT and MR Hans C. van Assen, Mikhail G. Danilouchkine, Martijn S. Dirksen, Johan H. C. Reiber, Fellow, IEEE, and Boudewijn P. F. Lelieveldt, Member, IEEE
Abstract—Manual quantitative analysis of cardiac left ventricular function using multislice CT and MR is arduous because of the large data volume. In this paper, we present a 3-D active shape model (ASM) for semiautomatic segmentation of cardiac CT and MR volumes, without the requirement of retraining the underlying statistical shape model. A fuzzy c-means based fuzzy inference system was incorporated into the model. Thus, relative gray-level differences instead of absolute gray values were used for classification of 3-D regions of interest (ROIs), removing the necessity of training different models for different modalities/acquisition protocols. The 3-D ASM was evaluated using 25 CT and 15 MR datasets. Automatically generated contours were compared to expert contours in 100 locations. For CT, 82.4% of epicardial contours and 74.1% of endocardial contours had a maximum error of 5 mm along 95% of the contour arc length. For MR, those numbers were 93.2% (epicardium) and 91.4% (endocardium). Volume regression analysis revealed good linear correlations between manual and semiautomatic volumes, r² ≥ 0.98. This study shows that the fuzzy inference 3-D ASM is a robust and promising instrument for semiautomatic cardiac left ventricle segmentation. Without retraining its statistical shape component, it is applicable to routinely acquired CT and MR studies.
Index Terms—3-D, active shape model, CT, fuzzy c-means (FCM), fuzzy inference, MR, modality independence, segmentation.
I. INTRODUCTION
In cardiac imaging, multislice multiphase CT and MR are increasingly used for cardiac left ventricle (LV) function analysis, yielding large amounts of 3-D image data. Segmentation of such dynamic medical image data is a necessary step for quantification of global and regional LV function in terms of stroke volume, ejection fraction, wall thickness, and wall thickening. Manual segmentation of these datasets is a tedious,
Manuscript received January 30, 2007; revised July 31, 2007. First published May 30, 2008; current version published September 4, 2008. H. C. van Assen was with the Division of Image Processing, Department of Radiology, Leiden University Medical Center, 2300 RC Leiden, The Netherlands. He is now with Biomedical Image Analysis, Faculty of Biomedical Engineering, Technical University Eindhoven, 5600 MB Eindhoven, The Netherlands. M. G. Danilouchkine was with the Division of Image Processing, Department of Radiology, Leiden University Medical Center, 2300 RC Leiden, The Netherlands. He is now with the Department of Cardiology, Erasmus Medical Center, 3015 CE Rotterdam, The Netherlands. M. S. Dirksen is with the Department of Radiology, Leiden University Medical Center, 2300 RC Leiden, The Netherlands. J. H. C. Reiber and B. P. F. Lelieveldt are with the Division of Image Processing, Department of Radiology, Leiden University Medical Center, 2300 RC Leiden, The Netherlands (e-mail:
[email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TITB.2008.926477
labor-intensive, and practically infeasible task. An automatic and intrinsically 3-D segmentation method is highly desired. Region- and edge-based low-level image processing techniques are generally not robust enough for application in clinical practice. Such low-level methods expect clear well-defined image features, while in medical image data fuzzy edges, inhomogeneity and noise are commonly observed, causing segmentation failures. Because of this, 3-D medical image segmentation tasks require a priori knowledge that the classical segmentation methods lack, e.g., knowledge about organ shape and intensity properties. In recent years, much work on knowledge- and model-driven segmentation has been described. Examples of such methods are deformable statistical models, which enable incorporation of a priori knowledge about statistically plausible organ shapes. These models are built using a training set of expert segmentations of an organ. Examples of models used for segmentation in a statistical framework are as follows: point distribution models (PDMs) [1]–[3], active shape models (ASMs) [4]–[6], active appearance models (AAMs) [7], shape parameterizations based on spherical harmonics [8], constrained level sets [9], and statistical deformation models [10], [11]. A more elaborate introduction on segmentation methods, including considerations on their usefulness and limitations with respect to medical image segmentation can be found in [12]. A review on 3-D cardiovascular image analysis can be found in [13]. Although the shape of the modeled anatomical organ remains identical, its tomographic appearance may substantially vary across radiological modalities used in clinical practice. For example, cardiac images of high quality with easily identifiable morphological structures are readily obtained on the modern high-field MRI systems. 
This allows for more accurate delineation of the ventricular boundaries and leads to reproducible quantitative results of cardiac global function [14]. However, an inherently high noise level and signal dropouts impair detection of the boundary location in ultrasonic imagery. This makes the manual delineation of the cardiac structures for the purpose of shape modeling a subjective procedure and introduces undesirable noise in the shape variation statistics. Also, construction of a statistical model of normal anatomy from CT data is difficult because CT exposes subjects to ionizing radiation. Hence, it is appealing to be able to build up a statistical model of the anatomical organs from the high-quality datasets obtained with MRI and use it interchangeably for recognition tasks on data from a wide variety of radiological scanners. The main contribution of this research is the development of a 3-D ASM that performs comparably on both MR and CT
1089-7771/$25.00 © 2008 IEEE
datasets, without retraining the model (i.e., without collecting expert contours in a large number of new datasets). Therefore, we required a method that minimizes dependence on image modality; this excludes the incorporation of a gray-level model (GLM), as in the “classic” ASM and active appearance model (AAM) definitions by Cootes et al. [4], [7]. The method presented here builds on previous work, which was presented as a proof of concept using only nine CT datasets [15]. This paper gives deeper theoretical insight, demonstrates modality independence, and evaluates the method more thoroughly in a larger quantitative study (using border positioning errors, volumetric errors, and clinical contour quality analysis) on both MR and CT data.
II. BACKGROUND
ASMs consist of two parts: the statistical shape model and the model-matching algorithm.
Fig. 1. Line parameterization, defining the point correspondence for the 3-D cardiac left ventricle.
A. Shape Modeling
To generate a point distribution model, the shape of interest is described using a set of n sample points. Shape differences can only be modeled if a point correspondence has been established, according to which the sample point coordinates are inserted into a shape vector

$$\mathbf{x} = (x_1, y_1, z_1, \ldots, x_n, y_n, z_n)^T. \tag{1}$$

From all shapes $\mathbf{x}_i$ in the training set, a mean shape vector $\bar{\mathbf{x}}$ and a covariance matrix are calculated, followed by principal component analysis (PCA) on the covariance matrix to compute the eigenvectors and eigenvalues of the training set. Depending on the amount of variation in the training set represented by the model, the eigenvectors $\phi_i$ corresponding to the t largest eigenvalues are retained, represented by matrix $\Phi = (\phi_1 \mid \phi_2 \mid \cdots \mid \phi_t)$. From this matrix, a shape can be approximated as

$$\mathbf{x} \approx \bar{\mathbf{x}} + \Phi \mathbf{b} \tag{2}$$

with $\mathbf{b}$ a t-dimensional vector containing the model parameters

$$\mathbf{b} = \Phi^T (\mathbf{x} - \bar{\mathbf{x}}). \tag{3}$$

More elaborate background information on ASMs can be found in [1] and [4].
B. Model Matching
Usually, the matching algorithm (in ASMs and AAMs) is an iterative approach, in which the model is aligned and deformed to match image evidences from the surroundings of the actual model state. During matching, two shape vectors exist, one representing the mean shape and one representing the proposed shape. From the mean and proposed shapes, the parameter vector $\mathbf{b}$ controlling model deformation is calculated using (3), with $\mathbf{x}$ being the vector representing the proposed shape and $\bar{\mathbf{x}}$ the vector representing the mean shape of the shape model. The allowed shape instances are limited by the statistical shape description from the training set.
III. 3-D ASM
This section describes our approach to 3-D ASM model construction. With regard to the matching algorithm, the fuzzy inference (FI)-based edge detection technique and the model update scheme are presented.
A. Model Generation
To define the parameterization representing a particular shape instance (in this case, LV end diastole), starting at the angle derived from the manually defined landmark (positioned at the posterior junction of the right- and left-ventricular myocardium in a mid-ventricular slice), each manual contour (i.e., both endo- and epicardial contours on all slices intersecting the LV) is sampled at equidistant angles with respect to the LV long axis, thus generating additional (pseudo-) landmarks. The resulting point sets from different subjects do not necessarily contain an equal number of slices. To resolve this, the resulting point sets were resampled so that all represent the same number of slices and, concurrently, the same number of landmarks. Each shape sample is then expressed as a 3n-element vector x [see (1)]. This yields a line parameterization with a specific point-to-point ordering, shown in Fig. 1. Using this parameterization, both the LV endo- and epicardial surfaces are concatenated into a single vector, and thus, modeled together. This application-specific point correspondence has also been used in a 3-D AAM presented earlier [16] and in our earlier work regarding the 3-D ASM [15]. After resampling, all shapes are aligned. For alignment of the training samples, Procrustes analysis [17] is applied. We adopted the method by Besl and McKay for registration of corresponding 3-D point sets [18]. The modes of variation of the model were computed by application of PCA on the covariance matrix of the shape vectors (see Section II-A).
B. Model Matching
1) 3-D Model Updates From 2-D Image Features: The key idea behind our matching approach is applicability to (anisotropically sampled) data acquired with arbitrary image slice orientations, and to data acquired with different imaging modalities. Because the orientation of CT image planes is always axial, it can be considered arbitrary with respect to the orientation of the heart.
To build 3-D volumetric data from multiple slices containing the anatomical organ, the information in the undersampled regions (i.e., in between the imaging planes) has to
VAN ASSEN et al.: 3-D ACTIVE SHAPE MODEL DRIVEN BY FUZZY INFERENCE
be interpolated. The interpolation quality strongly depends on the through-plane resolution. Moreover, interpolation between image slices with different orientations (e.g., a radial stack of cardiac long axis views) is ill-posed. The design requirement of applicability to arbitrarily oriented (and anisotropically sampled) image slices implies that only 2-D image data can be used for updating our 3-D model. To realize this, a 3-D triangular mesh representing the shape of the heart was constructed. The model update at each iteration is computed from a set of points, formed by intersection of the mesh edges with the imaging planes. To remove dependencies on image orientation or anisotropic through-plane resolution, model update information at a given intersection point is represented by a 2-D in-plane point-displacement vector. The 2-D update vectors located at the mesh edges are first copied to the closest nodes of the mesh and, to exploit the 3-D character of the model, projected onto the local surface normals. Multiple contributions from different mesh intersections to a single mesh node are averaged to yield a single 3-D update vector per node.
2) Shape Parameter Vector Updates: Scaling, rotation, and translation differences between the current state of the model and the point cloud representing the candidate updates are eliminated by alignment. The current mesh state is aligned with the candidate model state using the method of Besl and McKay [18] mentioned earlier. Subsequently, the parameter vector b controlling model deformation is calculated using (3). To obtain a plausible model update, each component of the vector b was limited to ±3σ of statistical variation.
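As a minimal sketch of the shape model in (1)-(3) and of the ±3σ limit on b described above, the following Python/NumPy code builds a PCA model from aligned training shape vectors and constrains a candidate shape to it. The function names, the use of an SVD in place of an explicit covariance eigendecomposition, and the reading of σ as the square root of each retained eigenvalue are assumptions made for illustration; none of them is taken from the authors' implementation.

```python
import numpy as np

def build_shape_model(shapes, variance_kept=0.99):
    """Build a PCA shape model from aligned shape vectors.

    shapes: array of shape (N, 3n), one shape vector as in (1) per row.
    Returns the mean shape, the matrix Phi (3n, t) and the t eigenvalues.
    """
    shapes = np.asarray(shapes, dtype=float)
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # SVD of the centered data gives the covariance eigenvectors (rows of vt)
    # and eigenvalues (squared singular values / (N - 1)), largest first.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigvals = s ** 2 / (shapes.shape[0] - 1)
    # Retain the smallest number of modes t explaining the requested variance.
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    t = min(int(np.searchsorted(cumulative, variance_kept)) + 1, eigvals.size)
    return mean, vt[:t].T, eigvals[:t]

def constrain_shape(x, mean, phi, eigvals, n_sigma=3.0):
    """Project a proposed shape onto the model and clamp each parameter."""
    b = phi.T @ (x - mean)                 # model parameters, as in (3)
    # Assumption: sigma_i = sqrt(lambda_i) for mode i.
    limit = n_sigma * np.sqrt(eigvals)
    b = np.clip(b, -limit, limit)          # keep each b_i within +/- 3 sigma
    return mean + phi @ b, b               # plausible shape, as in (2)
```

In an iterative matching loop, `constrain_shape` would be called once per iteration on the aligned candidate shape, so that the returned shape always stays within the variation seen in the training set.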
C. Edge Detection Using Fuzzy Inference
Fig. 2. (a) Schematic representation of the fuzzy inference system. On the left, a typical image patch is shown (this serves as input to the FIS). In the middle, the pixels are represented with their fuzzy memberships to tissue classes (different shades of gray, red, and green). On the right, a single strip is left after defuzzification and majority voting. (b) Candidate position interpolation. Eight classified image patches (after majority voting) aligned and positioned next to each other. In the third, fourth, and fifth patch from the left, the myocardium tissue extends to the end of the patch at the top. Candidate positions in these patches are generated by interpolation (white line) between the detected tissue transitions in the second and sixth patches from the left, in which candidates had been generated by the FIS.
In the classic ASM [4], model updates were generated using a (multiresolution) statistical GLM in each sample point. This and other methods, however, would require extensive retraining to a new population of training samples (including the collection of manual contours) when applied to a different modality [12], [19]. For generating model updates without the use of a GLM, we developed an alternative decision scheme based on a fuzzy inference system (FIS) [20] and using relative intensity differences. Like other update mechanisms, this FIS is invoked for every iteration in the model matching process. The basic FIS consists of three parts: 1) a rule base that contains a selection of fuzzy rules; 2) a dictionary that holds the definitions of the membership functions; 3) a reasoning mechanism that infers a conclusion based on given evidence. FIS operates on fuzzy or crisp variables. The fuzzy variable reflects a certainty of events via a continuous membership function. Its values are restricted between 0 and 1. A crisp variable, however, assumes a set of discrete values. The fuzzy and crisp variables are supplied as an input to FIS. The output, however, is always a fuzzy variable. A conversion of a fuzzy output into a crisp one is called defuzzification. The mapping from input to output is performed by a set of fuzzy if–then rules, where the antecedent part defines a fuzzy region in the input space,
and the consequence part defines a fuzzy region in the output space. In this paper, the Takagi–Sugeno FIS (TSFIS) or Sugeno model [20] was adopted because it incorporates defuzzification at the output step. The output of a TSFIS is always a linear combination of the inputs; a typical fuzzy rule of the TSFIS is of the form “if input x = A and input y = B; then output z = aA + bB + c.” In this paper, a zero-order TSFIS is used, for which the output corresponds to a zero-order polynomial, i.e., a constant (a = b = 0). By setting both a and b to zero, direct influence of absolute gray values at the inputs of the TSFIS on the output was effectively eliminated. The TSFIS is implemented as follows (see Fig. 2).
1) Input: For each intersection point between the mesh and each of the 2-D images, an image patch, centered on this point, is considered. Thus, a large number of patches from the stack of images are sampled (see Fig. 3). Patch size was selected such that myocardium pixels, pixels from the blood pool, and/or air in the lung are included in the patch.
2) Fuzzification: To determine the location of the transition from blood to myocardium or from myocardium to air, the gray values collected from the model surroundings are first classified. Since the model deforms and changes pose during the matching stage, the pool of gray values that is collected also changes. To ensure modality independence,
Fig. 3. Ring of image patches from a cardiac short axis image slice. In the left circle, the true pixel gray values are shown. On the right, a ring of image patches after classification by the Takagi–Sugeno-based fuzzy inference decision scheme is shown.
only relative gray value differences between blood, myocardium, and air are used. This makes fuzzy c-means (FCM) [21] clustering a suitable algorithm to distinguish between three classes, and the class transitions can be used as borders. FCM is an unsupervised clustering algorithm that has previously been applied to segmentation of short axis cardiac MR datasets [22], [23]. In our application, pixels are labeled by minimizing a generalized within-group sum of squared error objective function

$$J_m(U, V; X) = \sum_{k=1}^{n_p} \sum_{i=1}^{c} u_{ik}^{m} \, \lVert x_k - v_i \rVert_I^2 \tag{4}$$
where $u_{ik} : X \to [0, 1]$ defines the membership degree of a pixel $x_k$ to the ith cluster center $v_i$. The parameter m ranges from 1 to ∞ and controls the fuzziness of the resulting partition, the matrix I defines the norm (Euclidean), and $n_p$ is the number of pixels. The number of classes c used in the FCM was set to three: bright, dark, and medium bright, which represent blood pool, air, and myocardium, respectively. FCM was applied to all image patches extracted at the intersection rings from all images simultaneously.
3) Inference (see Figs. 2 and 4): For each pixel, three fuzzy membership degrees (FMDs) result from the earlier fuzzification. Based on the FMDs, the inference step is as follows:
a) for each pixel: if (gray value is bright) then pixel is blood pool; if (gray value is medium) then pixel is myocardium; if (gray value is dark) then pixel is air;
b) for each line [i.e., horizontal line in Fig. 2(a)]: if (majority of pixels is class i) then line is class i, else line is unclassified.
To find the endocardial border, the first transition from myocardium to blood pool is taken, going from the outside to the inside of the model. For the epicardial border, different rules apply at the septum part and at the rest of the model. At the septum, the same rule as used for the endocardial border applies, except for the fact that the direction in which to travel is reversed (going from the inside to the outside). At the rest of the model, the epicardial border is
found by looking for the first transition from myocardium to any other tissue while traveling from inside the model to outside the model. For each intersection of the mesh edges with an image slice (old position), the system aims to find the position of the transitions from blood to myocardium or from air to myocardium (new position). The old and new positions form the update vector for each mesh–image intersection. In case this system does not yield the sought tissue transition for a patch, the edge position is defined by interpolation between update positions from the nearest reliably processed image patches [see Fig. 2(b)].
Defuzzification (step 3a) can be performed, e.g., by assignment of pixels to the tissue class for which the pixel has the highest FMD. Thus, no parameters need tuning. However, hyper- and hypointense regions in our data required a more complex FIS. These regions contribute to an ill-balanced input population to the FIS. The severity of the balance problem depends heavily on the location of the 3-D ASM in the target dataset. For example, closer to the base of the LV, more blood is present in the images, whereas closer to the apex, myocardium and surrounding tissues dominate the spectrum. FCM clustering produces a balanced partitioning of the input population into a number of classes. However, a balanced partitioning does not always correspond to the reality in the input population, which is not necessarily balanced with respect to the classes present. In step 3a, this can be tackled by application of minimum membership thresholds for the different classes to favor classification to one class with respect to other classes in the defuzzification. Another solution is the definition of a gray-value threshold below or above which certain tissue types should not occur.
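The fuzzification, defuzzification, and per-line majority voting described above can be sketched in Python/NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the fuzziness value m = 2, the label −1 for "unclassified", the reading of "majority" as more than half of the pixels in a line, and the convention that class index 0 is the dark class are all hypothetical choices. The dark cutoff gray value is simply passed in by the caller.

```python
import numpy as np

def memberships(x, centers, m=2.0):
    """Fuzzy membership degrees u[i, k] of gray value x[k] to class i (m > 1)."""
    d2 = (x[None, :] - centers[:, None]) ** 2 + 1e-12  # squared distances
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=0)                       # each column sums to 1

def fuzzy_c_means(values, init_centers, m=2.0, eps=0.1, max_iter=100):
    """1-D FCM on gray values; stops when no center moves by eps or more."""
    x = np.asarray(values, dtype=float).ravel()
    v = np.asarray(init_centers, dtype=float).copy()
    for _ in range(max_iter):
        um = memberships(x, v, m) ** m
        v_new = (um @ x) / um.sum(axis=1)              # weighted class centers
        converged = np.max(np.abs(v_new - v)) < eps
        v = v_new
        if converged:
            break
    return v, memberships(x, v, m)

def defuzzify(values, u, dark_cutoff, min_fmd, dark=0):
    """Assign crisp labels; -1 marks pixels that stay unclassified."""
    x = np.asarray(values, dtype=float).ravel()
    min_fmd = np.asarray(min_fmd, dtype=float)
    labels = np.full(x.size, -1)
    is_dark = x < dark_cutoff
    labels[is_dark] = dark                             # below cutoff: always dark
    u_rest = u.copy()
    u_rest[dark] = -np.inf                             # above cutoff: never dark
    best = u_rest.argmax(axis=0)
    accept = ~is_dark & (u_rest.max(axis=0) >= min_fmd[best])
    labels[accept] = best[accept]
    return labels

def majority_vote(patch_labels):
    """Per-line class of a 2-D label patch; -1 if no class has a majority."""
    out = np.full(patch_labels.shape[0], -1)
    for r, row in enumerate(patch_labels):
        classes, counts = np.unique(row[row >= 0], return_counts=True)
        if counts.size and counts.max() > row.size / 2:
            out[r] = classes[counts.argmax()]
    return out
```

Because only the relative ordering of the class centers matters in `defuzzify`, the same code could be run on gray values from either modality once suitable initial centers are supplied.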
In this application, this threshold is defined as the gray value at a given proportion p between the class centers of the dark and the medium tissue class (resulting from step 2, fuzzification)

$$gv_t = gv_{\mathrm{dark}} + p \, (gv_{\mathrm{medium}} - gv_{\mathrm{dark}}) \tag{5}$$

where $gv_t$ denotes the gray value of the threshold, $gv_{\mathrm{dark}}$ and $gv_{\mathrm{medium}}$ denote the gray values of the class centers of the dark and medium classes, respectively, and p is the proportion that is used to move the threshold. Below the threshold, all pixels are dark; above it, dark cannot occur. If a pixel is not classified as dark, it is assigned to the tissue class with the maximum FMD, provided that it exceeds the minimum FMD threshold for that tissue. If no tissue can be assigned using the earlier rules, the pixel remains unclassified. This defuzzification is shown schematically for MR in Fig. 4. Besides the classification for MR, it shows the air tissue cutoff value for both MR and CT, according to (5).
Fig. 4. Defuzzification in the Takagi–Sugeno-based FIS, schematically. The bell-shaped curves signify membership degrees. Horizontal dotted lines are the thresholds above which the memberships have to be before a pixel can be assigned to the tissue considered. Vertical dotted lines are the cutoff gray values from the intersections of the bell curves with the thresholds. The vertical dashed lines represent the upper air limit from (5).
IV. EXPERIMENTAL SETUP
A. Training and Testing Data
For model training, expert drawn contours of a group of 53 patients and normal subjects from short axis 3-D MR data were used. During the manual delineation, the papillary muscles were considered as part of the blood pool. The apical end of each dataset was defined as the first slice that exhibits both a complete endocardial and epicardial border, i.e., the last slice where blood mass is still clearly visible. The basal end of each dataset was marked by the last slice that showed a complete circumferential endocardial contour. Thus, the mitral valve plane and the true apex were excluded, leaving both ends open. The shape parameterization presented in Section III-A was applied (see Fig. 1), where each sample was resampled to 16 slices, evenly spaced between the most basal slice and the most apical slice, each containing 32 points for the epicardial contour and 32 points for the endocardial contour. To reduce model dimensionality, the model was restricted to represent 99% of the shape variation present in the training data, resulting in 33 modes of variation.
To test the performance of the ASM, two clinical evaluation studies for two different modalities were performed. One dataset consisted of multislice CT cardiac LV short axis reconstructions (25 patients); the other consisted of MR cardiac LV short axis acquisitions (15 healthy subjects). Both datasets involved the end-diastolic phase of the cardiac cycle (see Fig. 5). The MR dataset was acquired with the balanced fast field echo (FFE) protocol using the Q-body coil (12 subjects) or a dedicated cardiac coil (3 subjects) on a Philips Gyroscan NT Intera, 1.5 T scanner (Philips Medical Systems, Best, The Netherlands). Images had in-plane resolution of approximately 1.5 mm/pixel and 6-mm slice thickness plus 4-mm slice spacing, typically requiring 10–11 slices to cover the entire cardiac LV. The CT datasets were acquired with multislice CT scanners from two different vendors, with slice thickness on the order of 2 mm, and in-plane resolution of approximately 0.5 mm/pixel. All CT datasets were reformatted to yield short axis image slices with slice thickness of approximately 5 mm and in-plane resolution of approximately 0.4–0.5 mm/pixel. The resulting datasets typically involved 16–28 image slices containing LV data.
Fig. 5. Short axis image slices typical for this evaluation. (a) CT. (b) MR.
B. Matching Parameters
Prior to matching, the model pose was initialized manually. First, the origin of the model (which is its center of mass) was positioned in the center of mass of the image data stack. Following that, a manual offset in model position and orientation was applied to the model. Model position was initialized inside the cardiac LV, more toward the LV base than the LV apex. Model scale was chosen so that the model was not larger than the actual LV in the image data. Model shape was initialized to cover a large number of slices (typically 7–11 for MRI, 14–21 for CT) from the start. For all CT and MR studies, the model shape was initialized identically, i.e., all with $b_1 = 3\sigma$ and $b_i = 0 \;\forall i > 1$. The setting for $b_1$ caused the model to start more elongated in the long axis direction to cover more slices. The class centers of the tissue classes used by FCM were initialized identically for each dataset (but differently for CT and MRI) and for each iteration of the model matching process. The stop criterion of FCM was ε = 0.1. This means that FCM iteration stopped when the difference between two class centers from successive iterations was smaller than ε for all tissue classes. The fuzzification and defuzzification (part 3a of the TSFIS) use slightly different settings for CT and MR studies. For CT, fuzzification is performed using four tissue classes (one for blood, one for myocardium, and two classes for air and other darker structures, which are combined before defuzzification), while for MR, five classes are used (three for blood, which are combined before defuzzification, one for myocardium, and one for air and other darker structures). The parameters of our model and their values (and ranges, if applicable) are presented in Table I.
C. Quantitative Assessment Indices
For both evaluation studies, four types of analysis were performed.
1) Point-to-point distances: For each image slice, a fixed number (N = 100) of radial point-to-point distances between semiautomatic and manually drawn contours were measured, according to [24]. This means that first a center line between the manual and semiautomatic contours is defined. This center line is resampled equiangularly by casting 100 rays from its center. At each intersection between a ray and the center line, a line is drawn orthogonal to the center line. The intersections of these orthogonal lines with the individual contours are the points defining the point-to-point distance. During manual delineation, the papillary muscles were considered part of the blood pool, as they were in the delineation process in the training stage. From all distances per subject, and separately for the endocardium and epicardium, average and maximum
TABLE I PARAMETERS OF THE 3-D ASM AND THEIR VALUES
Fig. 6. Typical contours generated by the 3-D ASM, shown in an MRI acquisition.
TABLE II RANGES OF THE AVERAGE AND MAXIMUM DISTANCES MEASURED PER SUBJECT BETWEEN MANUAL AND SEMIAUTOMATIC CONTOURS (IN MILLIMETERS)
distances were calculated to give an idea of the global fit of the model to the data.
2) Clinical contour quality analysis: The clinical contour quality represents a measure for the actual workload for a physician. It is based on the idea that the physician may set a certain maximum distance threshold between expert and semiautomatic contours. Distances used are the point-to-point distances from the previous experiment. Semiautomatic contours are accepted if the distances to manual contours remain below the chosen threshold. Among those distances, a number of outliers (exceeding the threshold) were allowed, varying between 0%, 5%, and 10% outliers. The distance threshold was varied between 2 and 10 mm. Single contour quality was expressed as the percentage of accepted semiautomatic contours from a dataset. The percentage of accepted contours represents a tradeoff between the desired accuracy of the 2-D contours and the amount of work for the physician to manually correct the contours. The employed contour quality measure relates closely to the clinical applicability of the method, and as such may serve as a means for the reader to estimate clinical performance.
3) Volume regression analysis: Volumes inside the semiautomatic contours were plotted against volumes inside the manually drawn contours, and a linear regression formula was calculated, together with a correlation coefficient r. Volumes are calculated using Simpson’s rule, accumulating areas (in square centimeters) over all contours and multiplying by the slice thickness. Separate analyses for blood pool volume (endocardial contours) and LV volume (epicardial contours) were performed.
4) Bland–Altman analysis: Bland–Altman analyses were performed on the volumes derived from the expert drawn contours and those from semiautomatic contours, and 95% confidence intervals were calculated. Plots were made to evaluate whether a trend is present in the observed volume errors of the semiautomatic volumes with respect to manual volumes.
V. RESULTS
For all the MRI datasets, the 3-D ASM visually reached a stable state before the final iteration. From the initial CT dataset of 25 patients, two matches were excluded from further quantitative evaluation due to matching failure. One of those failures showed a large amount of epicardial fat, while the other dataset suffered from serious reconstruction artifacts. For the remaining 23 CT datasets, the 3-D ASM visually reached a stable state before the final iteration. For CT data, processing time for 100 iterations was 625 ± 141 s (minimum 278 s, maximum 984 s). For MR, this was 157 ± 13.8 s (minimum 140 s, maximum 175 s). Processing times were measured on an AMD Athlon XP, 1.8-GHz machine. A typical example of a semiautomatic segmentation of a cardiac MRI dataset is shown in Fig. 6.
A. Quantitative Data
The ranges of the average and maximal distances between the manual and semiautomatic contours are presented in Table II. Fig. 7 shows the percentage of accepted semiautomatic contours as a function of the maximal allowed distance to the expert drawn contours. The results for the volume regression analyses are shown in Fig. 8. For the LV volume, the regression line for MRI was described by y = 0.94x + 27.8 (r² = 0.99), whereas for CT, this line was described by y = 1.01x + 7.48 (r² = 0.98), with y being the semiautomatic and x the manual volume. For
Fig. 7. Average percentage of accepted contours, determined by the maximum allowed distance to the manually drawn contours along 100 radial lines and with 0%, 5% or 10% outliers. (a) MRI studies. (b) CT studies.
Fig. 8. Semiautomatic versus manual regression plots. LV volumes: (a) MRI and (b) CT. Blood pool volumes: (c) MRI and (d) CT.
Fig. 9. Bland–Altman analyses of semiautomatic volumes and manual volumes. (a) LV volume from MRI studies. (b) LV volume from CT studies. (c) Blood pool volume from MRI studies. (d) Blood pool volume from CT studies. The solid lines show the mean errors, the dashed lines show the mean ±2σ.
the blood pool, the regression formula for MRI was $y = 0.85x + 6.62$ ($r^2 = 0.99$), and for CT, $y = 0.88x + 0.38$ ($r^2 = 0.98$). The Bland–Altman analyses (see Fig. 9) show a slight systematic overestimation of LV volume and a systematic underestimation of blood pool volume for both CT and MRI. Blood pool volume errors increase with increasing volume [see Fig. 9(c) and (d)].
VI. DISCUSSION
A. Segmentation Performance
The results from the global point-to-point distance measures (see Table II) show that, on average, the segmentation results from our 3-D ASM do not deviate much from the manual contours. Average distances range between 1.02 and 2.97 mm for CT, and between 1.34 and 2.05 mm for MRI. For CT, these distances translate to approximately 1.5–4.5 pixels, while for MRI, they are on the order of 1 pixel or less. The maximal deviations of the semiautomatic segmentation results from the manual contours are on the order of 5–18 pixels for CT, and
of 2–4 pixels for MRI. Globally, the semiautomatic segmentations are very close to the manual ones, although larger localized deviations may occur. The average unsigned distances over all subjects observed in MRI compare favorably to those reported for the 3-D AAM by Mitchell et al. [16], i.e., 1.72 mm versus 2.75 mm for the endocardium and 1.55 mm versus 2.63 mm for the epicardium.
From the clinical contour quality studies, we observed the following. The chart in Fig. 7(a) shows that at least 79.3% of the semiautomatic MRI endocardial and epicardial contours deviate nowhere by more than 5 mm (approximately 2.5 pixels) from the manually drawn ones. Moreover, the chart shows that epicardial contours are better than endocardial ones. This result did not show in the global evaluation based on the point-to-point distances (Table II). The percentage of accepted epicardial contours was 85.4% versus 79.3% for endocardial contours. If 5% outliers were allowed, 93.2% of the epicardial contours were accepted, versus 91.4% of the endocardial contours. This means that for MRI, at least 91.4% of the semiautomatic contours need little or no correction if localized contour
positioning errors of approximately 2.5 pixels are considered acceptable.
For CT, epicardial contours were also better than endocardial ones. At an allowed deviation of 5 mm, the average percentage of accepted contours reached 75.1% for epicardial contours and 62.6% for endocardial contours [see Fig. 7(b)]. When allowing 5% outliers, these numbers are 82.4% and 74.1%, respectively. An allowed deviation of 5 mm may seem very large for CT, which has an in-plane resolution of about 0.5 mm/pixel. However, the CT short-axis datasets were reconstructed from axial slices. The axial datasets had an in-plane resolution of about 0.5 mm/pixel and a slice thickness of approximately 2 mm. Because the model was applied to the short-axis reconstructed CT data, the only CT data available to us, we could not benefit from the true resolution of the original CT scans. Thus, the high-resolution, nearly isotropic CT data could not be exploited in this application of the 3-D ASM. Moreover, the reconstruction from axial data introduced artifacts in some cases. These artifacts were caused by displacement of the heart between subsequent axial slices.
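The acceptance percentages discussed above (a contour counts as accepted when it stays within a maximal allowed distance of the manual contour, optionally ignoring a fixed share of outlier positions) can be illustrated with a short sketch. It is not the evaluation code used in the study: the input layout (one row per contour, one column per radial line), the function names, and the synthetic numbers are assumptions made for illustration only.

```python
import numpy as np


def accepted_fraction(distances_mm, max_dist_mm=5.0, outlier_frac=0.0):
    """Fraction of contours accepted at a given distance tolerance.

    distances_mm: array of shape (n_contours, n_lines) holding the unsigned
        distance between semiautomatic and manual contour on each radial
        line (hypothetical input layout).
    max_dist_mm: maximal allowed distance to the manual contour.
    outlier_frac: share of radial lines allowed to exceed max_dist_mm,
        e.g., 0.0, 0.05, or 0.10.
    """
    d = np.asarray(distances_mm, dtype=float)
    n_lines = d.shape[1]
    # Number of radial lines on which each contour exceeds the tolerance.
    n_exceeding = (d > max_dist_mm).sum(axis=1)
    # Largest number of exceeding lines still tolerated as outliers.
    n_allowed = int(np.floor(outlier_frac * n_lines + 1e-9))
    return float((n_exceeding <= n_allowed).mean())


def acceptance_curve(distances_mm, thresholds_mm, outlier_frac=0.0):
    """Accepted fraction for each allowed distance in thresholds_mm."""
    return [accepted_fraction(distances_mm, t, outlier_frac)
            for t in thresholds_mm]


# Synthetic example: 4 contours, 100 radial lines each (not study data).
demo = np.full((4, 100), 1.0)
demo[1, :3] = 8.0    # 3% of the lines deviate by 8 mm
demo[2, :8] = 8.0    # 8% of the lines deviate by 8 mm
demo[3, :20] = 8.0   # 20% of the lines deviate by 8 mm
for frac in (0.0, 0.05, 0.10):
    print(frac, accepted_fraction(demo, 5.0, frac))
```

Counting exceeding lines as integers, rather than comparing floating-point fractions, avoids rounding effects when a contour sits exactly at the outlier limit.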
B. Volume Correlation to Gold Standard
Fig. 8(a)–(d) reveals good linear correlations between manually and semiautomatically segmented volumes, both for blood pool volumes and LV volumes. Correlations of blood pool and LV volumes with expert-derived volumes were very high for both CT and MRI ($r^2 \geq 0.98$). The high correlation between semiautomatic and expert-derived volumes is a consequence of the volumetric model-based approach, in which local inaccuracies are averaged out. Thus, even in cases where the semiautomatic segmentation is locally inaccurate, valuable information regarding volume-based parameters can still be extracted. This observation also shows that our method is very robust. In cases where transitions between tissues are fuzzy, or not visually detectable, the model-based approach is able to generate local contour estimates, based on evidence from more reliable positions and the built-in statistical knowledge, that do not produce large volume errors.
The fact that the model has both an open base and an open apex can affect the measured volumes. Had a closed apex been used, the model might have been better able to stretch toward the true apex in the dataset. Stretching of the model obviously influences the volume the model encloses.
Bland–Altman analysis (Fig. 9) shows that for both CT and MRI, LV volumes are overestimated systematically, but only slightly, while blood pool volumes are underestimated systematically. Additionally, a negative trend appears in the errors of the semiautomatic blood pool volume with respect to the manual blood pool volume [Fig. 9(c)–(d)]. This is more apparent for MRI than for CT.
Overestimation of the LV volumes can have the following causes.
1) The presence of epicardial fat (CT): In CT, the gray values of epicardial fat lie between those of myocardium and air, and closer to myocardium. Thus, epicardial fat will be wrongly classified as myocardium. In MR, epicardial fat appears very bright, leading to misclassification as blood. In the latter case, because the fat is classified as a tissue other than myocardium, the segmentation is not affected. In CT, however, it results in a “too large” ventricle.
2) Unreliable edge information at the inferior epicardial border (CT): Here, edges between myocardium and other tissues are not visually detectable. The ability of the FIS to generate reliable endocardial candidate points at the posterior section partly determines the epicardial contour quality. Alternatively, the inferior epicardial border may be excluded from the FIS, calculating shape deformation based on a smaller set of (pseudo-) landmarks.
Underestimation of the blood pool can have the following causes.
1) The presence of dark structures in the blood pool close to the papillary muscles: These dark structures, fused together with the papillary muscles, form big dark objects, which force the endocardial surface of the model to bypass them on the inside, leading to an underestimation of the contour area and, consequently, of the blood pool volume.
2) Overconstraining the segmentation by the model: Overconstrained segmentation results are a known consequence of (statistical) model-based approaches. Such problems, and those resulting from a lack of variability in the training set, can be attacked either by relaxing the model constraints toward the later update iterations, or by adding more variation to the model. The latter approach was chosen by Cootes et al. [25], who added variation to the diagonal of the covariance matrix of the training set, and recently by Lötjönen et al. [26], who added artificially altered shapes to the training dataset. Another way to resolve these model-induced inaccuracies is to perform a subsequent, more local boundary detection step, such as dynamic programming [27].
The trend in blood pool volume error [Fig. 9(c)–(d)] may be caused by the error sources mentioned earlier. The contribution of those errors to the volume of the LV is proportional to the model scale, resulting in a trend in a Bland–Altman plot.
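The Bland–Altman quantities referred to in this subsection (the mean volume error, the mean ± 2σ lines of Fig. 9, and a possible trend of the error with volume) can be computed as in the following sketch. It is an illustration under stated assumptions, not the analysis code of the study: the function name and the returned dictionary are hypothetical, the error is plotted against the mean of both measurements as in the classical Bland–Altman formulation (the paper may have used the manual volume instead), and the trend is estimated with an ordinary least-squares line.

```python
import numpy as np


def bland_altman(manual_ml, semiauto_ml):
    """Bias, mean +/- 2 sigma limits, and linear trend of the volume error.

    manual_ml, semiauto_ml: paired volume measurements (same length).
    The error is defined as semiautomatic minus manual, so a negative bias
    means that the semiautomatic method underestimates the volume.
    """
    manual = np.asarray(manual_ml, dtype=float)
    semiauto = np.asarray(semiauto_ml, dtype=float)
    mean_volume = (manual + semiauto) / 2.0   # x-axis of the plot (assumed)
    error = semiauto - manual                 # y-axis of the plot
    bias = error.mean()
    sigma = error.std(ddof=1)                 # sample standard deviation
    # Least-squares line through the errors; a nonzero slope is a trend.
    slope, intercept = np.polyfit(mean_volume, error, 1)
    return {
        "bias": bias,
        "lower": bias - 2.0 * sigma,
        "upper": bias + 2.0 * sigma,
        "trend_slope": slope,
        "trend_intercept": intercept,
    }


# Synthetic example (not study data): an error proportional to the volume,
# as expected when contour errors scale with model size.
manual = np.array([100.0, 150.0, 200.0, 250.0])
semiauto = 0.9 * manual
result = bland_altman(manual, semiauto)
print(result["bias"], result["trend_slope"])
```

In this synthetic case the error grows in proportion to the volume, so the fitted slope is negative, which is the kind of trend visible for the blood pool volumes.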
C. Limitations
Classification of pixels into a number of tissue classes is based on a 3-D volume around the surfaces of the model mesh. In MR, either turbulent or slow blood flow in the vicinity of the myocardium causes inhomogeneities in the blood pool gray values. In CT, on the other hand, tissues of other organs close to the heart appear slightly darker than myocardium, but brighter than air. Consequently, in the combined histograms of all image patches used in the FCM, plateaus were observed in the blood pool range for MRI and in the dark-structure range (air, epicardial fat) for CT. Therefore, when applying the FCM to MRI, three classes, together representing the blood pool, were used to cover the flat part better than a single class could, thus diminishing classification errors. For the same reason, two classes were used
to cover the dark-structure plateaus in CT. The use of multiple classes for a single tissue substantially improved the segmentation results and drastically decreased the number of failures. Incorporating such a priori knowledge about parameters that vary among modalities introduces a modality dependence, which we strive to avoid. However, by excluding a GLM, we managed to omit image modality dependence from the statistical training stage of the ASM. We only train the underlying PDM, so that no retraining on new datasets was required to apply the model to CT and MRI. Nevertheless, to reliably segment unseen modalities and/or acquisition protocols, further generalization of the FIS may be required.
D. Robustness
Both CT and MRI show the same ordering of tissues with respect to their gray levels (gv), i.e., $gv_{\mathrm{blood}} > gv_{\mathrm{myocardium}} > gv_{\mathrm{air}}$. This model can be applied equally well to other modalities/acquisition protocols that do not exhibit the same ordering, provided that classification of tissues in the resulting images is feasible. In such cases, only the defuzzification and rules part (see 3a) of the inference step in the TSFIS (and perhaps the number of classes used in the FCM) needs adjustment. Because datasets acquired with 3-D ultrasound may be hard to classify, segmentation using this approach presents future challenges. Modifications to the pixel classification will be required due to the nonlinear intensity distributions in ultrasound imaging.
The curves in Fig. 7 and the very small average unsigned distances between manual and semiautomatic segmentation results give good confidence in the future clinical usefulness of the 3-D ASM for semiautomatic segmentation of the cardiac LV.
The resulting segmentation obtained by the 3-D ASM is sensitive to initial model pose, shape, and scale. We observed that our model does not deform easily in the direction of the long axis, which may be a cause of the sensitivity of the results to the initial model pose parameters. This sensitivity does not imply that the model has a small lock-in range (see Fig. 10). This figure shows that even when the model is not initialized close to the manual segmentation, a good fit can be obtained. We investigated the effect of initialization on segmentation results in [28]. We showed quantitatively that within a range of approximately 15–20 mm in all directions, sensitivity to initial model placement is limited.
With respect to the definition of 3-D point correspondence, an application-specific point correspondence was used. However, there are no fundamental barriers to replacing this point correspondence definition with an automatically determined [28] or optimized [29] one, as it does not have a major influence on the implementation of the other parts of our method.
Fig. 10. (Left) Initial and (right) final state of the model in a match where initialization was far from the manual segmentation. (Top) Short axis view. (Bottom) Long axis view. The dotted lines indicate the intersection plane of the other view. With respect to the numbers, 1 and 2 indicate the model’s epicardial and endocardial surfaces, and 3 and 4 indicate the manual epicardial and endocardial surfaces, respectively.
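The idea from the Limitations and Robustness discussion, namely that several fuzzy classes may jointly represent one tissue and that tissues are identified by their relative gray-level ordering rather than by absolute gray values, can be illustrated with a minimal 1-D fuzzy c-means sketch. It does not reproduce the FIS of the paper (the Takagi-Sugeno inference step is omitted). The fuzziness exponent m = 2, the quantile-based initialization, the total of five classes, the class-to-tissue mapping, and the synthetic gray values are all assumptions chosen for illustration; only the use of three classes for the blood pool in MRI is taken from the text.

```python
import numpy as np


def memberships(x, centers, m=2.0):
    """Standard FCM memberships, shape (n_classes, n_samples)."""
    x = np.asarray(x, dtype=float).ravel()
    c = np.asarray(centers, dtype=float)
    dist = np.abs(x[None, :] - c[:, None]) + 1e-12   # avoid division by zero
    inv = dist ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)


def fuzzy_c_means_1d(x, n_classes, m=2.0, n_iter=200, tol=1e-6):
    """Class centers of a 1-D fuzzy c-means, sorted from dark to bright."""
    x = np.asarray(x, dtype=float).ravel()
    # Deterministic start (assumption): centers spread over the quantiles.
    centers = np.quantile(x, (np.arange(n_classes) + 0.5) / n_classes)
    for _ in range(n_iter):
        um = memberships(x, centers, m) ** m
        new_centers = (um @ x) / um.sum(axis=1)
        converged = np.max(np.abs(new_centers - centers)) < tol
        centers = new_centers
        if converged:
            break
    return np.sort(centers)


def merge_classes(u, class_to_tissue):
    """Sum the memberships of all classes that represent the same tissue."""
    tissues = list(dict.fromkeys(class_to_tissue))   # unique, order kept
    merged = np.zeros((len(tissues), u.shape[1]))
    for class_idx, name in enumerate(class_to_tissue):
        merged[tissues.index(name)] += u[class_idx]
    return tissues, merged


# Hypothetical mapping for an MRI-like patch, classes ordered dark to bright:
# one class for air, one for myocardium, three for the blood pool plateau.
MRI_LIKE_MAPPING = ["air", "myocardium", "blood", "blood", "blood"]

# Synthetic gray values (not study data): dark, medium, and a bright plateau.
rng = np.random.default_rng(0)
gray = np.concatenate([rng.normal(10, 3, 200),
                       rng.normal(100, 5, 200),
                       rng.uniform(180, 300, 600)])
centers = fuzzy_c_means_1d(gray, n_classes=5)
tissues, merged = merge_classes(memberships(gray, centers), MRI_LIKE_MAPPING)
labels = merged.argmax(axis=0)   # index into `tissues` for every sample
```

Because the centers are sorted before the mapping is applied, the tissue assignment depends only on the ordering of the classes, so rescaling or shifting all gray values leaves the labels unchanged in this sketch.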
VII. CONCLUSION
The presented 3-D ASM is a powerful tool for semiautomatic segmentation of the cardiac LV myocardium. With minimal adjustments in the tuning parameters of the model (in comparison with collecting a separate training dataset, manually segmenting it to obtain a labeling, and retraining a model from it for each modality), a single model can achieve good-quality contours for both CT and MRI datasets. This means that no labor-intensive retraining is required for application to different modalities or different acquisition protocols, provided that the image datasets have a similar gray value distribution among different tissues. Though further research is still required, the experiments presented here demonstrate that with one 3-D ASM with a model update generation step independent of absolute gray value distributions, acceptable semiautomatic segmentations of the cardiac LV can be achieved for multiple modalities and protocols, without statistical retraining of any kind.
REFERENCES
[1] T. F. Cootes, D. Cooper, C. J. Taylor, and J. Graham, “A trainable method of parametric shape description,” Image Vis. Comput., vol. 10, no. 5, pp. 289–294, 1992.
[2] C. Lorenz and N. Krahnstöver, “Generation of point-based 3-D statistical shape models for anatomical objects,” Comput. Vis. Image Und., vol. 77, no. 2, pp. 175–191, 2000.
[3] A. F. Frangi, D. Rueckert, J. A. Schnabel, and W. J. Niessen, “Automatic construction of multiple-object 3-D statistical shape models: Application to cardiac modeling,” IEEE Trans. Med. Imag., vol. 21, no. 9, pp. 1151–1166, Sep. 2002.
[4] T. F. Cootes, D. Cooper, C. J. Taylor, and J. Graham, “Active shape models-their training and application,” Comput. Vis. Image Und., vol. 61, no. 1, pp. 38–59, 1995.
[5] G. Hamarneh and T. Gustavsson, “Deformable spatio-temporal shape models: Extending active shape models to 2-D + time,” Image Vis. Comput., vol. 22, no. 6, pp. 461–470, 2004.
[6] M. R. Kaus, J. von Berg, J. Weese, W. J. Niessen, and V. Pekar, “Automated segmentation of the left ventricle in cardiac MRI,” Med. Image Anal., vol. 8, no. 3, pp. 245–254, 2004.
[7] T. F. Cootes, G. J. Edwards, and C. J. Taylor, “Active appearance models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6, pp. 681–685, Jun. 2001.
[8] A. Kelemen, G. Székely, and G. Gerig, “Elastic model-based segmentation of 3-D neuroradiological datasets,” IEEE Trans. Med. Imag., vol. 18, no. 10, pp. 828–839, Oct. 1999.
[9] A. Tsai, A. Yezzi Jr, W. Wells, C. Tempany, D. Tucker, A. Fan, W. E. Grimson, and A. Willsky, “A shape-based approach to the segmentation of medical imagery using level sets,” IEEE Trans. Med. Imag., vol. 22, no. 2, pp. 137–154, Feb. 2003.
[10] R. Chandrashekara, A. Rao, G. I. Sanchez-Ortiz, R. H. Mohiaddin, and D. Rückert, “Construction of a statistical model for cardiac motion analysis using nonrigid image registration,” in Information Processing in
Medical Imaging, Lecture Notes Computer Science, vol. 2732, C. Taylor and J. A. Noble, Eds. Berlin: Springer-Verlag, 2003, pp. 599–610.
[11] J. Lötjönen, S. Kivistö, J. Koikkalainen, D. Smutek, and K. Lauerma, “Statistical shape model of atria, ventricles, and epicardium from short- and long-axis MR images,” Med. Image Anal., vol. 8, pp. 371–386, 2004.
[12] B. van Ginneken, A. F. Frangi, J. J. Staal, B. M. Ter Haar Romeny, and M. A. Viergever, “Active shape model segmentation with optimal features,” IEEE Trans. Med. Imag., vol. 21, no. 8, pp. 924–933, Aug. 2002.
[13] A. F. Frangi, W. J. Niessen, and M. A. Viergever, “3-D modeling for functional analysis of cardiac images,” IEEE Trans. Med. Imag., vol. 20, no. 1, pp. 2–25, Jan. 2001.
[14] J. C. C. Moon, C. H. Lorenz, J. M. Francis, G. C. Smith, and D. J. Pennell, “Breath-hold FLASH and FISP cardiovascular MR imaging: Left ventricular volume differences and reproducibility,” Radiology, vol. 223, no. 3, pp. 789–797, 2002.
[15] H. C. van Assen, M. G. Danilouchkine, F. Behloul, H. J. Lamb, R. J. van der Geest, J. H. C. Reiber, and B. P. F. Lelieveldt, “Cardiac LV segmentation using a 3-D active shape model driven by fuzzy inference,” in MICCAI 2003, Lecture Notes Computer Science, vol. 2878, R. Ellis and T. Peters, Eds. Berlin: Springer-Verlag, 2003, pp. 533–540.
[16] S. C. Mitchell, J. G. Bosch, B. P. F. Lelieveldt, R. J. van der Geest, J. H. C. Reiber, and M. Sonka, “3-D active appearance models: Segmentation of cardiac MR and ultrasound images,” IEEE Trans. Med. Imag., vol. 21, no. 9, pp. 1167–1178, Sep. 2002.
[17] C. Goodall, “Procrustes methods in the statistical analysis of shape,” J. R. Stat. Soc. Series B-Methodological, vol. 53, no. 2, pp. 285–339, 1991.
[18] P. J. Besl and N. D. McKay, “A method for registration of 3-D shapes,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 2, pp. 239–256, Feb. 1992.
[19] N. Duta and M. Sonka, “Segmentation and interpretation of MR brain images: An improved active shape model,” IEEE Trans. Med. Imag., vol. 17, no. 6, pp. 1049–1062, Dec. 1998.
[20] T. Takagi and M. Sugeno, “Fuzzy identification of systems and its applications to modeling and control,” IEEE Trans. Syst., Man Cybern., vol. SMC-15, no. 1, pp. 116–132, Jan./Feb. 1985.
[21] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum, 1981.
[22] M. R. Rezaee, P. M. J. van der Zwet, B. P. F. Lelieveldt, R. J. van der Geest, and J. H. C. Reiber, “A multiresolution image segmentation technique based on pyramidal segmentation and fuzzy clustering,” IEEE Trans. Image Process., vol. 9, no. 7, pp. 1238–1248, Jul. 2000.
[23] A. E. O. Boudraa, “Automated detection of the left ventricular region in magnetic resonance images by fuzzy c-means model,” Int. J. Cardiac Imag., vol. 13, no. 4, pp. 347–355, 1997.
[24] C. D. von Land, S. R. Rao, and J. H. C. Reiber, “Development of an improved centerline wall motion model,” in Proc. Comput. Cardiol., 1990, pp. 687–690.
[25] T. F. Cootes and C. J. Taylor, “Statistical models of appearance for computer vision,” Imaging Sci. Biomed. Eng., Univ. Manchester, U.K., Tech. Rep., Mar. 2004. [Online]. Available: http://www.isbe.man.ac.uk/~bim/Models/app_models.pdf.
[26] J. Lötjönen, K. Antila, E. Lamminmäki, J. Koikkalainen, M. Lilja, and T. Cootes, “Artificial enlargement of a training set for statistical shape models: Application to cardiac images,” in FIMH 2005, Lecture Notes Computer Science, vol. 3504, A. F. Frangi, P. Radeva, A. Santos, and M. Hernandez, Eds. Berlin: Springer-Verlag, 2005, pp. 92–101.
[27] E. Oost, G. Koning, M. Sonka, P. V. Oemrawsingh, J. H. C. Reiber, and B. P. F. Lelieveldt, “Automated contour detection in X-ray left ventricular angiograms using multiview active appearance models and dynamic programming,” IEEE Trans. Med. Imag., vol. 25, no. 9, pp. 1158–1171, Sep. 2006.
[28] H. C. van Assen, M. G. Danilouchkine, A. F. Frangi, S. Ordas, J. J. M. Westenberg, J. H. C. Reiber, and B. P. F. Lelieveldt, “SPASM: A 3-D ASM for segmentation of sparse and arbitrarily oriented cardiac MRI data,” Med. Image Anal., vol. 10, no. 2, pp. 286–303, 2006.
[29] S. Ordas, H. C. van Assen, L. Boisrobert, M. Laucelli, J. Puente, B. P. F. Lelieveldt, and A. F. Frangi, “Statistical modeling and segmentation in cardiac MRI using a grid computing approach,” in Eur. Grid Conf., Lecture Notes Computer Science, vol. 3470, P. M. A. Sloot, A. G. Hoekstra, and T. Priol, Eds. Berlin: Springer-Verlag, 2005, pp. 6–15.
Authors’ photographs and biographies not available at the time of publication.