Copyright 2008 Society of Photo-Optical Instrumentation Engineers. This paper was (will be) published in Proc. SPIE Medical Imaging 2008 and is made available as an electronic reprint (preprint) with permission of SPIE. One print or electronic copy may be made for personal use only. Systematic or multiple reproduction, distribution to multiple locations via electronic or other means, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper are prohibited.
Lymph node segmentation on CT images by a shape model guided deformable surface method
Daniel Maleike (a), Michael Fabel (b), Ralf Tetzlaff (a, c), Hendrik von Tengg-Kobligk (c), Tobias Heimann (a), Hans-Peter Meinzer (a), and Ivo Wolf (a)
(a) Div. of Medical and Biological Informatics, German Cancer Research Center, Heidelberg
(b) Dept. of Diagnostic Radiology, University Hospital Schleswig-Holstein, Kiel
(c) Div. of Radiology, German Cancer Research Center, Heidelberg
ABSTRACT
With many tumor entities, quantitative assessment of lymph node growth over time is important to make therapy choices or to evaluate new therapies. The clinical standard is to document diameters on transversal slices, which is not the best measure for a volume. We present a new algorithm to segment (metastatic) lymph nodes and evaluate the algorithm with 29 lymph nodes in clinical CT images. The algorithm is based on a deformable surface search, which uses statistical shape models to restrict free deformation. To model lymph nodes, we construct an ellipsoid shape model, which strives for a surface with strong gradients and user-defined gray values. The algorithm is integrated into an application, which also allows interactive correction of the segmentation results. The evaluation shows that the algorithm gives good results in the majority of cases and is comparable to time-consuming manual segmentation. The median volume error was 10.1 % of the reference volume before and 6.1 % after manual correction. Integrated into an application, it is possible to perform lymph node volumetry for a whole patient within the 10 to 15 minutes time limit imposed by clinical routine. Keywords: Lymph nodes, Volumetry, Segmentation, Deformable geometry, Shape
1. INTRODUCTION In many cancer patients the primary tumor metastasizes not only to the lung, liver, or bones but also to the lymph nodes. These patients often undergo long-lasting treatment, which requires accurate and standardized monitoring of solid tumor manifestations. The response to therapy is increasingly evaluated based on imaging data produced by computed tomography (CT) or magnetic resonance (MR) technology. In daily clinical practice and within clinical trials, the oncologist requests an adequate evaluation protocol of the patient's tumor burden. For a standardized documentation of solid tumor burden, WHO criteria based on bi-dimensional measurement of each lesion were first introduced in 1979. In 2000, these criteria were changed by introducing RECIST (Response Evaluation Criteria in Solid Tumors1), which simplified and standardized the measurement method (uni-dimensional) and its documentation. WHO and RECIST criteria have some inherent disadvantages. Assessing tumor growth or shrinkage by drawing diameters is subjective and error-prone. Volumetric analysis allows a more precise and objective assessment of tumor burden; it was first introduced for pulmonary nodules in the last few years and has recently become clinically available in special software packages. In contrast, evaluation of enlarged lymph nodes is far more challenging, because they are found at numerous locations in the human body, with largely varying surroundings. Lymph nodes do have a characteristic density within a certain Hounsfield interval, but it overlaps with that of other soft tissues like muscles and organs. In addition, lymph nodes vary considerably in size, from 5 mm to more than 50 mm (examples in figure 2). To segment lymph nodes, Honea et al.2 used an active contour algorithm in 2D and 3D to align a surface with some viscosity constraint to the gradient magnitude of an image. 3D results were shown only for synthetic images.
Yan et al.3 added a gray value term to the edge-based fast marching algorithm and constrained the curve evolution with a circle (2D). The algorithm was applied slice by slice. An evaluation is missing. Dornheim et al.4 used a 3D mass-spring model to detect neck lymph nodes. Their model makes use of both gradient and gray value information. It prevents leaking by internal torsion forces of the spherical model. The evaluation of their method showed that the model gives results comparable to a manual segmentation. Our goal is to develop a system that offers the radiologist the possibility to determine precise volumetric information about a patient’s lymph node status, without taking more time than the 10 to 15 minutes per patient that are available in clinical routine.
Correspondence to [email protected] or [email protected], telephone: +49 6221 42 2326
2. CONTRIBUTIONS In this contribution we use a generic segmentation algorithm,5 which combines methods from statistical shape models and deformable surfaces. We customize the algorithm to the lymph node problem and apply it to clinical images. Since our algorithm needs to be initialized with a position, an initial size and a gray value interval, we have developed an application for interactive segmentation, based on the Medical Imaging Interaction Toolkit MITK.6 The application allows the user to apply the segmentation algorithm to several lymph nodes within one CT image, and also offers a tool for interactive correction of the algorithmic segmentations. In the following, we describe our algorithm and its integration into an application, and provide a quantitative evaluation of its segmentation results. In contrast to the previous work mentioned above, we successfully apply our algorithm to nodes of many different sizes (diameters from 0.7 cm to more than 11 cm).
3. MATERIAL AND METHODS The images we work with are CT scans with an in-plane resolution between 0.75 mm and 0.9 mm and a slice distance of 1 mm. They are scans of the chest, abdomen, and pelvis with around 550 slices per patient. Our algorithm consists of three components: a geometric model, an appearance model and a deformable surface algorithm,5 which uses the two models to find an appropriate surface border in an image. The geometric model describes the possible shapes of our target, while the appearance model is a cost function, which describes desirable qualities of the target surface. The deformable surface algorithm is applied iteratively in a multiresolution framework, which makes it possible to roughly locate large nodes at a low resolution and refine them at the original resolution. Before section 3.5 describes how the algorithm is integrated into an application, the following sections detail the algorithm and its underlying models.
3.1 Geometric model The geometric part of the shape model consists of a mean shape and a number of modes of variation, which are linearly combined to represent all possible shapes. Our basic model was constructed from a sphere with 642 equally spaced points. In order to model all possible ellipsoid shapes, two modes of shape variation would be sufficient. However, since the search algorithm varies the modes of shape variation much more strongly than it tries to rotate the shape (which is sensible, because it limits the search space significantly), we use a model with three modes of shape variation. The basic modes are the ellipsoids that result from a sphere being scaled in three orthogonal directions.
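The geometric model described above can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the text: the 642 points are generated here by subdividing an icosahedron three times (which yields exactly 10 · 4³ + 2 = 642 vertices), and each mode is taken to be the displacement field that scales the unit sphere along one coordinate axis. The function names `icosphere` and `ellipsoid_shape` are hypothetical.

```python
import math

def normalize(v):
    """Project a 3D point onto the unit sphere."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def midpoint(i, j, verts, cache):
    """Index of the (cached) sphere point halfway between vertices i and j."""
    key = (min(i, j), max(i, j))
    if key not in cache:
        mid = tuple((a + b) / 2.0 for a, b in zip(verts[i], verts[j]))
        verts.append(normalize(mid))
        cache[key] = len(verts) - 1
    return cache[key]

def icosphere(subdivisions=3):
    """Unit sphere mesh from a subdivided icosahedron (assumed construction)."""
    t = (1.0 + math.sqrt(5.0)) / 2.0
    raw = [(-1, t, 0), (1, t, 0), (-1, -t, 0), (1, -t, 0),
           (0, -1, t), (0, 1, t), (0, -1, -t), (0, 1, -t),
           (t, 0, -1), (t, 0, 1), (-t, 0, -1), (-t, 0, 1)]
    verts = [normalize(v) for v in raw]
    faces = [(0, 11, 5), (0, 5, 1), (0, 1, 7), (0, 7, 10), (0, 10, 11),
             (1, 5, 9), (5, 11, 4), (11, 10, 2), (10, 7, 6), (7, 1, 8),
             (3, 9, 4), (3, 4, 2), (3, 2, 6), (3, 6, 8), (3, 8, 9),
             (4, 9, 5), (2, 4, 11), (6, 2, 10), (8, 6, 7), (9, 8, 1)]
    for _ in range(subdivisions):
        cache, new_faces = {}, []
        for a, b, c in faces:
            ab = midpoint(a, b, verts, cache)
            bc = midpoint(b, c, verts, cache)
            ca = midpoint(c, a, verts, cache)
            new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
        faces = new_faces
    return verts, faces

def ellipsoid_shape(mean_points, b):
    """Mean shape plus a linear combination of three axis-scaling modes."""
    shape = []
    for p in mean_points:
        # Mode m displaces a point along axis m by its own m-th coordinate.
        modes = [(p[0], 0.0, 0.0), (0.0, p[1], 0.0), (0.0, 0.0, p[2])]
        shape.append(tuple(p[k] + sum(b[m] * modes[m][k] for m in range(3))
                           for k in range(3)))
    return shape
```

With mode weights `b = (0.5, 0.0, -0.5)` the unit sphere becomes an axis-aligned ellipsoid with semi-axes 1.5, 1.0 and 0.5; rotation and overall scale are left to the pose parameters of the search.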
3.2 Local appearance model The local appearance model is a cost function for given positions of surface points, rating the goodness of fit of a surface point and its surroundings with values from 0 (excellent fit) to 1 (worst case). In a usual shape model setting one would “train” these local appearance models from the surrounding image data in reference images, but since lymph nodes do not have a “normal” surrounding which would be common to all of them, we chose to define the appearance model generically. Our appearance model takes a gray value profile of length 2N + 1 (N = 4) from the image at position $\vec{p}$, in the direction of the surface normal $\vec{n}$. The distance between two consecutive profile pixels depends on the current image resolution of the search algorithm and is 1 mm for the original resolution, then 2 mm, 4 mm, etc. for the down-scaled resolutions. Before we calculate costs, the profile is smoothed with a Gaussian kernel (σ = 1), yielding a smoothed profile $s_{-N}, \ldots, s_{+N}$, where negative indices are on the inside and positive indices on the outside of the surface. We define a gradient based cost function as
$$c_{\text{gradient}} = 1 - \frac{|\nabla s_0|}{\sum_{n=-N}^{+N} |\nabla s_n|} \tag{1}$$
where the gradient $\nabla s_n$ is calculated as
$$\nabla s_n = s_{n-1} - s_{n+1}. \tag{2}$$
This cost function describes the normalized gradient at the surface, which will be 1 if there is no gradient at all and smaller for more pronounced gradients. A user defined gray value interval $G = [g_{\text{lower}}, g_{\text{upper}}]$ is also used to rate the fitness of a profile. The gray value based cost function should reward gray values within G on the inside of the surface and different gray values on the outside. Since the interval G should be no absolute constraint on gray values, we use two sigmoid functions to extend the borders of G and define the cost function $c_{\text{grayvalue}}$ as
$$g_n = \frac{1}{1 + e^{\frac{g_{\text{lower}} - b - s_n}{a}}} \left(1 - \frac{1}{1 + e^{\frac{g_{\text{upper}} + b - s_n}{a}}}\right) \tag{3}$$
$$c_{\text{grayvalue}} = 1 - \frac{1}{2N} \left( \sum_{n=-N}^{-1} g_n + \sum_{n=1}^{+N} (1 - g_n) \right) \tag{4}$$
The parameter values of b = 4.5 and a = 1.5 were chosen so that gray values within $[g_{\text{lower}} - 4.5, g_{\text{upper}} + 4.5]$ get scores above 0.5 and gray values inside G get scores very close to 1. With a typical gray value interval width of 50-60 HU these parameter choices result in a 10 % area beyond each border of the interval, where the classification scores slowly drop from 1 to 0. The total cost function is a combination of the gradient based function and the gray value based function. Since we want to fulfill both conditions (a good score for the gradient should not make up for a bad gray value score), we do not use the sum but the product of our two cost functions and define a total cost function
$$c(\vec{p}, \vec{n}) = c_{\text{gradient}} \cdot c_{\text{grayvalue}} \tag{5}$$
This cost function describes a surface with a strong gradient, user-defined gray values on the inside and different gray values on the outside.
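The appearance model of this section can be sketched in a few lines. This is only an illustration under stated assumptions: the gray value score is read as the product of a rising sigmoid centered at the lower border minus b and a falling sigmoid centered at the upper border plus b (which matches the description of the parameters a and b), and profile samples beyond the two ends are clamped to the end values, a boundary treatment the text does not specify. All function names are hypothetical.

```python
import math

A, B = 1.5, 4.5  # sigmoid parameters a and b as given in the text

def logistic(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def smooth(profile, sigma=1.0, radius=3):
    """Gaussian smoothing; indices beyond the ends are clamped (assumption)."""
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    total, last = sum(weights), len(profile) - 1
    return [sum(w * profile[min(max(i + k, 0), last)]
                for k, w in zip(offsets, weights)) / total
            for i in range(len(profile))]

def gradient_cost(s):
    """Eq. 1 and 2: one minus the share of the central gradient."""
    last = len(s) - 1
    grads = [abs(s[max(i - 1, 0)] - s[min(i + 1, last)]) for i in range(len(s))]
    total = sum(grads)
    if total == 0.0:
        return 1.0  # flat profile: no gradient at all
    return 1.0 - grads[len(s) // 2] / total

def membership(value, g_lower, g_upper):
    """Eq. 3: soft membership in G, widened by b on both sides."""
    rising = logistic((value - (g_lower - B)) / A)
    falling = 1.0 - logistic((value - (g_upper + B)) / A)
    return rising * falling

def gray_value_cost(s, g_lower, g_upper):
    """Eq. 4: values in G expected inside, values outside G expected outside."""
    n = len(s) // 2
    inside = sum(membership(v, g_lower, g_upper) for v in s[:n])
    outside = sum(1.0 - membership(v, g_lower, g_upper) for v in s[n + 1:])
    return 1.0 - (inside + outside) / (2 * n)

def total_cost(profile, g_lower, g_upper):
    """Eq. 5: product of both costs, evaluated on the smoothed profile."""
    s = smooth(profile)
    return gradient_cost(s) * gray_value_cost(s, g_lower, g_upper)
```

For a profile of nine samples (N = 4), the first four entries are the inside samples, the fifth is the surface position, and the last four are the outside samples.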
3.3 The segmentation algorithm The search algorithm combines techniques from discrete deformable surfaces and statistical shape models. In this section we only briefly describe the algorithm and our parametrization of it for this specific application. An in-depth description of the algorithm and its application to the liver is provided by Heimann et al.5 The deformable surface is subject to regularizing internal and image-driven external forces (figure 1a). Both forces are applied to the surface to deform it iteratively. The internal forces try to keep the relative lengths of the edges and the angles between adjacent faces close to a specific template shape, following the concepts of tension and rigidity as introduced by Kass et al.7 for the snakes algorithm. In our case the calculation of tension and rigidity forces for each step is based on the best-fitting statistical shape model (ellipsoid). The external forces are defined by the image data. At each point of the surface, the search algorithm takes a number of probes on both sides of the surface, each time asking the appearance model for a cost rating. A graph based algorithm is then used to find the surface which minimizes the global costs, while satisfying a hard constraint ∆ for the maximum steepness between neighboring vertices. This procedure prevents single outliers from disturbing the external forces calculation. The total force that is applied to each vertex $p_i$ is a combination of the internal and external forces, weighted against each other:
$$p_i^{t+1} = p_i^t + \alpha F_{\text{tension}}(p_i^t) + \beta F_{\text{rigidity}}(p_i^t) + \gamma F_{\text{ext}}(p_i^t) \tag{6}$$
In our application, we used a fixed α = 0.125 and β = 0.25, while γ was changed during the steps of a multi-resolution iteration scheme. An overview of all parameters of this scheme is given in table 1.
Table 1. Parameters of the iterative scheme used for different values of the radius r of the initial sphere. The first column (“Res”) gives the image resolution of each step, where 0 means the original image resolution, 1 half the original size, etc. A dash “-” for γ signifies that the surface was restricted to the ellipsoid shape model and no additional free deformation was allowed. The scheme is executed from top to bottom, starting from the first row with a condition that is met by the initial sphere. If the size conditions would lead to free deformation in the very first step, an additional step without free deformation (but identical in all other parameters) is prepended; this ensures that we always try to fit an ellipsoid first.
Res   Iterations   γ      ∆   Condition
3     50           -      2   r > 24 mm
2     50           -      2   r > 12 mm
1     100          0.06   2   r > 8 mm
0     100          0.06   2   r > 8 mm
0     10           0.15   2
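The vertex update of Eq. 6 and the selection of steps from table 1 can be sketched as follows. This is one possible reading of the table caption, not the authors' implementation: the last row, which lists no condition, is assumed to apply always, and all rows below the first matching row are executed. The names `SCHEME`, `build_schedule` and `update_vertex` are hypothetical.

```python
# Rows of table 1: (resolution level, iterations, gamma, delta, min. radius in mm).
# gamma None: surface restricted to the ellipsoid model (dash in the table).
# min. radius None: no condition listed, assumed to be always met.
SCHEME = [
    (3, 50, None, 2, 24.0),
    (2, 50, None, 2, 12.0),
    (1, 100, 0.06, 2, 8.0),
    (0, 100, 0.06, 2, 8.0),
    (0, 10, 0.15, 2, None),
]

ALPHA, BETA = 0.125, 0.25  # fixed weights of tension and rigidity forces

def build_schedule(radius_mm):
    """Steps (res, iterations, gamma, delta) for an initial sphere radius."""
    start = next(i for i, row in enumerate(SCHEME)
                 if row[4] is None or radius_mm > row[4])
    steps = [row[:4] for row in SCHEME[start:]]
    if steps[0][2] is not None:
        # First step would allow free deformation: prepend an otherwise
        # identical step that is restricted to the ellipsoid model.
        res, iterations, _, delta = steps[0]
        steps.insert(0, (res, iterations, None, delta))
    return steps

def update_vertex(p, f_tension, f_rigidity, f_ext, gamma):
    """Eq. 6: move one vertex by the weighted sum of the three forces."""
    return tuple(pc + ALPHA * t + BETA * r + gamma * e
                 for pc, t, r, e in zip(p, f_tension, f_rigidity, f_ext))
```

For example, `build_schedule(5.0)` starts at the last row and therefore yields a restricted step followed by the free deformation step with γ = 0.15.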
3.4 Manual correction tool The algorithm described above does not always find the correct surface. For example, we do not optimize the inside of the segmentation, but only the surface features, so the surface could move too far away in lower resolutions and never move back to the “real” border. In other cases the correct border is not clearly visible to the non-expert human and is surrounded by strong image gradients (figure 3). To offer the user a tool for interactive correction of misled segmentations, we added a simple correction tool to the application, which is also used in the context of heart ventricle segmentation by Schwarz et al.8 The tool allows the user to pick a point of the resulting mesh and drag it with the mouse to a new position. Points in the surroundings of the picked one, with distance d up to a maximum distance $d_{max}$, are moved along, weighted by a Gaussian function of their distance:
$$\vec{v}(d) = \vec{v}_0 \, e^{-\frac{1}{2}\left(\frac{d}{\sigma(d_{max})}\right)^2} \tag{7}$$
where $\vec{v}_0$ indicates the movement vector of the picked and moved point. $\sigma(d_{max})$ provides a sharp peak for small values of $d_{max}$ and broader peaks for higher values of $d_{max}$ (a more detailed explanation is provided in the original paper8). The maximum distance (= radius of interaction around the point) can be chosen by the user through a slider in the application GUI. After such manual correction steps we perform another 2 iterations of the segmentation algorithm to re-align the modified surface with image features. Because of this additional step the user only has to move the surface close to the desired image area; the algorithm will then usually find the correct position.
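The drag operation of Eq. 7 can be sketched as below. The text does not give the function σ(d_max); the sketch simply assumes σ = d_max / 3, and it measures d as the Euclidean distance to the picked point rather than a distance along the mesh. Both choices, and the names `drag_weight` and `apply_drag`, are illustrative assumptions.

```python
import math

def drag_weight(d, d_max, sigma_factor=1.0 / 3.0):
    """Gaussian weight of Eq. 7; sigma = sigma_factor * d_max is an assumption."""
    if d > d_max:
        return 0.0  # points outside the interaction radius do not move
    sigma = sigma_factor * d_max
    return math.exp(-0.5 * (d / sigma) ** 2)

def apply_drag(vertices, picked_index, v0, d_max):
    """Move the picked vertex by v0 and its neighborhood by weighted copies."""
    picked = vertices[picked_index]
    moved = []
    for p in vertices:
        w = drag_weight(math.dist(p, picked), d_max)
        moved.append(tuple(c + w * v for c, v in zip(p, v0)))
    return moved
```

The picked vertex itself has d = 0 and therefore receives the full movement vector, while the weight decays smoothly toward the border of the interaction radius.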
Figure 1. a) Illustration of a calculation step in the search algorithm of section 3.3. The light gray contour represents the external forces, i.e. the surface with the minimal global appearance costs. The medium gray contour is the best fitting ellipsoid shape model (internal forces), and the dark contour shows the resulting deformable mesh. b) User interface of our application. In the upper right window the user initializes our algorithm by drawing a rough contour somewhere inside a lymph node. From this contour a histogram is calculated and a sphere is placed inside the image, which is then iteratively deformed using the algorithm of section 3.3.
3.5 Integration into a clinical application For practical use the algorithm has to be integrated into an application, so that a user can initialize it with a first position and a gray value interval G. We built our application based on the Medical Imaging Interaction Toolkit MITK,6 because it offers a graphical application that could be easily extended and gives easy access to the algorithm of section 3.3, which is based on the Insight Toolkit (ITK). The application allows a user to navigate through the image volume in three orthogonal slices to locate lymph nodes. When a target lymph node is found, the user chooses the segmentation tool and draws a rough 2D contour inside the lymph node to initialize the search algorithm; this procedure is fast and intuitive. The center of mass and a radius are calculated from the contour and a corresponding initial sphere is placed for the algorithm. The radius of the sphere is halfway between the mean and maximal distance of the contour points from the center of mass. The gray value interval G, which is required by the algorithm, is defined by the 2 % and 98 % quantiles of the histogram of all pixels inside the contour. The algorithm runs in a background task, so that the user can go on to locate the next possible target. As the algorithm runs for up to 15 s on a standard PC (Pentium 4, 3.2 GHz), this background processing can save considerable time and gives the user the opportunity to concentrate on the diagnostic process. Every lymph node segmentation can be interactively corrected using the correction tool. For that purpose, the user simply selects the tool and drags the surface to the correct position. To quantify the segmentation(s) for the user, we calculate and display the volume and the largest transversal diameter (as required by RECIST) for each segmented lymph node. We also provide the user with the sum and average of all lymph node diameters and volumes (right of figure 1b).
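The initialization from a hand-drawn contour, as described in section 3.5, can be sketched as follows. Two details are assumptions: the center of mass is computed as the mean of the contour points, and the quantiles use linear interpolation between sorted gray values. The function names are hypothetical, and the gray values are expected to be the pixels inside the contour, collected by the caller.

```python
import math

def quantile(sorted_values, q):
    """Quantile by linear interpolation between neighboring sorted values."""
    pos = q * (len(sorted_values) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac

def initialize(contour_points, gray_values):
    """Return center, initial sphere radius and gray value interval G."""
    n = len(contour_points)
    dims = len(contour_points[0])
    center = tuple(sum(p[k] for p in contour_points) / n for k in range(dims))
    dists = [math.dist(p, center) for p in contour_points]
    # Radius halfway between mean and maximal distance from the center.
    radius = 0.5 * (sum(dists) / n + max(dists))
    values = sorted(gray_values)
    interval = (quantile(values, 0.02), quantile(values, 0.98))
    return center, radius, interval
```

Using the 2 % and 98 % quantiles instead of the minimum and maximum makes the interval less sensitive to a few stray pixels picked up by a rough contour.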
3.6 Quantitative evaluation To quantitatively evaluate our algorithm, a radiologist manually delineated 29 lymph nodes in 4 CT images. This data served as our reference segmentation. To assess manual segmentation accuracy, a second radiologist and an assistant medical technician manually segmented the same lymph nodes as in the reference segmentation (the radiologist delineated 23, the technician all 29), and we calculated several error measures from a comparison to the reference segmentation. One user then used our algorithm to segment all 29 lymph nodes. Because the algorithm relies on a hand-drawn initialization, the user repeated the procedure five times (with different contours each time) to average the errors from this interactive part. We calculated the same error measures as for the manual segmentations, once before and once after using the manual correction tool of section 3.4. To determine the repeatability of our application, we also performed a leave-one-out comparison of the four segmentations we had for each node. Each segmentation was declared reference in turn and each time the accuracy of the other segmentations was calculated.
Figure 2. Examples of good segmentations. The calculated largest transversal diameters and volumes of the four examples are 118 mm / 706.9 ml, 54 mm / 55.6 ml, 36 mm / 16.9 ml, and 19 mm / 3.5 ml.
Figure 3. Two failed examples (a and b), both shown in transversal and sagittal slices. The dark contour is the gold standard, the light ones show the four attempts of our algorithm. In cases like these we find strong non-target gradients around a target with very weak or unclear borders.
4. RESULTS Table 2 compares the results of a second radiologist, the results of an assistant medical technician, and the results of our algorithm both before and after correction to the reference segmentation of a radiologist. The volume difference was calculated as the quotient of the segmented volume and the gold standard volume. Since we intend to do volumetry, we consider this volume difference the most important measure for our application. The volumetric overlap error is the number of voxels in the intersection of segmentation and reference, divided by the number of voxels in the union of segmentation and reference, subtracted from 1. A perfect overlap would result in an error of 0. Both the numbers and a qualitative inspection suggest good agreement between manual and automatic results (see figure 2 for examples) for the majority of cases. The segmentation results of our algorithm compared to the reference segmentation have a median volume difference of 10.1 % before and 6.1 % after manual correction. An interesting feature of our method is that the algorithm also works with very big nodes and necrotic centers (figure 2 left). The volume differences after correction are still not as good as the manual results of a radiologist and an assistant medical technician, but they are of the same order of magnitude and were achieved much faster and with less human interaction. The RECIST criteria1 provide some context to the volume errors. These criteria are currently used in the clinic and compare the total diameter of a set of target lesions between two consecutive patient evaluations. A reduction of less than 30 % or an increase of less than 20 % in the total diameter of all target lesions is treated as “stable disease”. When we assume a spherical lesion, these numbers translate to a volume reduction of 66 % or a volume growth of 73 %, which would still be treated as a stable disease.
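The two volumetric measures and the RECIST translation above can be checked with a short sketch. The text only states that the volume difference is based on the quotient of segmented and reference volume; expressing it as the absolute deviation of that quotient from 1 in percent is an assumption made here so that a perfect match gives 0. Segmentations are represented as sets of voxel index tuples, and all function names are hypothetical.

```python
def volume_difference_percent(v_segmented, v_reference):
    """Deviation of the volume quotient from 1, in percent (assumed form)."""
    return abs(v_segmented / v_reference - 1.0) * 100.0

def volumetric_overlap_error(seg_voxels, ref_voxels):
    """1 - |intersection| / |union| of two non-empty voxel sets."""
    seg, ref = set(seg_voxels), set(ref_voxels)
    return 1.0 - len(seg & ref) / len(seg | ref)

def volume_change_from_diameter_change(relative_diameter_change):
    """Relative volume change of a sphere whose diameter changes by a fraction."""
    return (1.0 + relative_diameter_change) ** 3 - 1.0
```

For a sphere, the volume scales with the third power of the diameter, so a 30 % smaller diameter gives 0.7³ = 0.343 of the volume (a reduction of about 66 %), and a 20 % larger diameter gives 1.2³ = 1.728 (a growth of about 73 %).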
To evaluate how well a transversal diameter estimates the real volume of a lymph node, we inspected the transversal diameters of the corrected algorithmic results and related them to the segmented volumes. We used linear regression to predict the volume from the third power of the measured radius and calculated the remaining relative difference to the real volume. Although we excluded lesions above 5 ml volume from the regression (the 4 largest lesions, which resulted in extremely bad predictions), the remaining difference between predicted and real volume was between 15 % and 40 % of the real volume (25 % and 75 % quantiles). This corresponds to the very upper range of volume measurement errors made by our algorithm. In two out of 29 cases we could not get a usable segmentation at all, even after trying manual correction (figure 3). In such cases we see a poor contrast of the real target in combination with strong gradients and some appropriate gray values in the vicinity of the target. As mentioned in section 3.4, after manual correction we always perform another step of the deformable surface algorithm to re-align the surface with image features. In the two cases mentioned above, this algorithmic step is counterproductive, because the surface is drawn back to the wrong image feature. We have to perform further experiments to find out whether these cases could be improved by parameter changes of the algorithm, or whether we need to add some completely manual tools. Table 3 shows that five algorithmic segmentations of the same lymph nodes match each other well, which suggests that our method produces repeatable results. These intra-observer differences are considerably smaller than the inter-observer differences shown in table 2.
Table 2. Quantiles of error measures based on a comparison to the reference segmentation (radiologist). We compare the manual segmentation results of a second radiologist, an assistant medical technician, the initial result of our algorithm and the interactively corrected segmentation. All error measures would be 0 for a perfect match of segmentation and reference.
                               5%     25%    Median   75%    95%
Volume difference (%)
  2. radiologist               0.4    2.5    4.2      13.2   15.8
  med. technician              0.5    2.6    5.9      15.1   34.4
  algorithm                    1.6    5.0    10.1     18.9   38.9
  corrected algorithm          0.9    3.6    6.1      18.7   36.6
Volumetric overlap error
  2. radiologist               0.13   0.16   0.21     0.26   0.34
  med. technician              0.08   0.13   0.18     0.29   0.43
  algorithm                    0.12   0.15   0.19     0.29   0.41
  corrected algorithm          0.08   0.14   0.18     0.27   0.37
Mean surface dist. (mm)
  2. radiologist               0.3    0.3    0.3      0.5    0.7
  med. technician              0.2    0.3    0.4      0.5    1.2
  algorithm                    0.4    0.5    0.6      0.8    2.0
  corrected algorithm          0.4    0.5    0.6      0.8    1.0
Maximum surface dist. (mm)
  2. radiologist               2.0    2.1    2.6      3.5    5.5
  med. technician              1.5    2.4    3.0      5.4    12.9
  algorithm                    1.6    2.0    2.7      5.0    16.7
  corrected algorithm          1.6    2.0    2.6      4.7    12.7
Table 3. Leave-one-out comparison of all algorithmic segmentations. Each of five segmentations was in turn declared reference segmentation and the accuracy measures below were calculated for the remaining four segmentations.
                               a) before correction                 b) after correction
                               5%     25%    Median   75%    95%    5%     25%    Median   75%    95%
Volume difference (%)          0.2    0.9    2.1      8.8    34.4   0.3    0.8    1.5      3.7    16.3
Volumetric overlap error       0.04   0.07   0.10     0.17   0.44   0.04   0.06   0.09     0.15   0.24
Mean surface dist. (mm)        0.3    0.4    0.5      0.6    1.5    0.3    0.4    0.5      0.6    0.8
Maximum surface dist. (mm)     0.7    0.9    1.2      3.8    13.3   0.8    1.0    1.3      2.3    5.7
5. DISCUSSION AND CONCLUSIONS We presented a semi-automatic algorithm for segmentation of (enlarged) lymph nodes in CT images. This algorithm is integrated into an application together with a tool for manual correction of the segmentation. We evaluated the algorithm with and without manual correction and could show that the segmentation results are similar to manual segmentations in the majority of cases. When comparing the results for different initializations of the algorithm, we see a good repeatability (better than the inter-individual variance of manual segmentations), which is important for a reliable diagnostic statement. The total amount of time needed to evaluate one patient image (around 10 minutes) is comparable to the amount needed for manual documentation of diameters, because most of the time is needed for finding the relevant nodes. Even for the sole purpose of calculating these diameters, our application would be of use, because it calculates transversal diameters as a by-product. The next steps of our work will be to look into the failed cases and either improve the algorithm to be more robust or integrate another manual tool for cases where no satisfactory segmentation can be achieved by our algorithm. Finally, the real-world use of the application has to be evaluated in radiology.
REFERENCES
[1] P. Therasse, S. G. Arbuck, E. A. Eisenhauer, J. Wanders, and R. S. Kaplan, “New guidelines to evaluate the response to treatment in solid tumors. European Organization for Research and Treatment of Cancer, National Cancer Institute of the United States, National Cancer Institute of Canada,” J Natl Cancer Inst 92, pp. 205–216, Feb. 2000.
[2] D. M. Honea and W. E. Snyder, “Three-dimensional active surface approach to lymph node segmentation,” in Proceedings of SPIE 1999, K. M. Hanson, ed., 3661, pp. 1003–11, 1999.
[3] J. Yan, T.-g. Zhuang, B. Zhao, and L. H. Schwartz, “Lymph node segmentation from CT images using fast marching method,” Computerized Medical Imaging and Graphics 28, pp. 33–38, Jan. 2004.
[4] J. Dornheim, H. Seim, B. Preim, I. Hertel, and G. Strauss, “Segmentation of Neck Lymph Nodes in CT Datasets with Stable 3D Mass-Spring Models,” in Proc. of the 9th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI’06), Part II, R. Larsen, M. Nielsen, and J. Sporring, eds., LNCS(4191), pp. 904–911, MICCAI, 2006.
[5] T. Heimann, S. Münzing, H.-P. Meinzer, and I. Wolf, “A shape-guided deformable model with evolutionary algorithm initialization for 3D soft tissue segmentation,” Inf Process Med Imaging 20, pp. 1–12, 2007.
[6] I. Wolf, M. Vetter, I. Wegner, T. Böttger, M. Nolden, M. Schöbinger, M. Hastenteufel, T. Kunert, and H.-P. Meinzer, “The Medical Imaging Interaction Toolkit,” Medical Image Analysis 9, pp. 594–604, Dec. 2005.
[7] M. Kass, A. Witkin, and D. Terzopoulos, “Snakes: Active contour models,” Int J Comp Vis 1(4), pp. 321–331, 1988.
[8] T. Schwarz, T. Heimann, R. Tetzlaff, A.-M. Rau, I. Wolf, and H. Meinzer, “Interactive Surface Correction for 3D Shape-Based Segmentation,” in SPIE Medical Imaging 2008: Image Processing, 2008 - in print.