Validation of Bone Segmentation and Improved 3-D Registration ...

8 downloads 2676 Views 1MB Size Report
boring contours is measured by comparing statistical properties of their vector ..... end we turn to. FDs, which encode a signal as a series of its frequency domain ...
324

IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 25, NO. 3, MARCH 2006

Validation of Bone Segmentation and Improved 3-D Registration Using Contour Coherency in CT Data Liping Ingrid Wang, Michael Greenspan*, Member, IEEE, and Randy Ellis, Member, IEEE

Abstract—A method is presented to validate the segmentation of computed tomography (CT) image sequences, and improve the accuracy and efficiency of the subsequent registration of the three-dimensional surfaces that are reconstructed from the segmented slices. The method compares the shapes of contours extracted from neighborhoods of slices in CT stacks of tibias. The bone is first segmented by an automatic segmentation technique, and the bone contour for each slice is parameterized as a one-dimensional function of normalized arc length versus inscribed angle. These functions are represented as vectors within a -dimensional space comprising the first amplitude coefficients of their Fourier Descriptors. The similarity or coherency of neighboring contours is measured by comparing statistical properties of their vector representations within this space. Experimentation has demonstrated this technique to be very effective at identifying low-coherency segmentations. Compared with experienced human operators, in a set of 23 CT stacks (1,633 slices), the method correctly detected 87.5% and 80% of the low-coherency and 97.7% and 95.5% of the high coherency segmentations, respectively from two different automatic segmentation techniques. Removal of the automatically detected low-coherency segmentations also significantly improved the accuracy and time efficiency of the registration of 3-D bone surface models. The registration error was reduced by over 500% (i.e., a factor of 5) and 280%, and the computational performance was improved by 540% and 791% for the two respective segmentation methods. Index Terms—Computed tomography, computer assisted surgery, contour analysis, image registration, image segmentation, image shape analysis.

I. INTRODUCTION

T

Fig. 1. (a) CT image of a tibia, near the knee. (b) Histograms of HU values for bone and nonbone regions overlapping. Peaks are identified for fat, water, muscle, and bone tissues.

Manuscript received June 3, 2005; revised December 5, 2005. This work was supported in part by the Natural Sciences and Engineering Council of Canada, in part by The Institute for Robotics and Intelligent Systems, in part by the Ontario Research and Development Challenge Fund, in part by the Center for Innovative Technology for Medicine, and in part by the National Institutes of Health (NIH) under Contract U41RR019703. Asterisk indicates corresponding author. L. I. Wang is with the School of Computing, Queen’s University, Kingston, ON K7L 3N6, Canada, (e-mail: [email protected]). *M. Greenspan is with the Department of Electrical and Computer Engineering and the School of Computing, Queen’s University, Kingston, ON K7L 3N6, Canada (e-mail: [email protected]). R. Ellis is with the Department of Radiology, Harvard Medical School, Boston, MA 02115 USA and also with the School of Computing and Department of Surgery, Queen’s University, Kingston, ON K7L 3N6, Canada, (e-mail: [email protected]). Digital Object Identifier 10.1109/TMI.2005.863834

to the difficulty of fully automated segmentation of bone from other tissues. In cases where the output of segmentation is used to plan or even directly execute surgeries, segmentation errors could be critical. The fall back position has been to utilize manually assisted, rather than fully automated segmentation methods, which is practical but time consuming. Segmentation is a difficult problem due to similarities in the radiological density of bone and surrounding tissues, as shown in Fig. 1. A CT slice of a knee region is illustrated in Fig. 1(a) and a histogram of sensed Hounsfield Unit (HU) values of the bone and other tissues is shown in Fig. 1(b). There is significant overlap between the HU values of the bone and other softer tissues, so classification based upon simple thresholds is infeasible. This phenomenon is exacerbated in cancellous bone such as occurs near joints, where the porous structure of the bone tissue lowers its density. In diaphyseal regions, the cortical bone

HE automatic segmentation of bone tissue in a computed tomography (CT) image is an important component of image-based computer assisted orthopedic surgery. Despite the large volume of research into this problem [1]–[6], the development of a fully automated solution remains a significant challenge. Factors such as the inhomogeneous structure of bone, pathologies, and the inherent blurring of CT data all contribute

0278-0062/$20.00 © 2006 IEEE

WANG et al.: VALIDATION OF BONE SEGMENTATION AND IMPROVED 3-D REGISTRATION USING CONTOUR COHERENCY IN CT DATA

tends to be much denser than the surrounding tissues, so that automatic segmentation in these regions is feasible. In this paper, we present an effective method to validate segmentation results. The method was based upon a comparison of the shapes of contours extracted from neighborhoods of slices in CT stacks of tibias. We used Fourier Descriptors (FDs) as a metric to characterize the bone contour shapes. The bone was first segmented by one of two automatic segmentation techniques; our subsequent validation method was independent of the specific segmentation used. The first segmentation method was an intensity-based technique that was a simplification of the method of Kang et al. [1]. The second segmentation method was an edge-based technique proposed by Yao et al., and was based upon the estimation and correction of the normal direction of bone edges [7]. Following segmentation, the bone contours for each slice were then parameterized as a one-dimensional (1-D) function of normalized arc length versus inscribed angle. These functions were next represented as vectors within a -dimensional space comprising the first amplitude coefficients of their FDs. Assuming contour coherency, we suspected that the shapes of neighboring contours would vary gradually, and that this coherence could be used to validate the segmentation. Any deviations from this assumption could be measured by the statistical properties of the FDs of neighboring contours, and such segmentations were then flagged as having low-coherency. Low-coherency segmentations could result from a number of sources, such as segmentation errors, bone pathologies (e.g., fractures), sensor errors (e.g., noise and blurring), and anatomical structures (e.g., the merging or branching of two bones); whatever their source, such occurrences are difficult to segment. When low-coherency segmentation is detected, there are a number of possible ameliorations, including using postprocessing routines to attempt automatic classification and correction of the segmentation, simply removing the slice from further processing of the image sequence, or alerting a human operator that manual intervention is needed. Our experiments showed this technique to be very effective at identifying low-coherency segmentations. Compared with a human expert, in a set of 23 CT stacks (1,633 slices), the method correctly detected 87.5% and 80% of the low-coherency segmentations and 97.7% and 95.5% of the highcoherency segmentations, respectively, from the two segmentation techniques. Removal of the automatically detected low-coherency segmentations was also found to consistently and significantly improve both the accuracy and time efficiency of the registration of 3-D bone surface models that were reconstructed from the data. This paper continues in Section I-A with a review of previous work in CT segmentation and segmentation validation. In Section II, we present a contour function that represented the contour of a segmented region using Fourier descriptors. An adaptive edge-based segmentation method based upon a simplification of Kang’s method is proposed in Section II-B; this method included the use of an a priori bone-contour threshold and involved post processing to close gaps and remove noise. We also briefly describe an estimation/correction algorithm for detecting bone edges in CT images developed by Yao et al. [7].

325

A set of experiments are presented in Section II-C which quantitatively compare the effectiveness of the method with that of expert human operators. The effectiveness of the method at improving the accuracy and efficiency of registration results is also presented. The paper concludes in Section IV with a discussion and summary. A. Previous Work Medical image segmentation has proven to be a challenging task, and most current automatic methods rely upon manual postprocessing [1]–[4]. Previous work has shown that fully automatic segmentation sometimes fails due to constraints on image quality and variations of the anatomy of the object of interest. CT slices provide cross-sectional images that clearly distinguish dense cortical bones from soft tissues (mainly fat and muscle). However, difficulties are encountered in extracting contours near joints. This is because the bones may have injuries, calcium loss (osteoporosis), or inhomogeneous structures that cause the bones to be nearly indistinguishable from nearby soft tissues. Furthermore, limitations in the resolution (typically 512 512 pixels) of a CT scan can cause contours of adjacent bones to combine, as has been observed with the tibia and fibula, and at the hip joint [5]. Sebastian et al. [4] pointed out that the close spacing of some bones and inherent blurring in the CT imaging process causes the interbone channel to often appear brighter than the background soft tissue, thereby substantially reducing the boundary contrast. Finally, blood vessels that nourish the bone pass from the soft tissues through the bone surface, creating gaps in the bone surface. Current approaches to bone segmentation in CT images can be classified as intensity-based, edge-based, region-based, or deformable. The simplest of the intensity-based methods are manual segmentation, which has long been used to extract bone contours by manually selecting a HU-value threshold, and global thresholding, whereby the thresholds are determined automatically from an examination of the statistical distribution of the pixel values. The major drawback of global thresholding is the lack of local adaptability. Due to variable tissue density, pixels of the same tissue may have different intensities between slices and even within the same slice. Locally adaptive thresholding techniques avoid this drawback, but occasionally this results in the merging of small gaps between bones [1]. Edge-based methods allow the separation of regions through the location of their contours as points of high values in an intensity gradient field. These methods have a high sensitivity to noise, particularly between regions with small contrast differences, and often require gap closing methods to link separated edges belonging to the same boundary [5]. Watershed segmentation is also based on the image gradient and, therefore, shares many of the problems of edge-based methods, such as oversegmentation, sensitivity to noise, and poor detection of significant areas with low-contrast boundaries [2]. Region-based methods divide images into regions that satisfy a given homogeneity criterion [1]. A region-growing method typically starts with an initialization seed (a small region) and

326

then examines the statistical properties of the seed’s neighborhood to decide if neighboring pixels have similar features. Seeded region-growing does not incorporate any geometrical information and a region can, therefore, “leak” through narrow gaps or weak edges. It also tends to merge bones that are close [4]. Lately, there has been much interest in segmentation algorithms that use active contours or surfaces, but these too have limitations. In a snake-based approach, spurious edge points can trap the contour which makes the final delineation of the contour sensitive to initial conditions. As classical snakes [8] are based on gradient information, they share the limitations of edge detectors [5]. Both Pardo et al. [5] and Sebastian et al. [4] observed that deformable models encounter such problems as impractical initialization requirements, poor convergence on the bone boundary caused by noise or spongy bone texture, approximation due to injuries, and deformation. Hybrid techniques that integrate elements of the above approaches have recently been proposed [1], [2], [5], [6], [9]. Images are initially segmented using either simple thresholds, adaptive thresholds, or a region-based technique, following which the detected contours are refined using edge information or deformable models. Hybrid methods improve the segmentation accuracy by combining the strength of the various approaches, but may still fail under certain situations. Given the propensity of segmentation algorithms to fail under common conditions, it is important to check the outcome of a segmentation method using automated segmentation validation. Segmentation validation can be viewed in two ways: it can be considered as a postprocessing stage, which would then be followed by an error-correction stage, or validation could be considered as a phase within an iterative segmentation process. Rather than segmenting solely from first principles, the validation method would identify where errors may have occurred so that specialized segmentation routines could be applied to correct these errors. Most segmentation-validation algorithms follow a common pattern, in that the output of an automatic or semiautomatic algorithm is compared to some baseline. They vary in the way the baseline is estimated, the comparison metrics are selected, and the degree of precision of the method is determined. Usually, the baseline is established by a human expert [10], [11], or a group of experts [12]–[16]. To assemble a baseline, Yoo et al. suggested receiver operating characteristic (ROC) analysis [12], and Zou et al. used an expectation-maximization algorithm [14]. The most appropriate measure of the comparison of a segmentation to expert segmentation(s) is still an open question, although a number of measures have been proposed. Measurement of the volume of the segmentations has been proposed [10], [11], as has computation of overlapping voxels [13]. Everingham et al. proposed an aggregate fitness function that combines multiple fitness measures into a general comparison method [17]. Measurement in terms of the average tissue area detected per slice and the correlation coefficients between area measurements have been used [15]. Multiple metrics have been applied to examine segmentation accuracy [14], [16].

IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 25, NO. 3, MARCH 2006

The precision of comparison methods has been obtained by statistical analysis [12]. For example, a similarity measure derived from the kappa statistic has been used [15]. However, many different approaches have been used for this purpose, including comparing the results of automatic segmentation and manual segmentation against that of manual segmentations [11], [13], optimal thresholds via maximization [14], and correlation coefficients compared to other published methods [16]. II. MATERIALS AND METHODS A. Contour Shape Description Our objective is to compare the shapes of segmented bone regions in neighborhoods of slices, and to identify those contours that differ sharply from their neighbors. It is possible to compare the segmented regions based solely upon their gross statistics such as size, moments, autocorrelation, etc. [18]. Such comparisons tend, however, to be insensitive to the typical segmentation errors that we wish to identify, such as small gaps and channel closings. Although these segmentation errors may be small, they can cause the merger of two contours or the splitting of a single contour, and can have a large effect on the registration accuracy of the reconstructed 3-D surfaces. We base our technique on a representation of the shape of the contour of each segmented region. Let be the bounding contour of is thin (i.e., only 1 segmented region . We assume that pixel thick), and that the sequence of points has been permuted lie adjacent on for all so that and and . Visiting the points in order, therefore, has the effect of traversing . The choice of can be arbitrary but, once chosen, the order of the remaining points is distinct; we assume that the permutation describes a clockwise traversal. There are a number of effective algorithms that generate such an ordered contour from a connected region [19]. is defined as the relationship beThe contour function distance of along the perimeter of , and the tween the measured about the centroid angle of from to the positive -axis. By convention, is selected as the smallest point of intersection of and the positive -axis. The contour function is similar to the cumulative angular function introduced by Zahn and Roskies [20] that has been applied recently by Peixoto et al. [21] and Wang et al. [22] for gesture recognition in video sequences. The contour function is a 1-D representation of the two-dimensional (2-D) region boundary that can be used to determine the shape similarity between two regions. It is invariant to translations, which do not affect the centroid of , and can be made invariant to scale by normalizing the circumference of . An for a segmented tibia is illustrated in Fig. 2. The example of protrusion at the bottom of the contour, boxed in Fig. 2(a), can be seen as a distinct ripple in the corresponding box in Fig. 2(b). 1) Fourier Descriptor Representation: For the purpose of shape comparison, it is desirable to represent the contour function in a form that is both compact and convenient, and yet captures the salient features of the shape. To this end we turn to FDs, which encode a signal as a series of its frequency domain

WANG et al.: VALIDATION OF BONE SEGMENTATION AND IMPROVED 3-D REGISTRATION USING CONTOUR COHERENCY IN CT DATA

327

Fig. 2. (a) Extracted contour. (b) Contour function of the contour. Boxed areas are the corresponding regions of the contour and its contour function.

components. The FDs form a sequence of amplitude and phase coefficients of the Discrete Fourier Transform of , defined as

Fig. 3. Original and reconstructed contour functions for increasing number of = 4, (b) = 8, (c) = 12, (d) = 15, (e) Fourier coefficients . (a) = 18, and (f) = 20.

K

K K

K

K

K

K

(1) The FDs are efficient to compute and can be used to exactly if all amplitude and phase coefficients are reconstruct coeffiemployed, or to approximately reconstruct it if cients are used. As the coefficients are ordered by increasing frelow-frequency components (the quency, using only the first harmonics) to compare shapes will have the desirable effect of describing the overall shape of the contour while ignoring noise and small perturbations. It is also desirable to keep small to improve efficiency. Brill [23] found that the optimum number of harmonics which yielded the highest recognition rate was in the range 15 to 20. Zahn and Roskies [20] and Persoon and Fu [24] also used 15 harmonics in shape recognition. Bennett and MacDonald [19] claimed that an appropriate number of harmonics was three to four times the number of points of maximum convex curvature on the boundary contour. empirically by reWe determined a reasonable value for constructing a number of challenging tibia contours. An example is the contour of Fig. 2(a), which was close to the knee and had not only a high curvature at four corners, but also had a small but significant indentation at the bottom of the contour. The original and reconstructed contour functions for this region , 8, 12, 15, 18, and 20 FDs. are illustrated in Fig. 3 using

K

Fig. 4. Example of reconstructed contour functions for = 20. (a) Extracted contour of a proximal-tibia image. (b) Original and reconstructed contour function of (a). (c) Extracted contour of a diaphyseal-tibia image. (d) Original and reconstructed contour function of (c). Boxed areas are the corresponding regions of a contour and its contour function.

Two further examples, one distal and one proximal, of reconstructed contour functions using are shown in Fig. 4.

328

IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 25, NO. 3, MARCH 2006

We compared the similarity of two contour functions by considering their first FDs as a vector in a -dimensional space and be and measuring the distance between them. Let two contour functions for regions and , with respective FDs and . The similarity between the two contours was measured as the sum of the squared difference

(2) , the FDs are ordered by inAlthough in general creasing frequency so a direct ordinal comparison of the seand are requence is meaningful. The phase coefficients quired for signal reconstruction, but because they contribute little to the discrimination of the overall shape of the contour, only the amplitude coefficients are used for the similarity measure. The amplitude coefficients are also invariant to rotation, i.e., the choice of . B. Segmentation We have implemented two distinct segmentation methods which are used in the experimentation described below. The first was a simplification of the method of Kang et al. [1], which used both global thresholds and adaptive thresholds based upon the statistics within a local region. The second method, proposed by Yao et al. [7], was an edge-based technique based upon the estimation and correction of the normal direction of bone edges. 1) Segmentation 1: Simplified Kang Method: The first segmentation method was similar to, albeit somewhat simpler than, the recently reported work of Kang et al. [1]. Kang et al. described a 4 step process, the first step of which classified each voxel as either bone or soft tissue by applying a combination of adaptive local and global thresholding. In adaptive local thresholding, which was first proposed by Niblack [25], statistics were drawn from a 3-D neighborhood of 26 voxels, and a distinct threshold was determined for each neighborhood. In addition, a global lower threshold (LT) and upper threshold (HT) were determined by modeling the soft tissue and bone regions in the histogram of CT values as the intersection of two Gaussians. The adaptive threshold was applied only within the region bounded by LT and HT, and any voxel that exceeded the adaptive threshold within its local neighborhood within this region was subsequently classified as bone. Following this classification step, a 3-D morphological closing operation was used to eliminate boundary discontinuities. Two subsequent steps involved adjusting the boundary to remove the smoothing effect from the morphological operator, and finally optional manual interaction to correct the estimate of the resulting periosteal and endosteal surfaces. Our method was similar to the the first two steps of the above described method, the main difference being that the first step was executed on each 2-D slice separately, rather than in 3-D. The reason that we restricted ourselves to 2-D was that the CT data that we were testing had an interslice spacing of 5 mm, so

Fig. 5. with T

A segmentation result. (a) Original image, (b) Niblack thresholding , (c) morphological closing, and (d) region filling.

that the neighboring pixels within a slice were much closer than those across slices. Another difference is that we classified a pixel as bone if it exceeded both a global bone contour threshold ( ) and the local adaptive threshold value, thereby eliminating the need for value was determined empirically an upper threshold. The by examining the statistics of the distribution of pixel values of bone and soft tissue from a number of images, the resulting histogram of which is shown in Fig. 1. Details of the method are presented in Section II-C. We found used to determine that the value of was not sensitive to different patient data but that it did vary with different CT formats. An example of a segmentation using our method is illustrated in Fig. 5. It can be seen that small gaps in the bone contour have been filled, without affecting the global shape of the remainder of the contour. Following the gap closing, region-filling and connected-components methods were applied. All components that were smaller than a specified threshold value (16 pixels) were discarded as noise, and the remaining components where identified as bone regions, as illustrated in Fig. 5(d). We have tested the method extensively on numerous leg images, and have found that it does a reasonably good job of segmenting both cortical and cancellous bone. Some additional examples of difficult but successful segmentations are illustrated in Fig. 6. There were, however, cases where the method resulted in poor segmentations, and two common scenarios are shown in Fig. 7. In Fig. 7(c), a gap has failed to close in cancellous bone, despite the application of the morphological postprocessing. In Fig. 7(f), the small channel separating the tibia and femur has merged, resulting in a single bone region. It is our contention that many automatic segmentation methods can fail in these and other common scenarios, particularly when bone pathologies are present. 2) Segmentation 2: Yao’s Method: The second segmentation method we tested was an estimation/correction algorithm

WANG et al.: VALIDATION OF BONE SEGMENTATION AND IMPROVED 3-D REGISTRATION USING CONTOUR COHERENCY IN CT DATA

329

Fig. 6. Further segmentation results. (a) Image of tibia and fibula diphyses. (b) Image of a proximal tibia and fibula. (c) Image of the knee containing both a femoral condyle and a tibial eminence. (d)–(f) Adaptive thresholding of these images using T .

for detecting bone edges [7]. The method was a seeded edge-detection and tracing algorithm based on the estimation and optimization of the normal direction of bone edges. To eliminate edges belonging to nonbone tissues or objects, initial seeds were obtained by analyzing intensity histograms. Points were first selected by thresholding a CT slice by a value located at the peak of the greatest intensity. Those pixels so selected were further filtered by a weighted edge strength, which was calculated as where was the intensity value of the was the average intensity pixel, was the edge strength, was the variance of the intensity of the selected pixels, and values of these pixels. Only those pixels with a large weighted edge strength were kept as seeds of bone edges. The resulting edge points were detected by edge-filtering 1-D signals of pixel intensities along the estimated/corrected normal directions. The first derivative of the 2-D Gaussian was used as the normal direction of each seed. It was then corrected by controlling the difference of the normal directions at neighboring , equal to 10 . Once the correction of the normal edge points, direction was complete, a 1-D signal was constructed of the intensities of the pixels along the normal direction and centered at the seed. The 1-D signal was smoothed to suppress noise and reinforce weak edge strengths. Then, the first derivative of the , was used as an edge filter to detect the pixel Gaussian, with the maximal edge strength in the smoothed 1-D signal as an edge point in the 2-D image. The detected point was taken as a seed and the edge-point detection was iterated until no new edge points were found. The entire process was iterated for all

Fig. 7. Examples of imperfect segmentation results. (a) Image of a proximal tibia and patella. (b) Enlarged region of (a). (c) Segmentation of (b) with a gap in the upper region of the contour. (d) Image of a proximal tibia and fibula. (e) Enlarged region of (d). (f) Segmentation that merged the two contours.

of the initial seeds. Finally, false edge points were eliminated by grouping pixels with similar normal directions. Examples of Yao’s method’s results are shown in Fig. 8. The original images are the same as those in Fig. 6. C. Performance Analysis We have extensively tested and quantified the performance of the validation method in two separate experiments using the two different segmentation methods. In the first experiment, the detection accuracy was determined by comparing the automatic classification results against that of a human expert. In the second experiment, the effects on registration were measured by registering a manually validated clinically approved 3-D surface model of the bone which had been created using a semi-automatic method. The semi-automatic method was a manually selected threshold operation. This was followed by manual adjustments of the resulting contours by a human expert. The automatically segmented result was registered with the automatically segmented contours, both including and excluding the detected low-coherency segmentations. 1) Experiment 1: Detection Accuracy: This experiment used 23 distinct sets of CT images of human lower legs (a total

330

IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 25, NO. 3, MARCH 2006

Fig. 10.

Fig. 8. (a)–(c) Segmentation results by Yao’s method. Original images are the respective subfigures of Fig. 6.

Fig. 9. Single pixel width bone contour mask. (a) Tibia image. (b) The grey shaded region is the bone mask of (a) created by a clinical expert. The bone contour mask is the white contour inside and near the perimeter of the bone mask.

of 1633 slices) taken with two CT scanners using the respective Native and DICOM data formats. In each slice, the tibia contours were extracted using the segmentation methods described in Section II-B. For the first segmentation method (simplified Kang), the window size used was 7 7 and the weight was values were determined to be 60 for the Native 0.05. The and 150 for the DICOM formats by the following method. The histograms in Fig. 1(b) were created by a semi-automatic threshold. segmentation process and used to determine the For bone, masks were created by a clinical expert who extracted the bone contours. To avoid extracting HU values of dense soft tissues surrounding bone surfaces, we extracted a one-pixel width mask of bone contours by compressing the bone mask by three pixels, as shown in Fig. 9. For soft tissues, a simple threshold was used to segment patient tissues from air and background. Masks of soft tissues were the difference between the tissue masks and the bone masks of each slice, as illustrated in Fig. 10. In this histogram, HU values of bones and soft tissues were extracted from one tenth of the CT slices that were uniformly distributed over the dataset. Three datasets were randomly selected to calculate the histograms by this means.

Soft tissue mask. (a) Original image. (b) Soft tissue mask of (a).

We especially sought to assess segmentations near the proximal tibia and those segmentations that had incomplete segmented contours due to bone pathologies or limitations of image quality. Starting from the diaphysis of a normal tibia, we observed that the shape changed gradually up to the region where the tibia and fibula join. By contrast, the shape of the tibia near the knee joint changed suddenly and dramatically. The CT slices were spaced less than 5 mm apart, so the contour shape of each slice was close to that of its neighbors (except possibly near the knee). Based on this observation, we used our similarity measurement [see (2)] to compare each slice with its neighbors. We expected that the contour shape of a normal tibia would resemble (i.e., have coherence with) those of its neighbors, except when abnormalities were present. The number of slices in the neighborhood over which the coherency is compared needed to be small enough so as to preserve the local details yet large enough to suppress noise. For example, when compared with only two adjacent neighbors, a regular shape could be misclassified if both neighbors happened to be irregular, and vice versa. We was effective, where the found that a neighborhood size slice included slices neighborhood of the . For each slice, the similarities were calculated for all contours extracted over the neighborhood and the following three measures were determined. : the distance of a contour to its nearest neighbor. • • : the mean similarity. : the median similarity. • An example of these three measures for a typical tibia sequence is illustrated in Fig. 11. The peaks in Fig. 11 indicate that the corresponding slices contained dissimilar shapes, so these are suspected to be bone anomalies. For each measure, a threshold was determined by averaging all except the largest value, i.e.,

A slice was then classified as having low-coherency if all three measures exceeded their threshold values for that slice. Automatic detection was performed for all 23 image sequences and the detection accuracy was evaluated by comparing the results of the automatic validation method against the judgment of a human expert. For example, if the human

WANG et al.: VALIDATION OF BONE SEGMENTATION AND IMPROVED 3-D REGISTRATION USING CONTOUR COHERENCY IN CT DATA

Fig. 11. Contour-function similarity measures for a tibia scan. (a) (b) , and (c) .

D

D

D

331

,

expert identified an automatically segmented slice as being incorrect, and the automatic validation method also identified this slice as having a low-coherency, then this contributed to the accuracy of the detected low-coherency segmentations. Similarly, if an automatic segmentation was approved by the human expert and was also deemed as having a high coherency by the automatic validation method, then the accuracy of detected high coherency segmentations would increase. 2) Experiment 2: Registration Enhancement: In the second experiment, we evaluated the effectiveness of segmentation validation for enhancing registration. Registration is the process of determining a transformation between two surfaces, and is typically executed on reconstructed surfaces using the iterative closest point (ICP) algorithm [26]. In computer-assisted surgery systems, it is important to register the surfaces with a high degree of positional accuracy, particularly for orthopedic surgical procedures. Poorly segmented slices can produce outliers that bias the results, resulting in misregistrations. Detecting and removing these segmentations prior to surface reconstruction can improve the accuracy of the subsequent registration. To evaluate the improvement to the registration accuracy, we registered bone models that were reconstructed from the contours generated by both segmentation methods. The surfaces were triangulated using an implementation of the marching cubes algorithm [27], and reconstructions were generated both before and after the removal of detected low-coherency segmentations. These automatically generated surface models were registered using the ICP algorithm to models that had been semi-automatically reconstructed from the same data by a human expert. An example of semi-automatically generated and automatically generated surface models, and their subsequent registrations, is shown in Fig. 12. We ran this experiment on 4 data sets that had been used clinically, i.e., the semi-automatic segmentations were used in actual computer-assisted surgeries. The experiments were run on a SunBlade 1000.

Fig. 12. A 3-D surface registration. (a) An expert semi-automatic reconstruction. (b) A reconstruction from an automatic segmentation. (c) A registration of (a) and (b). (d) The image from (c) rendered with the surface from (b) in a darker shade.

III. RESULTS A. Detection Accuracy Results The outcome of this experiment is summarized in Table I. For each of the 23 distinct image sequences, the percentage of correctly detected (i.e., identical to the human expert) broken bone regions (BC) and knee regions (KC) are recorded in columns 2 and 3, and 6 and 7, respectively, for the two segmentation methods. Broken bones can cause difficulties in automatic segmentation methods due to their abrupt and arbitrary changes in shape. Similarly, sudden contour mergers often occur in the proximity of knees, due to the cancellous nature and complex shape of the bone in these regions. Table I columns 4 and 8 indicate the percentage of total low-coherency segmentations correctly detected per sequence. These are the true positives, which were identified as low-coherency both by the automatic technique and the human expert. Columns 5 and 9 show the percentage of total high-coherency segmentations detected per sequence (i.e., true negatives). The last row of the table indicates the averages over all sequences. For the segmentations by the simplified Kang method, in 11 of the 23 sequences, all of the low-coherency segmentations were correctly detected by the automatic method, and the accuracy of detection was above 80% in all but 4 sequences. For the segmentations by Yao’s method, in 10 of the 23 sequences, all of the low-coherency segmentations were correctly detected by the automatic method, and the accuracy of detection was above 80% in all but 7 sequences. The segmentation-validation method,

332

IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 25, NO. 3, MARCH 2006

TABLE I

VALIDATION ACCURACY.

BC = bone contours, KC = knee contours, low = low 0 coherency, high = high 0 conherency

TABLE II REGISTRATION RESULTS FOR RECONSTRUCTED 3-D BONE SURFACE MODELS INCLUDING AND EXLCUDING LOW-COHERENCY SEGMENTATIONS

therefore, resulted in a high detection accuracy for both low-coherency and high-coherency segmentations. Overall, 87.5% and 80% of low-coherency segmentations and 97.7% and 95.5% of high-coherency segmentations were correctly detected for the two respective segmentation methods, as indicated by the averages in the final row of Table I. B. Registration Enhancement Results Table II lists the execution times and root-mean-square (RMS) registration errors of the residuals of the surface correspondences both before and after the removal of low-coherency segmentations. Removing the automatically detected low-coherency contours improved the registration accuracy significantly for both segmentation methods, over all of the data sets. On average, the accuracy was improved by 570% (i.e., a factor of 5.7) for the simplified Kang segmentation and by 283% for Yao segmenta-

tion. The maximum improvement was 1151%, achieved for a simplified Kang segmentation, while the minimum was 129% for a Yao segmentation. Much of the computational expense in ICP is in establishing correspondences between surfaces that are highly separated, so the removal of low-coherency segmentations had the added benefit of improving the execution time of the registration. The time performance was improved by 540% and 791% for the simplified Kang and Yao segmentations, respectively, with a maximum and minimum improvement of 1508% and 152%, both for Yao segmentation. IV. DISCUSSION In this paper, we have used contour-analysis techniques to compare the shape coherency of regions within local neighborhoods of a sequence of CT slices. Those regions with a low-coherency have very different shapes than their neighbors. These

WANG et al.: VALIDATION OF BONE SEGMENTATION AND IMPROVED 3-D REGISTRATION USING CONTOUR COHERENCY IN CT DATA

are regions where the segmentation likely did not perform well, and may be caused by pathology (broken bone), anatomy (cancellous bone, e.g., near a joint), and sensor limitations (thin channels due to beam width, and noise). Our experiments have shown that the proposed method can accurately detect certain classes of low-coherency segments, such as broken bones and the start of the knee bone, in automatically segmented images. We have also shown that the removal of these slices can improve both the accuracy and efficiency of the registration of the reconstructed 3-D bone surfaces against semi-automatically reconstructed models that were manually validated and were of clinical quality. The segmentation method that we developed for our experi, which mentation made use of a predetermined threshold was the HU value where the histograms of cancellous bone value to be and muscle values intersected. We found the nearly constant across different patients, although it could vary for different sensors and calibrations, and for different anatomical structures. While the segmentation method was relatively simple, it did offer some advantages over more sophisticated approaches. No initial seed value was required, as is necessary in certain edge-tracing methods. Also, the surfaces resulting from the 3-D reconstruction were not approximated or interpolated, as can occur with methods based upon deformable models. The segmentation did fail in some cases, and it is our contention that other methods may also fail in these and other situations. Rather than attempting fully automated segmentation solely from first principles, we propose the use of contour analysis to detect known potential anomalies, which once detected, can then be handled appropriately. For example, these contours could be flagged for human intervention, discarded (e.g., prior to registration), or sent to a subsequent postprocessing routine for further consideration. Indeed, a complete segmentation method could iteratively detect and selectively correct predictable segmentation difficulties. In our data, a contour was extracted for each CT slice separately. It is also possible to derive the contours from a section of bone across slices. There are two reasons why, in this case, it makes sense to consider the 2-D contours that occur within each slice, i.e., perpendicular to the major axis of the tibia. One reason is that our images are spaced at 5 mm, and so resolution of a contour extracted within a single image will be greater than that extracted across images. A second reason is that, in the case of the tibia, anatomical variations occur generally more gradually in slices perpendicular to the major axis of the bone. The contours perpendicular to the major axis of the tibia are also generally of smaller circumference than those from any other direction. Slices in any other direction would have a greater natural variation, and be larger and, therefore, segmentation errors would be more difficult to discriminate from natural variations. This may not be the case for other bone regions, where contours derived from other directions may indeed be beneficial. The purpose of Experiment 1 was to evaluate the effectiveness of the method at automatic validation, thereby determining potentially problematic segmentation results. In general, there would not exist any ground truth semi-automatically segmented version of the same CT image set against which to compare. From a clinical perspective, the ability to quickly and automat-

333

ically identify those slices where the automatic segmentation may have been problematic is of great value, as it can vastly reduce the amount of time required for a practitioner to manually inspect and modify the results of an automatic segmentation. This experiment is an example of the automatic validation process. By comparing the classification of each automatically segmented slice against that of a human expert, within one data set, we have given a measure of the accuracy of the automatic validation method as it would be used clinically. It is also possible to use the contour similarity measure to compare the results of two different segmentations (either automatic or semi-automatic) for the same slice. This would be one way of evaluating the effectiveness of a candidate segmentation method, if a gold standard were known. However, this would only be valuable if the comparison technique (i.e., the contour similarity measure) itself were deemed to be a reliable measure, which is what Experiment 1 endeavors to establish. Without confidence in the comparison technique, then any measured dissimilarity result could be the result of a bad segmentation, or a failure in the comparison technique, or both. Similarly, a measured similarity could be the result either of a good segmentation, or a failure in the comparison technique. Any use of the comparison technique is, therefore, only valuable once some level of confidence in the comparison technique itself has been established. The 23 datasets that we had available to us had a 5 mm interslice spacing. This data was attractive for these tests, as they had been previously segmented and reconstructed using a semi-automatic method, and had been manually validated by an expert. In addition, 4 of these semi-automatically reconstructed datasets were clinically approved, which made them ideal for comparing the registration results. That they have a 5 mm, rather than a 1 mm, interslice spacing, was not significant, as it did not affect the method. The only aspect of the work that this larger interslice spacing could affect was the simplified Kang segmentation. A smaller interslice spacing could allow us to base the adaptive threshold on a 3-D neighborhood, rather than the 2-D neighborhood that is currently used. The influence of noise is of interest for a segmentation method. For example, both Kang et al. [1] and Yao et al. [7] have considered the effect of noise on their segmentation methods. Given the level of noise robustness of these methods, we felt that it was of greater interest to compare the results of the automatic validation method on the results of multiple segmentations. Further, the technique has been demonstrated on real patient data, acquired in a clinical setting. The amount of noise present in these images is, therefore, commensurate with what could be expected in practice. A thorough evaluation of the influence of noise is more important in studies where simulated data is used. A number of avenues remain for future research. While the presented contour function is an effective parameterization of a region boundary, there are other methods that could be used to encode and compare contours, such as techniques based on principle components analysis. Higher-order statistical properties of the contour functions could also be analyzed to further identify particular known classes of segmentation failure. In this way, contours could be identified as landmark features in an

334

IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 25, NO. 3, MARCH 2006

anatomical atlas. We have given one example of a particular segmentation method and the failures that can result, but different segmentation methods will likely fail in different situations and different ways. The relationship between a specific segmentation method and the statistical properties of its resulting contours needs to be established. Finally, the notion of shape analysis can be extended directly to the evaluation and comparison of 3-D surfaces, following reconstruction. REFERENCES [1] Y. Kang, K. Engelke, and W. A. Kalender, “A new accurate and precise 3-D segmentation method for skeletal structures in volumetric CT data,” IEEE Trans. Med. Imag., vol. 22, no. 5, pp. 586–598, May 2003. [2] V. Grau, U. U. J. Mewes, M. Alcaniz, R. Kikinis, and S. K. Warfield, “Improved watershed transform for medical image segmentation using prior information,” IEEE Trans. Med. Imag., vol. 23, no. 4, pp. 447–458, Apr. 2004. [3] A. Elmoutaouakkil, E. Peyrin, J. Elkafi, and A. Laval-Jeantet, “Segmentation of cancellous bone from high-resolution computed tomography images: Influence on trabecular bone measurements,” IEEE Trans. Med. Imag., vol. 21, no. 4, pp. 354–362, Apr. 2002. [4] T. B. Sebastian, H. Tek, J. J. Crisco, and B. B. Kimia, “Segmentation of carpal bones from CT images using skeletally coupled deformable models,” Med. Image Anal., vol. 7, pp. 21–45, 2003. [5] X. M. Pardo, M. J. Carreira, A. Mosquera, and D. Cabello, “A snake for CT image segmentation integrating region and edge information,” Image Vis. Comput., vol. 19, pp. 461–475, 2001. [6] K. Haris, S. Efstratiadis, and A. Katsaggelos, “Hybrid image segmentation using watersheds and fast region merging,” IEEE Trans. Image Process., vol. 7, no. 12, pp. 1684–1699, Dec. 1998. [7] W. Yao, P. Abolmaesumi, M. Greenspan, and R. E. Ellis, “An estimation/correction algorithm for detecting bone edges in CT images,” IEEE Trans. Med. Imag., pp. 997–1010, Jan. 2004, submitted for publication. [8] M. Kass, A. Witkin, and D. Terzopoulos, “Snakes: Active contour models,” Int. J. Comput. Vis., vol. 17, no. 4, 1988. [9] S. Shiffman, G. D. Rubin, and S. Napel, “Medical image segmentation using analysis of isolable-contour maps,” IEEE Trans. Med. Imag., vol. 19, pp. 1064–1074, Nov. 2000. [10] D. V. Iosifescu, M. E. Shenton, S. K. Warfield, J. D. R. Kiki-nis, F. A. Jolesz, and R. W. McCarley, “An automated registration algorithm for measuring mri subcortical brain structures,” NeuroImage, vol. 6, pp. 12–25, 1997. [11] S. Warfield, R. Mulkern, C. Winalski, F. Jolesz, and R. Kikinis, “An image processing strategy for the quantification and visualization of exercise induced muscle mri signal enhancement,” J. Magn. Reson. Imag., vol. 11, no. 5, pp. 525–531, 2000.

[12] T. S. Yoo, M. J. Ackerman, and M. Vannier, “Toward a common validation methodology for segmentation and registration algorithms,” in Proc. MICCAI 2000: 3rd Int. Conf. Medical Image Computing and Computer-Assisted Intervention, A. M. DiGioia and S. Delp, Eds., 2000, pp. 422–431. [13] R. E. Hagan, K. E. Mark, L. Wang, S. Joshi, M. I. Miller, and R. D. Bucholz, “Mesial temporal sclerosis and templobe epilepsy: MR imaging deformation-based segmentation of the hippocampus in five patients,” Radiology, vol. 216, pp. 291–297, 2000. [14] K. H. Zou, W. M. Wells, R. Kikinis, and S. K. Warfield, “Three validation metrics for automated probabilistic image segmentation in brain tumors,” Statist. Med., vol. 23, pp. 1259–1282, 2003. [15] A. P. Zijdenbos, B. M. Dawant, R. A. Margolin, and A. C. Palmer, “Morphometric analysis of white matter lesions in MR images: Method and validation,” IEEE Trans. Med. Imag., vol. 13, pp. 716–724, Dec. 1994. [16] G. Gerig, M. Jomier, and M. Chakos, “Valmet: A new validation tool for assessing and improving 3D object segmentation,” in Proc 4th Int. Conf. Medical Image Computing and Computer-Assisted Intervention (MICCAI 2001), M. A. Viergever, Ed., 2001, pp. 516–523. [17] M. Everingham, H. Muller, and B. Thomas, “Evaluating image segmentation algorithms using the Pareto fron,” in Lecture notes in Computer Science. Berlin, Germany: Springer-Verlag, 2002, vol. 2353, pp. 34–48. [18] R. C. Gonzales and R. E. Woods, Digital Image Prossing, 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 2002, ch. 11. [19] J. R. Bennett and J. S. Mac Donald, “On the measurement of curvature in a quantized environment,” IEEE Trans. Comput., vol. C-24, Aug. 1975. [20] C. T. Zahn and R. Z. Roskies, “Fourier descriptors for plane closed curves,” IEEE Trans. Comput., vol. C-21, Mar. 1972. [21] P. Peixoto, J. Goncalves, and H. Araujo, “Real-time gesture recognition system based on contour signatures,” in Proc. Int. Conf. Pattern Recognition, vol. 1, 2002, pp. 447–450. [22] L. Wang, T. Tan, H. Ning, and W. Hu, “Silhouette analysis-based gait recognition for human identification,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 12, pp. 1505–1518, 2003. [23] E. L. Brill, “Character recognition via Fourier descriptors,” in WESCON Dig. Tech., Papers, Qualitative Pattern Recognition Through Image Shaping. WESCON, 1968, Session 25. [24] E. Persoon and K.-S. Fu, “Shape discrimination using Fourier descriptors,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-8, no. 3, May 1986. [25] W. Niblack, An Introduction to Digital Image Prossing. Upper Saddle River, NJ: Prentice-Hall, 1986, ch. 5. [26] P. J. Besl and N. D. McKay, “A method for registration of 3-D shapes,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 2, pp. 239–256, Feb. 1992. [27] W. E. Lorensen and H. E. Cline, “Marching cubes: A high resolution 3D surface construction algorithm,” in Proc. SIGGRAPH ’87, vol. 21, 1987, pp. 163–169.

Suggest Documents