Med Biol Eng Comput (2012) 50:877–884 DOI 10.1007/s11517-012-0929-1
ORIGINAL ARTICLE
Interactive segmentation of plexiform neurofibroma tissue: method and preliminary performance evaluation
Lior Weizman • Lior Hoch • Dafna Ben Bashat • Leo Joskowicz • Li-tal Pratt • Shlomi Constantini • Liat Ben Sira
Received: 8 October 2011 / Accepted: 31 May 2012 / Published online: 16 June 2012 International Federation for Medical and Biological Engineering 2012
Abstract Plexiform neurofibromas (PNs) are a major manifestation of neurofibromatosis-1 (NF1), a common genetic disease involving the nervous system. Treatment decisions are mostly based on a gross assessment of changes in tumor using MRI. Accurate volumetric measurements are rarely performed for this kind of tumor, mainly because of its great dispersion, size, and multiple locations. This paper presents a semi-automatic method for segmentation of PN from STIR MRI scans. The method starts with a user-based delineation of the tumor area in a single slice and automatically segments the PN lesions in the entire image based on the tumor connectivity. Experimental results on seven datasets, with lesion volumes in the range of 75–690 ml, yielded a mean absolute volume error of 10 % (after manual adjustment) as compared to manual segmentation by an expert radiologist. The mean computation and interaction time was 13 min, versus 63 min for manual annotation.
Keywords Segmentation • Neurofibromatosis-1 • Plexiform neurofibromas • PNS tumors • Schwannoma
L. Weizman (corresponding author) • L. Hoch • L. Joskowicz: School of Engineering and Computer Science, The Hebrew University of Jerusalem, Jerusalem, Israel
D. Ben Bashat • L. Pratt • S. Constantini • L. Ben Sira: Tel Aviv Medical Center, Tel Aviv, Israel
1 Introduction
Neurofibromatosis-1 (NF1) is a common genetic disorder associated with the frequent occurrence of central and peripheral nervous system tumors [9]. The prevalence of NF1 among children is approximately 1:3,000 [12]. The lifelong follow-up of children and adults with NF involves many challenges, including the identification of the most appropriate time for surgical and oncological treatments.
Plexiform neurofibromas (PNs) are one of the hallmarks of NF1. They can appear later in life, although most of them are congenital and present at birth. Their prevalence is estimated to be 30–40 % in NF1 patients [5]. The actual number may be higher, since many internal PNs are not diagnosed without appropriate imaging [21]. PN lesions are typically very large and complex, and have irregular shapes. They exhibit an erratic growth rate, with periods of spontaneous growth followed by spontaneous stabilization [14, 19]. PNs can involve different parts of the body (head, neck, trunk) and may extend into surrounding structures. As a result of their extent and location, PNs can cause disfigurement and may compromise function or even jeopardize life. The most serious complication is the transformation of PN lesions into malignant peripheral nerve sheath tumors (MPNST). This occurs in 10 % of NF1-affected individuals, and is a leading cause of death in these patients [21, 24]. Accurate tumor quantification may help to identify this stage of malignant transformation and thus guide referral for surgical resection and oncological treatment.
Magnetic resonance imaging (MRI) is the method of choice for noninvasive assessment of tumors. The MRI scan protocol consists of several conventional and advanced pulse sequences. STIR (short tau inversion recovery) is the pulse sequence commonly used to detect and characterize PN. In this pulse sequence, the tumors appear bright in relation to the surrounding tissues. Figure 1 shows an example of PN lesions in a STIR scan.
Fig. 1 STIR coronal cross-section image with PN lesion contours marked in red (color figure online)
A variety of methods for tumor segmentation and quantification have been published so far. Some of them address the segmentation of brain tumors [4, 16, 17, 20], while others focus on body tumors, e.g., [6, 13]. However, very few automatic and semi-automatic methods for PN segmentation have been proposed in the literature. Solomon et al. [19] describe a method for volume quantification of PN based on an initial user definition of its location, followed by a histogram-based method that detects the PN areas within the manually defined area. Cai et al. [2] present a method for semi-automatic detection in whole-body STIR scans in which the user is required to identify the center of every PN lesion. The method then expands automatically from the user's predefined location to segment the entire PN lesion according to statistical measures. Both of these methods require intensive, continued, and time-consuming user interaction during the entire segmentation process when many tumors are present.
In this paper, we present a new method for the segmentation of PNs from STIR MRI scans. The method utilizes a user-based initial delineation of the tumor area in a single slice to automatically segment the PN lesions in the entire scan. The method was tested on seven datasets with PN lesions. The mean absolute volume error is 10 % (after manual adjustment) and the mean volume overlap error is 27 % as compared to manual segmentation by an expert radiologist. The average computation time of our method, including the user interaction time, is 13 min. In contrast, the mean manual annotation time is approximately 63 min.
2 Method
The input of our method is an MRI STIR image. We represent an MR image with $L$ slices, each with $n$ pixels, as the set $\{S_i\}_{i=1}^{L}$, where $S_i = \{v_{i1}, \ldots, v_{in}\}$ and $v_{ij}$ represents the intensity value of the $j$th pixel in the $i$th slice. The method consists of five steps (Fig. 2):
1. PN region of interest selection.
2. Slices intensity normalization.
3. Automatic threshold calculation in a single slice.
4. 3D region growing.
5. Manual adjustment.
Fig. 2 Illustration of the steps of our method. Red background indicates a process which requires user interaction, and blue background indicates automatic processes (color figure online)
Step 4 of our method operates on the entire 3D volume of the scan, while the other steps operate on 2D slices of the scan individually. We describe each step in detail next.
2.1 PN region of interest selection
PNs can be characterized by a single lesion or up to tens of lesions, which are mostly connected throughout the image. In this initial step, the user defines a region of interest (ROI) on a selected slice. The slice is selected so that the PN lesions appearing in it are connected with most of the PN lesions in the entire image. When such a slice does not exist, the user defines several ROIs on several slices. The selected slices are denoted by $\{S_{r_1}, \ldots, S_{r_N}\}$. The selected ROIs are required to contain the PN lesions and exclude connected normal tissues with similar intensity values. The automatic method relies on the assumption that there are PN lesions that are connected in 3D space and appear in more than a single slice. Figure 3a shows an example of a user-selected ROI.
2.2 Slices intensity normalization
The MR signal is mostly attenuated in the marginal slices of the scans. Therefore, normalization of the scan is required [7]. Since voxels with PN have high intensity values, we perform the normalization with respect to the high-intensity pixels in every slice. First, the K-means algorithm [11] is applied to cluster the pixels in every $i$th slice into two clusters:
$$\{C_{1i}\}_{i=1}^{L} = \{v_1, \ldots, v_{N_{1i}}\} \qquad (1)$$
and
$$\{C_{2i}\}_{i=1}^{L} = \{v_1, \ldots, v_{N_{2i}}\} \qquad (2)$$
where $N_{1i}$ and $N_{2i}$ represent the number of pixels of the $i$th slice that fall in clusters 1 and 2, respectively. Cluster 1 includes pixels with higher intensity values and cluster 2 includes pixels with lower intensity values. In most cases dark and bright areas share only a small portion of the grey-level intensities; thus, we use a simple random seeding for the K-means algorithm. Then, every pixel in every slice is multiplied by the ratio factor:
$$\frac{\langle \{C_{1i}\} \rangle}{\langle \{C_{1r_1}\} \rangle} \qquad (3)$$
where $\langle \cdot \rangle$ denotes the average of a set, and $\{C_{1r_1}\}$ are the intensity values of the high-intensity pixels in slice $S_{r_1}$, which was selected by the user in the previous step. The higher intensity values of every slice in the image are thus normalized with the higher intensity values of the first slice from those that were selected in the previous step.
2.3 Automatic threshold calculation
The pixels that represent tumor intensity in each slice are automatically identified with a histogram-based analysis on the selected ROI using the method of Solomon et al. [19]. In that method, the histogram values are divided into normal tissue and PN lesion values based on a threshold defined as the global minimum between the two main peaks in the histogram. When several ROIs are defined by the user, this step is repeated for every ROI.
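Sections 2.2 and 2.3 can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names (`two_means`, `normalize_slices`, `valley_threshold`), the representation of a slice as a flat list of intensities, the bin count, and the simple peak test are all assumptions made for illustration. The sketch also assumes that the intended effect of the normalization is to bring the bright-cluster mean of every slice to that of the reference slice $S_{r_1}$, so it scales slice $i$ by the reference mean divided by the slice's own mean; the orientation of the ratio in Eq. 3 should be checked against the published article.

```python
import random
from statistics import mean


def two_means(values, iters=50, seed=0):
    """1-D K-means with K=2; returns (bright, dark) intensity lists.

    Needs at least two distinct intensity values.
    """
    rng = random.Random(seed)
    # Simple random seeding: two distinct intensities as initial centroids.
    lo, hi = sorted(rng.sample(sorted(set(values)), 2))
    for _ in range(iters):
        # In 1-D, nearest-centroid assignment is a cut at the midpoint.
        mid = (lo + hi) / 2.0
        dark = [v for v in values if v <= mid]
        bright = [v for v in values if v > mid]
        new_lo, new_hi = mean(dark), mean(bright)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return bright, dark


def normalize_slices(slices, ref_index):
    """Scale every slice so its bright-cluster mean equals the reference's.

    Assumption: factor = reference mean / slice mean (see lead-in).
    """
    bright_means = [mean(two_means(s)[0]) for s in slices]
    ref = bright_means[ref_index]
    return [[v * ref / m for v in s] for s, m in zip(slices, bright_means)]


def valley_threshold(roi_values, bins=32):
    """Threshold at the lowest histogram bin between the two tallest peaks."""
    lo, hi = min(roi_values), max(roi_values)
    if hi == lo:
        raise ValueError("ROI has a single intensity value")
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in roi_values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    # A bin is a peak if it is non-empty, higher than its left neighbour
    # and not lower than its right neighbour (illustrative peak test).
    peaks = [i for i in range(bins)
             if hist[i] > 0
             and (i == 0 or hist[i] > hist[i - 1])
             and (i == bins - 1 or hist[i] >= hist[i + 1])]
    if len(peaks) < 2:
        raise ValueError("histogram is not bimodal")
    tallest = sorted(peaks, key=lambda i: hist[i], reverse=True)[:2]
    p1, p2 = sorted(tallest)
    valley = min(range(p1, p2 + 1), key=lambda i: hist[i])
    return lo + (valley + 0.5) * width  # centre of the valley bin
```

For example, `normalize_slices([[10, 10, 100, 100], [5, 5, 50, 50]], ref_index=0)` rescales the second slice so that its bright pixels match those of the first. On real histograms some smoothing before the peak search would probably be needed; the sketch omits it for brevity.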
Fig. 3 a ROI selected by the user on a STIR image. b, c 3D region growing: the PN segmentation propagates from an original slice (b) to an adjacent slice (c)
Fig. 4 Automatic method results: a before and b after manual adjustment. The white arrow points to a manually added PN lesion (color figure online)
2.4 3D region growing
Since PNs are mostly continuous throughout the STIR image, the connectivity analysis starts from the previously selected ROIs and propagates throughout the entire image. A connected components analysis [1] is performed slice by slice to find the hyper-intensity voxels in the entire image that are connected to the PN lesions previously found in the slices $S_{r_1}, \ldots, S_{r_N}$ selected by the user in Sect. 2.1. A voxel is considered to have a hyper-intensity value if its grey level is above the defined threshold. The result is a connected set of hyper-intensity voxels that can represent either PN lesions or hyper-intensity healthy tissues. Figure 3b, c illustrates the 3D region growing results.
2.5 Manual adjustment
Inevitably, there are cases in which isolated PN lesions exist in the image. In addition, hyper-intensity healthy tissues are sometimes connected to the PN lesions and can therefore be wrongly identified as PNs, so that leaks/over-segmentations may occur. To cope with this, the user can manually eliminate healthy, wrongly segmented areas or add certain PN areas in the image to the final PN segmentation. Figure 4 illustrates the automatic method results, before and after manual adjustment.
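The region growing of Sect. 2.4 can be sketched as a breadth-first flood fill. This is a simplified stand-in rather than the authors' procedure: the text describes a slice-by-slice connected components analysis [1], whereas the sketch grows directly in 3D, and the 6-connectivity, the nested-list volume layout `volume[z][y][x]`, and the name `region_grow` are assumptions.

```python
from collections import deque


def region_grow(volume, seeds, threshold):
    """Return the set of (z, y, x) voxels above threshold connected to seeds.

    volume : nested lists indexed as volume[z][y][x]
    seeds  : iterable of (z, y, x) voxels inside the user-selected ROI(s)
    """
    depth, rows, cols = len(volume), len(volume[0]), len(volume[0][0])

    def is_bright(z, y, x):
        return volume[z][y][x] > threshold

    # Start only from seed voxels that are themselves hyper-intense.
    grown = {s for s in seeds if is_bright(*s)}
    queue = deque(grown)
    # 6-connectivity: in-plane neighbours plus the adjacent slices.
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in steps:
            nz, ny, nx = z + dz, y + dy, x + dx
            inside = 0 <= nz < depth and 0 <= ny < rows and 0 <= nx < cols
            if inside and (nz, ny, nx) not in grown and is_bright(nz, ny, nx):
                grown.add((nz, ny, nx))
                queue.append((nz, ny, nx))
    return grown
```

Isolated bright voxels that are not connected to any seed are left out, which is the situation the manual adjustment of Sect. 2.5 is meant to handle; bright healthy tissue touching a lesion is included, which is the leak case.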
3 Experimental results
To evaluate our method, we acquired MRI datasets of seven patients with NF having PN lesions. The MR images were acquired with a 1.5 Tesla MR system (GE Signa EXCITE HDx, Milwaukee, WI, USA) at the Tel-Aviv Sourasky Medical Center and included STIR images. The study was approved by the local ethical research committee. Images were acquired in coronal or axial planes. The spatial dimensions of the datasets vary between 256 × 256 × 16 and 512 × 512 × 59. The voxel size in the datasets varies between 0.4 × 0.4 × 3.3 and 1.9 × 1.9 × 9 mm³. An expert radiologist manually produced ground-truth segmentations for each scan. The lesion volumes in the datasets are in the range of 75–690 ml, and the time required for manual segmentation of those lesions (which can be considered a measure of their complexity) is in the range of 10–180 min. Two additional raters blindly used our method to segment the PN lesions in the scans.
Table 1 summarizes the segmentation results: computation time, absolute volume error, and volume overlap error as compared to manual segmentation by an expert radiologist. The values that refer to the results of the proposed method are the average values between the raters. The absolute volume error is defined as the difference in volume between the segmentation result and the ground truth, regardless of their location in space, and is reported as a percentage. The volumetric overlap error, also known as 1 − (Dice coefficient), is defined as:
$$\mathrm{VOE} = \left(1 - \frac{2\,|S \cap R|}{|S| + |R|}\right) \times 100$$
where $S$ and $R$ refer to the two segmented volumes that are being compared. A voxel-perfect segmentation results in a volumetric overlap error of zero. See [8] for a detailed explanation of the validation measures used and their computation methodology.
The overall mean absolute volume error is 10 % (after manual adjustment) and the overall mean volume overlap error is 27 %, both as compared to manual segmentation by an expert radiologist. Figure 5 shows sample results versus ground truth segmentations. As will be explained later, the results can be improved by additional manual adjustment.
Table 1 Segmentation results
Dataset # | Tumor location | Manual segmentation time (min:s) | Proposed method: automatic segmentation time (min:s) | Proposed method: manual interaction time (min:s) | Proposed method: total segmentation time (min:s) | Manually segmented volume (ml) | Automatically segmented volume (ml) | Absolute volume error (%) | Overlap error (%)
1 | Spine | 180:00 | 4:00 | 11:20 | 15:20 | 690.33 | 560 | 18.89 | 44.18
2 | Shoulder | 120:00 | 3:20 | 12:55 | 16:15 | 594.17 | 565.4 | 4.85 | 16.31
3 | Cervical spine | 90:00 | 2:20 | 15:53 | 18:13 | 392.46 | 352.21 | 10.26 | 22.3
4 | Pelvis | 15:00 | 1:30 | 11:50 | 13:20 | 302.56 | 277.92 | 8.14 | 34.5
5 | Pelvis | 15:00 | 1:00 | 10:18 | 11:18 | 303.73 | 269.67 | 11.22 | 36.63
6 | Cervical spine | 10:00 | 3:00 | 7:15 | 10:15 | 74.6 | 72.15 | 3.28 | 12.24
7 | Cervical spine | 10:00 | 1:10 | 5:45 | 6:55 | 82.63 | 71.79 | 13.11 | 22.14
Avg | | 62:52 | 2:20 | 10:45 | 12:44 | 348.64 | 309.88 | 9.96 | 26.9
SD | | 68:10 | 3:25 | 3:04 | 3:44 | 234.15 | 202.27 | 5.25 | 11.7
Cases are ordered by their level of complexity (from the most difficult to the simplest)
Fig. 5 Manual (red) versus automatic (green) segmentations of PN lesions (color figure online)
To provide a measure for the reliability of our method with respect to other methods in the field, we applied the method presented in Solomon et al. [19] to our datasets. Table 2 presents the results of our method compared with the results of [19]. It can be seen that, on average, our method outperforms the cited method in terms of absolute and overlap errors. Based on our results, we observe that the database can be divided into two groups: the first includes PN lesions which require several hours for manual segmentation (rows 1–3 in Table 1); the second includes PN lesions which can be manually segmented in a relatively short time (rows 4–7 in Table 1). Although the algorithm segmentation time is comparable with the manual segmentation time in the easy cases, the major contribution of the method can be seen in the difficult cases, where the automatic segmentation process is expected to be completed in 13–17 min, while the manual segmentation
process requires 90–180 min, depending on the complexity of the PN lesions in the STIR image.
Table 2 Our method versus Solomon et al. [19] method
Dataset # | Manually segmented volume (ml) | Proposed method: segmented volume (ml) | Solomon et al.: segmented volume (ml) | Proposed method: absolute volume error (%) | Solomon et al.: absolute volume error (%) | Proposed method: overlap error (%) | Solomon et al.: overlap error (%)
1 | 690.33 | 560 | 494.41 | 18.89 | 28.38 | 44.18 | 45.91
2 | 594.17 | 565.4 | 501.16 | 4.85 | 15.65 | 16.31 | 34
3 | 392.46 | 352.21 | 276.8 | 10.26 | 29.47 | 22.3 | 32.08
4 | 302.56 | 277.92 | 347.18 | 8.14 | 14.75 | 34.5 | 42.45
5 | 303.73 | 269.67 | 630.85 | 11.22 | 107.7 | 36.63 | 61.52
6 | 74.6 | 72.15 | 82.02 | 3.28 | 9.95 | 12.24 | 21.8
7 | 82.62 | 71.79 | 38.38 | 13.11 | 53.54 | 22.14 | 66.33
Avg | 348.64 | 309.88 | 338.69 | 9.96 | 37.06 | 26.9 | 43.44
SD | 234.15 | 202.27 | 222.1 | 5.25 | 34.37 | 11.7 | 16.04
Cases are ordered by their level of complexity (from the most difficult to the simplest)
3.1 Inter-rater variability
To provide a measure for the consistency of our method, we also measured the level of variability between the raters. Table 3 summarizes the manual interaction time, segmented volume, absolute volume variability, and volume overlap variability for the two raters. It should be noted that, based on our experience, the main reason for the inter-rater variability of our method is the final manual adjustment step, as the other steps are performed automatically. The resulting average absolute volume variability between the raters is 6 %, which is similar to the one reported in Solomon et al. [19]. However, we believe that the more important measure for consistency is the inter-rater overlap variability, which provides spatial information in addition to the total volume variation. The average inter-rater overlap variability of our method is 18 %. Since in most cases the clinical decision is based on the follow-up of the tumor growth (by various observers) rather than the absolute volume of the tumor at a specific point in time, this inter-rater variability can serve as an estimate of the change in volume that can be detected with our method.
Table 3 Inter-rater results
Dataset # | Rater #1: manual interaction time (min:s) | Rater #2: manual interaction time (min:s) | Rater #1: segmented volume (ml) | Rater #2: segmented volume (ml) | Inter-rater absolute variability (%) | Inter-rater overlap variability (%)
1 | 11:30 | 11:10 | 578.10 | 541.86 | 6.69 | 17.43
2 | 10:30 | 15:20 | 561.82 | 568.94 | 1.25 | 7.75
3 | 11:30 | 20:15 | 337.22 | 367.2 | 8.16 | 22.26
4 | 13:20 | 10:20 | 263.02 | 292.82 | 10.18 | 33.3
5 | 11:20 | 9:15 | 254.05 | 285.27 | 10.94 | 25.72
6 | 5:30 | 9:00 | 72.31 | 71.98 | 0.456 | 10.32
7 | 5:40 | 5:50 | 73.35 | 70.23 | 4.45 | 11.11
Avg | 9:54 | 11:41 | 305.7 | 314.04 | 6.02 | 18.27
SD | 3:04 | 4:45 | 205.65 | 199.59 | 4.14 | 9.34
Cases are ordered by their level of complexity (from the most difficult to the simplest)
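The two error measures used throughout Sect. 3 can be written down in a few lines. The sketch below is illustrative only: the function names and the representation of a segmentation as a set of voxel coordinates are assumptions, and normalizing the absolute volume error by the ground-truth volume is inferred from the percentages in Table 1 rather than stated explicitly in the text.

```python
def absolute_volume_error(segmented_ml, truth_ml):
    """Volume difference as a percentage of the ground-truth volume.

    Assumption: the denominator is the ground-truth volume.
    """
    return abs(segmented_ml - truth_ml) / truth_ml * 100.0


def volumetric_overlap_error(seg_voxels, ref_voxels):
    """VOE = (1 - 2|S n R| / (|S| + |R|)) * 100, i.e. (1 - Dice) in percent."""
    s, r = set(seg_voxels), set(ref_voxels)
    return (1.0 - 2.0 * len(s & r) / (len(s) + len(r))) * 100.0
```

With the rounded volumes of dataset 1 in Table 1 (560 ml segmented, 690.33 ml ground truth), `absolute_volume_error` gives about 18.9 %. Two segmentations of equal volume always have zero absolute volume error, yet their overlap error can be anywhere between 0 and 100 %, which is why the overlap measure carries the spatial information mentioned in Sect. 3.1.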
3.2 Time versus accuracy trade-off
Our method requires a manual adjustment process to finalize the automatic segmentation. An experimental analysis of the performance of the method versus the manual user interaction time is presented in Fig. 6. We observe that the segmentation improves with the time spent on manual adjustment. However, the decision regarding the time required for adjusting the automatic PN segmentation result must take into account the nature of the PN lesion in the specific patient and the accuracy required for clinical significance. Therefore, although we cannot predict the time required for manual adjustment, we conclude from our experiments that in most cases several minutes of focused user interaction are sufficient to obtain reliable PN segmentation results.
Fig. 6 Analysis of method performance versus user interaction time: mean values (plot of volumetric overlap error [%] against manual correction time [sec])
4 Discussion
We have presented a new method for the semi-automatic segmentation of PN lesions based on STIR MR images. The current manual segmentation of PNs is time consuming, error prone, and may require several hours for a single dataset. Our segmentation method effectively incorporates initial user delineation on a single slice to reliably detect the PN lesion in the entire 3D scan.
Our method consists of five steps, some of which incorporate and extend previously published methods. Intensity normalization of MR scans was previously introduced, e.g., in [3, 10, 18, 23]. Histogram-based PN detection was presented in [19], and 3D region growing for PN segmentation was presented in [2]. However, our method is unique in terms of the time and effort required from the user to segment the PN lesion in a 3D scan. Unlike previously published methods for this task, our method does not require a separate user interaction for each slice/PN lesion in the image to perform the automatic segmentation.
The unique nature of PN precludes the use of state-of-the-art tumor segmentation methods, as they fail or yield poor results. However, despite these difficulties, our method yields results comparable to the interobserver variability reported in [15]. This suggests that our method is reliable for PN segmentation.
The potential clinical significance of PN segmentation is to provide a semi-automatic tool to reliably and efficiently determine the PN shape and boundaries, and to measure its volume. Although the accuracy required for clinical significance is unknown, the fact that our method provides an end-to-end reliable solution to the PN segmentation problem, with minimal user interaction, makes it a good candidate to become part of the clinical workflow. A prototype with a graphical user interface (GUI) for the method was developed and is currently being used as an experimental system by experienced radiologists.
Our method is unique in that it requires less user interaction than manual or other semi-automatic PN segmentation methods. Unlike previously published methods that require intensive and time-consuming user interaction during the entire segmentation process, in our method the user is only required to perform a coarse initial selection of the lesion area in a single slice, and a focused manual adjustment after the automatic process. The propagation of the segmentation results to the other slices in the image is performed automatically based on the unique nature of this specific tumor. As a result, our segmentation method is much more user friendly. Although the database used in our experiments is limited, we expect that the new approach would help to segment PNs in a more user-friendly, less time-consuming manner. The method may be beneficial for large-scale multi-institutional clinical trials where, with large patient numbers, it may be difficult to perform manual segmentation measurements.
4.1 Future work
We are currently extending the method to handle whole-body STIR scans. Whole-body MR imaging is used to examine the entire body in a relatively short time without ionizing radiation. A whole-body image is the preferred modality for a reliable evaluation of tumor burden in NF patients [25]. We also plan to add the FLAIR (fluid-attenuated inversion recovery) pulse sequence to eliminate the CSF and other cystic regions. This process will improve our results and reduce the false-positive errors of our method. Another direction that can be investigated is to use the point spread function (PSF) [22] of the MR scanner to enhance the quality of the STIR image and to improve the slice intensity normalization process.
Acknowledgments The authors wish to thank the Gilbert Israeli Neurofibromatosis Center (GINFC) for providing the real data and for supporting the medical aspects of the paper.
References
1. Alnuweiri H, Prasanna V (1992) Parallel architectures and algorithms for image component labeling. IEEE Trans Pattern Anal Mach Intell 14:1014–1034
2. Cai W, Kassarjian A, Bredella M, Harris G, Yoshida H, Mautner V, Wenzel R, Plotkin S (2009) Tumor burden in patients with neurofibromatosis types 1 and 2 and schwannomatosis: determination on whole-body MR images. Radiology 250:665–673
3. Collewet G, Strzelecki M, Mariette F (2004) Influence of MRI acquisition protocols and image intensity normalization methods on texture classification. Magn Reson Imaging 22:81–91
4. Corso J, Sharon E, Dube S, El-Saden S, Sinha U, Yuille A (2008) Efficient multilevel brain tumor segmentation with integrated bayesian model classification. IEEE Trans Med Imaging 27(5):629–640
5. Dombi E, Solomon J, Gillespie AJ et al (2007) NF1 plexiform neurofibroma growth rate by volumetric MRI: relationship to age and body weight. Neurology 68:643–647
6. Farmaki C, Marias K, Sakkalis V, Graf N (2010) Spatially adaptive active contours: a semi-automatic tumor segmentation framework. Int J Comput Assist Radiol Surg 5(4):369–384
7. Foo T, Hayes C, Kang YW (1992) Reduction of RF penetration effects in high field imaging. Magnet Reson Med 23:287–301
8. Gerig G, Jomier M, Chakos M (2001) Valmet: a new tool for assessing and improving 3D object segmentation. Lect Notes Comput Sci 2208:516–523
9. Huson S, Hughes R (1994) The neurofibromatoses: a pathogenetic and clinical overview. Chapman and Hall, London
10. Jager F, Hornegger J (2009) Nonrigid registration of joint histograms for intensity standardization in magnetic resonance imaging. IEEE Trans Med Imaging 28:137–150
11. Kanungo T, Mount D, Netanyahu N, Piatko C, Silverman R, Wu A (2002) An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans Pattern Anal Mach Intell 24(7):881–892
12. Lammert M, Friedman J, Kluwe L, Mautner V (2005) Prevalence of neurofibromatosis 1 in German children at elementary school enrollment. Arch Dermatol 141:71–74
13. Linguraru MG, Yao J, Gautam R, Peterson J, Li Z, Linehan WM, Summers RM (2009) Renal tumor quantification and classification in contrast-enhanced abdominal CT. Pattern Recognit 42(6):1149–1161
14. Mautner V, Asuagbor F, Dombi E, Fünsterer C, Kluwe L, Wenzel C, Widemann B, Friedman JM (2008) Assessment of benign tumor burden by whole-body MRI in patients with neurofibromatosis 1. Neuro-Oncology 10:593–598
15. Poussaint T, Jaramillo D, Chang Y, Korf B (2003) Interobserver reproducibility of volumetric MR imaging measurements of plexiform neurofibromas. AJR 180:419–423
16. Prastawa M, Bullitt E, Bullitt N, Leemput K, Gerig G (2003) Automatic brain tumor segmentation by subject specific modification of atlas priors. Acad Radiol 10:1341–1348
17. Prastawa M, Ho S, Gerig G (2004) A brain tumor segmentation framework based on outlier detection. Med Image Anal J 8(3):275–283
18. Shah M, Xiao Y, Subbanna N, Francis S, Arnold DL, Collins DL, Arbel T (2011) Evaluating intensity normalization on MRIs of human brain with multiple sclerosis. Med Image Anal 15:267–282
19. Solomon J, Warren K, Dombi E, Patronas N, Widemann B (2004) Automated detection and volume measurement of plexiform neurofibromas in neurofibromatosis 1 using magnetic resonance imaging. Comput Med Imaging Graph 28:257–265
20. Solomon J, Butman JA, Sood A (2006) Segmentation of brain tumors in 4D MR images using the hidden markov model. Comput Methods Programs Biomed 84(2-3):76–85
21. Tucker T, Friedman J, Friedrich R, Fünsterer C, Mautner V (2009) Longitudinal study of neurofibromatosis 1 associated plexiform neurofibromas. J Med Genet 46:81–85
22. van Horssen P, Siebes M, Hoefer I, Spaan J, van den Wijngaard J (2010) Improved detection of fluorescently labeled microspheres and vessel architecture with an imaging cryomicrotome. Med Biol Eng Comput 48:735–744
23. Weisenfeld NL, Warfield SK (2004) Normalization of joint image-intensity statistics in MRI using the Kullback–Leibler divergence. Proceedings of IEEE international symposium on biomedical imaging: nano to macro (ISBI), vol 1, pp 101–104
24. Williams V, Lucas J, Babcock M, Gutmann D, Korf B, Maria B (2009) Neurofibromatosis type 1 revisited. Pediatrics 123:124–133
25. Woodruff J (1999) Pathology of tumors of the peripheral nerve sheath in type 1 neurofibromatosis. Am J Med Genet 89:23–30