Segmentation technique for detecting suspect masses in dense breast digitized images as a tool for mammography CAD schemes
Homero Schiabel1
Vivian T. Santos1
LAPIMO – Lab. de Análise e Processamento de Imagens Médicas e Odontológicas, EESC – USP, Av. Trabalhador Sãocarlense, 400, 13566-590 – São Carlos (SP), Brasil, +55 (16) 33739355 / +55 (16) 33739366
[email protected]
[email protected]
Michele F. Angelo1,2
1 LAPIMO – EESC – USP
2 Departamento de Tecnologia, Univ. Estadual de Feira de Santana, Av. Universitária, s/n, 44031-460 – Feira de Santana (BA), Brasil, +55 (75) 32248056
[email protected]
Abstract Breast cancer is one of the leading causes of mortality among women. Computer-aided detection (CAD) schemes have been developed as a tool for the early detection of breast cancer. Such schemes can be an important tool in mammography, since previous studies have indicated that the detection of breast cancer can be increased by up to 20% when assisted by a CAD scheme. One of the main stages of this process is the segmentation of structures of interest, such as suspect masses. However, when evaluating mammograms of dense breasts, the efficacy of a CAD scheme can be greatly reduced by the poor contrast of this type of image. This work addresses that challenge by describing a methodology for segmenting suspect masses in dense breast images as part of a CAD scheme under development. The methodology is based on the Watershed transformation, combined with two other procedures: histogram equalization, used as pre-processing to enhance image contrast, and a labeling procedure intended to reduce noise. Tests with a set of 252 regions of interest extracted from 130 digitized mammograms yielded a sensitivity of 92% with a specificity of about 90%. These results are promising for dense breast images and could significantly improve the performance of a processing scheme for this type of case in mammography.
CR Categories D.1.1 [Programming Techniques]: Applicative (Functional) Programming; I.4.6 [Image Processing and Computer Vision]: Segmentation.
General Terms Algorithms, Performance, Experimentation, Theory.
Keywords Computer-aided detection; mammography; masses segmentation.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SAC’08, March 16-20, 2008, Fortaleza, Ceará, Brazil. Copyright 2008 ACM 978-1-59593-753-7/08/0003…$5.00
1. INTRODUCTION
In mammography, correctly identifying suspect structures is very important for breast cancer screening, since it can prevent the progression of the disease. However, it is well known that some important details are difficult to detect in mammograms because of the poor contrast between, for example, microcalcifications and the background, mainly in dense breasts. In fact, even with a detailed visual scan of the image, generally with a magnifying lens, there is a significant potential for missing some information due to the characteristics of the mammographic image [1]. With the development of more sophisticated radiographic systems during the 1990s, interest in automatic schemes for aiding diagnosis in radiology has grown. At the same time, digital image processing techniques have become more efficient and have turned into an important tool supporting the radiologist's analysis [2]. Among the techniques developed for mammography, most attention has been given to those intended to detect and/or classify microcalcifications [3,4,5,6] and suspicious masses and tumors [3,7,8], as well as to preprocessing techniques designed to enhance the contrast of mammographic images [9,10,11].
A computer-aided detection (CAD) scheme is therefore the synthesis of such processes. Its main purpose is to act as a “second opinion” for the radiologist in the search for structures of clinical interest during mammogram analysis. One of its main stages is the segmentation of these structures in the digitized image. Although recent, CAD schemes have become a useful tool in clinical mammography practice in some radiological centers. Previous studies have indicated that the early breast cancer detection rate can be increased by up to 20% when assisted by a CAD scheme, while simultaneously reducing the rate of repeated exams [12]. Recent research has revealed great potential for the use of CAD schemes in breast cancer screening and early detection. Using CAD schemes as a second opinion in mammography has increased the number of breast cancers detected early and has improved radiologists’ performance. This is particularly important for less experienced radiologists, whose rate of correct diagnoses has come close to that of more experienced radiologists.
According to Vyborny et al. [13], three factors are taken into account when evaluating the performance of techniques intended to detect suspicious lesions in mammography: sensitivity, specificity and the nature of the database. Sensitivity indicates the proportion of correct positive results (true positive rate) and specificity the proportion of correct negative results (true negative rate). A diagnosis scheme needs to be evaluated considering these features together: detecting 100% of suspect regions is of little value if the scheme also yields a large number of false detections on “benign images” (false positives). The problem with false positives in mammography is that they mean unnecessary biopsies and cyto-pathological exams, which could be avoided if the detection scheme had better specificity. Current CAD schemes for mammography have shown about 90% sensitivity for microcalcification detection and 80% for nodule detection, with specificity around one false positive per image [14,15].
An important step in the development of CAD schemes is the segmentation of digitized mammograms in order to detect suspicious masses. This work describes a method with this aim, with emphasis on dense breast images. Dense breasts are an important challenge for image processing schemes, since the contrast among the relevant structures is much poorer than in images of fatty breasts. They are characteristic of young women, but the incidence of this type of breast image has increased significantly in the last ten years among women aged 45-50 years or even older. In addition, dense breast images are also a constraint for screening mammography.
Therefore, the mass segmentation technique presented here is divided into four steps: pre-processing, histogram equalization, image processing using the Watershed transformation, and post-processing by labeling connected components in order to reduce noise and hence the false positive rates. This method was tested with a number of images of dense breast cases from our database, and its sensitivity and specificity were verified so that it can be proposed as a stage of a mammography CAD scheme under development.
2. METHODOLOGY
A pre-processing step was developed because the procedure is aimed at images of high-density breasts, which show poor contrast. This step consists of equalizing the image histogram in order to increase contrast. Fig. 1 shows a region of interest (ROI) extracted from a mammogram before and after equalization, together with the respective histograms.
Histogram equalization aims at a uniform histogram by spreading the gray-level distribution along the contrast scale, in order to improve the detection of image details. Each element is given by:
$$p_r(r_k) = \frac{n_k}{n} \qquad (1)$$
where $0 \le r_k \le 1$; $k = 0, 1, \ldots, L-1$, with $L$ the number of gray levels of the digital image; $n$ is the number of pixels in the image; $p_r(r_k)$ is the probability of the $k$-th gray level; and $n_k$ is the number of pixels with gray level $k$.
For the calculation, the cumulative distribution function is used:
$$S_k = T(r_k) = \sum_{j=0}^{k} \frac{n_j}{n} = \sum_{j=0}^{k} p_r(r_j) \qquad (2)$$
where $0 \le r_k \le 1$, $k = 0, 1, \ldots, L-1$. Its inverse function is given by:
$$r_k = T^{-1}(S_k), \quad 0 \le S_k \le 1 \qquad (3)$$
Fig. 1. Example of histogram equalization used for detecting masses in dense breast images. (a) original ROI; (b) ROI after pre-processing; (c) and (d) respective histograms.
2.1 The Watershed Transform-based algorithm
The Watershed transform was used in the next step, after histogram equalization. The method examines the gray levels of the image histogram, since it is based on the principle that structures of interest correspond to the same frequency, with gradients and borders corresponding to high frequencies [16]. In the original Watershed transformation, a drop “falls” to the point of minimum, filling the basin. Beucher & Meyer [16] proposed inverting the transform, so that the point of minimum becomes a hole through which water fills the basin. The technique was implemented according to the following steps, starting with the original ROI displayed on the monitor (Fig. 2a):
(a) the internal (object) and external (background) gradients are calculated for the whole image; the brightest pixels (points of minimum) are calculated and stored, and they become the image markers (Fig. 2b);
(b) the gradient is inverted and the distances are calculated in order to compose the skeletal image by analyzing the neighborhood and eliminating points, which yields the influence zones (hit-and-miss: the object is eliminated in order to work with the background, through a sequence of thinning in the background and thickening in the foreground). Using the absolute value of the gradient, a kind of “topography” is established (Fig. 3); homogeneous regions form the flat regions of this “topography”;
(c) the “low regions” are flooded from the points of minimum, which work as seed points for region growing; the stopping point is the image gradient; areas marked with different “water levels” correspond to the segments divided by the watershed lines (Fig. 4);
(d) the image is then converted to binary and the mass is highlighted against the rest of the image, as illustrated in Fig. 5;
(e) the final result, with the mass outlined in the original image, is shown in Fig. 6.
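The equalization of Eqs. (1)-(2) and the flooding of steps (a)-(c) can be sketched in plain Python as follows. This is only a minimal illustration under stated assumptions, not the implementation used in this work: the function names (`equalize`, `gradient`, `marker_watershed`), the toy one-row ROI and the hand-placed markers are hypothetical; the gradient is a simple neighbor difference; the flooding is a generic priority-queue variant that assigns every pixel to a marker and draws no explicit watershed lines; and the marker threshold and hit-and-miss step described above are not reproduced.

```python
import heapq

def equalize(img, levels):
    """Histogram equalization: gray level k is mapped to S_k (Eq. 2).
    Rescaling S_k to the range [0, levels-1] is an assumption of this sketch."""
    n = sum(len(row) for row in img)           # n: number of pixels
    hist = [0] * levels                        # n_k: pixels with gray level k
    for row in img:
        for v in row:
            hist[v] += 1
    cdf, running = [], 0
    for n_k in hist:
        running += n_k
        cdf.append(running / n)                # S_k = sum_{j<=k} n_j / n
    return [[int(cdf[v] * (levels - 1) + 0.5) for v in row] for row in img]

NEIGHBORS_4 = ((1, 0), (-1, 0), (0, 1), (0, -1))

def gradient(img):
    """Crude gradient magnitude: largest absolute difference to a 4-neighbor."""
    h, w = len(img), len(img[0])
    g = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in NEIGHBORS_4:
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    g[y][x] = max(g[y][x], abs(img[y][x] - img[yy][xx]))
    return g

def marker_watershed(relief, markers):
    """Flood the relief from labeled seeds (markers > 0), lowest level first."""
    h, w = len(relief), len(relief[0])
    labels = [row[:] for row in markers]
    heap, order = [], 0                        # 'order' breaks ties first-in, first-out
    for y in range(h):
        for x in range(w):
            if labels[y][x] > 0:
                heapq.heappush(heap, (relief[y][x], order, y, x))
                order += 1
    while heap:
        level, _, y, x = heapq.heappop(heap)
        for dy, dx in NEIGHBORS_4:
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w and labels[yy][xx] == 0:
                labels[yy][xx] = labels[y][x]  # the neighbor joins the flooding basin
                heapq.heappush(heap, (max(level, relief[yy][xx]), order, yy, xx))
                order += 1
    return labels

# Toy one-row "ROI" with two flat regions (hypothetical values, 64 gray levels).
roi = [[10, 10, 10, 10, 40, 40, 40, 40]]
relief = gradient(equalize(roi, levels=64))
labels = marker_watershed(relief, [[1, 0, 0, 0, 0, 0, 0, 2]])
print(labels)  # [[1, 1, 1, 1, 2, 2, 2, 2]]
```

In this toy case the two basins meet where the gradient is highest, which is the role the gradient plays as the stopping point in step (c).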
2.2 Post-processing
To correct residual defects in the segmentation procedure described above, a post-processing stage was implemented by applying a labeling technique. This procedure consists of assigning a single value (for example, a single color) to all pixels in the same connected region.
A pixel p, with coordinates (x,y), can have up to 8 neighbors, with coordinates (x-1,y-1), (x,y-1), (x+1,y-1), (x-1,y), (x+1,y), (x-1,y+1), (x,y+1) and (x+1,y+1). This is its “8-neighborhood”. Connectivity among pixels is an important concept used to determine object boundaries as well as the components of regions in an image. To determine whether two pixels are connected, we verify whether they are adjacent and whether their gray levels satisfy a given similarity criterion.
Fig. 2. (a) Example of original ROI; (b) original ROI with markers.
The post-processing therefore scans the image pixel by pixel, from left to right and from top to bottom. Let p be the pixel at any step of the scan, and r and t, respectively, the upper and left neighbors of p. The scanning order ensures that, when p is reached, r and t have already been visited and labeled if they have value 1. If the value of p is 0, the scan moves to the next position. If the value of p is 1, r and t are examined. If both are 0, a new label is assigned to p (based on the current information, this is the first time this connected component has been found). If only one of the two neighbors is 1, its label is assigned to p. If both are 1 and have the same label, that label is assigned to p. If both are 1 but have different labels, one of them is assigned to p and the procedure records that the two labels are equivalent (which means that pixels r and t are connected through p). At the end, all pixels with value 1 have been labeled, but some of these labels may be equivalent. Therefore, the next step is to sort all pairs of equivalent labels into equivalence classes, assign a different label to each class and then scan the image again, replacing each label with the one assigned to its equivalence class.
Fig. 3. ROI from the original image, markers and influence zones.
The labeling of 8-connected components was implemented here. It is done in the same way, but the two upper diagonal neighbors of p (called q and s) must also be examined. The scanning order ensures that these neighbors have already been processed when the procedure reaches p. If p is 0, the procedure moves to the next position. If p is 1 and all four of its neighbors are 0, a new label is assigned to p. If only one of the neighbors is 1, its label is assigned to p. If two or more neighbors are 1, one of their labels is assigned to p and the corresponding equivalences are recorded. After the image scan, the pairs of equivalent labels are sorted into equivalence classes, a unique label is assigned to each class, and a second scan of the image replaces each label with the one assigned to its equivalence class.
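The two-pass procedure just described can be sketched as follows. This is a minimal plain-Python illustration rather than the code used in this work: the function name `label_8_connected` is hypothetical, and the equivalence classes are kept with a small union-find table, which is one common way to record label equivalences.

```python
def label_8_connected(binary):
    """Two-pass labeling of the 8-connected components of a 0/1 image.
    Returns the label image and the number of components found."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    parent = [0]                      # parent[i]: equivalence link of label i (0 = background)

    def find(i):
        # Follow the links to the representative of the equivalence class.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # First pass: provisional labels and equivalences.
    for y in range(h):
        for x in range(w):
            if binary[y][x] == 0:
                continue
            # Neighbors already visited by the scan: left (t), the two upper
            # diagonals (q, s) and upper (r).
            found = []
            for dy, dx in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w and labels[yy][xx] > 0:
                    found.append(find(labels[yy][xx]))
            if not found:
                parent.append(len(parent))        # brand-new label
                labels[y][x] = len(parent) - 1
            else:
                smallest = min(found)
                labels[y][x] = smallest
                for root in found:
                    parent[root] = smallest       # record the equivalence

    # Second pass: replace each label by that of its equivalence class,
    # renumbered 1..K in scan order.
    final = {}
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                root = find(labels[y][x])
                labels[y][x] = final.setdefault(root, len(final) + 1)
    return labels, len(final)

# The diagonal pair at the top left forms one component under 8-connectivity.
image = [[1, 0, 0, 1],
         [0, 1, 0, 1],
         [0, 0, 0, 0],
         [1, 1, 0, 0]]
labeled, count = label_8_connected(image)
print(count)  # 3
```

With only the upper and left neighbors examined (4-connectivity), the same image would yield four components, since the two diagonal pixels at the top left would no longer be joined.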
Fig. 4. Image with the watershed lines (a) and junction with the original image (b).
Fig. 5. Segmented image (a) and its outline (b).
After the labeling procedure, most of the pixels considered noise are eliminated by calculating the average and excluding all labels below that value. The resulting image is then aligned with the original gray-scale image, for comparison and to highlight the suspect mass, alerting the radiologist to a region of interest. Although the watershed transform can yield an image with many undesirable details when applied to complex or very noisy images (a problem known as oversegmentation, caused by the large number of local minima in the image), this can be solved by obtaining markers that indicate which regions are to be effectively segmented and which are to be ignored.
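The noise-removal rule above can be illustrated with a short sketch. The text does not state which quantity is averaged, so this example assumes it is the area (pixel count) of each labeled component, and that components with an area below the mean are discarded; both the assumption and the function name `remove_small_components` are hypothetical.

```python
from collections import Counter

def remove_small_components(labels):
    """Keep only the labeled components whose area is at least the mean area.
    Input: label image (0 = background). Output: binary mask of kept pixels."""
    areas = Counter(v for row in labels for v in row if v > 0)
    if not areas:
        return [[0] * len(row) for row in labels]
    mean_area = sum(areas.values()) / len(areas)   # assumed meaning of "the average"
    keep = {lab for lab, area in areas.items() if area >= mean_area}
    return [[1 if v in keep else 0 for v in row] for row in labels]

# One 6-pixel component and two isolated pixels: the mean area is 8/3,
# so only the large component survives.
labeled = [[1, 1, 1, 0, 2],
           [1, 1, 1, 0, 0],
           [0, 0, 0, 0, 3]]
mask = remove_small_components(labeled)
print(mask)  # [[1, 1, 1, 0, 0], [1, 1, 1, 0, 0], [0, 0, 0, 0, 0]]
```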
Fig. 6. Original image with outlined mass
Fig. 7 illustrates a schematic diagram summarizing the procedures described in this section.
[Fig. 7 flowchart boxes: Digital images; Image histogram equalization; Equalized image exhibition; Gradient calculation; Calculation of the points of minimum; Gradient inversion and influence zones calculation; “Lower regions flood” + regions growing; Binary image: mass highlighted; Noise? (Y: Labeling procedure to reduce noise, then Resulting image; N: Resulting image)]
Fig. 7. Schematic diagram representing the developed system for mass segmentation in mammographic images
3. RESULTS
Tests were performed to determine the efficiency of this methodology when applied to dense breast images. A set of 130 digitized mammograms was used for testing the scheme intended to detect suspect breast masses. All mammograms were obtained from the archives of a hospital (Hospital of Clinics in Botucatu/SP, of the Medicine School, UNESP), and they were digitized with a Lumiscan laser digitizer (Lumisys, Inc.), with 12 bits of contrast resolution and 0.075 mm of spatial resolution.
From this image set, 252 ROIs were selected to evaluate mass detection (98 normal and 154 with masses, according to the radiologists' reports). After processing all ROIs with the Watershed transform-based technique described previously, the results were divided into two groups: one corresponding to the rates registered before using the pre-processing step for contrast enhancement, and the other after applying the pre-processing. Table 1 gives these results, which indicate a true-positive (TP) rate of about 92%, that is, the scheme sensitivity, and a false-positive (FP) rate of 10% when the whole segmentation procedure is applied, including the contrast enhancement pre-processing. For all tests, a gray-scale value of 3100 was set as the threshold for the marker in the segmentation scheme.
Table 1 – Results obtained from processing the 252 ROIs from digitized mammograms by using the masses segmentation technique shown in Fig. 7 (FN = false-negative, TP = true-positive, FP = false-positive and TN = true-negative rates).
IMAGES               FN    FN %   TP    TP %   FP   FP %   TN   TN %
Original             122   79.2   32    20.8   7    7.1    91   92.9
After equalization   12    7.8    142   92.2   10   10.2   88   89.8
Fig. 8 illustrates the ROC curve corresponding to the results in the second row of Table 1, i.e., the data obtained with the whole segmentation procedure. The curve was obtained by varying the threshold value for the image marker. The area under this ROC curve was Az = 0.91.
Fig. 8. ROC curve corresponding to the tests performed with digitized mammograms for masses detection (data in Table 1, after image equalization). Vertical axis: true positive [p(S/s)]; horizontal axis: false positive.
The data in Table 1 therefore show a good sensitivity (92%) for this segmentation scheme, comparable to the performance reported for other schemes, which, however, were applied to non-dense breast images. These values also demonstrate the efficacy of combining histogram equalization, the Watershed transformation and the labeling technique. About 28% of the images presented noise after segmentation in addition to the true-positive signals; this noise was reduced by 86% with the labeling technique. Processing without histogram equalization yielded only about 20% accuracy in detecting suspect masses, taking the radiological reports as reference, with almost 80% of cases missed. The pre-processing step was therefore very important in improving these rates: the scheme sensitivity increased 4.4 times, with a decrease of only 3.4% in the specificity rate.
For illustration, Fig. 9 shows the results after each step of the described methodology for a single ROI from a dense breast mammogram.
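The sensitivity and specificity values reported for Table 1, and the 4.4-fold sensitivity gain, follow directly from the counts in the table. The short sketch below recomputes them; the helper name `rates` is hypothetical, and the counts are those of Table 1.

```python
def rates(tp, fn, fp, tn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Counts from Table 1 (154 ROIs with masses, 98 normal ROIs).
before = rates(tp=32, fn=122, fp=7, tn=91)     # original images
after = rates(tp=142, fn=12, fp=10, tn=88)     # after histogram equalization

print(f"before: sensitivity {before[0]:.1%}, specificity {before[1]:.1%}")
print(f"after:  sensitivity {after[0]:.1%}, specificity {after[1]:.1%}")
print(f"sensitivity gain: {after[0] / before[0]:.1f}x")
```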
Fig. 9. Image set corresponding to the results from each step of the mass segmentation methodology applied to a ROI: (a) original image; (b) image segmented only by using the Watershed transformation; (c) image after equalization; (d) image segmented by combining both techniques: histogram equalization and Watershed transformation; (e) resulting image after labeling for noise reduction.
4. DISCUSSION AND CONCLUSION
The tests performed with the image set allowed us to verify that: (a) according to the rates shown in Table 1, the scheme sensitivity was about 92% when the pre-processing technique (histogram equalization) was applied, and the specificity reached approximately 90%; (b) these results were significantly higher than those of the tests without image pre-processing, in which segmentation with the Watershed transformation alone was unable to detect the structures properly; (c) combining the pre-processing, the Watershed transformation and the labeling for eliminating false pixels was effective in yielding consistent results, increasing the scheme accuracy while substantially reducing noise.
Taking into account that dense breasts, characteristic mainly of young women or of those undergoing hormonal therapy, are a challenge for breast cancer detection and also for CAD schemes, the main contribution of this research is to enhance the masses in this type of image, allowing their segmentation. As the images used in the tests share the same main characteristics, we could note that the detected masses presented gray levels that were similar to each other and different from those of the other image structures. In addition, for this image set, the density variation of the detected masses was very small.
The results given in Table 1 show how remarkable the combination of histogram equalization and the Watershed transformation was for the scheme sensitivity in detecting suspect masses in dense breast images. Although this procedure also increased the noise during segmentation, the labeling technique kept the rates of false responses low. False detections are indeed a considerable problem for automatic computer schemes intended to aid medical diagnosis. However, we consider the rates registered in our tests a promising feature of the proposed segmentation scheme, since they are below those commonly expected when a radiologist visually evaluates a mammogram of dense breasts like those used here.
5. ACKNOWLEDGMENTS
We would like to thank Hospital das Clínicas de Botucatu (SP), Brazil (FM-UNESP) and Prof. Dr. José Morceli.
6. REFERENCES
[1] Giger, M.L. 2004. Computerized analysis of images in the detection and diagnosis of breast cancer. Seminars in Ultrasound, CT and MRI, 25, 411-418.
[2] Davies, D.H. and Dance, D.R. 1990. Automatic computer detection of clustered calcification in digital mammograms. Physics in Medicine and Biology, 35, 8, 1111-1118.
[3] Lai, S.M.; Li, X. and Bischof, W.F. 1989. On techniques for detecting circumscribed masses in mammograms. IEEE Transactions on Medical Imaging, 8, 377-386.
[4] Jiang, Y.; Nishikawa, R.M.; Papaioannou, J. 1998. Requirement of microcalcification detection for computerized classification of malignant and benign clustered microcalcifications. In: SPIE International Symposium Medical Imaging - Image Processing. Proceedings, 3338, 313-317.
[5] Cheng, H.D.; Cai, X.P.; Chen, X.W.; Hu, L.M.; Lou, X.L. 2003. Computer-aided detection and classification of microcalcifications in mammograms: a survey. Pattern Recognition, 36, 12, 2967-2991.
[6] Kallergi, M. 2004. Computer-aided diagnosis of mammographic microcalcification clusters. Medical Physics, 31, 2, 314-326.
[7] Brzakovic, D.; Luo, X.M. and Brzakovic, P. 1990. An approach to automated detection of tumors in mammograms. IEEE Transactions on Medical Imaging, 9, 3, 233-241.
[8] Hadjiiski, L. et al. 2004. Improvement in radiologists’ characterization of malignant and benign breast masses on serial mammograms with computer-aided diagnosis: an ROC study. Radiology, 233, 255-265.
[9] Ram, G. 1982. Optimization of ionizing radiation usage in medical imaging by means of image enhancement techniques. Medical Physics, 9, 733-737.
[10] Ji, T.L.; Sundareshan, M.K.; Roehrig, H. 1994. Adaptive image contrast enhancement based on human visual properties. IEEE Transactions on Medical Imaging, 13, 573-584.
[11] Nunes, F.L.S.; Schiabel, H.; Benatti, R.H. 2002. Contrast enhancement in dense breast images using the modulation transfer function. Medical Physics, 29, 12, 2925-2936.
[12] Zheng, B.; Gur, D.; Good, W.F.; Hardesty, L.A. 2004. A method to test the reproducibility and to improve performance of computer-aided detection schemes for digitized mammograms. Medical Physics, 31, 11, 2964-2972.
[13] Vyborny, C.J.; Giger, M.L.; Nishikawa, R.M. 2000. Computer-aided detection and diagnosis of breast cancer. Radiologic Clinics of North America, 38, 4, 725-740.
[14] Freer, T.W. and Ulissey, M.J. 2001. Screening mammography with computer-aided detection: prospective study of 12,860 patients in a community breast center. Radiology, 220, 781-786.
[15] Baum, F.; Fischer, U.; Obenauer, S.; Grabbe, E. 2002. Computer-aided detection in direct digital full-field mammography: initial results. European Radiology, 12, 3015-3017.
[16] Beucher, S. & Meyer, F. Image analysis and mathematical morphology, v.1. London Academic Press. Chap 12, 433-481.