Automatic Detection of Regions of Interest in Mammographies Based on a Combined Analysis of Texture and Histogram

Tiffany Tweed and Serge Miguet
Laboratoire ERIC, Université Lumière Lyon 2, 5 Av. Pierre Mendès-France, 69676 Bron - France
E-mail: {ttweed,miguet}@eric.univ-lyon2.fr

Abstract

In this article, we present an algorithm that selects regions of interest (ROI) containing a tumour, based on the combination of a texture and histogram analysis. The first analysis compares texture features extracted from different regions in an image to the same features extracted from known tumorous regions. The second analysis detects the ROI with two thresholds computed from the histograms of known tumorous masks.
1. Introduction

Breast cancer is one of the most common causes of death among women. Early detection through mammographic screening is therefore essential to reduce this mortality. In France, systematic breast cancer screening is intended to be extended to all women between 50 and 74 years old, in order to detect the first signs of change that could indicate the presence of a malignant tumour. Early detection improves the chances of treating and curing it. The number of mammograms to be analysed by specialists is huge and constantly increasing, and only a few of them contain a malignant tumour. Small lesions, such as microcalcifications under 1 mm in diameter, are particularly hard to detect. It is therefore difficult for a specialist to correctly detect all the cancers (high sensitivity) while not classifying normal mammograms as tumorous (high specificity). A second reading can increase detection and improve screening sensitivity, but there are then not enough specialists to read the mammograms in good conditions. In this context, a computer-aided diagnosis (CAD) system is an ideal tool to assist a radiologist, and can be used as a second opinion. The development of such a tool has become an important research topic in medical image processing. For the breast, these systems are usually designed to detect either masses or microcalcifications. Their general scheme is first a segmentation or detection step, then a feature extraction, and finally a classification or decision-making step [1]. The image-processing techniques vary: they rely on texture analysis [6, 10], on local statistics computed from the image gray levels [7], or on lesion outline analysis, which includes shape factors [2, 5, 12, 13]. All these approaches are combined in [11] in order to detect regions of interest. Recent multiresolution models detect masses [9] and classify them [2]. CAD systems that detect and classify microcalcifications are presented in [8, 16].

We are developing a CAD tool that is able to perform a second reading or point out a potential lesion to a specialist. We are working on a digital database containing 2620 patients, divided into three groups: benign, malignant and normal. This digital database, comprising 230 GB of mammographic data, comes from the University of South Florida [4]. Each case includes the mediolateral oblique (MLO) and cranio-caudal (CC) views of each breast. This database is of great interest because a radiologist has marked and specified the tumorous region(s) for the benign and malignant cases (see fig.3(a)). Furthermore, a precise description of each lesion is given using the ACR BI-RADS lexicon [3].

We first extract a large number of feature vectors from normal and malignant regions. Our approach then consists of selecting regions of interest (ROI) using those features, and of applying knowledge-engineering techniques in order to refine the potential lesion areas. In this paper, we present the first step of this approach. Section 2 describes the feature extraction and section 3 the ROI selection, first by a texture analysis, then by a histogram analysis. In section 4, we combine the two analyses and discuss the results obtained on a set of 41 mammograms of about 3000 × 5000 pixels, whose intensities are quantized to eight bits per pixel (256 gray levels). All of these mammograms present a malignant tumour corresponding to a mass.
1051-4651/02 $17.00 (c) 2002 IEEE

2. Masks and feature extraction

Our aim is to distinguish a region containing a tumour from a normal region. In order to detect a suspicious region in an unknown mammogram, we systematically apply masks over the whole image. Since we want rotation-invariant detectors, we use circular masks. The first problem we had to deal with was the choice of a mask radius for all the tests. Our first experiments [14, 15], based on a statistical analysis, allowed us to restrict the study to three radii that are sufficient to capture all the information relevant to tumour detection: 70, 150 and 300 pixels. As a compromise between computation time and taking all the pixels into account, the disc centers are placed on a uniform grid of step $\sqrt{r}$, where $r$ is the disc radius. For every mammogram, several hundred texture and histogram features are extracted from all the masks.

The analysis of the feature vectors must eliminate regions covered by a significant number of masks. To this end, these features are compared to the same features extracted from the tumorous regions outlined by the specialists. Because of the disparity between the huge number of normal regions and the low number of tumorous regions, classical knowledge-engineering techniques have difficulty distinguishing and classifying these regions. Furthermore, the texture and histogram features vary strongly between regions whose separation is of minor interest, such as the interior and the exterior of the breast. This variability hides the small fluctuations between textured regions within the breast and leads to poor tumour recognition. In order to apply knowledge-engineering techniques to a limited number of regions, with a better balance between the classes to be learned, we propose in section 3 an algorithm that selects ROI.
The qualities of such a selection are:
- selectivity: only a small fraction of the pixels of the initial image must be retained;
- pertinence: no region containing a cancer may be excluded;
- homogeneity: the feature variability must be lower in the selected region than in the whole image.

The algorithm first relies on the texture features, then on the histogram features.
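The mask placement described in section 2 (circular discs of radius $r \in \{70, 150, 300\}$ pixels whose centers lie on a uniform grid of step $\sqrt{r}$) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names are hypothetical, and rounding the step to an integer and starting the grid at the image origin are assumptions.

```python
import numpy as np


def disc_centers(height, width, r):
    """Centers of the discs of radius r, on a uniform grid of step sqrt(r).

    Assumption: the step is rounded to the nearest integer (at least 1)
    and the grid starts at the image origin.
    """
    step = max(1, int(round(np.sqrt(r))))
    return [(y, x)
            for y in range(0, height, step)
            for x in range(0, width, step)]


def disc_mask(height, width, center, r):
    """Boolean mask of the pixels at distance <= r from center.

    A disc is unchanged by a rotation about its center, which is why
    circular masks give rotation-invariant detectors.
    """
    yy, xx = np.ogrid[:height, :width]
    cy, cx = center
    return (yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2
```

With this rounding, the grid step is 8, 12 and 17 pixels for the three radii 70, 150 and 300.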
3. Selection of regions of interest

3.1. Texture analysis

Let $TR_i$ be one of the tumorous regions and $V_j$ its standardized texture features¹. Let $N_k$ be a disc sharing no pixel with $TR_i$, and $C_k$ a disc sharing a significant number of pixels with it. The texture features $V_j$ are computed for $N_k$ and $C_k$ in the same way, and with the same standardization, as for $TR_i$. For each disc radius, the selection of a disc is based on the Euclidean distance. Given a disc $D$, the Euclidean distance between its features and those of one of the $TR_i$ is denoted:

$$d(D, TR_i) = \sqrt{\sum_j \left(V_j(D) - V_j(TR_i)\right)^2}$$

The following two axioms are then applied to the overall set of discs in the learning database: a disc is classified in the ROI if it is close enough to one of the tumorous regions; it is eliminated from the ROI if it is far enough from all the tumorous regions. Two thresholds $\theta_1$ and $\theta_2$ are thus constructed, which respectively keep and reject a disc of an unknown mammogram. Each disc $D$ is first assigned its distance to the nearest tumorous region:

$$\delta(D) = \min_i d(D, TR_i)$$

$\theta_1$ and $\theta_2$ are then defined on the learning database. For the set of all the malignant discs:

$$\theta_1 = \max_k \delta(C_k)$$

For the set of all the normal discs:

$$\theta_2 = \min_k \delta(N_k)$$

A given disc $D$ extracted from an image that does not belong to the learning database is rejected from the ROI if $\delta(D) > \theta_2$, and kept in the ROI if $\delta(D) < \theta_1$.
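The texture rule of section 3.1 reduces to a nearest-neighbour distance and two learned thresholds. The sketch below assumes the standardized feature vectors are already available as NumPy arrays (the actual features are undisclosed); the function names are hypothetical. The text does not say what happens to a disc that passes neither test, so labelling it "undecided", and testing "keep" first, are assumptions.

```python
import numpy as np


def nearest_tumour_distance(features, tumour_features):
    """delta(D) = min_i d(D, TR_i) for every disc D.

    features:        (n_discs, n_features), already standardized
    tumour_features: (n_regions, n_features), same standardization
    """
    features = np.asarray(features, dtype=float)
    tumour_features = np.asarray(tumour_features, dtype=float)
    diff = features[:, None, :] - tumour_features[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=2))  # d(D, TR_i) for every pair (D, TR_i)
    return d.min(axis=1)                  # minimum over the tumorous regions


def learn_texture_thresholds(delta_tumorous, delta_normal):
    """theta_1 = max_k delta(C_k) and theta_2 = min_k delta(N_k)."""
    return float(np.max(delta_tumorous)), float(np.min(delta_normal))


def texture_decision(delta, theta1, theta2):
    """Decision for a disc of an unknown mammogram.

    'keep' if delta < theta1, 'reject' if delta > theta2. Returning
    'undecided' otherwise, and testing 'keep' first, are assumptions.
    """
    if delta < theta1:
        return "keep"
    if delta > theta2:
        return "reject"
    return "undecided"
```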
¹ These texture features cannot be described more precisely because of a non-disclosure agreement.

3.2. Histogram analysis

A second selection based on a histogram analysis is justified by the large number of light pixels within a tumorous region. We therefore determine two thresholds that characterize a ROI containing a significant number of light pixels. The following predicate $P(\alpha, \beta)$ decides whether a disc is part of the ROI: a disc from a mammogram belongs to the ROI if at least a proportion $\alpha$ of its pixels have a gray level greater than $\beta$. The ROI thus constructed must contain all the tumorous discs and a minimal number of normal discs. The two thresholds $\alpha$ and $\beta$ that select relevant ROI are computed for a given mammogram from the gray-level cumulative distribution function (CDF) as follows. Let $H_k[\beta]$, with $\beta \in [0, 255]$, be the number of pixels of the disc $C_k$ with a gray level not greater than $\beta$. Then $\alpha_k[\beta]$, the proportion of pixels of $C_k$ with a gray level greater than $\beta$, is defined by:

$$\alpha_k[\beta] = 1 - \frac{H_k[\beta]}{H_k[255]}$$

For the whole set of tumorous discs $C_k$, we have:

$$\alpha_{\mathrm{opt}}[\beta] = \min_k \alpha_k[\beta]$$

$\alpha_{\mathrm{opt}}[\beta]$ is the minimal pixel proportion that selects the whole set of tumorous discs $C_k$ in the ROI using the predicate $P(\alpha, \beta)$. A value of $\alpha$ greater than $\alpha_{\mathrm{opt}}[\beta]$ would fail to retain some of the tumorous discs $C_k$ in the ROI (pertinence would not be verified). A value of $\alpha$ lower than $\alpha_{\mathrm{opt}}[\beta]$ would select a larger number of normal discs $N_k$ (poor selectivity).
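The quantities $H_k[\beta]$, $\alpha_k[\beta]$ and $\alpha_{\mathrm{opt}}[\beta]$ and the predicate $P(\alpha, \beta)$ of section 3.2 can be computed from 8-bit pixel values as sketched below. The helper names are hypothetical, the input is assumed to hold gray levels in 0..255, and proportions are kept in $[0, 1]$ rather than in percent.

```python
import numpy as np


def cumulative_histogram(pixels):
    """H[b] = number of pixels with gray level <= b, for b = 0..255 (8-bit data)."""
    counts = np.bincount(np.asarray(pixels, dtype=np.uint8).ravel(), minlength=256)
    return np.cumsum(counts)


def light_fraction(H):
    """alpha[b] = 1 - H[b] / H[255]: proportion of pixels brighter than b."""
    return 1.0 - H / H[255]


def alpha_opt(tumour_discs):
    """alpha_opt[b] = min_k alpha_k[b] over the tumorous discs C_k."""
    alphas = np.array([light_fraction(cumulative_histogram(d)) for d in tumour_discs])
    return alphas.min(axis=0)


def predicate(pixels, alpha, beta):
    """P(alpha, beta): at least a proportion alpha of the pixels is brighter than beta."""
    return bool(light_fraction(cumulative_histogram(pixels))[beta] >= alpha)
```

By construction, every tumorous disc satisfies `predicate(d, alpha_opt(discs)[beta], beta)` for any `beta`, which is the pertinence property.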
The curve $C$ drawn in bold corresponds, for each $\beta$, to the maximum of the CDFs of the tumorous discs, and is linked to $\alpha_{\mathrm{opt}}$ by the following relation:

$$C[\beta] = 1 - \alpha_{\mathrm{opt}}[\beta]$$

The curve $C$ is reported in fig.1(b), together with some of the CDFs of normal discs. Since the CDF of a disc at $\beta$ equals $1 - \alpha_k[\beta]$, an unknown disc $D_k$ is selected in the ROI for a given $\beta$ if and only if its CDF at $\beta$ lies on or below the curve $C$. Thus, for a given value of $\beta$, normal discs may also be selected in the ROI. We then search for the value of $\beta$ that minimizes the number of selected normal discs.
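Because the CDF of a disc at $\beta$ equals $1 - \alpha_k[\beta]$, the search over $\beta$ only needs the CDFs. The sketch below (hypothetical names, assumed 8-bit inputs) computes the curve $C$, counts the normal discs selected for each $\beta$, and picks the best $\beta$. It scores each $\beta$ with the ratio #tumorous discs / #selected discs, i.e. the quality coefficient $q[\beta]$ used for fig.2; since every tumorous disc is selected by construction, maximizing this ratio is equivalent to minimizing the number of selected normal discs.

```python
import numpy as np


def cdf(pixels):
    """Gray-level CDF of one disc: F[b] = proportion of pixels with gray level <= b."""
    counts = np.bincount(np.asarray(pixels, dtype=np.uint8).ravel(), minlength=256)
    return np.cumsum(counts) / counts.sum()


def best_beta(tumour_discs, normal_discs):
    """Return (beta, alpha_opt[beta], quality) for the beta with the best quality."""
    F_t = np.array([cdf(d) for d in tumour_discs])  # one row per tumorous disc C_k
    F_n = np.array([cdf(d) for d in normal_discs])  # one row per normal disc N_k
    C = F_t.max(axis=0)                             # C[beta] = 1 - alpha_opt[beta]
    # Every tumorous disc lies on or below C by construction; count the normal ones.
    n_normal = (F_n <= C).sum(axis=0)               # selected normal discs per beta
    n_tumour = len(tumour_discs)
    quality = n_tumour / (n_tumour + n_normal)      # #tumorous / #selected
    beta = int(np.argmax(quality))                  # same beta as argmin of n_normal
    return beta, float(1.0 - C[beta]), float(quality[beta])
```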
Figure 1. (a) Gray-level CDF of tumorous discs - (b) gray-level CDF of some normal discs (horizontal axis: gray levels, 0 to 255; vertical axis: cumulative distribution function)

Figure 2. Quality computed from the pairs $(\alpha_{\mathrm{opt}}[\beta], \beta)$ for picture 3(a)
For each gray level $\beta$ we determine $\alpha_{\mathrm{opt}}[\beta]$. A quality coefficient is then computed for all the pairs $(\alpha_{\mathrm{opt}}[\beta], \beta)$ as follows:

$$q[\beta] = \frac{\#\{\text{real tumorous discs}\}}{\#\{\text{selected discs in the ROI}\}}$$

The quality computed from the different pairs $(\alpha_{\mathrm{opt}}[\beta], \beta)$ determined for picture 3(a) is represented in fig.2. In this case, for a low gray level $\beta$, the threshold $\alpha_{\mathrm{opt}}[\beta]$ must be close to 100% in order to select all the tumorous discs in the ROI. We are then very sensitive, but the specificity is very low: nearly all the normal discs of the image are selected in the ROI. As the gray level $\beta$ increases, the threshold $\alpha_{\mathrm{opt}}[\beta]$ decreases and so does the number of selected normal discs, until an optimal pair $(\alpha_{\mathrm{opt}}[\beta], \beta)$ is reached. In this example, $\beta = 205$ and $\alpha_{\mathrm{opt}}[\beta] = 0.93\%$, and the optimal quality thus obtained is 57%. Since a tumorous region contains light pixels, a disc only needs to contain a low proportion of light pixels to be selected in the ROI.

Thus we are able to determine a value of $\beta$ that optimizes the disc selection while minimizing the number of normal discs $N_k$ in the ROI. In fig.1(a), we represent the gray-level CDF of all the tumorous discs of radius 150 extracted from picture 3(a).

4. Results and discussion
We show in the following the results obtained for the discs of radius 150 pixels, which lead to the best ROI selection. As we can see in fig.3(b), many discs are selected in the ROI using the texture analysis with the threshold $\theta_1$. Indeed, normal discs can resemble one of the tumorous regions in texture, and can then be retained in the ROI. On the other hand, we observe in fig.3(c) that the histogram analysis is more selective, but retains the marks made by the specialist to annotate the image, as well as artefacts caused by the acquisition system. A combined analysis of texture and histogram therefore eliminates those artefacts, as shown in fig.3(d). The discs selected in the ROI are those whose texture is close to the tumorous-region texture and whose gray levels contain a significant number of light pixels.

Figure 3. (a) Mass-type lesion marked by a radiologist - Selection with (b) a texture analysis, (c) a histogram analysis, (d) a joint analysis

Fig.4 shows the selectivity of our approach. We have classified the tumorous regions, approximated by a disc, into three classes: the small tumours (C1) have a radius below 70 pixels, the medium ones (C2) a radius between 70 and 150 pixels, and the large ones (C3) a radius above 150 pixels. No small tumours are present among the 41 mammograms we have studied. With our approach, we are able to compute, for each image, the thresholds $(\alpha_{\mathrm{opt}}[\beta], \beta)$ and $\theta_1$, which generally retain only 5 to 10% of the image pixels for mammograms containing a medium tumour, and a slightly larger percentage for mammograms presenting a large tumour, because the tumorous region contains more pixels.

Figure 4. Percentage of selected pixels using the texture and histogram thresholds, according to two tumour classes
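The combined rule of section 4 (a disc is kept only if it passes both the texture test $\delta(D) < \theta_1$ and the histogram predicate $P(\alpha_{\mathrm{opt}}[\beta], \beta)$) and a pixel percentage of the kind plotted in fig.4 can be sketched as follows. The function names are hypothetical, and measuring the retained pixels as the area of the union of the selected discs is an assumption.

```python
import numpy as np


def combined_selection(delta, light, theta1, alpha_opt_beta):
    """Keep a disc only if it passes both tests.

    delta: delta(D) for each disc (texture distance, section 3.1)
    light: proportion of pixels of each disc brighter than beta (alpha_k[beta])
    """
    delta = np.asarray(delta, dtype=float)
    light = np.asarray(light, dtype=float)
    return (delta < theta1) & (light >= alpha_opt_beta)


def selected_pixel_percentage(shape, centers, r, selected):
    """Percentage of image pixels covered by the union of the selected discs.

    Measuring the ROI as this union is an assumption.
    """
    yy, xx = np.ogrid[:shape[0], :shape[1]]
    roi = np.zeros(shape, dtype=bool)
    for (cy, cx), keep in zip(centers, selected):
        if keep:
            roi |= (yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2
    return 100.0 * roi.sum() / roi.size
```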
Figure 5. Representation of the pairs $(\alpha_{\mathrm{opt}}[\beta], \beta)$, $(\alpha_{\mathrm{opt}}[\beta_1], \beta_1)$ and $(\alpha_{\mathrm{opt}}[\beta_2], \beta_2)$ for medium tumours (C2) and large tumours (C3)

On the other hand, we observe a large variability in $\alpha_{\mathrm{opt}}[\beta]$ and $\beta$, as shown in fig.5, and in $\theta_1$, all of which are computed for each image. In order to study the impact of the choice of the pair $(\alpha_{\mathrm{opt}}[\beta], \beta)$ on the quality, we also represent in fig.5 the two pairs $(\alpha_{\mathrm{opt}}[\beta_1], \beta_1)$ and $(\alpha_{\mathrm{opt}}[\beta_2], \beta_2)$ that lead to a 10% quality loss. Given this variability, we are currently studying the gray-level CDFs of the different test images, in order to learn automatically the values of $\alpha_{\mathrm{opt}}[\beta]$ and $\beta$ that select all the tumorous discs and a minimal number of normal discs in the ROI.
5. Conclusion

We have presented a simple method that computes texture and histogram thresholds and selects regions of interest in order to detect tumours in mammograms. This method is based on a combined analysis of texture and histogram. The ROI thus constructed have the following three properties: only a small fraction of the pixels of the initial image is retained, no region containing a cancer is excluded, and the feature variability is lower in the selected region than in the whole image. As we want to automatically select all the tumorous discs in the ROI, the method is very sensitive but not always very specific. Indeed, the computed quality is generally close to 45%, so the ROI thus constructed is about twice as large as necessary. In some particular cases, the computed quality is between 5 and 10%, and the ROI is then poorly delimited around the tumour. Moreover, several ROI can be selected in a mammogram, only one of which contains the tumour. Our future work will therefore consist of segmenting the ROI into objects and extracting shape factors from each segmented ROI. We are also studying how to determine the two histogram thresholds automatically from the histogram of the whole image. A final analysis of the texture, histogram and shape factors will enable us to determine which factors are relevant for classifying the ROI into one of three groups: normal, benign or malignant.
References

[1] U. Bick and K. Doi. Computer Aided Diagnosis Tutorial. CARS 2000 Tutorial on Computer-Aided Diagnosis, Hyatt Regency, San Francisco, USA, June 28 to July 1, 2000.
[2] L. M. Bruce and R. R. Adhami. Classifying mammographic mass shapes using the wavelet transform modulus-maxima method. IEEE Transactions on Medical Imaging, 18(12):1170–1177, December 1999.
[3] BI-RADS Committee. Illustrated Breast Imaging Reporting and Data System. American College of Radiology, 1998.
[4] M. Heath, K. W. Bowyer, and D. Kopans. Current status of the digital database for screening mammography. In Digital Mammography, pages 457–460. Kluwer Academic Publishers, 1998. http://marathon.csee.usf.edu/Mammography/Database.html.
[5] J. Kilday, F. Palmieri, and M. D. Fox. Classifying mammographic lesions using computerized image analysis. IEEE Transactions on Medical Imaging, 12(4):664–669, December 1993.
[6] J. K. Kim and H. W. Park. Statistical textural features for detection of microcalcifications in digitized mammograms. IEEE Transactions on Medical Imaging, 18(3):231–238, March 1999.
[7] J. K. Kim, J. M. Park, K. S. Song, and H. W. Park. Adaptive mammographic image enhancement using first derivative and local statistics. IEEE Transactions on Medical Imaging, 16(5):495–502, October 1997.
[8] S.-K. Lee, C.-S. Lo, C.-M. Wang, P.-C. Chung, C.-I. Chang, C.-W. Yang, and P.-C. Hsu. A computer-aided design mammography screening system for detection and classification of microcalcifications. International Journal of Medical Informatics, 60:29–57, 2000.
[9] S. Liu, C. F. Babbs, and E. J. Delp. Multiresolution detection of spiculated lesions in digital mammograms. IEEE Transactions on Image Processing, 10(6):874–884, June 2001.
[10] N. R. Mudigonda, R. M. Rangayyan, and J. E. L. Desautels. Gradient and texture analysis for the classification of mammographic masses. IEEE Transactions on Medical Imaging, 19(10):1032–1043, October 2000.
[11] W. E. Polakowski, D. A. Cournoyer, S. K. Rogers, M. P. DeSimio, D. W. Ruck, J. W. Hoffmeister, and R. A. Raines. Computer-aided breast cancer detection and diagnosis of masses using difference of Gaussians and derivative-based feature saliency. IEEE Transactions on Medical Imaging, 16(6):811–819, December 1997.
[12] R. M. Rangayyan, N. M. El-Faramawy, J. E. L. Desautels, and O. A. Alim. Measures of acutance and shape for classification of breast tumors. IEEE Transactions on Medical Imaging, 16(6):799–810, December 1997.
[13] L. Shen, R. M. Rangayyan, and J. E. L. Desautels. Application of shape analysis to mammographic calcifications. IEEE Transactions on Medical Imaging, 13(2):263–274, June 1994.
[14] T. Tweed and S. Miguet. Analyse conjointe de l'histogramme et de la texture pour la sélection de régions d'intérêt dans les mammographies [Joint analysis of histogram and texture for the selection of regions of interest in mammograms]. In 8èmes rencontres de la société francophone de classification, 17–21 December 2001. In French.
[15] T. Tweed and S. Miguet. Sélection automatique de régions d'intérêt pour la détection de zones cancéreuses dans les mammographies [Automatic selection of regions of interest for the detection of cancerous areas in mammograms]. In Journée thématique Coopération analyse d'image et modélisation, pages 2–5, Université Claude Bernard Lyon 1, France, 14 June 2001. Laboratoire LIGIM. In French.
[16] S. Yu and L. Guan. A CAD system for the automatic detection of clustered microcalcifications in digitized mammogram films. IEEE Transactions on Medical Imaging, 19(2):115–126, February 2000.