
Automatic Classification of Mammographic Parenchymal Patterns: A Statistical Approach

Styliani Petroudi, Timor Kadir and Michael Brady
Medical Vision Laboratory, Engineering Science, Oxford University
Ewert House, Ewert Place, Summertown, Oxford, OX2 7DD, United Kingdom
Email: styliani, timork, [email protected]

Abstract— Breast parenchymal density has been found to be a strong indicator of breast cancer risk [1]. To date, however, measures of breast density are qualitative and require the judgement of the radiologist. Objective, quantitative measures of breast density are crucial tools for assessing the association between breast cancer risk and mammographic density, as well as for quantifying density changes in the breast. Various schemes have been proposed for classifying breast density patterns, but each again requires a clinician to assign a particular region of tissue to its class, and so is time-consuming and prone to inter- and intra-radiologist disagreement. Motivated by recent results in texture classification [2], we present a new approach to breast parenchymal pattern classification. The proposed scheme uses texture models to capture the mammographic appearance within the breast area: parenchymal density patterns are modelled as a statistical distribution of clustered, rotationally invariant filter responses in a low-dimensional space. This robust representation can accommodate large variations in intra-class mammogram appearance and can be trained in a straightforward manner. Key to the approach is that parenchymal patterns can occupy disconnected regions in feature space. Objective descriptors of breast density, based on the digital mammogram, are developed and validated.

Keywords— BI-RADS, breast parenchymal density patterns, classification, image segmentation, mammograms, texture, Wolfe.


I. INTRODUCTION

Breast parenchymal density refers to the prevalence of fibroglandular tissue in the breast as it appears on a mammogram. Many studies have stressed the importance of breast density, and it has been shown to be an important factor in the development and risk of breast cancer [1]. The findings are intuitively appealing, since breast cancer mostly arises from the epithelial lining of the ductal/lobular glands [3]. Moreover, denser tissue can lower mammographic sensitivity and obscure a lesion on mammography.

Wolfe was the first to study the patterns of breast tissue observed in mammography and their association with breast cancer [4]. He proposed a classification scheme for mammograms (xeromammograms) based on the appearance of the parenchyma. The four classification categories, N1, P1, P2 and DY, are used as an indication of the risk of developing breast cancer. In N1, the breast consists mostly of fatty tissue with no ducts visible. This category represents an essentially normal breast and is considered low risk. The P1 category represents a fatty breast, with prominent ducts in the anterior portion occupying up to a quarter of the breast area. It is also considered low risk. In P2, the breast is involuted, with prominent duct patterns of moderate to severe degree, occupying more than a quarter of the breast. The visible duct pattern may occupy the entire breast. P2 is considered a high-risk category. In DY, the breast parenchyma is dense, which usually denotes connective tissue hyperplasia. It is considered to be the highest risk pattern and may appear homogeneous due to the overall increased density. The prominent duct pattern often cannot be seen.

More recently, other classification methods have become popular, partly due to changes in mammographic practice. For example, the American College of Radiology (ACR) proposes a modified version of the Wolfe patterns for the BI-RADS [5] classification scheme:

I. The breast is almost entirely fat.
II. There are scattered fibroglandular densities.
III. The breast is heterogeneously dense. This may lower the sensitivity of mammography.
IV. The breast tissue is extremely dense, which could obscure a lesion in mammography.

Examples of these breast types are shown in Figure 1. Other pattern classifications include the six-category classification (SCC).

Fig. 1. Examples of mammograms from the 4 BI-RADS categories: a) BI-RADS I, b) BI-RADS II, c) BI-RADS III, d) BI-RADS IV.

Mammogram parenchymal density pattern classification can assist clinical studies in defining the role of mammographic appearance as a breast cancer risk factor. It can be used to characterize an increase in breast density due to Hormone Replacement Therapy (HRT), which can cause tissue regeneration and even a change in breast pattern. In addition, it may be used to signal the need for a detailed interpretation of certain mammograms, or as evidence to reduce screening intervals. However, parenchymal pattern classification is currently based on subjective appraisal of the mammogram, leading to inter- and intra-observer variability [1]. For all of these reasons, our aim is to develop a fully automatic, accurate and repeatable mammogram density classifier based on objective and quantitative measures.

Numerous techniques have been proposed for breast density pattern classification. Boyd et al. [6] proposed a semi-automatic computer measure based on interactive thresholding and the percentage of the segmented dense tissue over the segmented breast area. The measure of Marias et al. [7] is based on the volume of non-adipose tissue in the breast after the mammogram has been normalized using the Standard Mammogram Form [8]. A small number of previous papers have suggested texture representations of the breast. The measures of Byng et al. [1] were based on skewness and fractal dimension. Miller and Astley [9] investigated texture-based discrimination between fatty and dense breast types, applying granulometric techniques and Laws texture masks.

Many classification methods make an implicit assumption about the underlying feature space.
They force each parenchymal density class to be associated with a single, unique closed hypervolume of feature space. The schematic in Figure 2 illustrates this restriction. Some methods adopt a less strict policy. Bovis and Singh [10] estimate features from constructed Spatial Gray Level Dependency matrices and train multiple Artificial Neural Networks (ANN) to achieve two-category (BI-RADS I and II vs BI-RADS III and IV) and four-category parenchymal pattern classification. Karssemeijer [11] developed an automated method for the determination of parenchymal patterns. Features are computed from the grey-level values and the distance from the skin-line and are used in a k-nearest-neighbor classifier. However, these methods still lack an explicit representation for the feature space.

Motivated by recent results in texture classification [2], we present a method for automatically classifying parenchymal density patterns which incorporates an explicit representation of feature space that enables parenchymal pattern classes to populate disconnected regions in feature space. Our representation may be trained straightforwardly, certainly in comparison to an ANN. We illustrate our method using texture descriptors of the breast area and demonstrate significant improvements over previous methods.

Fig. 2. Many approaches constrain classes to occupy closed hypervolumes in feature space - here illustrated in 2D for two classes (+ and o) by the two areas defined by the decision boundary line. By lifting this constraint single classes can populate disconnected regions in feature space.

II. METHODOLOGY

Our approach is based on recent trends in general texture classification, primarily [2], [12]. Such techniques define texture classes as statistical distributions (histograms) over “texton” dictionaries developed from a training set. Textons are defined as clustered filter responses [12]. Finally, classification is simply a matter of comparing histograms using an appropriate distance measure.

To apply this approach, we must first derive the texton dictionary from the training set. The training set consists of a representative set of mammograms with their associated BI-RADS density classification. At this stage, all mammograms in the training set are processed regardless of their density class. This stage consists of three steps: breast area segmentation, filtering and clustering.

Each mammogram is segmented into three distinct components: background, breast tissue and pectoral muscle. Unlike the majority of previously published breast parenchymal pattern classification techniques, we base our texture measures solely on the breast region, after removal of the pectoral muscle. Image features based on gray-level values, or on the histogram of the mammogram, are affected by the low optical density and the size of the pectoral muscle in a way that is hard to predict and which varies from mammogram to mammogram [11]. For this reason, we have developed an automatic breast region segmentation algorithm that accurately identifies the breast edge and removes the pectoral muscle in Medio-Lateral Oblique (MLO) mammograms [13]. The method is based on intrinsic image and breast anatomy properties and results in a smooth outline of the breast region (Figure 3).

Following segmentation, the breast area is filtered.
To date, we have used the Maximum Response 8 (MR8) filter bank proposed by Varma and Zisserman [2]. This filter set is rotationally invariant: it consists of 38 filters but yields only 8 filter responses. Rotation invariance enables correct classification of rotated versions of textures present in the training set.
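As a minimal sketch of how 38 filter outputs can reduce to 8 responses, the function below keeps the maximum over orientations at each scale. It assumes the bank is made of an edge and a bar filter at 3 scales and 6 orientations plus two isotropic filters (2 × 3 × 6 + 2 = 38). The channel ordering and the name collapse_to_mr8 are hypothetical, chosen only for illustration, and the filters themselves are not constructed here.

```python
import numpy as np

N_SCALES, N_ORIENT = 3, 6  # 3 spatial scales, 6 orientations


def collapse_to_mr8(responses):
    """Reduce a (38, H, W) stack of filter responses to (8, H, W).

    Assumed (hypothetical) ordering of the 38 input channels:
      0..17   edge filter, scale-major, then orientation
      18..35  bar filter, same layout
      36, 37  the two isotropic filters
    """
    responses = np.asarray(responses, dtype=float)
    n_aniso = N_SCALES * N_ORIENT
    if responses.ndim != 3 or responses.shape[0] != 2 * n_aniso + 2:
        raise ValueError("expected a (38, H, W) array of filter responses")
    h, w = responses.shape[1:]
    edge = responses[:n_aniso].reshape(N_SCALES, N_ORIENT, h, w)
    bar = responses[n_aniso:2 * n_aniso].reshape(N_SCALES, N_ORIENT, h, w)
    # Keep only the strongest orientation at each scale and pixel.
    edge_max = edge.max(axis=1)   # (3, H, W)
    bar_max = bar.max(axis=1)     # (3, H, W)
    # The isotropic responses are passed through unchanged.
    return np.concatenate([edge_max, bar_max, responses[2 * n_aniso:]], axis=0)
```

Because only the per-scale maximum is kept, permuting the orientation channels leaves the output unchanged, which is the property the rotation invariance rests on.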

Fig. 3. The low optical density and the size of the pectoral muscle affect feature extraction in an unpredictable way.

TABLE I
CLASSIFICATION ACCURACY RESULTS

Accuracy%                      BI-RADS I   BI-RADS II   BI-RADS III   BI-RADS IV
4 Density Classes              91%         64%          70%           78%
Low and High Density Classes   91% (I and II combined)  94% (III and IV combined)

The MR8 filter set consists of an edge and a bar filter, each at 6 orientations and 3 spatial scales, as well as a Gaussian and a Laplacian of Gaussian filter. However, only the responses of the two isotropic filters and the maximum responses of the anisotropic filters at each scale are recorded, which reduces the dimensionality of the feature space from 38 to 8. The mammograms are pre- and post-processed as in [2].

We arrive at the texton dictionary by clustering the filter responses using the following procedure: the filter responses are aggregated over all images in each BI-RADS class and the K-Means algorithm [14] is used to compute 10 cluster centers per class. The cluster centers represent the textons. All cluster centers from all classes are used in the creation of the dictionary. With 4 BI-RADS categories and 10 textons per class, the dictionary holds 40 textons. In such a scheme, the textons are realized under the operational definition by Leung and Malik [12] as cluster centers of filter responses over a stack of training images.

Given the texton dictionary, each pixel in the breast region of each training mammogram is labelled with the texton that lies closest to it in filter response space. A texton histogram is then computed for each training mammogram. The set of texton histograms defines the breast parenchymal density models. As each histogram represents a model for the density class with which it is associated, there are as many models as training images. For computational efficiency, these may be clustered as in [2]; in our case, this is not necessary. It is this explicit representation which enables parenchymal density patterns to populate different regions in the feature space, since each class is represented by several models.

For classification, a test mammogram is segmented, filtered with the MR8 filter bank and mapped to a texton histogram, as explained above.
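The training and classification steps of this section can be sketched as follows. This is an illustrative outline under stated assumptions, not the implementation used for the reported results: the K-Means routine is a plain Lloyd's iteration written out for self-containment, all function names are hypothetical, each image is assumed to be given as an (n_pixels, 8) array of filter responses from the breast region, and the χ2 histogram distance is taken in the commonly used symmetric form ½ Σ (h − m)² / (h + m).

```python
import numpy as np


def nearest(points, centres):
    """Index of the closest centre (Euclidean) for every point."""
    d2 = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's K-Means; returns the (k, d) cluster centres."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centres with k distinct data points.
    centres = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest(points, centres)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # an empty cluster keeps its old centre
                centres[j] = members.mean(axis=0)
    return centres


def build_dictionary(responses_by_class, k=10):
    """Stack k textons per class; 4 classes with k=10 gives 40 textons."""
    return np.vstack([kmeans(r, k) for r in responses_by_class])


def texton_histogram(responses, dictionary):
    """Normalised texton frequencies over one image's breast pixels."""
    labels = nearest(np.asarray(responses, dtype=float), dictionary)
    counts = np.bincount(labels, minlength=len(dictionary))
    return counts / counts.sum()


def chi2(h, m, eps=1e-12):
    """Symmetric chi-squared distance between two histograms (assumed form)."""
    return 0.5 * np.sum((h - m) ** 2 / (h + m + eps))


def classify(hist, models, model_labels):
    """Return the label of the model histogram nearest to hist."""
    distances = [chi2(hist, m) for m in models]
    return model_labels[int(np.argmin(distances))]
```

In this sketch, models would hold one histogram per training mammogram and model_labels the matching BI-RADS categories, so a class can be represented by several histograms that need not lie close together.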
The resulting histogram is compared to the distributions of all learnt models, and the mammogram is assigned to the BI-RADS category of the nearest-neighbor model under the χ2 distribution comparison. Different comparison methods may be used, but the χ2 significance test in conjunction with a nearest neighbor rule is often preferred.

III. RESULTS

A skilled breast radiologist and two image analysts trained in understanding breast patterns classified 132 mammograms, all selected from the Oxford Database, using the BI-RADS criteria. Classification was by consensus opinion, in order to minimize possible differences in interpretation. In our initial experiments, the small training set comprised 44 of these mammograms (11 mammograms per class). The cluster centers were calculated for 8-bit Cranio-Caudal and Medio-Lateral Oblique mammograms that were downsampled to 300 µm/pixel. Hence, training results in 11 texton distribution models for each BI-RADS class. To evaluate the classification models, the remaining 88 images (132 - 44) were used as the testing set. Despite the small size of the training set, exact agreement with the ground truth was achieved in 76% of the cases, an improvement of 11% over results reported in [11].

Table I shows the classification accuracy of the presented technique in discriminating between the 4 BI-RADS categories, based on the ground truth. We calculate the accuracy for a category as the number of correctly classified mammograms in that category, expressed as a percentage of the ground truth total number of mammograms in that category. The results show a 91% accuracy for BI-RADS I and a 78% accuracy for BI-RADS IV, a substantial improvement over previous methods [15], [16], [10]. Moreover, when the two lower density class models are combined into a single class, as are the two higher density class models, classification accuracies of 91% and 94% respectively are achieved. However, these results might differ if the texton dictionary and the resulting models were developed based on a 2-class scheme.

IV. DISCUSSION

The proposed method for breast parenchyma density classification is simple and robust.
The results show that the method is in good agreement with radiologist ratings, though a more thorough comparison is currently being carried out. This suggests that the technique may be useful as an objective and quantitative alternative in evaluating breast cancer risk. It is automatic, easily reproducible, and removes observer variability. Prior pectoral muscle segmentation removes one known source of unpredictability in algorithm response. The key to the approach is the representation of density patterns, which makes it easy for single classes to occupy disconnected regions in a multidimensional feature space.

Experiments with training sets of different sizes indicate that classification will improve substantially if a larger training set is used; the one used here, 44 mammograms, is very small.

The filter responses used for representing each mammogram have certain advantages due to the rotational invariance of the filter bank. However, a smaller filter bank, e.g. MR4 [2], may be used, and the filter responses may be combined with mammography-specific features such as skewness and breast volume.

In our experiments, the filter banks were applied to the original mammograms, i.e. without any preprocessing. However, it is difficult to eliminate variability in image characteristics such as contrast and brightness, because of the relatively weak control over the image acquisition process. These variations lead to a non-rigid variation in the mammographic intensity distribution. Aggregating the filter responses obtained by convolving the filter banks with both the original mammogram and a physics-normalized version, such as the Standard Mammogram Form [8], promises improved performance and consistency.

V. CONCLUSION

Breast parenchymal density is associated with breast cancer risk. Automatic classification of breast density can find useful applications in a number of aspects of the screening process and in the development of better Computer Aided Diagnosis systems. It may be used in display optimization, breast density segmentation, risk assessment, evaluating screening intervals, etc. In this paper we have presented results on mammographic breast density classification by objective and quantitative measures.
The studied method is based on the statistical distribution of rotationally invariant filter responses in a low-dimensional space [2]. With only a few models per BI-RADS class, the presented algorithm achieves good classification rates. In the future, we plan to investigate whether image pre-processing, such as curvilinear structure removal, may improve classification. We also plan to evaluate the classification using different filter banks and combinations of different classifiers.

ACKNOWLEDGMENT

The authors would like to thank Manik Varma for helpful discussions and Dr Rosie Adams for the classification of the mammograms. The authors would also like to thank Cancer Research UK for supporting this project.

REFERENCES

[1] J.W. Byng, N.F. Boyd, E. Fishell, R.A. Jong, and M.J. Yaffe. Automated analysis of mammographic densities. Physics in Medicine and Biology, 41:909–923, 1996.
[2] M. Varma and A. Zisserman. Classifying images of materials: Achieving viewpoint and illumination independence. In Proceedings of the European Conference on Computer Vision, Copenhagen, Denmark, pages 255–271, 2002.
[3] S.R. Wellings, H.M. Jensen, and R.G. Marcum. An atlas of subgross pathology of the human breast with special reference to possible precancerous lesions. Journal of the National Cancer Institute, 55:231, 1975.
[4] J.N. Wolfe. Risk for breast cancer development determined by mammographic parenchymal pattern. Cancer, 37:2486–2492, 1976.
[5] American College of Radiology. Illustrated Breast Imaging Reporting and Data System (BI-RADS). American College of Radiology, third edition, 1998.
[6] N.F. Boyd, J.W. Byng, R.A. Jong, E.K. Fishell, L.E. Little, A.B. Miller, G.A. Lockwood, D.L. Tritchler, and M.J. Yaffe. Quantitative classification of mammographic densities and breast cancer risk: Results from the Canadian National Breast Screening Study. Journal of the National Cancer Institute, 87(9):670–675, 1995.
[7] K. Marias. Registration and Quantitative Comparison of Temporal Mammograms (with Application to HRT Data). PhD thesis, University College London, 2002.
[8] R. Highnam and M. Brady. Mammographic Image Analysis. Kluwer Academic Publishers, 1999.
[9] P. Miller and S. Astley. Classification of breast tissue by texture analysis. Image and Vision Computing, 10:277–282, 1992.
[10] K. Bovis and S. Singh. Classification of mammographic breast density using a combined classifier paradigm. In 4th International Workshop on Digital Mammography, pages 177–180, 2002.
[11] N. Karssemeijer. Automated classification of parenchymal patterns in mammograms. Physics in Medicine and Biology, 43:365–378, 1998.
[12] T. Leung and J. Malik. Representing and recognizing the visual appearance of materials using three-dimensional textons. International Journal of Computer Vision, 43(1):29–44, June 2001.
[13] S. Petroudi and M. Brady. An automated algorithm for breast background segmentation. In Medical Imaging Understanding and Analysis, 2003. In Press.
[14] R.O. Duda and P.E. Hart. Pattern Classification and Scene Analysis. Wiley, 1973.
[15] K. Marias, S. Petroudi, J.M. Brady, R. Adams, and R.E. English. Subjective and computer-based characterisation of mammographic patterns. In International Workshop on Digital Mammography, 2002.
[16] S. Petroudi, K. Marias, R. English, R. Adams, and M. Brady. Classification of mammogram patterns using area measurements and the standard mammogram form (SMF). In Medical Imaging Understanding and Analysis, pages 197–200, 2002.