Breast Cancer Screening Using Convolutional Neural Network and Follow-up Digital Mammography

Yufeng Zheng a, Clifford Yang b, Alex Merkulov b

a Alcorn State University, Lorman, MS, USA
b University of Connecticut Health Center, Farmington, CT, USA
Email: [email protected]

ABSTRACT

We propose a computer-aided detection (CAD) method for breast cancer screening using a convolutional neural network (CNN) and follow-up scans. First, mammographic images are examined by three cascading object detectors to detect suspicious cancerous regions. Then all regional images are fed to a trained CNN (based on the pre-trained VGG-19 model) to filter out false positives. The three cascading detectors are trained with Haar features, local binary patterns (LBP) and histograms of oriented gradients (HOG) separately via an AdaBoost approach. The bounding boxes (BBs) from the three featured detectors are merged to generate a region proposal. Each regional image, consisting of three channels, the current scan (red channel), the registered prior scan (green channel) and their difference (blue channel), is scaled to 224×224×3 for CNN classification. We tested the proposed method using our digital mammographic database, which includes 69 cancerous subjects (mass or architectural distortion) and 27 healthy subjects, each of which includes two scans: a current scan (cancerous or healthy) and a prior scan (healthy, 1 year earlier). On average 165 BBs are created by the three cascading classifiers on each mammogram, but only 3 BBs per image remain after the CNN classification. The overall performance is as follows: sensitivity = 0.928, specificity = 0.991, FNR = 0.072, and FPI (false positives per image) = 0.004. Considering the early-stage cancerous status (normal 1 year earlier), the performance of the proposed CAD method is very promising.

Keywords: Computer-aided detection (CAD), Breast cancer screening, Convolutional neural network (CNN), Transfer learning, Follow-up digital mammography, VGG-19.

1. INTRODUCTION

There are three types of breast lesions according to the ACR BI-RADS® lexicon: mass, calcification and architectural distortion (AD). Calcifications are relatively easy to detect. However, masses and ADs are more challenging, especially at their early stages. Computer-aided detection (CAD) tools are considered a radiologist's "second pair of eyes": they mark suspicious regions but leave the final decision to the radiologists. CAD tools and digital mammograms not only save time in diagnosing cancers and improve detection rates, but also bring hope for early diagnosis and treatment of breast cancer. A typical CAD solution comprises two steps: cancer detection to locate a lesion and cancer classification to confirm it. The first step is very challenging because the size, location, and features of a lesion differ greatly among cases. If detection fails (the lesion is missed), there is no way to recover it in the second step.

A mass is defined as a space-occupying lesion seen in at least two different projections.1 Masses are described by their shape (round, oval, lobulated, irregular) and margin characteristics (circumscribed, microlobulated, obscured, ill-defined, spiculated). On mammograms, masses appear denser than healthy tissues. However, the patterns of mass lesions are difficult to define directly by intensities or gradients because of the large variations among individuals. For example, masses are quite difficult to recognize in dense breasts. Therefore, many advanced features have been used in the literature to identify mass lesions in screening mammograms. In general, neighborhood or regional textural features are generated by jointly considering differences of orientations and correlations of scales. Kegelmeyer et al.2 developed a method to detect spiculated masses using a set of 5 features for each pixel. They used the standard deviation of a local edge orientation histogram (i.e., analysis of local oriented edges, ALOE) and a subset of Laws' texture features (of four dimensions). To address the problem of variable mass size, Liu et al.3 proposed a multi-resolution algorithm using the discrete wavelet transform, based on Kegelmeyer et al.'s work. Matsubara et al.4 presented an adaptive thresholding technique for the detection of masses. Qian et al.5 developed a multi-resolution and multi-orientation wavelet transform for mass detection and spiculation analysis. They observed that traditional wavelet transforms cannot extract the directional information that is crucial for spiculation detection. Zheng6,7 proposed a "Gabor Cancer Detection" (GCD) algorithm that consists of three steps: preprocessing, segmentation (generating alarm segments), and classification (reducing false alarms). A "Circular Gaussian Filter" (CGF) was introduced for segmentation, and Gabor features were used for classification. Experimental results on the DDSM database (University of South Florida) showed the promise of the GCD algorithm in mass detection: TPR (true positive rate) = 90% at FPI (false positives per image) = 1.21. Similar to other texture-based methods, the GCD algorithm is quite complicated and requires heavy computation time.

Mammographers compare current mammograms with prior images to make decisions utilizing temporal changes. A few researchers have reported breast cancer detection using temporal analysis plus Gabor features. Zheng et al.8 used current and prior mammograms to detect breast cancer masses without Gabor filtering. Tan et al.9 utilized current and prior mammograms and Gabor filters and achieved an AUC (area under the ROC curve) of 0.725 ± 0.026. Rangayyan et al.10 used single time point data to detect architectural distortions with Gabor filtering, achieving an AUC of 0.61.

Deep learning and CNNs provide a new path to mammographic screening. Soriano et al.11 applied random and grid search algorithms for mammogram classification based on a CNN; 85.00% accuracy was reported in classifying benign and malignant mammograms tested on the Digital Database for Screening Mammography (DDSM). Jadoon et al.12 extracted dense scale invariant features (DSIFT) from the discrete wavelet (DW) and curvelet transform (CT) of mammograms, which are fed to a CNN for classification. CNN-DW and CNN-CT achieved accuracy rates of 81.83% and 83.74%, respectively, when tested on the DDSM and the Mammographic Images Analysis Society (MIAS) databases. Jiang et al.13 applied pre-trained GoogLeNet and AlexNet for classification of breast mass lesions, and achieved AUC = 0.88 and AUC = 0.83 when evaluating a new mammographic dataset.

Fig. 1: Diagram of the proposed CAD method for breast cancer detection: Dif means the difference image between the current scan and the prior scan. CNN-classified marks are multiple bounding boxes highlighting cancerous areas, wherein no mark indicates normal status.

Early cancer detection with mammograms is very challenging due to the large variance of cancer (mass) patterns. Many factors impact the cancer appearance on mammograms, such as the type/stage of cancer, the size/density of the breast, individual differences, etc. The location and size of cancer lesions vary from case to case, which makes cancer detection very difficult. Inspired by face detection and deep learning CNNs, we propose a CAD method that uses multiple object detectors, follow-up scans, and a deep learning CNN for early-stage cancer detection. AD is treated as a special case of mass and is covered under mass detection; the training samples are therefore required to include both mass and AD cases. The proposed research is an innovative solution for breast cancer detection that incorporates object detection, temporal analysis and a CNN into one CAD model. The objective of this research is to find a CAD solution that automatically detects and locates cancers accurately and quickly.

The remainder of this paper is organized as follows: Section 2 provides an overview of the proposed CAD method. Section 3 presents cascading object detection methods to create a region proposal. Section 4 describes the CNN model for breast cancer classification. Section 5 presents experiments, results and discussion. Section 6 concludes the paper.

2. OVERVIEW OF THE PROPOSED CAD METHOD

A mass may not be visually perceptible when it is small or homogeneous with the surrounding tissues in its initial phase. Current CAD methods are not sufficiently accurate in detecting early-stage masses. Possible reasons for the limited performance of existing CAD methods are the lack of multiscale analysis and of temporal analysis. Notice that mammographers compare current mammograms with prior images to make decisions utilizing temporal changes. A CAD model integrating both spatial and temporal features is anticipated to detect early-stage masses. Upon detection (a rectangle showing the suspicious cancer area), texture features are usually extracted from the small rectangular area, and then a classifier is trained to categorize the lesion as malignant or healthy (non-cancer) – the cancer classification stage. A CNN model is a good option for feature extraction and classification. The key steps of the proposed CAD method (see Fig. 1) are as follows (a minimal code sketch follows the list):

(1) Preprocess all mammographic images;
(2) Create a region proposal using three cascading object detectors;
(3) Refine the region proposal by keeping the regions voted by two or more detectors;
(4) Create a 3-channel image with the current scan (red), registered prior scan (green), and their difference image (blue);
(5) Remove false positive marks (BBs) by applying an adapted VGG-19 network to the 3-channel regional images;
(6) Annotate the mammogram as cancerous if any positive marks remain, or healthy if all marks are removed by the CNN.
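The paper's implementation is in MATLAB (Section 5); the Python skeleton below is only an illustrative outline of steps (1)-(6). Every processing step is supplied as a caller-provided function, so no specific library API is assumed and the helper names (preprocess, register, vote_regions, etc.) are hypothetical.

```python
import numpy as np

def cad_pipeline(current_img, prior_img, preprocess, register,
                 detectors, vote_regions, cnn_is_cancer, crop_resize):
    """Outline of the six-step CAD pipeline; all steps are passed in as callables."""
    cur = preprocess(current_img)                # (1) normalize, downsample, crop
    pri = register(preprocess(prior_img), cur)   # align the prior scan to the current scan

    # (2) region proposal: union of bounding boxes from the three cascading detectors,
    #     each box tagged with the index of the detector that produced it
    proposals = [(bb, i) for i, det in enumerate(detectors) for bb in det(cur)]

    # (3) keep only regions voted for by two or more detectors
    refined = vote_regions(proposals, min_votes=2)

    # (4) 3-channel image: current (R), registered prior (G), difference (B)
    diff = np.abs(cur.astype(np.int16) - pri.astype(np.int16)).astype(np.uint8)
    color = np.dstack([cur, pri, diff])

    # (5) the adapted VGG-19 removes false positives from the refined proposal
    kept = [bb for bb in refined
            if cnn_is_cancer(crop_resize(color, bb, (224, 224)))]

    # (6) cancerous if any marks remain, healthy otherwise
    return kept
```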

Fig. 2 Original digital mammograms of Case #37 (current exam with mass present) from the UCHC database: (a) Right CC view (3328×4096 pixels); (b) Right MLO view (3328×4096 pixels). Note the large areas of dark background at the left side of the image.

2.1 Mammographic image preprocessing

Digital mammograms are originally stored in DICOM format. Each mammographic image has a large size (e.g., 3328×4096 pixels) and 12 bits per pixel (see Fig. 2). The mammographic images are preprocessed by normalization, i.e., scaling the image intensity to the range [0, 1]. For fast processing, each original image is down-sampled to 1/4 of its original size (i.e., reduced to half size in both the row and column directions) and quantized to 8 bits per pixel. The dark background areas are cropped off, which leaves the breast area (region of interest) for further processing (refer to Fig. 3). To create a 3-channel image (refer to Section 5.3 and Fig. 9c; it looks like a false-colored image) using the current scan, the prior scan and their difference image, the two mammograms (prior vs. current) must be aligned. An image registration technique such as normalized mutual information (NMI) with affine transforms35,36 is applied for image alignment.
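A minimal NumPy sketch of these preprocessing steps is shown below. It uses naive decimation for downsampling and a simple intensity threshold (bg_thresh, an assumed parameter) to crop the dark background; the NMI/affine registration step would be handled by a dedicated toolkit (e.g., SimpleITK) and is not shown.

```python
import numpy as np

def preprocess(dicom_pixels, bg_thresh=0.05):
    """Normalize a 12-bit mammogram to [0, 1], downsample to half size in each
    dimension, quantize to 8 bits, and crop away the dark background."""
    img = dicom_pixels.astype(np.float64)
    img = (img - img.min()) / (img.max() - img.min())     # scale to [0, 1]
    img = img[::2, ::2]                                    # 1/4 of original size (naive decimation)
    img8 = np.round(img * 255).astype(np.uint8)            # 8 bits per pixel

    # crop rows/columns whose maximum intensity stays near the background level
    keep_rows = np.where(img8.max(axis=1) > bg_thresh * 255)[0]
    keep_cols = np.where(img8.max(axis=0) > bg_thresh * 255)[0]
    return img8[keep_rows.min():keep_rows.max() + 1,
                keep_cols.min():keep_cols.max() + 1]
```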

Fig. 3 Preprocessed digital mammograms of Case #37 from the UCHC database: (a-b) Right CC (1051×1521 pixels) and MLO (1069×1746 pixels) views of the current exam, with the mass present and marked by a yellow rectangle; (c-d) Right CC and MLO views of the 1-year prior exam (not yet aligned), which was normal.

3. CREATION OF REGION PROPOSAL

3.1 Creation of region proposal

Object detection aims to find the locations of desired objects in a scene, for instance, face detection, vehicle detection, and pedestrian detection. Many methods have been developed for such applications. We review three commonly-used features for breast cancer detection: Haar features, local binary patterns, and histograms of oriented gradients. These features will be used to create a region proposal that includes possible cancerous areas.


Fig. 4: The four rectangle features (A-D) initially used by Viola and Jones14,15 to represent images for the learning algorithm. The light and shaded regions are summed separately, and the sum of the light regions is then subtracted from the sum of the shaded regions. (E-H) are the extended Haar-like features.
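As a concrete illustration of how the rectangle features in Fig. 4 are evaluated, the sketch below computes an integral image and a type-A two-rectangle feature. This is a generic textbook formulation rather than the authors' implementation, and the sign convention of the difference is arbitrary.

```python
import numpy as np

def integral_image(img):
    """Cumulative 2-D sum, so that any rectangle sum costs at most four lookups."""
    return np.cumsum(np.cumsum(img.astype(np.int64), axis=0), axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of pixels in the rectangle rows r0..r1-1, cols c0..c1-1, from integral image ii."""
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0:
        total -= ii[r0 - 1, c1 - 1]
    if c0 > 0:
        total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

def two_rect_haar(ii, r, c, h, w):
    """Type-A two-rectangle feature: sum of the left half minus sum of the right half."""
    left = rect_sum(ii, r, c, r + h, c + w // 2)
    right = rect_sum(ii, r, c + w // 2, r + h, c + w)
    return left - right
```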

3.1.1 Haar features and Viola-Jones algorithm

Face detection methods are well developed and can quickly mark multiple faces in a picture regardless of their sizes and backgrounds. The Viola-Jones (VJ) algorithm has become a very common method of object detection, including face detection. Viola and Jones proposed this algorithm as a machine learning approach for object detection with an emphasis on obtaining results rapidly and with high detection rates. The VJ method rests on three important aspects. The first is an image representation called the integral image,14,15 wherein features are calculated by taking the sum of pixels within multiple rectangular areas. This is an extension of a method by Papageorgiou et al.16 The rectangles shown in Fig. 4 (A-D) are the four rectangles initially used by Viola and Jones. Lienhart and Maydt later extended the VJ algorithm to allow rectangle rotations of up to 45 degrees (Fig. 4G-H).17 Using these features, the sums of the white and the shaded regions of each rectangle are calculated independently, and the sum of the shaded region is then subtracted from that of the white region. Viola and Jones admit that the use of rectangles is rather primitive, but rectangles allow for high computational efficiency.14,15 This also lends itself well to the AdaBoost algorithm that is used to learn features from the image. This extension of the AdaBoost algorithm allows the system to select a set of features and train the classifiers, a method first discussed by Freund and Schapire.12 The learning algorithm allows the system to learn the differences between the integral representations of faces and those of the background. The last contribution of the VJ method is the use of cascading classifiers (refer to Fig. 5). At each stage, the classifier either rejects the instance (represented as a sliding window from a given test image) based on a given feature value or sends the instance down the tree for more processing. Initially, a large number of negative examples are eliminated. In order to significantly speed up the system, Viola and Jones avoid areas that are highly unlikely to contain the object. The image is tested with the first cascade stage, and the areas that do not conform to it are rejected and no longer processed. The areas that may contain the object are processed further until all of the classifiers have been tested. The areas passing the last classifier are likely to contain the object (e.g., the face).

Fig. 5 Cascading object detector or classifier.

3.1.2 Local binary pattern

The local binary pattern (LBP) feature descriptor, one of the most popular feature extraction methods, is based on local coding operators originally applied in texture description. Due to their relative robustness under local lighting variation, LBP features are extensively applied in face recognition and are modified to adapt to real applications. Current studies of the LBP feature focus on exploring extra details in face images and on reducing the dimensionality. Some works improve on the LBP, such as Xie et al.,18 who applied the Gabor wavelet transform to the original texture space and built Local XOR Patterns of Gabor Phase (LGXP). The LBP texture analysis operator was first introduced as a complementary measure of local image contrast.19 The first operator worked with the eight neighbors of a pixel, using the value of the center pixel as a threshold. An LBP code for a neighborhood is produced by multiplying the thresholded values by weights given to the corresponding pixels and summing up the result. The LBP encoding process is illustrated in Fig. 6.

Fig. 6 Illustration of the LBP encoding process.

LBP(x_c, y_c) = \sum_{p=0}^{7} S[I(x_p, y_p) - I(x_c, y_c)] \cdot 2^p ,    (1)

where

S(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases}    (2)

Here (x_c, y_c) denotes the center pixel and (x_p, y_p), p = 0, ..., 7, its eight neighbors.
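A direct NumPy implementation of Eqs. (1)-(2) for the 8-neighbor case is sketched below; the neighbor ordering (and hence the bit assignment) is an assumption, since only the encoding rule is specified above. In practice, the LBP descriptor of a region is the histogram of these codes.

```python
import numpy as np

def lbp8(img):
    """8-neighbor LBP code per Eqs. (1)-(2): threshold each neighbor against
    the center pixel and weight the results by powers of two."""
    img = img.astype(np.int32)
    # neighbor offsets (dr, dc), enumerated counter-clockwise starting from the right
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for p, (dr, dc) in enumerate(offsets):
        neighbor = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        codes |= (neighbor >= center).astype(np.uint8) << p   # S(x) * 2^p
    return codes
```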

3.1.3 Histograms of oriented gradient (HOG)

The Histograms of Oriented Gradient (HOG) descriptor is an adaptation of Lowe's Scale Invariant Feature Transform (SIFT)20 approach with local spatial histogramming and normalization. A HOG feature is created by first computing the gradient magnitude and orientation at each image pixel in a region around an anchor point (keypoint). The region is split into N×N subregions. Orientations are quantized by the number of bins in the histogram (typically four orientations). For each histogram bin (orientation), we compute the sum of all magnitudes within the subregion having that particular orientation. The histogram values are then normalized by the total energy of all orientations to obtain values between 0 and 1. Concatenating the histograms from all the subregions gives the final HOG feature vector. Sobel filters may be used to compute the gradient. As illustrated in Fig. 7, the extraction of HOG features is summarized as follows (a code sketch follows the list):

• Compute (Sobel) gradient magnitude and orientation;
• Quantize orientations into 4 bins for the histogram;
• Sum all magnitudes within a subregion having a particular orientation; and
• Concatenate the histograms from all the subregions to give the final HOG feature vector.

Fig. 7 Illustration of the extraction of HOG features.
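The following sketch mirrors the four steps above (Sobel gradients, 4-orientation quantization, per-subregion magnitude sums, normalization and concatenation). The grid size and the per-subregion normalization are assumptions, since the text does not fix N or the exact normalization scheme.

```python
import numpy as np
from scipy import ndimage

def hog_region(region, grid=4, n_bins=4):
    """HOG feature for one region: Sobel gradients, orientation quantization,
    per-subregion magnitude histograms, normalization, concatenation."""
    region = region.astype(np.float64)
    gx = ndimage.sobel(region, axis=1)
    gy = ndimage.sobel(region, axis=0)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)                  # unsigned orientation in [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)

    h, w = region.shape
    feature = []
    for i in range(grid):                                     # grid x grid subregions
        for j in range(grid):
            rs = slice(i * h // grid, (i + 1) * h // grid)
            cs = slice(j * w // grid, (j + 1) * w // grid)
            hist = np.bincount(bins[rs, cs].ravel(),
                               weights=mag[rs, cs].ravel(), minlength=n_bins)
            total = hist.sum()
            feature.append(hist / total if total > 0 else hist)
    return np.concatenate(feature)
```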

HOG features are widely used for vehicle detection together with a neural network (NN) classifier,21 an AdaBoost classifier,22 and a support vector machine (SVM) classifier (HOG-SVM).23

3.1.4 Advantages of cascading detectors with simple features

(1) Simple features for fast detection: It is viable to detect mass areas using the Viola-Jones (VJ) algorithm. The standard Haar-like features used in face detection are shown in Fig. 4 (A-D). We apply both the standard features and the extended Haar-like features (shown in Fig. 4E-H) for mass detection. The integral image is calculated from the preprocessed mammogram. Once the integral image is ready, the Haar-like features only involve additions and subtractions. Therefore, this feature extraction is very efficient and can be completed in a short time.

(2) Multiscale analysis to detect masses of various sizes: Once a detector (as shown in Fig. 5) is trained, detection is done by sliding a window across an input image and passing the cropped sub-image through the classifier (i.e., detector). In order for classification to be size-invariant, the same procedure is also performed on the input integral image at various scales. Given this scheme, the output of classification is a series of sub-windows of the input image which contain the detected cancers. Cancer detection can thus be implemented with the VJ method by varying the size of the sliding window (rectangle). A real cancer may result in multiple nearby detections, so it is necessary to combine overlapping detections into a single detection (rectangle).

(3) Feature selection and classifier training using an AdaBoost approach: At each stage (see Fig. 5), an AdaBoost-like approach is applied to select one or more Haar-like simple features, as well as to determine appropriate thresholds that can be applied to reject a large number of negative training instances. Important input parameters for the training procedure are the minimum true positive rate (TPR) and the maximum false positive rate (FPR); the search for optimal features and thresholds continues until those two requirements are met, at which point the remaining training examples are passed on to the next stage. The optimization reweights the training samples so that the inputs on which errors were made receive higher weight in the learning process. For example, if those two parameters are set to 0.995 and 0.5 respectively, then at each stage feature selection and threshold optimization are applied until the resulting stage classifies 99.5% of the positive instances as positive and does not classify more than 50% of the negative samples as positive.

(4) Reduction of false positives with cascading classifiers: As shown in Fig. 5, each stage of the classifier can make use of more than one feature in order to meet the requirements, in which case each stage can be viewed as a decision tree. It is also important to note that at each stage the classifier uses a different set of negative training images, which are sampled from a given database of images that do not contain the specified object (i.e., non-cancerous images). After training the desired number of stages, the result is a cascade of tree-like classifiers (i.e., detectors); the structure of the resulting classifier is essentially that of a degenerate decision tree. Each stage added to the classifier tends to reduce the false positive rate, but it also reduces the detection rate (true positive rate).
As such, it is essential to train the classifier with the appropriate number of stages for the given task. The training process is time-consuming, but the detection (testing) process (i.e., thresholding) is very fast.

3.2 Refining the region proposal with majority vote

For each mammogram, a region proposal is the union of (logically ORing) the detected BBs from the three detectors. As shown in Fig. 9a, there are many detected regions (BBs). To reduce false positives, a majority vote is applied to the region proposal: if a region is detected as positive by two or more featured detectors, the region is kept; otherwise it is removed. The refined region proposal (see Fig. 9b) is then fed to the CNN classification stage for false positive removal.
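A sketch of this voting rule is given below. The definition of the overlap ratio (here, intersection area over the smaller box area, with the 0.5 threshold used in Section 5.2) is an assumption, since the paper does not spell out the exact formula.

```python
def overlap_ratio(a, b):
    """Intersection area divided by the smaller box area; boxes are (row, col, height, width)."""
    r0, c0 = max(a[0], b[0]), max(a[1], b[1])
    r1 = min(a[0] + a[2], b[0] + b[2])
    c1 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0, r1 - r0) * max(0, c1 - c0)
    return inter / min(a[2] * a[3], b[2] * b[3])

def refine_by_vote(detections, min_votes=2, thresh=0.5):
    """Keep a box only if detectors of at least `min_votes` different feature types
    mark (approximately) the same region.  `detections` is a list of (box, detector_id)."""
    kept = []
    for box, det_id in detections:
        voters = {det_id}
        for other, other_id in detections:
            if other_id != det_id and overlap_ratio(box, other) >= thresh:
                voters.add(other_id)
        if len(voters) >= min_votes:
            kept.append(box)
    return kept
```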

4. REGION CLASSIFICATION USING ADAPTED VGG-19 CNN

4.1 Convolutional neural network

Convolutional neural networks (CNNs) draw on biology, mathematics and computer science, and they have been among the most influential innovations in the fields of computer vision and artificial intelligence (AI). CNNs grew to prominence in 2012, when Krizhevsky et al.24 used an 8-layer CNN (5 convolutional, 3 fully-connected layers) to win that year's ImageNet competition (the network is referred to as AlexNet), dropping the classification error record from 25.8% (in 2011) to 16.4% (in 2012), an astounding improvement at the time. Since then many companies have been using deep learning at the core of their services. For example, Facebook uses neural nets for its automatic tagging algorithms, Google for photo search, Amazon for product recommendations, Pinterest for home feed personalization, and Instagram for its search infrastructure.

CNNs take a biological inspiration from the visual cortex. The visual cortex has small regions of cells that are sensitive to specific regions of the visual field. This idea was expanded upon by classic neurophysiological experiments beginning in 1959,25 which showed that some individual neuronal cells in the brain respond (or fire) only in the presence of edges of a certain orientation. For example, some neurons fire when exposed to vertical edges and some when shown horizontal or diagonal edges. These neurons are organized in a columnar architecture and, together, they produce visual perception.26 This idea of specialized components inside a system having specific tasks (the neuronal cells in the visual cortex looking for specific characteristics) is one that machines use as well, and it is the basis behind CNNs.

A common misconception in the deep learning community is that effective deep learning models cannot be created without a huge amount of data. While data is a critical part of creating a network, the idea of transfer learning has helped to lessen the data demands. Transfer learning is the process of taking a pre-trained model (the weights and parameters of a network that has been trained on a large dataset) and fine-tuning the model with a small dataset. The idea is that the pre-trained model acts as a feature extractor. In general, the last layer of the network is replaced with a new classification layer (sized for the number of classes in the problem), and the network is then trained (adapted) normally. Suppose the pre-trained model was trained on ImageNet (a dataset that contains 14 million images in over 1,000 classes).27 The lower layers of the network detect features like edges and curves, and most likely the new network needs to detect curves and edges as well. Rather than training the whole network from a random initialization of weights, it is more efficient and effective to use the weights of the pre-trained model and focus training on the more important layers (the higher layers towards the classification output). If the new dataset is quite different from ImageNet, more of the higher layers should be retrained.

4.2 VGG-19 model

Simonyan and Zisserman of the University of Oxford created a 19-layer (16 convolutional, 3 fully-connected) CNN that strictly uses 3×3 filters with stride and pad of 1, along with 2×2 max-pooling layers with stride 2, called the VGG-19 model.28,29 Compared to AlexNet, the VGG-19 (see Fig. 8) is a deeper CNN with more layers. To reduce the number of parameters in such a deep network, it uses small 3×3 filters in all convolutional layers, and it achieved a 7.3% (top-5) error rate. The VGG-19 model was not the winner of ILSVRC30 2014; however, VGG Net is one of the most influential works because it reinforced the notion that CNNs should have a deep network of layers in order for the hierarchical representation of visual data to work. Keep it deep. Keep it simple. The VGG-19 model, with a total of 138M parameters, placed 2nd in classification and 1st in localization in ILSVRC 2014. This model is trained on a subset of the ImageNet27 database, which is used in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC).30 The VGG-19 is trained on more than a million images and can classify images into 1000 object categories, for example, keyboard, mouse, pencil, and many animals. As a result, the model has learned rich feature representations for a wide range of images.
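The adaptation described in Section 4.1 (replace the last classification layer and fine-tune) can be sketched as follows. The paper's implementation used MATLAB, so this PyTorch/torchvision snippet (recent torchvision weights API) is only an illustrative equivalent, and freezing the convolutional layers is one possible fine-tuning choice rather than the authors' stated configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load VGG-19 pre-trained on ImageNet and replace the last fully-connected layer
# (1000 ImageNet classes) with a 2-class layer (cancerous vs. normal region).
model = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(in_features=4096, out_features=2)

# One possible choice: freeze the convolutional feature extractor and train only
# the classifier layers on the 224x224x3 regional images.
for param in model.features.parameters():
    param.requires_grad = False

optimizer = torch.optim.SGD((p for p in model.parameters() if p.requires_grad),
                            lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()
```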

Region-based CNN (R-CNN)31 generates a region proposal and uses a CNN for object detection and classification. Fast R-CNN32 and Faster R-CNN33 were proposed later to speed up the process and improve the accuracy. However, the R-CNN approach is not well suited for breast cancer detection, since mammograms vary dramatically in the texture and size of the lesion (cancer) from case to case.

Fig. 8 Illustration of the network architecture of VGG-19 model: conv means convolution, FC means fully connected

4.3 VGG-19 model adaptation using a 3-channel image made of both current and prior scans – capturing subtle changes over time

Radiologists view current and prior mammograms (i.e., two examinations at different time points) side by side to see small changes over time, and then make a diagnostic decision. Our digital mammographic database contains mammograms from current and prior exams (typically 1 year prior). In order to use both the current and prior exams and their difference image, the two mammograms must be aligned using an image registration technique. Since the VGG-19 model takes a color image as input, a 3-channel image (see Fig. 9c) is created by assigning (current exam, prior exam, difference image) to the (red, green, blue) channels, respectively. All regional images (from the refined region proposal) are cropped from the 3-channel image and scaled to 224×224×3 for VGG-19 training and testing. In this way, subtle changes over time are reflected in this 3-channel image and featured in the adapted VGG-19 model.
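A sketch of building the 3-channel input and cropping a proposal region to the VGG-19 input size is shown below. It assumes 8-bit, already-registered scans and uses the full-range rescaling of the difference image described in Section 5.3.

```python
import numpy as np
from PIL import Image

def make_three_channel(current, prior_registered):
    """Stack current scan (R), registered prior scan (G), and their difference,
    rescaled to the full 0-255 range (B), into one 8-bit color image."""
    diff = current.astype(np.int16) - prior_registered.astype(np.int16)
    span = max(int(np.ptp(diff)), 1)
    diff = ((diff - diff.min()) * 255.0 / span).astype(np.uint8)
    return np.dstack([current, prior_registered, diff])

def crop_region(color_img, box, size=(224, 224)):
    """Crop one proposal box (row, col, height, width) and scale it to the
    224x224x3 input expected by the adapted VGG-19."""
    r, c, h, w = box
    patch = color_img[r:r + h, c:c + w]
    return np.asarray(Image.fromarray(patch).resize(size, Image.BILINEAR))
```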

5. EXPERIMENTAL RESULTS AND DISCUSSION

5.1 Digital mammogram database

A retrospective study was conducted with 96 subjects (69 cancerous vs. 27 healthy) of originally digital mammograms (not digitized from films) collected at the University of Connecticut Health Center (UCHC), called the UCHC DigiMammo (UCHCDM) database.34 We are still in the process of collecting more mammographic images. Each case includes 4 mammograms (two views [CC and MLO] from two sides) imaged at two different times, referred to as the current and prior exam or scan (see Fig. 3). All mammographic images are deidentified, annotated in a descriptive text file with the known pathology (healthy, mass, AD, calcification), and circled at the locations of cancers (if any) on a separate key image. These annotations are the ground truths for CAD model training and testing. Among the 69 cancerous subjects, there are 28 labels (i.e., rectangles as shown in Fig. 3) for ADs and 41 for masses, annotated by our expert radiologists. All mammographic images were preprocessed by normalization, downsampling, quantization, and cropping (refer to Section 2.1), and by registration between the current and prior scans. All analyzed cases have both current and prior scans.

5.2 Creation of region proposal using three cascading detectors and refinement with majority vote

Three 20-stage cascading detectors were trained using three classical features: Haar, LBP, and HOG,34 respectively. A 50% false alarm rate per stage and twice the number of negative (healthy) samples were used during the training process. Note that after 20 stages the false alarm rate should theoretically be reduced to 0.5^20 (less than one in a million). The negative samples were obtained both from healthy images and from the non-cancerous areas in cancerous images. The BBs (bounding boxes) detected by the three cascading detectors are shown in Figs. 9-13a. The number of BBs is reduced by a simple majority vote, i.e., two or more detectors must vote for the same or a similar region (BB). For example, a BB is considered a true positive when both the Haar- and LBP-based detectors mark the same location (i.e., a logical AND operation). In our experiments, a threshold of 0.5 on the overlap ratio between two BBs was applied to refine the region proposal for the CNN (as shown in Figs. 9-13b). This process effectively reduces false positives.

Fig. 9 CAD analyses of Case #37, right breast with mass present, marked with a green rectangle; top/bottom: CC/MLO. (a) Stage-1: Region proposal from 3 cascading detectors (138/188 cyan rectangles); (b) Stage-2: Refined region proposal voted by 2 or more detectors (63/90 pink rectangles); (c) Stage-3.1: 3-channel image formed by the current scan (red), prior scan (green, 2 years earlier), and their difference (blue) – the enlarged detections are shown at the upper-left corner; (d) Stage-3.2: CNN-classified and annotated (3/4 yellow rectangles – detected cancers).

5.3 CNN classification with 3-channel images

The two exams of either the CC or MLO view must first be aligned using an image registration technique. A difference image is then obtained by subtracting the prior exam from the current exam and scaling the result to the full intensity range. A 3-channel image is created by assigning (current scan, prior scan, difference image) to the (red, green, blue) channels, which looks like a false-colored image (see Figs. 9-13c). The regional images from the refined region proposal are cropped from the 3-channel image, scaled to 224×224×3, and used for CNN training based on a pre-trained VGG-19 model. The VGG-19 training went through 380 epochs; four times more negative samples (from healthy cases and non-cancerous areas in cancerous cases) than positive samples were used. The CAD performance is calculated by averaging the results of 10 runs, where each run is based on a random split of all mammographic image samples, 75% for training vs. 25% for testing. The training of the 3 cascading detectors (Haar, LBP, and HOG) uses only the current exams, while the CNN training takes the 3-channel regional images as inputs.
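The evaluation protocol (10 runs over random 75%/25% splits) can be sketched as follows; how the samples are indexed (per image or per subject) is an assumption here.

```python
import numpy as np

def repeated_splits(n_samples, n_runs=10, train_frac=0.75, seed=0):
    """Yield (train_idx, test_idx) index arrays for repeated random splits,
    matching the 10-run, 75%/25% evaluation protocol described above."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_frac * n_samples))
    for _ in range(n_runs):
        order = rng.permutation(n_samples)
        yield order[:n_train], order[n_train:]

# Example: average a metric over the 10 runs (evaluate() is a placeholder).
# scores = [evaluate(train, test) for train, test in repeated_splits(n_samples)]
# mean_score = float(np.mean(scores))
```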

Fig. 10 CAD analyses of Case #43, right breast with architectural distortion present, marked with a green rectangle, CC view: (a) Stage-1: Region proposal from 3 cascading detectors (160 cyan rectangles); (b) Stage-2: Refined region proposal voted by 2 or more detectors (51 pink rectangles); (c) Stage-3.1: 3-channel image formed by the current scan (red), prior scan (green, 1 year earlier), and their difference (blue) – the enlarged detections are shown at the lower-left corner; (d) Stage-3.2: CNN-classified and annotated (7 yellow rectangles – detected cancers).

Fig. 11 CAD analyses of Case #63, left breast with architectural distortion present, marked with a green rectangle, MLO view: (a) Stage-1: Region proposal from 3 cascading detectors (213 cyan rectangles) – the enlarged detections are shown at the upper-right corner; (b) Stage-2: Refined region proposal voted by 2 or more detectors (77 pink rectangles); (c) Stage-3.1: 3-channel image formed by the current scan (red), prior scan (green, 1 year earlier), and their difference (blue); (d) Stage-3.2: CNN-classified and annotated (0 yellow rectangles – a false negative).

The CNN classification results based on the refined region proposal are the final CAD results. As shown in Figs. 9-13c (color presentation for illustration) and Figs. 9-13d (grayscale presentation for diagnosis), if the number of detections (rectangles) is 0 (none), the CAD result is normal; if the number is 1 or more, cancers are detected at the indicated rectangles (BBs). Let us analyze the cases presented in Figs. 9-13. In Case #37 (Fig. 9), there are 3 and 4 detected BBs in the CC and MLO views, respectively, and they all overlap well with the ground-truth BB (true positives). There are 7 BBs detected in Case #43 (Fig. 10) and all are close to the ground truth. In Case #63 (Fig. 11), there are 2 BBs detected by the same detector in Stage 1 that overlap with the ground truth, but they are missed at Stage 2 (i.e., filtered out, yielding a false negative). In Case #116 (Fig. 12), although 463 detections are present in Stage 1, all are eliminated in Stage 3 after CNN classification (true negative). In Case #114 (Fig. 13), 1 false positive remains in Stage 3. Of course, multiple overlapping BBs could be merged into one larger BB (not yet done in this paper).

Fig. 12 CAD analyses of Case #116, right breast, healthy condition, CC view: (a) Stage-1: Region proposal from 3 cascading detectors (463 cyan rectangles); (b) Stage-2: Refined region proposal voted by 2 or more detectors (204 pink rectangles); (c) Stage-3.1: 3-channel image formed by the current scan (red), prior scan (green, 1 year earlier), and their difference (blue); (d) Stage-3.2: CNN-classified with no yellow rectangles (0 detected cancers, thus normal).

Fig. 13 CAD analyses of Case #114, left breast, healthy condition, CC view: (a) Stage-1: Region proposal from 3 cascading detectors (197 cyan rectangles); (b) Stage-2: Refined region proposal voted by 2 or more detectors (87 pink rectangles); (c) Stage-3.1: 3-channel image formed by the current scan (red), prior scan (green, 1 year earlier), and their difference (blue) – the enlarged detections are shown at the upper-right corner; (d) Stage-3.2: CNN-classified with 1 yellow rectangle (1 detected cancer, i.e., 1 false positive).

The overall CAD performance is listed in Table 1, which shows sensitivity = 0.928, specificity = 0.991, and FPI = 0.004. Considering that all cancerous cases were normal 1-2 years earlier, these are early-stage cancers. The performance values show high specificity, low false positives, and fairly high sensitivity, which demonstrates a very promising CAD method. The time costs of the proposed CAD method are also given in Table 1, where the mean values were averaged across all detections over 985 images from 96 subjects. All CAD algorithms were implemented and run in Matlab 2017a (Version 9.2) on a laptop computer, an MSI GT73VR, with the following configuration: Intel i7-7820HK CPU at 2.9 GHz, 16 GB RAM, 1.25 TB hard disk, 64-bit Windows 10, and an NVIDIA GeForce GTX 1070 graphics board with 8 GB of on-board video memory. VGG-19 training (380 epochs) takes a longer time, 8.5 hours (510 minutes), but testing is quite fast.
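For reference, the reported figures follow the standard definitions sketched below; how true/false positives are counted against the ground-truth BBs is not fully specified in the text, so this is only a sketch of the arithmetic.

```python
def cad_metrics(tp, fn, tn, fp, n_images):
    """Standard screening metrics: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP),
    FNR = 1 - sensitivity, FPI = false positives per image."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {"sensitivity": sensitivity,
            "specificity": specificity,
            "FNR": 1.0 - sensitivity,
            "FPI": fp / n_images}
```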

The number of false positives per image (0.004 FPI) from the proposed CAD is very small; the CNN is very effective at eliminating false positives. Thus we need to increase the number of positive detections at Stage 1. We may relax the restriction of Stage 2 (voted by 2 detectors), reconfigure the cascading detectors, or add more detectors.

Table 1. The proposed CAD model performance for breast cancer detection. FPI = # false positives per image, BB = bounding box (rectangle).

#Subjects in Database: 69 Cancerous, 27 Healthy
Mean Performance (Average of 10 runs): Sensitivity = 0.928, Specificity = 0.991, FPI = 0.004
# Detected BBs per Image: 3-detectors: 165; Voted-by-2-detectors: 63; After-CNN-classification: 3
Final BBs vs. the ground-truth BB: OverlapRatio = 0.944, ConfidenceVal = 1.0
Total Training Time: 3 cascading detectors: 22 minutes; VGG-19 (380 epochs): 510 minutes
Mean Detection Time: 3 cascading detections: 0.62 second per image; VGG-19 classification: 0.88 second per image

The two views (CC and MLO) are processed separately in this paper, but the CAD processing and detection results should be cross-referenced between the two views. For example, detections from the two views may be used as cross references to pinpoint the cancer location in 3D space. There are other ways to combine the two exams (current vs. prior), such as the absolute difference, a weighted difference according to the scan interval, or unsharp masking. The open question is how radiologists extract changes from two exams, which may not be a simple mathematical subtraction. Aligning two exams by image registration is challenging and time-consuming, and a better technique is anticipated.

6. CONCLUSIONS

We propose a CAD method for breast cancer detection that takes advantage of cascading detectors, a CNN, and follow-up exams. The three cascading detectors create a region proposal that contains possible cancerous areas, while the adapted VGG-19 CNN is very effective in removing false positives. To detect subtle changes over time, 3-channel images are formed from the current scan, prior scan and difference image, and then fed to the CNN classifier. The proposed CAD method is validated by detecting masses and ADs in digital mammograms (UCHC DigiMammo database), with very promising results: sensitivity = 0.928, specificity = 0.991, FPI = 0.004. This model can be extended to the detection of calcifications. The size of the database may limit statistical power; however, the cascading detectors and the neural network are expected to increase in accuracy with a larger database. The UCHCDM database is still growing, and we will revisit our CAD method in the near future and expect better performance.

REFERENCES

[1] American College of Radiology, ACR BI-RADS — Mammography, Ultrasound & Magnetic Resonance Imaging (4th ed.), American College of Radiology, Reston, VA (2003).
[2] Kegelmeyer, W.P., Jr., Pruneda, J.M., Bourland, P.D., et al., "Computer-aided mammographic screening for spiculated lesions," Radiology, Vol. 191, pp. 331-337 (1994).
[3] Liu, S.L., Babbs, C.F., and Delp, E.J., "Multiresolution detection of spiculated lesions in digital mammograms," IEEE Transactions on Image Processing, Vol. 10, pp. 874-884 (2001).
[4] Matsubara, T., Fujita, H., Endo, T., Horita, K., et al., "Development of mass detection algorithm based on adaptive thresholding technique in digital mammograms," presented at Digital Mammography (1996).
[5] Qian, W., Li, L., Clarke, L., Clark, R.A., and Thomas, J., "Comparison of adaptive and non-adaptive CAD methods for mass detection," Academic Radiology, Vol. 6, pp. 471-480 (1999).
[6] Zheng, Y. and Agyepong, K., "Mass detection with digitized screening mammograms by using Gabor features," SPIE Proceedings Vol. 6514, pp. 651402-1-12, San Diego (2007).
[7] Zheng, Y., "Breast cancer detection with Gabor features from digital mammograms," Algorithms, Vol. 3, No. 1, pp. 44-62 (2010).
[8] Zheng, B., et al., "Performance change of mammographic CAD schemes optimized with most-recent and prior image databases," Acad Radiol, Vol. 10, pp. 283-288 (2003).
[9] Tan, M., et al., "Assessment of a four-view mammographic image feature based fusion model to predict near-term breast cancer risk," Ann Biomed Eng (2015).
[10] Rangayyan, R., et al., "Computer-aided detection of architectural distortion in prior mammograms of interval cancer," J Digit Imaging, Vol. 23, No. 5, pp. 611-631 (2010).
[11] Soriano, D., Aguilar, C., Ramirez-Morales, I., Tusa, E., Rivas, W., and Pinta, M., "Mammogram classification schemes by using convolutional neural networks," CITT 2017, Communications in Computer and Information Science, Vol. 798, pp. 71-85, Springer, Cham (2017).
[12] Jadoon, M.M., et al., "Three-class mammogram classification based on descriptive CNN features," BioMed Research International, 2017:3640901 (2017).
[13] Jiang, F., Liu, H., Yu, S., and Xie, Y., "Breast mass lesion classification in mammograms by transfer learning," ICBCB '17: Proceedings of the 5th International Conference on Bioinformatics and Computational Biology, pp. 59-62, Hong Kong (2017).
[14] Viola, P. and Jones, M., "Rapid object detection using a boosted cascade of simple features," Proceedings of CVPR, Vol. 1, pp. 511-518 (2001).
[15] Viola, P. and Jones, M., "Robust real-time object detection," International Journal of Computer Vision, Vol. 57, No. 2, pp. 137-154 (2001).
[16] Papageorgiou, C., Oren, M., and Poggio, T., "A general framework for object detection," Sixth International Conference on Computer Vision, pp. 555-562 (1998).
[17] Lienhart, R. and Maydt, J., "An extended set of Haar-like features for rapid object detection," Proceedings of the International Conference on Image Processing, Vol. 1, pp. 900-903 (2002).
[18] Xie, S., Shan, S., Chen, X., and Chen, J., "Fusing local patterns of Gabor magnitude and phase for face recognition," IEEE Transactions on Image Processing, Vol. 19, No. 5, pp. 1349-1361 (2010).
[19] Ojala, T., Pietikainen, M., and Harwood, D., "A comparative study of texture measures with classification based on feature distributions," Pattern Recognition, pp. 51-59 (1996).
[20] Lowe, D., "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, Vol. 60, No. 2, pp. 91-110 (2004).
[21] Gepperth, A., Edelbrunner, J., and Bocher, T., "Real-time detection and classification of cars in video sequences," Intelligent Vehicles Symposium, pp. 625-631 (2005).
[22] Negri, P., Clady, X., and Prevost, L., "Benchmarking Haar and histograms of oriented gradients features applied to vehicle detection," Proceedings of the Fourth International Conference on Informatics in Control, Automation and Robotics (ICINCO 2007), Angers, France (2007).
[23] Han, F., Shan, Y., Cekander, R., Sawhney, H., and Kumar, R., "A two-stage approach to people and vehicle detection with HOG-based SVM," Proc. Performance Metrics for Intelligent Systems, pp. 133-140 (2006).
[24] Krizhevsky, A., Sutskever, I., and Hinton, G.E., "ImageNet classification with deep convolutional neural networks," Proceedings of the 25th International Conference on Neural Information Processing Systems, Vol. 1, pp. 1097-1105, Lake Tahoe, Nevada (2012).
[25] Lettvin, J., Maturana, H., McCulloch, W., and Pitts, W., "What the frog's eye tells the frog's brain," Proc. Inst. Radio Engr., Vol. 47, pp. 1940-1951 (1959).
[26] "Hubel and Wiesel & the Neural Basis of Visual Perception," https://knowingneurons.com/2014/10/29/hubel-and-wiesel-theneural-basis-of-visual-perception/ (2014).
[27] ImageNet, http://www.image-net.org.
[28] Simonyan, K. and Zisserman, A., "Very deep convolutional networks for large-scale image recognition," arXiv technical report (2014).
[29] University of Oxford, Visual Geometry Group, http://www.robots.ox.ac.uk/~vgg/research/very_deep/.
[30] Russakovsky, O., Deng, J., Su, H., et al., "ImageNet Large Scale Visual Recognition Challenge," International Journal of Computer Vision (IJCV), Vol. 115, No. 3, pp. 211-252 (2015).
[31] Girshick, R., Donahue, J., Darrell, T., and Malik, J., "Rich feature hierarchies for accurate object detection and semantic segmentation," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580-587 (2014).
[32] Girshick, R., "Fast R-CNN," Proceedings of the IEEE International Conference on Computer Vision, pp. 1440-1448 (2015).
[33] Ren, S., He, K., Girshick, R., and Sun, J., "Faster R-CNN: Towards real-time object detection with region proposal networks," Advances in Neural Information Processing Systems, pp. 91-99 (2015).
[34] Zheng, Y., Yang, C., Merkulov, A., and Bandari, M., "Early breast cancer detection with digital mammograms using Haar-like features and AdaBoost algorithm," SPIE Proceedings Vol. 9871, Sensing and Analysis Technologies for Biomedical and Cognitive Applications 2016, 98710D (2016).
[35] Engeland, S.V., Snoeren, P., Hendriks, J., and Karssemeijer, N., "A comparison of methods for mammogram registration," IEEE Trans. Medical Imag., Vol. 22, No. 11, pp. 1436-1444 (2003).
[36] Pluim, J.P., Maintz, J.B.A., and Viergever, M.A., "Mutual information-based registration of medical images: a survey," IEEE Trans. Medical Imag., Vol. 22, pp. 986-1004 (2003).
