AUTOMATIC POLYP DETECTION FROM LEARNED BOUNDARIES

Nima Tajbakhsh1, Changching Chi1, Suryakanth R. Gurudu2, and Jianming Liang1

1 Department of Biomedical Informatics, Arizona State University
2 Division of Gastroenterology and Hepatology, Mayo Clinic
ABSTRACT

Colonoscopy is the primary method for detecting and removing polyps, the precursors to colon cancer; however, during colonoscopy a significant number of polyps are missed: the pooled miss-rate for all polyps is 22% (95% CI, 19%-26%). This paper presents an automatic polyp detection system for colonoscopy, aiming to alert colonoscopists to possible polyps during the procedure. Given an input image, our method first collects a crude set of edge pixels, then refines this edge map by effectively removing many non-polyp boundary edges through a classification scheme, and finally localizes polyps based on the retained edges with a novel voting scheme. This paper makes three original contributions: (1) a fast and discriminative patch descriptor for precisely characterizing image appearance, (2) a new 2-stage classification pipeline for accurately excluding undesired edges, and (3) a novel voting scheme for robustly localizing polyps from fragmented edge maps. Evaluations demonstrate that our method outperforms the state-of-the-art.
1. INTRODUCTION

Existing methods for detecting colon polyps utilize texture [1], shape [2], spatio-temporal [3], and shadow information [4, 5]. However, these methods have limitations. First, texture becomes fully visible only if the polyp appears within the depth of field of the camera, a condition that is not often met given the relatively large distance between polyps and the camera. Second, raw shape information without image context, as used in [2], can mislead the detector toward irrelevant objects that resemble polyps. Third, the spatio-temporal features used in [3] may not be suitable for real-time detection scenarios, since polyp localization in the current frame requires information from future and past frames. Finally, shadow information [4, 5] can mislead the detector toward other structures with similar surrounding shadows, such as vessels and stools.

This paper proposes a novel method that combines image context with shape information to minimize the misleading effect of irrelevant objects with polyp-like boundaries. Given an input image, our method begins with collecting a crude set of boundary pixels that are refined by our patch descriptor and classification scheme before being fed to our novel voting scheme for polyp localization. We should note that our method is not designed to delineate polyps but rather to use polyp boundaries as hints to localize polyps. This paper makes the following three original contributions:

• A new patch descriptor that quickly and efficiently characterizes image appearance across object boundaries. Our descriptor is both rotation invariant and robust against linear illumination changes.

• A 2-stage classification framework that is able to enhance low-level image features prior to classification. Unlike traditional image classification, where a single patch undergoes the processing pipeline, our system fuses the information extracted from a pair of patches for more accurate edge classification.

• A novel vote accumulation scheme that robustly detects objects with curvy boundaries in fragmented edge maps. Our voting scheme produces a probabilistic output for each polyp candidate but does not require any predefined parametric model of the object of interest (e.g., a circle or ellipse).

2. PROPOSED METHOD

Our polyp detection system is based on two key observations. First, polyps, irrespective of their morphology, feature a curvy segment in their boundaries (Fig. 1(a)). We use this property in designing our voting scheme, which localizes polyps by detecting objects with curvy boundaries. Second, image appearance across polyp boundaries is highly distinct, as shown in Fig. 1(b), where hundreds of thousands of oriented image patches are averaged to show how the average image appearance across polyp boundaries (top-left) differs from that of vessels, lumen, and specular reflections. We exploit this property in designing our patch descriptor and classification scheme to distinguish polyp boundaries from the boundaries of other colonic objects, producing a refined edge map for our vote accumulation scheme.
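The average-appearance images in Fig. 1(b) come from straightforward pixel-wise averaging of oriented patches of each class; a minimal sketch of that averaging, using random stand-in data since the real patches are sampled from colonoscopy frames:

```python
import numpy as np

# Stand-in for a stack of 64x64 oriented boundary patches of one class;
# in the paper these are aligned along edge normals, so class-specific
# structure sits vertically in the middle of each patch.
rng = np.random.default_rng(0)
patches = rng.random((1000, 64, 64))

# Pixel-wise averaging across the stack: aligned polyp boundaries would
# leave a clear vertical edge profile, while unaligned content blurs out
# toward a flat image (as with this random data).
mean_appearance = patches.mean(axis=0)
print(mean_appearance.shape)  # (64, 64)
```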
978-1-4673-1961-4/14/$31.00 ©2014 IEEE
2.1. Feature Extraction

To capture the unique image appearance of polyps along their boundaries, we extract 64x64 oriented sub-images along the edge normals. As a result, in the extracted sub-images, the edges always appear vertically in the middle. For an edge pixel at angle θ, its two possible normals (θ−π/2 and θ+π/2) give two horizontally mirrored sub-images. The following description is valid for both situations. We form 8x16 patches all over each sub-image with 50% overlap along the horizontal and vertical directions. Each patch is then averaged vertically, resulting in a 1D intensity signal that presents intensity variation along the horizontal axis. We then apply a 1D discrete cosine transform (DCT) to obtain a compact and informative representation of the signal. To achieve invariance against constant illumination changes, the DC component (the average patch intensity) is discarded. To achieve invariance against linear illumination scaling, we divide the AC coefficients by the norm of the DCT coefficient vector. As a result, our descriptor can partially tolerate nonlinear illumination change over the whole sub-image, particularly if the nonlinear change can be decomposed into a set of linear illumination changes on the local patches. Finally, we select the first few normalized AC coefficients from each patch, corresponding to low-frequency intensity changes, and concatenate them to form a feature vector for a sub-image.

Our image descriptor offers two advantages: (1) rotation invariance, which is important because we need features that can consistently represent image appearance across edges lying at arbitrary directions; (2) illumination invariance, which is essential because in colonoscopy the source of light moves along with the camera, causing the same segment of a polyp boundary to appear with varying contrast in different frames.

Fig. 1: (a) Despite varying morphology, polyps feature a curvy segment in their boundaries. (b) From left to right: average appearance of polyps, lumen, vessels, and specular reflections across thousands of image patches. The polyp boundary has a distinct appearance. (c) Illustration of how the patches have been collected from images.

2.2. Classification

The classification stage has two objectives: (1) to refine an edge map by discarding "non-polyp" edges, and (2) to determine the normal direction for the retained polyp-like edges such that the normals point toward polyp locations. To achieve these objectives, we design a 2-stage classification system in which the first stage aims to produce mid-level image features from low-level input features, while the second stage aims to learn both the edge label (i.e., "polyp" or "non-polyp") and the normal direction. The suggested classification system is general but requires labeling of sub-negative classes, which can be done either manually or in an unsupervised way by applying K-means to negative patches.

To train the first layer, we collect N1 oriented patches around the boundaries of polyps and four sub-negative classes: vessels, lumen areas, specular reflections, and edges at random locations in training images. We then train a five-class classifier using our proposed patch descriptor. The output of the classifier is an array of probabilities for the 5 object classes. Compared with the low-level input features, which encode local image variation, the generated output array contains mid-level features that measure the global similarity between the underlying input sub-image and the general appearance of the predefined structures.

To train the second layer, we collect N2 pairs of oriented patches from polyp boundaries and other random locations in training images. Let $\{p_i^1, p_i^2\}$ be the pair of patches extracted around the $i$th edge with the corresponding normals $\{n_i^1, n_i^2\}$, where $\angle n_i^1 \in [0, \pi)$ and $\angle n_i^2 = \angle n_i^1 + \pi$. Based on the state of the $i$th edge, we assign a label $y_i \in \{0, 1, 2\}$ to each pair of patches, where "0" is for a non-polyp edge, "1" is for a polyp edge with normal $n_i^1$, and "2" is for a polyp edge with normal $n_i^2$. Such labeling is possible given the ground truth for polyps. We extract low-level features from each pair of patches and then apply the classifier trained in the first layer, resulting in two arrays of mid-level features per pair that are further concatenated to form a labeled feature vector, $\{f_i, y_i\}$. Once all feature vectors are collected, we train a 3-class classifier to learn both the edge label and the edge normal direction. In the test stage, the label with maximum probability is assigned to the underlying edge pixel.

Basically, the second layer of classification fuses the knowledge captured from a pair of patches, because combining the two sources of information can yield more accurate edge classification. Let us assume the patch $p_i^1$ around the $i$th edge resembles the appearance of a polyp, but the counterpart patch $p_i^2$ looks very similar to the average appearance of specular reflections or lumen areas. What is the decision regarding the underlying edge pixel? Relying on the first patch and declaring a polyp edge with normal $n_i^1$, or considering information from the counterpart patch and declaring a non-polyp edge? To solve this problem, we train a second classifier in the mid-level feature space to fully utilize such relationships.

2.3. Voting Scheme

Our voting scheme aims to find the location of a polyp from the curvy configuration of the polyp-like edges that have passed the classification stage, hereon referred to as voters. Compared to the Hough transform, which is limited to detecting
objects with predefined shapes, our voting scheme is general and accommodates a wide range of objects with curvy boundaries. In addition, our method provides a probabilistic output for each detected object, which is essential for rejecting false polyp candidates.

The voting scheme begins with grouping the voters into 4 categories according to their edge normal orientation: $V_k = \{v_i \mid \frac{k\pi}{4} < \angle n_i < \frac{(k+1)\pi}{4}\}$, $k = 0 \dots 3$. The votes cast by the voters in each category are then collected and accumulated in four voting maps, which are subsequently multiplied to form the final map whose maximum indicates the location of a polyp candidate. Mathematically,

$$\arg\max_{x,y} \prod_{k=0}^{3} \sum_{v \in V_k} M_v(x, y), \qquad (1)$$

where $M_v(x, y)$, the vote cast by the voter $v$ at a receiver pixel $r = [x, y]$, is computed as follows:

$$M_v(r) = \begin{cases} C_v \exp\!\left(\dfrac{-\|\vec{vr}\|^2}{\sigma_F}\right) \cos(\angle\, \vec{n_i}\vec{vr}), & \text{if } \angle\, \vec{n_i}\vec{vr} < \pi/2 \\ 0, & \text{if } \angle\, \vec{n_i}\vec{vr} \ge \pi/2 \end{cases} \qquad (2)$$

where $C_v$ is the probabilistic classification confidence assigned to the voter, $\vec{vr}$ is the vector connecting the voter and the receiver, $\sigma_F$ controls the size of the voting field, and $\angle\, \vec{n_i}\vec{vr}$ is the angle between the edge normal $\vec{n_i}$ and $\vec{vr}$. Fig. 2(a) shows the voting field for an edge pixel lying at 135 degrees. As seen, the voter casts votes only in the region pointed to by the normal direction. Such selectivity arises from the condition set on $\angle\, \vec{n_i}\vec{vr}$, which prevents voters from casting votes in the opposite direction. Recall that the normal direction is determined by the classification stage such that it points toward the location of the polyp. The exponential and cosinusoidal decay functions enable smooth vote propagation, which we will later use in our ray back projection technique to determine the likelihood of a polyp candidate.

Edge grouping prior to vote casting is essential. In the suggested voting map, regions attain high accumulation values only if they receive adequate responses from each individual voting map. Therefore, regions surrounded by low-curvature boundaries (e.g., parallel edges) receive low responses, as their surrounding edge pixels can only contribute to a small fraction of the four voting maps. Fig. 2(b) shows the boundary of a polyp and its corresponding voting map.

To assign a probability to a polyp candidate, we perform ray back projection, in which radial rays are cast from the detection location outward in all possible directions and the fraction of rays hitting the voters is calculated. The key to our ray back projection is how to determine the search radius for each individual ray: too short or too long a radius may underestimate or overestimate the polyp likelihood. We estimate the search radius by modeling the decay in vote accumulation along each radial ray. For a ray at angle θ, the search radius is estimated as $3\sigma_\theta$, where $\sigma_\theta$ is the standard deviation of the Gaussian function fitted to the corresponding decay signal. The green lines shown in Fig. 2(b) represent the search radii for a subset of radial rays. Once the search radii are determined, the probability of a polyp candidate is measured as $\frac{1}{180}\sum_{\theta=0}^{179}(R_\theta \lor R_{\theta+180})$, where $R_\theta$ is an indicator variable that takes 1 if the ray at angle θ hits at least one voter and 0 otherwise.

Fig. 2: (a) The geometric illustration of the voting scheme for an edge pixel lying at 135 degrees. (b) The voting map for a polyp (shown in white) and the determined search radii for a subset of radial rays.

3. EXPERIMENTS

To evaluate our detection system, we employed CVC-ColonDB [5], the only publicly available polyp database, containing 300 colonoscopy images with 20 pedunculated polyps, 180 sessile polyps, and 100 flat polyps. We first evaluated our patch descriptor and then the whole polyp detection system in comparison with the state-of-the-art methods [5, 6].

For feature evaluation, we collected 50,000 oriented images around polyp and other boundaries in colonoscopy images. We selected the half corresponding to the first 150 images for training a random forest classifier and used the rest for testing. Fig. 3(a) compares the resultant ROC curves of our image descriptor using 3 selected coefficients (315 features per sub-image) and those obtained from other widely used methods such as HoG1, LBP2, and Daisy3. Due to space constraints, we excluded the ROC curves corresponding to more than 3 DCT coefficients, but our experiments demonstrated that using more coefficients did not achieve any significant improvement in performance. In terms of running time, our descriptor significantly outperformed our closest competitor, the Daisy descriptor. Fig. 3(b) shows the discrimination power of each feature extracted by our image descriptor across all 50,000 images. Comparing the discrimination map against the average appearance of the polyp boundary shown in Fig. 3(c) reveals that the most informative features are extracted from the polyp boundary, indicating that our descriptor successfully captures the desired image information.

For system evaluation, we employed 5-fold cross validation. To train our 2-stage classifier, we collected N1 = 100,000 oriented image patches from training images, with almost 20,000 samples for each of the five predefined classes, and
1 http://lear.inrialpes.fr/pubs/2005/DT05/
2 http://www.cse.oulu.fi/CMV/Downloads/LBPMatlab
3 http://cvlab.epfl.ch/software/daisy
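To make the descriptor of Sec. 2.1 concrete, here is a minimal NumPy sketch of the pipeline for one 64x64 sub-image: 8x16 patches with 50% overlap, vertical averaging, a DCT, DC removal, and normalization. The helper names are ours; we use a plain DCT-II rather than an optimized one, and we normalize the AC coefficients by their own norm, which realizes the linear-scaling invariance stated in the text. With 3 coefficients per patch this reproduces the 315 features per sub-image reported above.

```python
import numpy as np

def dct_ii(x):
    """Unnormalized 1D DCT-II (scipy.fft.dct would serve equally well)."""
    n = x.shape[0]
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    return np.cos(np.pi * (i + 0.5) * k / n) @ x

def describe_subimage(sub, patch_h=8, patch_w=16, n_coeffs=3):
    """DCT-based descriptor of a 64x64 oriented sub-image (after Sec. 2.1)."""
    feats = []
    # 8x16 patches tiled with 50% overlap: steps of 4 rows / 8 columns.
    for r in range(0, sub.shape[0] - patch_h + 1, patch_h // 2):
        for c in range(0, sub.shape[1] - patch_w + 1, patch_w // 2):
            patch = sub[r:r + patch_h, c:c + patch_w]
            # Average vertically -> 1D signal of horizontal intensity variation.
            signal = patch.mean(axis=0)
            coeffs = dct_ii(signal)
            # Discard the DC term (invariance to constant illumination shifts)
            # and normalize the AC terms (invariance to linear scaling).
            ac = coeffs[1:]
            ac = ac / (np.linalg.norm(ac) + 1e-12)
            feats.extend(ac[:n_coeffs])  # keep low-frequency coefficients
    return np.asarray(feats)

# A 64x64 sub-image yields 15 x 7 = 105 patches and 105 * 3 = 315 features.
sub = np.random.default_rng(0).random((64, 64))
print(describe_subimage(sub).shape)  # (315,)
```

Doubling the sub-image intensities and adding a constant offset leaves the output unchanged, matching the illumination invariances claimed in Sec. 2.1.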
Fig. 3: (a) Our patch descriptor outperforms other descriptors such as LBP, HOG, and Daisy. (b) Discrimination power of our features when reshaped to a matrix and then scaled to the same size as the used sub-images. Comparing the map with (c) the average polyp appearance reveals that the most informative features have been extracted around the polyp boundary. (d) The ROC curve for our polyp detection system.

Fig. 4: Examples of our polyp detection system. The edges retained after classification are shown in green. For visualization purposes, the voting maps are superimposed on the original images.
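Returning to the voting scheme of Sec. 2.3, the following sketch implements Eqs. (1) and (2) with NumPy and applies them to a synthetic ring of voters whose normals point inward, mimicking a curvy polyp boundary. The grid size, the σF value, and the voter set are our own illustrative choices, not values from the paper.

```python
import numpy as np

def vote_maps(voters, shape, sigma_f=200.0):
    """Accumulate Eq. (2) votes into four normal-orientation groups (Sec. 2.3)."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]].astype(float)
    maps = np.zeros((4,) + tuple(shape))
    for vy, vx, theta_n, conf in voters:
        # Orientation group V_k: k*pi/4 < angle(n_i) mod pi < (k+1)*pi/4.
        k = min(int((theta_n % np.pi) / (np.pi / 4)), 3)
        dy, dx = ys - vy, xs - vx
        dist2 = dy ** 2 + dx ** 2
        # Cosine of the angle between the edge normal and the voter-to-receiver
        # vector; the vote is cast only where this angle is below pi/2.
        cos_ang = (dx * np.cos(theta_n) + dy * np.sin(theta_n)) / (np.sqrt(dist2) + 1e-9)
        cos_ang = np.clip(cos_ang, -1.0, 1.0)
        vote = conf * np.exp(-dist2 / sigma_f) * cos_ang   # Eq. (2)
        maps[k] += np.where(cos_ang > 0.0, vote, 0.0)
    return maps

# Synthetic polyp boundary: voters on a circle, normals pointing inward,
# each with classification confidence 0.9.
voters = [(40 + 15 * np.sin(a), 40 + 15 * np.cos(a), (a + np.pi) % (2 * np.pi), 0.9)
          for a in np.deg2rad(np.arange(0, 360, 10))]
final = vote_maps(voters, (80, 80)).prod(axis=0)   # Eq. (1): multiply the 4 maps
py, px = np.unravel_index(final.argmax(), final.shape)
print(py, px)  # the peak lies near the circle center (40, 40)
```

Because the final map is the product of the four orientation-grouped maps, a location scores highly only if it receives votes from all four normal-orientation groups, which is exactly why parallel, low-curvature edges are suppressed; ray back projection would then score the detected peak.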
collected N2 = 100,000 oriented image patches, with 50% of the patches extracted around polyps and the rest around edges at random locations in training images. For classification, we chose the random forest classifier because of its high-quality probabilistic output, which we utilized in our voting scheme. The trained classification system followed by our voting scheme was then applied to the 5 test folds. Our voting scheme detected 267 out of 300 polyps, outperforming the state-of-the-art [5], where only 252 candidates were cast inside the polyps. Examples of polyp localization are shown in Fig. 4. To obtain precision and recall rates, we then varied a threshold on the probabilities assigned to the generated polyp candidates. As shown in Tab. 1, the proposed system outperformed the one suggested in [5] as well as our previous work [6], in which Haar features, a 1-layer classification, and a less accurate voting scheme were employed. Furthermore, we evaluated our system after including 3000 images that did not contain a polyp, i.e., 300 positive images from CVC-ColonDB and 3000 negative images from our private database. Fig. 3(d) shows the ROC curve for our polyp detection system.

Table 1: Precision of polyp detection at fixed recall rates for the proposed system, our previous work [6], and the one suggested in [5]. Our new system outperforms the rest.

Recall | Our method | Previous work [6] | SA-DOVA [5]
50%    | 97%        | 90%               | 92%
60%    | 96%        | 88%               | 78%
70%    | 95%        | 89%               | 65%
80%    | 93%        | 86%               | 60%

4. CONCLUSIONS AND DISCUSSIONS

We have presented a novel polyp detection method with performance superior to the state-of-the-art. We would like to emphasize that our system is not designed to compete with human experts (colonoscopists), but rather to provide feedback that enhances their diagnostic capabilities, especially during long and back-to-back colonoscopies, where human factors such as insufficient attentiveness and fatigue result in the misdetection of polyps. In addition, our method is not limited to optical colonoscopy; in fact, we plan to investigate its effectiveness for polyp detection in capsule endoscopy. For our future work, we also plan to conduct a large-scale clinical evaluation of the suggested methodology.

5. REFERENCES

[1] S. Karkanis, D. Iakovidis, D. Maroulis, D. Karras, and M. Tzivras, "Computer-aided tumor detection in endoscopic video using color wavelet features," IEEE Transactions on Information Technology in Biomedicine, vol. 7, no. 3, pp. 141-152, 2003.
[2] S. Hwang, J. Oh, W. Tavanapong, J. Wong, and P. de Groen, "Polyp detection in colonoscopy video using elliptical shape feature," in IEEE International Conference on Image Processing (ICIP), vol. 2, 2007, pp. II-465-II-468.
[3] S. Y. Park, D. Sargent, I. Spofford, K. Vosburgh, and Y. ARahim, "A colon video analysis framework for polyp detection," IEEE Transactions on Biomedical Engineering, vol. 59, no. 5, pp. 1408-1418, 2012.
[4] Y. Wang, W. Tavanapong, J. Wong, J. Oh, and P. de Groen, "Part-based multi-derivative edge cross-section profiles for polyp detection in colonoscopy," IEEE Journal of Biomedical and Health Informatics, vol. PP, no. 99, pp. 1-1, 2013.
[5] J. Bernal, J. Sánchez, and F. Vilariño, "Towards automatic polyp detection with a polyp appearance model," Pattern Recognition, vol. 45, no. 9, pp. 3166-3182, 2012.
[6] N. Tajbakhsh, S. Gurudu, and J. Liang, "A classification-enhanced vote accumulation scheme for detecting colonic polyps," in Abdominal Imaging. Computation and Clinical Applications, ser. Lecture Notes in Computer Science, vol. 8198, 2013, pp. 53-62.