Evaluation of Color Based Keypoints and Features for the Classification of Melanomas Using the Bag-of-Features Model

Catarina Barata¹, Jorge S. Marques¹, and Jorge Rozeira²

¹ Institute for Systems and Robotics, Instituto Superior Técnico, Portugal
² Hospital Pedro Hispano, Matosinhos, Portugal
Abstract. Dermatologists consider color one of the major discriminative aspects of melanoma. In this paper we evaluate the importance of color in the keypoint detection and description steps of the Bag-of-Features model. We compare the performance of gray-scale salient points against that of color salient points using the Harris-Laplace detector and its color extensions. Moreover, we compare the performance of SIFT and Color-SIFT patch descriptors. Our results show that color detectors and Color-SIFT perform better and are more discriminative, achieving Sensitivity = 85%, Specificity = 87%, and Balanced Accuracy = 87% on the PH2 database [17].
Keywords: Melanoma, Dermoscopy, Bag-of-Features, Color Based Keypoints, Harris-Laplace detector, SIFT, Color-SIFT
1 Introduction
Melanoma is one type of skin cancer. However, its great potential to rapidly grow and metastasize makes it the most lethal of all skin cancer variants. Nonetheless, if detected at an early stage, before it reaches the vascular plexus, it can be cured with a simple excision. Among the methods used by dermatologists to diagnose melanoma, dermoscopy has been shown to significantly increase diagnostic performance, since it magnifies the lesion 10-100× and allows the observation of several structures inside the lesion that are invisible to the naked eye [1]. The downside of dermoscopy is that it requires a trained practitioner; it cannot be used by all dermatologists unless they receive specific training. A computer-aided diagnosis (CAD) system can help tackle this limitation, since it can serve as a guidance tool for less experienced dermatologists.

The analysis of dermoscopy images performed by specialists is based on one or more medical algorithms. Usually, dermatologists look for localized patterns (e.g., reticular, cobblestone), colors (e.g., dark brown, red, blue), and atypical structures (e.g., atypical pigment network and streaks), and also assess the shape and border of the lesion. One example of a medical procedure is the ABCD rule (Asymmetry, Border, Color, and Dermoscopic structures) [2], which has inspired several CAD systems [3-5]. An alternative approach is based on the 7-point checklist [6] and tries to assess 7 dermoscopic criteria usually associated with melanomas: atypical pigment network
and vascular pattern; irregular streaks, dots, and pigmentation; regression structures; and blue-whitish veil [7].

In this paper we use the Bag-of-Features (BoF) model [8] to describe the lesion locally at interest points, associated with specific texture patterns such as lines and blobs, and to classify it. Although this methodology has been previously used for melanoma detection [9], keypoint sampling was performed either densely or sparsely on luminance images. However, dermatologists state that color plays an important role in the diagnosis of melanoma [1], and it has been shown that using color information in the salient-point sampling process improves the classification results of BoF in image retrieval [10], [11]. Thus, this paper compares the performance of luminance salient points against that of color salient points using the Harris-Laplace detector [12] and its extension to color. Moreover, we test two alternative scale-selection strategies for color salient points [10], [11]. The latter has been recently proposed by Stöttinger et al. [11] and, to the best of our knowledge, has never been applied to medical images. The other main contribution of this paper is the analysis of SIFT [13] as a patch descriptor for dermoscopy images. As before, we assess the influence of color information and compare the performance of SIFT with its color variations. To the best of our knowledge, the detectors and descriptors evaluated in this work have never been used for the diagnosis of melanomas.

The remainder of this paper is organized as follows. Section 2 describes the detectors and descriptors tested, Section 3 presents and discusses the experimental results, and Section 4 concludes the paper.
2 Methods
In this section we describe the BoF model used to classify the dermoscopy images, as well as the different keypoint detectors and patch descriptors evaluated in this work.

The BoF model classifies images using local information [8]. Thus, the first step of this method is to divide the image into small regions/patches that are described separately. A simple way to perform this division is to identify salient points in the image and then extract square patches of size δ × δ around them. There are two common ways of finding the interest points. The first assumes that the keypoints are equally spaced at the nodes of a regular grid placed over the image. An alternative is to use more elaborate detectors that look for specific texture patterns such as lines or blobs [14]. In this work we use the second approach; a detailed description of the evaluated detectors is given in the following subsection.

After finding the interest points and extracting the square patches around them, it is necessary to describe the patches. This is done by extracting a feature vector for each patch. In this paper, we compare two different types of features: SIFT [13] and Color-SIFT [15]. The number of patches extracted varies between images, which makes it impossible to compare the lesions directly. To tackle this issue, a clustering step is performed in the training phase, i.e., the patches extracted from the training images are used to compute a set of centroids with the K-means algorithm. Each centroid is called a visual word and the
set can be seen as a visual dictionary [8]. Still in the training phase, the visual dictionary is used to analyze each image: its patch features are compared with the centroids and assigned to the closest one, i.e., the centroid that minimizes the Euclidean distance. Then, a histogram that counts the frequency of occurrence of each visual word is built. This histogram is the new feature vector that characterizes the image.

Finally, it is necessary to train a classification algorithm to distinguish between melanomas and benign lesions. In this work, the decision rule is computed using the k-Nearest Neighbor (kNN) classifier. Since the images are described by histograms, we use the histogram intersection

d(x, y) = \sum_i \min(x_i, y_i),    (1)

as the comparison measure for kNN, where x and y are histograms and x_i and y_i are their i-th bins. In the testing phase, we use the visual dictionary computed during training to identify the patches of the unseen images, build their histogram signatures, and apply the classification rule to predict the label of the lesion (melanoma or benign). Fig. 1 shows the block diagram of the described BoF model.
Fig. 1. Block diagram of the BoF model.
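As an illustration, the assignment of patch descriptors to visual words and the histogram-intersection kNN rule of (1) can be sketched as follows. This is a minimal NumPy sketch; the function names and the majority-vote tie handling are our own choices, not details from the paper.

```python
import numpy as np

def bow_histogram(patch_features, vocabulary):
    """Assign each patch descriptor to its nearest visual word
    (Euclidean distance) and count word frequencies."""
    # distances: (n_patches, n_words)
    d = np.linalg.norm(patch_features[:, None, :] - vocabulary[None, :, :], axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()          # normalize so images are comparable

def knn_intersection(train_hists, train_labels, query_hist, k=3):
    """kNN classifier using histogram intersection (eq. 1).
    Intersection is a similarity, so the k *largest* scores win."""
    sims = np.minimum(train_hists, query_hist).sum(axis=1)
    nearest = np.argsort(sims)[-k:]
    votes = train_labels[nearest]
    return int(round(votes.mean()))   # majority vote for binary labels {0, 1}
```

The normalization in `bow_histogram` is an assumption made here so that lesions with different patch counts remain comparable.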
2.1 Keypoint Detectors
Different keypoint detectors can be found in the literature. These are usually extensions of corner, blob, or curvilinear-structure detectors towards scale invariance [14]. To achieve this invariance it is necessary to start by constructing a scale-space representation of each image, which is done by convolving the image I(x, y) with a set of Gaussian kernels G_{\sigma_D}(x, y), each with a different scale \sigma_D. Then, the keypoints and their characteristic scales are computed using the information of the scale space.

One of the most popular detectors is the Harris-Laplace, which simultaneously detects corners and blobs [12]. The main idea behind this detector is to find keypoints in image I at different scales using the Harris corner detector, by finding the local 3D maxima [12] of

\det(M(x, y, \sigma_D)) - \alpha \, \mathrm{tr}^2(M(x, y, \sigma_D)),    (2)

M(x, y, \sigma_D) = \sigma_D^2 \, G_{\sigma_I}(x, y) * \begin{bmatrix} L_x^2(x, y, \sigma_D) & L_x L_y(x, y, \sigma_D) \\ L_x L_y(x, y, \sigma_D) & L_y^2(x, y, \sigma_D) \end{bmatrix},    (3)
where \alpha = 0.06, \sigma_I is an integration scale, and L_\beta, \beta \in \{x, y\}, are the first-order derivatives of I(x, y) * G_{\sigma_D}(x, y) with respect to \beta. Then, the characteristic scale of the keypoint is selected using the Laplacian function [12]

|\sigma_D^2 (L_{xx}(x, y, \sigma_D) + L_{yy}(x, y, \sigma_D))|,    (4)
where L_{\beta\beta} are the second-order derivatives of I(x, y) * G_{\sigma_D}(x, y). A keypoint candidate (x, y) is selected only if it is an extremum of both the Harris and Laplacian functions.

The previous detector works on images converted to gray level. This has a number of side effects, namely the loss of distinctiveness in regions that exhibit chromatic variation [10]. To tackle this issue, an extension of the previous detector to color was proposed, for both the Harris and Laplacian functions. Assuming that a color space C has n = 3 components [c_1, c_2, c_3]^T, the elements of matrix M (3) become [11]

L_x^2 = \sum_{j=1}^{n} (c_j(x, y) * G_{x,\sigma_D}(x, y))^2,
L_y^2 = \sum_{j=1}^{n} (c_j(x, y) * G_{y,\sigma_D}(x, y))^2,    (5)
L_x L_y = \sum_{j=1}^{n} (c_j(x, y) * G_{x,\sigma_D}(x, y))(c_j(x, y) * G_{y,\sigma_D}(x, y)),

where G_{i,\sigma_D}(x, y) are the first-order derivatives of G_{\sigma_D}(x, y). Then, the Harris energy (2) can be computed as in the luminance case [11].

Two different strategies are used to perform scale selection in the color domain. Vigo et al. [10] propose a simple method that combines the channels in a vectorized fashion, yielding the color Laplacian

\sigma_D^2 |L_{xx}(x, y, \sigma_D) + L_{yy}(x, y, \sigma_D)|.    (6)

These points are computed in the RGB color space and are referred to as Color Harris points.

An alternative scale-selection method has been recently proposed by Stöttinger et al. [11], based on Principal Component Analysis (PCA). This method reduces the color image I to a single channel \hat{I} as follows. Assume that I consists of m color vectors \{f_1, \ldots, f_m\} and that each color vector f_b = [c_1, c_2, \ldots, c_n]^T has n color components with zero mean (in an RGB image, m is the number of pixels and n = 3). PCA is applied to I to determine its eigenvalues \lambda_b and eigenvectors e_b. After computing these values, the single-channel saliency image \hat{I} is obtained as \hat{I}_b = f_b^T e_1, where e_1 is the eigenvector associated with the largest eigenvalue. The next step is to apply the Laplacian to \hat{I}. Towards this goal, Stöttinger et al. propose a noise-robust and computationally efficient approximation of the conventional Laplacian [11]

[\sigma_D^2 |\hat{L}_x^2(x, y, \sigma_D) + \hat{L}_y^2(x, y, \sigma_D)|] * \Gamma_{\sigma_D}(x, y),    (7)
where \hat{L}_i is computed as in (3) but using \hat{I}, and \Gamma_{\sigma_D} is the circularly symmetric raised-cosine kernel centered on (x_c, y_c)

\Gamma_{\sigma_D}(x, y) = \frac{1 + \cos\!\big(\frac{\pi}{\sigma_D}\sqrt{(x - x_c)^2 + (y - y_c)^2}\big)}{3}.    (8)
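The PCA reduction of the color image to the single saliency channel can be sketched as below. This is an illustrative NumPy sketch under the stated assumptions (zero-mean color vectors, projection onto the leading eigenvector); the actual implementation in [11] may differ in detail.

```python
import numpy as np

def pca_saliency_channel(img):
    """Project an RGB image onto its first principal color axis,
    producing a single-channel saliency image (the I-hat of [11]).
    Illustrative sketch only."""
    h, w, n = img.shape
    pixels = img.reshape(-1, n).astype(float)
    pixels -= pixels.mean(axis=0)            # zero-mean color vectors
    cov = np.cov(pixels, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    e1 = eigvecs[:, -1]                      # eigenvector of the largest eigenvalue
    return (pixels @ e1).reshape(h, w)       # per-pixel projection f_b^T e_1
```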
Fig. 2. Example of the keypoint detectors and corresponding scales: a) original image; b) Harris Laplace; c) Color Harris; d) RGB.
As before, a keypoint and its characteristic scale are selected only if they correspond to maxima of both the Harris and Laplacian functions. These salient points are computed in the RGB color space and will be referred to as RGB points. Fig. 2 shows an example of a dermoscopy image and the output of the three keypoint detectors: keypoint positions and scales.
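For concreteness, the luminance Harris energy of (2)-(3) at one pair of scales can be computed as follows. This is a sketch with SciPy Gaussian-derivative filters; full Harris-Laplace would additionally scan \sigma_D over a stack of scales and keep joint extrema of (2) and (4).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_energy(img, sigma_d, sigma_i, alpha=0.06):
    """Harris corner energy (eq. 2) of a gray-level image at a single
    differentiation scale sigma_d and integration scale sigma_i."""
    # first-order derivatives of the Gaussian-smoothed image
    Lx = gaussian_filter(img, sigma_d, order=(0, 1))   # derivative along x (columns)
    Ly = gaussian_filter(img, sigma_d, order=(1, 0))   # derivative along y (rows)
    # entries of the second-moment matrix M, integrated at scale sigma_i
    s2 = sigma_d ** 2
    Mxx = s2 * gaussian_filter(Lx * Lx, sigma_i)
    Mxy = s2 * gaussian_filter(Lx * Ly, sigma_i)
    Myy = s2 * gaussian_filter(Ly * Ly, sigma_i)
    det = Mxx * Myy - Mxy ** 2
    tr = Mxx + Myy
    return det - alpha * tr ** 2
```

On a synthetic image containing a bright square, the response is positive at the corners, near zero in flat regions, and negative along the edges, which is the behavior the detector exploits.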
2.2 SIFT and Color-SIFT Descriptors
The SIFT descriptor proposed by Lowe [13] is one of the most popular patch descriptors, due to its rotation, illumination, and scale invariance. It describes the shape of the region around the salient point using histograms of the gradient, computed on the luminance image. This is a drawback of the SIFT descriptor, since it ignores color, which provides discriminative information. Different strategies have been proposed to include color information in the SIFT descriptor. The simplest consists of concatenating the SIFT vector with color histograms [15]. An alternative is to extend regular SIFT to Color-SIFT [15]. In this paper we study the performance of three Color-SIFT descriptors proposed by van de Sande et al. [15]: OpponentSIFT, W-SIFT, and rgSIFT.

The OpponentSIFT descriptor applies the SIFT descriptor to the three channels of the opponent color space (O_1, O_2, O_3), derived from the RGB color space as follows

\begin{bmatrix} O_1 \\ O_2 \\ O_3 \end{bmatrix} = \begin{bmatrix} \frac{R - G}{\sqrt{2}} \\ \frac{R + G - 2B}{\sqrt{6}} \\ \frac{R + G + B}{\sqrt{3}} \end{bmatrix}.    (9)
Channel O_3 represents the intensity information, while channels O_1 and O_2 contain the color information. Therefore, the OpponentSIFT descriptor characterizes not only the local shape of the patches but also their color [15]. However, channels O_1 and O_2 still contain some intensity information, so the OpponentSIFT descriptor is not invariant to intensity changes. To achieve this invariance, van de Sande et al. defined W-SIFT [15]. This descriptor uses the W invariant proposed by Geusebroek et al. [16] to cancel the illumination information in O_1 and O_2 by dividing them by the intensity O_3. The local gradient histograms are then computed for O_1/O_3 and O_2/O_3 as in the previous cases [15].

The last Color-SIFT descriptor considered in this paper is rgSIFT, computed on the channels r and g of the normalized RGB space

\begin{bmatrix} r \\ g \\ b \end{bmatrix} = \begin{bmatrix} \frac{R}{R + G + B} \\ \frac{G}{R + G + B} \\ \frac{B}{R + G + B} \end{bmatrix}.    (10)
Due to the normalization, this descriptor is intensity invariant. The color description of the patches is achieved with the components r and g (the information in b is redundant, since r + g + b = 1) [15].
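The two color transforms used by OpponentSIFT and rgSIFT, eqs. (9) and (10), are straightforward to implement; a small sketch (the `eps` guard against division by zero on black pixels is our own addition):

```python
import numpy as np

def to_opponent(rgb):
    """RGB -> opponent color space (eq. 9)."""
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    O1 = (R - G) / np.sqrt(2)
    O2 = (R + G - 2 * B) / np.sqrt(6)
    O3 = (R + G + B) / np.sqrt(3)
    return np.stack([O1, O2, O3], axis=-1)

def to_rg(rgb, eps=1e-8):
    """RGB -> normalized rg channels (eq. 10); b = 1 - r - g is dropped."""
    s = rgb.sum(axis=-1, keepdims=True) + eps   # avoid division by zero
    return (rgb / s)[..., :2]
```

For a pure gray pixel (R = G = B) the chromatic channels O_1 and O_2 vanish, as expected.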
3 Experimental Setup and Results

3.1 Experimental Setup
We performed the experiments on the PH2 dataset of 176 dermoscopy images (25 melanomas) [17]. These images have a typical size of 537×765 pixels and were acquired with a magnification of 20×. We used manual segmentations of the skin lesions to ensure that the experimental results were not influenced by segmentation errors. In the feature extraction process we excluded all patches with less than 50% of their area inside the lesion. All the lesions were classified by an experienced dermatologist, who also corrected the manual segmentations.

We tuned some parameters of BoF, namely the size of the dictionary, K ∈ {100, 200, ..., 600}, and the number of neighbors used in kNN, k ∈ {3, 5, ..., 25}. We computed the SIFT and Color-SIFT descriptors using the open-source library VLFeat [18]. The metrics used to evaluate the performance are Sensitivity (SE), Specificity (SP), and Balanced Accuracy (BAC). SE is the percentage of correctly classified melanomas and SP is the percentage of correctly classified benign lesions. Since the dataset is small, we performed the evaluation and parameter selection using stratified 10-fold cross-validation: we split the lesions into 10 subsets, each with approximately the same number of lesions; each subset is used once for testing while the remaining nine are used for training, and the reported results are the average over the ten folds. To deal with class imbalance we created artificial samples by repeating the features of melanomas and adding Gaussian noise. Only features from the training set were used to create artificial samples.
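The evaluation protocol can be sketched as follows with NumPy-only stand-ins. The fold-assignment scheme, replication factor, and noise level are illustrative assumptions; the paper does not report these values.

```python
import numpy as np

def stratified_folds(y, n_folds=10, seed=0):
    """Split indices into n_folds with approximately the same class
    proportions per fold -- a minimal stand-in for stratified 10-fold CV."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(n_folds)]
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        for i, j in enumerate(idx):
            folds[i % n_folds].append(j)   # deal class members round-robin
    return [np.array(sorted(f)) for f in folds]

def augment_minority(X, y, minority=1, factor=5, noise_std=0.01, seed=0):
    """Replicate minority-class (melanoma) feature vectors with added
    Gaussian noise; applied to training folds only. factor and noise_std
    are hypothetical values."""
    rng = np.random.default_rng(seed)
    Xm = X[y == minority]
    extra = np.vstack([Xm + rng.normal(0, noise_std, Xm.shape)
                       for _ in range(factor)])
    return (np.vstack([X, extra]),
            np.concatenate([y, np.full(len(extra), minority)]))
```

Each test fold then receives 2-3 of the 25 melanomas, mirroring the class ratio of the full dataset.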
3.2 Results
For each of the three keypoint detectors (recall Section 2.1), we tested the four SIFT descriptors. Fig. 3 (1st column) shows the BAC results for each possible combination of keypoint detector and patch feature. These results show that, regarding the BAC metric, RGB points achieved worse performance than the other two sampling strategies. However, it is the detector for which the SIFT descriptor performs best. In the case of Harris Laplace and Color Harris points it is clear that the Color-SIFT descriptors outperform SIFT, which suggests that adding color information to the patch descriptors helps to build more discriminative dictionaries, as expected.
Fig. 3. Performance comparison for the three keypoint detectors using individual SIFT and Color-SIFT descriptors (1st column) and fusion of descriptors (2nd column): Harris Laplace (1st row), Color Harris (2nd row), and RGB (3rd row).
The best results for each detector and the respective descriptor can be seen in Table 1. For all detectors, the best results were achieved with one of the Color-SIFT descriptors. It is interesting to note that the best results with the color detectors were achieved using OpponentSIFT, which is not invariant to illumination. This might be explained by the fact that all images were acquired with the same device and protocol, so there is little illumination variability between images.
Table 1. Best classification results using single descriptors.

Keypoint Detector   Descriptor      SE    SP    BAC
Harris Laplace      rgSIFT          82%   81%   81%
Color Harris        OpponentSIFT    77%   83%   82%
RGB                 OpponentSIFT    88%   76%   78%
We also investigated the combination of descriptors. There are two possible strategies for combining descriptors using BoF. The first consists of combining the descriptors into a single feature vector; this method is called early fusion. The alternative, called late fusion, consists of using each descriptor to compute an independent dictionary in the training phase; in the test phase the different dictionaries are used separately to compute the image histograms, which are finally combined into a single feature vector. Since we were mainly working with the descriptors proposed by van de Sande et al. [15], we decided to follow the same fusion strategy used in their work and combine the different descriptors using early fusion.

The performances of the different fusions can be seen in Fig. 3 (2nd column) and the best fusion results for each detector can be seen in Table 2. There is an improvement over the results obtained with each descriptor alone, which suggests that they are complementary. This complementarity was also noted by van de Sande et al. [15].

Table 2. Best classification results using fused descriptors.

Keypoint Detector   Fusion                      SE    SP    BAC
Harris Laplace      Color-SIFT                  85%   82%   83%
Color Harris        (Opponent+rg)SIFT + SIFT    85%   87%   87%
RGB                 (Opponent+rg)SIFT           86%   84%   84%
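Early fusion amounts to concatenating the different descriptors of each patch before dictionary learning, so a single K-means vocabulary is built on the fused vectors. A minimal sketch follows; the per-descriptor L2 normalization is our own assumption (to prevent one descriptor type from dominating the Euclidean distances in K-means), not a step reported in [15].

```python
import numpy as np

def early_fusion(descriptor_sets, eps=1e-8):
    """Early fusion: L2-normalize each descriptor type, then concatenate
    per patch before building a single visual dictionary.
    descriptor_sets: list of arrays, each (n_patches, d_i), aligned by patch."""
    normed = [d / (np.linalg.norm(d, axis=1, keepdims=True) + eps)
              for d in descriptor_sets]
    return np.hstack(normed)
```

Late fusion, by contrast, would cluster each descriptor set separately and concatenate the resulting per-dictionary histograms.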
This increase in performance is more significant for the color detectors, allowing them to achieve better results than the conventional Harris-Laplace detector. Hence, color detectors and fused descriptors are complementary and should be used together in future work as a way of extracting informative features from dermoscopy images. Our results also suggest that Color Harris points are slightly more discriminative than RGB points, and that OpponentSIFT and rgSIFT are the preferred Color-SIFT combination.

From the analysis of the previous results we conclude that color information plays an important role in the description of the patches; thus Color-SIFT descriptors are preferable. Regarding the keypoint detectors, although the color detectors achieve better performance than Harris Laplace, the latter is still very competitive (see Tables 1 and 2). Moreover, Harris Laplace requires less computational effort than its color extensions, so it can be used when a faster feature-detection process is required.

3.3 Comparison with other models
To assess the relevance of the BoF model, we compared our results with those achieved in a work with approximately the same dataset using color and texture descriptors [4]. That system is inspired by the ABCD rule and uses a global description
of the lesion. The comparison can be seen in Table 3. Although there is a reduction in SE with our new model, which can be explained by the use of additional melanomas, the value of SP increases significantly and the BAC is considerably larger. This suggests that a local analysis of the lesion can provide more discriminative information than a global one.

Table 3. Comparison with different models.

Method      Abbas [5]   DiLeo [7]   Situ [9]   Marques [4]   Proposed
Dataset     120         287         1505       163           176
Melanomas   60          173         407        17            25
SE          88%         83%         86%        94%           85%
SP          91%         76%         85%        77%           87%
BAC         89%         80%         85%        79%           87%
Although a direct comparison with other works is not possible due to the different datasets, we can still assess whether our results are in line with them. Thus, we included three more works in Table 3: one proposed by Abbas et al. [5], inspired by the ABCD rule; a second proposed by Di Leo et al. [7], which tries to reproduce the 7-point checklist; and a third proposed by Situ et al. [9], which uses BoF with a grid sampling strategy and Haar wavelets and color moments as patch descriptors. Our results are in the same range as those achieved by these three works. From Table 3 we can see that our results are similar to those obtained with state-of-the-art methods, which is relevant and promising for future work.
4 Conclusions
In this paper we evaluated a novel feature set for the classification of melanomas using the BoF model, based on color keypoint detectors and Color-SIFT. Moreover, we compared the performance of two different color scale-selection methods. Our results showed that the color extensions of SIFT are more discriminative than luminance SIFT; thus, we conclude that color plays a significant role in the description of the patches. Color keypoints performed better than luminance keypoints, with SE = 85%, SP = 87%, and BAC = 87% against SE = 85%, SP = 82%, and BAC = 83%. However, Harris Laplace is still a competitive detector. Color Harris points performed better than RGB points (BAC = 87% against BAC = 84%). Future work will focus on the optimization of BoF, namely working towards a representative dictionary of dermoscopy visual words. Furthermore, we want to extend our work to a different dataset.
Acknowledgments. We thank Prof. Teresa Mendonça from Universidade do Porto for valuable information. This work was supported by FCT under grant SFRH/BD/84658/2012 and projects PTDC/SAUBEB/103471/2008 and PEst-OE/EEI/LA0009/2011.
References

1. Argenziano, G., Soyer, H.P., De Giorgi, V., Piccolo, D., Carli, P., Delfino, M., Ferrari, A., Hofmann-Wellenhof, V., Massi, D., Mazzocchetti, G., Scalvenzi, M., Wolf, I.H.: Interactive Atlas of Dermoscopy (2000)
2. Stolz, W., Riemann, A., Cognetta, A.B.: ABCD rule of dermatoscopy: a new practical method for early recognition of malignant melanoma. European Journal of Dermatology 4 (1994) 521–527
3. Iyatomi, H., Oka, H., Celebi, M.E., Hashimoto, M., Hagiwara, M., Tanaka, M., Ogawa, K.: An improved Internet-based melanoma screening system with dermatologist-like tumor area extraction algorithm. Computerized Medical Imaging and Graphics 32(7) (2008) 566–579
4. Marques, J.S., Barata, C., Mendonça, T.: On the role of texture and color in the classification of dermoscopy images. In: Proceedings of the 34th EMBC (2012) 4402–4405
5. Abbas, Q., Celebi, M.E., Garcia, I.F., Ahmad, W.: Melanoma recognition framework based on expert definition of ABCD for dermoscopic images. Skin Research and Technology (2013, to appear)
6. Argenziano, G., Fabbrocini, G., Carli, P., De Giorgi, V., Sammarco, E., Delfino, E.: Epiluminescence microscopy for the diagnosis of doubtful melanocytic skin lesions. Comparison of the ABCD rule of dermatoscopy and a new 7-point checklist based on pattern analysis. Archives of Dermatology 134 (1998) 1563–1570
7. Di Leo, G., Paolillo, A., Sommella, P., Fabbrocini, G.: Automatic diagnosis of melanoma: A software system based on the 7-point check-list. In: Proceedings of the 2010 43rd Hawaii International Conference on System Sciences (2010) 1818–1823
8. Sivic, J., Zisserman, A.: Video Google: A text retrieval approach to object matching in videos. In: Proc. 9th IEEE International Conference on Computer Vision (2003) 1470–1477
9. Situ, N., Wadhawan, T., Hu, R., Lancaster, K.: Evaluating sampling strategies of dermoscopic interest points. In: Proc. 8th ISBI (2011) 109–112
10. Vigo, D.A.R., Khan, F.S., van de Weijer, J., Gevers, T.: The impact of color on bag-of-words based object recognition. In: Proceedings of ICPR (2010) 1549–1552
11. Stöttinger, J., Hanbury, A., Sebe, N., Gevers, T.: Sparse color interest points for image retrieval and object categorization. IEEE Transactions on Image Processing 21 (2012) 2681–2692
12. Mikolajczyk, K., Schmid, C.: Scale and affine invariant interest point detectors. International Journal of Computer Vision 60(1) (2004) 63–86
13. Lowe, D.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60(2) (2004) 91–110
14. Nowak, E., Jurie, F., Triggs, B.: Sampling strategies for bag-of-features image classification. In: Proceedings of ECCV, Springer (2006) 490–503
15. van de Sande, K., Gevers, T., Snoek, C.G.M.: Evaluating color descriptors for object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (2010) 1582–1593
16. Geusebroek, J.M., van den Boomgaard, R., Smeulders, A.W.M., Geerts, H.: Color invariance. IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (2001) 1338–1350
17. Mendonça, T., Ferreira, P.M., Marques, J.S., Marçal, A.R.S., Rozeira, J.: PH2 - a dermoscopic image database for research and benchmarking (2013, accepted for publication in EMBC)
18. Vedaldi, A., Fulkerson, B.: VLFeat: An open and portable library of computer vision algorithms (2008) http://www.vlfeat.org/