Proceedings of the IEEE Int. Conf. on Image Processing (ICIP 2010), pp. 4357–4360
AUTOMATED COLOR NORMALIZATION FOR DERMOSCOPY IMAGES

Hitoshi Iyatomi(1), M. Emre Celebi(2), Gerald Schaefer(3), and Masaru Tanaka(4)

(1) Department of Applied Informatics, Hosei University, Tokyo, Japan
(2) Department of Computer Science, Louisiana State University, Shreveport, LA, USA
(3) Department of Computer Science, Loughborough University, Loughborough, UK
(4) Department of Dermatology, Tokyo Women's Medical University Medical Center East, Tokyo, Japan
ABSTRACT

Accurate color information in dermoscopy images is very important for melanoma diagnosis, since inappropriate white balance or brightness adversely affects diagnostic performance. In this paper, we present an automated color normalization method for dermoscopy images of skin lesions. We develop color normalization filters, based on a total of 319 images, that normalize image colors in the HSV color system. We determined that the color characteristics of the peripheral part of the tumor significantly influence the color normalization, and confirmed that the developed filters achieve satisfactory normalization performance as evaluated by a cross-validation test.

Index Terms— dermoscopy, melanoma, color normalization

(Footnote: E-mail: [email protected]. This research was partially supported by the Ministry of Education, Culture, Science and Technology of Japan (Grant-in-Aid for Young Scientists (B), 21791101, 2009-2010) and the Louisiana Board of Regents (LEQSF2008-11-RD-A-12).)

1. INTRODUCTION

Advanced malignant melanoma is often incurable; however, early-stage melanoma can be cured in many cases, particularly before metastasis. Early detection is therefore essential for reducing melanoma-related deaths [1]. Dermoscopy, a non-invasive skin imaging technique, was introduced to improve the accuracy of melanoma diagnosis. However, dermoscopic diagnosis is often subjective and is therefore associated with poor reproducibility. Several groups have developed automated analysis procedures to overcome these problems and have reported high levels of diagnostic accuracy [2]-[4]. We developed a prototype of a fully automated Internet-based melanoma screening system at our university [5], available at http://dermoscopy.k.hosei.ac.jp; anyone with an Internet connection and a dermoscopy image can use the screening system from anywhere in the world. Our latest system achieved a classification performance of 85.9% sensitivity (SE) and 86.0% specificity (SP) on a set of 1258 non-acral dermoscopy images [6]. The present system returns the final diagnosis in the form of a malignancy score between 0 and 100 within 3-10 seconds.

It is well known to dermatologists and researchers in this field that color information in dermoscopy images is very important for visual [1] as well as computerized diagnosis of melanoma. Dermoscopy should therefore produce accurate color images, but unfortunately this is not always true in practice. Device calibration to compensate for varying imaging conditions such as magnification factors, lighting conditions, etc. is crucial for developing a reliable system. Since scale and color calibration is not feasible in a web-based system, we need to address this issue to secure the generality of the system. To the best of our knowledge, however, color calibration of dermoscopy images is still an open issue and has been investigated little. Grana et al. reported a color calibration method for dermoscopy images taken by different dermoscopes [7]. This method, however, requires the user to perform the calibration manually, and therefore we cannot apply it. In our experience, most inappropriately color-conditioned images can be calibrated effectively by adjusting the hue, saturation, and intensity of the image in the HSV color system. In this paper, we develop a fully automated color normalization method for dermoscopy images using the HSV color system. The proposed method accomplishes the normalization based on the image content alone.

2. MATERIAL

We used the two datasets described below:

Dataset-A: A total of 319 digital dermoscopy images (247 benign and 75 melanomas) collected from the university hospitals of Naples and Graz, as provided on the EDRA CD-ROM [8].

Dataset-B: A total of 533 digital dermoscopy images (457 benign and 80 melanomas) collected from the University of Vienna.
Dataset-A is used for the development of the color normalization filters and their quantitative evaluation, whereas Dataset-B is used for an alternative evaluation of the developed filters on unseen data.
[Fig. 1 layout: the i-th image f^i(x, y) is passed through M known modification filters g_1(H, S, V), ..., g_M(H, S, V), yielding modified images f_1^i(x, y), ..., f_M^i(x, y); image features f_1^i(p), ..., f_M^i(p) are extracted from these, and a regression model C_A(p) is built against the known target filter values h_m(H, S, V) = g_m(−H, 1/S, 1/V).]

Fig. 1. Overview of the development of normalization filters.

3. COLOR NORMALIZATION FILTERS

3.1. Method overview

Fig. 1 shows a schematic of the proposed method. The method builds color normalization filters based on an analysis of the given image. In this study, color normalization is performed on the hue, saturation, and intensity channels independently. First, we design a total of M color modification filters

  g_m(H, S, V) = {g_m(H), g_m(S), g_m(V)},  (1 ≤ m ≤ M),   (1)

which modify the hue (H), saturation (S), and intensity (V) values of the pixels in the processed image. Let f^i(x, y) be the i-th dermoscopy image in the training dataset (1 ≤ i ≤ N). Applying the above filters to this image creates M different images f_m^i(x, y). We extract a total of P image features f_m^i(p), (1 ≤ p ≤ P), from each image f_m^i(x, y). With the resulting N × M data, we build three multiple regression models C_A(p) = {C_H(p), C_S(p), C_V(p)} that express the relationship between the image features f_m^i(p) and the appropriate normalization factor. For the training dataset, the appropriate normalization factor h_m^i(H, S, V) is known: it is the inverse of the color modification filter,

  h_m^i(H, S, V) = g_m(−H, 1/S, 1/V).   (2)

Note that hue (H) is in degrees, while saturation (S) and intensity (V) are expressed as multiplying factors; therefore the inverse takes the form above. Also note that the explanatory variables and the corresponding predictors of the regression model are f_m^i(p) and h_m^i(H, S, V), respectively. The constructed regression models C_A(p) are the final product of this study, i.e., the color normalization filters.

3.2. Experimental setting

We used Dataset-A as the training dataset (N = 319) and prepared a total of 75 modified images per image (M = 75), combining 5 hue shifts H = {−10, −5, 0, 5, 10}, 3 saturation factors S = {0.9, 1.0, 1.1}, and 5 intensity factors V = {0.8, 0.9, 1.0, 1.1, 1.2}. If a modified value exceeds the limit of its channel, we use the acceptable maximum value instead. We extracted a total of 172 image features from each image (P = 172). Accordingly, a total of 319 × 75 pairs of 172-dimensional feature vectors and the corresponding appropriate normalization factors were used to build the color normalization filter (i.e., the regression model) for each color channel.

3.3. Building the color normalization filter
In order to build robust and accurate color normalization filters, the preparation of effective image features is important. The proposed method first determines the tumor area and then extracts image features from the inside, the outside, and the periphery of the tumor, as well as from the entire image. Based on these extracted features, we build the multiple regression models with statistical feature selection to avoid problems of multi-collinearity and over-fitting.
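As a concrete illustration of the modification filters g_m and their inverses h_m (Eq. (2)), the following sketch applies an HSV filter to pixel data. This is not the authors' implementation: the function names `apply_hsv_filter` and `inverse_filter` are ours, and we assume H in degrees [0, 360) with S and V as fractions in [0, 1], clipping out-of-range values to the channel maximum as described in Sec. 3.2.

```python
import numpy as np

def apply_hsv_filter(hsv, dh, fs, fv):
    """Apply a modification filter g_m = (dh, fs, fv) to HSV pixel data.

    hsv: float array of shape (..., 3); H in degrees, S and V in [0, 1].
    dh:  additive hue shift in degrees.
    fs, fv: multiplicative factors for saturation and intensity.
    Values exceeding a channel's limit are clipped to the maximum.
    """
    out = np.asarray(hsv, dtype=float).copy()
    out[..., 0] = np.clip(out[..., 0] + dh, 0.0, 360.0)
    out[..., 1] = np.clip(out[..., 1] * fs, 0.0, 1.0)
    out[..., 2] = np.clip(out[..., 2] * fv, 0.0, 1.0)
    return out

def inverse_filter(dh, fs, fv):
    """Target normalization factor h_m = g_m(-H, 1/S, 1/V), per Eq. (2)."""
    return (-dh, 1.0 / fs, 1.0 / fv)

# Example: one of the 75 modifications from Sec. 3.2 and its inverse.
pixel = np.array([[120.0, 0.5, 0.6]])            # H=120 deg, S=0.5, V=0.6
modified = apply_hsv_filter(pixel, -10.0, 0.9, 1.2)
restored = apply_hsv_filter(modified, *inverse_filter(-10.0, 0.9, 1.2))
```

When no clipping occurs, applying the inverse restores the original pixel exactly; clipped pixels cannot be recovered, which is why the paper treats h_m as a learning target rather than applying it analytically.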
3.3.1. Tumor area extraction

Appropriate identification of the tumor area is an important step in computer-aided melanoma screening. We used our "dermatologist-like" tumor area extraction algorithm [6][9] to accomplish this task. This method determines tumor areas almost equivalent to those determined by expert dermatologists.
3.3.2. Image feature extraction

After extracting the tumor area, we rotated the tumor object to align its major axis with the Cartesian x-axis. We then extracted a total of 172 color-related objective features from the image: (i) 120 primitive color features, (ii) 20 other color-related features, and (iii) 32 color gradient features in the peripheral areas. This feature set is a subset of the one we previously defined and used in [6]; due to manuscript length limitations, please refer to that paper for details. Note that the peripheral part of the tumor is defined as the region inside the border with an area equal to 30% of the tumor area, determined by a recursive dilation process applied to the outer border, working inward from the border of the extracted tumor. The ratio of 30% was decided in preliminary experiments with visual assessment by several dermatologists.
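A minimal sketch of how such a 30% peripheral region could be computed by working inward from the tumor border. The 4-connected `erode` helper and the mask conventions are our assumptions for illustration, not the authors' exact recursive procedure.

```python
import numpy as np

def erode(mask):
    """One binary erosion step with a 4-connected (cross) structuring element."""
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    return (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
            & p[1:-1, :-2] & p[1:-1, 2:])

def peripheral_region(tumor_mask, ratio=0.3):
    """Shrink the tumor mask inward until the shell between the original
    border and the shrunken core covers at least `ratio` of the tumor area."""
    target = ratio * tumor_mask.sum()
    core = tumor_mask.copy()
    while (tumor_mask.sum() - core.sum()) < target and core.any():
        core = erode(core)
    return tumor_mask & ~core

# Example on a synthetic 40x40 square "tumor".
mask = np.zeros((60, 60), dtype=bool)
mask[10:50, 10:50] = True
periphery = peripheral_region(mask)
```

Because the shell grows in whole-pixel rings, the resulting area slightly overshoots 30% (here 576 of 1600 pixels, i.e. 36%); any practical implementation must accept this discretization.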
Table 1. Modeling performance of the regression models (Dataset-A)

Criterion     #p    R^2      E
Hue           48    0.623    4.15
Saturation    40    0.296    0.062
Intensity     47    0.890    0.042

[Each row of the original table additionally lists the first 10 features selected by the stepwise method; among them are peripheral-area color statistics such as min_H^P (minimum hue in the peripheral area) and the peripheral intensity gradient ∇V^{1/20} (see Sec. 4.1.1).]

∗: Upper-right suffixes denote the region a feature is computed over: T = tumor area, P = peripheral area, N = normal skin area; PN, PT, TN denote the differences peripheral−normal, peripheral−tumor, and tumor−normal skin area, respectively.
†: #C denotes the number of colors in the target area, with †1: 8-bit RGB, †2: 16-bit RGB, †3: 8-bit HSV, †4: 16-bit HSV.
‡: ∇V^{1/20} denotes the gradient of the intensity in the peripheral part of the tumor, where the evaluation range is 1/20 of the tumor object.
3.3.3. Feature selection and building of the regression models

The features used in the regression models (for H, S, and V) were selected by an incremental stepwise method, which adds appropriate input parameters sequentially according to a statistical rule. Due to manuscript length limitations, please refer to our previous paper [6] for details. Based on the selected features, we build a linear multiple regression model for each of the H, S, and V channels.

4. RESULTS AND DISCUSSION

4.1. Normalization performance

4.1.1. Modeling performance

The incremental stepwise feature selection method produced regression models with 48, 40, and 47 parameters for C_H(p), C_S(p), and C_V(p), respectively. Table 1 summarizes the modeling performance of each linear regression model. From left to right, the columns correspond to the number of constituent image features of the regression model (#p), the determination coefficient adjusted by degrees of freedom (R^2), the standard error of the regression model (E), and the first 10 features selected by the stepwise input-selection method. Note again that hue (H) is in degrees (0-360 deg), while saturation (S) and intensity (V) are expressed as multiplying factors; accordingly, the standard error E for hue is larger than that of the other two. In the selected-features column, for instance, min_H^P indicates the minimum hue value in the peripheral part of the tumor. Our regression model achieved good modeling performance for intensity and hue. Modeling performance for saturation was inferior to the others; however, an error of around 6-7% in saturation does not have a large impact compared with an error in hue under visual assessment (e.g., the saturation error of Fig. 2-1 is 12%). As an interesting result, we note that the peripheral part of the tumor is important for estimating the appropriate color normalization filter: in the hue channel, 9 out of the first 10 selected features come from the peripheral part of the tumor. This selectivity is much higher than in classification models between melanomas and nevi. On the other hand, the gradient of these areas was hardly ever selected.

4.1.2. Normalization performance

Next, we evaluated the color normalization performance with a 75-fold cross-validation test using Dataset-A; the evaluation data are never included in the training data. The estimation errors are Hue: 4.34±5.21, Saturation: 0.069±0.073, and Intensity: 0.047±0.053 (mean±SD), respectively. Fig. 2 shows examples of color calibration. From left to right, the columns correspond to (a) the original dermoscopy image, (b) the modified image created by g(H, S, V), and (c) the calibrated image restored by the estimated ĝ(H, S, V). The difference between g and ĝ is the calibration error. The cross-validation result is almost equivalent to the modeling one, and this error range is acceptable under visual assessment. From these results, we conclude that our method calibrates dermoscopy images appropriately.
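The incremental stepwise selection used to build each channel's regression model can be approximated by a greedy forward search. The sketch below is a simplified stand-in (a fixed feature count replaces the statistical stopping rule of [6]), and the name `forward_select` is ours.

```python
import numpy as np

def forward_select(X, y, k):
    """Greedily add the feature that most reduces the residual sum of
    squares of an ordinary least-squares fit, up to k features.
    A simplified stand-in for the incremental stepwise method of [6]."""
    n, p = X.shape
    selected = []
    for _ in range(k):
        best, best_sse = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            # OLS fit with intercept on the candidate feature subset.
            A = np.column_stack([np.ones(n), X[:, cols]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            sse = float(((y - A @ coef) ** 2).sum())
            if sse < best_sse:
                best, best_sse = j, sse
        selected.append(best)
    return selected

# Synthetic check: y depends only on features 0 and 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.01 * rng.normal(size=200)
chosen = forward_select(X, y, 2)
```

A production version would stop on an F-test or adjusted-R^2 criterion instead of a fixed k, which is what guards against the multi-collinearity and over-fitting problems mentioned in Sec. 3.3.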
4.2. Color normalization for unknown data

We now evaluate the performance of the proposed color normalization method in the case where the test images (Dataset-B) differ from the training dataset (Dataset-A). Since we have no ground-truth information defining the appropriate calibration, we cannot perform a direct quantitative evaluation. Instead, we compared the distribution of image parameters among the training dataset (Dataset-A), the test dataset (Dataset-B), and the test dataset after color normalization. This evaluation is based on the assumption that, if color normalization works appropriately, the distribution of color features of the test images becomes closer to that of the training dataset. We conducted a principal component analysis (PCA) to orthogonalize the 172 color-related image features described above. Fig. 3 compares the parameter distributions using the first two principal components for simplicity (their accumulated contribution ratio is 72.2%). Note that we randomly reduced the number of plotted points to 100 per dataset for visualization. We can clearly see that the color normalization process moves the data distribution closer to that of the training dataset. The centroids and standard deviations of the distributions are summarized in Table 2. From this result, we conclude that our method can also perform color normalization for unknown data.
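The PCA-based distribution comparison can be sketched with a plain-SVD PCA, as follows. The toy feature matrices, the mock `normalized` shift standing in for the actual color normalization, and the helper names are our assumptions for illustration.

```python
import numpy as np

def pca_project(train_feats, n_components=2):
    """Fit PCA on the training features (Dataset-A in the paper) and
    return a projection function usable on any feature matrix."""
    mean = train_feats.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(train_feats - mean, full_matrices=False)
    axes = Vt[:n_components]
    return lambda feats: (feats - mean) @ axes.T

def centroid_shift(train_pc, test_pc):
    """Distance between dataset centroids in PC space; a smaller value
    after normalization indicates the distributions moved closer."""
    return float(np.linalg.norm(train_pc.mean(axis=0) - test_pc.mean(axis=0)))

# Toy data standing in for the 172-dimensional color features.
rng = np.random.default_rng(1)
train = rng.normal(loc=0.0, size=(100, 5))
test = rng.normal(loc=2.0, size=(100, 5))       # shifted, like Dataset-B
normalized = test - test.mean(axis=0) + train.mean(axis=0)  # mock normalization
project = pca_project(train)
shift_before = centroid_shift(project(train), project(test))
shift_after = centroid_shift(project(train), project(normalized))
```

Fitting the PCA on the training set only, and projecting the other datasets through the same axes, mirrors the comparison behind Fig. 3 and Table 2.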
[Fig. 3 plot: 2nd principal component (0 to −2000) vs. 1st principal component (0 to 3000), comparing the training data (Dataset-A), the evaluation data (Dataset-B), and the evaluation data after color normalization.]

Fig. 3. Transition of feature distribution after color normalization.

[Fig. 2 panels, from left to right (a) dermoscopy image, (b) modified image, (c) calibrated image, for three sample cases:
1. Clark nevus: g(H, S, V) = (−5.0, 0.9, 0.9), ĝ(H, S, V) = (−2.60, 1.02, 0.95)
2. Melanoma in situ: g(H, S, V) = (10.0, 1.1, 0.8), ĝ(H, S, V) = (7.88, 1.05, 0.81)
3. Melanoma: g(H, S, V) = (−10.0, 0.9, 1.2), ĝ(H, S, V) = (−5.07, 0.95, 1.14)]
Fig. 2. Sample results of color calibration (Dataset-A).

Table 2. Comparison of distributions of image features

Dataset                         1st PC        2nd PC
Dataset-A (training)            1891 ± 687    -1205 ± 430
Dataset-B                       999 ± 390     -774 ± 307
Dataset-B with normalization    1206 ± 456    -945 ± 298

5. CONCLUSIONS

In this paper, we developed a new color normalization method for dermoscopy images. The key feature of our method is that color normalization is performed based on image content alone. Color normalization for dermoscopy images is especially important for computer-based diagnosis systems. We are currently investigating the effectiveness of the color normalization for the final diagnosis.

6. REFERENCES

[1] W. Stolz, O. B. Falco, P. Bliek et al., Color Atlas of Dermatoscopy, 2nd enlarged and completely revised edition, Berlin: Blackwell Publishing, 2002.
[2] H. Ganster, A. Pinz, R. Rohrer, E. Wilding, M. Binder, H. Kittler, "Automated melanoma recognition," IEEE Trans. on Medical Imaging, Vol. 20, No. 3, pp. 233-239, 2001.
[3] P. Rubegni, G. Cevenini, M. Burroni et al., "Automated diagnosis of pigmented skin lesions," International Journal of Cancer, Vol. 101, pp. 576-580, 2002.
[4] M. E. Celebi, H. A. Kingravi, B. Uddin et al., "A methodological approach to the classification of dermoscopy images," Computerized Medical Imaging and Graphics, Vol. 31, No. 6, pp. 362-371, 2007.
[5] H. Oka, M. Hashimoto, H. Iyatomi, M. Tanaka, "Internet-based program for automatic discrimination of dermoscopic images between melanoma and Clark nevi," British Journal of Dermatology, Vol. 150, No. 5, p. 1041, 2004.
[6] H. Iyatomi, H. Oka, M. E. Celebi et al., "An improved Internet-based melanoma screening system with dermatologist-like tumor area extraction algorithm," Computerized Medical Imaging and Graphics, Vol. 32, No. 7, pp. 566-579, 2008.
[7] C. Grana, G. Pellacani, and S. Seidenari, "Practical color calibration for dermoscopy, applied to a digital epiluminescence microscope," Skin Research and Technology, Vol. 11, pp. 242-247, 2005.
[8] G. Argenziano, H. P. Soyer, and V. De Giorgi, Dermoscopy: A Tutorial, EDRA Medical Publishing & New Media, 2002.
[9] H. Iyatomi, H. Oka, M. Saito et al., "Quantitative assessment of tumour area extraction from dermoscopy images and evaluation of computer-based methods for automatic melanoma diagnostic system," Melanoma Research, Vol. 16, No. 2, pp. 183-190, 2006.