A Comparison Study of Different Color Spaces in Clustering Based Image Segmentation
Aranzazu Jurio, Miguel Pagola, Mikel Galar, Carlos Lopez-Molina, and Daniel Paternain
Dpt. Automática y Computación, Universidad Pública de Navarra, Campus Arrosadía s/n, 31006 Pamplona, Spain
{aranzazu.jurio,miguel.pagola,mikel.galar, carlos.lopez,daniel.paternain}@unavarra.es
http://giara.unavarra.es
Abstract. In this work we carry out a comparison study of different color spaces in clustering-based image segmentation. We use two similar clustering algorithms, one based on entropy and the other on ignorance. The study involves four color spaces and, in all cases, each pixel is represented by the values of its color channels in that space. Our purpose is to identify the best color representation, if there is one, when using this kind of clustering algorithm.
Keywords: Clustering; Image segmentation; color space; HSV; CMY; YUV; RGB.
1 Introduction
Segmentation is one of the most important tasks in image processing. The objective of image segmentation is the partition of an image into different areas or regions. These regions could be associated with a set of objects or labels. The regions must satisfy the following properties:
1. Similarity. Pixels belonging to the same region should have similar properties (intensity, texture, etc.).
2. Discontinuity. The objects stand out from the environment and have clear contours or edges.
3. Connectivity. Pixels belonging to the same object should be adjacent, i.e. should be grouped together.
Because of the importance of the segmentation process, the scientific community has proposed many methods and techniques to solve this problem [2,14]. Segmentation techniques can be divided into histogram thresholding, feature space clustering, region-based approaches and edge detection approaches. Color image segmentation attracts more and more attention mainly due to the following reasons: (1) color images can provide more information than gray level images; (2) the power of personal computers is increasing rapidly, and PCs can now be used to process color images [7]. Basically, color segmentation approaches are based on monochrome segmentation approaches operating in different color spaces.
Color is perceived by humans as a combination of the tristimuli R (red), G (green), and B (blue), which are usually called the three primary colors. From the R, G, B representation, we can derive other kinds of color representations (spaces) by using either linear or nonlinear transformations. Several works have tried to identify the best color space to represent color information, but there is no common opinion about which is the best choice. However, some papers identify the best color space for a specific task. In [6] the authors present a complete study of the 10 most commonly used color spaces for skin color detection. They conclude that HSV is the best one to find skin color in an image. A similar study with 5 different color spaces is made in [8], showing that the polynomial SVM classifier combined with the HSV color space is the best approach for the classification of pizza toppings. For crop segmentation, in order to achieve real-time processing in real farm fields, Ruiz-Ruiz et al. [16] carry out a comparison study between the RGB and HSV models, finding that the best accuracy is achieved with the HSV representation. Although most authors use HSV in image segmentation, some works show that other color spaces are also useful [1,15]. When none of the typical color spaces is sufficient, some authors define a new kind of color space by selecting a set of color components which can belong to any of the different classical color spaces. Such spaces, which have neither psychovisual nor physical color significance, are named hybrid color spaces [17].
Cluster analysis or clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. Clustering is a method of unsupervised learning and a common technique for statistical data analysis used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics. Image segmentation is also a topic where clustering techniques have been widely applied [9,5,13,11]. A cluster is a collection of objects which are similar to one another and dissimilar to the objects belonging to other clusters. Therefore, since any clustering technique must measure the distance or the similarity between objects, in color image segmentation it is very important to define which color space is going to be used, because that measure will be defined within said space. Clustering techniques can provide methods whose results satisfy the three properties demanded of segmented images. In this case the objects are the pixels, and each pixel can be defined by its color, texture information, position, etc. In our experiments the features that identify each pixel are only the values of its three components in the selected color space.
This work is organized as follows: we begin by recalling the different color spaces. In Section 3 we present the two clustering algorithms that will be used in the segmentation process. Next, in the experimental results, we present the settings of the experiment and the results obtained. Finally we show some conclusions and future research.
Corresponding author.
E. Hüllermeier, R. Kruse, and F. Hoffmann (Eds.): IPMU 2010, Part II, CCIS 81, pp. 532–541, 2010. © Springer-Verlag Berlin Heidelberg 2010
2 Color Spaces
A color space is a tool to visualize, create and specify color. For computers, color is an excitation of three phosphors (blue, red, and green), and for a printing press color is the reflectance and absorbance of cyan, magenta, yellow and black inks on the paper. A color space is the representation of three attributes used to describe a color. A color space is also a mathematical representation of our perception [1]. We can distinguish between these two classes:
– Hardware oriented: They are defined according to the properties of the optical instruments used to show the color, like TVs, LCD screens or printers. Typical examples are RGB, CMY and YUV (the PAL/European counterpart of YIQ).
– User oriented: Based on human perception of colors by hue, saturation and brightness. Hue represents the wavelength of the perceived color, the saturation or chroma indicates the quantity of white light present in the color, and the brightness or value indicates the intensity of the color. Typical examples are HLS, HCV, HSV, HSB, MTM, L*u*v*, L*a*b* and L*C*h*.
In this work we are going to compare four color spaces in image segmentation: RGB, CMY, HSV and YUV.
2.1 RGB
An RGB color space can be understood as all possible colors that can be made from three colourants for red, green and blue. The main purpose of the RGB color model is the sensing, representation, and display of images in electronic systems, such as televisions and computers, though it has also been used in conventional photography. Although RGB is the most used model to acquire digital images, it is said that it is not adequate for color image analysis. We are going to use this color space as the reference.
2.2 CMY
The CMY (Cyan, Magenta, Yellow) color model is a subtractive color model used in printing. It works by masking certain colors on a typically white background, that is, by absorbing particular wavelengths of light. Cyan is the opposite of red (it absorbs red), magenta is the opposite of green and yellow is the opposite of blue. The conversion from RGB to CMY is:
\[
\begin{aligned}
C' &= 1 - R, & C &= \min(1, \max(0, C' - K)),\\
M' &= 1 - G, & M &= \min(1, \max(0, M' - K)),\\
Y' &= 1 - B, & Y &= \min(1, \max(0, Y' - K)),\\
K &= \min(C', M', Y')
\end{aligned}
\tag{1}
\]
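As an illustration, Eq. (1) can be sketched in a few lines of Python. This is only a minimal sketch: the function name `rgb_to_cmy` is hypothetical, and it assumes that the complements 1 − R, 1 − G and 1 − B are computed first, that K is their minimum, and that all inputs lie in [0, 1].

```python
def rgb_to_cmy(r, g, b):
    """Convert one RGB pixel (values in [0, 1]) to CMY following Eq. (1)."""
    # Intermediate complements of the RGB channels (assumed reading of Eq. 1).
    c0, m0, y0 = 1.0 - r, 1.0 - g, 1.0 - b
    # K is the component shared by the three complements.
    k = min(c0, m0, y0)

    def clamp(v):
        # Keep the result inside [0, 1], as the min/max in Eq. (1) does.
        return min(1.0, max(0.0, v))

    return clamp(c0 - k), clamp(m0 - k), clamp(y0 - k)
```

Under this reading, any gray pixel maps to (0, 0, 0), because its three complements are equal and are removed entirely by K.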
2.3 HSV
The HSV color model is more intuitive than the RGB color model. In this space, hue (H) represents the color tone (for example, red or blue), saturation (S) is the amount of color (for example, bright red or pale red) and the third component (called intensity, value or lightness) is the amount of light (it allows the distinction between a dark color and a light color). If we take the HSV color space in a cone representation, the hue is depicted as a three-dimensional conical formation of the color wheel. The saturation is represented by the distance from the center of a circular cross-section of the cone, and the value is the distance from the pointed end of the cone. Let R, G, B ∈ [0,1] be the red, green, and blue coordinates of an RGB image, max be the greatest of R, G, and B, and min be the lowest. Equations (2) to (4) show how to transform this image into HSV space.
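\[
H = \begin{cases}
0, & \text{if } \max = \min\\
\left(60^\circ \times \dfrac{G - B}{\max - \min} + 360^\circ\right) \bmod 360^\circ, & \text{if } \max = R\\
60^\circ \times \dfrac{B - R}{\max - \min} + 120^\circ, & \text{if } \max = G\\
60^\circ \times \dfrac{R - G}{\max - \min} + 240^\circ, & \text{if } \max = B
\end{cases}
\tag{2}
\]
\[
S = \begin{cases}
0, & \text{if } \max = 0\\
\dfrac{\max - \min}{\max} = 1 - \dfrac{\min}{\max}, & \text{otherwise}
\end{cases}
\tag{3}
\]
\[
V = \max
\tag{4}
\]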
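The HSV transformation of Eqs. (2) to (4) can be sketched as follows. The function name `rgb_to_hsv` is an assumption made for illustration, hue is returned in degrees, and the cases are tested in the order in which they are listed in Eq. (2).

```python
def rgb_to_hsv(r, g, b):
    """Convert one RGB pixel (values in [0, 1]) to (H in degrees, S, V)."""
    mx, mn = max(r, g, b), min(r, g, b)
    delta = mx - mn
    # Hue, Eq. (2): the cases are checked in the order they are listed.
    if delta == 0:
        h = 0.0
    elif mx == r:
        h = (60.0 * (g - b) / delta + 360.0) % 360.0
    elif mx == g:
        h = 60.0 * (b - r) / delta + 120.0
    else:
        h = 60.0 * (r - g) / delta + 240.0
    # Saturation, Eq. (3), and value, Eq. (4).
    s = 0.0 if mx == 0 else 1.0 - mn / mx
    return h, s, mx
```

For example, pure green gives a hue of 120 degrees and magenta a hue of 300 degrees, both with saturation and value equal to 1.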
2.4 YUV
The YUV color model imitates human vision. The term YUV designates a whole family of so-called luminance (Y) and chrominance (UV) color spaces. In this work, we use YCbCr, which is a standard color space for digital television systems. The following expression is used to convert an RGB image into YUV space:
\[
\begin{bmatrix} Y \\ U \\ V \end{bmatrix}
=
\begin{bmatrix}
0.299 & 0.587 & 0.114 \\
-0.147 & -0.289 & 0.436 \\
0.615 & -0.515 & -0.100
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\tag{5}
\]
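The linear transformation of Eq. (5) amounts to one matrix-vector product per pixel. The sketch below uses the coefficients as printed in Eq. (5), reading the decimal commas as decimal points; the names `YUV_MATRIX` and `rgb_to_yuv` are hypothetical.

```python
# Coefficients of Eq. (5), one row per output channel (Y, U, V).
YUV_MATRIX = (
    (0.299, 0.587, 0.114),
    (-0.147, -0.289, 0.436),
    (0.615, -0.515, -0.100),
)


def rgb_to_yuv(r, g, b):
    """Apply the 3x3 matrix of Eq. (5) to one RGB pixel."""
    return tuple(row[0] * r + row[1] * g + row[2] * b for row in YUV_MATRIX)
```

Since each chrominance row sums to zero and the luminance row sums to one, a white pixel (1, 1, 1) maps to approximately (1, 0, 0).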
3 Clustering Algorithms
Among fuzzy clustering methods, the fuzzy c-means (FCM) method [2] is one of the most popular. One important issue in fuzzy clustering is identifying the number and initial locations of cluster centers. In the classical FCM algorithm, these initial values are specified manually. However, there exists another type of clustering algorithm that automatically determines the number of clusters and the location of the cluster centers from the potential of each data point. Yao et al. [18] proposed a clustering method based on the entropy measure instead of the potential measure. In [10] we have also proposed an improvement of this algorithm based on ignorance functions [4], which likewise segments images without selecting the initial number of clusters.
3.1 Entropy Based Fuzzy Clustering Algorithm
The basis of EFC is to find the elements which, if taken as the center of a cluster, yield the lowest entropy of the total set of elements. This entropy is calculated for each element, taking into account the similarity of that element with all the remaining elements, $S(x_i, x_j)$, with the following expression:
\[
E(x_i) = -\sum_{j \in X,\; j \neq i} \Bigl( S(x_i, x_j)\log_2 S(x_i, x_j) + \bigl(1 - S(x_i, x_j)\bigr)\log_2 \bigl(1 - S(x_i, x_j)\bigr) \Bigr)
\tag{6}
\]
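A minimal sketch of Eq. (6) is given below, assuming that the similarities have already been collected in an N x N nested list `sim`. The names `pair_entropy` and `entropy` are hypothetical, and similarities equal to 0 or 1 are treated as contributing zero, which is the limit of the expression.

```python
import math


def pair_entropy(s):
    """Contribution of one similarity value s in [0, 1] to Eq. (6)."""
    if s <= 0.0 or s >= 1.0:
        return 0.0  # limit of s*log2(s) + (1-s)*log2(1-s) at the endpoints
    return -(s * math.log2(s) + (1.0 - s) * math.log2(1.0 - s))


def entropy(i, sim):
    """E(x_i): sum of the pairwise terms over all j different from i."""
    return sum(pair_entropy(sim[i][j]) for j in range(len(sim)) if j != i)
```

A similarity of 0.5 contributes the maximum value of 1, so elements whose neighbours are either very similar or very dissimilar obtain low entropy.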
In this way, the algorithm first selects the element with the lowest entropy as the center of the first cluster. Once selected, it is deleted from the list of center candidates. The elements whose distance to the cluster center is lower than a given threshold (β) are also deleted. Once those elements are deleted from the candidates list, the element with the lowest entropy is taken as the center of the second cluster. The process is repeated until the candidates list is empty. Given a set T with N data, the algorithm is outlined as follows:
1. Calculate the entropy of each $x_i \in T$, for $i = 1, \ldots, N$.
2. Choose $x_{iMin}$ achieving the lowest entropy.
3. Delete from T $x_{iMin}$ and all the data whose distance to it is smaller than β.
4. If T is not empty, go to step 2.
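Steps 2 to 4 above can be sketched as a simple loop over a set of candidate indices. The sketch assumes that the entropies of step 1 are already available in a list `scores` and that the pairwise distances are stored in a nested list `dist`; the function name `select_centers` and the tie-breaking by lowest index are illustrative choices, not part of the description above.

```python
def select_centers(scores, dist, beta):
    """Pick cluster centers by repeatedly taking the lowest-score candidate.

    scores[i] is the precomputed entropy of element i (step 1);
    dist[i][j] is the distance between elements i and j.
    """
    candidates = set(range(len(scores)))
    centers = []
    while candidates:  # step 4: repeat while T is not empty
        # Step 2: candidate with the lowest score (ties broken by index).
        best = min(candidates, key=lambda i: (scores[i], i))
        centers.append(best)
        # Step 3: drop the center and every element closer to it than beta.
        candidates = {j for j in candidates
                      if j != best and dist[best][j] >= beta}
    return centers
```

Because the scores are computed once and only the candidate set shrinks, the number of centers returned depends only on β: a larger threshold removes more candidates per iteration and yields fewer clusters.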
Note that it is not possible to choose a priori the number of clusters into which the algorithm must split the data. The user must adjust the value of the threshold β to obtain the desired number of clusters.
3.2 Ignorance Based Clustering Algorithm
In [10] we propose a modification of the EFC algorithm. We replace the similarity between elements by restricted equivalence functions. In addition, we use ignorance functions instead of entropy functions so that, for us, the center of a cluster is the element for which the partition of the data has the lowest ignorance. With these two modifications we improve the results of EFC and solve some problems that it has with symmetrical data. Ignorance functions estimate the uncertainty that exists when there are two membership functions. However, in this case we want to calculate the total ignorance of a set of elements by means of their membership degree to a cluster. If we are completely sure that an element is the center of the cluster, then we have no ignorance. If the membership of the element to the cluster is 0.5, then we say that we have total ignorance. Therefore, from a general ignorance function (see Theorem 2 of [4]) we can deduce the following expression to calculate the ignorance associated with a single element:
\[
Ig(x) = 4(1 - x)x
\tag{7}
\]
Given a set T with N data, the ignorance algorithm is as follows:
1. Calculate the ignorance of each $x_i \in T$, for $i = 1, \ldots, N$.
1.1. Calculate the restricted equivalence between each pair of data:
\[
Eq(x_i, x_j) = M\bigl(REF(x_{i1}, x_{j1}), REF(x_{i2}, x_{j2}), \ldots, REF(x_{in}, x_{jn})\bigr) \quad \text{for all } j = 1, \ldots, N \text{ where } j \neq i
\tag{8}
\]
1.2. Calculate the ignorance of each pair of data, applying Eq. (7):
\[
Ig(Eq(x_i, x_j)) = 4\bigl(1 - Eq(x_i, x_j)\bigr)\, Eq(x_i, x_j)
\tag{9}
\]
1.3. Calculate the ignorance of each datum:
\[
IT(x_i) = \frac{\sum_{j=1}^{N} Ig(Eq(x_i, x_j))}{N}
\tag{10}
\]
2. Choose $x_{iMin}$ achieving the lowest ignorance.
3. Delete from T $x_{iMin}$ and all the data whose distance to it is smaller than β.
4. If T is not empty, go to step 2.
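Steps 1.1 to 1.3 can be sketched as below. The restricted equivalence function and the aggregation M are placeholders chosen only for illustration, namely REF(a, b) = 1 − |a − b| and the arithmetic mean; both are assumptions and not necessarily the functions used in the experiments. The ignorance follows Eq. (7); a constant factor in it does not change which element attains the minimum in step 2. All function names are hypothetical.

```python
def ref(a, b):
    """Assumed restricted equivalence function on [0, 1]."""
    return 1.0 - abs(a - b)


def equivalence(x, y):
    """Step 1.1: aggregate the per-feature REF values (M assumed to be the mean)."""
    return sum(ref(a, b) for a, b in zip(x, y)) / len(x)


def ignorance(e):
    """Ignorance of a single degree e in [0, 1], as in Eq. (7)."""
    return 4.0 * (1.0 - e) * e


def total_ignorance(data):
    """Steps 1.2 and 1.3: mean pairwise ignorance of each datum."""
    n = len(data)
    # The j = i term is included for simplicity; it is zero because Eq(x, x) = 1.
    return [sum(ignorance(equivalence(xi, xj)) for xj in data) / n
            for xi in data]
```

The resulting list plays the same role as the entropies in the EFC algorithm: steps 2 to 4 operate on it unchanged.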
4 Experimental Results
In this section we present the experimental study that we have carried out to discover which is the best color space to use in image segmentation based on clustering. We take four natural images with their ideal segmentation (see figure 1). These segmentations have been produced manually, dividing each image into three different areas, taking into account the image dataset [12]. These areas have been segmented following a color and object representation criterion. Each area in the ideal segmented image (see figure 1) has been colored with the mean color of the pixels that belong to it. For this set of images we execute the two algorithms, Ignorance based clustering (section 3.2) and Entropy based fuzzy clustering (section 3.1), four times, each one with a different color space: RGB, CMY, HSV and YUV. In the clustering algorithms each pixel is an element represented by three parameters, so each $x_i$ is a vector with three values. These values vary in every execution, representing each color channel of the selected color space. For the Ignorance based clustering we have selected the following expression of equivalence:
\[
Eq(x_i, x_j) = \bigl(1 - |x_i^3 - x_j^3|\bigr)^3
\tag{11}
\]
and for the Entropy based clustering we have selected the expression of similarity proposed in the original work [18]:
\[
S(x_i, x_j) = e^{-\alpha D(x_i, x_j)}
\tag{12}
\]
where $\alpha = -\ln(0.5)/\bar{D}$ and D is the Minkowski distance with p = 1.
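The similarity of Eq. (12) can be sketched as follows. The sketch assumes that $\bar{D}$ is the mean distance over all distinct pairs of elements, so that two elements at the mean distance have similarity 0.5; this reading, and the names `manhattan` and `similarity_matrix`, are assumptions. It also requires at least two elements that are not all identical, since otherwise the mean distance is zero.

```python
import math


def manhattan(x, y):
    """Minkowski distance with p = 1 between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def similarity_matrix(data):
    """S(x_i, x_j) = exp(-alpha * D(x_i, x_j)) for every pair of elements."""
    n = len(data)
    dist = [[manhattan(xi, xj) for xj in data] for xi in data]
    # Mean distance over the n*(n-1)/2 distinct pairs (assumed meaning of D-bar).
    pairs = n * (n - 1) / 2
    mean_d = sum(dist[i][j] for i in range(n) for j in range(i + 1, n)) / pairs
    alpha = -math.log(0.5) / mean_d
    return [[math.exp(-alpha * d) for d in row] for row in dist]
```

With this choice of α the similarity always lies in (0, 1], which is the range required by the entropy expression of Eq. (6).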
Fig. 1. Original images and ideal segmentation
Fig. 2. Segmented images obtained with the ignorance based clustering (columns: Ideal, RGB, HSV, CMY, YUV)
In our experiment we evaluate four different color spaces: CMY, RGB, YUV and HSV. In figures 2 and 3 we show the best segmented images obtained for every image and every color space with each algorithm. For a quantitative comparison we present table 1. In this table we show the similarity between the ideal image and these segmented images using the following equation:
\[
SIM(A, B) = \frac{1}{3 \times N \times M} \sum_{c \in \{R, G, B\}} \sum_{i} \sum_{j} \bigl(1 - |A_{ijc} - B_{ijc}|\bigr)
\tag{13}
\]
where N and M are the number of rows and columns of the image, A is the segmented image obtained, B is the ideal one and $A_{ijc}$ is the intensity of the pixel located in the i-th row and the j-th column of the c-th channel of image A. As every region of the image is colored with the mean color of that region, the more alike the two mean colors are, the greater the similarity between those pixels. This similarity has been chosen because it fulfills the six properties demanded of a global comparison measure of two images [3].
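The measure of Eq. (13) is one minus the mean absolute difference over all pixels and channels. A minimal sketch, assuming that both images are nested lists indexed as `[row][column][channel]` with intensities in [0, 1] (the function name `sim` and this data layout are illustrative assumptions):

```python
def sim(a, b):
    """Global similarity of Eq. (13) between two RGB images of equal size."""
    n, m = len(a), len(a[0])  # number of rows and columns
    total = sum(1.0 - abs(a[i][j][c] - b[i][j][c])
                for i in range(n) for j in range(m) for c in range(3))
    return total / (3 * n * m)
```

Two identical images give a similarity of 1, and an all-black image compared with an all-white one gives 0.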
Fig. 3. Segmented images obtained with the entropy based clustering (columns: Ideal, RGB, HSV, CMY, YUV)

Table 1. Similarities between the ideal images and best segmented images

Ignorance based clustering
Image   CMY      RGB      YUV      HSV
1       0.9802   0.9396   0.9719   0.9818
2       0.9724   0.9567   0.9575   0.9717
3       0.9965   0.9246   0.9314   0.9687
4       0.9408   0.8886   0.8851   0.9012
Mean    0.9724   0.9273   0.9364   0.9558

Entropy based clustering
Image   CMY      RGB      YUV      HSV
1       0.9801   0.9704   0.9715   0.9812
2       0.9712   0.968    0.9694   0.9723
3       0.9703   0.9278   0.993    0.9687
4       0.9355   0.9026   0.9222   0.9222
Mean    0.9642   0.9422   0.9640   0.9611

Fig. 4. Average similarity with respect to the threshold value. (a) Ignorance based and (b) Entropy based clustering. Each line represents the average similarity for the set of images.
We can see that the CMY space is the one which obtains the best mean results with both algorithms. As explained before, both algorithms have a threshold value, which is key to the number of final clusters. The selection of the best threshold for every image is a difficult point and a future research line. In our first approach to this problem, we want to select the color space in which the influence of the threshold is the lowest. Therefore we have executed both algorithms for 45 different threshold values, ranging from 0 to 350. In this way, we can recommend a color space to use within clustering. In figure 4(a) we show the mean performance of the ignorance based clustering obtained for different threshold values using the four color spaces. It is clear that the threshold value has less influence when using CMY. In addition, the best results are obtained with CMY. Similar conclusions can be drawn from figure 4(b), where the algorithm used is the entropy based clustering.
5 Conclusions and Future Research
In this work we have studied four color spaces for image segmentation based on clustering. These spaces are RGB, HSV, CMY and YUV. The clustering algorithms we have worked with depend on a threshold value. In this sense, we have also studied the importance of this value in the final segmented image. Our experiments have revealed that the best results are obtained in most cases in the CMY color space. HSV also provides good results. Besides, CMY is the color space in which the quality of the segmented image is higher for any threshold. In the ignorance based algorithm this space is the best by a large margin, while in the entropy based one it is closely followed by YUV and HSV. So, we can conclude that CMY is the most suitable space to use. However, this is a preliminary study and it must be enlarged with more images. They must include different kinds of images, like real images, synthetic images, etc. It must also be enlarged with different ideal segmentations for each image. As ground truth segmentations are not unique, the most suitable color space could change for different ideal solutions. In the future, we will construct an automatic method to choose the best threshold in this kind of clustering algorithm. Besides, we want to extend this study by incorporating more color spaces, like L*a*b*, YIB or LSLM, and more clustering algorithms, like FCM.
Acknowledgments. This research was partially supported by the Grant TIN2007-65981.
References
1. Alata, O., Quintard, L.: Is there a best color space for color image characterization or representation based on Multivariate Gaussian Mixture Model? Computer Vision and Image Understanding 113, 867–877 (2009)
2. Bezdek, J.C., Keller, J., Krisnapuram, R., Pal, N.R.: Fuzzy Models and algorithms for pattern recognition and image processing. In: Dubois, D., Prade, H. (Series eds.) The Handbooks of Fuzzy Sets Series. Kluwer Academic Publishers, Dordrecht (1999)
3. Bustince, H., Pagola, M., Barrenechea, E.: Construction of fuzzy indices from fuzzy DI-subsethood measures: Application to the global comparison of images. Information Sciences 177, 906–929 (2007)
4. Bustince, H., Pagola, M., Barrenechea, E., Fernandez, J., Melo-Pinto, P., Couto, P., Tizhoosh, H.R., Montero, J.: Ignorance functions. An application to the calculation of the threshold in prostate ultrasound images. Fuzzy Sets and Systems 161(1), 20–36 (2010)
5. Celenk, M.: A Color Clustering Technique for Image Segmentation. Computer Vision Graphics and Image Processing 52(2), 145–170 (1990)
6. Chaves-González, J.M., Vega-Rodríguez, M.A., Gómez-Pulido, J.A., Sánchez-Pérez, J.M.: Detecting skin in face recognition systems: A colour spaces study. Digital Signal Process. (2009), doi:10.1016/j.dsp.2009.10.008
7. Cheng, H.D., Jiang, X.H., Sun, Y., Wang, J.: Color image segmentation: advances and prospects. Pattern Recognition 34(12), 2259–2281 (2001)
8. Du, C.-J., Sun, D.-W.: Comparison of three methods for classification of pizza topping using different colour space transformations. Journal of Food Engineering 68, 277–287 (2005)
9. Lo, H., Am, B., Lp, C., et al.: A Comparison of Neural Network and Fuzzy Clustering-Techniques in Segmenting Magnetic-Resonance Images of the Brain. IEEE Transactions on Neural Networks 3(5), 672–682 (1992)
10. Jurio, A., Pagola, M., Paternain, D., Barrenechea, E., Sanz, J., Bustince, H.: Ignorance-based fuzzy clustering algorithm. In: Ninth International Conference on Intelligent Systems Design and Applications, pp. 1353–1358 (2009)
11. Jurio, A., Pagola, M., Paternain, D., Lopez-Molina, C., Melo-Pinto, P.: Interval-valued restricted equivalence functions applied on Clustering Techniques. In: 13rd International Fuzzy Systems Association World Congress and 6th European Society for Fuzzy Logic and Technology Conference (IFSA-EUSFLAT 2009) (2009)
12. Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proc. 8th Int'l. Conf. Computer Vision, July 2001, vol. 2, pp. 416–423 (2001)
13. Nam, I., Salamah, S., Ngah, U.: Adaptive Fuzzy Moving K-Means Clustering Algorithm For Image Segmentation. IEEE Transactions on Consumer Electronics 55(4), 2145–2153 (2009)
14. Pal, N.R., Pal, S.K.: A review of image segmentation techniques. Pattern Recognition 26, 1277–1294 (1993)
15. Pagola, M., Ortiz, R., Irigoyen, I., Bustince, H., Barrenechea, E., Aparicio-Tejo, P., Lamsfus, C., Lasa, B.: New method to assess barley nitrogen nutrition status based on image colour analysis: Comparison with SPAD-502. Computers and Electronics in Agriculture 65(2), 213–218 (2009)
16. Ruiz-Ruiz, G., Gómez-Gil, J., Navas-Gracia, L.M.: Testing different color spaces based on hue for the environmentally adaptive segmentation algorithm (EASA). Computers and Electronics in Agriculture 68(1), 88–96 (2009)
17. Vandenbroucke, N., Macaire, L., Postaire, J.G.: Color image segmentation by pixel classification in an adapted hybrid color space. Application to soccer image analysis. Computer Vision and Image Understanding 90(2), 190–216 (2003)
18. Yao, J., Dash, M., Tan, S.T., Liu, H.: Entropy-based fuzzy clustering and fuzzy modeling. Fuzzy Sets Syst. 113(3), 381–388 (2000)