2013 5th Computer Science and Electronic Engineering Conference (CEEC)

University of Essex, UK

Adaptive Clustering based Segmentation for Image Classification

Hanan Al-Jubouri, Hongbo Du, Harin Sellahewa

Department of Applied Computing, University of Buckingham, Buckingham, MK18 1EG, UK
{hanan.al-jubouri, hongbo.du, harin.sellahewa}@buckingham.ac.uk

Abstract: Image segmentation based on clustering low-level image features such as colour and texture has been successfully employed in image classification and content-based image retrieval. In segmentation-based image classification, the role of clustering is to segment an image into its relevant constituents, which represent the visual content of the image as well as its semantic content. However, image content can vary from a simple foreground object on a regular background to multiple objects of different sizes, shapes, colours and textures in complex background scenes. This makes automatic image classification a challenging task. This paper evaluates three adaptive clustering algorithms of different categories, i.e., partition-based, model-based, and density-based, in segmenting local colour and texture features for image classification. Experiments are conducted on the publicly available WANG database. The results show that the adaptive EM/GMM algorithm outperforms the adaptive k-means and mean shift algorithms.

Keywords: Clustering; Content-Based Image Retrieval; K-means; EM/GMM; Mean shift; DCT; Classification

978-1-4799-0383-2/13/$31.00 ©2013 IEEE

I. INTRODUCTION

The proliferation of imaging devices and their use in a variety of applications mean that a huge volume of digital images is captured each day. Effective storage and retrieval of such large collections of images pose interesting challenges to researchers in image processing and computer vision. Image classification offers an automatic way to speed up and improve the accuracy of image retrieval in large-scale databases when semantic labels of images are predefined. However, in most cases, such semantic labels or annotations are nonexistent. Manual labeling is laborious and inconsistent due to subjective interpretations. In such a context, automatic image classification could offer an effective and useful tool to label the ever-increasing volume of images.

Classifying unseen images into a set of labeled classes according to some semantic concepts requires effective means of representing images' visual features such as colour, texture and shape. Image segmentation by clustering image features has shown promising results in bridging the "semantic gap" between the high-level semantics of an image and its low-level features [1]. In general, colour is known to be useful in representing patterns in an image, and texture is effective in measuring structure, orientation, roughness, smoothness, or regularity differences of different regions in an image. Therefore, integrating colour and texture makes sense for image segmentation, and this has been successfully demonstrated in recent works on image classification and content-based image retrieval [2, 3, 4, 5]. Recently, Ilea and Whelan [3] presented a review of image segmentation algorithms based on the integration of colour and texture features. The review indicates that fusing these two features at the data level promises a robust segmentation.

Factors such as illumination variations, resolution and occluded objects in the images make the problem of classifying images according to their semantic content very difficult. This is addressed by using local features instead of global features for distinguishing images from different categories. For example, Li and Peng [6] proposed a multilevel image segmentation method that recursively segments the image into a hierarchical structure. The information from segmented regions at different levels in the image is then exploited to construct visual vocabularies, and an SVM classifier is used for image classification. Nezamabadi and Saryazdi [5] presented an object indexing method that first extracts local colour and texture features from Discrete Cosine Transform (DCT) coefficients of an image in the YCbCr colour space, and then groups the features into 10 clusters by using the k-means clustering algorithm. The centroids of the five biggest clusters are used to index the image. The Chi-square distance is then used to measure dissimilarities between a query image and stored images for image classification.

The method in [5] was improved by Al-Jubouri et al. [2] with respect to the clustering algorithm used in segmentation and the similarity measures used to compare two images. In [2], image segmentation is performed by using the adaptive EM/GMM clustering algorithm CLUST. The algorithm automatically determines the number of clusters, i.e., image segments, overcoming the limitation of the fixed number of clusters used in [5]. The mean vectors of the segments are then used to index the image, and the City-block distance function is used within an adaptive dissimilarity measure. The experiments conducted on the WANG database [7] show better classification results with the adaptive EM/GMM clustering than those reported in [5].

The adaptive EM/GMM clustering is a model-based method and only one of many different adaptive clustering algorithms. In this paper, we extend our previous work in [2] by using other adaptive clustering algorithms that are partition-based and density-based. More specifically, we use adaptive K-means and the Mean Shift algorithm to segment images and present a performance evaluation of the three adaptive algorithms (i.e., including EM/GMM clustering) for image classification using the k-Nearest Neighbor (k-NN) classifier. Furthermore, the covariance matrices resulting from CLUST are used in addition to the mean vectors to index images. The purpose is to investigate the effect of the cluster shape implied by the covariance matrix of the cluster on image classification accuracy.

The rest of the paper is organized as follows. Section II overviews the clustering algorithms used for image segmentation. Section III explains the image classification process. Section IV describes the benchmark WANG database and discusses experimental results. Section V concludes the paper and outlines our future work.

II. CLUSTERING ALGORITHMS

A cluster is a collection of data objects that are similar to each other with respect to certain attributes. Clustering algorithms aim to group the data objects into clusters so that the intra-cluster similarity is maximized while the inter-cluster similarity is minimized [8]. According to the meaning of the clusters produced, clustering algorithms can be categorised into prototype-based, model-based, density-based, and graph-based solutions [8]. A prototype-based clustering algorithm initially divides data objects into k prototype clusters, and then refines these prototype clusters in an iterative process. A model-based algorithm produces a statistical model of distribution to represent the sample space from which the data objects are drawn. A density-based algorithm seeks to find dense regions where similar data objects are concentrated. Most, if not all, of these algorithms are parametric. For instance, the K-means and the EM/GMM algorithms need to know the number of clusters k in advance. In reality, a fixed value for k may not truly reflect the right number of clusters (in our case, objects in images). Therefore, it is desirable for the algorithm to determine the appropriate value of k automatically.

A. An Adaptive K-means algorithm

The basic k-means algorithm, as a partition-based method, can be outlined as follows. Initially, k data objects of the data set are randomly chosen as the centroids µ1, µ2, …, µk of the k clusters respectively. The similarity between each data object and each centroid µi (1≤i≤k) is then measured, and the object is assigned as a member of the cluster Ci of its most similar centroid. All members of each cluster are then used to calculate a mean vector as the new centroid. Once the centroids for all clusters are updated, each data object is reassigned to the cluster of its nearest centroid. The process continues until there are no more membership changes. The algorithm is widely used due to its simplicity and efficiency, but it has some well-known limitations, including nondeterministic results caused by the initial random selection of centroids, the need to fix the value of k prior to clustering, and poor-quality clusters when clusters of extremely different sizes and shapes exist [8, 9].

To overcome the limitation of a fixed number of k clusters, we use a brute-force algorithm to dynamically determine the right value of k according to cluster quality. One typical measure of cluster quality is the sum of squared errors (SSE). For k clusters C1, C2, …, Ck with their centroid mean vectors µ1, µ2, …, µk, the SSE of the k clusters is expressed as:

\[ SSE = \sum_{j=1}^{k} \sum_{x_i \in C_j} \left\| x_i - \mu_j \right\|^2 \]

Let C(k) represent the set of clusters obtained at round k. A brute-force adaptive algorithm is then outlined as follows:

Step 1: For k = 2 to M (e.g. M = 10) do
   a) Run the basic k-means algorithm to detect k clusters;
   b) Save the clustering result C(k);
   c) Calculate SSE(k);
Step 2: For each k, calculate the value of the second-order derivative as follows:

\[ f(k) = SSE(k-1) - 2\,SSE(k) + SSE(k+1) \]

Step 3: Select the k with the maximum value of f(k), and take the corresponding C(k) as the final outcome.

Although a number of different adaptive k-means algorithms do exist, e.g. [10], this algorithm is simple and selects the outcome at the point where the quality of the clusters improves the most, i.e. where the SSE score is reduced the most.

B. CLUST: An EM/GMM algorithm

The Expectation-Maximization Gaussian Mixture Model (EM/GMM) algorithm is employed for maximum likelihood estimation. Being a model-based method, the algorithm considers that a given set of data objects can be modeled by a collection of k Gaussian distributions Θ = {θ1, θ2, …, θk}, where θi = (ai, µi, Ri) represents a Gaussian with the prior probability ai, the mean vector µi and the covariance matrix Ri. The algorithm first sets random estimates of the parameters and then refines the estimates in an iterative process. The EM algorithm can be used to estimate parameters for any type of distribution. The steps of the process are as follows [8]:


Step 1: Initialization: Estimate the parameters of the k distributions randomly.
Step 2: Expectation: Calculate the probability that each data object belongs to each of the distributions based on the estimated parameters.
Step 3: Maximization: Use the probabilities computed in Step 2 to find new estimates for the parameters of the distributions. The new estimates must maximize the likelihood of the distributions fitting the data objects.
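The three steps above can be illustrated with a minimal sketch for one-dimensional data, where the covariance matrix of each Gaussian reduces to a scalar variance. This is not the CLUST implementation: the function names, the caller-supplied starting estimates (used instead of random initialization so that the run is repeatable), the stopping rule on the change in the means, and the variance floor are all assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, tol=1e-6, max_iter=200):
    """Refine the parameters of a 1-D Gaussian mixture with EM.

    Step 1 (initialization) is left to the caller: the starting means,
    variances and prior weights are passed in (hypothetical interface).
    No guard is included for a component that receives no data.
    """
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(max_iter):
        # Step 2 (Expectation): probability that each object belongs
        # to each distribution under the current estimates.
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # Step 3 (Maximization): re-estimate priors, means and variances.
        old_means = list(means)
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # assumed floor to avoid a zero variance
        # Stop once the estimates of the means no longer change noticeably.
        if max(abs(a - b) for a, b in zip(means, old_means)) < tol:
            break
    return means, variances, weights
```

For instance, calling `em_gmm_1d([0.9, 1.0, 1.1, 4.9, 5.0, 5.1], [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])` should move the two means close to 1.0 and 5.0 with roughly equal prior weights.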


The process terminates if the new estimates do not change or the difference between the current estimates and the previous estimates is below a given threshold. Bouman [11] proposed the CLUST algorithm, which applies Rissanen's Minimum Description Length (MDL) principle to determine the optimal number k of distributions. Starting with a large value for k and merging the closest pair of clusters at each step (i.e. an agglomerative hierarchical technique) until k = 1, the algorithm derives the best-fit GMM for the data set at each value of k using the EM algorithm and calculates Rissanen's MDL measure. The algorithm then selects the value of k associated with the minimum MDL measure.
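Both adaptive schemes in this section come down to scoring every candidate k and keeping the best one. The sketch below shows only that selection step, assuming the per-k scores have already been computed. The function names and all numbers are illustrative, not taken from the experiments, and skipping the two end values of k (where f(k) has no neighbour on one side) is an assumption, since the text does not say how they are handled.

```python
def second_difference(sse):
    """f(k) = SSE(k-1) - 2*SSE(k) + SSE(k+1), for every k that has both neighbours."""
    return {
        k: sse[k - 1] - 2 * sse[k] + sse[k + 1]
        for k in sse
        if (k - 1) in sse and (k + 1) in sse
    }


def select_k_by_second_difference(sse):
    """Adaptive k-means rule: take the k with the largest second difference."""
    f = second_difference(sse)
    return max(f, key=f.get)


def select_k_by_min_score(score):
    """CLUST-style rule: take the k with the smallest criterion value (e.g. MDL)."""
    return min(score, key=score.get)


# Hypothetical SSE values for k = 2..6 and hypothetical MDL values for k = 1..5.
sse = {2: 100.0, 3: 40.0, 4: 30.0, 5: 25.0, 6: 22.0}
mdl = {1: 520.0, 2: 480.5, 3: 455.2, 4: 461.0, 5: 470.3}

best_k_kmeans = select_k_by_second_difference(sse)
best_k_clust = select_k_by_min_score(mdl)
```

With these made-up scores, the SSE curve bends most sharply at k = 3 and the MDL curve has its minimum at k = 3, so both rules select three clusters.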

C. Mean Shift algorithm

The Mean Shift algorithm is a density-based clustering method that does not require a predefined number of clusters [12]. The algorithm considers clusters in the d-dimensional feature space as dense regions of underlying distributions. For each data point, a gradient ascent procedure on the locally estimated density is followed, using an estimated probability density function, until convergence. The stationary points of this procedure represent the local maxima or modes of the distribution. The data points that eventually ascend to the same stationary point are considered members of the same cluster. Given n data points xi, i = 1, …, n in the d-dimensional space R^d, the multivariate density estimate obtained with kernel K(x) and window radius h (bandwidth) is defined as

\[ f(x) = \frac{1}{n h^{d}} \sum_{i=1}^{n} K\!\left( \frac{x - x_i}{h} \right) \]

The steps of the process can be summarized as follows:

Step 1: Choose the kernel and the bandwidth.
Step 2: Repeat for each point:
   a) Center the window at that point;
   b) Calculate the mean of the data within the window radius;
   c) Center the window at the new mean location;
   d) Repeat steps b and c until convergence.
Step 3: Assign points that lead to nearby modes to the same cluster.

Density-based methods are capable of finding arbitrarily shaped clusters, but they have limitations in handling high-dimensional data, where the very concept of density becomes unclear as data objects become more spread out [14]. Due to these limitations of density-based clustering algorithms in high dimensions, we have decided to apply the Mean Shift algorithm only to the 3D DCT colour features, i.e. (f0, f1, f2), which are defined in Section III.B.

III. IMAGE CLASSIFICATION FRAMEWORK

The image classification process adopted in our work is illustrated in Fig. 1 and its key stages are described below.

[Figure 1: block diagram. Images Im and Im1, …, Imn each pass through RGB into YCbCr Conversion, Feature Extraction and Segmentation; the results feed Similarity Measures, followed by Classification.]
Figure 1. Image classification framework.

A. Pre-processing

A colour image is converted from the RGB colour space into the YCbCr colour space to separate the texture information in the Y channel from the colour information in the Cb and Cr channels.

B. Local feature extraction

We extract local colour and texture features as proposed in [5]. First, an image is divided into 8x8 blocks and the DCT operation is applied to each block for the Y, Cb and Cr channels respectively. Then each resulting 8x8 block of DCT coefficients is further partitioned into sub-blocks B0, B1, …, B9 as shown in Fig. 2. Finally, a 12-dimensional local feature vector (f0, f1, …, f11) is extracted from each 8x8 block to capture colour and texture information as defined below:

- f0: CY(0,0)/8 at B0 from the Y channel;
- f1: CCb(0,0)/8 at B0 from the Cb channel;
- f2: CCr(0,0)/8 at B0 from the Cr channel;
- f3, f4, f5: CY(0,1), CY(1,0), CY(1,1) at B1, B2 and B3 from the Y channel respectively;
- f6, f7, …, f11: σ(B4), σ(B5), …, σ(B9) from the Y channel respectively, where σ(Bi) represents the standard deviation of the coefficients in sub-block Bi.

[Figure 2: partition of an 8x8 block of DCT coefficients into sub-blocks B0 to B9.]
Figure 2. 8x8 block DCT

C. Segmentation

The extracted local colour and texture features, as described above, are fed to a clustering algorithm to segment them into homogeneous clusters. The centroid of each segment is used as the representation of the segment, and the centroids of all segments collectively form a feature vector for the image. Fig. 3 shows an example image in the WANG database and the outcome of the segmentation by the three clustering algorithms. For this image, the CLUST algorithm produced 5 segments, while adaptive k-means produced only 3 segments, and the Mean Shift algorithm produced 10. The segments produced by the CLUST algorithm appear "closer" to the objects in the image than those produced by the adaptive K-means algorithm and the Mean Shift algorithm.
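As a rough illustration of the luminance features in Section III.B, the sketch below computes the 2-D DCT of one 8x8 block of Y values and reads off f0, f3, f4 and f5. It assumes the orthonormal type-II DCT and a (row, column) reading of the coefficient indices, neither of which is stated in the text; the function names are hypothetical. The features f6 to f11 are left out because they depend on the sub-block layout of Fig. 2.

```python
import math

N = 8  # block size used in the paper


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows (assumed normalisation)."""
    def alpha(u):
        return math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = alpha(u) * alpha(v) * s
    return coeffs


def luminance_features(y_block):
    """Return (f0, f3, f4, f5) for one 8x8 block of the Y channel."""
    c = dct2(y_block)
    return c[0][0] / 8, c[0][1], c[1][0], c[1][1]
```

Under this normalisation the DC coefficient of an 8x8 block is eight times the block mean, so f0 = CY(0,0)/8 is simply the average luminance of the block; a flat block of value 100 gives f0 = 100 and zero for the three low-frequency terms.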

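The segmentation stage of Section III.C (cluster the local features, then keep one centroid per segment) can be sketched with Steps 1 to 3 of the Mean Shift procedure from Section II.C as the clustering stage. This is a toy version under stated assumptions: a flat kernel, Euclidean distance, modes merged when they lie closer than one bandwidth, and hypothetical function names. It requires Python 3.8 or later for `math.dist`.

```python
import math


def mean_shift(points, bandwidth, tol=1e-6, max_iter=100):
    """Flat-kernel mean shift; returns one cluster label per input point."""
    modes = []
    for p in points:
        centre = p                                   # Step 2a: window centred at the point
        for _ in range(max_iter):
            window = [q for q in points if math.dist(q, centre) <= bandwidth]
            new_centre = tuple(sum(c) / len(window) for c in zip(*window))  # Step 2b
            shift = math.dist(new_centre, centre)
            centre = new_centre                      # Step 2c: move the window
            if shift < tol:                          # Step 2d: converged
                break
        modes.append(centre)
    labels, found = [], []
    for m in modes:                                  # Step 3: group nearby modes
        for idx, f in enumerate(found):
            if math.dist(m, f) < bandwidth:
                labels.append(idx)
                break
        else:
            found.append(m)
            labels.append(len(found) - 1)
    return labels


def segment_centroids(points, labels):
    """Centroid of each segment; together they form the image's feature vector."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return [tuple(sum(c) / len(g) for c in zip(*g)) for _, g in sorted(groups.items())]
```

On six made-up 3-D colour features, three near (1, 1, 1) and three near (5, 5, 5), a bandwidth of 1.0 yields two segments whose centroids sit at those two locations. The bandwidth value is arbitrary here and would need tuning on real features.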

[Figure 3 panels: (a) Original image; (b) Segmentation by CLUST; (c) Segmentation by k-means; (d) Segmentation by Mean Shift.]
Figure 3. An example image segmented using CLUST, adaptive k-means, and mean shift clustering.

D. Similarity measures and classification

Due to the variable number of segments in images, we use the dissimilarity measure reported in [2] to compare two images. Let c^Q = {c_1^Q, …, c_n^Q} be the set of segment centroids for the query image Q, and c^B = {c_1^B, …, c_m^B} be the set of segment centroids for a database image B. The collection of all pairwise distances between the centroids of c^Q and c^B, measured by a distance function d, can be arranged into a distance matrix. The dissimilarity between Q and B is then defined as:

\[ D(Q, B) = \sum_{i=1}^{n} \min_{j \in \{1, \ldots, m\}} d\left( c_i^{Q}, c_j^{B} \right) \]

Table I shows an example distance matrix. The distance between Q and B is 0.176 + 0.1063 + 0.158 + 0.2713 = 0.7116. Any appropriate distance function can be applied in measuring the distance between two centroids. We use the City-block (DL1) distance function on DCT colour and texture features with the adaptive K-means algorithm. Similarly, DL1 is used on DCT colour features with the Mean Shift algorithm. Besides DL1, we use the Kullback-Leibler Divergence (DKLD) [13] when comparing image segments resulting from clustering DCT colour and texture features using the CLUST algorithm. The DKLD measure considers not only the dissimilarity between the mean vectors (i.e. centroids), but also the dissimilarity of variations (i.e. covariance matrices), which are readily available from the results of the segmentation. The reason for this is to investigate the effect of the shape of the clusters on the effectiveness of image representation and to compare the results with those reported in [2], where just the centroids (i.e., mean vectors) are used to index the image. To focus on the effects of image segmentation on classification, the basic k-NN classifier with k = 5 and majority voting is used.

TABLE I. DISTANCE MATRIX

Query Image Q    Database Image B
                 1        2        3        4        5
1                0.4625   0.5046   0.3982   0.5343   0.176
2                1.3809   0.3432   0.5527   0.1063   0.6106
3                0.4401   0.3304   0.3726   0.6004   0.158
4                0.9637   0.3294   0.3232   0.2713   0.4257

IV. EXPERIMENTS, RESULTS, AND DISCUSSION

A. Experiment data and Evaluation protocol

The WANG database contains 1000 images of sizes 256x384 or 384x256. The images are divided into 10 semantic classes (African people, beach, buildings, buses, dinosaurs, elephants, flowers, horses, mountains and foods), and each class includes 100 images. Fig. 4 shows sample images of the classes [7]. We follow the commonly used leave-one-out evaluation protocol to test image classification accuracy. The classification accuracy is measured by the recall rate:

\[ \mathrm{Recall}(C) = \frac{N_{RIC}}{T_{CID}} \times 100 \]

where N_RIC is the number of images of class C correctly classified by k-NN and T_CID is the total number of images of class C in the database.

B. Results and Discussion

Table II shows the recall rates of classification based on using different clustering algorithms for image segmentation. When the City-block distance function is used, the recall rate for the CLUST algorithm is consistently higher than those for the adaptive K-means and the Mean Shift algorithms across all classes except for Building. The adaptive k-means results are also, on average, better than those for the Mean Shift algorithm, except for the Horses and Mountains classes. For some classes, CLUST outperforms the adaptive K-means by as much as 43% (e.g. Mountains) and the Mean Shift by as much as 49%. When the classification is performed using the DKLD distance measure, the CLUST algorithm still consistently outperforms the other two algorithms across different classes, with the highest difference of 34% over the adaptive K-means and 49% over the Mean Shift for the People class. The evidence shows conclusively that CLUST is more robust than the other two algorithms in terms of classification accuracy. However, when either the DL1 distance or the DKLD distance is used to classify images based on the segmentation results of the CLUST algorithm, the evidence favouring either of the two distance measures is less conclusive.

TABLE II. CLASSIFICATION FOR TEN CLASSES BASED ON CLUSTERING BY ADAPTIVE K-MEANS, EM/GMM, AND MEAN SHIFT ALGORITHMS.

Classes      Adaptive k-means   Mean Shift   CLUST      CLUST
             DL1                DL1          DL1 [1]    DKLD
Elephants    74                 61           92         85
Flowers      91                 64           99         98
Buses        81                 81           95         96
Foods        57                 52           70         76
Horses       84                 90           99         97
Mountains    41                 46           84         67
People       54                 39           61         88
Beach        35                 19           68         56
Building     53                 53           53         68
Dinosaurs    100                96           100        100
Average      67                 60           82         83

The main reason behind the better classification results of the CLUST algorithm may be that the segmentation results produced by the algorithm appear to reflect more closely the objects

in the images than those of its counterparts. In fact, on average, the number of segments produced by CLUST differs from that of the other two algorithms, as shown in Table III. The adaptive K-means algorithm often produces fewer segments, while the Mean Shift algorithm produces too many segments. Too few or too many segments may under-fit or over-fit the image content, causing additional errors at the classification stage.

TABLE III. NUMBER OF SEGMENTS BY DIFFERENT CLUSTERING ALGORITHMS.

Classes      CLUST             Adaptive k-means   Mean Shift
             Mean    Std       Mean    Std        Mean    Std
Elephants    5.33    0.865     3.17    0.450      6.37    1.840
Flowers      5.61    0.874     3.16    0.465      6.42    2.137
Buses        6.68    0.993     3.71    0.844      8.18    1.760
Foods        5.63    1.021     3.35    0.519      6.86    2.069
Horses       4.75    1.018     3.18    0.435      6.91    1.864
Mountains    5.8     0.942     3.21    0.456      6.38    1.926
People       5.18    0.891     3.29    0.498      6.71    2.166
Beach        5.69    0.800     3.14    0.376      6.53    1.898
Building     5.38    0.918     3.14    0.426      6.27    1.932
Dinosaurs    4.87    0.966     3.28    0.486      7.13    1.864

In order to gain more insight from the experimental results, we have included, as reference, the confusion matrices of the classification for each segmentation algorithm (Table IV for CLUST, Table V for adaptive K-means and Table VI for the Mean Shift). The abbreviations used in all confusion matrices are: E: Elephants, F: Flowers, B: Buses, D: Foods, H: Horses, M: Mountains, P: People, C: Beach, L: Building, and S: Dinosaurs.

TABLE IV. CONFUSION MATRIX: CLUST ALGORITHM

      E     F     B     D     H     M     P     C     L     S
E    85     0     0     2     2    10     7    10     5     0
F     1    98     0     7     0     1     0     2     3     0
B     0     0    96     2     0     4     0     6    10     0
D     4     0     3    76     0     3     1     3     3     0
H     1     1     0     0    97     0     1     2     0     0
M     1     0     0     0     0    67     1    18     2     0
P     6     1     0    13     0     3    88     2     8     0
C     1     0     1     0     1     9     1    56     1     0
L     1     0     0     0     0     3     1     1    68     0
S     0     0     0     0     0     0     0     0     0   100

TABLE V. CONFUSION MATRIX: ADAPTIVE K-MEANS ALGORITHM

      E     F     B     D     H     M     P     C     L     S
E    74     0     2     2     1    11     6     9     9     0
F     0    91     0     6     1     3     1     3     1     0
B     1     4    81    16     5    14    12    12     8     0
D     7     1    10    57     4     2    18     6    11     0
H     2     0     0     0    84     3     2     0     0     0
M     3     1     2     0     0    41     1    18     8     0
P     3     3     2    13     4     4    54     8     9     0
C     0     0     1     2     0    11     2    35     1     0
L     9     0     2     3     1    11     3     9    53     0
S     1     0     0     1     0     0     1     0     0   100

TABLE VI. CONFUSION MATRIX: MEAN SHIFT ALGORITHM

      E     F     B     D     H     M     P     C     L     S
E    61     2     5     1     2    10     8     7    11     1
F     0    64     0     3     1     0     0     1     1     0
B     5    26    81    13     1    23    23    31     9     0
D     2     6     1    52     1     1     6     1     2     0
H     4     0     1    10    90     0     3     0     2     0
M     5     0     5     0     2    46     2    13     4     1
P     4     0     2     9     2     1    39     3    10     2
C     0     1     2     2     0     7     3    19     3     0
L    12     1     2     4     0    10     6    23    53     0
S     7     0     1     6     1     2    10     2     5    96

Table IV shows that 18 'Beach' images (i.e., C) are misclassified as Mountains and 9 Mountains images are misclassified as Beach. Likewise, 6 Elephants images are misclassified as African people. This is largely due to the significant similarities between images of these classes, as can be seen from the examples in Fig. 4. The confusion matrices in Tables V and VI illustrate the relatively poor classification results of the adaptive K-means and the Mean Shift algorithms, where both false positive and false negative numbers are high.

A number of factors may have contributed to the poor performance of the Mean Shift algorithm. On the one hand, the 3D DCT colour features to which the algorithm is applied may not be representative enough to reflect the visual content of the image. On the other hand, as a density-based algorithm, the Mean Shift does not perform well if additional features on image texture are included in a high-dimensional feature vector. Also, the Mean Shift algorithm requires a bandwidth parameter that defines a local area for calculating shifted means. Setting an inappropriate value for this parameter may lead to inappropriate clustering results. Methods that adaptively set the parameter values need to be investigated.
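The two quantities that drive this evaluation, the dissimilarity D(Q, B) of Section III.D and the per-class recall of Section IV.A, can be checked directly against the published tables. The sketch below uses the values of Table I and Table IV; the function names are hypothetical. It reads each column of a confusion matrix as the actual class and each row as the assigned class, an interpretation inferred from the fact that every column of Table IV sums to the 100 images of a class.

```python
def dissimilarity(distance_matrix):
    """D(Q, B): for each query segment (row), add the distance to its closest database segment."""
    return sum(min(row) for row in distance_matrix)


def recall_per_class(confusion):
    """Recall (%) per class, reading columns as the actual class and rows as the assigned class."""
    n = len(confusion)
    recalls = []
    for c in range(n):
        total = sum(confusion[r][c] for r in range(n))  # images that truly belong to class c
        recalls.append(100.0 * confusion[c][c] / total)
    return recalls


# Table I: rows are segments of query image Q, columns are segments of database image B.
TABLE_I = [
    [0.4625, 0.5046, 0.3982, 0.5343, 0.176],
    [1.3809, 0.3432, 0.5527, 0.1063, 0.6106],
    [0.4401, 0.3304, 0.3726, 0.6004, 0.158],
    [0.9637, 0.3294, 0.3232, 0.2713, 0.4257],
]

# Table IV (CLUST), class order E, F, B, D, H, M, P, C, L, S.
TABLE_IV = [
    [85, 0, 0, 2, 2, 10, 7, 10, 5, 0],
    [1, 98, 0, 7, 0, 1, 0, 2, 3, 0],
    [0, 0, 96, 2, 0, 4, 0, 6, 10, 0],
    [4, 0, 3, 76, 0, 3, 1, 3, 3, 0],
    [1, 1, 0, 0, 97, 0, 1, 2, 0, 0],
    [1, 0, 0, 0, 0, 67, 1, 18, 2, 0],
    [6, 1, 0, 13, 0, 3, 88, 2, 8, 0],
    [1, 0, 1, 0, 1, 9, 1, 56, 1, 0],
    [1, 0, 0, 0, 0, 3, 1, 1, 68, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 100],
]

d_qb = dissimilarity(TABLE_I)
clust_recall = recall_per_class(TABLE_IV)
```

The first call reproduces the worked value 0.7116 from Section III.D, and the recalls computed from Table IV match the CLUST DKLD column of Table II (85, 98, 96, 76, 97, 67, 88, 56, 68, 100).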

V. CONCLUSION AND FUTURE WORK

In this paper, we presented a performance evaluation of three adaptive clustering algorithms for image segmentation and classification. The test results show that the model-based clustering algorithm CLUST outperforms the partition-based adaptive K-means method and the density-based Mean Shift algorithm for most of the 10 classes when the City-block distance function is used to measure similarity between two images. This means that local feature clustering using the CLUST algorithm is more effective in segmenting image contents than using adaptive K-means or Mean Shift clustering. The results also reveal that incorporating the shapes of image segments into the calculation of dissimilarity, using the Kullback-Leibler Divergence, improves accuracy levels for some classes but not all. However, the improvements in accuracy occur for the classes where other segmentation methods do not work well, demonstrating the potential of this dissimilarity measure. We plan to investigate another category of clustering methods known as graph-based algorithms. It will be interesting to see how algorithms such as CHAMELEON [15] perform against model-based methods when used for image classification.

[Figure 4 panels: a. African People; b. Elephants; c. Beach; d. Mountains; e. Horses; f. Foods; g. Buses; h. Buildings; i. Dinosaurs; j. Flowers.]
Figure 4. Sample images from the WANG Database.

ACKNOWLEDGMENT

The author Hanan Al-Jubouri would like to thank the Ministry of Higher Education and Scientific Research (MOHESR) and Al-Mustansiriya University in Iraq for sponsoring her DPhil study.

REFERENCES

[1] A. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-based image retrieval at the end of the early years," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 1349-1380, 2000.
[2] H. Al-Jubouri, H. Du, and H. Sellahewa, "Applying Gaussian Mixture Model on discrete cosine features for image segmentation and classification," Computer Science and Electronic Engineering Conference (CEEC), 2012 4th, pp. 194-199, 2012.
[3] D. E. Ilea and P. F. Whelan, "Image segmentation based on the integration of colour-texture descriptors - A review," Pattern Recognition, vol. 44, no. 10-11, pp. 2479-2501, 2011.
[4] R. Datta, D. Joshi, J. Li, and J. Z. Wang, "Image retrieval: ideas, influences, and trends of the new age," ACM Computing Surveys, vol. 40, pp. 1-60, 2008.
[5] H. Nezamabadi and S. Saryazdi, "Object-based image indexing and retrieval in DCT domain using clustering techniques," Transactions in World Academy of Science, Engineering and Technology, vol. 3, pp. 207-210, 2004.
[6] H. Li and Y. Peng, "Effective multi-level image representation for image categorization," in Pattern Recognition (ICPR), 2010 20th International Conference on, pp. 1048-1051, IEEE, 2010.
[7] J. Z. Wang, J. Li, and G. Wiederhold, "SIMPLIcity: semantics-sensitive integrated matching for picture libraries," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, pp. 947-963, 2001.
[8] H. Du, Data Mining Techniques and Applications: An Introduction. Cengage Learning EMEA, 2010.
[9] K. S. Candan and M. L. Sapino, Data Management for Multimedia Retrieval. Cambridge University Press, New York, 2010.
[10] S. K. Bhatia, "Adaptive K-Means Clustering," in Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference, Miami Beach, Florida, USA, pp. 695-699, 2004.
[11] C. Bouman, "Cluster: an unsupervised algorithm for modeling Gaussian mixtures," 1997, http://engineering.purdue.edu/~bouman/software/cluster/, Purdue University, accessed 9 June 2013.
[12] D. Comaniciu and P. Meer, "Mean shift: a robust approach toward feature space analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 603-619, 2002.
[13] T. A. Myrvoll and F. K. Soong, "On divergence based clustering of normal distributions and its application to HMM adaptation," Eurospeech 2003, Geneva, 2003.
[14] A. K. Jain, "Data clustering: 50 years beyond K-means," Pattern Recognition Letters, vol. 31, pp. 651-666, 2010.
[15] G. Karypis, E. H. Han, and V. Kumar, "Chameleon: hierarchical clustering using dynamic modeling," Computer, vol. 32, no. 8, pp. 68-75, 1999.