Clustering of Digital Images based on Color histogram

Palestine Polytechnic University

The Second Students Innovative Conference (SIC2013), June 12, 2013, Hebron, State of Palestine

Anas A. Amro Master of Informatics Palestine Polytechnic University Hebron, Palestine [email protected]

Ibrahim N. Nassar Master of Informatics Palestine Polytechnic University Hebron, Palestine [email protected]

Abstract— Clustering is the process of partitioning or grouping a given set of patterns into disjoint clusters, such that patterns in the same cluster are alike and patterns belonging to different clusters are different. This paper presents a new approach to image clustering that applies the k-means algorithm to color histograms. When millions of mixed images must be grouped into sets of similar images, clustering with this method can be beneficial because of its efficiency.

Keywords-Clustering; K-means algorithm; Color histogram.

I. INTRODUCTION

Clustering is a method by which a large set of data is grouped into smaller sets (clusters) of similar data. Computer-assisted analysis must partition objects into groups and must provide an explanation for this partitioning [1]. Many clustering methods exist to partition a data set by some natural measure of similarity [2]. This similarity measure places similar objects close to one another so that they form a group, and in this way several clusters of related objects are formed. An ideal clustering algorithm classifies data such that samples belonging to the same cluster are close to each other, while samples from different clusters are farther apart.

Many clustering algorithms are available. A popular one is k-means: given a number of clusters, the algorithm iterates to find the best clusters for the objects. This paper discusses a method for clustering images using the k-means algorithm and color histograms. K-means clustering is an effective algorithm for extracting a given number of clusters of patterns from a training set.

The process of clustering many images has several phases. First, the images are read from a specified folder. Then the color histogram of each image is calculated, and the distance between each image and all other images is calculated to find similarities. Finally, the images are grouped together according to their color similarities.

The rest of this paper is organized as follows: Section 2 discusses the k-means algorithm and the color histogram. Section 3 presents the methodology. Finally, Section 4 presents the conclusion and results.

Hashim Tamimi College of IT and Computer Eng. Palestine Polytechnic University Hebron, Palestine [email protected]

II. BACKGROUND

A. K-means algorithm

The k-means algorithm is a method for clustering objects into k partitions based on their attributes. It assumes that the k clusters exhibit Gaussian distributions and that the object attributes form a vector space. Its objective is to minimize the total intra-cluster variance. The points are clustered around centroids, which are obtained by minimizing the objective

$$\sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2 \qquad (1)$$

where there are k clusters $S_i$, $i = 1, 2, \dots, k$, and $\mu_i$ is the centroid or mean point of all the points $x_j \in S_i$.

As a part of this project, an iterative version of the algorithm was implemented. The steps of the algorithm are as follows:

1. Compute the intensity distribution (also called the histogram) of the intensities.
2. Initialize the centroids with k random intensities.
3. Repeat the following steps until the cluster labels of the image no longer change.
4. Cluster the points based on the distance of their intensities from the centroid intensities:

$$c^{(i)} = \arg\min_{j} \lVert x^{(i)} - \mu_j \rVert^2 \qquad (2)$$

5. Compute the new centroid for each of the clusters:

$$\mu_j = \frac{\sum_{i} 1\{c^{(i)} = j\}\, x^{(i)}}{\sum_{i} 1\{c^{(i)} = j\}} \qquad (3)$$

where k is a parameter of the algorithm (the number of clusters to be found), i iterates over all the intensities, j iterates over all the centroids, and $\mu_j$ are the centroid intensities [3].
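The iteration in steps 2 to 5 can be illustrated with a minimal sketch in plain Python. This is not the implementation used in the project. The function names `kmeans` and `squared_distance`, the `max_iter` and `seed` parameters, and the choice to work on general feature vectors rather than scalar intensities are assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length vectors) into k groups.

    Returns (labels, centroids). `max_iter` and `seed` are illustrative
    parameters, not taken from the paper.
    """
    rng = random.Random(seed)
    # Step 2: initialize the centroids with k randomly chosen points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 4: assign each point to its nearest centroid, as in Eq. (2).
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Step 3: stop once the labels no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 5: move each centroid to the mean of its members, as in Eq. (3).
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

For example, `kmeans([[0.0], [1.0], [10.0], [11.0]], k=2)` places the first two points in one cluster and the last two in the other. The result of k-means generally depends on the random initialization, so fixing the seed here only makes the sketch reproducible.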

B. Color Histogram

A color histogram is a collection of counts of data organized into a set of predefined bins. The data need not be intensity values: any feature that is useful for describing the image can be collected. The data contained in a digital image can also be displayed as a histogram, that is, a plot of the pixel values against the number of pixels that have each value.
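As an illustration of binning, the following sketch counts the red, green, and blue values of an image into a fixed number of bins per channel. It assumes that the image has already been decoded into a list of `(r, g, b)` tuples with values from 0 to 255. The function name `color_histogram`, the concatenation of the three channels, and the normalization by the pixel count are assumptions, not details taken from the paper.

```python
def color_histogram(pixels, bins=8):
    """Return a per-channel color histogram as one flat list.

    `pixels` is a list of (r, g, b) tuples with values in 0..255.
    The result has 3 * bins entries: red bins, then green, then blue.
    Counts are divided by the number of pixels (an assumption) so that
    images of different sizes remain comparable.
    """
    hist = [0.0] * (3 * bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            index = value * bins // 256  # map 0..255 onto 0..bins-1
            hist[channel * bins + index] += 1
    total = len(pixels)
    return [count / total for count in hist] if total else hist
```

With `bins=2`, the four pixels `(255, 0, 0)`, `(255, 0, 0)`, `(0, 0, 255)`, `(0, 255, 0)` give `[0.5, 0.5, 0.75, 0.25, 0.75, 0.25]`: half of the red values fall in the upper bin, while three quarters of the green and blue values fall in the lower bin.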

III. METHODOLOGY

[Figure 1 (flowchart): Start → Input name of images' folder, number of clusters, name of output folder → Histogram (# of bins) → K-means (# of k) → Image new folders based on # of k → End]

Figure 1: Image clustering based on k-means algorithm and histogram

Figure 1 shows the steps used to group the images into k clusters. First, we enter the name of the folder that contains the images, which may be of many types. Next, we input the number of clusters k. We also enter a name for the new output folder, which will contain the image files grouped by cluster. The program then computes a histogram with the chosen number of bins for each image, runs k-means with the chosen k, and places the images into new folders according to their clusters.

IV. CONCLUSION AND RESULTS

We have successfully implemented the k-means clustering algorithm. After reading the images in the folder, the program computes a histogram for each image, compares each image with the others, and finally places all of them into new folders according to their clusters. As k increases, the running time increases; this relationship is shown in Figure 2. Likewise, as the number of bins increases, the running time increases; this result is shown in Figure 3.

[Figure 2: the relationship between k and time]

[Figure 3: the relationship between bins and clusters]
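The last step of the workflow, placing each image in a folder for its cluster, might look like the sketch below. It assumes that a cluster label for each image has already been obtained by running k-means on the image histograms. The function name `group_into_folders` and the folder naming scheme `cluster_<label>` are hypothetical and are not taken from the implementation described here.

```python
import shutil
from pathlib import Path


def group_into_folders(image_paths, labels, output_dir):
    """Copy each image into output_dir/cluster_<label>.

    `image_paths` and `labels` are parallel sequences: labels[i] is the
    cluster index assigned to image_paths[i]. Returns the new file paths.
    """
    output_dir = Path(output_dir)
    copied = []
    for path, label in zip(image_paths, labels):
        # One sub-folder per cluster; the name pattern is an assumption.
        target_dir = output_dir / f"cluster_{label}"
        target_dir.mkdir(parents=True, exist_ok=True)
        # copy2 keeps the original file in place and preserves metadata.
        copied.append(Path(shutil.copy2(path, target_dir)))
    return copied
```

Copying rather than moving leaves the input folder untouched, so the clustering can be repeated with a different k or a different number of bins.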

REFERENCES

[1] M. J. A. Berry, G. Linoff, Data Mining Techniques for Marketing, Sales and Customer Support. John Wiley & Sons, NY, USA, 1997.
[2] M. S. Aldenderfer, R. K. Blashfield, Cluster Analysis. Sage Publications, Beverly Hills, USA, 1984.
[3] S. Clerk Tatiraju, Avi Mehta, Image Segmentation using k-means clustering, EM and Normalized Cuts.
[4] Team, O. D. (2011). OpenCV 2.4.5.0 documentation (Clustering, Histogram). Retrieved 5 1, 2013, from OpenCV: http://docs.opencv.org/modules/core/doc/
