an efficient content based image retrieval using color ...

1 downloads 0 Views 187KB Size Report
An integrated matching scheme based on Most. Similar Highest Priority (MSHP) principle is used to compare the query and target image. The adjacency.
Ch.Kavitha et al. / International Journal of Engineering Science and Technology (IJEST)

AN EFFICIENT CONTENT BASED IMAGE RETRIEVAL USING COLOR AND TEXTURE OF IMAGE SUBBLOCKS CH.KAVITHA Associate Professor, Department of Information Technology Gudlavalleru Engineering College Gudlavalleru, Krishna (Dist.), A.P, India

Dr. B.PRABHAKARA RAO Professor & Director of Evaluation Jawaharlal Nehru Technological University Kakinada Kakinada, East Godavari (Dist.), A.P, India

Dr. A.GOVARDHAN Professor & Principal JNTUH College of Engineering Jagtial, Karimnagar (Dist.), A.P, India Abstract : Image retrieval is an active research area in image processing, pattern recognition, and computer vision. For the purpose of effectively retrieving more similar images from the digital image databases, this paper uses the local HSV color and Gray level co-occurrence matrix (GLCM) texture features. The image is divided into sub blocks of equal size. Then the color and texture features of each sub-block are computed. Color of each sub-block is extracted by quantifying the HSV color space into non-equal intervals and the color feature is represented by cumulative color histogram. Texture of each sub-block is obtained by using gray level co-occurrence matrix. An integrated matching scheme based on Most Similar Highest Priority (MSHP) principle is used to compare the query and target image. The adjacency matrix of a bipartite graph is formed using the sub-blocks of query and target image. This matrix is used for matching the images. Euclidean distance measure is used in retrieving the similar images. As the experimental results indicated, the proposed technique indeed outperforms other retrieval schemes in terms of average precision. Keywords: Image retrieval, HSV color, texture, GLCM, integrated matching, MSHP 1. Introduction Image retrieval systems attempt to search through a database to find images that are perceptually similar to a query image. CBIR is an important alternative and complement to traditional text-based image searching and can greatly enhance the accuracy of the information being returned. It aims to develop an efficient visualcontent-based technique to search, browse and retrieve relevant images from large-scale digital image collections. Most of the CBIR techniques automatically extract low-level features (e.g. color, texture, shapes and layout of objects) to measure the similarities among images by comparing the feature differences. The need for efficient image retrieval is increased tremendously in many application areas such as medical imaging, military, digital library and computer aided design [1]. There are various CBIR systems with global features [2,3,4] and local features [4,5,6]. From these systems it is observed that local features based systems play a significant role in determining similarity of images. Using a single feature for image retrieval can not be a good solution for the accuracy and efficiency. High-dimensional feature will reduce the query efficiency; low-dimensional feature will reduce query accuracy, so it may be a better way to use multi features for image retrieval. Color and texture are the most important visual features. Firstly, we discuss the color and texture features separately. On this basis, a new

ISSN : 0975-5462

Vol. 3 No. 2 Feb 2011

1060

Ch.Kavitha et al. / International Journal of Engineering Science and Technology (IJEST)

method using integrated features is provided, and experiment is done on the real images, satisfactory result is achieved with integrated matching scheme based on MSHP principle. In [12] global HSV color and GLCM texture features are used to retrieve the images. HSV color space is widely used in computer graphics, visualization in scientific computing and other fields [9,19]. In this space, hue is used to distinguish colors, saturation is the percentage of white light added to a pure color and value refers to the perceived light intensity [10]. The advantage of HSV color space is its ability to separate chromatic and achromatic components. Therefore we selected the HSV color space to extract the color features according to hue, saturation and value. Texture feature is a kind of visual characteristics that does not rely on color or intensity and reflects the intrinsic phenomenon of images. It is the total of all the intrinsic surface properties. That is why the texture features has been widely used in image retrieval. Many objects in an image can be distinguished solely by their textures without any other information. There is no universal definition of texture. Texture may consist of some basic primitives, and may also describe the structural arrangement of a region and the relationship of the surrounding regions [7,17,20]. In our approach we have used the statistic texture features using gray-level co-occurrence matrix (GLCM).So we developed a technique which captures color and texture features of sub-blocks of the image. For each sub-block cumulative histogram and statistic texture features using GLCM are determined. In [14] an integrated matching procedure based on MSHP principle is used to find the matching between query and target image. The target and query images are matched with integrated matching scheme based on MSHP principle. Weighted Euclidean distance is used as a similarity measure in retrieving similar images. The section 2 outlines proposed method. The section 3 deals with experimental setup. The section 4 presents results. The section 5 presents conclusions. 2. Proposed method The proposed method is based on color and texture features of image sub-blocks with matching based on most similar highest priority principle. Our method of retrieving the images consists of four basic steps. 2.1. Partitioning the image into sub-blocks Firstly the image is partitioned into 6 (2X3) equal sized sub-blocks as shown in Fig.1. The size of the subblock in an image of size 256X384 is 128X128. The images with other than 256X384 size are resized to 256X384.

Fig. 1 partitioned image

2.2. Extraction of color of an image sub-block In the HSV color space, each component occupies a large range of values. If we directly use H, S and V components to represent the color feature, it requires lot of computation. So it is better to quantify the HSV color space into non-equal intervals. At the same time, because the power of human eye to distinguish colors is limited, we do not need to calculate all segments. Unequal interval quantization according the human color perception has been applied on H, S, and V components. Based on the color model of substantial analysis, we divided color into eight parts. Saturation and intensity is divided into three parts separately in accordance with the human eyes to distinguish [9,12,13]. In accordance with the different colors and subjective color perception quantification, quantified hue(H), saturation(S) and value(V) are showed as equation 1.

ISSN : 0975-5462

Vol. 3 No. 2 Feb 2011

1061

Ch.Kavitha et al. / International Journal of Engineering Science and Technology (IJEST)

      H      

o if h  316 , 20  1 if h  21, 40 

2 if h  41,75  3 if h  76 , 155  4 if h  156 , 190 

5 if h  191, 270  6 if h  271, 295 

7 if h  296 , 315 

 o if s  0, 0 .2   S  1 if s  0.2, 0.7   2 if s  0.7 , 1 

(1)

 o if v  0, 0.2   V   1 if v  0.2, 0.7   2 if v  0 .7, 1 

In accordance with the quantization level above, three-dimensional feature vector for different values of H,S,V with different weight to form one-dimensional feature vector named G: G=QsQvH+QvS+V

(2)

Where Qs is quantified series of S, Qv is quantified series of V. Here we set Qs=Qv =3, then G =9H+ 3S +V

(3)

In this way, three-component vector of HSV form One-dimensional vector, which quantize the whole color space for the 72 kinds of main colors. So we can handle 72bins of one-dimensional histogram. This quantification can be effective by reducing the computational time and complexity. It will be much of the deviation of the calculation of the similarity if we do not normalize, so we must normalize the components to the same range. The process of normalization is to make the components of feature vector equal importance. Color histogram is derived by first quantizing colors in the image into 72 bins in HSV color space, and counting the number of image pixels in each bin. One of the weaknesses of color histogram is that when the characteristics of images should not take over all the values, the statistical histogram will appear in a number of zero values. The emergence of these zero values would make similarity measure does not accurately reflect the color difference between images and statistical histogram method to quantify more sensitive parameters. Therefore, this paper represents the one-dimensional vector G by constructing a cumulative histogram of the color characteristics of image after using non-interval HSV quantization [12,13] for G. 2.3. Extraction of texture of an image sub-block GLCM [11,13] creates a matrix with the directions and distances between pixels, and then extracts meaningful statistics from the matrix as texture features. GLCM texture features commonly used are shown in the following. GLCM is composed of the probability value, it is defined by P(i, j d ,  ) which expresses the probability of the couple pixels at direction and d interval. When and d is determined, P(i, j d , ) is showed by Pi, j. Distinctly GLCM is a symmetry matrix; its level is determined by the image gray-level. Elements in the matrix are computed by the equation shown below: P(i, j d ,  ) 

P(i, j d ,  )

 i

j

P(i, j d ,  )

( 4)

GLCM expresses the texture feature according the correlation of the couple pixels gray-level at different positions. It quantificationally describes the texture feature. In this paper, four features are selected; include energy, contrast, entropy, inverse difference. Energy = E 

  P x , y 

2

x

(5)

y

It is a gray-scale image texture measure homogeneity changing, reflecting the distribution of image gray-scale uniformity of weight and texture. Contrast I =

 x  y 

2

P  x, y 

(6)

Contrast is the main diagonal near the moment of inertia, which measure the value of the matrix is distributed and images of local changes in number, reflecting the image clarity and texture of shadow depth. Contrast is large means texture is deeper.

ISSN : 0975-5462

Vol. 3 No. 2 Feb 2011

1062

Ch.Kavitha et al. / International Journal of Engineering Science and Technology (IJEST) Entropy S    P ( x, y) log P(x, y) x

(7)

y

Entropy measures image texture randomness, when the space co-occurrence matrix for all values is equal, it achieved the minimum value; on the other hand, if the value of co-occurrence matrix is very uneven, its value is greater. Therefore, the maximum entropy implied by the image gray distribution is random. 1 Inverse difference H P(x, y) (8)  y)2 1 ( x x y



It measures local changes in image texture number. Its value in large is illustrated that image texture between the different regions of the lack of change and partial very evenly. Here p(x, y) is the gray-level value at the Coordinate (x, y). 2.4. Integrated image matching

An integrated image matching procedure similar to the one used in [7,14] is proposed. With the decomposition of the image, the number of sub-blocks remains same for all the images. In [8] a similar sub-blocked approach is proposed, but the matching is done by comparing sub-blocks of query image with the sub-blocks of target image in corresponding positions. In our method, a sub-block from query image is allowed to be matched to any subblock in the target image. However a sub-block may participate in the matching process only once. A bipartite graph of sub-blocks for the query image and the target image is built as shown in Fig. 2. The labeled edges of the bipartite graph indicate the distances between sub-blocks. A minimum cost matching is done for this graph. Since, this process involves too many comparisons, the method has to be implemented efficiently. To this effect, we have designed an algorithm for finding the minimum cost matching based on most similar highest priority (MSHP) principle[14] using the adjacency matrix of the bipartite graph. Here in, the distance matrix is computed as an adjacency matrix. The minimum distance dij of this matrix is found between sub-blocks i of query and j of target. The distance is recorded and the row corresponding to sub-block i and column corresponding to sub-block j, are blocked (replaced by some high value, say 99999). This will prevent sub-block i of query image and sub-block j of target image from further participating in the matching process. The distances, between i and other sub-blocks of target image and, the distances between j and other sub-blocks of query image, are ignored (because every sub-block is allowed to participate in the matching process only once). This process is repeated till every sub-block finds a matching. The process is demonstrated in Fig. 3 using an example for 4 sub-blocks. The complexity of the matching procedure is reduced from O(n2 ) to O(n), where n is the number of sub-blocks involved. The integrated minimum cost match distance between images is now defined as: Dqt=∑ ∑ dij, Where i=1, 2,--n j=1, 2---n. And dij is the best-match distance between sub-block i of query image q and sub-block j of target image t and Dqt is the distance between images q and t.

Fig. 2: Bipartite graph showing 4 sub-blocks of target and query images.

ISSN : 0975-5462

Vol. 3 No. 2 Feb 2011

1063

Ch.Kavitha et al. / International Journal of Engineering Science and Technology (IJEST)

Fig 3: Image similarity computation based on MSHP principle, (a) first pair of matched sub-blocks i=2,j=1 (b) second pair of matched sub-blocks i=1, j=2 (c) third pair of matched sub-blocks i=3, j=4 (d) fourth pair of matched sub-blocks i=4,j=3, yielding the integrated minimum cost match distance 34.34.

3. Experimental setup 3.1 Data set: Wang’s [15] dataset comprising of 1000 Corel images with ground truth. The image set comprises 100 images in each of 10 categories. The images are of the size 256 x 384. 3.2 Feature set: The feature set comprises color and texture descriptors computed for each sub-block of an image as we discussed in section 2. 3.3 Computation of similarity

The similarity between sub-blocks of query and target image is measured from two types of characteristic features which includes color and texture features to formulate the bi-partite graph. Matching of the sub-blocks is done based on the most similar highest principle. Two types of characteristics of images represent different aspects of property. So during the Euclidean similarity measure, when necessary the appropriate weights to combine them are also considered. Therefore, in carrying out Euclidean similarity measure we should consider necessary appropriate weights to combine them. We construct the Euclidean calculation model as follows: D(A, B) =ω1D(FCA , FCB ) + ω2D(FTA , FTB)

(13)

Here ω1 is the weight of color features, ω2 is the weight of texture features, FCA and FCB represents the 72dimensional color features for image A and B. For a method based on GLCM, FTA and FTB on behalf of 4dimensional texture features correspond to image A and B. Here, we combine color features and texture features. The value of ω through experiments shows that at the time ω1=ω2=0.5 has better retrieval performance. 4. Experimental results

The experiments were carried out as explained in sections 2 and 3. The results are benchmarked with some of the existing systems using the same database [15]. The quantitative measure is given below 1 p (i )  1 100 1 j 1000,r (i , j )100, ID ( j )  ID (i )



Where p(i) is precision of query image I, ID(i) and ID(j) are category ID of image I and j respectively, which are in the range of 1 to 10. the r(i, j) is the rank of image j. This value is percentile of images belonging to the category of image i , in the first 100 retrieved images. The average precision pt for category t(1≤t≤10) is given by 1 pt  p(i ) 100 1i 1000, ID (i ) t



The comparison of proposed method with other retrieval systems is presented in the Table 1. These retrieval systems are based on HSV color, GLCM texture and combined HSV color and GLCM texture. Our sub-blocks based retrieval system is better than these systems in all categories of the database. The experiments were carried out on a Core i3, 2.4 GHz processor with 4GB RAM using MATLAB. Only simple features of image information can not get comprehensive description of image content. We consider the

ISSN : 0975-5462

Vol. 3 No. 2 Feb 2011

1064

Ch.Kavitha et al. / International Journal of Engineering Science and Technology (IJEST)

color and texture features combining not only be able to express more image information, but also to describe image from the different aspects for more detailed information in order to obtain better search results. Retrieval algorithm flow is as follows for each sub block of an image:

Fig 4. Algorithm scheme

The performance of a retrieval system can be measured in terms of its recall (or sensitivity) and precision (or specificity).Recall measures the ability of the system to retrieve all models that are relevant, while precision measures the ability of the system to retrieve only models that are relevant. They are defined as Precision=Number of relevant images retrieved Total number of images retrieved Recall=Number of relevant images retrieved Total number of relevant images

ISSN : 0975-5462

Vol. 3 No. 2 Feb 2011

1065

Ch.Kavitha et al. / International Journal of Engineering Science and Technology (IJEST)

Suggest Documents