SEMANTIC IMAGE RETRIEVAL WITH HSV CORRELOGRAMS

T. Ojala, M. Rautiainen, E. Matinmikko and M. Aittola

MediaTeam Oulu, Infotech Oulu, University of Oulu
P.O. Box 4500, FIN-90014 University of Oulu, Finland

ABSTRACT

Recently introduced color correlograms have been found to be superior to color histograms and color coherence vectors in content-based image retrieval. This is based on their ability to also capture spatial relationships of colors, in addition to occurrence statistics. In this work we study content-based retrieval of images using color correlograms computed in the HSV color space. HSV correlograms are particularly interesting in terms of content-based retrieval, as the HSV color space is supposed to provide better correspondence with human visual perception of color (dis)similarities than, for example, the RGB color space. We explore different quantizations of the HSV color space, and try to make the correlogram more sensitive to changes in color content and less sensitive to illumination by quantizing the hue component more precisely than the value component. The retrieval performance of HSV correlograms is quantitatively compared to that of RGB correlograms and HSV histograms based on queries with eight semantic categories of images manually designated by a human observer.

1. INTRODUCTION

The breakthrough of the Internet, the World Wide Web and new devices for producing digital imagery, such as digital (video) cameras, has resulted in massive digital libraries. This in turn has created a demand for efficient tools for searching and browsing the imagery, either by the content of the images or by some metadata associated with the images. Content-based image retrieval has been subject to active research since the early 1990s [4], and a large number of experimental retrieval systems have been introduced, among others BlobWorld [1], Chabot [14], Mars [15], NeTra [11], Photobook [16], PicHunter [3], PicSOM [10], QBIC [5], Surfimage [13] and VisualSEEK [19]. These systems retrieve images based on different cues such as color, texture and shape, of which color remains the most powerful and most useful feature for general-purpose retrieval. Color-based retrieval first evolved from simple statistical measures such as average color to color histograms [5,14,16]. However, color histograms have limited discriminative power, because images with completely different spatial organization of colors can produce identical color histograms.

This shortcoming is particularly harmful in the case of very large image databases. Consequently, several methods to incorporate spatial information of colors have been proposed, among others [7,17,19]. The color correlogram introduced by Huang et al. [7] describes the spatial correlation of colors as a function of spatial distance. In practical terms, a correlogram of an image corresponds to a table indexed by color pairs (ci,cj), so that the d-th entry for row (i,j) designates the probability of finding a pixel of color cj at a distance d from a pixel of color ci in the image. Largely for computational reasons, Huang et al. concluded that the autocorrelogram of the image, a subset of the correlogram capturing the spatial correlation between identical colors only, is sufficient for the purpose of image retrieval [7]. They demonstrated that the autocorrelogram provides very good performance in content-based retrieval in comparison to histograms and color coherence vectors. They developed the method further by introducing banded correlograms for reduced storage requirements and correlogram intersection for image subregion querying, and showed how correlograms can be used to locate objects in images [9]. They also introduced supervised learning methods, based on feedback provided interactively by a user, to further improve the retrieval performance [8]. The superior performance of color correlograms was also confirmed in the large comparative study of Ma and Zhang [12], which benchmarked color correlograms against color histograms, color moments and color coherence vectors.

The novel contribution of this study is to retrieve images using color correlograms computed in the HSV color space, which has not been done before. Whereas Huang et al. [7,8,9] always considered the RGB color space, Ma and Zhang [12] computed the autocorrelogram in the L*u*v* color space, which is recommended by CIE for additive light-source conditions (for example quantifying differences in monitor displays), while the traditional approach in content-based image retrieval is to use a color space based on the Munsell color system (MTM, HSV) [6,19].

What makes HSV correlograms particularly interesting in terms of content-based retrieval is that the HSV color space is supposed to provide better correspondence with human visual perception of color (dis)similarities than, for example, the RGB color space. We explore different quantizations of the HSV color space and try to make the correlogram more sensitive to changes in color content and less sensitive to illumination, by quantizing the hue component more precisely than the value component. We compare the retrieval performance of the HSV autocorrelogram to that of the RGB autocorrelogram and HSV histograms using image data which has been manually partitioned into semantic categories by a human observer. A semantic category corresponds to a set of images which the human observer perceived to have identical semantic meaning, not necessarily identical color content or spatial structure. Consequently, the images in a particular category can have quite considerable variation in those terms. Images of eight semantic categories are used as query images; hence we are trying to quantify the extent to which HSV correlograms are able to discriminate semantic categories of images. We retrieve images using all 822 images in the eight query categories, to obtain more reliable estimates of retrieval performance in comparison to experiments where the results are based on a small number of queries, for example just 29 in [7].

The remainder of the paper is organized as follows. Section 2 describes the methodology, defining the three color features used in this study and the (dis)similarity measures. Section 3 presents the experiments, the image data and the retrieval results for the three methods. Section 4 discusses the results and possible future work, and concludes the paper.

2. METHODOLOGY

2.1. Histogram, correlogram and autocorrelogram

Let I be an X×Y image which comprises pixels p(x,y). Let [C] denote the set of C colors c1,...,cC that can occur in the image. For a pixel p, let I(p) denote its color c, and let Ic denote the set of pixels p for which I(p) = c. The histogram of I is defined for color ci as

  h_{c_i}(I) \equiv \Pr_{p \in I}[\, p \in I_{c_i} \,]

which corresponds to the probability of any pixel in I being of color ci. In other words, the spatial relationship of colors is ignored.

Let [D] denote a set of D fixed distances d1,...,dD, which are measured using the L∞ norm. The correlogram of image I is defined for color pair (ci,cj) and distance d as

  \gamma^{(d)}_{c_i, c_j}(I) \equiv \Pr_{p_1 \in I_{c_i},\, p_2 \in I}[\, p_2 \in I_{c_j} \mid |p_1 - p_2| = d \,]

which gives the probability that, given any pixel p1 of color ci, a pixel p2 at a distance d from p1 is of color cj. The size of the correlogram is C²D.

In this study we use the autocorrelogram, which is a subset of the correlogram. The autocorrelogram of image I gives the probability of finding identical colors at distance d:

  \alpha^{(d)}_{c}(I) \equiv \gamma^{(d)}_{c,c}(I)

The autocorrelogram provides significant computational benefits in comparison to the correlogram. First, its size is only CD, thus the storage requirements are considerably smaller. Second, the average number of entries per histogram cell increases from (XY)/(C²D) to (XY)/(CD), which improves the statistical reliability of the histogram. This is particularly important if (auto)correlograms are compared as histograms with information theoretic (dis)similarity measures such as chi-square or Jeffrey's divergence; as a rule of thumb, statistics literature often requires on average 5 or 10 entries per histogram cell for the histogram to be statistically reliable.

2.2. Dissimilarity measures

In the case of histograms, the dissimilarity between a query image Q and a reference image R in the database is measured using the weighted Euclidean distance introduced in [6]:

  D_h(Q, R) = [\, h(Q) - h(R) \,]^T A \,[\, h(Q) - h(R) \,]

where A is a weight matrix whose elements a_{ij} correspond to the similarity of colors ci and cj. In the case of autocorrelograms, we follow the work of Huang et al. [7] and measure dissimilarity using the L1 norm:

  D_\alpha(Q, R) = \sum_{c \in [C],\, d \in [D]} \left| \alpha^{(d)}_{c}(Q) - \alpha^{(d)}_{c}(R) \right|

In other words, the autocorrelogram is interpreted as a feature vector in a CD-dimensional feature space. Alternatively, we could treat autocorrelograms as histograms and employ information theoretic (dis)similarity measures to compare them.
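To make the definitions of Sections 2.1 and 2.2 concrete, the following sketch (ours, not part of the original work) estimates the autocorrelogram of an image that has already been quantized to C color indices, and computes the L1 dissimilarity between two autocorrelograms as well as the quadratic-form histogram distance. The NumPy-based implementation, the function names and the brute-force neighbourhood scan are illustrative assumptions rather than the authors' code.

import numpy as np

def autocorrelogram(q, n_colors, distances=(1, 3, 5, 7)):
    # q: 2-D array of color indices in [0, n_colors); distances: the fixed
    # L-infinity distances d1..dD. Returns an (n_colors x D) array whose
    # entry (c, k) estimates Pr[I(p2) = c | I(p1) = c, ||p1 - p2||_inf = d_k].
    q = np.asarray(q)
    rows, cols = q.shape
    alpha = np.zeros((n_colors, len(distances)))
    for k, d in enumerate(distances):
        same = np.zeros(n_colors)    # pairs where both pixels have color c
        total = np.zeros(n_colors)   # pairs where the first pixel has color c
        # Offsets forming the square ring at L-infinity distance exactly d.
        ring = [(dy, dx)
                for dy in range(-d, d + 1) for dx in range(-d, d + 1)
                if max(abs(dy), abs(dx)) == d]
        for y in range(rows):
            for x in range(cols):
                c = q[y, x]
                for dy, dx in ring:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols:
                        total[c] += 1
                        if q[ny, nx] == c:
                            same[c] += 1
        alpha[:, k] = same / np.maximum(total, 1)   # avoid division by zero
    return alpha

def l1_dissimilarity(alpha_q, alpha_r):
    # L1 norm of Section 2.2, treating the autocorrelogram as a feature vector.
    return float(np.abs(alpha_q - alpha_r).sum())

def quadratic_form_distance(h_q, h_r, weights):
    # Weighted Euclidean (quadratic form) histogram distance of Section 2.2;
    # 'weights' plays the role of the matrix A of color similarities a_ij.
    diff = np.asarray(h_q, dtype=float) - np.asarray(h_r, dtype=float)
    return float(diff @ weights @ diff)

A production implementation would typically vectorize the neighbourhood scan, but the explicit double loop keeps the correspondence with the probabilistic definitions above visible.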

3. EXPERIMENTS

3.1. Image data

The image database used in this study contained 2445 images from three different sources [2,18,20]. The images were partitioned into different semantic categories by a human observer.

In other words, the images belonging to a certain category were perceived to be similar in terms of semantic meaning, not necessarily to have identical color content or spatial structure. An image could belong to multiple semantic categories. Images of eight categories were used as query images. An example image of each of the eight categories is shown in Fig. 1. The quantity of images per category varied from 13 to 335, so that the total number of query images was 822. The other 1623 images in the database served as uninteresting 'bulk', which was supposed to make the problem simulate real-life retrieval of images from a large unorganized database. The eight images from the building category in Fig. 2 demonstrate the considerable intracategory variation in the image data, which presents a great challenge to the robustness of content-based retrieval.

Fig. 1. Example images of the eight semantic categories used in the queries: sunset (28), urban view (157), underwater (68), building (335), beach (52), portrait (129), item on black background (13) and item on white background (40). The number in parentheses corresponds to the number of images in the category.

Fig. 2. Intracategory variation demonstrated by eight images from the building category.

3.2. Performance measure

Each image of the eight semantic categories used in this study served as the query image in turn, i.e. 822 queries were performed in total. We report precision (the percentage of query-category images among all retrieved images) as a function of recall (the percentage of retrieved query-category images among all images in the query category), averaged over the 822 queries. We also study category-wise retrieval performance, where the retrieval results are correspondingly averaged over all query images belonging to a given category.
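As a rough illustration of this performance measure (our own sketch, not the authors' evaluation code), precision and recall for a single query can be computed from the ranked retrieval list and the query's ground-truth category; the reported curves are then averages of such per-query curves over all 822 queries. All names below are hypothetical.

import numpy as np

def precision_recall(ranked_ids, relevant_ids, cutoffs):
    # ranked_ids:   image ids ordered by increasing dissimilarity to the query
    # relevant_ids: ids of the images in the query's semantic category
    # cutoffs:      numbers of retrieved images at which to evaluate
    relevant = set(relevant_ids)
    precision, recall = [], []
    for n in cutoffs:
        hits = sum(1 for i in ranked_ids[:n] if i in relevant)
        precision.append(hits / n)           # query-category images among retrieved
        recall.append(hits / len(relevant))  # retrieved among all query-category images
    return np.array(precision), np.array(recall)

# Averaging over all queries, e.g.:
# curves = [precision_recall(r, rel, cutoffs) for r, rel in queries]
# mean_precision = np.mean([p for p, _ in curves], axis=0)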

3.3. Retrieval results for HSV autocorrelogram under different quantizations of the HSV color space

The HSV autocorrelogram was computed with a set of four distances (1, 3, 5, 7) [7]. Five different quantizations of the HSV color space were evaluated experimentally (the numbers designate the numbers of bins allocated to hue, saturation and value): (7,2,2), (9,3,3), (9,4,4), (9,5,5) and (12,3,3). Consequently, the size of the final autocorrelogram varied from 112 for (7,2,2) to 900 for (9,5,5). The hue channel was quantized so that the cut-off point between 0 and 2π was located at the center of the first bin. We quantize hue much more precisely than value in an attempt to make the HSV autocorrelogram more sensitive to changes in color content and less sensitive to changes in illumination. The average precision-recall curves for the different quantizations of the HSV autocorrelogram are presented in Fig. 3. We observe that (12,3,3) provides the best overall retrieval performance, hence HSV autocorrelograms computed using this quantization of the HSV color space are used in further comparisons.
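The quantization described above can be sketched as follows (an illustrative reconstruction under our own assumptions, not the authors' implementation); hue is assumed to be given in radians in [0, 2π), saturation and value normalized to [0, 1], and the half-bin shift centers the first hue bin on the 0/2π cut-off.

import numpy as np

def quantize_hsv(h, s, v, bins=(12, 3, 3)):
    # h: hue in radians in [0, 2*pi); s, v: saturation and value in [0, 1].
    # bins: numbers of bins allocated to (hue, saturation, value).
    hb, sb, vb = bins
    hue_width = 2.0 * np.pi / hb
    # Shift hue by half a bin so that the cut-off between 0 and 2*pi
    # falls at the center of the first bin.
    h_idx = np.floor(((np.asarray(h) + hue_width / 2.0) % (2.0 * np.pi))
                     / hue_width).astype(int)
    s_idx = np.minimum((np.asarray(s) * sb).astype(int), sb - 1)
    v_idx = np.minimum((np.asarray(v) * vb).astype(int), vb - 1)
    return h_idx * sb * vb + s_idx * vb + v_idx   # color index in [0, hb*sb*vb)

# With bins = (7, 2, 2) there are 28 colors and, with four distances,
# 28 * 4 = 112 autocorrelogram entries; (9, 5, 5) gives 225 * 4 = 900,
# matching the sizes quoted above.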

Fig. 3. Retrieval results for HSV autocorrelogram under different quantizations of the HSV color space.

Fig. 4. Retrieval results for HSV histogram, HSV correlogram and RGB correlogram.

3.4. Comparison of HSV autocorrelogram with HSV histogram and RGB autocorrelogram

The HSV histogram was computed under the same quantization as the HSV autocorrelogram, i.e. (12,3,3). The RGB autocorrelogram was computed with the same set of four distances as the HSV autocorrelogram. The RGB color space was quantized so that each channel was divided into an equal number of bins of equal width. In the experiments, 3, 4 and 5 bins per channel were considered, hence the size of the RGB autocorrelogram varied from 108 to 500. We present results for the RGB autocorrelogram computed with 4 bins per channel, as it provided the best retrieval performance. Retrieval results for the three methods are illustrated in Fig. 4. As expected, the HSV histogram provides the weakest retrieval performance, which demonstrates the importance of the spatial organization of colors captured by the autocorrelograms.

The HSV autocorrelogram achieves slightly better precision than the RGB autocorrelogram up to 20% recall, after which the RGB autocorrelogram takes over. At 1% recall, the HSV autocorrelogram gives about 3% higher average precision than the RGB autocorrelogram, which corresponds to roughly 25 successful retrievals over the 822 queries.

3.5. Retrieval results for semantic categories using HSV autocorrelogram

Retrieval results for the eight semantic categories are illustrated in Fig. 5 for the HSV autocorrelogram. The actual quantities of images in each category differ significantly. We observe great variation in retrieval performance between the different categories, underwater being the easiest and beach the most difficult category in the light of the obtained precision values. To a certain extent the results can be explained by the general structure of the images. For example, both underwater and sunset images, which are found most precisely, have a specific spatial structure of colors.

Fig. 5. Retrieval results for semantic categories using HSV correlogram.

4. DISCUSSION

This study compared the retrieval performance of HSV and RGB autocorrelograms and HSV histograms, of which the HSV autocorrelogram provided the best overall performance in retrieving images of eight different semantic categories designated by a human observer. In the future, comparisons of HSV to perceptually uniform color spaces, such as CIE L*a*b* or MTM, may yield even bigger improvements in correlogram performance.

Our results confirm that the spatial organization of colors is of great importance in content-based image retrieval, providing considerable improvement over plain occurrence statistics of colors. (Auto)correlograms capture both global occurrence statistics and local spatial organization of colors by a simple spatial constraint. This raises the question of whether the performance could be further improved by a more extensive, possibly color-independent, spatial constraint. Also, the de facto interpretation of (auto)correlograms is as feature vectors which are compared with the L1 norm for computational reasons. However, an (auto)correlogram is a histogram, and better performance could be achieved with information theoretic (dis)similarity measures, at a higher computational cost.

Acknowledgements

Financial support provided by the National Technology Agency of Finland and the Academy of Finland is gratefully acknowledged.

Note

This work is a part of the CMRS (Content-based Multimedia Retrieval System) project at the University of Oulu. An online demonstration and the ground truth data of the image database are available at http://www.mediateam.oulu.fi.

References

[1] Carson C, Belongie S, Greenspan H & Malik J (1997) Region-based image querying. Proc. IEEE Workshop on Content-Based Access of Image and Video Libraries, San Juan, Puerto Rico, 42-49.
[2] CorelGALLERY images. http://www.corel.com.
[3] Cox IJ, Miller ML, Omohundro SM & Yianilos PN (1996) PicHunter: Bayesian relevance feedback for image retrieval. Proc. 13th International Conference on Pattern Recognition, Vienna, Austria, 3:361-369.
[4] Eakins J & Graham M (1999) Content-Based Image Retrieval: A report to the JISC Technology Applications Programme. Institute for Image Data Research, University of Northumbria at Newcastle, United Kingdom. http://www.unn.ac.uk/iidr/research/cbir/report.html.
[5] Flickner M, Sawhney H, Niblack W, Ashley J, Huang Q, Dom B, Gorkani M, Hafner J, Lee D, Petkovic D, Steele D & Yanker P (1995) Query by image and video content: The QBIC system. IEEE Computer Magazine 28:23-32.

[6] Hafner J, Sawhney HS, Equitz W, Flickner M & Niblack W (1995) Efficient color histogram indexing for quadratic form distance functions. IEEE Transactions on Pattern Analysis and Machine Intelligence 17:729-736.
[7] Huang J, Kumar SR, Mitra M & Zhu WJ (1997) Image indexing using color correlograms. Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Juan, Puerto Rico, 762-768.
[8] Huang J, Kumar SR & Mitra M (1997) Combining supervised learning with color correlograms for content-based image retrieval. Proc. Fifth ACM International Conference on Multimedia, Seattle, WA, 325-334.
[9] Huang J, Kumar SR, Mitra M & Zhu WJ (1998) Spatial color indexing and applications. Proc. Sixth International Conference on Computer Vision, Bombay, India, 602-607.
[10] Laaksonen J, Koskela M & Oja E (1999) Content-based image retrieval using self-organizing maps. Proc. Third International Conference on Visual Information Systems, Amsterdam, The Netherlands, 541-548.
[11] Ma WY & Manjunath BS (1997) NeTra: a toolbox for navigating large image databases. Proc. International Conference on Image Processing, Santa Barbara, CA, 1:568-571.
[12] Ma WY & Zhang HJ (1998) Benchmarking of image features for content-based retrieval. Proc. 32nd Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, 1:253-257.
[13] Mitschke M, Meilhac C & Boujemaa N (1998) Surfimage: a flexible content-based image retrieval system. Proc. Sixth ACM International Conference on Multimedia, Bristol, UK, 339-344.
[14] Ogle V & Stonebraker M (1995) Chabot: retrieval from a relational database of images. IEEE Computer Magazine 28:40-48.
[15] Ortega M, Rui Y, Chakrabarti K, Porkaew K, Mehrotra S & Huang TS (1998) Supporting ranked Boolean similarity queries in MARS. IEEE Transactions on Knowledge and Data Engineering 10:905-925.
[16] Pentland A, Picard R & Sclaroff S (1996) Photobook: content-based manipulation of image databases. International Journal of Computer Vision 18:233-254.
[17] Pass G, Zabih R & Miller J (1996) Comparing images using color coherence vectors. Proc. Fourth ACM International Conference on Multimedia, Boston, MA, 65-73.
[18] QBIC Developers Kit CD-ROM. http://wwwqbic.almaden.ibm.com.

[19] Smith J & Chang S-F (1996) VisualSEEK: a fully automated content-based image query system. Proc. Fourth ACM International Conference on Multimedia, Boston, MA, 87-98.
[20] Swedish University Network FTP Server Images. ftp://ftp.sunet.se/pub/pictures/.
