Color features performance comparison for image retrieval

Daniele Borghesani, Costantino Grana, Rita Cucchiara

Dipartimento di Ingegneria dell'Informazione, Università degli Studi di Modena e Reggio Emilia, Via Vignolese 905/b, 41100 Modena, Italy
{daniele.borghesani, costantino.grana, rita.cucchiara}@unimore.it
Abstract. This paper presents a comparison of color features for image retrieval. In particular, the UCID image database is employed to compare the retrieval capabilities of different color descriptors. The set of descriptors comprises global and spatially related features, and the tests show that HSV-based global features provide the best performance under varying brightness and contrast settings.

Keywords: color features, HSV, image retrieval, feature comparison
1 Introduction

The increasing availability of multimedia digital libraries (publicly shared or personal), together with the low-cost devices used to produce them, has raised the need for appropriate tools to search within this enormous amount of data. Classical search methodologies in desktop and web contexts are based on textual keywords. The most immediate way to reuse the majority of preexisting search techniques with multimedia data is tagging, but tagging is tedious from the user's perspective and unfeasible when the amount of data to annotate is too high. Content-based image retrieval (CBIR) is the most difficult, but at the same time the most effective and elegant, way to solve the problem.

A large body of literature has been produced so far, focusing on specific components (the learning algorithms, the features to use, the way to select the most effective features, and so on), sometimes specialized for particular real-world contexts (news, sports, etc.). Several working systems have also been proposed. Columbia University proposed its semantic video search engine (CuVid [1]), which includes 374 visual concept detectors and supports different combinations of input modalities (keyword, image, near-duplicate, and semantic concepts). IBM's multimedia search and video system (Marvel [2]) uses multimodal machine learning techniques to bridge the semantic gap, recognizing entities such as scenes, objects, events, and people. The University of Amsterdam also proposed a semantic video search system [3] featuring an appealing user interface (RotorBrowser) and 500 automatically detected semantic concepts. We also proposed a general framework called PEANO (Pictorially Enriched ANnotation with Ontologies), which automatically annotates video clips by comparing their similarity to a domain-specific set of prototypes [4].
Considering the way the scientific community has tried to solve the problem, we can highlight two fundamental functionalities we would like in a CBIR system:

• The ability to search and retrieve specific visual objects: given an image, we want to retrieve from our digital library all images containing the object depicted in the query, that is, "I want to find my friend John in all my image library".
• The ability to search and retrieve images by appearance similarity: given a sample image or a keyword (the textual representation of a pictorial prototype), we want to retrieve the images most similar to the query, that is, "I like this seaside landscape, I want to find all similar images I've got in my image library".

The global appearance similarity task, especially if it is fast to compute, also has an important side effect: the possibility of pruning from the digital library all images that are unlikely to matter for other, more specific retrieval techniques. This is a major advantage because the local features exploited for object recognition, other more sophisticated global features, and the learning algorithms employed are usually quite expensive to compute.

The most straightforward representation for global features is the histogram. It provides a scale-independent representation, suitable for both color and gradient information, and it comes with a robust and simple similarity metric, histogram intersection (see the sketch below). Moreover, color is undoubtedly one of the most discriminative characteristics for global features. It carries information about the nature of what we see, it allows inferences about the environment depending on brightness conditions, and the way humans perceive chromatic appearance aids the recognition of objects and shapes our understanding of the environment itself.

In this paper, we analyze in detail the discriminative capabilities of several well-known color descriptors. We adopt a standard image database, UCID v.2, which is freely available and comes with a ground truth. To test the descriptors' performance, we modify the brightness characteristics of the images in order to assess the behavior of these features under extreme conditions.
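To make the global representation and its similarity metric concrete, the following is a minimal sketch, assuming OpenCV (cv2) and NumPy; the file names are illustrative, and the 16x4x4 binning anticipates the MPEG-7 scalable color quantization recalled in Section 2.

```python
import cv2
import numpy as np

def hsv_histogram(image_bgr, bins=(16, 4, 4)):
    # Global HSV histogram; 16 bins for H and 4 each for S and V.
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, list(bins),
                        [0, 180, 0, 256, 0, 256])  # OpenCV stores 8-bit H in [0, 180)
    hist = hist.flatten()
    return hist / hist.sum()  # L1-normalize for scale independence

def histogram_intersection(h1, h2):
    # Similarity in [0, 1]: sum of bin-wise minima of two normalized histograms.
    return float(np.minimum(h1, h2).sum())

# Hypothetical usage: rank library images by similarity to a query image.
query = hsv_histogram(cv2.imread("query.jpg"))
library = {name: hsv_histogram(cv2.imread(name))
           for name in ("img1.jpg", "img2.jpg")}
ranking = sorted(library,
                 key=lambda n: histogram_intersection(query, library[n]),
                 reverse=True)
```

Because both histograms are L1-normalized, the intersection score is bounded in [0, 1] and is cheap to evaluate, which is what makes this representation suitable for the fast pruning step discussed above.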
2 Color features for image retrieval

The standard definition of the HSV histogram is given by the MPEG-7 scalable color descriptor [5]: a 256-bin histogram providing 16 different values for H and 4 different values each for S and V. Normally, H is defined in the range [0..360], while S and V are defined in [0..1].

HSV36 was presented in [6]. This procedure aims to improve the representation power of the HSV color space. First, a non-uniform quantization of hue is introduced, dividing the entire spectrum into 7 classes: red, orange, yellow, green, cyan, blue, and purple. Then another quantization is proposed for the S-V plane, in order to distinguish the black area (V
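As an illustration of the non-uniform hue quantization, below is a minimal sketch in plain Python. The 7 class names come from the text above, but the bin boundaries are assumptions for illustration only: the exact HSV36 thresholds defined in [6] are not given in this excerpt and may differ.

```python
# Assumed hue boundaries (in degrees) for the 7 classes named in the text;
# the actual HSV36 thresholds from [6] may differ.
HUE_CLASSES = [
    ("red",    (-20.0,  20.0)),   # wraps around 0 degrees
    ("orange", ( 20.0,  45.0)),
    ("yellow", ( 45.0,  70.0)),
    ("green",  ( 70.0, 160.0)),
    ("cyan",   (160.0, 200.0)),
    ("blue",   (200.0, 270.0)),
    ("purple", (270.0, 340.0)),
]

def hue_class(h_degrees):
    # Map a hue angle in [0, 360) to one of the 7 non-uniform classes.
    h = h_degrees - 360.0 if h_degrees >= 340.0 else h_degrees  # fold wrap-around reds
    for name, (lo, hi) in HUE_CLASSES:
        if lo <= h < hi:
            return name
    return "red"  # unreachable for valid input, kept as a safe default

print(hue_class(355.0))  # -> red (high hues fold back into the red class)
print(hue_class(120.0))  # -> green
```

The point of the non-uniform binning is perceptual: hue regions that humans name with a single color word get a single bin, rather than being split uniformly as in the 16-bin MPEG-7 quantization.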