Using the Knowledge of Object Colors to Segment Images and Improve Web Image Search

Christophe Millet (1,2), Isabelle Bloch (2) and Adrian Popescu (1)

(1) CEA/LIST/LIC2M, 18 Route du Panorama, 92265 Fontenay aux Roses, France
[email protected], [email protected]

(2) GET-ENST - Dept TSI - CNRS UMR 5141 LTCI, Paris, France
[email protected]

Abstract

With web image search engines, we face a situation where the results are very noisy: when we ask for a specific object, we have no guarantee that this object appears in all the images returned by the search engine; about 50% of the returned images are off-topic. In this paper, we explain how knowing the color of an object can help locate that object in images, and we also propose methods to find the color of an object automatically, so that the whole process can be fully automatic. Results show that this method reduces the noise in the returned images while providing an automatic segmentation, so that the output can be used for clustering or object learning.
1. Introduction

The Internet contains many images that we would like to exploit, for example for object learning. These images can be accessed and searched through web image search engines: a user enters a list of keywords and gets back a list of images whose title matches these keywords, or whose accompanying text on the web page contains them. However, the raw set of images obtained in this way cannot be used directly for object learning, because it is too noisy. In a quick evaluation of four web image search engines (Google, Yahoo!, Ask and Exalead) with 50 queries, we found that about 50% of the first 50 images returned are noise: they are not related to the query. One of the reasons is that the indexing and retrieval of these images is based only on text, and does not consider the content of the images themselves.

In this paper, we use the colors of the objects we are looking for to analyze the content of images and improve the results returned by web queries. Since there is no existing database describing the colors of objects, and even though it would not be impossible to construct such a database by hand, we also propose some ideas to obtain that information automatically. This method is applied to the following problems:
– automatic segmentation of the main object in images, and especially segmentation of objects with more than one color,
– reducing noise from web search engines,
– re-ranking images from web search engines.

1.1 Automatic segmentation
When retrieving images for a keyword from the Internet, we would like to automatically isolate the object corresponding to that keyword in each image. This kind of automatic object segmentation is a difficult problem: most images contain more than one object, so when we apply a classical automatic segmentation, it is difficult to know which object is the one we are looking for. One way would be to use the focus of the camera to distinguish
the object in focus from the blurred background, as Swain et al. (1995) did for video image segmentation. However, in the case of irrelevant images, there can be an object in focus that is not the object we have queried. To automatically locate the object in images from the Web, Ben-Haim et al. (2006) first segment the images into several regions, and then cluster these regions in order to find similar regions across different images. The largest cluster, called the "significant cluster", is considered to be the one corresponding to the object. A similar approach is presented in (Russell, 2006), where the authors use multiple segmentations of each image to discover objects in an image collection, in order to improve the segmentation quality over techniques that use only one segmentation. This is an interesting approach; however, our own experiments with queries about outdoor animals showed that the largest cluster is often about context (sky, grass, ...) or is a cluster of dark regions, such as shadows or black backgrounds. Therefore, what we propose in this paper is to use the semantics of objects: since we know what object the picture is about, we can use knowledge about that object, here its color, to guide the automatic segmentation. Another motivation for this method is the possibility to segment objects, such as zebras or pandas, which are made of several colors. Segmenting such objects is not possible with classical algorithms because of the strong edges between the colors. In (Liapis, 2004), the authors were able to segment a zebra using a luminance histogram, where each pixel is classified and the classes are then propagated. The drawback of this method is that the number of classes must be specified prior to the segmentation, so it does not work for intricate images where the background can be divided into several regions.
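For concreteness, here is a minimal sketch of the significant-cluster idea described above, in Python. It assumes that region descriptors (for example mean color features) have already been extracted from the segmented images; the function name, the descriptor layout and the use of k-means are our illustrative assumptions, not details from Ben-Haim et al. (2006).

```python
# A minimal sketch of the "significant cluster" idea: regions pooled from
# all returned images are described by feature vectors and clustered; the
# most populated cluster is assumed to correspond to the queried object.
import numpy as np
from sklearn.cluster import KMeans

def find_significant_cluster(region_features, n_clusters=8):
    # region_features: 2-D array, one row per segmented region.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(np.asarray(region_features))
    significant = np.bincount(labels).argmax()  # largest cluster wins
    return significant, km.cluster_centers_[significant], labels
```

The failure mode discussed above, where the largest cluster captures context rather than the object, happens precisely at the argmax step: nothing constrains the most populated cluster to be about the queried object.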
1.2 Cleaning and re-ranking web images

Even though it is an acknowledged issue that the results of image search engines are noisy and unsorted, there has not been much work in this area yet. Lin et al. (2003) used a relevance model on the text of the HTML web pages containing the images to re-rank web image search results. They reported an increase of precision from 30% to 50% when considering only the first 50 documents, but without using the content of the images. Cai et al. (2004) proposed to cluster web image search engine results using both the content of the images and the texts and links of the web pages. Their idea was that some queries are polysemous and may return results related to different topics. For example, the query pluto mixes images of the dwarf planet and of the Disney character, and the query dog returns images of various species of dogs; by clustering, one can separate these different topics. However, they did not try to clean or rank the images.

A way to clean images is to try to detect similarities between the different images, for example using interest points. This method is very difficult to apply to web images because, as we said, not all images contain the object of interest: the noise is 50% on average and can be up to 85% for some queries. Furthermore, picture quality varies widely, so it can be hard to find a repetitive pattern. In Ben-Haim et al. (2006), images are re-ranked using, as the score of a given image, the smallest distance between one of its blobs and the mean of the "significant cluster" described in Section 1.1.
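As a rough illustration, this re-ranking rule can be sketched as follows, reusing the cluster center found in the sketch of Section 1.1; the data layout is a hypothetical choice of ours.

```python
import numpy as np

def rerank_by_significant_cluster(images_blobs, cluster_center):
    # images_blobs: list of (image_id, blobs) pairs, where blobs is a
    # 2-D array with one feature row per blob of that image.
    # Each image is scored by the distance of its closest blob to the
    # significant-cluster center; lower scores rank higher.
    scored = []
    for image_id, blobs in images_blobs:
        score = np.linalg.norm(np.asarray(blobs) - cluster_center, axis=1).min()
        scored.append((score, image_id))
    scored.sort(key=lambda pair: pair[0])
    return [image_id for _, image_id in scored]
```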
Fergus et al. (2004, 2005) published an approach giving promising results for cleaning, re-ranking and learning from Google's Image Search results. They first apply several kinds of circular region detectors based on interest points, and then compute SIFT descriptors on these regions. These descriptors are used to train a translation and scale invariant probabilistic latent semantic analysis (TSI-pLSA) model for object classification. This model can then be applied to the raw data to re-rank images, and to clean them if the lowest-ranked images are discarded. Fergus et al. reported an improvement of about 20% precision on average at 15% recall (that is, discarding 85% of the images).

In (Millet, 2006), we explained how it is possible to clean and re-rank images using clustering techniques. Images were first automatically segmented with a waterfall segmentation into twenty regions; all regions that had no contact with the border of the image were merged and considered to be the object to study. This merged region was then indexed with texture and color features, and the indexes were fed to a shared nearest neighbor (SNN) clustering algorithm. Images that were not clustered were removed, and the probable colors of the objects were used to sort the clusters, putting on top the clusters containing the most objects of the expected colors. This gave promising results, but the segmentation was not always able to identify the right object. In this paper, instead, we use the color information at the segmentation stage, so that all the obtained segmented regions have the correct colors. Therefore, the color itself cannot be used to sort and rank images, but the size and position of the segmented regions can serve that purpose.

We first explain how we associate color names with the HSV values of pixels in Section 2. Then, we discuss in Section 3 how to determine the most frequent colors of a given object, and how to recognize and deal with objects that do not have a particular color. In Section 4, we detail the algorithm we use to segment a picture given one or more colors of interest. Finally, in Section 5, we present and evaluate the proposed applications to bring out the interest of the proposed method for segmentation, and we conclude with future work.

2. The color of a pixel

In this section, we develop a model to match pixel values with color names. Naming colors is not a trivial issue, firstly because the number of colors to consider has to be defined, and secondly because the separations between colors are unclear. Berk et al. (1982) compared several systems for naming colors with respect to how easy it is for a user to name colors, and found that "the load on the casual user's memory is too great for this system to be easily used". They then proposed a color-naming system (CNS) consisting of 10 basic colors: gray, black, white, red, orange, brown, yellow, green, blue and purple. Several adjectives can then be added, such as dark, medium, light, grayish, vivid, ..., as well as -ish forms such as reddish brown, which allows them to build a complex system of 627 color names. However, when naming a color, the first reaction of most people is to use only one of the 10 basic colors listed above. In our opinion, pink is missing from the CNS, since it is often used for naming objects: many clothes, often for girls, are of that color, and animals such as the domestic pig or the lesser flamingo are usually described by people as pink. Therefore, we have decided to consider 11 colors, adding pink to the CNS. In order to map these 11 colors to pixel values, the HSV (hue, saturation, value) color space is used, since it is more semantic than RGB and therefore makes it easier to deduce the color name of a pixel. In particular, the hue value is close to the concept of a color name.
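As a quick illustration of the conversion involved, the following snippet maps 8-bit RGB values to HSV with each component rescaled to 0-255, the convention adopted in the next paragraph, using Python's standard colorsys module (which itself works in the [0, 1] range):

```python
import colorsys

def rgb_to_hsv255(r, g, b):
    # colorsys works with floats in [0, 1]; rescale to 0-255 afterwards.
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return round(h * 255), round(s * 255), round(v * 255)

# Two saturated colors differ mainly in their hue component,
# which is why hue is the component closest to a color name:
print(rgb_to_hsv255(200, 30, 30))  # (0, 217, 200): hue 0 is red
print(rgb_to_hsv255(30, 30, 200))  # (170, 217, 200): hue 170 is blue
```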
In our HSV color space, each component is scaled between 0 and 255. Pixels with a low saturation (S < 20) are considered achromatic and are assigned a negative hue. Our first idea was to separate the colors cleanly in the HSV space, that is, to associate one and only one color name with each (h, s, v) triplet.
However, we noticed that some pixels are named differently in different contexts, so we decided to take this ambiguity between colors into account and to sometimes associate more than one color name with a given (h, s, v) triplet. Hence, instead of associating a single color name with each pixel value, we defined for each color name the range of (h, s, v) values that it occupies. The complete correspondence is detailed in Table 1.

Table 1: Ranges of (h, s, v) values associated with each color name.
Color | Hue | Saturation | Value
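The following sketch shows how such a table can be turned into a lookup function returning a set of color names for a pixel. The numeric ranges below are illustrative placeholders chosen by us, not the actual values of Table 1, but they reproduce the two design points above: achromatic pixels (S < 20) are handled separately, and chromatic ranges may overlap, so a pixel can receive more than one name.

```python
def pixel_color_names(h, s, v):
    # Components are in 0-255, as in the paper. The numeric ranges below
    # are illustrative placeholders, NOT the actual values of Table 1.
    if s < 20:  # achromatic pixel: the paper assigns it a negative hue
        if v < 60:
            return {"black"}
        if v > 200:
            return {"white"}
        return {"gray"}
    names = set()
    # Chromatic ranges may overlap, so a pixel can get several names.
    if h < 15 or h > 240:
        names.add("red")
        if v > 150 and s < 120:
            names.add("pink")  # light, moderately saturated reds
    if 10 <= h <= 30:
        names.add("orange")
        if v < 130:
            names.add("brown")  # dark orange hues often read as brown
    if 28 <= h <= 50:
        names.add("yellow")
    if 50 < h <= 115:
        names.add("green")
    if 115 < h <= 180:
        names.add("blue")
    if 180 < h <= 220:
        names.add("purple")
    return names

# Example: a dark orange pixel is ambiguous between orange and brown.
print(pixel_color_names(20, 180, 100))  # {'orange', 'brown'} (order may vary)
```

Returning a set rather than a single name is what lets the segmentation stage of Section 4 accept a pixel as compatible with any of the colors expected for the queried object.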