IntentSearch: Interactive On-line Image Search Re-ranking

Jingyu Cui
Tsinghua University, Beijing, China.
[email protected]
Fang Wen
Microsoft Research Asia, Beijing, China.
[email protected]
Xiaoou Tang
Chinese University of Hong Kong, Hong Kong, China.
[email protected]
ABSTRACT
In this demo, we present IntentSearch, an interactive system for real-time, web-based image retrieval. IntentSearch works directly on top of Microsoft Live Image Search and re-ranks its results according to user-specified query image(s) and the automatically inferred user intention. Besides searching in the interface of Microsoft Live Image Search, we also design a more flexible interface that lets users browse and play with all the images in the current search session, which makes web image search more efficient and engaging. Please visit http://mmlab.ie.cuhk.edu.hk/intentsearch to try the system.
Categories and Subject Descriptors
H5.2 [Information interfaces and presentation]: User Interfaces—Graphical user interfaces (GUI), Prototyping; H3.3 [Information storage and retrieval]: Information Search and Retrieval—Information filtering
General Terms
Design
Keywords
Image search, Intention, Visual

1. INTRODUCTION
Most image search engines [1] nowadays rely mainly on text-based information. Since the "surrounding text" is not always accurate, the returned images are often noisy and disorganized. Content-based image retrieval [3] uses visual features to evaluate image similarity. However, due to the diversity of images and features, a universal feature set and distance measurement that works for all images is hard to find. For example, it is difficult to find a feature that works well for both portrait images and scenery images. Relevance feedback [4] uses user-labeled images to improve the image ranking. However, most relevance feedback methods require online training based on feedback samples, and cannot easily be used in real-time online applications.

Understanding high-level user intention is critical for truly effective use of human input. For example, if the system can understand that the user intends to search for a face in the query image, then a face recognition algorithm would be much more effective than a general texture classification algorithm. In this paper, we develop a learning-based technique that models user intention in order to adjust feature selection and combination. Text-based search results are re-ranked in an interactive manner according to the user's intention. For the first time, we achieve real-time performance for actual Internet-scale image search re-ranking. Our online demo consists of two interfaces, which allow the user to either re-rank in Live Image Search or browse in Rank Collage.

Copyright is held by the author/owner(s). MM'08, October 26–31, 2008, Vancouver, British Columbia, Canada. ACM 978-1-60558-303-7/08/10.

2. SYSTEM IMPLEMENTATION

2.1 Inferring User Intention
Given the images returned by a text-based image search engine, we try to adaptively leverage the power of visual features to re-rank the search results. The pipeline of IntentSearch is shown in Figure 1, and the details are given below.

Figure 1: Pipeline of IntentSearch.

Humans can easily categorize images into high-level semantic classes, such as scene, people, and object. We observed in our experiments that the images inside each of these classes are similar in terms of which kinds of features discriminate them best from other images. Inspired by this observation, we roughly group general images into typical intention categories: General Object (images containing close-ups of general objects), Object with Simple Background, Scene (scenery images), Portrait (images containing the portrait of a single person), and People (images with general people inside that are not "Portrait"). Note that we are proposing a framework for intention categorization, which can be extended to more general cases with more than five intentions. Guided by human prior knowledge, we design several specific "attributes" that are highly related to intention categorization. Using these attributes, we effectively map the images into a space in which intention categorization is relatively easy.
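As a rough sketch of what such an attribute mapping might look like in code (the concrete detectors and statistics here, such as the Haar-cascade face detector and the block-wise hue variance, are illustrative stand-ins rather than the system's actual feature extractors):

import cv2
import numpy as np

def intention_attributes(image_bgr):
    """Map an image to a small vector of intention-related attributes:
    [face count, largest-face area ratio, edge energy, color homogeneity]."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    h, w = gray.shape

    # Face attributes via a standard Haar cascade (stand-in detector).
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray)
    face_count = len(faces)
    max_face_ratio = max((fw * fh for _, _, fw, fh in faces), default=0) / (h * w)

    # Edge energy: mean gradient magnitude, high for cluttered or
    # textured images, low for images with a simple background.
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    edge_energy = float(np.mean(np.hypot(gx, gy)))

    # Color spatial homogeneity: average hue variance over a 4x4 grid
    # of blocks; uniform blocks suggest a simple background.
    hue = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)[:, :, 0].astype(np.float32)
    bh, bw = max(1, h // 4), max(1, w // 4)
    block_vars = [hue[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw].var()
                  for r in range(4) for c in range(4)]
    color_homogeneity = 1.0 / (1.0 + float(np.mean(block_vars)))

    return [face_count, max_face_ratio, edge_energy, color_homogeneity]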
As a second step, we use a C4.5 decision tree to handle this set of inhomogeneous attributes. The attributes for intention categorization include face existence, face number, face size, face position, directionality, color spatial homogeneity, edge energy, and edge spatial distribution. With these attributes, we train a C4.5 decision tree on an image set with manually labeled intentions. The training process determines the decision boundaries of the intention categories in the feature space defined by those attributes, and the intention of a new input image is then decided simply by applying the rules of the decision tree to it.
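A minimal sketch of this classification step, building on the attribute vector above; we use scikit-learn's CART decision tree as a convenient stand-in for C4.5, and the training data shown is a toy placeholder for the manually labeled image set:

from sklearn.tree import DecisionTreeClassifier

INTENTIONS = ["General Object", "Object with Simple Background",
              "Scene", "Portrait", "People"]

# Toy placeholder data: [face_count, max_face_ratio, edge_energy,
# color_homogeneity] vectors with manually assigned intention labels.
X = [[0, 0.00, 35.0, 0.20],   # textured close-up   -> General Object
     [0, 0.00, 12.0, 0.90],   # clean background    -> Object w/ Simple Bg
     [0, 0.00, 20.0, 0.40],   # outdoor view        -> Scene
     [1, 0.30, 15.0, 0.50],   # one dominant face   -> Portrait
     [3, 0.05, 18.0, 0.30]]   # several small faces -> People
y = [0, 1, 2, 3, 4]

tree = DecisionTreeClassifier().fit(X, y)

def infer_intention(image_bgr):
    """Classify a query image into one of the intention categories."""
    attrs = intention_attributes(image_bgr)   # from the sketch above
    return INTENTIONS[tree.predict([attrs])[0]]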
Each intention category corresponds to a Search Schema, represented by a vector α, which expresses the relative importance of all F different features available to characterize an image. The larger α_m is, the more important the m-th feature will be for the query image. The adaptive similarity measurement for the specific query image is a linear combination of its similarities on the different features, weighted by α:

s(·) = Σ_{m=1}^{F} α_m s_m(·),

where s_m(·) is the similarity of a new image to the query image on the m-th feature.
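In code, this weighted combination is straightforward; the sketch below assumes per-feature representations and one similarity function per feature channel, both hypothetical placeholders for the system's actual feature set. For example, sims might pair a color-histogram similarity with a texture similarity, with α shifting weight toward face similarity under the Portrait intention.

import numpy as np

def rerank(query_feats, candidate_feats, alpha, sims):
    """Order candidates by s = sum_m alpha[m] * s_m(query, candidate).

    query_feats:     per-feature representations of the query image
    candidate_feats: list of per-feature representations, one per candidate
    alpha:           Search Schema weights of the inferred intention
    sims:            one similarity function per feature channel
    """
    scores = [sum(a * sim(q, c)
                  for a, sim, q, c in zip(alpha, sims, query_feats, cand))
              for cand in candidate_feats]
    # Indices of candidates, highest combined similarity first.
    return list(np.argsort(scores)[::-1])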
For each category, the corresponding Search Schema α is obtained offline by training over user-labeled images, optimizing ranking performance under that intention category.

After the search results are given, the user can drag more relevant images to the "Additional Images" pad, and the ranking algorithm yields updated results. This step is similar to positive relevance feedback, but with a different user interface and experience. Users do not feel that they are "labeling data" in our interface; instead, they simply drag the images they like, and more good images appear, as though they were collecting good images. No online training is required, since each additional image adopts the same intention feature weights as the original query. The detailed algorithm is described in [2].
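The exact update rule is given in [2]; one plausible reading of the constraint above (same α, no online training) is to score each candidate against every query image and average, as in this sketch:

def rerank_with_feedback(key_feats, extra_feats, candidate_feats, alpha, sims):
    """Re-rank using the key image plus dragged 'Additional Images'.

    Every query image is scored with the SAME intention weights alpha,
    so no online training is needed; scores are simply averaged.
    (Illustrative aggregation only; the actual rule is described in [2].)
    """
    queries = [key_feats] + list(extra_feats)
    scores = []
    for cand in candidate_feats:
        per_query = [sum(a * sim(q, c)
                         for a, sim, q, c in zip(alpha, sims, qf, cand))
                     for qf in queries]
        scores.append(sum(per_query) / len(per_query))
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)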
2.2 On-line Live Image Search Re-ranking
Figure 2: Live Image Search Re-ranking.

Our system works directly on top of Live Image Search [1], with almost the same Web interface (Figure 2). After the user types a query keyword, the original text-based result of Live Image Search is presented. The user can then drag an image to the "Key Image" pad to initiate a content-based query. Our background algorithm infers the best intention for the query image, then returns re-ranked results based on an adaptively weighted feature set. The user can either drag another image from the current results to the "Key Image" pad for another round of querying, or drag it to the "Additional Images" pad to let the system update the results.
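Putting the pieces together, one round of interaction in this interface reduces to the following flow, a sketch reusing the hypothetical helpers above, with extract() standing in for the system's feature extraction:

def intent_search(key_image_bgr, candidates, alpha_by_intention, sims, extract):
    """One round of content-based re-ranking on top of text search.

    candidates:         images returned by the text-based search
    alpha_by_intention: offline-trained Search Schema per intention
    extract:            hypothetical per-feature extraction helper
    """
    intention = infer_intention(key_image_bgr)      # Section 2.1
    alpha = alpha_by_intention[intention]
    query_feats = extract(key_image_bgr)
    cand_feats = [extract(img) for img in candidates]
    order = rerank(query_feats, cand_feats, alpha, sims)
    return [candidates[i] for i in order]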
2.3 Search Results Browsing by Rank Collage
The user can switch to the Rank Collage view (Figure 3) at any time to browse the whole set of images in the current search session. All images are presented in a collage: images near the center are larger and more relevant to the user's query, while images farther from the center are less relevant. When a new image is dragged to the center, a new round of search starts with that image as the query. Endless zooming, various operations on a single image, and side-by-side comparison of multiple search results are also supported.
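The paper does not specify the collage layout algorithm; as one way to realize "closer to the center means larger and more relevant", the sketch below places images on a golden-angle spiral with size decaying by rank:

import math

def collage_layout(num_images, base_size=160.0, spacing=0.45):
    """Place ranked images on a spiral: rank 0 sits at the center and
    is largest; later ranks spiral outward and shrink.
    Returns (x, y, size) per image, centered at the origin.
    (One plausible layout; the demo's actual collage algorithm may differ.)
    """
    layout = []
    for rank in range(num_images):
        angle = rank * 2.39996   # golden angle keeps neighbors spread out
        radius = spacing * base_size * math.sqrt(rank)
        size = base_size / (1.0 + 0.35 * math.sqrt(rank))
        layout.append((radius * math.cos(angle),
                       radius * math.sin(angle), size))
    return layout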
Figure 3: Search Results Browsing by Rank Collage.

3. CONCLUSION
By inferring user search intention, we seamlessly connect visual image features with text-based image search and enable more flexible and accurate image search. Our first interface allows the user to do interactive image search within Live Image Search, and the second provides richer interaction, letting the user browse and play with the whole set of images online in real time. Please visit http://mmlab.ie.cuhk.edu.hk/intentsearch to try the demo.
4. REFERENCES
[1] http://www.live.com/?scope=images.
[2] J. Cui, F. Wen, and X. Tang. Real time Google and Live image search re-ranking. In MULTIMEDIA '08: Proceedings of the 16th Annual ACM International Conference on Multimedia, 2008.
[3] R. Datta, D. Joshi, J. Li, and J. Z. Wang. Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys, 2007.
[4] X. S. Zhou and T. S. Huang. Relevance feedback in image retrieval: A comprehensive review. Multimedia Systems, 8(6):536–544, 2003.