IMAGE RETRIEVAL AND WEB 2.0 – WHERE CAN WE GO FROM HERE?
Christian Bauckhage, Tansu Alpcan
Deutsche Telekom Laboratories, 10587 Berlin, Germany
Robert Wetzker, Winfried Umbrath
DAI-Labor, Technische Universität Berlin, 10587 Berlin, Germany
ABSTRACT
Compared to only a few years ago, there is now an abundance of annotated image data available on the Internet. For researchers in image retrieval, this is an unforeseen but welcome consequence of the rise of Web 2.0 technologies. Popular social networking and content sharing services seem to hold the key to the integration of context and semantics into retrieval. However, at least for now, it appears that this promise has to be taken with a grain of salt. In this paper, we present preliminary empirical results on the tagging behavior of power users of content sharing and social bookmarking services. Our findings suggest several promising research directions for image retrieval, and we briefly discuss some of them.
Index Terms: Web 2.0, image tagging, image retrieval
1. INTRODUCTION
With the coming of age of interactive web services, commonly referred to as Web 2.0 technologies [1], access to huge amounts of labeled image data is suddenly no longer a problem. Popular photo sharing and storage services such as flickr, photobucket, or whoophy have attracted very active communities of users who upload and annotate several thousand pictures a day. For example, at the time of this writing, more than two billion pictures have been uploaded to flickr, and this number is currently growing at a rate of about 3500 photos per minute.
For researchers in computer vision and image retrieval, this brings opportunities and challenges alike. On the one hand, vast collections of manually labeled images provide the perfect foundation for statistical learning about image content, and corresponding efforts towards the daunting problem of general image categorization are increasing [2, 3]. On the other hand, folksonomies, i.e. the collaboratively created collections of tags one can find on the Internet, are already notorious for being misleading, unreliable, or even contradictory [4].
A possible explanation for these tendencies can be found in the fact that, for users, the social aspects of participating in an online community are of primary interest. To them, sharing their photos, tagging them, and commenting on other people’s photos is an act of communication. As such, it will not merely deal with descriptions and semantics, but will also involve aspects of pragmatics.
In this paper, we present preliminary empirical findings obtained from projects that explore the use of social networking services and online communities as a source of contextual and semantic information for information retrieval. We believe that these findings suggest several interesting research directions for image and video retrieval, and we discuss them briefly.
2. THE PRAGMATICS OF TAGS
In the period from October 2007 to February 2008, we collected a data set of 43.252 images belonging to 200 different categories. These images were retrieved from flickr by means of tag-based searches over flickr’s most interesting images. Consequently, our data collection is not a representative sample of the kind of pictures generally found on flickr. Rather, it is biased towards high-quality pictures uploaded by professional photographers or avid and ambitious hobbyists whose photos feature prominently among flickr’s most interesting pictures.
The 200 categories comprise adjectives (above, bright, calm, dry, energetic, . . . ), adverbs (decaying, growing, intriguing, . . . ), and nouns which cover a large spectrum of objects, entities, or events. For instance, there are abstracta (beauty, circle, energy, mood, . . . ), animals (butterfly, camel, eagle, . . . ), man-made artifacts (bottle, castle, pipes, . . . ), places (airport, alps, city, india, . . . ), plants (orchid, rose, tree, . . . ), or times (autumn, dawn, fall, . . . ). Table 1 further characterizes our experimentation data. The images we downloaded came along with a total of
Table 1. Statistics obtained from our flickr test set.
No. of images retrieved from flickr.com:     43.252
No. of categories:                           200
No. of tags:                                 1.139.389
No. of pruned tags:                          1.098.280
No. of different pruned tags:                98.346
Average number of pruned tags per image:     26
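As a rough illustration of how figures of the kind listed in Table 1 could be derived, the following Python sketch counts tags over a collection of per-image tag lists, drops tags of two or fewer characters (the pruning rule applied to this data set), and counts distinct and singly occurring tags. The function name, the input format, and the toy tag lists are assumptions made for illustration only; the paper does not describe the tooling that was actually used.

```python
from collections import Counter


def tag_statistics(images, min_length=3):
    """Summarize tag usage.

    images: list of tag lists, one list per image (assumed input format).
    min_length: tags shorter than this are pruned; 3 mirrors the rule of
                dropping tags of two or fewer characters.
    """
    all_tags = [tag for tags in images for tag in tags]
    pruned = [tag for tag in all_tags if len(tag) >= min_length]
    counts = Counter(pruned)
    singletons = sum(1 for c in counts.values() if c == 1)
    return {
        "images": len(images),
        "tags": len(all_tags),
        "pruned_tags": len(pruned),
        "distinct_pruned_tags": len(counts),
        "singleton_tags": singletons,
        "avg_pruned_tags_per_image": len(pruned) / len(images) if images else 0.0,
    }


# Hypothetical toy input, not taken from the actual data set.
toy = [
    ["fog", "trees", "spring", "7", "nj"],
    ["piano", "keyboard", "bw", "hdr"],
    ["spring", "trees", "lake"],
]
stats = tag_statistics(toy)
print(stats)
```

The same `Counter` can be reused to produce a ranked list of the most frequent tags via `counts.most_common(45)`.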
[Fig. 1 plot: tags per rank; vertical axis: occurrence counts; horizontal axis: rank w.r.t. occurrence counts.]
Fig. 1. Tag frequencies obtained from our flickr test set.
1.139.389 tags. Pruning those tags consisting of two or fewer characters (numerals or letters) left us with 1.098.280 tags. Collecting them into a dictionary revealed that there were 98.346 distinct tags in our data set. Figure 1 further shows that individual tag frequencies follow a steeply decaying distribution: two of the tags in the dictionary occurred more than 7000 times, four tags occurred more than 5000 times, eight tags occurred more than 3000 times, etc. Almost half of the tags in the dictionary, i.e. 46.544, however, occurred just once as a content descriptor for the images in our collection. So far, these empirical findings agree with what one would expect according to Zipf’s law for natural language corpora.
A more surprising result comes from Tab. 2, which lists the 45 most frequent tags in our collection. Given our choice of image categories, our data collection can be expected to be biased towards natural scenes and landscapes, and this is indeed well reflected by the tags in the table. Interestingly, however, the most frequent tag is “abigfave”, a term that does not refer to any natural object or entity. Rather, it is the name of one of flickr’s groups. All in all, there are 10 such tags among the 45 top-ranking tags in our collection. All these groups are invitation-only groups, i.e. groups to which flickr users can only post their photos when invited to do so because the group administrators consider the photos to be outstanding. It therefore seems plausible to conclude that users tag their photos in this way not to describe their content but to impress other users.
Moreover, several tags in Tab. 2 refer to camera names or photographic techniques, or they describe feelings of awe or appreciation. As with the names of exclusive groups, it appears that users want to tell more about a picture than just describing its content. They frequently share how a picture was taken as well as how it makes them feel. Note that in our data set we did not collect comments posted by other users. Like the tags which photographers assign to their pictures, user comments provide a wealth of contextual information, most notably again on how a photo is perceived.
Research Direction: The tagging behavior that becomes apparent from Tab. 2 suggests that it is worthwhile to attempt to automatically learn about the aesthetic content of pictures. First steps in this direction have already been made [5], but social networking sites have not yet been considered as a source of data that could drive this research. Being able to assess emotional qualities or pragmatic aspects of an image would, of course, be of great interest to automatic retrieval.
3. WEB WIDE CONTEXT
In an ongoing research project, we are considering the problem of trend discovery and estimation of user preference by means of mining social bookmarking data. Social bookmarking systems allow users to save links to web pages which they want to remember or share with others. Most social bookmarking services also allow users to organize and augment their bookmarks with informal tags. They make it possible to view the bookmarks associated with a chosen tag and provide information about the number of users who have bookmarked the same page. Web feeds allow subscribers to become aware of new bookmarks as they are saved, shared, and tagged by other users. Popular bookmarking services also offer features such as ratings and comments on bookmarks, the ability to import and export bookmarks from browsers, web annotation, and groups or other social network features. As they gather personal link collections in one place, bookmarking services are therefore interesting repositories for data mining for personalization.
Table 3 presents some statistics of a data collection we retrieved from del.icio.us. Almost 100.000 of the 141.000.000 bookmarks we retrieved point to flickr. In total, there are 62.266 distinct bookmarks to flickr, which were provided by 45.976 users.
Research Direction: Images that are linked to from a bookmarking service have a much larger context than ordinary images. From the collection of links that accompany a link to an image or a video, one might deduce user profiles and preferences and again learn about emotional qualities. As social bookmarking services are very dynamic systems, one may explore how quickly (if at all) a bookmarked picture gains popularity or which photos are popular when.
4. HARNESSING WEB INTELLIGENCE
Figure 2 shows three randomly selected images from each of three randomly chosen categories in our flickr data set. The categories are wet, spring, and piano.
In addition to illustrating that, with respect to real-world scenes,
Rank Tag            Count   Rank Tag            Count   Rank Tag                     Count
 1. abigfave         7243   16. canon            2741   31. topf25                   2265
 2. nature           7059   17. white            2725   32. sun                      2240
 3. the              6061   18. canada           2685   33. diamondclassphotographer 2213
 4. bravo            5683   19. hdr              2644   34. searchthebest            2173
 5. blue             5231   20. yellow           2634   35. color                    2149
 6. sky              5167   21. nikon            2603   36. orange                   2147
 7. water            4646   22. aplusphoto       2556   37. i500                     2132
 8. clouds           3879   23. trees            2481   38. wow                      2088
 9. landscape        3864   24. interestingness  2444   39. night                    2044
10. green            3807   25. tree             2423   40. beautiful                2034
11. red              3736   26. anawesomeshot    2401   41. magicdonkey              1965
12. light            3545   27. reflection       2367   42. supershot                1963
13. sunset           3528   28. explore          2354   43. quality                  1881
14. impressedbeauty  3263   29. naturesfinest    2300   44. outstandingshots         1874
15. macro            2747   30. flower           2298   45. lake                     1849
Table 2. The 45 most frequent tags in our flickr.com test set together with their occurrence counts.
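Zipf’s law states that frequency is roughly proportional to a negative power of rank, so counts plotted against rank on log-log axes should fall close to a straight line, with a slope near -1 in the classic form. The sketch below estimates that slope by ordinary least squares from the 45 counts in Table 2. It is only an illustrative check on the head of the distribution, not an analysis performed in the paper, and the function name is an assumption.

```python
import math

# Occurrence counts of the 45 most frequent tags, copied from Table 2.
TOP_COUNTS = [
    7243, 7059, 6061, 5683, 5231, 5167, 4646, 3879, 3864, 3807,
    3736, 3545, 3528, 3263, 2747, 2741, 2725, 2685, 2644, 2634,
    2603, 2556, 2481, 2444, 2423, 2401, 2367, 2354, 2300, 2298,
    2265, 2240, 2213, 2173, 2149, 2147, 2132, 2088, 2044, 2034,
    1965, 1963, 1881, 1874, 1849,
]


def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank), ranks starting at 1."""
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


slope = loglog_slope(TOP_COUNTS)
print(f"estimated log-log slope over ranks 1-45: {slope:.2f}")
```

Because only the 45 top-ranked tags are available here, the estimate says little about the long tail of tags that occur once; a fit over the full dictionary would be needed for that.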
Table 3. Statistics obtained from our del.icio.us test set.
No. of bookmarks retrieved from del.icio.us:     141.000.000
No. of bookmarks to flickr.com:                  98.918
No. of different bookmarks to flickr.com:        62.266
No. of bookmarks to main page of flickr.com:     6.126
No. of users bookmarking flickr.com:             45.976
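Counts like those in Table 3 can be obtained in a single pass over the raw bookmark records. The Python sketch below assumes that each record is a (user, url) pair, which is a hypothetical format chosen for illustration; the paper does not specify how the del.icio.us data were stored or processed. Host names are matched against the target domain including subdomains, and a bookmark is counted as pointing to the main page when its URL path is empty or "/".

```python
from urllib.parse import urlparse


def domain_bookmark_stats(bookmarks, domain="flickr.com"):
    """Count bookmarks that point to `domain`.

    bookmarks: iterable of (user, url) pairs (assumed record format).
    """
    total = 0
    to_domain = 0
    main_page = 0
    distinct_urls = set()
    users = set()
    for user, url in bookmarks:
        total += 1
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            to_domain += 1
            distinct_urls.add(url)
            users.add(user)
            if parsed.path in ("", "/"):
                main_page += 1
    return {
        "bookmarks": total,
        "bookmarks_to_domain": to_domain,
        "distinct_bookmarks_to_domain": len(distinct_urls),
        "bookmarks_to_main_page": main_page,
        "users_bookmarking_domain": len(users),
    }


# Hypothetical toy records, not taken from the actual data set.
toy_bookmarks = [
    ("alice", "http://www.flickr.com/"),
    ("bob", "http://www.flickr.com/photos/example/1"),
    ("alice", "http://www.flickr.com/photos/example/1"),
    ("carol", "http://example.org/page"),
]
bookmark_stats = domain_bookmark_stats(toy_bookmarks)
print(bookmark_stats)
```

Distinct bookmarks are compared here as raw URL strings; normalizing URLs before comparison (for example, stripping trailing slashes) would be a design choice that changes the counts.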
semantic labels can have a fairly wide range of connotations, the figure also points out that different users show fairly different tagging behaviors: while the user who tagged the picture in the middle of the uppermost row assigned only 5 tags to the image, the user who tagged the image on the left of the lowermost row assigned 171 tags to the picture.
Currently, we are investigating how techniques which we developed in the context of a social search engine (see http://www.askspree.de and http://sourceforge.net/projects/spree/) can be applied to summarize the information provided in a tag cloud. Our original algorithm for matching user queries to appropriate experts is based on an ontology tree that covers various areas of knowledge. Mapping expert profiles and user queries to subtrees of the ontology allows for addressing the matching problem by means of simple operations in a semantics-induced vector space [6, 7]. The categories shown in the last line of text of each row in Fig. 2 were automatically deduced by our system using a taxonomy that was retrieved from the open directory project dmoz.org and populated with keywords for text comparison using a web crawler. Even though our search engine had not been tailored to the data we retrieved from flickr, the automatically determined categories are surprisingly accurate.
Research Direction: Content summarization methods can help to remove spurious and ambiguous tags. Abundant amounts of labeled images call for novel, hybrid strategies that combine text retrieval approaches with techniques from content-based image retrieval. Trees of visual vocabularies have already proven to be powerful approaches to image retrieval; taxonomies of visual vocabularies may help to understand images on higher levels of abstraction.
5. REFERENCES
[1] J. Musser and T. O’Reilly, “Web 2.0 – Principles and Best Practices,” Tech. Rep., O’Reilly Radar, Nov. 2006.
[2] M. Everingham, L. Van Gool, C.K.I. Williams, J. Winn, and A. Zisserman, “The PASCAL Visual Object Classes Challenge Results,” available online.
[3] G. Griffin, A. Holub, and P. Perona, “Caltech-256 Object Category Dataset,” Tech. Rep. 7694, California Institute of Technology, 2007.
[4] E. Peterson, “Beneath the Metadata: Some Philosophical Problems with Folksonomy,” D-Lib Magazine, vol. 12, no. 11, 2006.
[5] R. Datta, D. Joshi, J. Li, and J.Z. Wang, “Studying Aesthetics in Photographic Images Using a Computational Approach,” in Proc. ECCV, 2006, vol. 3, pp. 288–301.
[6] F. Metze, C. Bauckhage, T. Alpcan, K. Dobbrott, and C. Clemens, “The ‘Spree’ Expert Finding System,” in Int. Conf. on Semantic Computing, 2007, pp. 551–558, IEEE.
[7] R. Wetzker, T. Alpcan, C. Bauckhage, W. Umbrath, and S. Albayrak, “An unsupervised hierarchical method for automated document categorization,” in Proc. Int. Conf. on Web Intelligence, 2007, pp. 482–486, IEEE.
tags: fog foggy trees spring road lake tiorati harriman state park seven 7 lakes drive sepia lost deleteme8 saveme10 savedbythedeletemegroup i500 interestingness9 explore13may06 mist misty moist nebbia mostinterestinginternal wet cold interestingness1 chromatoned topf250 topv8888 SPREE: /Recreation/Outdoors/
tags: matildeb lisbon portugal landscape spring scenary lagoadealbufeira nationalpark weeklysurvivor nikonstunninggallery matilde bestnaturetnc06 flickrsbest md kao
SPREE: /Arts/Architecture/
tags: random detroit urbanexploration architecture blight decay urbanblight urbandecay abandoned trespass . . . america unitedstates piano instrument geotagged . . . myfavs myfavorite talent pianofree . . . (171 tags in total) SPREE: /Arts/
tags: men pool furry wet speedo
tags: 8thave rain window street cars wet nyc purge28 purgesurvivor trafficattack msc abigfav abigfave bratanesque
SPREE: /Sports/Water Sports/
SPREE: /Shopping/Niche
tags: high desert southern california spring 2007 hb19 colorful sparkling eggs canon powershot captured nice easter image abigfave anawesomeshot superbmasterpiece bloggedbyabigfave . . . instantfave 100faves100comments1000views (32 tags in total) SPREE: /Recreation/Birding/
tags: lilacs flowers blue purple sunlight michigan wallpaper stuckincustomsbrilliantcabal iloveit creamofthecrop spring quality bravo
tags: piano keyboard black white bach hdr photomatix music classic tthdr musician playitagainsam ivory fivestarsgallery fsgmusic artlibre
tags: light night classical music score piano keys keyboard chopin nocturne artlibre topv111 diamondclassphotographer topvaa bravo
SPREE: /Arts/Music/
SPREE: /Computers/Systems/Atari/
SPREE: /Science/Agriculture/Field Crops/
Fig. 2. Examples of images and corresponding tags retrieved from flickr.com. To produce this figure, three categories (wet, spring, piano) and three images per category were randomly selected from our test set. Note the variety of motifs, tagging behavior, and tag semantics in this random sample. In addition, the figure shows the semantic categories our SPREE system deduced from the available tags.
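The category labels in Fig. 2 were produced by the SPREE system, whose ontology-based matching is described in [6, 7]. The sketch below is not that algorithm. It is a deliberately simplified stand-in that conveys the general idea of comparing a tag set with keyword-populated taxonomy nodes in a vector space: each node is represented by a bag of keywords, the tag list becomes a count vector, and the node with the highest cosine similarity is returned. The keyword lists, the flat (non-hierarchical) treatment of nodes, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical keyword bags for a few taxonomy nodes; a real system would
# populate them from crawled pages under each node of the directory.
TAXONOMY = {
    "/Arts/Music/": ["piano", "music", "keyboard", "classical", "musician", "score"],
    "/Recreation/Outdoors/": ["lake", "park", "trees", "road", "hiking", "fog"],
    "/Sports/Water Sports/": ["pool", "swimming", "wet", "water", "surf"],
}


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counter objects)."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorize(tags, taxonomy=TAXONOMY):
    """Return (best_category, scores); best_category is None if nothing matches."""
    tag_vector = Counter(tag.lower() for tag in tags)
    scores = {
        category: cosine(tag_vector, Counter(keywords))
        for category, keywords in taxonomy.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0.0:
        return None, scores
    return best, scores


best, scores = categorize(["piano", "keyboard", "black", "white", "bach", "music"])
print(best, scores)
```

Tags that match no keyword, such as group names, contribute nothing to the dot product and only lower the overall score, which hints at how a summarization step of this kind can discount spurious tags.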