Generating Location Overviews with Images and Tags by Mining User-Generated Travelogues

Qiang Hao†*, Rui Cai‡, Xin-Jing Wang‡, Jiang-Ming Yang‡, Yanwei Pang†, and Lei Zhang‡

†Tianjin University, Tianjin 300072, P.R. China
‡Microsoft Research Asia, Beijing 100190, P.R. China





{qhao, pyw}@tju.edu.cn {ruicai, xjwang, jmyang, leizhang}@microsoft.com



ABSTRACT

Automatically generating location overviews in the form of both visual and textual descriptions is highly desirable for online services such as travel planning, to provide attractive and comprehensive outlines of travel destinations. User-generated content (e.g., travelogues) on the Web provides abundant information about various aspects (e.g., landmarks, styles, activities) of most locations in the world. To leverage the experience shared by Web users, in this paper we propose a location overview generation approach which first mines location-representative tags from travelogues and then uses such tags to retrieve web images. The learnt tags and retrieved images are finally presented via a novel user interface which provides an informative overview for a given location. Experiments based on 23,756 travelogues and an evaluation over 20 locations show promising results for both travelogue mining and location overview generation.

Figure 1. The framework of the proposed approach: (1) travelogue mining; (2) location-representative tag generation; and (3) image retrieval and overview generation.

Categories and Subject Descriptors

H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – information filtering. H.5.3 [Information Interfaces and Presentation]: Group and Organization Interfaces.

General Terms

Algorithms, Design, Experimentation

Keywords

Travelogue mining, location overview generation, virtual tour

1. INTRODUCTION

The booming Web 2.0 technologies enable people to plan trips by surveying experiences shared by previous travelers, from user-generated content (UGC) such as Flickr images [1] and textual travelogues. In such a scenario, location overviews with both representative descriptions and images are highly desired to outline travel destinations. We believe a good location overview should be both representative and comprehensive. Being representative helps identify popular landmarks, while being comprehensive is even more important because travel is more than landmark browsing; it is more attractive to also show other interesting aspects such as styles (e.g., desert, ocean) and activities (e.g., dive, surf), which help travelers experience a virtual tour before their trips.

* This work was performed at Microsoft Research Asia.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MM'09, October 19–24, 2009, Beijing, China. Copyright 2009 ACM 978-1-60558-608-3/09/10...$10.00.

Although location overview generation is important, not many research efforts have been dedicated specifically to this problem. Some related work focuses on the issue of landmark/scene image selection. For example, [2][3] used content-based technologies to select representative images (or canonical views) for a given landmark or scene. However, such methods usually suffer from the semantic gap between visual features and semantics. Textual information (e.g., location names) is mainly used to trigger image retrieval to obtain images related to a location [4]. Since a group of comprehensive tags and related images is very informative for showing the characteristics of a location, in this paper we utilize textual information in a different way. Instead of simply using location names to retrieve images, we first generate location-representative tags from relevant textual information, and then use these tags to retrieve web images. Both tags and images are presented to users through a novel user interface, to finally provide attractive and comprehensive location overviews.

A straightforward approach to generating tags that characterize a location is to analyze the tags associated with Flickr images. Recently, mining location-related information from Flickr tags has been actively investigated [5][6]. However, as tagging on Flickr is open, the tags are semantically unconstrained and only a part of them are related to travel. Hence, it is a non-trivial task to distinguish such travel-related tags from others based on Flickr data alone, as there is little contextual evidence to help. Besides Flickr data, we notice that people are creating and sharing a large number of high-quality travelogues on the Web about virtually any location. In contrast to Flickr data, travelogues not only naturally concentrate on travel-related themes but also contain much richer contextual semantics. In this way, travelogues provide a better data source for generating overviews and virtual tours of locations.

As with other UGC data mining, the main challenge in travelogue mining is how to extract the required information from the noisy and unstructured data. Specifically, in a typical travelogue, the location descriptions are intertwined with personal opinions or other common topics such as accommodation and transportation. We refer to the semantics that characterize locations as local topics and the others as global topics. That is, a local topic summarizes the typical characteristics of a few specific locations, e.g., beach for most locations by the sea, whereas a global topic is commonly shared by many locations, e.g., taxi, airport. Intuitively, local topics are the most desired for location overview generation.

Motivated by these observations, we propose a framework for generating location overviews by mining travelogues, which consists of three modules, as shown in Figure 1. First, a probabilistic generative model is trained on a collection of travelogues crawled from the Web to learn global topics and local topics. Then, given a location, its relevant travelogues are projected into the space spanned by the learnt topics to generate location-representative tags based on the words projected onto the local topics. Finally, the generated tags are combined with the location name to query an image search engine for relevant images; the tags and images are eventually organized together to create an informative overview for the given location. Although we mainly focus on generating location-representative tags and currently adopt only a simple text-based image retrieval method, better image selection methods can be easily incorporated into the framework.

The rest of this paper is organized as follows. Section 2 introduces the algorithms and implementations. Evaluation results are shown in Section 3. Section 4 gives conclusions and future work.

2. MODULE DETAILS

In this section, we describe the detailed algorithms and implementations of the three modules in Subsections 2.1–2.3, respectively.

2.1 Travelogue Mining

Probabilistic generative models such as probabilistic latent semantic analysis (PLSA) [7] and its extensions [8][9] have achieved success in many text mining tasks by assuming that each document is generated from a mixture of topics, where each topic is a probability distribution over the words in the vocabulary $\mathcal{W}$. We propose a probabilistic generative model which extends the standard PLSA to model the generative process of travelogues with two distinct types of topics: global topics and local topics.

2.1.1 The Generative Process of Travelogues

Given a set of global topics $\mathcal{Z}^{gl}$ and a set of local topics $\mathcal{Z}^{loc}$, each word in a travelogue document $d \in \mathcal{D}$ with location label $l_d \in \mathcal{L}$ is generated from either (I) a global topic $z^{gl} \in \mathcal{Z}^{gl}$ chosen according to a document-specific distribution $\theta_d$ over the global topics, or (II) a local topic $z^{loc} \in \mathcal{Z}^{loc}$ chosen according to a location-specific distribution $\psi_{l_d}$ over the local topics. The variable $x \in \{gl, loc\}$ associated with each word acts as the switch between (I) and (II) and is sampled from a document-specific binomial distribution $\pi_d$. The generative process of $d$ is described in Algorithm 1, with the graphical model representation shown in Figure 2. Thus, the log-likelihood of the collection of travelogues can be written as

$$L(\mathcal{D}) = \sum_{d \in \mathcal{D}} \sum_{w \in \mathcal{W}} n(d,w) \log \Big\{ p(x{=}gl \mid d) \Big[ \sum_{z \in \mathcal{Z}^{gl}} \theta_{d,z}\, p(w \mid z) \Big] + p(x{=}loc \mid d) \Big[ \sum_{z \in \mathcal{Z}^{loc}} \psi_{l_d,z}\, p(w \mid z) \Big] \Big\}, \qquad (1)$$

where $n(d,w)$ denotes the number of occurrences of word $w$ in $d$.

Algorithm 1. The generative process of a travelogue
1) choose $\pi_d = \{p(x \mid d)\}_{x \in \{gl, loc\}}$, s.t. $\sum_{x \in \{gl, loc\}} p(x \mid d) = 1$
2) choose $\theta_d = \{\theta_{d,z}\}_{z \in \mathcal{Z}^{gl}}$, s.t. $\sum_{z \in \mathcal{Z}^{gl}} \theta_{d,z} = 1$
3) choose $\psi_{l_d} = \{\psi_{l_d,z}\}_{z \in \mathcal{Z}^{loc}}$, s.t. $\sum_{z \in \mathcal{Z}^{loc}} \psi_{l_d,z} = 1$
4) for each word $w_{d,n}$ in document $d$:
   a) choose the switch $x_{d,n} \sim \mathrm{Binomial}(\pi_d)$
   b) if $x_{d,n} = gl$, choose a global topic $z_{d,n} = z^{gl} \sim \theta_d$
   c) if $x_{d,n} = loc$, choose a local topic $z_{d,n} = z^{loc} \sim \psi_{l_d}$
   d) choose the word $w_{d,n} \sim \{p(w \mid z_{d,n})\}_{w \in \mathcal{W}}$

Figure 2. The proposed model for travelogue mining.
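To make Algorithm 1 concrete, the following is a minimal Python sketch of the sampling procedure it describes. The two global and two local topics, the six-word vocabulary, and all probability values are toy numbers invented for illustration; they are not taken from the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: 2 global topics, 2 local topics, a small vocabulary (illustrative values only).
vocab = ["hotel", "airport", "beach", "surf", "casino", "show"]
p_w_given_z = {                                              # p(w|z) for each topic
    "gl_0": [0.6, 0.3, 0.025, 0.025, 0.025, 0.025],          # accommodation-like global topic
    "gl_1": [0.3, 0.6, 0.025, 0.025, 0.025, 0.025],          # transportation-like global topic
    "loc_0": [0.02, 0.02, 0.5, 0.4, 0.03, 0.03],             # beach-like local topic
    "loc_1": [0.02, 0.02, 0.03, 0.03, 0.5, 0.4],             # casino-like local topic
}

def generate_travelogue(pi_d, theta_d, psi_ld, n_words=10):
    """Sample words following Algorithm 1: a switch x picks a global or local topic per word."""
    words = []
    for _ in range(n_words):
        x = rng.choice(["gl", "loc"], p=pi_d)                # step 4a
        if x == "gl":
            z = rng.choice(["gl_0", "gl_1"], p=theta_d)      # step 4b
        else:
            z = rng.choice(["loc_0", "loc_1"], p=psi_ld)     # step 4c
        words.append(rng.choice(vocab, p=p_w_given_z[z]))    # step 4d
    return words

# A document about a beach-style location: pi_d, theta_d, psi_{l_d} correspond to steps 1-3.
print(generate_travelogue(pi_d=[0.4, 0.6], theta_d=[0.5, 0.5], psi_ld=[0.9, 0.1]))
```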

2.1.2 Mutual-Information Based Regularization

As each local topic is expected to characterize a few specific locations, its occurrences in documents should be highly correlated with those locations. We measure the correlation between the set of local topics and the set of locations via mutual information, and define a regularizer, which is encouraged to have a high value, as

$$I(\mathcal{L}; \mathcal{Z}^{loc}) \triangleq \sum_{l \in \mathcal{L}} \sum_{z \in \mathcal{Z}^{loc}} p(l,z) \log \frac{p(l,z)}{p(l)\, p(z)} = \sum_{l \in \mathcal{L}} p(l)\, D_{KL}\big(\psi_l \,\big\|\, \bar{\psi}\big),$$

where $D_{KL}(\cdot)$ denotes the Kullback–Leibler divergence and $\bar{\psi} = \sum_{l \in \mathcal{L}} p(l)\, \psi_l$, with $p(l)$ set uniformly to $1/|\mathcal{L}|$ if we have no prior over locations. The regularizer can be interpreted as encouraging the location-specific distributions over local topics to deviate from their "mean" $\bar{\psi}$, leading to both a diverse set of $\{\psi_l\}_{l \in \mathcal{L}}$ and a set of local topics that are highly correlated with locations.
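The identity above suggests a direct way to compute the regularizer from the location-specific distributions. The sketch below is our own illustration, assuming each $\psi_l$ is stored as one row of a matrix; it is not the authors' implementation.

```python
import numpy as np

def mi_regularizer(psi, p_l=None, eps=1e-12):
    """I(L; Z^loc) = sum_l p(l) * KL(psi_l || psi_bar), where psi_bar = sum_l p(l) * psi_l.

    psi: array of shape (num_locations, num_local_topics); each row sums to 1.
    p_l: prior over locations; defaults to the uniform 1/|L| used in the paper.
    """
    psi = np.asarray(psi, dtype=float)
    if p_l is None:
        p_l = np.full(psi.shape[0], 1.0 / psi.shape[0])
    psi_bar = p_l @ psi                                                 # "mean" distribution over local topics
    kl = np.sum(psi * np.log((psi + eps) / (psi_bar + eps)), axis=1)    # KL(psi_l || psi_bar) per location
    return float(p_l @ kl)

# Two locations with very different local-topic usage -> high mutual information.
print(mi_regularizer([[0.9, 0.1], [0.1, 0.9]]))
# Identical usage -> the regularizer is (near) zero.
print(mi_regularizer([[0.5, 0.5], [0.5, 0.5]]))
```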

2.1.3 Parameter Estimation

The Expectation-Maximization (EM) algorithm is commonly used for maximum likelihood estimation (MLE) of the parameters in PLSA-based models. However, due to the introduction of the regularizer, the parameter estimation of the proposed model goes beyond MLE. Hence, we apply the generalized Expectation-Maximization (GEM) algorithm [10] to estimate the parameters (denoted by $\Omega$ for simplicity). Similar to [9], the algorithm iteratively finds a better value of $\Omega$, denoted by $\Omega^{(n+1)}$, given $\Omega^{(n)}$ (the value of $\Omega$ estimated in the previous GEM iteration), by solving the constrained optimization problem

$$\Omega^{(n+1)} = \arg\max_{\Omega}\ \big\{ (1-\lambda)\, \mathcal{Q}\big(\Omega; \Omega^{(n)}\big) + \lambda\, I\big(\mathcal{L}; \mathcal{Z}^{loc}\big) \big\} \quad \text{subject to} \quad \mathcal{Q}\big(\Omega; \Omega^{(n)}\big) > \mathcal{Q}\big(\Omega^{(n)}; \Omega^{(n)}\big),$$

where $\mathcal{Q}(\Omega; \Omega^{(n)})$ is the expectation of the likelihood,

$$\mathcal{Q}\big(\Omega; \Omega^{(n)}\big) = \sum_{d \in \mathcal{D}} \sum_{w \in \mathcal{W}} n(d,w) \Big[ \sum_{z \in \mathcal{Z}^{gl}} p(z, x{=}gl \mid d, w)\, \log\big( p(x{=}gl \mid d)\, \theta_{d,z}\, p(w \mid z) \big) + \sum_{z \in \mathcal{Z}^{loc}} p(z, x{=}loc \mid d, w)\, \log\big( p(x{=}loc \mid d)\, \psi_{l_d,z}\, p(w \mid z) \big) \Big],$$

and is computed after the E-step in Algorithm 2; $\lambda \in [0,1]$ denotes the trade-off between $\mathcal{Q}(\Omega; \Omega^{(n)})$ and the regularizer $I(\mathcal{L}; \mathcal{Z}^{loc})$ in the objective function. Since $I(\mathcal{L}; \mathcal{Z}^{loc})$ depends only on $\{\psi_l\}_{l \in \mathcal{L}}$, the other parameters can be solved separately by maximizing $\mathcal{Q}(\Omega; \Omega^{(n)})$ as in the standard EM algorithm, yielding the closed-form updating formulas given as Equations (3)–(5) in Algorithm 2. With these parameters calculated and fixed, $\{\psi_l\}_{l \in \mathcal{L}}$ can be solved iteratively using penalty function methods. Equation (6) gives the starting point for these iterations, and is also the closed-form updating formula when $\lambda$ is set to zero.

Algorithm 2. Parameter estimation for the proposed model

E-step:

$$p(z, x \mid d, w) \propto \begin{cases} p^{(n)}(x \mid d)\, \theta^{(n)}_{d,z}\, p^{(n)}(w \mid z), & x = gl,\ z \in \mathcal{Z}^{gl} \\ p^{(n)}(x \mid d)\, \psi^{(n)}_{l_d,z}\, p^{(n)}(w \mid z), & x = loc,\ z \in \mathcal{Z}^{loc} \end{cases} \qquad (2)$$

M-step:

$$p^{(n+1)}(x \mid d) = \frac{\sum_{z \in \mathcal{Z}^{x}} \sum_{w \in \mathcal{W}} n(d,w)\, p(z, x \mid d, w)}{\sum_{w \in \mathcal{W}} n(d,w)}, \quad x \in \{gl, loc\} \qquad (3)$$

$$\theta^{(n+1)}_{d,z} = \frac{\sum_{w \in \mathcal{W}} n(d,w)\, p(z, x{=}gl \mid d, w)}{\sum_{z' \in \mathcal{Z}^{gl}} \sum_{w \in \mathcal{W}} n(d,w)\, p(z', x{=}gl \mid d, w)}, \quad z \in \mathcal{Z}^{gl} \qquad (4)$$

$$p^{(n+1)}(w \mid z) = \frac{\sum_{d \in \mathcal{D}} n(d,w)\, p(z, x \mid d, w)}{\sum_{w' \in \mathcal{W}} \sum_{d \in \mathcal{D}} n(d,w')\, p(z, x \mid d, w')}, \quad z \in \mathcal{Z}^{x},\ x \in \{gl, loc\} \qquad (5)$$

$$\psi^{(n+1)}_{l,z} = \frac{\sum_{d \mid l_d = l} \sum_{w \in \mathcal{W}} n(d,w)\, p(z, x{=}loc \mid d, w)}{\sum_{z' \in \mathcal{Z}^{loc}} \sum_{d \mid l_d = l} \sum_{w \in \mathcal{W}} n(d,w)\, p(z', x{=}loc \mid d, w)}, \quad z \in \mathcal{Z}^{loc} \qquad (6)$$
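To make the updates concrete, the following self-contained sketch runs EM iterations for the $\lambda = 0$ case, i.e., Equations (2)–(6) with the regularizer switched off; the penalty-function treatment of $\{\psi_l\}$ for $\lambda > 0$ is omitted. The array layout and the toy data are our own assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(1)

def em_step(N, loc, p_x, theta, psi, phi_g, phi_l, eps=1e-12):
    """One EM iteration (lambda = 0), following Equations (2)-(6).

    N:     (D, W) word counts n(d, w)
    loc:   (D,)   location label l_d of each document
    p_x:   (D, 2) p(x=gl|d), p(x=loc|d);  theta: (D, Kg);  psi: (L, Kl)
    phi_g: (Kg, W), phi_l: (Kl, W) -- p(w|z) for global and local topics
    """
    D = N.shape[0]
    new_p_x, new_theta = np.zeros_like(p_x), np.zeros_like(theta)
    num_psi = np.zeros_like(psi)
    num_phi_g, num_phi_l = np.zeros_like(phi_g), np.zeros_like(phi_l)
    for d in range(D):
        # E-step, Eq. (2): joint responsibilities p(z, x | d, w) for every word.
        r_g = p_x[d, 0] * theta[d][:, None] * phi_g            # (Kg, W)
        r_l = p_x[d, 1] * psi[loc[d]][:, None] * phi_l         # (Kl, W)
        denom = r_g.sum(axis=0) + r_l.sum(axis=0) + eps
        r_g, r_l = r_g / denom, r_l / denom
        # Expected counts n(d, w) * p(z, x | d, w), accumulated for the M-step.
        cg, cl = r_g * N[d], r_l * N[d]
        new_p_x[d] = [cg.sum(), cl.sum()]                      # numerators of Eq. (3)
        new_theta[d] = cg.sum(axis=1)                          # numerator of Eq. (4)
        num_psi[loc[d]] += cl.sum(axis=1)                      # numerator of Eq. (6)
        num_phi_g += cg                                        # numerators of Eq. (5)
        num_phi_l += cl
    # M-step normalizations, Eqs. (3)-(6).
    new_p_x /= new_p_x.sum(axis=1, keepdims=True)
    new_theta /= new_theta.sum(axis=1, keepdims=True)
    new_psi = num_psi / (num_psi.sum(axis=1, keepdims=True) + eps)
    new_phi_g = num_phi_g / num_phi_g.sum(axis=1, keepdims=True)
    new_phi_l = num_phi_l / num_phi_l.sum(axis=1, keepdims=True)
    return new_p_x, new_theta, new_psi, new_phi_g, new_phi_l

# Tiny random example: 4 documents, 6 words, 2 locations, 2 global and 2 local topics.
D, W, L, Kg, Kl = 4, 6, 2, 2, 2
N = rng.integers(1, 5, size=(D, W)).astype(float)
loc = np.array([0, 0, 1, 1])
norm = lambda a: a / a.sum(axis=-1, keepdims=True)
params = (norm(rng.random((D, 2))), norm(rng.random((D, Kg))), norm(rng.random((L, Kl))),
          norm(rng.random((Kg, W))), norm(rng.random((Kl, W))))
for _ in range(20):
    params = em_step(N, loc, *params)
print("p(x=gl|d), p(x=loc|d) after 20 iterations:\n", np.round(params[0], 3))
```

In the full model, whenever $\lambda > 0$ each such iteration would be followed by the penalty-function refinement of $\{\psi_l\}$ described above, starting from the Equation (6) values.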

2.2 Location-Representative Tag Generation

Given a location $l$, its relevant travelogues are either retrieved from our data collection or specified by users. These relevant travelogues are aggregated into a virtual document $d_l$, which is then projected into the topic space spanned by the learnt topics $\mathcal{Z} \triangleq \mathcal{Z}^{gl} \cup \mathcal{Z}^{loc}$, with its probability distribution over the topics inferred iteratively using the EM algorithm, as

$$p(z \mid d_l, w) = \frac{p^{(n)}(z \mid d_l)\, p(w \mid z)}{\sum_{z' \in \mathcal{Z}} p^{(n)}(z' \mid d_l)\, p(w \mid z')}, \quad z \in \mathcal{Z},$$

$$p^{(n+1)}(z \mid d_l) = \frac{\sum_{w \in \mathcal{W}} n(d_l, w)\, p(z \mid d_l, w)}{\sum_{z' \in \mathcal{Z}} \sum_{w \in \mathcal{W}} n(d_l, w)\, p(z' \mid d_l, w)}, \quad z \in \mathcal{Z}.$$

After the inference converges, the potential of each unique word $w$ to be a representative tag for location $l$ is measured in terms of its expected number of occurrences (in $d_l$) that are projected onto the local topics, yielding the score

$$Score_{l,tag}(w) = n(d_l, w)\, \frac{\sum_{z \in \mathcal{Z}^{loc}} p(z \mid d_l)\, p(w \mid z)}{\sum_{z' \in \mathcal{Z}} p(z' \mid d_l)\, p(w \mid z')}.$$

In this way, we finally obtain a list of tags for $l$ (denoted by $\mathcal{T}_l$), ranked in decreasing order of their representativeness scores.
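Below is a compact sketch of this tag-scoring step, assuming the learnt topics are stacked into a single word-distribution matrix with a boolean mask marking the local topics; the vocabulary and probability values are toy numbers for illustration only.

```python
import numpy as np

def location_tag_scores(n_dl, phi, is_local, iters=50, eps=1e-12):
    """Fold in a virtual document and score words by their expected local-topic occurrences.

    n_dl:     (W,)  word counts of the aggregated virtual document d_l
    phi:      (K, W) p(w|z) for all learnt topics (global and local stacked together, held fixed)
    is_local: (K,)  boolean mask marking which rows of phi are local topics
    Returns Score_{l,tag}(w) for every word in the vocabulary.
    """
    n_dl = np.asarray(n_dl, dtype=float)
    K = phi.shape[0]
    p_z = np.full(K, 1.0 / K)                        # p(z|d_l), initialized uniformly
    for _ in range(iters):
        r = p_z[:, None] * phi                       # E-step: p(z|d_l, w) up to normalization
        r /= r.sum(axis=0, keepdims=True) + eps
        p_z = (r * n_dl).sum(axis=1)                 # M-step: expected word count per topic
        p_z /= p_z.sum() + eps
    mix = p_z[:, None] * phi                         # p(z|d_l) * p(w|z)
    local_share = mix[is_local].sum(axis=0) / (mix.sum(axis=0) + eps)
    return n_dl * local_share                        # Score_{l,tag}(w)

# Toy example: 2 global + 2 local topics over a 6-word vocabulary (illustrative values).
vocab = ["hotel", "airport", "beach", "surf", "casino", "show"]
phi = np.array([[.6, .3, .02, .02, .03, .03],        # global topic
                [.3, .6, .02, .02, .03, .03],        # global topic
                [.02, .02, .5, .4, .03, .03],        # local "beach" topic
                [.02, .02, .03, .03, .5, .4]])       # local "casino" topic
is_local = np.array([False, False, True, True])
n_dl = np.array([20, 15, 30, 12, 2, 1])              # word counts in the virtual document d_l
scores = location_tag_scores(n_dl, phi, is_local)
print(sorted(zip(vocab, np.round(scores, 2)), key=lambda t: -t[1]))
```

Words that occur often in $d_l$ and are mostly explained by local topics (here, "beach" and "surf") rise to the top, while frequent but globally explained words (here, "hotel" and "airport") are suppressed.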

2.3 Image Retrieval and Overview Generation

With the location-representative tag list $\mathcal{T}_l$, we construct multiple queries for image retrieval by concatenating the location name $l$ and a tag $t \in \mathcal{T}_l$ in the form "$l$ + $t$". Then, for each query, a ranked list of images is retrieved via an image search engine. In our implementation, the ranked lists of both tags and retrieved images are truncated to retain several top tags and several top images per tag. For any image that occurs in multiple lists simultaneously, we retain only the occurrence with the highest rank and remove the others from the corresponding lists.

With the tags and images, the next challenge is how to organize such textual and visual information into a comprehensive and easy-to-understand overview. Besides the tags and images themselves, we have additional information that may interest users:
• The representativeness of each tag: users would like to know which tags are most specific to the given location.
• The representativeness of each image: given a tag, users expect to know which image is the most representative.
• The relationships among different tags: some tags may describe similar aspects of a location, and exploring such relationships can help users better understand the location.

The representativeness of a tag is ready to use. For image ranking, the current implementation simply follows the order suggested by the image search engine. For the similarity between any pair of tags, we adopt the normalized Google distance (NGD) [11]. The user interface of the location overviews is shown in Figure 3, where all of this information is presented as a graph. Each node in the graph denotes a tag; the larger the node and font size, the more representative the tag. For example, in Figure 3(a), Alcatraz, Pier 39, and Golden Gate Bridge are the most representative tags for San Francisco. Each tag is surrounded by its relevant images, and the larger the picture, the more typical the image. The tags are laid out such that the closer two nodes are, the more similar their semantics. For example, in Figure 3(b), the tags gamble and casino are adjacent to each other due to their high semantic similarity.

Figure 3. An illustration of two generated location overviews: (a) San Francisco; (b) Las Vegas. Each overview shows the top 10 tags for the location and the top three images for each tag. The larger the node and font size, the more representative the tag; the closer two nodes, the more similar their semantics; and the larger the picture, the more typical the image.
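As a rough illustration of the NGD-based tag similarity used for the graph layout, the sketch below computes pairwise distances from page-hit counts following the definition in [11]; the hit counts, co-occurrence counts, and index size are invented placeholders rather than real search-engine statistics.

```python
import math

def ngd(f_x, f_y, f_xy, total_pages):
    """Normalized Google distance [11] from page-hit counts; smaller means more related."""
    lx, ly, lxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(lx, ly) - lxy) / (math.log(total_pages) - min(lx, ly))

# Hypothetical hit counts (placeholders, not real search-engine statistics).
hits = {"casino": 2.0e7, "gamble": 8.0e6, "show": 4.0e7}
cohits = {("casino", "gamble"): 5.0e6, ("casino", "show"): 1.5e6, ("gamble", "show"): 6.0e5}
N_PAGES = 1.0e10   # assumed size of the search-engine index

tags = sorted(hits)
for i, a in enumerate(tags):
    for b in tags[i + 1:]:
        f_ab = cohits.get((a, b), cohits.get((b, a)))
        print(f"NGD({a}, {b}) = {ngd(hits[a], hits[b], f_ab, N_PAGES):.3f}")
        # smaller distance -> place the two tag nodes closer in the overview graph
```

With these placeholder counts, casino and gamble get the smallest distance, which is consistent with their adjacency in Figure 3(b).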

3. EVALUATION

We crawled 23,756 travelogues from IgoUgo [12] to form a corpus covering 760 unique locations in the United States. After removing stop-words and stemming, we trained the proposed generative topic model on the corpus to learn 50 global topics and 100 local topics. Table 1 shows some top words (i.e., the words with the highest probabilities in a topic) of the learnt topics. It indicates that global topics capture semantics (e.g., accommodation, transportation, opinions) common to most locations, while local topics capture semantics (e.g., beach, canyon, ski) specific to only a few locations.

Table 1. Top words of example global topics and local topics
Topic | Top Words
Global | room, bed, bathroom, tv, comfort, shower, large, chair, table
Global | check, airport, bus, travel, train, car, flight, early, service, shuttle
Global | great, beautiful, amaze, top, spectacular, breathtaking, incredible
Local | beach, island, hawaii, lava, kona, waikiki, volcano, hilo, ocean
Local | canyon, grand_canyon, rock, rim, hike, zion, trail, utah, cliff
Local | ski, mountain, vermont, snow, stowe, lift, aspen, winter, ice

We selected 20 locations, each associated with at least 100 travelogues, to evaluate the generated location-representative tags. Due to the lack of a ground truth, we manually built one as follows. For each location, we first formed a pool of tags by merging the top 50 tags mined from travelogues by our approach (called travelogue tags) and the top 50 tags obtained from Flickr via the "tagsForPlace" API (called Flickr tags). Then, 10 graduate students were asked to select the 20 most representative tags for the location, and the 20 tags receiving the most votes were taken as the ground-truth tags.

We used the Flickr tags as the baseline and adopted precision@K (i.e., the number of tags within the top K that match the ground-truth tags, divided by K) to evaluate performance. As shown in Figure 4, the travelogue tags match more ground-truth tags than the Flickr tags, and as K increases, the precision@K of the travelogue tags remains more than twice that of the Flickr tags. One reason is that the top Flickr tags tend to be dominated by words not informative enough to characterize locations, even after some common or noisy tags (e.g., travel, city) have been removed, whereas our approach utilizes global topics to capture and filter out such words.

Figure 4. Precision@K results of tags mined from travelogues by our approach and obtained from Flickr, given 20 ground-truth tags for each location, averaged over 20 locations.

Table 2 gives the top travelogue tags and Flickr tags for five cities, with the tags matching the ground truth shown in bold and italic. It shows that the tags mined by our approach characterize the corresponding locations clearly and comprehensively, from diverse aspects including landmarks (e.g., statue of liberty), styles (e.g., island, beach), and activities (e.g., surf, gamble) that are highly correlated with travel, while the top Flickr tags are less representative and comprehensive.

Table 2. Top tags mined from travelogues by our approach and obtained from Flickr (ground truth shown in bold and italic)
City | Top 10 Travelogue Tags | Top 10 Flickr Tags
Chicago | navy pier, sears tower, michigan, cta, lake michigan, lincoln park, shedd, grant park, water tower, wrigley field | illinois, park, architecture, downtown, lollapalooza, music, chicagoist, art, skyline, baseball
Honolulu | hawaii, waikiki, island, oahu, beach, pearl harbor, diamond head, surf, snorkel, luau | hawaii, oahu, waikiki, beach, asian, ocean, diamond head, booty, women, sunset
Las Vegas | strip, casino, show, hotel, bellagio, gamble, hoover dam, mandalay, luxor, slot | nevada, casino, hotel, bellagio, strip, wedding, gambling, venetian, neon, ces
New York City | manhattan, subway, times square, central park, brooklyn, empire state building, statue of liberty, broadway, ellis island, rockefeller center | manhattan, brooklyn, central park, art, architecture, music, museum, queens, park, gothamist
San Francisco | alcatraz, pier 39, golden gate bridge, cable, muni, union square, sausalito, fisherman's wharf, muir woods, golden gate park | california, golden gate park, alcatraz, bay area, golden gate bridge, park, art, bay, bridge, mission

From the example location overviews illustrated in Figure 3, we observe that for locations famous for their landmarks (e.g., San Francisco), the tags tend to be dominated by landmark names, with a few tags presenting other aspects (e.g., cable, muni). On the other hand, for locations with diverse aspects beyond landmarks, our approach captures the characteristics that are attractive to travelers and described extensively in the relevant travelogues. For example, casino, gamble, and show are generated to characterize Las Vegas from the perspectives of style and activity.
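For reference, the precision@K measure used in this section can be computed as in the following sketch; the ranked tag list and ground-truth set below are made up for illustration only.

```python
def precision_at_k(ranked_tags, ground_truth, k):
    """Fraction of the top-k ranked tags that appear in the ground-truth set."""
    top_k = ranked_tags[:k]
    return sum(tag in ground_truth for tag in top_k) / k

# Hypothetical ranking and ground truth (illustrative tags, not the evaluation data).
ranked = ["alcatraz", "pier 39", "golden gate bridge", "cable", "muni"]
truth = {"alcatraz", "golden gate bridge", "cable", "fisherman's wharf"}
for k in (1, 3, 5):
    print(f"precision@{k} = {precision_at_k(ranked, truth, k):.2f}")
```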

4. CONCLUSIONS AND FUTURE WORK

In this paper, we presented a framework for generating location overviews in the form of tags and images by mining user-generated travelogues. Given a location, its representative tags are first mined from relevant travelogues and then used to retrieve web images; the tags and images are finally presented via a novel user interface. Evaluation shows that the proposed approach can generate representative and comprehensive location overviews. In future work, we plan to investigate more effective image selection strategies to take full advantage of the information mined from travelogues. It is also necessary to handle various levels of location granularity to better meet application needs.

5. REFERENCES

[1] Flickr. http://www.flickr.com/
[2] L. Kennedy and M. Naaman. Generating diverse and representative image search results for landmarks. WWW, 2008.
[3] I. Simon, N. Snavely, and S. M. Seitz. Scene summarization for online image collections. ICCV, 2007.
[4] F. Jing, L. Zhang, and W.-Y. Ma. VirtualTour: an online travel assistant based on high quality images. MM, 2006.
[5] L. Kennedy et al. How Flickr helps us make sense of the world: context and content in community-contributed media collections. MM, 2007.
[6] E. Moxley, J. Kleban, and B. S. Manjunath. SpiritTagger: a geo-aware tag suggestion tool mined from Flickr. MIR, 2008.
[7] T. Hofmann. Probabilistic latent semantic analysis. UAI, 1999.
[8] Q. Mei et al. A probabilistic approach to spatiotemporal theme pattern mining on weblogs. WWW, 2006.
[9] Q. Mei et al. Topic modeling with network regularization. WWW, 2008.
[10] R. M. Neal and G. E. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. Learning in Graphical Models, MIT Press, 1999.
[11] R. Cilibrasi and P. Vitányi. The Google similarity distance. IEEE Trans. Knowl. Data Eng., 19(3):370-383, 2007.
[12] IgoUgo. http://www.igougo.com/
