
Automatic Generation of Social Event Storyboard from Image Click-through Data

Jun Xu, Tao Mei*, Senior Member, IEEE, Rui Cai, Member, IEEE, Houqiang Li, Senior Member, IEEE, and Yong Rui, Fellow, IEEE

Abstract—Recent studies have shown that a noticeable percentage of web search traffic is about social events. While traditional websites can only show human-edited events, in this paper we present a novel system to automatically detect events from search log data and generate storyboards in which the events are arranged chronologically. We chose image search logs as the resource for event mining, as search logs directly reflect people's interests. To discover events from log data, we present a Smooth Nonnegative Matrix Factorization (SNMF) framework which combines query semantics, temporal correlations, search logs, and time continuity. Moreover, we treat the time factor as an important element, since different events develop with different temporal tendencies. In addition, to provide a media-rich and visually appealing storyboard, each event is associated with a set of representative photos arranged along a timeline. These relevant photos are automatically selected from image search results by analyzing image content features. We use celebrities as our test domain, which accounts for a large percentage of image search traffic. Experiments on web search traffic for 200 celebrities over a period of six months show very encouraging results compared with handcrafted editorial storyboards.

Index Terms—Event storyboard, social media, click-through data, non-negative matrix factorization, image search.

I. INTRODUCTION

As social creatures, people are by nature curious about others' activities. Information on famous persons has often been of particular interest, and this tendency has remained true in the internet era [35]. Since search engines and news websites experience massive search demand about a myriad of current affairs, a great amount of news and events is collected from the web. However, most published social events are compiled by professional editors, so it is quite meaningful to detect such events for users automatically instead of relying on manual effort. Current search engines often show a summary of a famous person as a simple profile. From such a summary, people can easily get a celebrity's basic information, such as portrait, nationality, birthday, representative works, and awards. The search engine summaries can be considered a concentrated

*Corresponding author: T. Mei ([email protected]). J. Xu ([email protected]) is with the University of Science and Technology of China. This work was performed when J. Xu visited Microsoft Research as an intern. H. Li ([email protected]) is with the University of Science and Technology of China. T. Mei, R. Cai, and Y. Rui are with Microsoft Research, Beijing, China {tmei, ruicai, yongrui}@microsoft.com. This work was supported in part to H. Li by NSFC under Contracts 61325009 and 61272316. Copyright (c) 2009 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to [email protected].

Fig. 1. Screenshot of www.people.com, a website for celebrity news. The marked region shows recent news of Britney Spears, arranged along a timeline.

version of a person's larger collection of relevant events. Although such a short profile is very helpful for quickly introducing a person, it cannot satisfy people's curiosity for more detailed and timely information about celebrities. By contrast, some professional websites provide comprehensive and up-to-date information on famous persons. Fig. 1 shows a screenshot of www.people.com, a website well known for celebrity news and photos. The marked region of Fig. 1 shows Britney Spears's recent news (events) arranged along a timeline. This is a very nice feature for fans to trace their idols' activities. Almost all of these websites are powered by human editors, which inevitably leads to several limitations. First, the coverage of celebrity domains is small: typically, one website only focuses on celebrities in one or two domains (mostly entertainment and sports), and to the best of our knowledge, there are no general services yet for tracing celebrities over various domains. Second, these existing services are not scalable. Even for specific domains, only a few top stars are covered¹, as the editing effort to cover more celebrities is not financially viable. Third, reported event news may be biased by editors' interests. In this paper, we aim to build a scalable and unbiased solution to automatically detect social events, especially those related to celebrities, along a timeline. This could be an attractive supplement to enrich the existing event descriptions in search result pages. In this paper,

¹e.g., http://www.people.com/people/celebrities/


Fig. 2. Example of Adele storyboard, from July 2012 to December 2012. The first event is the expectation that she would give birth soon. The second event is the release of her new album “Skyfall”. The third event is about her weight.

we focus on events that happen at a certain time and are favored by users as our celebrity-related social events. According to statistics from a commercial search engine, about 30% of search queries aim at real-world events [23]. Furthermore, 70% of these queries are related to celebrities, including artists, sports stars, politicians, scientists, and entrepreneurs. Thus, we focus on events related to celebrities, because of the volume of related search queries and the ability to obtain ground-truth events from professional websites.

The research topics most related to this paper are event/topic detection from the Web. There have been quite a few works examining related directions [2], [4], [9], [10], [19], [23], [25], [26], [30], [34], [38]. The most typical data sources for event/topic mining are news articles and weblogs. Various statistical methods have been proposed to group documents sharing the same stories, such as [7], [10], [19], [23]. Temporal analysis has also been used to recover the development trend of an event, as in [13], [16], [33]. However, we argue that news articles are not well suited for mining events of interest to users, as most reports from mainstream media are dominated by breaking news and influential social events. Similarly, weblogs are not an ideal choice, as blog posts are mainly individual stories about regular people rather than events of interest to general users. Besides news and weblog data, there have been some recent research efforts attempting to extract events from web search logs [21], [40]. According to our study, search log data is a good resource for detecting events that gain user attention instantly, because 1) search logs cover a wide variety of real-world events; 2) search logs directly reflect users' interests, as they are in essence a majority vote over billions of internet users; and 3) search logs respond promptly to events happening in real time.

Discovering events from a search log is not a trivial task. Existing work on log event mining [21], [40] mostly focuses on merging similar queries into groups and investigating whether these groups are related to semantic events like "Japan Earthquake" [40] or "American Idol" [21]. Basically, their goal is to distinguish salient topics from noisy queries. Directly applying their approaches would fail here, as the discovered topics are more likely to be broad, common topics already familiar to most users. Instead, we would like to detect more interesting social events that entertain users and fit their browsing taste, which could be supplementary


to current knowledge bases. Taking the singer Adele as an example, the major groups of queries about Adele concern her popular songs, such as "Rolling in the Deep" or "Someone Like You". Event news such as her pregnancy is treated as noise in the clustering, as the related queries are not prominent enough compared with the popular ones about her songs. Therefore, we need an elaborate way to balance the discovery process: on the one hand, we should distinguish informative queries from noisy ones; on the other hand, we should prevent social events from being overwhelmed by popular queries. In addition, we need to fully consider the time factor when discovering social events, since they often exhibit a burst along the time dimension; events can be recognized more easily if time information is taken into consideration. To achieve this goal, a novel approach using Smooth Nonnegative Matrix Factorization (SNMF) is proposed in this paper for event detection, fully leveraging information from query semantics, temporal correlations, and search log records. We use SNMF rather than standard NMF or other matrix factorization methods to guarantee that the weights for each topic are non-negative while considering the time factor of event development at the same time. The basic idea is two-fold: 1) promote event queries by strengthening their connections based on all available features; and 2) differentiate events from popular queries according to their temporal characteristics.

To provide a comprehensive and vivid storyboard, we also introduce an automatic way to attach a set of relevant photos to each piece of event news. In [37], a method for photo selection from image search logs is presented. However, directly querying an image search engine with event queries does not always return satisfactory photos. The reason is that some dominant photos (e.g., a celebrity's portrait) have high static ranks and disrupt the ranking list of an event image search. The idea behind our approach is to leverage content duplication among the images returned for event queries and for common popular queries. In this way, photos that have more duplicates among the results of queries from the same event, while not appearing in the search results of popular queries, are selected to describe that event. Fig. 2 shows an example of our results: the storyboard of the singer Adele from July 2012 to December 2012, with three discovered events and automatically selected images. Preliminary evaluations on 200 celebrities over a period of six months have shown strong promise. In a user study, auto-selected event photos had higher relevance scores compared with the top search results returned by Google and Bing.

In summary, we make the following contributions:
• We propose a novel framework to detect interesting events by mining users' search log data. The framework consists of two components, i.e., Smooth Nonnegative Matrix Factorization-based event detection and representative event photo selection.
• We have conducted comprehensive evaluations on large-scale real-world click-through data to validate the effectiveness of the framework.

The rest of the paper is organized as follows. Section II discusses related work. Section III introduces our approach in detail. Section IV presents the data statistics. Section V discusses a comprehensive set of experimental results, and Section VI concludes the paper.


II. RELATED WORK

The representative work on event/topic detection is the DARPA-sponsored research program TDT (topic detection and tracking) [2], [36], [38], which focuses on discovering events from streams of news documents. With the development of Web 2.0, weblogs have become another data source for event detection [23], [25], [30]. Some of these research efforts develop new statistical methods [7], [10], [19], [23], while others focus on recovering the temporal structure of events [1], [13], [16], [18], [33]. Some research efforts have investigated the problem of merging multiple document streams for event detection [10], [34]. As argued above, web documents (both news articles and blog posts) are not well suited for social event detection: the cost of filtering celebrity-related information out of massive web documents is high, and the coverage of social events is also weak.

Web search logs are another data source that has attracted the interest of many researchers. Search log data contains useful information such as user queries and clicked search result URLs. It has been successfully exploited in various areas such as relevance ranking [15], [24], query expansion [5], [11], and query alteration [28]. Moreover, search log data is an unbiased statistic of user intention. It is therefore a good resource for event detection, especially for events that attract the interest of internet users. Zhao et al. [40] and Liu et al. [21] have done extensive work in this area. In [40], a bipartite graph is constructed based on query and clicked-URL pairs, and two similarity measurements are proposed for event clustering. In [21], Random Walk and Markov Random Fields (MRF) are utilized for modeling search log data. These methods have proven effective in detecting significant events like "Japan Earthquake" [40] or "American Idol" [21]. In contrast with these papers, which target popular events, this work is more interested in separating a celebrity's social events from his or her salient topics (e.g., a singer's popular songs). This is because salient topics help identify who a celebrity is [29], while social events tell you what a celebrity has been up to recently. In addition, we also work on providing a rich description of the mined social events with relevant photos.

Generating a vivid storyboard for a social event is, to some extent, similar to photo selection. Traditionally, photos are selected according to their local and global features to judge photo quality and relevance, as in [8], [20], [32]. In our work, photo selection is the final step that helps summarize the events from our photo collection, and we have a complete timeline for the storyboard, which is quite different from a common photo selection task.

III. APPROACH

In Section III-A we introduce our framework. Next, we present how to detect social events with search log data in Section III-B. Finally, how to obtain associated images to represent the detected events is explained in Section III-C.


A. Framework

An overview of the proposed approach is shown in Fig. 3. It mainly consists of two components: (A) event detection and (B) representative event photo selection.

There are three steps for event detection. First, topic factorization is adopted to discover groups of queries that have a high co-occurrence frequency; this addresses the sparsity and random noise in the query set. As we want to detect social events rather than only the salient topics, we have to keep a relatively large number of topics in the factorization step and then merge topics with similar behaviors in the second step. To merge correlated topics, we consider topic distributions over both the timeline and the space of click-URLs. Lastly, a ranking function is introduced to highlight topics that are very likely to be social events. Again, information on query semantics, temporal correlations, and search log mappings is combined in the ranking process. After ranking, the top topics are referred to as social events; non-top but salient topics are called profile topics.

For representative event photo selection, top queries from social events and profile topics are first sent to commercial search engines (Google or Bing) to collect two sets of image thumbnails. These two sets are considered the most relevant images to the social event and to the celebrity's background, respectively. However, image search results are very noisy, and sometimes a photo has high ranking scores in both image sets. To identify the most representative photos for an event, we propose measuring the content similarity among images in these two sets, using both global and local image features. The assumption is that event-related photos should have similar (duplicate) images in the social event image set, but not in the profile image set. Based on this assumption, a simple ranking function is proposed to sort photos in the social event image set. In this way, we can identify a set of relevant photos to describe each detected event. All the social events, together with their photos, constitute a storyboard for that celebrity.

B. Event Detection by SNMF

The most straightforward way to discover events from search log data is to identify "abnormal" queries. For example, for the well-known singer Adele, the query "Adele pregnant" is somewhat abnormal in comparison to more common queries like "Adele lyrics" and "Adele mp3". To characterize how "abnormal" a query is, we have to resort to statistical measures such as occurrence frequency and temporal density. Unfortunately, such statistics are quite unstable, as the log data is noisy and sparse; it is therefore not feasible in practice to determine an appropriate boundary separating events from everything else. In addition, query-level statistics ignore relationships among correlated queries (e.g., "Adele pregnant" and "Adele baby"). As a result, the evidence of an event becomes obscure, because we cannot integrate the statistics of correlated queries. Experimental results reported later in this paper show the limitations of this simple solution.
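For reference, the following is a minimal Python sketch of this "abnormal query" baseline, scoring each query by the z-score of its busiest day against its own history. It is only an illustration of the baseline discussed above, not the method proposed in this paper, and it assumes pre-aggregated per-day query counts.

```python
import numpy as np

def burst_scores(daily_counts):
    """Score each query by how 'abnormal' its busiest day is.

    daily_counts: dict mapping query -> sequence of counts per day.
    Returns a dict mapping query -> peak z-score of its daily counts.
    A simple baseline; in practice it is unstable on sparse, noisy logs.
    """
    scores = {}
    for query, counts in daily_counts.items():
        counts = np.asarray(counts, dtype=float)
        mu, sigma = counts.mean(), counts.std() + 1e-9
        scores[query] = float((counts.max() - mu) / sigma)
    return scores

# Toy example: "adele pregnant" bursts on one day, "adele lyrics" is steady.
toy = {"adele lyrics": [90, 88, 95, 92, 91],
       "adele pregnant": [1, 0, 2, 40, 5]}
print(sorted(burst_scores(toy).items(), key=lambda kv: -kv[1]))
```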


Fig. 3. The overview of the proposed approach, consisting of two main parts: (A) event detection by SNMF and (B) representative event photo selection.

To deal with noisy and sparse data, topic modeling (or topic factorization) has proven to be an effective approach, especially for text mining [3], [6], [14]. Through topic modeling, high-dimensional sparse data is projected into a low-dimensional topic space in which the correlations among the original feature dimensions are embedded. Topic modeling is also good at suppressing random noise. In this paper, we choose topic factorization as the first step to process the search log data. For a celebrity with N log records, each record is represented by a triplet $r_i = (q_i, d_i, u_i)$, $1 \le i \le N$, where $q_i \in \mathcal{Q}$, $d_i \in \mathcal{D}$, $u_i \in \mathcal{U}$. Here, $\mathcal{Q}$, $\mathcal{D}$, $\mathcal{U}$ are the collections of unique queries, days, and click-URLs in the celebrity's log data. We choose days as the unit of time, as this resolution is sufficient to characterize the period of a social event.

1) SNMF Topic Factorization: In classic topic modeling, the inputs are text documents consisting of words and the outputs are decompositions of these documents into topics, where each topic is a distribution over the word vocabulary. Analogously, we treat one day's log data as a "document" and each query as a "word". The "vocabulary" consists of all the unique queries of a celebrity in his or her log records, i.e., the set $\mathcal{Q}$ defined above. The assumption is that the various stories (potentially interesting events or other representative aspects) of a celebrity are "latent topics" that lead to different search queries. It should be noted that we treat a whole query as a "word" and do not break each query into individual English words. This is because a query is more like a short phrase with a specific semantic meaning than a single word; breaking a query into words may introduce unexpected ambiguities into topic factorization. For example, the word "love" in Taylor Swift's queries "love story" and "love Harry Styles" has completely different semantics: the former is about one of her famous songs and the latter is about her ex-boyfriend.

Widely used algorithms for topic factorization include probabilistic latent semantic indexing (PLSI) [14], latent Dirichlet allocation (LDA) [7], singular value decomposition (SVD) [3], non-negative matrix factorization (NMF) [17], and their variants. In this paper, we choose NMF as it has a nice advantage: the data must be decomposed into a sum of additive components. In other words, both the coefficients of "documents' distributions over topics" and the coefficients of "topics' distributions over queries" must be non-negative. This makes sense, especially for event modeling, as it is hard to accept the explanation that we observe a certain query just because some event did not happen that day. In addition, the non-negative coefficients also benefit the event mining in the following subsections.

The log data is first converted into a matrix $\mathbf{D}$ of size $|\mathcal{Q}| \times |\mathcal{D}|$. Each row represents a query and each column represents one day; each entry $D_{ij}$ is the number of times the $i$-th query was observed on the $j$-th day. NMF aims to find two non-negative matrices $\mathbf{W}$ and $\mathbf{H}$ satisfying
$$\mathbf{D} \approx \mathbf{W} \times \mathbf{H}. \qquad (1)$$
Here $\mathbf{W} = [\mathbf{w}_1, \ldots, \mathbf{w}_K]$, in which every column $\mathbf{w}_k$ $(1 \le k \le K)$ denotes a topic and $K$ is the pre-defined number of topics; $\mathbf{H} = [\mathbf{h}_1, \ldots, \mathbf{h}_{|\mathcal{D}|}]$, in which each column $\mathbf{h}_d$ $(1 \le d \le |\mathcal{D}|)$ contains the decomposition coefficients of the topics for the $d$-th day. According to [17], the decomposition problem amounts to minimizing the cost function
$$\arg\min_{\mathbf{W},\mathbf{H}} D^{g}_{KL}(\mathbf{D} \,\|\, \mathbf{W} \times \mathbf{H}) \quad \text{s.t.} \; \mathbf{W} \ge 0, \; \mathbf{H} \ge 0, \qquad (2)$$
where $D^{g}_{KL}(\mathbf{A} \,\|\, \mathbf{B})$ is the generalized Kullback-Leibler divergence of two matrices $\mathbf{A}$ and $\mathbf{B}$:
$$D^{g}_{KL}(\mathbf{A} \,\|\, \mathbf{B}) = \sum_{ij} \Big( A_{ij} \ln \frac{A_{ij}}{B_{ij}} - A_{ij} + B_{ij} \Big). \qquad (3)$$

Like most other topic modeling algorithms, standard NMF ignores the order of the input documents; permuting the columns of $\mathbf{D}$ would not affect the decomposition results. However, for log mining, the temporal order is a critical factor that needs to be taken seriously: there should not be a significant difference between the queries (and related topics) of two adjacent days. Similar constraints also arise when decomposing time-series signals such as audio streams [12]. To embed such constraints, Smooth Non-negative Matrix Factorization (SNMF) was proposed, which introduces an extra regularization term $S(\mathbf{H})$ into the cost function:
$$\arg\min_{\mathbf{W},\mathbf{H}} \Big\{ D^{g}_{KL}(\mathbf{D} \,\|\, \mathbf{W} \times \mathbf{H}) + \lambda \, S(\mathbf{H}) \Big\}, \quad S(\mathbf{H}) = \sum_{d=2}^{|\mathcal{D}|} \|\mathbf{h}_d - \mathbf{h}_{d-1}\|^2, \quad \text{s.t.} \; \mathbf{W} \ge 0, \; \mathbf{H} \ge 0. \qquad (4)$$
Here, $S(\mathbf{H})$ acts as a penalty that favors smoothness (small $\ell_2$ distance) between two adjacent columns of $\mathbf{H}$, and $\lambda$ is a non-negative weight that adjusts the degree of smoothing. In our implementation, we first solve the standard NMF problem in (2) and then use its decomposition results as the initial values for the constrained optimization in (4); for more details please refer to [12], [17]. Fig. 4 gives an example of a topic's distribution over the timeline, created by NMF and SNMF respectively. For NMF, the curve of the topic's weight jumps dramatically, whereas the curve generated by SNMF varies more smoothly along the timeline. There are two parameters in the SNMF topic factorization, the number of topics K and the smoothing weight λ; we investigate their influence on performance in the experiments section.
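For illustration, the following is a minimal sketch of SNMF on a dense query-by-day count matrix, using multiplicative updates for the KL data term and a heuristic multiplicative treatment of the smoothness penalty. It is a simplified approximation written for this paper's setting, not necessarily the exact algorithm of [12]; a warm start as described above can be mimicked by first running it with `lam=0` and reusing the resulting factors as initial values.

```python
import numpy as np

def snmf(D, K=40, lam=10.0, n_iter=200, seed=0, eps=1e-9):
    """Sketch of smooth NMF with a KL-divergence data term (Eq. 4).

    D   : (num_queries, num_days) non-negative count matrix.
    lam : weight of the temporal smoothness penalty on H.
    Returns W (queries x topics) and H (topics x days).
    """
    rng = np.random.default_rng(seed)
    n_q, n_d = D.shape
    W = rng.random((n_q, K)) + eps
    H = rng.random((K, n_d)) + eps

    for _ in range(n_iter):
        # standard multiplicative update for W under the KL objective
        WH = W @ H + eps
        W *= ((D / WH) @ H.T) / (H.sum(axis=1, keepdims=True).T + eps)

        # multiplicative update for H, with the smoothness gradient split
        # into its negative part (neighbor columns) and positive part
        WH = W @ H + eps
        num = W.T @ (D / WH)
        den = W.sum(axis=0, keepdims=True).T            # shape (K, 1)

        left = np.zeros_like(H);  left[:, 1:] = H[:, :-1]
        right = np.zeros_like(H); right[:, :-1] = H[:, 1:]
        n_neighbors = np.full(n_d, 2.0); n_neighbors[[0, -1]] = 1.0
        num = num + 2.0 * lam * (left + right)
        den = den + 2.0 * lam * n_neighbors * H
        H *= num / (den + eps)
    return W, H
```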


Fig. 4. Comparison of the distribution of the topic "Adele's music" along the timeline, generated by NMF and SNMF (with λ = 10). The horizontal axis is time (in days) and the vertical axis is the topic's weight in each day's log data.

It should be noted that, in general, we choose a relatively large K, as we do not want to miss social events that are less prominent than the popular topics. The side effect of a large K is the risk of over-splitting topics. Therefore, an additional fusion step is introduced in the next subsection, in which more informative clues, such as the search log data, are taken into account to merge similar topics.

2) Topic Fusion: After the factorization step, we have K topics $\{t_1, \ldots, t_K\}$ and the two matrices $\mathbf{W}$ and $\mathbf{H}$. To characterize a topic, the most intuitive clues are its distributions over the query vocabulary and over the timeline, which can be obtained directly from $\mathbf{W}$ and $\mathbf{H}$. Another useful clue from the search log data is the set of clicked URLs, which has proven effective for query clustering [40]: queries triggering the same URL are very likely to have similar semantics. Consequently, two topics should be semantically correlated if they have similar distributions over the click-URL space $\mathcal{U}$ defined in Section III-B. In this paper, we combine these three clues to measure the similarity between two topics and merge all the topics in an unsupervised way.

A. Topic similarity over queries. Given a topic $t_k$ $(1 \le k \le K)$, its distribution over the queries can be approximated by the $k$-th column of $\mathbf{W}$. As $\mathbf{W}$ is non-negative, it is straightforward to transform the $k$-th column into a distribution $P_Q(q_i|t_k)$ by normalizing it with the sum of its elements, i.e., $P_Q(q_i|t_k) = W_{ik} / \sum_{j=1}^{|\mathcal{Q}|} W_{jk}$. The distance between two topics $t_k$ and $t_l$ over the query vocabulary is then defined by the symmetric Kullback-Leibler divergence:
$$dist_Q(t_k, t_l) = KL_Q(t_k, t_l) = \frac{1}{2} \sum_{i=1}^{|\mathcal{Q}|} \Big( P_Q(q_i|t_k) \ln \frac{P_Q(q_i|t_k)}{P_Q(q_i|t_l)} + P_Q(q_i|t_l) \ln \frac{P_Q(q_i|t_l)}{P_Q(q_i|t_k)} \Big). \qquad (5)$$

Fig. 5. Comparison of the distributions along the timeline for two topics generated by SNMF: (a) Adele's pregnancy and (b) Adele's music.

B. Topic similarity over the timeline. Similarly, a topic $t_k$'s distribution over the timeline, $P_D(d_i|t_k)$, can be approximated by normalizing the $k$-th row of $\mathbf{H}$, i.e., $P_D(d_i|t_k) = H_{ki} / \sum_{j=1}^{|\mathcal{D}|} H_{kj}$. However, the similarity of $t_k$ and $t_l$ over the timeline cannot be measured directly by the KL divergence between $P_D(d_i|t_k)$ and $P_D(d_i|t_l)$, because the timeline is not always well aligned: even for two topics describing the same story, their temporal distributions may still have a slight time lag. To deal with this alignment issue, we shift one distribution forward and backward by a small offset (one day in the implementation) and select the smallest symmetric KL divergence as the distance between $t_k$ and $t_l$:
$$dist_D(t_k, t_l) = \min \{ KL_D(t_k, t_l; \Delta), \; \Delta \in \{-1, 0, 1\} \}, \qquad (6)$$
where $KL_D(t_k, t_l; \Delta)$ is the shift-enabled symmetric KL divergence and $\Delta$ is the offset in days.

C. Topic similarity over clicked URLs. From the search log, the relationships between clicked URLs and queries can be described by a $|\mathcal{U}| \times |\mathcal{Q}|$ matrix $\mathbf{L}$, in which each element $L_{ij}$ is the number of times the URL $u_i$ was clicked for the query $q_j$. By multiplying $\mathbf{L}$ and $\mathbf{W}$, we can propagate a topic's weights over queries to the clicked URLs. A topic $t_k$'s distribution over the clicked URLs is then defined as $P_U(u_i|t_k) = (\mathbf{LW})_{ik} / \sum_{j=1}^{|\mathcal{U}|} (\mathbf{LW})_{jk}$, and the corresponding distance between $t_k$ and $t_l$ is
$$dist_U(t_k, t_l) = KL_U(t_k, t_l), \qquad (7)$$
where $KL_U$ has the same form as $KL_Q$ in (5).

The three distance scores in (5)-(7) are simply added up to describe the overall distance between two topics. Agglomerative hierarchical clustering is then adopted to merge similar topics in a bottom-up way. We select complete linkage as the merge criterion, to ensure strong connections among merged topics. The stop threshold is estimated automatically by identifying the significant jump in the ascending sorted distance scores of all topic pairs.
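A compact sketch of this fusion step follows, assuming W, H, and the URL-by-query click matrix L are available as dense numpy arrays. It computes the three symmetric KL distances of Eqs. (5)-(7), sums them, and merges topics with complete-linkage agglomerative clustering via scipy; the automatic stop-threshold estimation is simplified here to a fixed threshold, and the timeline shift uses a circular roll rather than a truncated shift.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def sym_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between two discrete distributions."""
    p, q = p + eps, q + eps
    return 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def topic_distances(W, H, L):
    """Pairwise topic distance = query KL + shift-tolerant timeline KL + URL KL."""
    PQ = W / W.sum(axis=0, keepdims=True)        # queries x topics
    PD = H / H.sum(axis=1, keepdims=True)        # topics x days
    LW = L @ W
    PU = LW / LW.sum(axis=0, keepdims=True)      # urls x topics
    K = W.shape[1]
    dist = np.zeros((K, K))
    for k in range(K):
        for l in range(k + 1, K):
            d_q = sym_kl(PQ[:, k], PQ[:, l])
            # allow a +/- 1 day shift; np.roll wraps around (simplification)
            d_d = min(sym_kl(np.roll(PD[k], s), PD[l]) for s in (-1, 0, 1))
            d_u = sym_kl(PU[:, k], PU[:, l])
            dist[k, l] = dist[l, k] = d_q + d_d + d_u
    return dist

def fuse_topics(dist, threshold=3.0):
    """Complete-linkage agglomerative clustering; returns a cluster id per topic."""
    Z = linkage(squareform(dist), method="complete")
    return fcluster(Z, t=threshold, criterion="distance")
```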


Fig. 6. Image search results returned by commercial search engines for queries on social events and profile topics: (a) Google's search results for "Amanda Bynes car accident"; (b) Bing's search results for "Amanda Bynes car accident"; (c) search results for "Amanda Bynes". Photos with (partial) duplicates are highlighted with blue rectangles.

3) Event Ranking: The last step is to distinguish event-related topics from the others. Although this is essentially a classification problem, collecting enough unbiased training data is quite difficult in practice. Therefore, we treat it as a ranking problem and leverage several heuristics summarized from a number of observations. As in the fusion step, these heuristics are based on the distributions of a topic over the timeline, over the query vocabulary, and over the clicked URLs.

Fig. 5 shows the distributions of two topics along the timeline: one is a generic topic about Adele's music and the other is about her pregnancy. The curve of her pregnancy has a clear evolutionary process (occur, sustain, and decay), which is very close to a Gamma distribution. Inspired by this observation, we first fit the temporal distribution with a Gamma distribution and then use the estimated parameters to reproduce an artificial curve, denoted as $Gamma(d_i|t_k)$. The timeline-based ranking score is then defined as
$$score_t(t_k) = \exp\big(-KL(P_D(\cdot|t_k) \,\|\, Gamma(\cdot|t_k))\big). \qquad (8)$$
In this way, higher scores are assigned to topics whose temporal curves look more like a Gamma distribution.

The observations for the distributions over queries and over clicked URLs are similar: social events have more concentrated distributions than generic topics. In reality, the numbers of queries and URLs associated with a social event are much smaller than those of a generic topic; for example, the numbers of clicked URLs related to Adele's lyrics and to Adele's pregnancy differ by orders of magnitude. A natural choice for measuring the degree of concentration of a distribution is entropy. To promote topics with more concentrated distributions, two further ranking scores are defined as
$$score_q(t_k) = 1.0 + \frac{1}{\ln |\mathcal{Q}|} \sum_{i=1}^{|\mathcal{Q}|} P_Q(q_i|t_k) \ln P_Q(q_i|t_k), \qquad (9)$$
$$score_u(t_k) = 1.0 + \frac{1}{\ln |\mathcal{U}|} \sum_{i=1}^{|\mathcal{U}|} P_U(u_i|t_k) \ln P_U(u_i|t_k). \qquad (10)$$
Lastly, the ranking score of a topic $t_k$ with respect to social events is defined as
$$rank^{topic}_{event}(t_k) = score_t(t_k) \times score_q(t_k) \times score_u(t_k). \qquad (11)$$
We choose multiplication rather than addition here because a social event topic should satisfy all three criteria. By contrast, topics with small ranking scores usually describe popular aspects of a celebrity, such as his or her profile; we call these profile topics in the following sections.
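A minimal sketch of the ranking scores in Eqs. (8)-(11) is given below, assuming the normalized distributions PD (topics x days), PQ (queries x topics), and PU (URLs x topics) from the previous steps. The Gamma fitting via pseudo-samples is a simplification assumed for illustration, not the exact estimator used in the paper.

```python
import numpy as np
from scipy.stats import gamma

def _kl(p, q, eps=1e-12):
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def score_timeline(pd_topic):
    """Eq. (8): fit a Gamma curve to the topic's day distribution and
    score how closely the observed curve follows it."""
    days = np.arange(1, len(pd_topic) + 1)
    # crude fit: treat the distribution as a histogram of 'day' samples
    samples = np.repeat(days, np.maximum((pd_topic * 1000).astype(int), 0))
    if len(samples) < 2:
        return 0.0
    a, loc, scale = gamma.fit(samples, floc=0)
    curve = gamma.pdf(days, a, loc=loc, scale=scale)
    curve = curve / (curve.sum() + 1e-12)
    return float(np.exp(-_kl(pd_topic, curve)))

def score_concentration(p, eps=1e-12):
    """Eqs. (9)-(10): 1 + (1/ln N) * sum_i p_i ln p_i; near 0 for flat
    distributions and near 1 for concentrated ones."""
    n = max(len(p), 2)
    return float(1.0 + np.sum(p * np.log(p + eps)) / np.log(n))

def rank_event_topics(PD, PQ, PU):
    """Eq. (11): product of the three scores, one value per topic."""
    return np.array([
        score_timeline(PD[k])
        * score_concentration(PQ[:, k])
        * score_concentration(PU[:, k])
        for k in range(PD.shape[0])
    ])
```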

C. Event Photo Selection

People often say that "a picture is worth a thousand words". Without a doubt, interesting events associated with related photos are more attractive to the audience. For each detected social event, it is straightforward to identify a set of most relevant queries by inspecting the event's distribution over the query space. The simplest way to get event-related photos is to directly query commercial image search engines with these event queries. Fig. 6 (a) and (b) show the search results returned by Google and Bing for the query "Amanda Bynes car accident". Clearly, the relevance of the returned images is not satisfactory; there are many irrelevant images, such as portraits of Amanda Bynes. In the celebrity domain, the main reasons are that (1) some portrait photos have high static ranking scores, and (2) queries are too short to accurately describe an event. Therefore, we need a better way to collect event photos.

By investigating a good number of examples, we observe that (1) in the search results of an event query, images with (partial) duplicates are very likely to be relevant to the event; and (2) portraits (or other popular images) also appear in the search results of a celebrity's hot queries (e.g., the celebrity's name). For example, in Fig. 6 (a) and (b), the images with duplicates (marked by blue rectangles) are more relevant to the car accident, while most portrait photos in (a) and (b) have similar ones in Fig. 6 (c), which shows the results for the query "Amanda Bynes". Based on these observations, two criteria are formulated to re-rank photos:
• Promote images which have (partial) duplicates in the search results of queries from social events.
• Penalize images which have similar ones in the search results of queries from popular topics.

To do this, for each social event, the 5 most dominant queries are selected for image search. For each query, thumbnails of the top 100 images returned by a commercial search engine are downloaded. In this way, we construct a candidate photo set for the event, denoted by $\mathcal{I}_{event}$, which has 500 thumbnails. Similarly, the top 10 queries from profile topics are used to collect a set of the most representative images of the celebrity, denoted by $\mathcal{I}_{profile}$, which has 1000 thumbnails in total. For each celebrity, $\mathcal{I}_{profile}$ is shared across the various social events. Before further processing, blur and darkness features are used to remove low-quality photos. In the following subsections, we introduce how to measure the content similarities among images in $\mathcal{I}_{event}$ and $\mathcal{I}_{profile}$, and how to re-rank photos in $\mathcal{I}_{event}$ based on these similarities. These steps help identify the photos that best represent the event in question.
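As an illustration of the candidate-set construction described above, a sketch might look as follows; `image_search` is a hypothetical helper standing in for the commercial search engine API, and the quality filtering is only indicated by a comment.

```python
def build_candidate_sets(event_queries, profile_queries, image_search,
                         n_event_queries=5, n_profile_queries=10, top_k=100):
    """Collect thumbnail candidates for one event.

    image_search(query, top_k) is assumed to return a list of
    (thumbnail, zero_based_rank) pairs from a commercial engine.
    Returns (I_event, I_profile) thumbnail lists.
    """
    I_event = [img for q in event_queries[:n_event_queries]
               for img in image_search(q, top_k)]
    I_profile = [img for q in profile_queries[:n_profile_queries]
                 for img in image_search(q, top_k)]
    # a real system would also drop blurry or overly dark thumbnails here
    return I_event, I_profile
```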


1) Image Similarity Measures: To measure image similarity, we consider both global and local image features. Global features are extracted from the whole image and are suitable for identifying fully duplicate images. By contrast, local features describe local image patches and have been widely used for recognizing partial duplicates. Supporting partial duplicate detection is quite important in this step, as many images have been edited (e.g., cropped or stitched) before being published online.

The global feature adopted in this paper is the block-based intensity histogram [27]. Each image is divided into 64 (8 × 8) blocks, and for the $i$-th block a 256-dimensional intensity histogram $\mathbf{g}_i$ is computed from the pixels within that block. The global feature-based similarity between two images $I_x$ and $I_y$ is then defined as²
$$sim_{hist}(I_x, I_y) = \max\Big\{ 1.0 - \frac{1}{64} \sum_{i=1}^{64} \| \mathbf{g}_i^{I_x} - \mathbf{g}_i^{I_y} \|_2, \; 0 \Big\}. \qquad (12)$$

For the local feature-based similarity measurement, we choose the classic SIFT (Scale-Invariant Feature Transform) feature and follow the matching process proposed in [22], which includes a geometric verification step ensuring that the remaining SIFT correspondences between two images are consistent with each other. This is a very strong constraint, and two images are very likely to be partial duplicates of each other if the number of surviving SIFT correspondences is larger than a threshold. For two images $I_x$ and $I_y$, the local feature-based similarity is defined as
$$sim_{sift}(I_x, I_y) = \begin{cases} 1 & inlier(I_x, I_y) > \delta_{sift} \\ 0 & inlier(I_x, I_y) \le \delta_{sift} \end{cases}, \qquad (13)$$
where $inlier(I_x, I_y)$ is the number of surviving SIFT correspondences between $I_x$ and $I_y$, and the threshold $\delta_{sift}$ is set to 12, as suggested in [22]. Both the global and local similarity measurements can be accelerated with off-the-shelf indexing technologies such as k-d trees or hashing kernels [39], which have proven very efficient for million-scale image retrieval; the computational cost in this paper is therefore affordable. Finally, the integrated content similarity between images $I_x$ and $I_y$ is defined as
$$sim(I_x, I_y) = \max\{ sim_{hist}(I_x, I_y), \; sim_{sift}(I_x, I_y) \}. \qquad (14)$$

2) Event Photo Re-ranking: To promote a photo $I_x \in \mathcal{I}_{event}$ which has duplicates in $\mathcal{I}_{event}$, we define the weighting score $w^{+}(I_x)$ as
$$w^{+}(I_x) = \sum_{I_y \in \mathcal{I}_{event}, I_y \ne I_x} sim(I_x, I_y). \qquad (15)$$
According to this definition, the more duplicates a photo $I_x$ has in $\mathcal{I}_{event}$, the more important it is. Similarly, to penalize photos with similar ones in $\mathcal{I}_{profile}$, another weighting score $w^{-}(I_x)$ is defined as
$$w^{-}(I_x) = 1.0 - \max_{I_y \in \mathcal{I}_{profile}} \{ sim(I_x, I_y) \}. \qquad (16)$$
$w^{-}$ becomes very small if $I_x$ has similar images in $\mathcal{I}_{profile}$. Finally, the new ranking score of a photo $I_x \in \mathcal{I}_{event}$ is
$$rank^{img}_{event}(I_x) = \frac{|\mathcal{I}_{event}| - idx(I_x)}{|\mathcal{I}_{event}|} \cdot w^{+}(I_x) \cdot w^{-}(I_x), \qquad (17)$$
where $idx(I_x)$ is the zero-based index of the photo $I_x$ in the search results returned by the search engine. According to the new ranking scores, the photos with the highest scores are considered the most representative images of that event.

²Although the $\ell_2$ distance is not the best way to measure the similarity of two histograms, it works well in practice. We adopt the $\ell_2$ distance mainly because it can be easily accelerated via off-the-shelf indexing technologies.
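The re-ranking of Eqs. (12)-(17) can be sketched as follows. The block-histogram similarity is implemented directly (the per-block normalization is an assumption), while the binary SIFT similarities are assumed to be precomputed by a matcher with geometric verification, as in [22].

```python
import numpy as np

def block_histogram(img, blocks=8, bins=256):
    """64 per-block intensity histograms (Eq. 12 features).
    img: 2-D uint8 grayscale array; each histogram is L1-normalized."""
    h, w = img.shape
    feats = []
    for bi in range(blocks):
        for bj in range(blocks):
            patch = img[bi * h // blocks:(bi + 1) * h // blocks,
                        bj * w // blocks:(bj + 1) * w // blocks]
            hist, _ = np.histogram(patch, bins=bins, range=(0, 256))
            feats.append(hist / max(hist.sum(), 1))
    return np.array(feats)                      # shape (64, 256)

def sim_hist(gx, gy):
    """Eq. (12): 1 minus the mean per-block L2 distance, clipped at 0."""
    return max(1.0 - np.mean(np.linalg.norm(gx - gy, axis=1)), 0.0)

def rerank_event_photos(event_feats, profile_feats,
                        sift_sim_event, sift_sim_profile):
    """Eqs. (15)-(17).

    event_feats / profile_feats : lists of (block_histogram, engine_rank).
    sift_sim_event[i][j]        : 0/1 SIFT similarity between event photos i, j.
    sift_sim_profile[i][k]      : 0/1 SIFT similarity between event photo i
                                  and profile photo k (Eq. 13 with delta=12).
    """
    n = len(event_feats)
    scores = []
    for i, (gi, idx) in enumerate(event_feats):
        # Eq. (14) is the max of the two similarities; Eq. (15) sums it
        w_plus = sum(max(sim_hist(gi, gj), sift_sim_event[i][j])
                     for j, (gj, _) in enumerate(event_feats) if j != i)
        # Eq. (16): penalize similarity to the profile set
        w_minus = 1.0 - max(max(sim_hist(gi, gp), sift_sim_profile[i][k])
                            for k, (gp, _) in enumerate(profile_feats))
        scores.append((n - idx) / n * w_plus * w_minus)   # Eq. (17)
    return scores
```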

IV. DATA ANALYSIS

In this paper, we use the image search log collected by a commercial search engine, consisting of queries, clicks, and search results from July to December 2012. After filtering the log data with the 200 celebrities' names, we obtain more than 190 million log records; in other words, for each celebrity there are on average around 5,000 log records per day. The data has been processed by the search engine to remove private information, and each log record has three main fields: time, query, and click-URL, as shown in Fig. 7. Given a celebrity, only records whose query contains the celebrity's name are retained for event detection. To guarantee data quality, we also ignore log records whose click-URL is empty. It should be noted that users sometimes click more than one URL in a given set of search results; in such cases there are multiple records, each corresponding to one clicked URL. This preserves more information about query-URL pairs, which is helpful for measuring the similarity among different log records. For example, the 4th and 5th rows in Fig. 7 are two different click-URLs for the single query "Jennifer Lopez Movies".

V. EVALUATION AND DISCUSSION

A. Experimental Settings

For evaluation, the first step is to choose a list of celebrities. In this paper, we select target celebrities from three main resources: (1) Google Zeitgeist 2012³, which contains the hottest celebrities in search queries; (2) the most popular celebrities on Yahoo!⁴, the list from the largest internet portal; and (3) the Forbes Celebrity 100 list⁵. After removing candidates with few log records or no related ground truth, we arrive at a list of 200 celebrities consisting of singers, actors/actresses, and politicians.

For quantitative performance measurement, the most challenging step is to prepare a benchmark dataset with ground-truth labels, which in practice turns out to be a laborious task. Ten websites, as listed in Table I, are adopted for ground-truth generation. For each website, we first develop a site-specific crawler to download the pages containing celebrity-related social events. Then, we manually write regular expressions to extract event-related information from every fetched web page. In this way, we convert these web pages into a table of structured data with three fields: celebrity name, event time, and event description.

³http://www.google.com/zeitgeist/2012/
⁴http://omg.yahoo.com/top-celebrities/
⁵http://www.forbes.com/celebrities/


Fig. 7. A snapshot of the web search log data used for celebrity social event mining.
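For illustration, the record filtering described in Section IV (and exemplified by the log snapshot in Fig. 7) can be sketched as follows; the record layout is an assumption.

```python
from collections import namedtuple

LogRecord = namedtuple("LogRecord", ["time", "query", "click_url"])

def filter_records(records, celebrity):
    """Keep records whose query mentions the celebrity and that have a
    non-empty click-URL; one record is kept per clicked URL, as in Fig. 7."""
    name = celebrity.lower()
    return [r for r in records
            if r.click_url and name in r.query.lower()]
```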

TABLE I
THE 10 WEBSITES USED FOR GROUND-TRUTH GENERATION

No.  Website URL
1    http://www.celebitchy.com/archives-by-category/
2    http://www.people.com/people/celebrities/
3    http://www.egotastic.com/celebrities/
4    http://www.hellomagazine.com/celebrities/
5    http://www.idontlikeyouinthatway.com/pictures/
6    http://www.okmagazine.com/celebs-list
7    http://omg.yahoo.com/top-celebrities/
8    http://www.popsugar.com/celebrities
9    http://theblemish.com/
10   http://www.thehollywoodgossip.com/stars/

Fig. 8. Comparison of performance with different smoothing weights λ in the topic factorization (Micro-Precision, Micro-Recall, Macro-Precision, and Macro-Recall).

To build a convincing ground-truth list for each celebrity, we only keep events which appear on at least two websites. To judge whether two stories are about the same event, we adopt a simple yet effective heuristic rule: two events are considered matched if they happen within 3 days of each other (±1 day accepted) and there are more than two common keywords (celebrity name and stop words ignored) in their descriptions. To prevent the influence of noisy data from mixed topics, we only extract the top appearing words as descriptions.

To evaluate performance, we adopt the classic precision and recall measures. As introduced in Section III-B, the social events discovered in this paper are sorted in descending order of the scores defined in (11). Hence, for each celebrity $c_i$, precision and recall are computed based on the top $gt(c_i)$ topics in the ranking list, where $gt(c_i)$ is the number of social events in $c_i$'s ground-truth list. Similarly, a discovered topic is said to match a ground-truth event if (1) the difference in time is within 3 days and (2) more than two keywords from the event's description appear in the topic's top 5 queries. Suppose $matched(c_i)$ topics are matched with some events in the ground truth, and $covered(c_i)$ events in the ground truth appear among the discovered topics; we then have $precision(c_i) = matched(c_i)/gt(c_i)$ and $recall(c_i) = covered(c_i)/gt(c_i)$. Note that $0 \le covered(c_i) \le matched(c_i) \le gt(c_i)$, because a social event can sometimes be over-split into multiple topics; in such a situation precision is still good but recall drops. To better measure the overall performance over all 200 celebrities, we average the precision and recall scores in two ways:
$$prec_{micro} = \frac{\sum_{i=1}^{200} matched(c_i)}{\sum_{i=1}^{200} gt(c_i)}, \qquad rec_{micro} = \frac{\sum_{i=1}^{200} covered(c_i)}{\sum_{i=1}^{200} gt(c_i)}, \qquad (18)$$
$$prec_{macro} = \frac{1}{200} \sum_{i=1}^{200} \frac{matched(c_i)}{gt(c_i)}, \qquad rec_{macro} = \frac{1}{200} \sum_{i=1}^{200} \frac{covered(c_i)}{gt(c_i)}. \qquad (19)$$
The micro-averages focus on performance at the event level, while the macro-averages measure performance at the celebrity level, ignoring differences in celebrity popularity.
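A sketch of the matching rule and the averaged metrics of Eqs. (18)-(19) follows; the record layout, tokenization, and stop-word handling are assumptions made for illustration.

```python
def events_match(topic, gt_event, stop_words=frozenset()):
    """Match rule: times within 3 days and >2 shared description keywords.

    topic    : {"time": date, "top5": [query, ...]}
    gt_event : {"time": date, "description": str}
    """
    time_ok = abs((topic["time"] - gt_event["time"]).days) <= 3
    topic_words = {w for q in topic["top5"] for w in q.lower().split()}
    keywords = {w for w in gt_event["description"].lower().split()
                if w not in stop_words}
    return time_ok and len(topic_words & keywords) > 2

def micro_macro(matched, covered, gt):
    """Eqs. (18)-(19); inputs are per-celebrity lists of equal length."""
    n = len(gt)
    prec_micro = sum(matched) / sum(gt)
    rec_micro = sum(covered) / sum(gt)
    prec_macro = sum(m / g for m, g in zip(matched, gt)) / n
    rec_macro = sum(c / g for c, g in zip(covered, gt)) / n
    return prec_micro, rec_micro, prec_macro, rec_macro
```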

B. Event Topic Discovery

As mentioned in Section III-B, the topic factorization step has two parameters, the smoothing weight λ and the number of topics K. In this section, we first investigate the influence of these two parameters on event detection and then compare the overall performance of our approach with other approaches.

The parameter λ in SNMF controls the smoothness of topic distributions along the timeline. If λ = 0, SNMF degenerates to standard NMF; the larger λ is, the stronger the applied regularization. In the experiments, we vary λ from 0 to 100 over nine different scales (with K fixed at 40). The performance measurements under different λ are shown in Fig. 8.


Fig. 11. Subjective evaluation of the relevance of event photos returned by Bing, Google, and the proposed approach.

Fig. 9. Comparison of performance with different numbers of topics K in the topic factorization (Micro-Precision, Micro-Recall, Macro-Precision, and Macro-Recall).

Fig. 10. Comparisons of the overall performance with different approaches.

From Fig. 8, it is clear that the performance of standard NMF (λ = 0) is not good. This is because the violently varying timeline distribution (see the example in Fig. 4) cannot accurately characterize the temporal evolution of a topic, which hurts the subsequent steps of topic fusion and event ranking. As λ increases, performance improves, which demonstrates the necessity of adopting SNMF for topic factorization. However, when λ becomes large enough, performance starts creeping down. Two factors lead to this drop: (i) a strong smoothing operation weakens the peaks (such as the one shown in Fig. 5) that reflect the occurrence of events; and (ii) a strong regularization term dominates the cost function in (4), so that the obtained W × H no longer approximates the original matrix D well. According to Fig. 8, we set λ = 10 in the following experiments.

For the number of topics K, we vary it in the range 10 ∼ 50. The performance curves are shown in Fig. 9. As K increases, the curves show a clear "up-maintain-down" trend. When K is small, some social events are easily mixed with other popular topics; when K is very large, the fusion step in Section III-B may fail to merge some relevant topics. Both situations hurt performance. By contrast, the curves are relatively stable in the range 20 ≤ K ≤ 40, which indicates the effectiveness of the topic fusion component. We set K = 40 in the following experiments.

Lastly, we compare the overall performance of the proposed method with three other approaches. The first is the straightforward abnormal-query strategy mentioned at the beginning of Section III-B. The second is the approach proposed by Zhao et al. [40], which also utilizes web search logs: the query-URL pairs in the log data are represented as a bipartite graph, based on which a novel clustering method is adopted to group queries/URLs into events. The third [1] converts the time-series data into time-interval sequences of temporal abstractions and presents a minimal predictive recent temporal patterns framework to select event patterns. For the convenience of comparison, the number of abnormal queries and the number of clusters in [40] are both set to the ground-truth event number gt(ci). Fig. 10 shows the experimental results. It is clear that the abnormal-query solution achieves the worst performance; as argued in Section III-B, query-level statistics are very noisy and unreliable. The scores of [40] and [1] are not good enough either, because (1) some popular topics (events) dominate the clustering, (2) event-related clicked URLs are sometimes too sparse to bridge related queries, and (3) the temporal patterns in the search log can be overwhelmed by common noisy data. Therefore, topic factorization and event ranking are both necessary components of the solution.

C. Evaluation of Event Photo Relevance

To measure the relevance of social event photos, we resort to subjective evaluation. Ten undergraduate students were invited as judges. For each event, the judges first read the related webpages (through the clicked URLs) to learn the story. Then, photos returned by Google, Bing, and our approach were presented to the judges in random order. Each photo was assigned one of three scores: perfect, relevant, or irrelevant. "Perfect" means the photo is about both the celebrity and the event, "relevant" means the photo is at least about the celebrity, and "irrelevant" means the photo is totally wrong. Considering the cost of human judging, we randomly selected 50 correctly discovered event topics for evaluation, and for each event only the top five photos returned by the search engines and by our approach were labeled. The comparison results are shown in Fig. 11; the event photo re-ranking method introduced in Section III-C helps identify event-relevant photos from image search results. To give a vivid feeling for the event photo selection, some example cases are shown in Fig. 12, in which perfectly relevant photos are marked with blue rectangles.

D. Examples of Event Storyboard

Finally, the storyboard is generated from the selected events and their relevant photos. To ensure high photo quality, photos of low visual quality are eliminated first. Besides visual quality, time and location context are other important factors for the storyboard photos.


Fig. 13. The storyboards for Tom Cruise and Barack Obama, from July 2012 to December 2012.

TABLE II
HUMAN EVALUATION OF THE GENERATED STORYBOARD VS. A HUMAN-EDITED WEB PAGE. EACH METHOD IS EVALUATED BY 10 PERSONS (SCALE 1-10, HIGHER IS BETTER).

Fig. 12. Showcases of the event photos returned by Bing (1st row), Google (2nd row), and the proposed approach (3rd row). Perfectly relevant photos are marked with blue rectangles.

For each detected event, the happening time and location are extracted first. We use the surrounding text of a photo on the web to identify whether the photo is relevant to the detected event; photos with a large time or location difference from the detected event are ranked lower. As discussed in the Introduction, it would be an attractive feature if we could generate a storyboard based on the discovered event topics and photos. For each event, we have its top related queries and clicked URLs. After downloading the webpages behind those hot clicked URLs, we extract sentences containing top queries from the fetched pages. Candidate sentences are organized into a graph, in which each edge denotes how many common words (excluding stop words) are shared between the two corresponding sentences. Given this link graph, we select the most representative sentence following a PageRank-like ranking strategy; such a sentence serves as a short description of a social event in the storyboard. Fig. 13 gives two examples of storyboards, for Tom Cruise and Barack Obama.
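The sentence selection step can be sketched as follows; the tokenizer, stop-word list, and the use of networkx's PageRank are assumptions, since the paper only specifies a PageRank-like strategy over shared-word counts.

```python
import itertools
import networkx as nx

def pick_description(sentences, stop_words=frozenset()):
    """Pick the most representative sentence for an event.

    Edges are weighted by the number of shared non-stop words; the
    sentence with the highest PageRank score is returned.
    """
    words = [set(s.lower().split()) - stop_words for s in sentences]
    g = nx.Graph()
    g.add_nodes_from(range(len(sentences)))
    for i, j in itertools.combinations(range(len(sentences)), 2):
        shared = len(words[i] & words[j])
        if shared > 0:
            g.add_edge(i, j, weight=shared)
    scores = nx.pagerank(g, weight="weight")
    best = max(scores, key=scores.get)
    return sentences[best]
```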

Method            Correctness   Photo Relevance   Event Representation
Our storyboard    7.9           6.5               7.9
Web page          10            9.4               9.8

To compare the difference between the physical event time and the detected event time, we compute the time delay over all events of the 200 celebrities. To some extent, the detected event time reflects when common users become interested in an event. The average delay is about 2 days, which means that real-time events are noticed quickly by users in our search log data. Moreover, we find that celebrities with higher reputations have a much smaller time delay than other, less prominent celebrities.

E. User Study of the Storyboard

To make our results more convincing, 10 persons were hired to evaluate the storyboard results in terms of correctness, photo relevance, and event representation; each person scores each aspect from 1 to 10. At the same time, we select a human-edited web page as the ground truth for comparison. As shown in Table II, the generated storyboard is satisfying in terms of correctness and event representation, which confirms that the search log can reflect interesting events well. Currently, the selected photos are not good enough, because most photos are about the celebrity and it is difficult to obtain photos relevant to the specific event. Overall, our method is promising according to the user study.

F. Potential Applications and Extension to Other Entities

We have applied our approach in a real phone demo system [31]. Fig. 14 shows the basic pages of our celebrity

1051-8215 (c) 2016 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


Fig. 14. A celebrity social event system: (a) home page, (b) list view of topics related to Jennifer Aniston, (c) list view of hot events in October, and (d) relevant images for the event "girlfriend", developed on Windows Phone 8.

F. Potential application and other entity extension

We have applied our approach in a real demo system on Windows Phone [31]. Fig. 14 shows the main pages of our celebrity social event system. Users can interactively switch among four views: people-centric, timeline-centric, month-centric, and topic-centric. Beyond celebrities, events for other entities, such as landmarks and brands, can be detected with a similar strategy on search log data; their development and evolution can be shown on a timeline with related photos, making it easier for users to learn more about them.

VI. CONCLUSIONS

In this paper, we used search logs as the data source to generate social event storyboards automatically. Unlike the corpora used in common text mining, search logs consist of short, sparse text queries, and their volume is much larger than that of news websites or blogs. For these reasons we do not analyze the query text itself; instead, structural and statistical information is used for topic and event detection, which fits the data well. Furthermore, we incorporate time information into our SNMF formulation, which makes it easier to discover social events than with traditional NMF methods. Our work performs better than previous work in this area, e.g., [40], because it distinguishes topics in a way that surfaces the events most appealing to common users. Finally, the associated images, selected using image search result features and their relationships, are arranged along a timeline to give a good visual representation of the mined events.

REFERENCES

[1] C. Alexander, B. Fayock, and A. Winebarger. Automatic event detection and characterization of solar events with IRIS, SDO/AIA and Hi-C. In AAS/Solar Physics Division Meeting, volume 47, 2016.
[2] J. Allan, J. G. Carbonell, G. Doddington, J. Yamron, and Y. Yang. Topic detection and tracking pilot study final report. 1998.
[3] S. Arora, R. Ge, and A. Moitra. Learning topic models – going beyond SVD. In Foundations of Computer Science (FOCS), 2012 IEEE 53rd Annual Symposium on, pages 1–10. IEEE, 2012.
[4] N. Babaguchi, S. Sasamori, T. Kitahashi, and R. Jain. Detecting events from continuous media by intermodal collaboration and knowledge use. In Multimedia Computing and Systems, 1999 IEEE International Conference on, volume 1, pages 782–786. IEEE, 1999.
[5] P. N. Bennett, R. W. White, W. Chu, S. T. Dumais, P. Bailey, F. Borisyuk, and X. Cui. Modeling the impact of short- and long-term behavior on search personalization. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, pages 185–194. ACM, 2012.
[6] D. M. Blei. Introduction to probabilistic topic models. Comm. ACM, 55(4):77–84, 2012.
[7] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.


[8] Y.-J. Chang, H.-Y. Lo, M.-S. Huang, and M.-C. Hu. Representative photo selection for restaurants in food blogs. In Multimedia & Expo Workshops (ICMEW), 2015 IEEE International Conference on, pages 1–6. IEEE, 2015.
[9] H. L. Chieu and Y. K. Lee. Query based event extraction along a timeline. In Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, pages 425–432. ACM, 2004.
[10] T.-C. Chou and M. C. Chen. Using incremental PLSI for threshold-resilient online event analysis. Knowledge and Data Engineering, IEEE Transactions on, 20(3):289–299, 2008.
[11] H. Cui, J.-R. Wen, J.-Y. Nie, and W.-Y. Ma. Probabilistic query expansion using query logs. In Proceedings of the 11th international conference on World Wide Web, pages 325–332. ACM, 2002.
[12] S. Essid and C. Févotte. Smooth nonnegative matrix factorization for unsupervised audiovisual document structuring. Multimedia, IEEE Transactions on, 15(2):415–425, 2013.
[13] G. P. C. Fung, J. X. Yu, H. Liu, and P. S. Yu. Time-dependent event hierarchy construction. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 300–309. ACM, 2007.
[14] T. Hofmann. Probabilistic latent semantic indexing. In Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, pages 50–57. ACM, 1999.
[15] T. Joachims. Optimizing search engines using clickthrough data. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 133–142. ACM, 2002.
[16] N. Kawamae. Trend analysis model: Trend consists of temporal words, topics, and timestamps. In Proceedings of the fourth ACM international conference on Web search and data mining, pages 317–326. ACM, 2011.
[17] D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In Advances in neural information processing systems, pages 556–562, 2001.
[18] J. Li and C. Cardie. Timeline generation: Tracking individuals on Twitter. In Proceedings of the 23rd international conference on World Wide Web, pages 643–652. ACM, 2014.
[19] Z. Li, B. Wang, M. Li, and W.-Y. Ma. A probabilistic model for retrospective news event detection. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pages 106–113. ACM, 2005.
[20] A. Liu, W. Lin, and M. Narwaria. Image quality assessment based on gradient similarity. Image Processing, IEEE Transactions on, 21(4):1500–1512, 2012.
[21] H. Liu, J. He, Y. Gu, H. Xiong, and X. Du. Detecting and tracking topics and events from web search logs. ACM Transactions on Information Systems (TOIS), 30(4):21, 2012.
[22] D. G. Lowe. Object recognition from local scale-invariant features. In Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on, volume 2, pages 1150–1157. IEEE, 1999.
[23] Q. Mei, C. Liu, H. Su, and C. Zhai. A probabilistic approach to spatiotemporal theme pattern mining on weblogs. In Proceedings of the 15th international conference on World Wide Web, pages 533–542. ACM, 2006.
[24] T. Mei, Y. Rui, S. Li, and Q. Tian. Multimedia search reranking: A literature survey. ACM Computing Surveys (CSUR), 46(3):38, 2014.
[25] M. Platakis, D. Kotsakos, and D. Gunopulos. Searching for events in the blogosphere. In Proceedings of the 18th international conference on World Wide Web, pages 1225–1226. ACM, 2009.
[26] S. D. Roy, T. Mei, W. Zeng, and S. Li. Towards cross-domain learning for social video popularity prediction. Multimedia, IEEE Transactions on, 15(6):1255–1267, 2013.
[27] Y. Rui, T. S. Huang, and S.-F. Chang. Image retrieval: Current techniques, promising directions, and open issues. Journal of Visual Communication and Image Representation, 10(1):39–62, 1999.
[28] E. Sadikov, J. Madhavan, L. Wang, and A. Halevy. Clustering query refinements by user intent. In Proceedings of the 19th international conference on World Wide Web, pages 841–850. ACM, 2010.
[29] S. Song, Q. Li, and N. Zheng. Understanding a celebrity with his salient events. In Active Media Technology, pages 86–97. Springer, 2010.
[30] Y. Suhara, H. Toda, and A. Sakurai. Event mining from the blogosphere using topic words. In ICWSM, 2007.
[31] S. Tan, C.-W. Ngo, J. Xu, and Y. Rui. Celebrowser: An example of browsing big data on small device. In Proceedings of International Conference on Multimedia Retrieval, page 514. ACM, 2014.
[32] T. C. Walber, A. Scherp, and S. Staab. Smart photo selection: Interpret gaze as personal interest. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 2065–2074. ACM, 2014.



[33] X. Wang and A. McCallum. Topics over time: A non-Markov continuous-time model of topical trends. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 424–433. ACM, 2006.
[34] X. Wang, K. Zhang, X. Jin, and D. Shen. Mining common topics from multiple asynchronous text streams. In Proceedings of the Second ACM International Conference on Web Search and Data Mining, pages 192–201. ACM, 2009.
[35] W. Weerkamp, R. Berendsen, B. Kovachev, E. Meij, K. Balog, and M. De Rijke. People searching for people: Analysis of a people search engine log. In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, pages 45–54. ACM, 2011.
[36] J. Weng and B.-S. Lee. Event detection in Twitter. ICWSM, 11:401–408, 2011.
[37] C.-C. Wu, T. Mei, W. H. Hsu, and Y. Rui. Learning to personalize trending image search suggestion. In Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, pages 727–736. ACM, 2014.
[38] Y. Yang, T. Pierce, and J. Carbonell. A study of retrospective and online event detection. In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, pages 28–36. ACM, 1998.
[39] X. Zhang, L. Zhang, and H.-Y. Shum. QsRank: Query-sensitive hash code ranking for efficient epsilon-neighbor search. In Proc. CVPR, pages 2058–2065, 2012.
[40] Q. Zhao, T.-Y. Liu, S. S. Bhowmick, and W.-Y. Ma. Event detection from evolution of click-through data. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 484–493. ACM, 2006.

Jun Xu is a Ph.D. student at the University of Science and Technology of China (USTC). He received the B.E. degree from USTC in 2015. His research interests include data mining, video analysis, pattern recognition, computer vision, and multimedia content analysis. He was an intern at Microsoft Research, Beijing, China, from 2012 to 2013 and from 2014 to 2016.

Tao Mei (M'07-SM'11) is a Lead Researcher with Microsoft Research, Beijing, China. His current research interests include multimedia analysis and retrieval, and computer vision. He has authored or co-authored over 100 papers in journals and conferences, 10 book chapters, and edited four books. He holds over 15 granted U.S. patents and more than 20 pending. Tao was the recipient of several paper awards from prestigious multimedia journals and conferences, including the IEEE Communications Society MMTC Best Journal Paper Award in 2015, the IEEE Circuits and Systems Society Circuits and Systems for Video Technology Best Paper Award in 2014, the IEEE Trans. on Multimedia Prize Paper Award in 2013, and Best Paper Awards at ACM Multimedia in 2009 and 2007. He was the principal designer of the automatic video search system that achieved the best performance in the worldwide TRECVID evaluation in 2007. He is an Editorial Board Member of IEEE Trans. on Multimedia, ACM Trans. on Multimedia Computing, Communications, and Applications, Machine Vision and Applications, and Multimedia Systems; he was an Associate Editor of Neurocomputing and a Guest Editor of eight international journals. He is the General Co-chair of ACM ICIMCS 2013, the Program Co-chair of ACM Multimedia 2018, IEEE ICME 2015, IEEE MMSP 2015 and MMM 2013, and the Area Chair for a dozen international conferences. Tao received B.E. and Ph.D. degrees from the University of Science and Technology of China, Hefei, China, in 2001 and 2006, respectively. He is a Senior Member of the IEEE and the ACM, and a Fellow of IAPR.


Rui Cai is a Lead Researcher at Microsoft Research Asia. He received the B.E. and Ph.D. degrees in computer science from Tsinghua University, Beijing, China, in 2001 and 2006, respectively. His research interests include web search and data mining, machine learning, pattern recognition, computer vision, multimedia content analysis, and signal processing. He is a member of the Association for Computing Machinery (ACM) and the Institute of Electrical and Electronics Engineers (IEEE).

Houqiang Li (SM'12) received the B.S., M.Eng., and Ph.D. degrees from the University of Science and Technology of China (USTC), Hefei, China, in 1992, 1997, and 2000, respectively, all in electronic engineering. He is currently a Professor at the Department of Electronic Engineering and Information Science, USTC. He has authored or co-authored over 100 papers in journals and conferences. His current research interests include video coding and communication, multimedia search, and image/video analysis. Dr. Li served as an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY from 2010 to 2013 and has been on the Editorial Board of the Journal of Multimedia since 2009. He has served on technical/program committees and organizing committees, and as Program Co-Chair or Track/Session Chair, for over ten international conferences. He received the Best Paper Award at Visual Communications and Image Processing in 2012, at the International Conference on Internet Multimedia Computing and Service in 2012, and at the International Conference on Mobile and Ubiquitous Multimedia in 2011, and is a senior author of the Best Student Paper of the 5th International Mobile Multimedia Communications Conference in 2009.

Yong Rui is currently Deputy Managing Director of Microsoft Research Asia (MSRA), leading research groups in multimedia search and mining and big data analysis, and engineering groups in multimedia processing, data mining, and software/hardware systems. A Fellow of IEEE, IAPR and SPIE, a Distinguished Scientist of ACM, and a Distinguished Lecturer of both ACM and IEEE, Dr. Rui is recognized as a leading expert in his research areas. He holds 60 issued US and international patents. He has published 16 books and book chapters, and 100+ refereed journal and conference papers. His publications are among the most cited, with 15,000+ citations and an h-index of 54. Dr. Rui is the Editor-in-Chief of IEEE Multimedia Magazine, an Associate Editor of ACM Trans. on Multimedia Computing, Communication and Applications (TOMM), a founding Editor of the International Journal of Multimedia Information Retrieval, and a founding Associate Editor of IEEE Access. He was an Associate Editor of IEEE Trans. on Multimedia (2004-2008), IEEE Trans. on Circuits and Systems for Video Technologies (2006-2010), ACM/Springer Multimedia Systems Journal (2004-2006), and International Journal of Multimedia Tools and Applications (2004-2006). He also serves on the Advisory Board of IEEE Trans. on Automation Science and Engineering. He is an Executive Member of ACM SIGMM and the founding Chair of its China Chapter. Dr. Rui received his BS from Southeast University, his MS from Tsinghua University, and his PhD from the University of Illinois at Urbana-Champaign (UIUC).

