Social Event Classification via Boosted Multi-Modal Supervised Latent Dirichlet Allocation

SHENGSHENG QIAN, TIANZHU ZHANG, and CHANGSHENG XU*, Institute of Automation, Chinese Academy of Sciences, and China-Singapore Institute of Digital Media
M. SHAMIM HOSSAIN, SWE Dept., College of Computer and Information Sciences, King Saud University, Riyadh, KSA
With the rapidly increasing popularity of social media sites (e.g., Flickr, YouTube, and Facebook), it is convenient for users to share their own comments on many social events. This facilitates social event generation, sharing, and propagation, and results in a large amount of user-contributed media data (e.g., images, videos, and text) for a wide variety of real-world events of different types and scales. As a consequence, it has become increasingly difficult to find interesting events in massive social media data, even though such events are what users or governments want to browse, search, and monitor. To deal with these issues, we propose a novel boosted multi-modal supervised Latent Dirichlet Allocation (BMM-SLDA) for social event classification, which integrates a supervised topic model, denoted multi-modal supervised Latent Dirichlet Allocation (mm-SLDA), into a boosting framework. Our proposed BMM-SLDA has a number of advantages. (1) Our mm-SLDA can effectively exploit the multi-modality and the multi-class property of social events jointly, and makes use of supervised category label information to classify multi-class social events directly. (2) It is suitable for large-scale data analysis: a boosting weighted sampling strategy iteratively selects a small subset of the data to efficiently train the corresponding topic models. (3) It effectively exploits social event structure through a document weight distribution driven by classification error, and can iteratively learn new topic models to correct previously misclassified event documents. We evaluate our BMM-SLDA on a real-world dataset and show extensive experimental results, which demonstrate that our model outperforms state-of-the-art methods.
Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval; H.4.m [Information Systems]: Miscellaneous

General Terms: Algorithms, Experimentation, Performance

Additional Key Words and Phrases: Social Event Classification, Multi-Modality, Supervised LDA, AdaBoost, Social Media

ACM Reference Format: Shengsheng Qian, Tianzhu Zhang, Changsheng Xu, and M. Shamim Hossain. 2014. Social Event Classification via Boosted Multi-Modal Supervised Latent Dirichlet Allocation. ACM Trans. Multimedia Comput. Commun. Appl. 99, 1, Article 1 (January 2014), 20 pages. DOI: http://dx.doi.org/10.1145/0000000.0000000

This work is supported in part by the National Program on Key Basic Research Project (973 Program, Project No. 2012CB316304) and the National Natural Science Foundation of China (61225009, 61303173), and also by the Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative, administered by the IDM Programme Office. The authors extend their appreciation to the Deanship of Scientific Research at King Saud University, Riyadh, Saudi Arabia, for funding this work through International Research Group Project No. IRG-14-18.

*Changsheng Xu is the corresponding author. Shengsheng Qian, Tianzhu Zhang, and Changsheng Xu are with the National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, P. R. China, and also with the China-Singapore Institute of Digital Media, Singapore 119613, Singapore (e-mail: shengsheng.qian, tzzhang, [email protected]). M. Shamim Hossain is with the SWE Dept., College of Computer and Information Sciences, King Saud University, Riyadh, KSA (e-mail:
[email protected]).
© 2014 ACM 1551-6857/2014/01-ART1 $15.00
ACM Trans. Multimedia Comput. Commun. Appl., Vol. 99, No. 1, Article 1, Publication date: January 2014.
Fig. 1. Social events in Flickr have multi-modal property (e.g., images, title, tags, description) and multi-class property (e.g., Syrian civil war, US presidential election).
1. INTRODUCTION
With the rapid development of the Internet, more and more social media sites (e.g., Flickr1, YouTube2, Facebook3, and Google News4) allow people to capture and share social media data online. As a result, a popular event happening around us and around the world can spread very fast, and there are substantial amounts of events with multi-modal content (e.g., images, videos, and text) on the Internet. Most of these user-uploaded social events are related to specific topics, and it is time-consuming to manually identify or cluster them. Therefore, automatically mining and identifying social events from massive social media data is important, and helps users or governments to better browse, search, and monitor social events. However, this goal is difficult to achieve because: (1) with the massive growth of social media, the number of events is substantial; (2) unlike normal textual documents, social events on the Internet have both a multi-modal property (e.g., images, videos, and textual content) and a multi-class property (different categories, e.g., Syrian civil war, US presidential election). An example is shown in Fig. 1. Recently, how to address these challenges for automatically mining and monitoring social events has drawn much attention in the multimedia research community, including social event classification [Becker et al. 2010; Liu and Huet 2013; Reuter and Cimiano 2012; Diakopoulos et al. 2010; Makkonen et al. 2004], social event mining [Radinsky and Horvitz 2013; Patel et al. 2008; Suhara et al. 2008], and event detection [Allan et al. 1998; Kumaran and Allan 2004; Zhao et al. 2006; Chen et al. 2009; Zhang et al. 2012; Li et al. 2012; McMinn et al. 2013; Zhang and Xu 2014]. Most of these methods focus on feature design for social event modeling, such as textual features [Kumar et al. 2004; Chieu and Lee 2004; Lin and Liang 2008].
Social media events also carry rich visual information, such as images and videos, which is helpful for the analysis and mining of social events [Wu et al. 2008]. For the same social event, different users may provide different textual descriptions (comments, tags, etc.), but the visual information may be similar. Therefore, multi-modal feature fusion has become increasingly popular for social event analysis. In [Reuter and Cimiano 2012], multi-modal features are adopted to model the similarity of events and media data for their assignment. In [Liu and Huet 2013], the similarity between different social events is computed based on time, location, or text features. In the above work, many multi-modal features, such as tag, time, location, or visual features, are exploited, and encouraging performance is achieved as a result. However, most of these methods ignore the semantic relationship among multiple modalities and the multi-class property of social events, and do not exploit discriminative category information to improve event classification performance.

1 http://www.Flickr.com
2 http://www.youtube.com
3 http://www.facebook.com
4 http://news.google.com
Fig. 2. The flowchart of the training and test process of our proposed BMM-SLDA model for social event classification. In the training step, our BMM-SLDA can learn multiple topic models inside a boosting procedure with different training documents. In the testing step, documents are described by the learned topic models and classified with the final classifiers to determine their class labels.
Different from the previous work, we attempt to exploit the multi-modal and multi-class properties jointly for social event modeling. For simplicity, we take Flickr, one of the most popular photo-sharing websites, as the social media platform in our study of social event analysis. In Fig. 1, we show an example of two different social events with multiple modalities, including user-provided metadata (e.g., title and tags) and images. Here, each image and its corresponding metadata are considered one social media document. We assume that social events represented with different modalities but describing the same concept are closely related in their hidden topics. Therefore, it is suitable to adopt topic-model-based methods to mine the multi-modal topics of social events. To this end, we can extend the traditional Latent Dirichlet Allocation (LDA) with a multi-modal property. However, the existing multi-modal Latent Dirichlet Allocation (mmLDA) is an unsupervised topic model, which cannot utilize supervised category labels to boost classification performance. To overcome this problem, we propose a novel multi-modal supervised Latent Dirichlet Allocation (mm-SLDA) that takes the supervised information into account. Moreover, all the above topic models ignore the document weight distribution, and their training process is time-consuming on a large-scale dataset. Therefore, we utilize a boosting weighted sampling algorithm to reduce the number of training documents by considering the document weight distribution, and iteratively improve the social event model. Inspired by the above discussion, we propose a novel boosted multi-modal supervised Latent Dirichlet Allocation (BMM-SLDA) algorithm to iteratively obtain multiple classifiers for social event classification. The basic idea is to integrate a supervised topic model into the boosting framework.
Each boosting iteration begins by selecting a small part of the documents from the large-scale training data according to the weights assigned by the previous boosting step. On the sampled small subset of data, the proposed mm-SLDA is applied to learn the corresponding topic model. In the mm-SLDA training process, the model captures the visual and textual topics across multi-modal social event data jointly and directly utilizes the supervised event category information for social event classification, which is implemented by treating the class label as a response variable drawn from a softmax regression function. The resulting topic model is then applied to learn a new classifier. Based on the learned classifier, the documents in the training data are classified to obtain classification scores. Finally, the weights of the documents in the training data are updated using the classification scores. From the above procedure, it is clear that the documents are iteratively reused with different weights to learn multiple topic models and build a strong social event classifier. In summary, our algorithm has two steps. Figure 2 shows the main steps of the proposed algorithm. In the training step, we iteratively learn multiple topic models based on documents selected from the training data, and obtain the corresponding weak classifiers. In the testing step, the documents are described by the learned topic models and classified with the combination of the corresponding weak classifiers to determine their final class labels. Compared with existing methods, the contributions of this work are four-fold. (1) Our BMM-SLDA algorithm is suitable for large-scale data analysis: a boosting weighted sampling strategy iteratively selects a small subset of the data to efficiently train the corresponding topic models. (2) Our BMM-SLDA effectively exploits social event structure through a document weight distribution driven by classification error, and can iteratively learn new topic models to correct previously misclassified documents. (3) Our proposed mm-SLDA can effectively exploit the multi-modal property and the supervised information of social events jointly. (4) We collect a large-scale dataset for research on social event classification with multi-modality information, and will release it for academic use. We evaluate our proposed model and demonstrate that it achieves much better performance than existing methods. Preliminary results of the proposed BMM-SLDA method are presented in [Qian et al. 2014a] and [Qian et al. 2014b]. Compared with the preliminary results, a number of improvements are made: (i) We propose a novel BMM-SLDA for social event classification. Compared with our BMM-SLDA, the model in [Qian et al. 2014a] is a special case without the weighted sampling strategy, and the model in [Qian et al. 2014b] is only a topic model method without the boosting strategy. (ii) We collect a large-scale dataset to evaluate the proposed method and will release it for academic use. (iii) More detailed analysis of recent related work is performed, and more experiments with state-of-the-art results are conducted. The rest of the paper is organized as follows. In Section 2, the related work is reviewed.
The BMM-SLDA is presented in Section 3. In Section 4, our proposed mm-SLDA and its optimization process are introduced. In Section 5, we report and analyze extensive experimental results. Finally, we conclude the paper with future work in Section 6.

2. RELATED WORK
In this section, we briefly review previous methods that are most related to our work, including event classification methods and existing topic model methods.

Event Classification: With the massive growth of social events on the Internet, efficient organization and monitoring of social events becomes challenging. Researchers have been working on social event analysis and have proposed many different methods [Becker et al. 2010; Liu and Huet 2013; Reuter and Cimiano 2012; Diakopoulos et al. 2010; Makkonen et al. 2004; Becker et al. 2009; Firan et al. 2010], which are based on single-modality (e.g., text, images) or multi-modality information. For single-modality analysis, many existing methods adopt textual information (e.g., names, time references, locations, title, tags, and description) or visual information (e.g., images and videos) [Diakopoulos et al. 2010; Makkonen et al. 2004] to model social events. Diakopoulos et al. [Diakopoulos et al. 2010] study event visualization and event analysis by analyzing Twitter messages related to media events. Makkonen et al. [Makkonen et al. 2004] extract meaningful semantic features such as names, time references, and locations, and learn a similarity metric based on a single clustering partition. Becker et al. [Becker et al. 2009] exploit the rich context associated with social media data and use clustering algorithms to identify events. However, single-modality-based methods ignore the multi-modal property of social events and generally cannot outperform multi-modality-based methods. To address this problem, many researchers adopt multiple different features (e.g., time, tag, location features, images, and videos) to calculate the similarity of social event documents [Liu and Huet 2013; Firan et al. 2010; Becker et al. 2010; Reuter and Cimiano 2012].
Liu and Huet [Liu and Huet 2013] investigate the different features between events and social media data and how to deal with missing values in social media data. In [Firan et al. 2010], the similarity of social documents is calculated based on each feature, such as time, tag, and location. In [Becker et al. 2010], an incoming set of media documents is classified into the related events using a function trained with machine learning techniques, and the similarity of social media documents is modeled by multiple features. In [Reuter and Cimiano 2012], a combination of feature-specific similarity metrics is built for each social media event in an event identification framework. While the above methods focus on feature design to improve experimental performance, the semantic relationship and importance of those features have not been studied in detail. Different from the above methods, we make use of the rich multi-modal content associated with social events, including user-provided textual information (e.g., title, tags) and visual information (e.g., images), and propose a novel mm-SLDA to model the multiple modalities and the supervised category labels jointly.

Topic Model: Topic models, such as Latent Dirichlet Allocation (LDA) [Blei et al. 2003] and Probabilistic Latent Semantic Analysis (PLSA) [Hofmann 1999], have been widely applied to various applications and have many extensions, such as supervised Latent Dirichlet Allocation (SLDA) [Blei and McAuliffe 2008; Lacoste-Julien et al. 2008; Niu et al. 2011; Gao et al. 2012; Bao et al. 2013] and multi-modal Latent Dirichlet Allocation (mmLDA) [Ramage et al. 2009b; Sang and Xu 2012]. Wang et al. [Wang et al. 2005] propose a novel topic model to simultaneously discover groups among entities and topics in the corresponding text. Phan et al. [Phan et al. 2008] propose a framework for building text and Web classifiers with hidden topics discovered from large-scale data collections using LDA models, in order to help classify short text segments. In [Ramage et al. 2009a], LDA is extended to a supervised model that directly learns word-tag correspondences, which improves expressiveness over traditional LDA with visualizations of a corpus of tagged web pages. Gao et al. [Gao et al. 2012] propose supervised cross-collection Latent Dirichlet Allocation (scLDA), which can sufficiently capture and explore various topics across collections and utilize category information for text and image classification. In [Bao et al. 2013], a supervised topic model called partially supervised cross-collection LDA (PSCCLDA) is proposed for cross-domain learning in a unified way. However, the traditional LDA [Blei et al. 2003] and SLDA [Blei and McAuliffe 2008] mainly focus on how to apply the model to textual corpora; the SLDA model, which uses continuous response values via a linear regression, cannot be used for multi-class classification problems [Wang et al. 2009], and it does not consider multi-modal corpora either. The mmLDA of [Sang and Xu 2012] considers multi-modal information, such as users' textual annotations and visual images, and is proposed for social relation mining. Our proposed topic model is different from the previous models. Compared with the mmLDA [Sang and Xu 2012], our proposed model focuses on social event classification. Furthermore, the traditional multi-modal LDA [Putthividhy et al. 2010] does not utilize supervised category labels. We extend multi-modal LDA to a supervised topic model (mm-SLDA) with a softmax regression function, which is used because social events have a multi-class property and can be classified into multiple classes directly. Therefore, the proposed mm-SLDA model combines generative and discriminative approaches, which is better for modeling social event data. Moreover, all the above topic models ignore the document weight distribution, and it is time-consuming to train them on a large-scale dataset. Different from these models, we utilize a boosting weighted sampling strategy to iteratively select only a small part of the training documents, taking the document weight distribution into account, to train the corresponding topic models. As a result, our algorithm is much more efficient.

3. THE PROPOSED ALGORITHM
In this section, we first give an overview of our proposed algorithm for multi-modal social event classification. We then introduce our proposed model and its learning algorithm in detail. Finally, we show how to use the proposed model to build a classifier for social event classification.

3.1. Overview
To describe social events with the multi-modal property, we adopt the traditional bag-of-words representation for both image and text. In such a representation, word ordering is ignored and a document is simply reduced to a vector of word counts. A multimedia document consisting of an image and the corresponding text information (such as title, description, tags, etc.) is thus summarized in our representation as a pair of word-count vectors. For the visual description, the words are based on image patches, which are obtained by the SLIC segmentation method [Achanta et al. 2010]. To obtain the description of an image patch, we densely sample SIFT points and adopt the popular sparse-coding-based method [Liu et al. 2011; Zhang et al. 2013] to encode each SIFT point. Then, based on the feature codes of all SIFT points of each patch, we adopt max pooling to obtain its description. Once all image patch descriptions are obtained, we adopt K-means to build a codebook (5000 words). By hard-assignment coding of each patch, each image can be described by the counts of the words in the codebook. An image word is denoted as a unit-basis vector v of size D^v with exactly one non-zero entry representing the membership to only one word in a dictionary of D^v words. A text word w_n is similarly defined for a dictionary of size D^w. An image is a collection of N^v word occurrences denoted by v = {v_1, v_2, ..., v_{N^v}}, and the text is a collection of N^w word occurrences denoted by w = {w_1, w_2, ..., w_{N^w}}.

Algorithm 1: Social Event Classification via Boosted Multi-Modal Supervised Latent Dirichlet Allocation.
input : Training data ES = {e_i, y_i}_{i=1}^M, topic number K, and the number of weak classifiers T. Document weight distribution d_1^i = 1/M, ∀i = 1, ..., M.
output: Weak classifiers {h_t(e)}_{t=1}^T, coefficients {α_t}_{t=1}^T, topic models {F_t(e)}_{t=1}^T.
for t = 1 to T do
    Sample ES_t from ES according to d_t, and learn the topic model F_t(e) on ES_t as in Section 3.2.1.
    Design a weak classifier h_t(e) as in Section 3.2.2.
    Compute its error ε_t and weight α_t according to Eq.(2) and Eq.(3), respectively.
    Update the document weight distribution d_{t+1} as in Eq.(4) in Section 3.2.3.
end
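The hard-assignment coding of image patches against a K-means codebook can be sketched as follows. This is a simplified illustration with toy data of our own (the function name and dimensions are not from the paper), assuming the codebook centroids have already been learned:

```python
import numpy as np

def hard_assign_counts(descriptors, codebook):
    """Hard-assignment bag-of-visual-words: count, for one image, how many
    local patch descriptors fall to each codebook word.
    descriptors: (N, D) array of patch descriptors.
    codebook:    (V, D) array of k-means centroids (V visual words)."""
    # Squared Euclidean distance from every descriptor to every centroid.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)                 # index of the closest word
    return np.bincount(nearest, minlength=len(codebook))

# Toy example: 4 descriptors, 3-word codebook in 2-D.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
desc = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]])
counts = hard_assign_counts(desc, codebook)     # word counts for this image
```

The resulting count vector plays the role of the image's bag-of-visual-words representation v.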
Given a set of social media documents, the problem that we address in this paper is how to identify the events (e.g., Syrian civil war, US presidential election) that are reflected in the documents, as well as the documents that correspond to each event. We cast our task as a classification problem over social media documents (e.g., images, text). Let ES = {(e_1, y_1), (e_2, y_2), ..., (e_M, y_M)} denote a training dataset of M documents, where e_m = [v_m, w_m] is the m-th image-text pair and y_m ∈ {1, 2, ..., C} is the class label of the document e_m. Here, C is the number of event class labels. Let ET = {e_i} be the test documents. Our aim is to learn a classifier H(e) for ∀e ∈ ET with the assistance of the labeled set ES. To achieve this goal, we propose a boosted topic model learning method to iteratively obtain multiple topic model classifiers for social event classification. Algorithm 1 gives the details of the training step of the proposed BMM-SLDA algorithm. In the training step, multiple topic models are learned inside a boosting procedure with different training documents. In each iteration t, we sample a subset ES_t from the whole training document set ES according to the weights d_t assigned by the previous boosting iteration. This subset ES_t is then used as a guide to learn the corresponding topic model parameters F_t(e), as introduced in Section 3.2.1. Then, the learned parameters are applied to learn a new weak learner h_t(e). Based on this classifier, the documents e ∈ ES are classified to obtain the classification error ε_t. Finally, the new weights of the documents are updated using the classification error ε_t. In this way, documents in the training set with large weights are more likely to be selected for training a new topic model F_{t+1}(e) in the next iteration.
Once this procedure converges, we obtain a set of topic models {F_t(e)}, a set of weak learners {h_t(e)}, and their corresponding combination coefficients {α_t}, which yield the final strong classifier H(e) as shown in Eq.(5). After training, each document e ∈ ET is mapped with the topic models {F_t(e)} to obtain its descriptions. Then, each mapped description is classified by its corresponding weak classifier h_t(e). The predicted results of all T weak classifiers are combined to decide the final class label H(e). The details are introduced in the next subsections.
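The training loop of Algorithm 1 can be sketched as below. The mm-SLDA training and the softmax weak learner are abstracted behind placeholder callables (`learn_topic_model` and `learn_weak_classifier` are hypothetical names of ours, not the paper's API); the error and weight computations follow Eqs.(2)-(4):

```python
import numpy as np

rng = np.random.default_rng(0)

def train_bmm_slda(docs, labels, T, n_classes, subset_size,
                   learn_topic_model, learn_weak_classifier):
    """Sketch of the boosting loop in Algorithm 1. The two callables stand
    in for Sections 3.2.1-3.2.2 (topic model and weak learner training)."""
    labels = np.asarray(labels)
    M = len(docs)
    d = np.full(M, 1.0 / M)                  # initial weights d_1^i = 1/M
    models, classifiers, alphas = [], [], []
    for t in range(T):
        # Weighted sampling of a small training subset.
        idx = rng.choice(M, size=subset_size, replace=True, p=d / d.sum())
        F_t = learn_topic_model([docs[i] for i in idx], labels[idx])
        h_t = learn_weak_classifier(F_t)
        # Classify the whole training set with the new weak learner.
        miss = np.array([h_t(doc) != y for doc, y in zip(docs, labels)],
                        dtype=float)
        eps = (d * miss).sum() / d.sum()                           # Eq.(2)
        alpha = np.log((1.0 - eps) / eps) + np.log(n_classes - 1)  # Eq.(3)
        d = d * np.exp(alpha * miss)                               # Eq.(4)
        models.append(F_t); classifiers.append(h_t); alphas.append(alpha)
    return models, classifiers, alphas
```

Misclassified documents keep larger weights, so they are more likely to be resampled into the next iteration's subset.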
3.2. Our Boosting Model
At each iteration t, our algorithm has four major components: topic model learning, weak learner design, document weight updating, and the social event classifier, described as follows.

3.2.1. Topic Model Learning. In each iteration t, a subset ES_t is sampled from the whole training set ES according to the weights d_t. Then, on this subset, we apply our mm-SLDA to learn the corresponding topic model F_t(e) = {Φ^w, Φ^v, η}; the details will be introduced in Section 4. Once the topic model F_t(e) is obtained, we can predict the label of a new event document. Given a new social event document e_new, which is composed of many textual words w_new and associated visual words v_new, we first sample the topic assignments of all tokens, including text words and visual words. Then, we obtain the empirical topic proportions z̄_{e_new} over the various topics for e_new and adopt the learned class coefficients η for prediction.

3.2.2. Weak Learner Design. Once we obtain the topic model F_t(e_i), we need to design an effective weak learner h_t(e_i). Specifically, we adopt an effective softmax regression function. The weak classifier is defined in Eq.(1).
h_t(e_new) = argmax_{c ∈ {1,2,...,C}} p(y_{e_new} = c | e_new, η) = argmax_{c ∈ {1,2,...,C}} (η_c^T z̄_{e_new}),    (1)

where p(y_{e_new} = c | e_new, η) = exp(η_c^T z̄_{e_new}) / Σ_{l=1}^C exp(η_l^T z̄_{e_new}). After constructing the weak learner, similar to the conventional multi-class AdaBoost scheme [Zhu et al. 2009; Zhang et al. 2009; Zhang et al. 2011], we compute its classification error ε_t and assign a weight α_t to the weak learner h_t(e), as shown in Eq.(2) and Eq.(3), respectively. Here, I(·) is an indicator function, so that I(true) = 1 and I(false) = 0.

ε_t = (1 / Σ_{i=1}^M d_t^i) Σ_{i=1}^M d_t^i · I(y_i ≠ h_t(e_i))    (2)

α_t = ln((1 − ε_t)/ε_t) + ln(C − 1)    (3)
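A minimal numeric sketch of the weak classifier in Eq.(1); the toy coefficients and topic proportions are our own illustration:

```python
import numpy as np

def softmax_predict(z_bar, eta):
    """Weak classifier of Eq.(1): pick the class c maximizing
    p(y = c | z_bar, eta), which is proportional to exp(eta_c^T z_bar).
    z_bar: (K,) empirical topic proportions of a document.
    eta:   (C, K) class coefficient vectors."""
    scores = eta @ z_bar
    # exp(.) is monotone, so the argmax of the raw scores equals the
    # argmax of the softmax probabilities.
    p = np.exp(scores - scores.max())
    p /= p.sum()
    return int(np.argmax(p)), p

eta = np.array([[2.0, 0.0], [0.0, 2.0]])     # two classes, K = 2 topics
label, p = softmax_predict(np.array([0.8, 0.2]), eta)  # topic 0 dominates
```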
3.2.3. Document Weight Updating. Our document weight distribution update scheme is shown in Eq.(4). As in each iteration of the conventional AdaBoost algorithm, the weights of the misclassified documents are increased, while the weights of the correctly classified documents remain unchanged and thus relatively decrease. It is defined by

d_{t+1}^i = d_t^i · exp(α_t · I(y_i ≠ h_t(e_i))),    ∀i = 1, ..., M,    (4)

where d_t = [d_t^1, d_t^2, ..., d_t^M]^T and d_t^i is the weight of the i-th document in ES.

3.2.4. Social Event Classifier. Once the boosting procedure converges, we obtain a set of topic models {F_t(e)}, a set of weak learners {h_t(e)}, and their corresponding combination coefficients {α_t}. Then, the learned social event classifier H(e) is

H(e) = argmax_{c ∈ {1,...,C}} Σ_{t=1}^T α_t · I(h_t(e) = c).    (5)
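The weighted vote of Eq.(5) can be sketched as follows; the constant weak learners and weights are toy values of ours:

```python
import numpy as np

def strong_classify(doc, classifiers, alphas, n_classes):
    """Final classifier of Eq.(5): each weak learner h_t casts a vote of
    weight alpha_t for its predicted class; return the top-voted class."""
    votes = np.zeros(n_classes)
    for h_t, alpha_t in zip(classifiers, alphas):
        votes[h_t(doc)] += alpha_t
    return int(np.argmax(votes))

# Toy ensemble: three constant weak learners with different weights.
clfs = [lambda d: 0, lambda d: 1, lambda d: 1]
label = strong_classify(None, clfs, [0.9, 0.4, 0.6], n_classes=2)
```

Here class 1 wins because its accumulated weight (0.4 + 0.6 = 1.0) exceeds that of class 0 (0.9).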
4. OUR MM-SLDA MODEL
Based on the sampled subset ES_t at the t-th iteration, we deal with the social event classification problem via the mm-SLDA model, which can make use of the event multi-modal property and the event category information jointly to learn an effective and discriminative event model. Table I lists the key notations of our proposed mm-SLDA model.

Table I. The key notations of our proposed mm-SLDA model.
m ∈ {1, 2, ..., M} — The index of a social event document
y_m ∈ {1, 2, ..., C} — The supervised label of a social event document e_m
e_m — The collection of images and associated text for a social event document
N^v, N^w — The numbers of visual and textual words in a social event document
Φ^w_k, Φ^v_k — The multinomial distributions of textual words and visual descriptors specific to topic k
Ω_m — The multinomial distribution of topics specific to event document e_m
α_Φw, α_Φv, α_Ω — The parameters of the Dirichlet distributions
w, v — The textual word and visual descriptor vectors
z^w, z^v — The topic assignments for textual words and visual descriptors
z̄_m — The empirical topic distribution for event document e_m
η_c, c ∈ {1, ..., C} — The class coefficients, each of which is a K-dimensional vector

Fig. 3. The proposed multi-modal supervised Latent Dirichlet Allocation topic model for social event classification. For details, please refer to the corresponding text.

The proposed model has the graphical representation shown in Fig. 3. From the figure, we can see that our model can mine the visual and textual topics of different social events together by considering the supervised label information. Given M_t = |ES_t| documents with their labels y_m, our aim is to infer the event document topic distributions Ω_m, a set of C class coefficients η_{1:C}, and the K text and image topics Φ^w and Φ^v. Here, K is the number of topics. Ω_m reflects that the many tags and the associated image in a social event document share the same document-specific distribution over topics. Each inferred coefficient η_c is a K-dimensional vector and represents the parameter values of the softmax regression for the c-th class. The generative process of mm-SLDA for an image-text pair document m with N^v visual words, N^w text words, and its label is as follows:
1. For each visual topic k ∈ {1, 2, ..., K}, draw Φ^v_k | α_Φv ∼ Dir(α_Φv).
2. For each textual topic k ∈ {1, 2, ..., K}, draw Φ^w_k | α_Φw ∼ Dir(α_Φw).
3. Draw topic proportions Ω_m | α_Ω ∼ Dir(α_Ω).
4. For each visual word v_n, n ∈ {1, 2, ..., N^v}:
   (a) Draw a topic assignment z^v_n | Ω_m ∼ Mult(Ω_m).
   (b) Draw a visual patch v_n | z^v_n, Φ^v ∼ Mult(Φ^v_{z^v_n}).
5. For each textual word w_n, n ∈ {1, 2, ..., N^w}:
   (a) Draw a topic assignment z^w_n | Ω_m ∼ Mult(Ω_m).
   (b) Draw a word w_n | z^w_n, Φ^w ∼ Mult(Φ^w_{z^w_n}).
6. Draw the class label y_m | z̄_m ∼ softmax(z̄_m, η), where z̄_m = (z^v_m + z^w_m)/(N^v + N^w).
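The generative process above can be sketched as follows. This is a simplified illustration with our own function and variable names (not the paper's implementation); the topic matrices Φ and coefficients η are assumed given:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_document(K, N_v, N_w, phi_v, phi_w, eta, alpha_omega):
    """mm-SLDA generative process (sketch): one shared topic proportion
    Omega_m drives both visual and textual word draws; the class label is
    drawn from a softmax over the empirical topic frequencies z_bar."""
    omega = rng.dirichlet(np.full(K, alpha_omega))               # step 3
    z_v = rng.choice(K, size=N_v, p=omega)                       # step 4(a)
    v = [rng.choice(phi_v.shape[1], p=phi_v[k]) for k in z_v]    # step 4(b)
    z_w = rng.choice(K, size=N_w, p=omega)                       # step 5(a)
    w = [rng.choice(phi_w.shape[1], p=phi_w[k]) for k in z_w]    # step 5(b)
    # Step 6: z_bar pools visual and textual topic counts.
    counts = np.bincount(z_v, minlength=K) + np.bincount(z_w, minlength=K)
    z_bar = counts / (N_v + N_w)
    p_y = np.exp(eta @ z_bar)
    p_y /= p_y.sum()
    y = rng.choice(len(eta), p=p_y)
    return v, w, z_bar, y
```

In inference the process runs in reverse: the topic assignments are unknown and are sampled by Gibbs sampling, as described in Section 4.1.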
Here, the softmax function provides the following distribution:

p(y_m | z̄_m, η) = exp(η_{y_m}^T z̄_m) / Σ_{l=1}^C exp(η_l^T z̄_m),

where the K-dimensional vectors η_{1:C} and z̄_m represent the set of class coefficients in our mm-SLDA and the empirical proportions of the textual and visual topics occurring in event document m, respectively. As in the generative process of the mm-SLDA model, we first obtain the topic z_n of each word by the Gibbs sampling method, and then obtain the representations of the textual document and the visual image. Since the number of topics is set to K, the textual document and the visual image can each be represented by a K-dimensional vector. In our model, we treat visual and textual topics as the same hidden topics. As a result, one social media document, including the textual document and the visual image, can also be represented by a K-dimensional vector in the hidden semantic topic space by integrating the learned textual and visual topics. Note that z̄_m indicates the feature description of the m-th media document and represents the empirical proportions of the textual and visual topics occurring in media document m. The z̄^w_m and z̄^v_m are the empirical topic frequencies of the textual information and visual information, respectively, and are K-dimensional vectors in the hidden semantic topic space. Because visual and textual topics are treated as the same hidden semantic topics, we represent z̄_m by integrating the obtained z̄^w_m and z̄^v_m using simple frequency normalization, and then use a softmax regression function to represent the class information of the media document, as in [Wang et al. 2009]. During the model learning process, we assume that the prior distributions are symmetric Dirichlet distributions, which are conjugate priors for the multinomial. Each event document has two kinds of features, w_n and v_n, where w_n and v_n are textual information and visual information, respectively, and they are semantically related to each other in each social event document.

4.1. Parameter Inference
In our mm-SLDA, the posterior joint probability can be factorized as

p(z^w, z^v, w, v, y \mid \alpha_\Omega, \alpha_{\Phi^w}, \alpha_{\Phi^v}, \eta) \propto p(z^w, z^v \mid \alpha_\Omega) \times p(w \mid z^w, \alpha_{\Phi^w}) \times p(v \mid z^v, \alpha_{\Phi^v}) \times p(y \mid \bar{z}, \eta)

\propto \int p(z^w \mid \Omega)\, p(z^v \mid \Omega)\, p(\Omega \mid \alpha_\Omega)\, d\Omega \times \int p(w \mid z^w, \Phi^w)\, p(\Phi^w \mid \alpha_{\Phi^w})\, d\Phi^w \times \int p(v \mid z^v, \Phi^v)\, p(\Phi^v \mid \alpha_{\Phi^v})\, d\Phi^v \times p(y \mid \bar{z}, \eta)

\propto \prod_{m=1}^{M} \frac{B(n^w_{m,\cdot} + n^v_{m,\cdot} + \alpha_\Omega)}{B(\alpha_\Omega)} \times \prod_{k} \frac{B(n^w_{\cdot,k} + \alpha_{\Phi^w})}{B(\alpha_{\Phi^w})} \times \prod_{k} \frac{B(n^v_{\cdot,k} + \alpha_{\Phi^v})}{B(\alpha_{\Phi^v})} \times \prod_{m=1}^{M} \prod_{l=1}^{C} \left\{ \exp(\eta_l^T \bar{z}_m) \Big/ \sum_{j=1}^{C} \exp(\eta_j^T \bar{z}_m) \right\}^{1\{y^{(m)}=l\}},   (6)

where B(\alpha) = \prod_k \Gamma(\alpha_k) / \Gamma(\sum_k \alpha_k) is a normalizing constant, n^w_{m,\cdot} = \langle n^w_{m,1}, \ldots, n^w_{m,k}, \ldots, n^w_{m,K} \rangle and n^v_{m,\cdot} = \langle n^v_{m,1}, \ldots, n^v_{m,k}, \ldots, n^v_{m,K} \rangle, where n^w_{m,k} = \sum_i 1(z^w_{m,i} = k) and n^v_{m,k} = \sum_i 1(z^v_{m,i} = k) are the numbers of tokens assigned to topic k in the textual information and visual information of event document m, respectively. Similarly, n^w_{\cdot,k} = \langle n^w_{1,k}, \ldots, n^w_{i,k}, \ldots, n^w_{D_w,k} \rangle and n^v_{\cdot,k} = \langle n^v_{1,k}, \ldots, n^v_{i,k}, \ldots, n^v_{D_v,k} \rangle, where n^w_{i,k} and n^v_{i,k} are the numbers of word i assigned to the textual topic \Phi^w_k and the visual topic \Phi^v_k over all event documents, respectively. 1\{\cdot\} is the indicator function, so that 1\{true\} = 1 and 1\{false\} = 0. Exact inference is often intractable in topic models, so appropriate approximate methods must be used, such as variational inference [Blei et al. 2003] and Gibbs sampling [Griffiths and Steyvers 2004]. Gibbs sampling is a Markov chain Monte Carlo algorithm and is embedded in an EM strategy for parameter inference in this paper. In EM terminology, we sample the value of z by
ACM Trans. Multimedia Comput. Commun. Appl., Vol. 99, No. 1, Article 1, Publication date: January 2014.
the Gibbs sampling method given the parameters \eta_{1:C} in the E-step, and update \eta_{1:C} by maximizing the joint likelihood of the variables in the M-step.

E-step: In the E-step, we adopt collapsed Gibbs sampling to sample from the distribution conditioned on the previous state. Under our model, the hidden variables z^w and z^v need to be assigned. The conditional posterior distribution of the latent topic indicators in the text document can be written as

p(z^w_{m,i} = k \mid z^{\neg(m,i),w}, w, v, y, \alpha_\Omega, \alpha_{\Phi^w}, \alpha_{\Phi^v}, \eta) \propto \frac{n^{\neg(i)}_{m,k} + \alpha_\Omega}{\left[\sum_{k=1}^{K}(n_{m,k} + \alpha_\Omega)\right] - 1} \times \frac{n^{\neg(m,i),w}_{q,k} + \alpha_{\Phi^w}}{\left[\sum_{p=1}^{D_w}(n^w_{p,k} + \alpha_{\Phi^w})\right] - 1} \times \prod_{l=1}^{C} \left\{ \exp(\eta_l^T \bar{z}) \Big/ \sum_{j=1}^{C} \exp(\eta_j^T \bar{z}) \right\}^{1\{y^{(m)}=l\}}   (7)

Here, z^{\neg(m,i),w} denotes the vector of topic assignments excluding the considered word at position i in the textual information w of event document m; n^{\neg(m,i),w}_{q,k} denotes the number of times word q is assigned to topic k, excluding the current assignment, in the text document w; \left[\sum_{p=1}^{D_w}(n^w_{p,k} + \alpha_{\Phi^w})\right] - 1 involves the total number of words assigned to topic k, excluding the current assignment, in the text document w; n^{\neg(i)}_{m,k} denotes the number of text words and image patches in event document m assigned to topic k, excluding the current assignment; \left[\sum_{k=1}^{K}(n_{m,k} + \alpha_\Omega)\right] - 1 involves the total number of text words and image patches in event document m, excluding the current assignment; \alpha_\Omega, \alpha_{\Phi^w}, \alpha_{\Phi^v} are symmetric hyperparameters controlling the corresponding Dirichlet prior distributions; and \eta denotes the class coefficients, where each class coefficient \eta_c is a K-dimensional vector. The descriptions of the parameters for the images v are similar, and the conditional posterior distribution of the latent topic indicators is:
p(z^v_{m,i} = k \mid z^{\neg(m,i),v}, w, v, y, \alpha_\Omega, \alpha_{\Phi^w}, \alpha_{\Phi^v}, \eta) \propto \frac{n^{\neg(i)}_{m,k} + \alpha_\Omega}{\left[\sum_{k=1}^{K}(n_{m,k} + \alpha_\Omega)\right] - 1} \times \frac{n^{\neg(m,i),v}_{q,k} + \alpha_{\Phi^v}}{\left[\sum_{p=1}^{D_v}(n^v_{p,k} + \alpha_{\Phi^v})\right] - 1} \times \prod_{l=1}^{C} \left\{ \exp(\eta_l^T \bar{z}) \Big/ \sum_{j=1}^{C} \exp(\eta_j^T \bar{z}) \right\}^{1\{y^{(m)}=l\}}   (8)
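A single collapsed Gibbs draw in the spirit of Eq.(7)/(8) can be sketched as below. All function and variable names are illustrative; the supervised softmax term is passed in precomputed, and the document-length denominator of Eq.(7), which does not depend on k, cancels under normalization.

```python
import numpy as np

def sample_topic(doc_counts, word_counts, topic_totals, softmax_factor,
                 a_omega, a_phi, vocab_size, rng):
    """One collapsed Gibbs draw, sketched from Eq.(7).

    doc_counts     : K-dim counts n_{m,k} with the current token excluded.
    word_counts    : K-dim per-topic counts of the current word, with the
                     current assignment excluded.
    topic_totals   : K-dim total number of tokens assigned to each topic.
    softmax_factor : K-dim supervised term for this document's label.
    """
    p = (doc_counts + a_omega) \
        * (word_counts + a_phi) / (topic_totals + vocab_size * a_phi - 1) \
        * softmax_factor
    p = p / p.sum()                 # normalize the unnormalized conditional
    return rng.choice(len(p), p=p)  # draw a topic index

# Toy usage with K = 2 topics and a vocabulary of 50 words.
rng = np.random.default_rng(0)
k = sample_topic(np.array([3., 1.]), np.array([5., 0.]),
                 np.array([20., 20.]), np.ones(2), 0.5, 0.1, 50, rng)
```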
After Gibbs sampling, we can estimate \Phi^w, \Phi^v, and \Omega as follows:

\Phi^w_{k,q} = \frac{n^w_{q,k} + \alpha_{\Phi^w}}{\sum_{p=1}^{D_w}(n^w_{p,k} + \alpha_{\Phi^w})}   (9)

\Phi^v_{k,q} = \frac{n^v_{q,k} + \alpha_{\Phi^v}}{\sum_{p=1}^{D_v}(n^v_{p,k} + \alpha_{\Phi^v})}   (10)

\Omega_{m,k} = \frac{n_{m,k} + \alpha_\Omega}{\sum_{k=1}^{K}(n_{m,k} + \alpha_\Omega)}   (11)
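These point estimates are simple smoothed normalizations of the count matrices accumulated during sampling; a minimal sketch (variable names are ours):

```python
import numpy as np

def estimate_parameters(n_w, n_v, n_m, a_phi_w, a_phi_v, a_omega):
    """Point estimates of Phi^w, Phi^v and Omega following Eqs.(9)-(11).

    n_w, n_v : (vocabulary x K) count matrices for the textual and visual
    modalities; n_m : (documents x K) document-topic count matrix.
    Each topic column of Phi and each document row of Omega sums to one.
    """
    phi_w = (n_w + a_phi_w) / (n_w + a_phi_w).sum(axis=0)                  # Eq.(9)
    phi_v = (n_v + a_phi_v) / (n_v + a_phi_v).sum(axis=0)                  # Eq.(10)
    omega = (n_m + a_omega) / (n_m + a_omega).sum(axis=1, keepdims=True)   # Eq.(11)
    return phi_w, phi_v, omega

# Toy usage: 2 words, 2 topics, 1 document.
phi_w, phi_v, omega = estimate_parameters(
    np.array([[2., 0.], [0., 2.]]), np.array([[1., 1.], [1., 1.]]),
    np.array([[3., 1.]]), 0.1, 0.1, 0.5)
```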
M-step: In the M-step, we update the class coefficients \eta by maximizing the joint likelihood. Because the parameters obtained in the E-step are fixed, this is equivalent to maximizing p(y \mid \bar{z}, \eta), where each event document m is represented by the \bar{z}_m newly updated in the E-step. Specifically, we learn an L_2-regularized softmax regression model that solves the following unconstrained optimization problem:

\min_{\eta} \left( -\frac{1}{M_T} \sum_{m=1}^{M_T} \sum_{c=1}^{C} 1\{y^{(m)} = c\} \log \frac{e^{\eta_c^T \bar{z}_m}}{\sum_{l=1}^{C} e^{\eta_l^T \bar{z}_m}} + \frac{\lambda}{2} \sum_{i=1}^{C} \eta_i^T \eta_i \right)   (12)
where \lambda is a regularization parameter (set to 1.0), and we apply a trust-region Newton method [Lin et al. 2008] for optimization.

4.2. Weighted Sampling Strategy
The mm-SLDA model proposed above assumes that each token in a document is equally important in the Gibbs sampling process. However, in our boosting iteration process, we sample a subset ES_t from the whole training document set ES according to the weights d_t assigned by the previous boosting iteration. As a result, each document carries a weight value. To effectively train our mm-SLDA model, it is better to take the weight distribution of the documents into account when calculating the conditional probabilities. Our basic hypothesis is that recalculating p(z_{m,i} \mid z, w, v, y, \alpha_\Omega, \alpha_{\Phi^w}, \alpha_{\Phi^v}, \eta) with the weight distribution of each document will improve the classification accuracy. The effectiveness of this weighted sampling strategy is demonstrated in our experimental results. In our implementation, the boosting weight value of each document is mapped to each word of the document, and Eq.(7) and Eq.(8) are updated by incorporating the weight distribution information and replacing the counts n in Eq.(7) and Eq.(8) with the weight values W, as in Eq.(13) and Eq.(14):

p(z^w_{m,i} = k \mid z^{\neg(m,i),w}, w, v, y, \alpha_\Omega, \alpha_{\Phi^w}, \alpha_{\Phi^v}, \eta) \propto \frac{W^{\neg(i)}_{m,k} + \alpha_\Omega}{\left[\sum_{k=1}^{K}(W_{m,k} + \alpha_\Omega)\right] - 1} \times \frac{W^{\neg(m,i),w}_{q,k} + \alpha_{\Phi^w}}{\left[\sum_{p=1}^{D_w}(W^w_{p,k} + \alpha_{\Phi^w})\right] - 1} \times \prod_{l=1}^{C} \left\{ \exp(\eta_l^T \bar{z}) \Big/ \sum_{j=1}^{C} \exp(\eta_j^T \bar{z}) \right\}^{1\{y^{(m)}=l\}}   (13)
p(z^v_{m,i} = k \mid z^{\neg(m,i),v}, w, v, y, \alpha_\Omega, \alpha_{\Phi^w}, \alpha_{\Phi^v}, \eta) \propto \frac{W^{\neg(i)}_{m,k} + \alpha_\Omega}{\left[\sum_{k=1}^{K}(W_{m,k} + \alpha_\Omega)\right] - 1} \times \frac{W^{\neg(m,i),v}_{q,k} + \alpha_{\Phi^v}}{\left[\sum_{p=1}^{D_v}(W^v_{p,k} + \alpha_{\Phi^v})\right] - 1} \times \prod_{l=1}^{C} \left\{ \exp(\eta_l^T \bar{z}) \Big/ \sum_{j=1}^{C} \exp(\eta_j^T \bar{z}) \right\}^{1\{y^{(m)}=l\}}   (14)
Here, W^i_{m,k} is the updated weight value of tokens of type i in document m assigned to topic k. W^{\neg(i)}_{m,k} denotes the weight values of the text words and image patches in event document m assigned to topic k, excluding the current assignment; W^{\neg(m,i),w}_{q,k} denotes the weight value of word q assigned to topic k, excluding the current assignment, in the text document w, and W^{\neg(m,i),v}_{q,k} is defined similarly for the image v. The other equations for Gibbs sampling and the estimations of \Phi^w, \Phi^v, and \Omega remain unchanged [Wilson and Chew 2010]. In our weighted sampling strategy, if all term weight values are 1, the model reduces to the mm-SLDA of Eq.(7) and Eq.(8).

4.3. Discussion
The details of the proposed mm-SLDA model and the weighted sampling strategy have been discussed above. Compared with existing topic model methods, the differences are:
Table II. Illustration of the event name, duration, and number of documents for each event in our collected social event dataset.

Event ID | Event Name | Start Time | End Time | # of Documents
1 | Senkaku Islands dispute | 2008.06 | 2012.12 | 4844
2 | Occupy Wall Street | 2011.09 | 2012.09 | 7524
3 | United States presidential election | 2009.10 | 2013.01 | 5558
4 | War in Afghanistan | 2001.10 | 2012.08 | 7922
5 | Gangnam Style | 2012.07 | 2013.01 | 4630
6 | North Korea nuclear program | 2000.01 | 2012.04 | 4646
7 | Greek protests | 2011.05 | 2012.04 | 7637
8 | Mars Reconnaissance Orbiter | 2005.04 | 2012.08 | 6077
9 | the 2011 Norway attacks | 2011.01 | 2011.07 | 4552
10 | Syrian civil war | 2011.01 | 2013.05 | 6684
11 | the development of Microsoft Windows | 2000.01 | 2013.10 | 4554
12 | Apple Jobs | 2007.06 | 2013.09 | 5039
13 | the dispute of the South China Sea | 2012.04 | 2013.07 | 7026
14 | cannabis legalization in the U.S. | 2000.01 | 2012.01 | 4564
15 | 2011 England riots | 2011.08 | 2011.10 | 6034
16 | Japan earthquake and tsunami | 2011.03 | 2011.12 | 7469
17 | Putin's return | 2012.05 | 2013.10 | 6737
18 | Death of Michael Jackson | 2009.06 | 2011.03 | 6129
(1) Compared with traditional unsupervised topic models, such as LDA [Blei et al. 2003] and mm-LDA [Ramage et al. 2009b; Sang and Xu 2012], our proposed mm-SLDA is a supervised topic model that can utilize the supervised category labels to learn semantic topics that are in line with the class labels. Moreover, the traditional topic model [Blei et al. 2003] mainly focuses on applying the model to textual corpora, while our mm-SLDA explicitly considers multi-modal corpora and can effectively exploit the textual and visual topic properties of social events jointly.

(2) Compared with traditional supervised topic models (SLDA) [Blei and McAuliffe 2008; Lacoste-Julien et al. 2008; Niu et al. 2011; Gao et al. 2012; Bao et al. 2013], our proposed mm-SLDA is a flexible model for the multi-class problem. The SLDA model, which models continuous response values via linear regression, cannot be used directly for the multi-class classification problem. In contrast, our mm-SLDA can fully utilize the supervised category labels of social events by adding a softmax regression function to train the event model and classify events into multiple classes directly.

(3) All of the above topic models, such as LDA [Blei et al. 2003], mm-LDA [Ramage et al. 2009b; Sang and Xu 2012], SLDA [Blei and McAuliffe 2008], and our proposed mm-SLDA, ignore the document weight distribution. Meanwhile, it is time-consuming to train these models on a large-scale dataset. Different from these models, we propose a novel boosted multi-modal supervised Latent Dirichlet Allocation (BMM-SLDA) algorithm based on our mm-SLDA to iteratively learn multiple topic models for social event classification. Specifically, we utilize a boosting weighted sampling strategy to reduce the number of training documents by considering the document weight distribution, and take the weight information of each document into account to train our mm-SLDA at each iteration. As a result, our model is much more efficient for social event modeling.
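The boosting weighted sampling step in point (3) can be sketched as follows: a small subset of training documents is drawn with probability proportional to the current boosting weights d_t, so previously misclassified documents are more likely to be selected. Function and variable names are illustrative, not from the authors' code.

```python
import numpy as np

def sample_boosting_subset(doc_ids, weights, subset_size, rng=None):
    """Draw a weighted subset of documents for one boosting iteration.

    doc_ids : array of document indices; weights : the boosting weights
    d_t from the previous iteration; subset_size : how many documents
    to train the next mm-SLDA on.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()                   # normalize d_t to a distribution
    return rng.choice(doc_ids, size=subset_size, replace=False, p=p)

# Toy usage: select 500 of 1000 documents under uniform initial weights.
subset = sample_boosting_subset(np.arange(1000), np.ones(1000), 500)
```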
5. EXPERIMENTAL RESULTS
In this section, we show extensive experimental results on our collected dataset to demonstrate the effectiveness of our model. We first introduce the dataset construction, then describe feature extraction, and finally give results and analysis.

5.1. Dataset Collection
Nowadays, there are already some public event datasets, such as MediaEval Social Event Detection (SED) [Reuter et al. 2013] and TRECVID [Over et al. 2013]. However, the existing MediaEval SED dataset, which targets events organized and attended by people and illustrated by the social media content they create, does not consider current major social hot events, and the TRECVID dataset mainly focuses on some simple and general events. To the best of our knowledge, there is no multi-modality social event dataset available for the social event classification task. Therefore, we collected the dataset ourselves from the photo-sharing website Flickr. The dataset contains 18 different social event categories that happened in the past few years, as shown in Table II. For each social event, we use keywords and the site's public API to crawl related images and text information. Each image and its associated text information (tags, title, and description) are considered a social event document. The collected 18 social event categories cover a wide range of topics, including politics, economics, entertainment, military, and society. For each social event category, there are about 4000 to 8000 event documents; in total, there are about 107,600 social event documents in this dataset. In our collected dataset, some events are quite similar, such as "Senkaku Islands dispute" and "the dispute of the South China Sea", "War in Afghanistan" and "Syrian civil war", and "Occupy Wall Street" and "Greek protests". These events have similar topics, which brings great challenges to event classification. Some examples of the 18 social events in this dataset are shown in Fig.4.
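As an illustration of the keyword-based crawling described above, the sketch below builds a request URL for Flickr's public REST API (method `flickr.photos.search`). The API key, the requested extra fields, and the helper name are placeholders and assumptions on our part, not the authors' crawler.

```python
from urllib.parse import urlencode

def build_flickr_search_url(api_key, keyword, page=1, per_page=100):
    """Build a Flickr photo-search request URL for one event keyword.

    The `extras` field asks for the tags and description that, together
    with the title, form a social event document in this paper.
    """
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,              # placeholder key
        "text": keyword,                 # event keyword query
        "extras": "tags,description",    # assumed extra fields
        "format": "json",
        "nojsoncallback": 1,
        "page": page,
        "per_page": per_page,
    }
    return "https://api.flickr.com/services/rest/?" + urlencode(params)

url = build_flickr_search_url("YOUR_API_KEY", "Occupy Wall Street")
```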
Fig. 4. Examples of 18 social events in our collected dataset. Here, we show text and image samples on Flickr and each event has hundreds of text documents with the related images. For better viewing, please see the original pdf file.
5.2. Feature Extraction
For the textual description, we apply stemming and stop-word elimination and remove words with a corpus frequency of less than 15, leaving 35,565 unique words. For the visual description, the words are based on image patches, which are obtained by the SLIC segmentation method [Achanta et al. 2010]. To obtain the description of an image patch, we densely sample SIFT points and adopt the popular sparse-coding-based method to encode each SIFT point. Then, based on the feature codes of all SIFT points in each patch, we adopt max pooling to obtain its description. Once all image patch descriptions are obtained, we adopt K-means to build a codebook. By hard-assignment coding of each patch, each image can be described by the counts of the words in the codebook.

5.3. Results and Analysis
In this subsection, we present further results and analysis. Section 5.3.1 gives the parameter analysis of our model; Section 5.3.2 gives a qualitative evaluation of the mined visual and textual topics; and Section 5.3.3 reports quantitative results compared with existing methods.
Fig. 5. The classification accuracy (on testing data) with different codebook sizes (1000 to 8000).

Fig. 6. The parameter analysis of our proposed topic model mm-SLDA: (a) accuracy vs. the number of topics; (b) the iteration process of EM. The classification accuracy changes with the number of topics and with the EM iterations.
5.3.1. Parameter Analysis. Codebook Size: For visual image processing, the codebook size is important. We evaluate the classification accuracy of our proposed BMM-SLDA with different codebook sizes; the result is shown in Fig.5. The experimental results demonstrate that the classification accuracy is quite stable when the codebook size varies from 5000 to 7000. In our implementation, the codebook size is set to 5000 on our dataset for its efficiency and performance.

Topic Number K: In topic modeling, setting the topic number K is not trivial. We evaluate the accuracy of our proposed BMM-SLDA with different numbers of topics. For simplicity, we only evaluate this parameter for our proposed mm-SLDA model. The result is shown in Fig.6(a): as the number of topics varies from 1 to 40, the accuracy of our model is quite stable between 30 and 40 topics. Note that the value of K depends on the social event dataset, and a larger topic number requires more computation. On our dataset, we set K to 30 in the subsequent experiments, which achieves the best performance for our mm-SLDA model.

EM Convergence: We adopt the EM algorithm to solve our mm-SLDA model, and it is important to guarantee its convergence. The iteration process of our optimization is shown in Fig.6(b): the performance of our mm-SLDA increases quickly during the first 3 iterations and tends to converge after that. The result shows that the EM algorithm can guarantee the convergence of our mm-SLDA.

Boosting Convergence: Fig.7 shows that the classification accuracy increases with the boosting iterations on the training data and testing data, respectively. As we iteratively learn more and more useful topic models, the accuracy improves. Based on the iteration process shown in Fig.7, our BMM-SLDA converges well, and reasonably accurate solutions are available after 10 iterations.

Fig. 7. The classification accuracy of our BMM-SLDA, which improves with the boosting procedure on (a) training data and (b) testing data.

5.3.2. Qualitative Evaluation. We demonstrate the effectiveness of our BMM-SLDA model on the mining of social events and show the discovered topics in Fig.8. For simplicity, we only visualize the topics mined in the first iteration of our boosting procedure. Here, we show 8 of the 30 discovered topics with their top five textual words and the five most related images, respectively. In the text and image visualization, the textual words are sorted by the probability p(w|z), while the images are sorted by counting the numbers of visual descriptors and textual words assigned to the corresponding topic in each event document, p(z_k \mid w_d, v_d):
p(z_k \mid w_d, v_d) = \frac{n^v_{d,k} + n^w_{d,k}}{\sum_{k=1}^{K} (n^v_{d,k} + n^w_{d,k})}   (15)
where n^v_{d,k} = \sum_i 1(z^v_{d,i} = k) and n^w_{d,k} = \sum_i 1(z^w_{d,i} = k) represent the numbers of tokens assigned to topic k among the visual descriptors and textual words of event document d, respectively. By providing multi-modal information through the representative textual words and images, it is very intuitive to interpret the social events associated with each topic. As shown in Fig.8, the results are impressive and meet our expectations: most of the extracted event topics are meaningful, and the textual words are well aligned with the corresponding visual image content. For example, "topic #11" in Fig.8 shows the top five textual words and the five most related images. Specifically, textual words such as riot, England, London, attack, and police are shown according to the rank of their probability values in "topic #11", and the images are shown according to Eq.(15). For example, 0.05857 is the probability value p(w|z) of the term "riot" and 0.62124 is the probability value of Eq.(15) for the image. After learning, because the number of topics (30) is greater than the number of event categories (18) in our collected dataset, there are also some "not good" discovered topics, such as "topic #18" and "topic #30" in Fig.8. Based on these results, we can confirm that our proposed model can effectively mine the topics of social events.

Fig. 8. The discovered topics by our proposed BMM-SLDA. Here, we show eight topics with their top five textual words and the five most related images. See the text for more details.

5.3.3. Quantitative Evaluation. We compare our models (mm-SLDA, BMM-SLDA*, and BMM-SLDA) with 5 baseline methods (mmLDA+SG, mmLDA+SVM, SVM, SLDA(Visual), and SLDA(Text)), which are the most related to our work. The three standard classification algorithms mmLDA+SG, mmLDA+SVM, and SVM differ in their features. For mmLDA+SG and mmLDA+SVM, we use the unsupervised mmLDA model to represent each event document as a K-dimensional vector; for SVM, we adopt the traditional Bag-of-Words model to describe each event document. We then train classifiers using the softmax regression method and a Support Vector Machine (SVM), respectively, and finally use the trained models to predict the class labels of the test data. SLDA(Visual) and SLDA(Text) [Wang et al. 2009] are the supervised model with only the visual feature and only the textual feature, respectively. mm-SLDA is our proposed supervised model trained with all training data, which, in contrast to BMM-SLDA* and BMM-SLDA, ignores the document weight distribution. BMM-SLDA* is a special case of BMM-SLDA: BMM-SLDA* uses the standard mm-SLDA to train the model, while BMM-SLDA considers the weight distribution of documents to train our mm-SLDA and adopts the weighted sampling strategy to calculate the conditional probabilities.
Table III. The social event classification accuracy compared with other existing methods on the test dataset.

Method | Accuracy | Method | Accuracy
mmLDA+SG | 0.665±0.034 | mmLDA+SVM | 0.724±0.011
SVM | 0.767±0.015 | SLDA(Visual) | 0.312±0.069
SLDA(Text) | 0.702±0.022 | mm-SLDA | 0.722±0.009
BMM-SLDA* | 0.742±0.012 | BMM-SLDA | 0.835±0.023
In our implementation, to avoid biasing the classification accuracy toward a single test split, we adopt 5 random splits of the data. For each experiment, we randomly select half of the data for training and the other half for testing. Finally, we report the mean accuracy and its standard deviation, as shown in Table III ("mean ± standard deviation"). Because the dataset is quite challenging, no method achieves 100% accuracy. SLDA(Text) is better than mmLDA+SG, which shows that the supervised information is useful. SLDA(Text) is better than SLDA(Visual), which shows that the textual information is much more helpful than the visual information for social event classification; this can be explained by the high diversity of the images. From the results, we can see that our mm-SLDA model outperforms SLDA, because SLDA uses the supervised information of only a single modality, whereas our mm-SLDA exploits the multi-modal property and the multi-class property jointly for social event modeling, which boosts the classification performance. In our experiment, even though the SVM method using traditional feature extraction performs much better than mmLDA+SVM, it is still worse than our proposed BMM-SLDA. This may be because our BMM-SLDA effectively exploits the social event structure through the document weight distribution with classification error to calculate the conditional probabilities, and can iteratively learn new topic models to correct the previously misclassified documents.
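The "mean ± standard deviation" protocol above can be sketched in a few lines; the accuracy values below are toy numbers for illustration, not results from the paper.

```python
import numpy as np

def mean_std_over_splits(accuracies):
    """Summarize test accuracy over random train/test splits, as in
    Table III. Returns (mean, population standard deviation)."""
    a = np.asarray(accuracies, dtype=float)
    return a.mean(), a.std()

# Toy usage with 5 hypothetical split accuracies.
m, s = mean_std_over_splits([0.84, 0.82, 0.85, 0.83, 0.835])
```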
Fig. 9. Comparison of the computational cost (runtime per iteration, in seconds) of different topic models, including mm-SLDA, SLDA(Text), and BMM-SLDA, as the number of topics varies from 1 to 50.
Compared with mm-SLDA, BMM-SLDA* and BMM-SLDA are much more effective in classification performance and much more efficient in training time, respectively. Fig.9 shows the computational cost per iteration of the different topic model methods on the training dataset. We observe that our proposed BMM-SLDA model is much more efficient, because the training complexity per iteration is proportional to the number of documents. In our method, at each iteration of the boosting procedure, we only select a small subset of documents from each category based on their weight distribution to train the model, while other topic models, such as SLDA(Text) and mm-SLDA, train on the entire training dataset. As a result, as the social event data in social media sites grow, the traditional topic model methods that use the whole dataset for training will consume a prohibitive amount of time. Different from these methods, our proposed BMM-SLDA model works on a small part of the data at each iteration and is much more efficient due to the boosting weighted sampling strategy; therefore, our BMM-SLDA can work on a large-scale dataset. In our experiments, at each iteration of the boosting procedure, we only select 500 documents from each category based on their weight distribution. Moreover, BMM-SLDA* and BMM-SLDA effectively model the document weight distribution with the classification error, and iteratively learn new topic models to correct the previously misclassified social event documents. As a result, BMM-SLDA* and BMM-SLDA consider the data structure and can boost the classification performance. In addition, BMM-SLDA not only considers the weight distribution of documents to train our mm-SLDA, but also adds the effective weight values of each document when calculating the conditional probabilities. Based on the results, we can confirm that, for large-scale social event modeling, it is important to iteratively learn multiple topic models and multiple weak classifiers on small sampled subsets of the data while considering the document weight distribution.

Our task is a multi-class social event classification problem, and the results are illustrated by the confusion matrices in Fig.10. Fig.10(a) and Fig.10(b) show the classification results of the 2 baseline methods, mmLDA+SG and SLDA(Text), on our event dataset, respectively; Fig.10(c) and Fig.10(d) present the results of our proposed methods, BMM-SLDA* and BMM-SLDA. From the experimental results, we can see that BMM-SLDA reduces the error of SLDA(Text) [Blei and McAuliffe 2008] by at least 12% and achieves better results on 12 out of the 18 social events.

Fig. 10. Comparison of four different methods using confusion matrices: (a) mmLDA+SG, average accuracy 66%; (b) SLDA(Text), average accuracy 69%; (c) BMM-SLDA*, average accuracy 74%; (d) BMM-SLDA, average accuracy 81%. The rows denote the true labels and the columns the estimated labels.

6. CONCLUSIONS
In this paper, we propose a novel BMM-SLDA algorithm, based on a multi-modal supervised Latent Dirichlet Allocation model, that iteratively learns multiple topic-model classifiers for event classification in social media. The proposed model exploits the multi-modal and multi-class properties of social events jointly, and can predict the label of a new social event document directly. In addition, our BMM-SLDA algorithm is suitable for large-scale data analysis: it uses a boosting weighted sampling strategy to iteratively select a small subset of the data on which to efficiently train the corresponding topic models, it effectively exploits social event structure through the document weight distribution with the classification error, and it iteratively learns new topic models to correct previously misclassified documents. We have conducted experiments on our collected dataset, and extensive results demonstrate that our model outperforms all other existing models. In the future, we will investigate further tasks under this framework, such as event detection and event evolution in social media. We also intend to explore whether classification performance can be improved by considering other modalities, such as video and audio.

REFERENCES
R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk. 2010. SLIC Superpixels. Technical Report, EPFL.
James Allan, Ron Papka, and Victor Lavrenko. 1998. On-line New Event Detection and Tracking. In SIGIR. 37–45.
Yang Bao, Nigel Collier, and Anindya Datta. 2013. A Partially Supervised Cross-collection Topic Model for Cross-domain Text Classification. In CIKM. 239–248.
Hila Becker, Mor Naaman, and Luis Gravano. 2009. Event Identification in Social Media. In WebDB.
Hila Becker, Mor Naaman, and Luis Gravano. 2010. Learning Similarity Metrics for Event Identification in Social Media. In WSDM. 291–300.
David Blei and Jon McAuliffe. 2008. Supervised Topic Models. In NIPS. 77–84.
David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet Allocation. JMLR 3 (2003), 993–1022.
Chien Chin Chen, Meng Chang Chen, and Ming-Syan Chen. 2009. An Adaptive Threshold Framework for Event Detection Using HMM-based Life Profiles. ACM Trans. Inf. Syst. 27, 2 (2009), 9:1–9:35.
Hai Leong Chieu and Yoong Keok Lee. 2004. Query Based Event Extraction Along a Timeline. In SIGIR. 425–432.
Nicholas Diakopoulos, Mor Naaman, and Funda Kivran-Swaine. 2010. Diamonds in the Rough: Social Media Visual Analytics for Journalistic Inquiry. In IEEE VAST. 115–122.
Claudiu S. Firan, Mihai Georgescu, Wolfgang Nejdl, and Raluca Paiu. 2010. Bringing Order to Your Photos: Event-driven Classification of Flickr Images Based on Social Knowledge. In CIKM. 189–198.
Haidong Gao, Siliang Tang, Yin Zhang, Dapeng Jiang, Fei Wu, and Yueting Zhuang. 2012. Supervised Cross-collection Topic Modeling. In MM. 957–960.
T. L. Griffiths and M. Steyvers. 2004. Finding Scientific Topics. Proceedings of the National Academy of Sciences 101 (2004), 5228–5235.
Thomas Hofmann. 1999. Probabilistic Latent Semantic Indexing. In SIGIR. 50–57.
Ravi Kumar, Uma Mahadevan, and D. Sivakumar. 2004. A Graph-theoretic Approach to Extract Storylines from Search Results. In KDD. 216–225.
Giridhar Kumaran and James Allan. 2004. Text Classification and Named Entities for New Event Detection. In SIGIR. 297–304.
Simon Lacoste-Julien, Fei Sha, and Michael I. Jordan. 2008. DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification. In NIPS. 897–904.
Chenliang Li, Aixin Sun, and Anwitaman Datta. 2012. Twevent: Segment-based Event Detection from Tweets. In CIKM. 155–164.
Chih-Jen Lin, Ruby C. Weng, and S. Sathiya Keerthi. 2008. Trust Region Newton Method for Logistic Regression. JMLR 9 (2008), 627–650.
Fu-ren Lin and Chia-Hao Liang. 2008. Storyline-based Summarization for News Topic Retrospection. Decis. Support Syst. 45 (2008), 473–490.
L. Liu, L. Wang, and X. Liu. 2011. In Defense of Soft-assignment Coding. In ICCV. 2486–2493.
Xueliang Liu and Benoit Huet. 2013. Heterogeneous Features and Model Selection for Event-based Media Classification. In ICMR. 151–158.
Juha Makkonen, Helena Ahonen-Myka, and Marko Salmenkivi. 2004. Simple Semantics in Topic Detection and Tracking. Inf. Retr. 7, 3-4 (2004), 347–368.
Andrew J. McMinn, Yashar Moshfeghi, and Joemon M. Jose. 2013. Building a Large-scale Corpus for Evaluating Event Detection on Twitter. In CIKM. 409–418.
Zhenxing Niu, Gang Hua, Xinbo Gao, and Qi Tian. 2011. Spatial-DiscLDA for Visual Recognition. In CVPR. 1769–1776.
Paul Over, George Awad, Martial Michel, Jonathan Fiscus, Greg Sanders, Wessel Kraaij, Alan F. Smeaton, and Georges Quenot. 2013. TRECVID 2013 – An Overview of the Goals, Tasks, Data, Evaluation Mechanisms and Metrics. In Proceedings of TRECVID 2013. NIST, USA.
Dhaval Patel, Wynne Hsu, and Mong Li Lee. 2008. Mining Relationships Among Interval-based Events for Classification. In SIGMOD. 393–404.
Xuan-Hieu Phan, Le-Minh Nguyen, and Susumu Horiguchi. 2008. Learning to Classify Short and Sparse Text & Web with Hidden Topics from Large-scale Data Collections. In WWW. 91–100.
D. Putthividhy, H. T. Attias, and S. S. Nagarajan. 2010. Topic Regression Multi-modal Latent Dirichlet Allocation for Image Annotation. In CVPR. 3408–3415.
Shengsheng Qian, Tianzhu Zhang, and Changsheng Xu. 2014a. Boosted Multi-Modal Supervised Latent Dirichlet Allocation for Social Event Classification. In ICPR.
Shengsheng Qian, Tianzhu Zhang, and Changsheng Xu. 2014b. Multi-Modal Supervised Latent Dirichlet Allocation for Event Classification in Social Media. In ICIMCS. 152–157.
Kira Radinsky and Eric Horvitz. 2013. Mining the Web to Predict Future Events. In WSDM. 255–264.
Daniel Ramage, David Hall, Ramesh Nallapati, and Christopher D. Manning. 2009a. Labeled LDA: A Supervised Topic Model for Credit Attribution in Multi-labeled Corpora. In EMNLP. 248–256.
Daniel Ramage, Paul Heymann, Christopher D. Manning, and Hector Garcia-Molina. 2009b. Clustering the Tagged Web. In WSDM. 54–63.
Timo Reuter and Philipp Cimiano. 2012. Event-based Classification of Social Media Streams. In ICMR. 22:1–22:8.
Timo Reuter, Symeon Papadopoulos, Georgios Petkos, Vasileios Mezaris, Yiannis Kompatsiaris, Philipp Cimiano, Christopher M. De Vries, and Shlomo Geva. 2013. Social Event Detection at MediaEval 2013: Challenges, Datasets, and Evaluation. In MediaEval.
Jitao Sang and Changsheng Xu. 2012. Right Buddy Makes the Difference: An Early Exploration of Social Relation Analysis in Multimedia Applications. In ACM MM. 19–28.
Yoshihiko Suhara, Hiroyuki Toda, and Akito Sakurai. 2008. Extracting Related Named Entities from Blogosphere for Event Mining. In ICUIMC. 225–229.
Chong Wang, David Blei, and Fei-Fei Li. 2009. Simultaneous Image Classification and Annotation. In CVPR. 823–836.
Xuerui Wang, Natasha Mohanty, and Andrew McCallum. 2005. Group and Topic Discovery from Relations and Text. In Proceedings of the 3rd International Workshop on Link Discovery. 28–35.
Andrew T. Wilson and Peter A. Chew. 2010. Term Weighting Schemes for Latent Dirichlet Allocation. In HLT. 465–473.
Xiao Wu, Chong-Wah Ngo, and Alexander G. Hauptmann. 2008. Multimodal News Story Clustering With Pairwise Visual Near-Duplicate Constraint. IEEE Transactions on Multimedia 10, 2 (2008), 188–199.
T. Zhang, B. Ghanem, S. Liu, C. Xu, and N. Ahuja. 2013. Low-Rank Sparse Coding for Image Classification. In ICCV. 281–288.
Tianzhu Zhang, Jing Liu, Si Liu, Yi Ouyang, and Hanqing Lu. 2009. Boosted Exemplar Learning for Human Action Recognition. In ICCV Workshops. 538–545.
Tianzhu Zhang, Jing Liu, Si Liu, Changsheng Xu, and Hanqing Lu. 2011. Boosted Exemplar Learning for Action Recognition and Annotation. IEEE Transactions on Circuits and Systems for Video Technology 21, 7 (2011), 853–866.
Tianzhu Zhang and Changsheng Xu. 2014. Cross-Domain Multi-Event Tracking via CO-PMHT. ACM Trans. Multimedia Comput. Commun. Appl. 10, 4 (2014), 31:1–31:19.
Tianzhu Zhang, Changsheng Xu, Guangyu Zhu, Si Liu, and Hanqing Lu. 2012. A Generic Framework for Video Annotation via Semi-supervised Learning. IEEE Transactions on Multimedia 14, 4 (2012), 1206–1219.
Qiankun Zhao, Tie-Yan Liu, Sourav S. Bhowmick, and Wei-Ying Ma. 2006. Event Detection from Evolution of Clickthrough Data. In KDD. 484–493.
Ji Zhu, Hui Zou, Saharon Rosset, and Trevor Hastie. 2009. Multi-class AdaBoost. Statistics and Its Interface 2, 3 (2009), 349–360.
ACM Trans. Multimedia Comput. Commun. Appl., Vol. 99, No. 1, Article 1, Publication date: January 2014.