Focus Article
Sentiment–topic modeling in text mining

Chenghua Lin,* Ebuka Ibeke, Adam Wyner and Frank Guerin

In recent years, there has been a rapid growth of research interest in natural language processing that seeks to better understand sentiment or opinion expressed in text. There are several notable issues in most previous work in sentiment analysis, among them: the trained classifiers are domain-dependent; the labeled corpora required for training can be difficult to acquire from real-world text; and dependencies between sentiments and topics are not taken into consideration. In response to these limitations, a new family of probabilistic topic models, namely joint sentiment–topic models, has been developed; these models are capable of detecting sentiment in connection with topic from text without using any labeled data for training. In addition, the sentiment-bearing topics extracted by the joint sentiment–topic models provide a means for automatically discovering and summarizing opinions from a vast amount of user-generated data. © 2015 John Wiley & Sons, Ltd.

How to cite this article:
WIREs Data Mining Knowl Discov 2015, 5 (September/October):246–254. doi: 10.1002/widm.1161
INTRODUCTION
In recent years, there has been a rapid growth of research interest in natural language processing that seeks to better understand sentiment or opinion expressed in text. One reason is that with the rise of various types of social media, communicating on the web has become increasingly popular: millions of people broadcast their thoughts and opinions on a great variety of topics, such as feedback on products and services, opinions on political developments and events, and information sharing about global disasters. Therefore, new computational tools are needed to help organize, summarize, and understand this vast amount of information. Additionally, the discovery of opinions reflecting people's attitudes toward various topics enables many useful applications, which is another motivation for sentiment analysis.

There are several notable issues in most previous work in sentiment analysis, among them: the trained classifiers are domain-dependent; the labeled corpora required for training can be difficult to acquire from real-world text1; and dependencies between sentiments and topics are not taken into consideration. For example, when a supervised Support Vector Machine classifier trained on movie review data was tested on book reviews, an accuracy loss of more than 20% was observed compared to in-domain testing.2 Such phenomena reflect the fact that sentiments are context and topic dependent: when appearing under different topics within Electronics review data, the adjective ‘fast’ may have a negative sentiment orientation with ‘battery’, as in ‘battery drains fast’, and a positive sentiment orientation with ‘CPU’, as in ‘fast CPU’. These issues suggest that modeling sentiment and topic simultaneously may help find better feature representations for sentiment classification.

From the application perspective, although it is useful to detect the overall sentiment orientation of a document,3–6 it is just as useful, and perhaps even more interesting, to understand the underlying topics of a document and the sentiments associated with those topics. Consider the Amazon Kindle cover reviews shown in Figure 1 as an example. This Kindle cover receives a very high average rating from a total of 529 reviews. However, the review bar chart shows that there are still quite a lot of customers who gave only a four- or three-star rating to the product. When making a purchase, despite the overall rating, it would be very helpful to know the pros and cons of the product being discussed

*Correspondence to: [email protected], Department of Computing Science, University of Aberdeen, Aberdeen, UK. Conflict of interest: The authors have declared no conflicts of interest for this article.
Customer Reviews: Amazon Kindle Keyboard Leather Cover, Black. Average Customer Review (855 customer reviews): 5 star (594), 4 star (167), 3 star (47), 2 star (22), 1 star (25).
Review 1 Title: Lovely quality (4-star) By Technophobe, 19 April 2011
Beautiful piece of kit, protects my beloved kindel from knocks, scratches etc and looks very good at the same time. The locking mechanism that secures the kindle into the cover is very clever and looks to be safe and secure. The leather is good quality, and I love the bright apple green - very chic and smart. Only reason its not 5 stars is it wasn’t exactly cheap - bring the price down a few pounds and it would perhaps represent better value for money and earn it 5 starts. No regrets about buying it though, does what its meant to and looks good at the same time!
Review 2 Title: Very good except the price (4-star) By Val, 20 June 2011
The cover is very good, clips onto the Kindle easily and great protection when being transported. It makes holding and reading the Kindle so much more natural, like reading a book. I do however think the price is too high, although good quality there isn’t an enormous amount of leather used.

FIGURE 1 | Amazon Kindle cover reviews. Text highlighted in green and red indicates the pros and cons, respectively, on particular topics about the product.
in the reviews. An inspection of those 200 four- and three-star reviews reveals that many people actually think the cover design and quality are very good, but that the product is simply overpriced. Having obtained such information, the Amazon Kindle cover would still be the best buy for customers with large budgets, while others may choose a less expensive alternative. Nevertheless, people can still easily be overwhelmed by the quantity and variety of the available data, as in the example given above. In response to the research challenges identified above, a new family of probabilistic topic models, namely sentiment–topic models, has been proposed, which addresses the above shortcomings of current sentiment analysis approaches by modeling sentiment in conjunction with topics from text data.7–13 These sentiment–topic models share several common characteristics, as
follows: (1) they are based on unsupervised or weakly supervised learning, which does not require labeled data for training and can be easily transferred between domains; (2) the models can account for topic or domain dependencies in sentiment classification; and (3) they facilitate the automatic extraction of opinions with respect to topics from text, thus providing users with more informative sentiment–topic mining results than a general sentiment polarity or star rating. The remainder of the paper is structured as follows: the Joint Sentiment–Topic Modeling section reviews some representative sentiment–topic models and the different perspectives those models take in modeling the sentiment and topic components; the Incorporating Prior Information into Sentiment–Topic Models section introduces strategies for incorporating prior
information into sentiment–topic models; and we conclude the paper in the Conclusion section.
JOINT SENTIMENT–TOPIC MODELING

Capturing the interactions between topics and sentiments plays an important role in sentiment analysis, as sentiment is often expressed through words whose polarity is highly domain- and context-dependent.14 When building sentiment–topic models, there are different perspectives on how to model the sentiment and topic components: (1) some researchers model sentiment and topic as a mixture distribution, so that the topics being modeled are essentially sentiment-bearing topics7–9; (2) some consider the generative processes of topics and topic-specific opinions separately, such that each topic–word distribution has a corresponding sentiment–word distribution (i.e., a one-to-one mapping)10,11; and (3) others model contrastive opinions by extracting multiple perspectives on opinions with respect to the same topic, where each topic–word distribution has several associated sentiment–word distributions reflecting different opinion perspectives (i.e., a one-to-many mapping).12,13
Modeling Sentiment and Topic as a Mixture Distribution

One of the early works modeling sentiment and topic as a mixture distribution is the Joint Sentiment–Topic (JST) model,7,8 which can perform document-level sentiment classification and extract sentiment-bearing topics with weakly supervised learning. The original framework of latent Dirichlet allocation (LDA)15,16 has three hierarchical layers, where topics are associated with documents and words are associated with topics. In order to model document sentiments, JST extends LDA by constructing an additional sentiment layer between the document and topic layers. Hence, JST is effectively a four-layer model, where sentiment labels are associated with documents, topics are associated with sentiment labels, and words are associated with both sentiment and topic labels. A graphical model of JST and its generative process are illustrated in Figure 2.

The procedure for generating a word wi in JST has three stages (see Figure 3 for a complete summary of the parameters used in JST). First, one chooses a sentiment label l from the per-document sentiment proportion πd. Following that, one chooses a topic z from the topic proportion θd,l, where θd,l is conditioned on the sampled sentiment label l. It is worth noting that the topic proportion of JST is different from that of LDA. In LDA, there is only one topic proportion θd for each document d. In contrast, in JST each document is associated with S (the number of sentiment labels) topic proportions, each of which corresponds to a sentiment label l with the same number of topics T, i.e., Θ = {θd,l : l = 1, …, S; d = 1, …, D}. This feature essentially provides the means for the JST model to predict the sentiment associated with the extracted topics. Finally, one draws a word from the per-corpus word distribution φl,z, which is conditioned on both topic z and sentiment label l. This is again different from LDA, in which a word is sampled from a word distribution conditioned only on a topic.

The first column in Figure 4 shows an example topic extracted by JST from the Amazon electronics reviews. The sentiment-bearing topic contains both topical words such as ‘mouse’ and ‘wheel’ and sentiment words such as ‘comfort’ and ‘smooth’, based on which one can interpret that there is a very positive sentiment topic about a Logitech mouse.

The Aspect and Sentiment Unification Model (ASUM)9 is very similar to JST, for it detects sentiment and topics simultaneously by modeling each document with a sentiment distribution and a set of sentiment-specific topic proportions. The main difference is that while JST allows the words of a document to be sampled from different word distributions, ASUM constrains the model such that the words of the same sentence must be sampled from the same word distribution. While this assumption is reasonable for long reviews with a decent number of sentences, it might not be suitable for short reviews with few sentences, as the constraint will be too strong and cause bias, resulting in degraded performance in both aspect extraction and sentiment classification.17
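The three-stage generative procedure described above can be sketched in code. The following is an illustrative simulation only, not the authors' released implementation; the sizes S, T, V and the Dirichlet hyperparameter values are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration: S sentiment labels, T topics,
# V vocabulary terms; alpha, beta, gamma are made-up Dirichlet priors.
S, T, V = 3, 2, 10
alpha, beta, gamma = 0.5, 0.05, 1.0

# Per-corpus word distributions phi[l, z], one per (sentiment, topic) pair
phi = rng.dirichlet([beta] * V, size=(S, T))

def generate_document(n_words):
    """Generate one document following the JST generative story."""
    pi_d = rng.dirichlet([gamma] * S)             # per-document sentiment proportion
    theta_d = rng.dirichlet([alpha] * T, size=S)  # one topic proportion per sentiment label
    words = []
    for _ in range(n_words):
        l = rng.choice(S, p=pi_d)        # 1) choose a sentiment label from pi_d
        z = rng.choice(T, p=theta_d[l])  # 2) choose a topic from theta_{d,l}
        w = rng.choice(V, p=phi[l, z])   # 3) draw a word from phi_{l,z}
        words.append((l, z, w))
    return words

doc = generate_document(20)
```

Inference inverts this story: collapsed Gibbs sampling or variational inference recovers π, Θ, and Φ from the observed words alone.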
Modeling Sentiment and Topic Components Separately

In contrast to modeling sentiment and topic as one mixed component, an alternative treatment is to model each of them as a separate component, i.e., topic words and topic-specific opinion words are clustered under different distributions. Under this setting, the topic distribution learned by a sentiment–topic model is similar to that learned by a standard topic model, but an additional opinion word distribution is also extracted, which expresses the sentiment toward the associated topic. For instance, Brody and Elhadad10 proposed Local LDA to extract aspect topics and detect aspect-specific opinion words in an unsupervised manner. They took a two-stage approach by first extracting
FIGURE 2 | (a) Graphical model of JST and (b) JST generative process.
FIGURE 3 | Parameter notations of the JST model.
FIGURE 4 | Topic examples extracted by sentiment–topic models.
local aspect words using the LDA model and treating each sentence as a document; afterward, aspect-specific opinion words were identified by propagating the polarity scores of adjectives over a conjunction graph. The MaxEnt–LDA hybrid model11 also models aspects and opinions; but instead of modeling aspect and sentiment in a cascaded procedure as Brody and Elhadad10 did, it models both simultaneously by incorporating a supervised maximum entropy model into an unsupervised topic model. By assuming that aspect words and opinion words play different syntactic roles in a sentence, the authors trained a MaxEnt component with lexical and POS features to distinguish between aspect and opinion words, which subsequently allows a word to be generated in different ways, i.e., from a background model, general/specific aspect models, or general/specific opinion models. In order to extract aspect topics, both studies make the same assumption that each sentence is associated with a single aspect. The second column in Figure 4 shows an example topic extracted by MaxEnt–LDA, where the topic component is related to desserts and the sentiment component contains positive sentiment words (i.e., adjectives and adverbs) about the topic.

Local LDA and MaxEnt–LDA were designed for aspect and aspect-specific opinion word extraction, treating each sentence as a document (Local LDA) or assuming that all the word tokens in a sentence have the same aspect assignment (MaxEnt–LDA). In addition, a much more complex model structure was introduced in MaxEnt–LDA in order to detect opinion words alongside aspect words. In contrast, JST is more suitable for the unsupervised sentiment classification task, as it can better capture the dependencies between sentiment and topic words in a richer context (i.e., at the document level).
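The polarity-propagation step used in Local LDA's second stage can be illustrated with a small sketch. This is a simplified toy version of the general idea, not Brody and Elhadad's exact algorithm: adjectives that appear conjoined in the same sentence (e.g., 'clean and comfortable') are linked in a graph, and the polarities of a few seed adjectives are iteratively propagated to their neighbors. The words, edges, and seed scores below are invented.

```python
from collections import defaultdict

# Toy conjunction graph over adjectives: an edge links two adjectives
# that appear conjoined in the review corpus (invented for illustration).
edges = [
    ("clean", "comfortable"),
    ("comfortable", "spacious"),
    ("dirty", "noisy"),
    ("noisy", "cramped"),
]
seeds = {"clean": 1.0, "dirty": -1.0}  # domain-independent seed polarities

graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)

# Iteratively propagate: each unlabeled adjective takes the average
# polarity of its neighbors, while seed scores are kept fixed.
scores = {w: seeds.get(w, 0.0) for w in graph}
for _ in range(20):
    updated = {}
    for w in scores:
        if w in seeds:
            updated[w] = seeds[w]
        else:
            nbrs = graph[w]
            updated[w] = sum(scores[n] for n in nbrs) / len(nbrs)
    scores = updated
```

After propagation, 'comfortable' and 'spacious' carry positive scores inherited from 'clean', while 'noisy' and 'cramped' carry negative scores inherited from 'dirty'.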
Contrastive Opinion Mining

People's values, beliefs, and preferences are influenced by many factors, such as one's education and cultural background. Such variation is reflected in the reality that different individuals or groups may have different perspectives on the same event or object. For instance, Republicans and Democrats hold contrastive views on the U.S. health insurance reform, and parents may have completely different opinions on children playing computer games. However, the aforementioned models (i.e., JST, ASUM, Local LDA, and MaxEnt–LDA) cannot detect contrastive opinions directly, owing to the lack of a mechanism for establishing correlations between topics. For instance, there is the same number of (sentiment-bearing) topics under each sentiment label in JST, but there is no guarantee that a topic with index i under the positive sentiment label will express the same thematic information as topic i under the negative sentiment label.

Motivated by these observations, there have been research attempts to mine contrastive opinions from opinionated documents, representing the views of multiple individuals or groups toward the same topic. Extending LDA, the Cross-Collection LDA (ccLDA) model18 can extract what is common to all sources and what is unique to one specific source. However, it does not consider the sentiments of the extracted topics. Fang et al.12 proposed the cross-perspective topic (CPT) model for mining contrastive opinions in political texts. In the CPT model, opinion and topic terms are generated separately, based on the assumption that topics are expressed through nouns, and opinions through adjectives, verbs, and adverbs. They also assumed that topics are shared among documents of contrasting perspectives; therefore, topic words are drawn from a shared word distribution, whereas opinion words are drawn from a topic–opinion distribution conditioned on the perspective. Similarly, Elahi and Monachesi13 examined cross-cultural similarities and differences in social media data with respect to language use. In their study, they used LDA to analyze topics from two different cultures on how people express emotions during romantic discussions on social media. The last column in Figure 4 shows two sets of topics extracted by CPT, with each set containing one topical topic and two opinion topics expressing different viewpoints on the topical topic. It can be seen from the example that the New York Times and Xinhua News hold opposite views on the events in which Liu Xiaobo was awarded the Nobel Peace Prize and on the Iran uranium programme.
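The CPT generative assumption, routing nouns to topic distributions shared across collections and adjectives/verbs/adverbs to perspective-specific opinion distributions, can be sketched as follows. This is an illustrative toy, not Fang et al.'s implementation; the vocabularies, dimensions, and prior values are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative vocabularies split by part of speech (made up for the example):
# CPT assumes topics are expressed through nouns, opinions through
# adjectives, verbs, and adverbs.
topic_vocab = ["healthcare", "insurance", "tax", "reform"]   # nouns
opinion_vocab = ["support", "oppose", "fair", "costly"]      # adj/verb/adv

T = 2               # number of topics
perspectives = 2    # e.g., two contrasting news sources

# Topic-word distributions are SHARED across perspectives...
topic_word = rng.dirichlet([0.5] * len(topic_vocab), size=T)
# ...while each topic has one opinion-word distribution PER perspective.
opinion_word = rng.dirichlet([0.5] * len(opinion_vocab), size=(perspectives, T))

def generate_token(topic, perspective, is_noun):
    """Route a token: nouns come from the shared topic distribution,
    other POS from the perspective-specific opinion distribution."""
    if is_noun:
        return topic_vocab[rng.choice(len(topic_vocab), p=topic_word[topic])]
    return opinion_vocab[rng.choice(len(opinion_vocab), p=opinion_word[perspective, topic])]

# One noun and one opinion token from each perspective, all for topic 0
tokens = [generate_token(0, p, is_noun) for p in (0, 1) for is_noun in (True, False)]
```

Because the topic distribution is shared, both perspectives discuss the same themes, while the separate opinion distributions can diverge and thereby surface contrastive viewpoints.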
Although contrastive opinion models such as CPT provide frameworks for mining and understanding perspectives across different groups or sources, most of these models require that data containing different opinions be separated into different collections beforehand. This requirement might not be practical in real-world applications, especially when facing the task of detecting contrastive opinions or perspectives in streaming data such as social media.
INCORPORATING PRIOR INFORMATION INTO SENTIMENT–TOPIC MODELS

While standard topic models can discover the topic structures of textual corpora in a fully unsupervised manner, sentiment–topic models are likely to fail at tasks such as sentiment classification and sentiment-bearing topic extraction in a purely unsupervised setting. This is due to the fact that sentiment–topic models have to refine the notion of word co-occurrence in order to identify what sentiment is expressed about a topic, which cannot be dealt with in the standard topic model. As a consequence, most sentiment–topic models incorporate some prior sentiment information (e.g., domain-independent sentiment lexicons) for learning sentiment-bearing topics.

As with other topic models, directly estimating the complex marginal distribution of sentiment–topic models is intractable owing to the coupling of the latent variables; the mainstream approaches for approximating posteriors are Gibbs sampling and Variational Bayesian (VB) techniques.19,20 Depending on the approximation technique used, sentiment–topic models require different strategies for incorporating prior sentiment information, which will be discussed in detail in the following sections using the JST model as an example. In addition, for better comprehension of the JST model, the model source code and an example dataset are made available online (https://github.com/linron84/JST), with detailed instructions on how to run the model.
Incorporating Prior Information with Gibbs Sampling

One of the general methods for estimating complex distributions is Markov Chain Monte Carlo (MCMC), which can emulate high-dimensional probability distributions by the stationary distribution of a Markov chain. As a special case of MCMC, Gibbs sampling works by sampling the dimensions of a distribution sequentially, one at a time, conditioned on the values of all the other variables and the data. The posterior of interest can then be obtained from the Markov chain after it reaches the stationary state.8,21–23

In contrast to traditional topic-based classification, a fully unsupervised sentiment model will not be able to identify which features are relevant for polarity classification in the absence of annotated data.1 One way to tackle this problem is to incorporate a small set of domain-independent sentiment-bearing words (e.g., ‘happy’ and ‘sad’) as prior knowledge for sentiment–topic model learning. Taking the JST model as an example, a dependency link of φl,z is constructed on the transformation matrix λ = {λl,i : l = 1, …, S; i = 1, …, V}. The matrix λ modifies the Dirichlet prior β so that the prior sentiment polarity of each word can be captured. The complete procedure for incorporating prior knowledge into the JST model is depicted in Figure 5. First, λ is initialized with all of its elements taking a value of 1. Then, for each term w ∈ {1, …, V} in the corpus vocabulary and for each sentiment label l ∈ {1, …, S}, if w is found in the sentiment lexicon, the element λl,w is updated as follows:

λl,w = 0.9 if S(w) = l, and 0.05 otherwise,    (1)

where the function S(w) returns the prior sentiment label of w in the sentiment lexicon, i.e., neutral, positive, or negative. For example, the word ‘excellent’ with index i in the vocabulary has a positive sentiment polarity. The corresponding row vector in λ is [0.05, 0.9, 0.05], with its elements representing the neutral, positive, and negative prior polarities. For each topic z ∈ {1, …, T}, multiplying λl,i with βl,z,i (i.e., an element-wise multiplication), we can ensure that the word ‘excellent’ has a much higher probability of being drawn from the positive topic–word distributions than from the negative and neutral topic–word distributions.

FIGURE 5 | Encode sentiment prior by modifying the Dirichlet prior with the transformation matrix.
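The λ update and the element-wise modification of the Dirichlet prior β described above can be sketched in a few lines. This is not the released JST code; the vocabulary, lexicon, and dimensions below are invented for illustration.

```python
import numpy as np

# Illustrative dimensions (hypothetical): S sentiment labels
# (neutral, positive, negative), T topics, vocabulary of V terms.
S, T, V = 3, 2, 5
vocab = ["the", "excellent", "bad", "screen", "battery"]
lexicon = {"excellent": 1, "bad": 2}   # word -> prior sentiment label index

beta = np.full((S, T, V), 0.01)        # symmetric Dirichlet prior on phi

# Build the transformation matrix lambda: 1 everywhere, then for each
# lexicon word put 0.9 on its prior sentiment label and 0.05 elsewhere.
lam = np.ones((S, V))
for word, label in lexicon.items():
    i = vocab.index(word)
    lam[:, i] = 0.05
    lam[label, i] = 0.9

# Element-wise multiplication folds the word-level sentiment prior into beta,
# so 'excellent' becomes far more likely under the positive topic-word
# distributions than under the neutral or negative ones.
beta_prior = beta * lam[:, None, :]
```

Words absent from the lexicon keep λ rows of all ones, so their priors are untouched and their sentiment is learned entirely from the data.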
Incorporating Prior Information for Variational Bayes

VB inference is another useful tool for dealing with complex distributions; it optimizes a simplified tractable parametric distribution to be close, in Kullback–Leibler (KL) divergence, to the intractable posterior distribution.24–26 Therefore, incorporating prior information from sentiment lexicons into a sentiment–topic model that uses VB as the approximation technique requires a different strategy from that based on Gibbs sampling. Taking the JST model with VB inference as an example, the learning goal of JST is to maximize the objective function

O(D|Ω) = log P(D|Ω),    (2)

where D is the text corpora and Ω = {α, β, γ} are the model's hyperparameters. In order to incorporate the prior information, we can modify the original objective function with a criterion term, which expresses our preferences on the expectations of sentiment labels for the lexicon words. Given some labeled word features (i.e., a sentiment lexicon) with their prior sentiment orientations, one can construct a set of real-valued features of the observations that express an empirical distribution the training data should also follow:

fjk(w, s) = Σd=1..M Σt=1..Nd δ(sd,t = j) δ(wd,t = k),    (3)

where δ(x) is an indicator function that takes a value of 1 if x is true and 0 otherwise. This equation essentially counts how often feature k and sentiment label j co-occur in the corpus. We can then further define a criterion that minimizes the KL divergence between the expected feature distribution and a target expectation f:27

C(EΩ[f(w, s)]) = KL(f || EΩ[f(w, s)]),    (4)

where the function f encodes the prior sentiment information, EΩ[f(w, s)] is the expectation of the features, and the criterion essentially penalizes the divergence of a specific model expectation from a target value. Finally, we can modify the original objective function by augmenting it with the generalized expectation criterion term (Eq. (4)):

O(D|Ω) = log P(D|Ω) − λ C(EΩ[f(w, s)]),    (5)

where λ is a penalty parameter that controls the relative influence of the prior knowledge on the overall objective function. In this way, the prior sentiment information can be incorporated into model learning, and the learning goal becomes optimizing the new objective function in Eq. (5).

Comparison of Gibbs Sampling and Variational Bayes

By constructing the Markov chain using Gibbs sampling, the resulting posterior estimator is unbiased for the topic assignment, as the assignment is averaged over samples drawn by the MCMC with a probability proportional to the posterior probability. One drawback of Gibbs sampling is that it is difficult to determine how many samples are sufficient for the Markov chain to reach its stationary state. The estimator based on the VB method, in contrast, is biased, as it maximizes a tractable lower bound on the intractable marginal distribution and thus may converge to a local maximum. However, compared to Gibbs sampling, the convergence of VB can be determined efficiently under the expectation maximization (EM) framework. In practice, when incorporating sentiment priors using Gibbs sampling and VB inference, it was reported that the two methods give comparable sentiment classification performance, but VB generally converges faster than Gibbs sampling.26

CONCLUSION
Recent surveys have revealed that opinion-rich resources such as online reviews and social networks have a growing social and economic impact on both consumers and industries.23 Driven by the demand for gleaning insights from such a vast amount of user-generated data, developing new algorithms for automated sentiment analysis has attracted a dramatic surge of research interest in the past few years. Despite recent successes, many research challenges in the field of sentiment analysis remain unsolved. Some noticeable issues are that supervised classifiers are domain-dependent and that dependencies between sentiments and topics are not taken into consideration, i.e., under different topical contexts the same word can convey completely opposite sentiment. This paper has reviewed work in statistical sentiment–topic modeling,7–13 which addresses the above shortcomings of current approaches to sentiment analysis by modeling sentiment in conjunction with topics from text data. In addition, we discussed how to systematically incorporate prior sentiment information from sentiment lexicons into model learning, which is an important aspect of implementing a sentiment–topic model. Although there are different viewpoints in designing a sentiment–topic
model (i.e., some model sentiment and topic in a compound distribution, whereas others model them in separate distributions), they all account for the dependencies between sentiment and topics, which may not only help find better feature representations for sentiment classification but also provide more informative results based on the extracted sentiment-bearing topics. One common issue of the JST models discussed in the paper is that they all assume a symmetric structure in the model hierarchy, in which the same number of topics is generated under each sentiment label. Such a model design may be suboptimal for situations where the data exhibit different topic distributions under different sentiment dimensions. A possible way to alleviate this problem is to consider an asymmetric topic structure when modeling the hidden thematic structures of the data. Another promising direction for future work is to incorporate domain knowledge from external sources (e.g., DBpedia) into the model-learning process, which may help improve the coherence of the extracted topics as well as enable automatic topic labeling.
REFERENCES

1. Pang B, Lee L. Opinion mining and sentiment analysis. Found Trends Inf Retr 2008, 2:1–135.
2. Aue A, Gamon M. Customizing sentiment classifiers to new domains: a case study. In: Proceedings of the Conference on Recent Advances in Natural Language Processing (RANLP), Borovets, Bulgaria, 2005.
3. Pang B, Lee L, Vaithyanathan S. Thumbs up? Sentiment classification using machine learning techniques. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Philadelphia, USA, 2002, 79–86.
4. Pang B, Lee L. A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), Barcelona, Spain, 2004, 271.
5. Whitelaw C, Garg N, Argamon S. Using appraisal groups for sentiment analysis. In: Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM), Bremen, Germany, 2005, 625–631.
6. Kennedy A, Inkpen D. Sentiment classification of movie reviews using contextual valence shifters. Comput Intell 2006, 22:110–125.
7. Lin C, He Y. Joint sentiment/topic model for sentiment analysis. In: Proceedings of the ACM Conference on Information and Knowledge Management (CIKM), Hong Kong, China, 2009.
8. Lin C, He Y, Everson R, Rueger S. Weakly-supervised joint sentiment–topic detection from text. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2011.
9. Jo Y, Oh AH. Aspect and sentiment unification model for online review analysis. In: Proceedings of the International Conference on Web Search and Data Mining, Hong Kong, China, 2011, 815–824.
10. Brody S, Elhadad N. An unsupervised aspect-sentiment model for online reviews. In: Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Los Angeles, USA, 2010, 804–812.
11. Zhao WX, Jiang J, Yan H, Li X. Jointly modeling aspects and opinions with a MaxEnt-LDA hybrid. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Massachusetts, USA, 2010, 56–65.
12. Fang Y, Si L, Somasundaram N, Zhengtao Y. Mining contrastive opinions on political texts using cross-perspective topic model. In: Proceedings of the Fifth ACM International Conference on Web Search and Data Mining (WSDM), Seattle, USA, 2012, 63–72.
13. Elahi MF, Monachesi P. An examination of cross-cultural similarities and differences from social media data with respect to language use. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012), Istanbul, Turkey, 2012, 4080–4086.
14. Eguchi K, Lavrenko V. Sentiment retrieval using generative models. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Sydney, Australia, 2006, 345–354.
15. Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res 2003, 3:993–1022.
16. Blei DM. Probabilistic topic models. Commun ACM 2012, 55:77–84.
17. Li C, Zhang J, Sun JT, Chen Z. Sentiment topic model with decomposed prior. In: SIAM International Conference on Data Mining, Texas, USA, 2013.
18. Michael P, Girju R. Cross-cultural analysis of blogs and forums with mixed-collection topic models. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Edinburgh, UK, 2009.
19. Hoffman MD, Blei DM, Bach F. Online learning for latent Dirichlet allocation. Adv Neural Inf Process Syst 2010, 23:856–864.
20. Bishop CM. Pattern Recognition and Machine Learning, vol. 4. New York: Springer; 2006.
21. Griffiths TL, Steyvers M. Finding scientific topics. Proc Natl Acad Sci USA 2004, 101:5228–5235.
22. Heinrich G. Parameter estimation for text analysis. 2005. Available at: http://www.arbylon.net/publications/textest. (Accessed August 05, 2015).
23. Griffiths TL, Steyvers M, Blei DM, Tenenbaum JB. Integrating topics and syntax. Adv Neural Inf Process Syst 2005, 17:537–544.
24. Jordan MI, Ghahramani Z, Jaakkola TS, Saul LK. An introduction to variational methods for graphical models. Mach Learn 1999, 37:183–233.
25. Minka T. Estimating a Dirichlet distribution. Technical Report, MIT, 2003.
26. He Y. Incorporating sentiment prior knowledge for weakly supervised sentiment analysis. ACM Trans Asian Lang Inf Process 2012, 11:4:1–4:19.
27. Druck G, Mann G, McCallum A. Learning from labeled features using generalized expectation criteria. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Singapore, 2008, 595–602.