Multi-attribute Sentiment Classification using Topics

Jeremy Fletcher and Jon Patrick
Sydney Language Technology Research Group
School of Information Technologies
The University of Sydney, Australia
{jeremy, jonpat}@it.usyd.edu.au
Abstract
Rich, human-understandable models of sentiment are requisite of any application of sentiment analysis beyond the purely computational. Recent advances in sentiment analysis have seen numerous corpora being developed which allow for richer modelling of opinion phenomena in text. This paper describes two new corpora designed to provide annotation of “high-level” sentiment attributes for reviews in two domains. Methods for distinguishing the sentiment of these attributes using Latent Dirichlet Allocation (LDA) are investigated, and the results presented not only show an increase in accuracy over a bag-of-words (BOW) baseline, but also provide significant explanatory power, which would aid applications of sentiment analysis such as eLearning.
1 Introduction

Sentiment and subjectivity analysis have been given increasing attention in recent years, with researchers coming to the view that specialised methods are required to deal with these phenomena. Events such as the Coling/ACL Workshop on Sentiment and Subjectivity in Text are testament to this fact. This paper focuses on document-level annotations; that is, annotations of sentiment which apply to an entire review, but relate to only one particular facet of the evaluation. Such annotations would be useful for a system which has to provide explanation of a particular classification.

In particular, documents from two domains, reviews of beer and reviews of restaurants, are investigated. These domains were chosen because a large amount of data is available on the world-wide web rated for different "attributes". The task is to automatically classify these reviews on a positive/negative sentiment split with respect to each of these attributes; for example, the rating a restaurant receives for its service, food or value for money. In section 2, similar work is reviewed in the area of web-based reviews, along with related work on explanatory sentiment analysis. Section 3 provides motivation for this work, and introduces the model of Latent Dirichlet Allocation (LDA), which is used to identify "topics" in a document. Section 4 presents the experimental model. In section 5, the results of these experiments are presented, and important features are discussed. Section 6 provides discussion with reference to future work, and overall conclusions.

2 Previous Work
There has been significant work on fine-grained sentiment analysis, including work utilizing the Multi-Perspective Question Answering (MPQA) corpus (Wiebe et al., 2005a), used for the task of subjective question answering. Work often focuses on identifying sentiment at a phrase or sentence level (Lin et al., 2006; Wilson et al., 2005), determining the semantic orientation of words and phrases (Turney, 2002; Turney and Littman, 2003), identifying opinion holders (Stoyanov and Cardie, 2006), and on information extraction (Wiebe et al., 2005b).

Similar work on sentiment analysis using web corpora includes Pang and Lee (2004), in which reviews of movies were taken from web sources for a classification task; Turney (2002), where web reviews were classified using semantic orientation; and Hu and Liu (2004), which extracted summaries of the opinions expressed on various features of a product in a set of web reviews using semantic orientation and association rule mining. Although the work of Hu and Liu (2004) is particularly closely related to the work presented here, this task is distinguished as being one of classification, not of summarization. Also, while the results of Hu and Liu are useful for a human, the nebulous nature of the opinion summary does not create a potential flow-on for other computational techniques. Other similar work includes (Yi et al., 2003) and (Popescu and Etzioni, 2005), in which the sentiment of features is sought at a fine-grained level.

A differentiation is also drawn here between "evaluative" and "experiential" reviews, where experiential reviews are those which involve a description of emotion regarding something in which the reviewer has participated. For example, reviews of movies, restaurants and beer would be classed as "experiential" reviews, and reviews of electronic devices, video games and cars as "evaluative" reviews. There is obviously some degree of overlap between these categories, and most reviews will have some proportion of both evaluative and experiential phenomena.
3 Motivation

3.1 Data Collection
The data used in this research comes from two websites, the first containing reviews of beers (http://www.beeradvocate.com), and the second reviews of restaurants (http://www.eatability.com.au). Each website allows registered users to submit reviews for particular beers or restaurants, and to give numeric ratings on a number of attributes. These domains were chosen for this study for two reasons. Firstly, the attributes of both domains are reasonably well demarcated, and reviewers can express their sentiments on each attribute easily; there is also a reasonably well-developed vocabulary for discussing these attributes. Secondly, and more pragmatically, the domains were selected because a large amount of data is available in a reasonably consistent form on the web, which yielded examples labelled for sentiment across multiple attributes.

In each instance, reviews were collected from the website in question, and classified as being "positive" or "negative" for a corpus attribute (that is, a feature of the item under review which has been given a numeric rating) according to whether the score given was above or below what the website described as "average" (reviews given a score of exactly "average" on a corpus attribute were classified as "positive"; this decision was necessarily arbitrary). For both beers and restaurants, two corpora were collected: one used for the pre-processing step of estimating topics using Latent Dirichlet Allocation (see section 4.1), and one used for the machine learning experiments. These corpora each comprised 5,000 reviews deemed to be "positive" on the basis of the "overall" rating, and 5,000 reviews deemed to be "negative". They are labelled, respectively, the LDA Development Corpus and the Experimental Corpus.
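As a concrete illustration of this labelling step, the following sketch (not the authors' code; the 1-5 scale, field names and "average" midpoint of 3.0 are assumptions) binarises an attribute rating into the positive/negative split described above:

```python
# Hypothetical sketch: binarise a numeric attribute rating into the
# positive/negative split used for the corpora. The 1-5 scale and the
# "average" midpoint of 3.0 are assumptions, not taken from the paper.

def label_attribute(rating: float, average: float = 3.0) -> str:
    # Exactly-average scores map to "positive", mirroring the paper's
    # (acknowledged as arbitrary) decision.
    return "positive" if rating >= average else "negative"

review_scores = {"overall": 4.0, "service": 2.5, "food": 4.5}
labels = {attr: label_attribute(s) for attr, s in review_scores.items()}
print(labels)  # {'overall': 'positive', 'service': 'negative', 'food': 'positive'}
```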
3.1.1 Beer Review Corpus
The reviews in the beer corpus are annotated for five corpus attributes: Appearance, Smell, Taste, Mouthfeel and Drinkability. They are also given an overall score, calculated using the following formula, which applies weights to the different attributes (weighted according to http://beeradvocate.com/help/?topic=reviewing_beers):

    overall = appearance/5 + smell/5 + (2 × taste)/5 + mouthfeel/10 + drinkability/10

Note that in this corpus, the first four attributes are analogous to four human senses - sight, smell, taste, touch - reiterating the experiential nature of this type of review.

3.1.2 Restaurant Review Corpus
The reviews in the restaurant corpus are annotated for four corpus attributes: Ambience, Service, Food and Value. The overall score for this corpus is simply the mean of the scores given for the individual attributes, as the website does not present its own weighting scheme.
Topic 1: you (0.0721), can (0.0671), price (0.0401), get (0.0373), prices (0.0367), expect (0.0188), pay (0.0182), top (0.0172), up (0.0167), fine (0.0153), not (0.0146), cheap (0.0130), restaurant (0.0125), ...
Topic 2: best (0.0616), restaurant (0.0545), I (0.0461), one (0.0428), Sydney (0.0411), place (0.0366), restaurants (0.0298), 've (0.0295), better (0.0274), most (0.0273), ever (0.0270), worst (0.0228), thai (0.0201), ...
Topic 3: service (0.1491), great (0.1427), food (0.0990), friendly (0.0486), staff (0.0427), always (0.0388), excellent (0.0382), fantastic (0.0354), atmosphere (0.0292), place (0.0200), beautiful (0.0136), lovely (0.0126), attentive (0.0124), ...
Topic 4: we (0.1247), our (0.0580), us (0.0405), table (0.0280), waiter (0.0160), order (0.0145), minutes (0.0135), took (0.0130), asked (0.0129), came (0.0116), arrived (0.0111), one (0.0110), they (0.0108), ...
Topic 5: dishes (0.0499), served (0.0376), way (0.0368), food (0.0358), small (0.0338), tasty (0.0279), portions (0.0235), ordinary (0.0216), cold (0.0213), two (0.0184), her (0.0163), some (0.0155), head (0.0143), ...
Topic 6: I (0.0896), go (0.0853), back (0.0743), would (0.0679), will (0.0577), not (0.0439), again (0.0419), never (0.0406), going (0.0328), definitely (0.0324), recommend (0.0260), think (0.0168), place (0.0162), ...

Table 1: An illustration of selected topics from the restaurant review corpus, with each word's βij value (only the 13 highest weighted words per topic are included). These topics were generated when predicting a total of 20 topics on LDA estimation.

3.2 Identifying Attributes and Sentiment
The goal of this research is to classify these documents as being positive or negative over each of the corpus attributes for which they are rated. This allows illustration not only of how the item under review is perceived overall, but also of which elements of it are regarded as positive or negative. Note that many of the linguistic features expected to have some type of sentiment orientation are not specific to a particular corpus attribute. Take, for example, this sentence from the beer review corpus: "This beer has a wonderful palate, rich and complex." The words "rich" and "complex" imply a positive orientation for taste, while the word "wonderful" has a generic positive orientation which could apply to any number of corpus attributes. While a bag-of-words implementation could be expected to go some of the way to solving this sentiment evaluation problem, combining the generic descriptors with the corpus attributes they describe should yield a much more robust and intuitive method for modelling the reviews. Hence, there are two phenomena to be identified: the corpus attribute under discussion and the sentiment applicable to it. The subsequent task is to seek a method by which these phenomena can be modelled as features for a machine learner.

3.3 Latent Dirichlet Allocation (LDA)
Latent Dirichlet Allocation (LDA) is a generative, probabilistic model of a corpus, whereby an author uses a finite set of latent topics to generate words in a document, each word having a particular probability of being used in a particular topic (Blei, 2004).
In practice, this model is used in reverse; that is, to estimate the topics in a particular corpus given the number of topics n and a large enough sample of data. This process yields an n × V matrix B (capital beta), where V is the size of the vocabulary of the corpus and βij represents the probability that topic tj is manifested by word wi. It also estimates αj, the prior Dirichlet parameter governing the prior distribution of topics; in practice this value is constant over all topics (see Blei (2004) for details). The set of vectors B can then be used to predict the distribution of the n topics over unseen documents. This process is known as inference, and for a particular document wk it returns a vector γk, where γjk (the posterior Dirichlet parameter) is approximately the prior Dirichlet parameter plus the number of words in the document which are inferred to have been generated by topic tj. This model gives a method for exploiting the differences in vocabulary across documents to illustrate differences in topic distributions, which is exploited here for identifying the corpus attributes in the review documents.
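The paper performs these steps with Blei's lda-c (see section 4.1). Purely as an illustration of the estimation/inference split just described, here is a minimal sketch using the gensim library instead; the toy documents and parameter values are invented:

```python
# A minimal sketch of LDA estimation and inference using gensim, as a
# stand-in for Blei's lda-c implementation used in the paper.
from gensim import corpora, models

docs = [
    "the head was thick and white with good lacing".split(),
    "malt aroma with sweet caramel and a dry bitter finish".split(),
    "smooth creamy mouthfeel with light carbonation".split(),
]

dictionary = corpora.Dictionary(docs)              # word <-> id mapping
bow_corpus = [dictionary.doc2bow(d) for d in docs]

# Estimation: learn the n x V topic-word matrix (the paper's B) and a
# symmetric Dirichlet prior alpha.
lda = models.LdaModel(bow_corpus, id2word=dictionary,
                      num_topics=3, alpha="symmetric", passes=20)
B = lda.get_topics()                               # shape: (n_topics, V)

# Inference on an unseen document: gamma[j] approximates alpha_j plus
# the number of words attributed to topic t_j (the paper's gamma_k).
unseen = dictionary.doc2bow("sweet malt aroma and a creamy head".split())
gamma, _ = lda.inference([unseen])
print(gamma[0])
```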
4 Methodology

4.1 Extracting Attribute-focused LDA Topics
While a standard implementation of LDA extracts topics based on vocabulary differences across documents, this work seeks to exploit intra-document vocabulary differences. In the corpora presented here, the inter-document vocabulary is rather homogeneous: each separate document in a corpus, when modelled using LDA, has a similar vocabulary, and so the topic vectors obtained from the process of LDA estimation exploit vocabulary differences other than those of the desired corpus attributes. To circumvent this problem, this work models each sentence in a review as a separate "document" for LDA, thereby exploiting the differences in vocabulary expressed within a document when describing different corpus attributes of the review item. While this means that each "document" in the LDA process is substantially shorter, it increases the number of "documents" available to exploit. There is no way to assure topic coherence; however, empirical tests show a level of coherence apparent in qualitative analysis.

Running Blei's LDA estimation implementation (available for free download at http://www.cs.princeton.edu/~blei/lda-c/) over the sentences in the LDA Development Corpora yields the topic vectors shown in Tables 1 and 2, each sorted by decreasing β value. A qualitative analysis of these topics shows coherence with the corpus attributes described in section 3.1. In Table 1, a number of topics seem to map directly to the corpus attributes sought: Topic 1 seems to correlate with value, Topic 4 with service, and Topic 5 with food. There are also topics which are combinations of attributes; for example, Topic 3 seems to be a combination of service and atmosphere. As well, some topics identify an "overall" evaluative attribute, such as Topic 2, which looks like a general evaluation, and Topic 6, which seems to identify the "recommendation" element which occurs in many reviews.

Topic 1: glass (0.0426), pint (0.0224), poured (0.0220), light (0.0218), served (0.0168), beer (0.0140), bottle (0.0107), nicely (0.0104), around (0.0094), my (0.0092), body (0.0090), clean (0.0087), head (0.0084), ...
Topic 2: beer (0.0315), not (0.0278), good (0.0258), drinkability (0.0244), drinkable (0.0225), one (0.0214), summer (0.0206), great (0.0188), my (0.0158), enough (0.0127), brewery (0.0124), session (0.0063), me (0.0063), ...
Topic 3: beer (0.0352), mouthfeel (0.0287), creamy (0.0207), carbonated (0.0149), down (0.0108), watery (0.0102), texture (0.0101), thin (0.0089), little (0.0078), water (0.0076), my (0.0074), smooth (0.0073), one (0.0072), ...
Topic 4: hop (0.0285), some (0.0233), up (0.0224), faint (0.0186), flavor (0.0183), little (0.0175), smell (0.0150), hops (0.0131), aroma (0.0125), citrus (0.0121), bitterness (0.0112), maybe (0.0102), aftertaste (0.0097), ...
Topic 5: malt (0.0481), aroma (0.0336), hops (0.0301), sweet (0.0269), taste (0.0262), some (0.0248), flavor (0.0191), finish (0.0170), smell (0.0156), caramel (0.0154), hop (0.0141), dry (0.0121), bitter (0.0121), ...
Topic 6: head (0.0699), color (0.0542), pours (0.0499), dark (0.0277), brown (0.0242), amber (0.0211), golden (0.0207), clear (0.0206), appearance (0.0191), light (0.0189), lacing (0.0184), no (0.0168), white (0.0143), ...

Table 2: An illustration of selected topics from the beer review corpus, with each word's βij value (only the 13 highest weighted words per topic are included). These topics were generated when predicting a total of 20 topics on LDA estimation.

Table 2 shows a similar coherence: Topics 1 and 6 identify appearance, Topic 2 drinkability, and Topic 3 mouthfeel, while Topics 4 and 5 seem to identify a combination of taste and smell. From a psychological perspective, it makes sense that these last two topics would be closely linked, as taste and smell are the two most closely linked of our senses, and our vocabulary for describing the two sensory experiences is similar. What is notable from both tables is that there is some kind of relationship between the topics discovered during LDA estimation and the attributes of a review to be identified. In the next subsection, a method is presented by which these topics are used in a machine learning experiment to predict the sentiment of each corpus attribute.
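To make the sentence-as-document step concrete, a minimal sketch of the preprocessing is given below; the naive regex sentence splitter and lower-cased whitespace tokenisation are assumptions, as the paper does not specify its splitter:

```python
# Sketch of the sentence-as-document step: each sentence of a review
# becomes one LDA "document", so that estimated topics track the
# intra-review vocabulary shifts between corpus attributes.
import re

def review_to_lda_documents(review_text: str):
    """Split one review into sentences and tokenise each one,
    yielding one token list (LDA 'document') per sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", review_text.strip())
    return [s.lower().split() for s in sentences if s]

review = ("Pours a deep amber with a thick head. "
          "The aroma is all citrus hops. Smooth, creamy mouthfeel.")
for doc in review_to_lda_documents(review):
    print(doc)   # each token list feeds LDA estimation as one document
```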
4.2 Creating Features from LDA Topics
The estimated LDA topics give some intuition as to the corpus attribute or attributes in question in a particular section of the review. However, a way to effectively link this to the sentiment of the opinion being expressed must be found. Currently, this is addressed simply by using bag-of-words (BOW) features in combination with the topic features identified through LDA inference. The creation of these features for a document w is described by the following process:

1. For each sentence sk in document w:
   (a) Use the topic matrix B to infer the vector γk (recall that each γjk value is a prediction of the number of words generated by topic tj, plus the prior Dirichlet parameter αj)
   (b) For each topic tj with γjk ≥ 1 + αj:
       i. For each word wi in sentence sk:
          A. Create a feature tj × wi
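A sketch of this process in code, assuming the gensim-style model and dictionary from the earlier sketches (where lda.inference returns the γ vector and alpha is the symmetric prior); the "topicJ*word" feature encoding is invented for illustration:

```python
def lda_bow_features(sentences, lda, dictionary, alpha):
    """Create binary LDA x BOW existence features for one review.

    sentences: list of token lists, one per sentence of the review."""
    features = set()
    for tokens in sentences:
        bow = dictionary.doc2bow(tokens)
        gamma, _ = lda.inference([bow])      # posterior Dirichlet params
        for j, gamma_jk in enumerate(gamma[0]):
            # Topic t_j is deemed present if it is predicted to have
            # "generated" at least one word in the sentence.
            if gamma_jk >= 1 + alpha:
                for word in tokens:
                    features.add(f"topic{j}*{word}")
    return features
```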
Any topic tj is identified as occurring in a particular sentence sk if the value of γjk in the inferred vector is greater than or equal to 1 + αj; that is, a topic is present if it is predicted to have "generated" at least one word in the sentence. Because the "documents" for LDA inference (the sentences) are so short, this extremely low cut-off is designed to boost recall of identified topics; as this process is a precursor to a machine learning task, the approach in effect seeks to exploit all the information available. Note also that the values of γjk are not normalised over the document length. This is due not only to the brevity of the "documents" and the fact that the large proportion of sentences are of a similar length (meaning normalisation would give little advantage), but also because in those sentences which are particularly long, it is likely that more than one attribute is being discussed; having topics relating to multiple attributes would therefore be of considerable advantage to a machine learner.

This process gives a set of binary existence features which represent, for a document, which LDA topics (and by inference, which corpus attributes) are being used with which words, many of which have a "generic" sentiment orientation (i.e. not a sentiment orientation specific to a particular attribute). Some examples of features created by this method include (in the form topic × word):

Beer Review Corpus Features
1. (malt, aroma, hops, sweet, ...) × "nice"
2. (mouthfeel, carbonation, light, body, smooth, ...) × "good"
3. (head, white, quickly, yellow, nice, lacing, ...) × "cloudy"

Restaurant Review Corpus Features
1. (service, food, friendly, staff, atmosphere, ...) × "great"
2. (restaurant, I, one, Sydney, place, ...) × "worst"
3. (food, service, average, quality, ok, ...) × "overpriced"

These features are termed LDA × BOW features. Note that some of these example features (features 1 and 2 in the beer review corpus, and 1 and 2 in the restaurant review corpus) contain words with "generic" semantic orientation. However, their co-occurrence with identified topics means that they are more likely to be able to separate situations when they are used to describe different corpus attributes of the item under review. Feature 3 in both corpora gives an example of an LDA topic co-occurring with a word containing sentiment relevant to a particular corpus attribute.

4.3 Machine Learning Using Maximum Entropy
Once these features are extracted from the review documents, a machine learner is trained to classify documents based on the sentiment of each of their corpus attributes. For example, each document in the beer review corpus is assigned six labels: a "positive" or "negative" evaluation for each of the corpus attributes, plus another for the "overall" sentiment of the review (similarly for restaurant reviews). Classification is performed using Hal Daumé's MegaM Maximum Entropy Model (MEM) implementation (available for free download from http://www.cs.utah.edu/~hal/megam/). The baseline uses BOW binary existence features; for the experiments, the features are those generated by the process described in section 4.2, plus all the BOW existence features used in the baseline. Note that these features, unlike the LDA topics, were all generated from the Experimental Corpus; this kept the data used to initially create the B matrix for LDA topics separate from the data used for evaluation. Testing used ten-fold cross validation.
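Since MegaM itself is a command-line tool, the classifier setup can be sketched with scikit-learn's logistic regression, which is equivalent to a binary maximum entropy model; the toy features and labels here are invented:

```python
# Hypothetical stand-in for the MegaM setup: logistic regression over
# binary existence features (BOW plus LDA x BOW), training one binary
# classifier per corpus attribute.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

docs_features = [
    {"great": 1, "friendly": 1, "topic3*great": 1},
    {"rude": 1, "minutes": 1, "topic4*rude": 1},
]
labels = [1, 0]  # positive / negative for one corpus attribute

vec = DictVectorizer()
X = vec.fit_transform(docs_features)
clf = LogisticRegression(max_iter=1000).fit(X, labels)

# On the full corpus the paper evaluates with ten-fold cross validation,
# e.g. sklearn.model_selection.cross_val_score(clf, X, labels, cv=10).
```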
4.4 Results
The results on each corpus, for each attribute, are given in Table 3, both for the BOW baseline and the LDA × BOW experiments. These results show a clear increase in accuracy when using these features over the BOW baseline in all cases except the overall attribute on the beer review corpus, where there is a marginal decrease. This result is not unexpected, however, as the features are designed specifically to target the sentiment classification of the relevant attribute classes, not the overall class.
Beer Review Corpus        Baseline (BOW)   LDA × BOW
Appearance                77.1%            78.5% (+1.4)
Smell                     80.6%            81.3% (+0.7)
Taste                     84.5%            85.4% (+0.9)
Mouthfeel                 79.8%            80.8% (+1.0)
Drinkability              78.8%            79.3% (+0.5)
OVERALL                   85.3%            85.1% (-0.2)

Restaurant Review Corpus  Baseline (BOW)   LDA × BOW
Ambience                  77.9%            80.1% (+2.2)
Service                   85.8%            86.1% (+0.3)
Food                      83.5%            84.9% (+1.4)
Value                     85.4%            87.6% (+2.2)
OVERALL                   87.3%            93.0% (+5.7)

Table 3: Results comparing BOW baseline and LDA × BOW feature sets, evaluated with ten-fold cross validation.

What is more surprising is the large increase in accuracy of classification of the overall attribute on the restaurant review corpus. This increase, however, is most likely attributable to the overall evaluative LDA topics extracted from the corpus (e.g. topics 2 and 6 in Table 1).

4.5 Other Experiments
Other experiments were performed on these corpora which are not described here, but which in all cases degraded performance. Moving the cut-off point of the LDA γ values significantly above or below 1 decreased performance, indicating that too high a value rejects topics which could bring potential benefit, and too low a value accepts topics which have no relevant words in the sentence. Omitting the high-coverage BOW features from the experimental feature set also degraded accuracy slightly, indicating that there are features of the text not covered by the LDA × BOW features. Other separate experiments used the real γ values rather than a cut-off point, predicted the class for the "overall" attribute based on the other attributes (and vice versa), and utilised a manual mapping from the LDA topic vectors to one or more of the corpus attributes; in all cases, this provided no benefit to classification accuracy.

4.6 Analysis
To gain insight into how the LDA × BOW features are performing, the relative weights given by the MEM to each of the features were examined. A collection of the highest-weighted features for each corpus attribute is displayed in Tables 4 and 5. Note that these feature sets contain not only the LDA × BOW features, but simple BOW existence features as well.
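A sketch of this weight inspection, under the same assumed scikit-learn stand-in as above (real feature sets and data would replace the invented ones here):

```python
# Sketch of the feature-weight analysis behind Tables 4 and 5: rank the
# features of a trained linear (maximum-entropy) model by the magnitude
# of their learned weights. Data and feature names are invented.
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

docs_features = [{"great": 1, "topic3*great": 1},
                 {"rude": 1, "topic3*rude": 1},
                 {"friendly": 1, "topic3*friendly": 1},
                 {"bland": 1, "topic5*bland": 1}]
labels = [1, 0, 1, 0]

vec = DictVectorizer()
X = vec.fit_transform(docs_features)
clf = LogisticRegression(max_iter=1000).fit(X, labels)

# Most decisive features first; positive weights favour the positive class.
names = np.array(vec.get_feature_names_out())
order = np.argsort(-np.abs(clf.coef_[0]))
for name, w in zip(names[order], clf.coef_[0][order]):
    print(f"{w:+.3f}  {name}")
```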
Appearance:
  "flat"
  (head, white, quickly, yellow, nice, lacing, ...) × "cloudy"
  (head, color, pours, dark, brown, amber, ...) × "no"
  "poor"

Smell:
  "solid"
  "weak"
  (like, than, lager, macro, light, american, ...) × "aroma"
  (malt, aroma, hops, sweet, ...) × "nice"

Taste:
  "bland"
  "excellent"
  (mouthfeel, carbonation, light, body, smooth, ...) × "thin"
  (mouthfeel, carbonation, light, body, smooth, ...) × "smooth"

Mouthfeel:
  (mouthfeel, carbonation, light, body, smooth, ...) × "thin"
  (mouthfeel, carbonation, light, body, smooth, ...) × "smooth"
  "watery"
  (mouthfeel, carbonation, light, body, smooth, ...) × "good"

Drinkability:
  "easy"
  "solid"
  (I, beer, like, not, you, taste, ...) × "easy"
  (beer, not, good, drinkable, drinkability, ...) × "not"

Overall:
  "solid"
  "excellent"
  "great"
  (mouthfeel, carbonation, light, body, smooth, ...) × "smooth"

Table 4: Sample of ranked features for classifying the beer review corpus.

The LDA × BOW features weighted most highly fall into a number of different categories, which are described in Table 6. This illustrates that the features are, on the whole, acting as expected. The most commonly highly-weighted LDA features are those which either use the LDA topic element to identify the attribute to which a generic sentiment term applies (category 1), or which combine a corpus attribute-related sentiment term with the appropriate attribute (category 2), thereby giving the machine learner more precision. Those features which fall into the third and fourth categories appear to be in the minority, but they are clearly reasonable features to expect.
Ambience:
  (service, great, food, friendly, staff, atmosphere, ...) × "friendly"
  (service, great, food, friendly, staff, atmosphere, ...) × "views"
  "excellent"
  (service, great, food, friendly, staff, atmosphere, ...) × "excellent"

Service:
  (service, great, food, friendly, staff, atmosphere, ...) × "friendly"
  (well, service, little, quite, though, bit, ...) × "rude"
  (service, great, food, friendly, staff, atmosphere, ...) × "great"
  (service, great, food, friendly, staff, atmosphere, ...) × "excellent"

Food:
  (best, restaurant, I, one, Sydney, place, ...) × "worst"
  (best, restaurant, I, one, Sydney, place, ...) × "best"
  (well, service, little, quite, though, bit, ...) × "bland"
  (service, great, food, friendly, staff, atmosphere, ...) × "great"

Value:
  (food, service, average, quality, ok, ...) × "overpriced"
  (nice, wine, expensive, good, all, ...) × "expensive"
  (best, restaurant, I, one, Sydney, place, ...) × "worst"
  (best, restaurant, I, one, Sydney, place, ...) × "best"

Overall:
  "fantastic"
  "excellent"
  (service, great, food, friendly, staff, atmosphere, ...) × "fantastic"
  (best, restaurant, I, one, Sydney, place, ...) × "worst"

Table 5: Sample of ranked features for classifying the restaurant review corpus.

Category 1 - corpus attribute topic × generic sentiment term. The LDA element identifies the attribute and the BOW element identifies sentiment that does not have a basis in the attribute. Example: (service, food, friendly, staff, atmosphere, ...) × "great"

Category 2 - corpus attribute topic × related sentiment term. The LDA element identifies the attribute and the BOW element identifies sentiment which relates specifically to that attribute. Example: (mouthfeel, carbonation, light, body, smooth, ...) × "thin"

Category 3 - sentiment topic × corpus attribute term. The LDA element identifies the sentiment (usually domain-specific) and the BOW element identifies the attribute. Example: (like, than, lager, macro, light, american, ...) × "aroma"

Category 4 - corpus attribute topic × other. The LDA element identifies the attribute, and the BOW element identifies some other feature. These features are generally orientation-neutral, but when combined with the LDA topic element provide some degree of sentiment. Example: (head, color, pours, dark, brown, amber, ...) × "no"

Table 6: Four categories of common LDA × BOW combinations.
The third category, for example, is reliant on having sentiment-specific LDA topics extracted. This in turn relies on there being a sufficient number of coherent sentences in the corpus which express sentiment using a similar vocabulary. The example in the table shows the case where a specific style of beer (a light, mass-produced American lager) is often referred to in a negative light, but that the classifier differentiates between when the aroma of such a style is being discussed, and when the style is used purely referentially. The fourth category is smaller still, and mostly composed of an LDA topic element identifying an
attribute, with the BOW element identifying negation. In these cases, it is likely that the absence of elements of that attribute is undesirable (e.g. the absence of "head" on a beer). In all cases, these features provide yet further insight into the characteristics of a document upon which a sentiment decision is made. For flow-on applications of automatic sentiment analysis which require explanatory power, these features would be invaluable.
5 Discussion and Future Work
It is clear that one of the major limitations of this study is the modelling of sentiment as a simple bag-of-words. While there is benefit to be gained from this type of representation in the short term, identifying more complex opinion-bearing structures is likely to give a far more accurate representation of the semantic content of a review. Negated expressions, while captured in limited instances in combination with LDA topics, are poorly dealt with by the system described here, as are more complicated or subtle subjective structures. It would also be prudent to compare the results gained in this study to sentential BOW cross-product feature sets; that is, modelling both attribute and sentiment as a bag-of-words in a sentence, and creating cross-product features similar to the LDA × BOW set (for example, one can envisage an "aroma" × "nice" feature for the beer review corpus). This comparison would give insight into how much the LDA features help beyond a simple feature-compression capacity.

Similarly, initial experiments modelling whole documents using LDA show a certain level of coherence with sentiment, which suggests benefit in using LDA at different levels to model both sentiment and attribute, possibly combining them in a manner similar to the method described in this paper. In the future, we envisage this work being useful for unsupervised multi-attribute sentiment classification, where the attributes need not be defined a priori but are instead extracted in a process of discovery, or where the LDA topic extraction is used as a precursor to identifying evaluated attributes. This approach could also be used interactively and iteratively, as a precursor to, or incorporated into, a sentiment analysis eLearning environment.
6 Conclusions
In this paper, we have described the task of multi-attribute sentiment classification, whereby documents are labelled with sentiment-orientated features at a high level, but where the instantiations of these labels are realised at a very fine-grained level and dispersed throughout the document. We intend to collect further corpora for this task in the near future. We have shown a method for modelling multiple attributes in review documents which gives a significant increase in accuracy on the task of sentiment classification over a bag-of-words baseline. It is hoped the work presented here forms a prelude to more work in this area, whereby high-level attributes are exploited to give more explanatory power to the task of sentiment classification. The benefits for flow-on applications of having features which describe the decisions made at a semantic level should not be overlooked.
References

David M. Blei. 2004. Probabilistic Models of Text and Images. PhD Dissertation, University of California, Berkeley.

Minqing Hu and Bing Liu. 2004. Mining and Summarizing Customer Reviews. Proc. Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004).

Wei-Hao Lin, Theresa Wilson, Janyce Wiebe and Alexander Hauptmann. 2006. Which Side are You on? Identifying Perspectives at the Document and Sentence Levels. Proc. 2006 Conference on Natural Language Learning (CoNLL-2006).

Satoshi Morinaga, Kenji Yamanishi, Kenji Tateishi and Toshikazu Fukushima. 2002. Mining Product Reputations on the Web. Proc. KDD-2002.

Bo Pang and Lillian Lee. 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. Proc. 42nd Meeting of the Association for Computational Linguistics (ACL-2004).

Ana-Maria Popescu and Oren Etzioni. 2005. Extracting Product Features and Opinions from Reviews. Proc. HLT/EMNLP 2005.

Veselin Stoyanov, Claire Cardie, Diane Litman and Janyce Wiebe. 2004. Evaluating an Opinion Annotation Scheme Using a New Multi-Perspective Question and Answer Corpus. Proc. AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications.

Veselin Stoyanov and Claire Cardie. 2006. Partially Supervised Coreference Resolution for Opinion Summarization through Structured Rule Learning. Proc. 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP-2006).

Peter D. Turney. 2002. Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. Proc. 40th Meeting of the Association for Computational Linguistics (ACL-2002).

Peter D. Turney and Michael L. Littman. 2003. Measuring Praise and Criticism: Inference of Semantic Orientation from Association. ACM Transactions on Information Systems, 21(4):315-346.

Janyce Wiebe and Ellen Riloff. 2005. Creating Subjective and Objective Sentence Classifiers from Unannotated Texts. Computational Linguistics and Intelligent Text Processing, pp. 486-497. Springer, Berlin/Heidelberg.

Janyce Wiebe, Theresa Wilson and Claire Cardie. 2005a. Annotating Expressions of Opinions and Emotions in Language. Language Resources and Evaluation, 39(2-3):165-210.

Janyce Wiebe, Theresa Wilson and Claire Cardie. 2005b. Exploiting Subjectivity Classification to Improve Information Extraction. Proc. 20th National Conference on Artificial Intelligence (AAAI-2005).

Theresa Wilson, Janyce Wiebe and Paul Hoffman. 2005. Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. Proc. 2005 Conference on Empirical Methods in Natural Language Processing (EMNLP-2005).

Jeonghee Yi, Tetsuya Nasukawa, Razvan C. Bunescu and Wayne Niblack. 2003. Sentiment Analyzer: Extracting Sentiments about a Given Topic Using Natural Language Processing Techniques. Proc. 3rd IEEE International Conference on Data Mining (ICDM-2003).