2010 International Conference on Pattern Recognition

Exploiting Combined Multi-level Model for Document Sentiment Analysis

Si Li, Hao Zhang, Weiran Xu, Guang Chen and Jun Guo
School of Information and Communication Engineering
Beijing University of Posts and Telecommunications
Beijing, China
E-mail: [email protected]

Abstract—This paper focuses on text sentiment analysis for hybrid online articles and web pages. Traditional approaches to text sentiment analysis typically work at a single level, such as the phrase, sentence or document level, which may not suit documents with too few or too many words. Since each level of analysis has its own advantages, a combination of levels may achieve better performance. This paper presents a novel combined model built on phrase-level and sentence-level analyses, together with a discussion of how the different levels complement each other. For phrase-level sentiment analysis, a newly defined Left-Middle-Right template and Conditional Random Fields are used to extract sentiment words. A Maximum Entropy model is used for sentence-level sentiment analysis. Experimental results verify that the combination model, with a specific combination of features, outperforms the single-level models.

Keywords—combined multi-level model; sentiment analysis; phrase-level; sentence-level; document-level

I. INTRODUCTION

As more and more blogs, shopping sites and related reviews become available on the Web, people want to know more about the things they are interested in, together with the corresponding comments. This requirement motivates research on sentiment analysis techniques. Sentiment analysis is a type of subjectivity analysis that focuses on identifying positive or negative opinions, emotions, and evaluations expressed in natural language [1]. Traditional text sentiment analysis is typically performed at a single level, such as the phrase, sentence or document level. Among documents on the Web, some contain only one or two sentences, or even just a few words, and are too short to be treated as articles, while others contain a great many sentences. Single-level approaches cannot handle these situations flexibly, so a model combining different levels promises an improvement. In this paper, we focus on using a combined multi-level model (also called the combination model for short in the rest of this paper) of phrase-level and sentence-level sentiment analysis to obtain the polarity of documents.

1051-4651/10 $26.00 © 2010 IEEE  DOI 10.1109/ICPR.2010.1007

We follow the hypothesis that text sentiment expression has an internal pattern that can be expressed through an ordered arrangement of words. Based on this hypothesis, in phrase-level sentiment analysis we design a Left-Middle-Right (LMR) template to simulate this ordered arrangement and capture sentiment expressions. Conditional Random Fields (CRFs) are employed to extract sentiment words, and the polarity of each word is assigned automatically [2]. In addition, a Maximum Entropy (ME) model is used for sentence-level sentiment analysis [3]. Morphology and grammar features are selected when using CRFs and ME. After the sentiment words and sentences are recognized, the combination model is applied to improve the F-measure of document sentiment analysis. In this paper, the polarities are positive, negative and neutral, where neutral covers both objective texts and hybrid texts. A hybrid text contains praise and criticism of equal weight, so the text as a whole is treated as objective.

The rest of the paper is organized as follows. We review related work in Section II. Section III describes document sentiment analysis at different levels and the combination model. In Section IV, the evaluation of document sentiment analysis based on the different-level systems is presented. Finally, Section V gives conclusions and comments on future work.

II. RELATED WORKS

Text sentiment analysis is usually performed at the word, sentence or document level. Kaji [4] explored how to build a lexicon for sentiment analysis. Wilson [5] focused on phrase-level sentiment analysis, automatically distinguishing between prior and contextual polarity and finding which features are important for this task. Pang [6] studied text sentiment analysis as a sentiment classification problem, classifying documents not by topic but by overall sentiment; this work was extended to sentence-level sentiment analysis in [7]. Abbasi [8] proposed a feature extraction algorithm for sentiment classification. In [9], a survey covered techniques and approaches that promise to directly enable opinion-oriented information-seeking systems.

III. TEXT SENTIMENT ANALYSIS

A. Language Features

In this section, we describe the features used in this paper:

Word n-grams: The bag-of-words representation, i.e. the standard word vector, is the basic and most widely used representation of documents. Viewed in the n-gram framework, the bag-of-words model is a context-free document representation corresponding to the basic unigram form. In this paper, unigram and bigram language features are adopted.

Part-of-speech (POS) tag: This tag can be considered a crude form of word sense disambiguation [9].

Syntax tag: A form of deeper linguistic analysis.

Negation terms: Words such as "no" and "not", which can clearly change the polarity of a sentence.

Degree modifiers: Words such as "very" and "much"; when they appear, the degree of the sentence polarity may be strengthened or weakened.

Transitional words: Words such as "but", which may change a sentence's orientation.

Dependency relationships: Asymmetric binary relationships obtained from the dependency parse tree.

Because the original features are usually too numerous to use effectively, two procedures are applied to the original language grams: feature extraction and feature weighting. A simple feature elimination method discards features that appear fewer times than a threshold. Boolean values and absolute term frequency (ABS) are used for feature weighting.
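The threshold-based elimination and the two weighting schemes above can be sketched as follows; this is a minimal illustration in Python, and the function names and threshold value are ours, not from the paper:

```python
from collections import Counter

def build_vocab(docs, min_count=3):
    # Simple feature elimination: drop n-grams that appear fewer
    # times than the threshold across the whole collection.
    counts = Counter(g for doc in docs for g in doc)
    return sorted(g for g, c in counts.items() if c >= min_count)

def weight(doc, vocab, scheme="bool"):
    # Boolean weighting: 1 if the feature occurs in the document, else 0.
    # ABS weighting: the absolute term frequency of the feature.
    tf = Counter(doc)
    if scheme == "bool":
        return [1 if g in tf else 0 for g in vocab]
    return [tf[g] for g in vocab]

docs = [["good", "movie", "good"], ["bad", "movie"], ["good", "plot"]]
vocab = build_vocab(docs, min_count=2)   # keeps only "good" and "movie"
print(vocab)                             # ['good', 'movie']
print(weight(docs[0], vocab, "bool"))    # [1, 1]
print(weight(docs[0], vocab, "abs"))     # [2, 1]
```

The same vocabulary is applied to unigrams or bigrams alike; only the weighting scheme changes between the strategies described later.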
B. Phrase-level

1) The LMR sentiment word template: The polarity of a word as it appears in a document may differ considerably from its prior polarity in the lexicon, as shown in [4]. Compare "What a blue day!" with "This is a blue cup.". In the first sentence, "blue" means melancholy and carries negative polarity, while in the second it simply names a color and carries no polarity.

Therefore, to judge the polarity of a word, we place the word in its context; the LMR template takes this into account. Using the word n-gram feature to illustrate, the letters of "LMR" have the following meanings: "M" is the word to be judged, "L" is a word to the left of the M word in the ordered arrangement context, and "R" is a word on its right side. A word sequence containing 2n+1 words is thus denoted LnLn-1…L1MR1…Rn-1Rn. Information obtained from the "L" and "R" words helps to judge the polarity of the "M" word.

Word n-grams, POS tags, syntax tags, negation terms, dependency relationships, transitional words and degree modifiers are selected as features for CRF-based sentiment word extraction. Since the features differ in importance, we adopted different LMR templates for different features. Word n-gram, POS tag and syntax tag features use the L2L1MR1R2 template; these features are considered together with the corresponding tags at the L2, L1, R1 and R2 positions. The other features consider only the "M" position. If a sentence contains a transitional word, the transitional-word feature is 1 for every word in that sentence, and 0 otherwise; the degree-modifier feature is set in the same way. The negation-term feature of every word in a sentence equals the number of negation terms in that sentence. The dependency relationship feature of a word considers only the relationships from other words to that word.

2) Document sentiment analysis based on the phrase-level: To match the LMR template, CRFs are used for sentiment word extraction. After phrase-level sentiment analysis, the following counts are available for each document: the number of positive words, WordPosNum; the number of negative words, WordNegNum; and their sum, WordSubNum. The parameter for judging document polarity is defined as:

    WordScale = WordPosNum / WordSubNum                     (1)

With the WordScale parameter, the document polarity is judged by (2):

    Textpol = -1  if WordScale < WL
            =  1  if WordScale > WH                         (2)
            =  0  otherwise

Here "-1", "1" and "0" represent negative, positive and neutral respectively, and WL and WH are the thresholds separating the different orientations.
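Extracting the 2n+1-word window around each candidate word M can be sketched as below; this is an illustrative helper of our own (the paper itself feeds such windows to a CRF), using padding at sentence boundaries:

```python
def lmr_windows(tokens, n=2, pad="<PAD>"):
    # For each position M, collect Ln ... L1 M R1 ... Rn — the word
    # n-gram view of the L2 L1 M R1 R2 template used as CRF context.
    padded = [pad] * n + tokens + [pad] * n
    return [tuple(padded[i:i + 2 * n + 1]) for i in range(len(tokens))]

for w in lmr_windows(["what", "a", "blue", "day"], n=2):
    print(w)
# The window for "blue" is ('what', 'a', 'blue', 'day', '<PAD>'),
# so the context "day" is available when judging its polarity.
```

Analogous windows over POS tags and syntax tags give the congeneric L2/L1/R1/R2 features described above.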
In the experiments, WL is set to 0.5 and WH to 0.8.

C. Sentence-level

Three kinds of language features are used for sentence-level sentiment analysis: word n-grams, negation terms and degree modifiers. We adopted five strategies with different feature combinations and weightings:

Strategy A: unigram and bigram + Boolean;
Strategy B: unigram and bigram + ABS;
Strategy C: unigram and bigram + ABS, negation words + ABS;
Strategy D: unigram and bigram + ABS, degree modifiers + Boolean;
Strategy E: unigram and bigram + ABS, negation words + ABS, degree modifiers + Boolean.

ME classification is employed to train a classifier. The following counts are then available for each document: the number of positive sentences, SenPosNum; the number of negative sentences, SenNegNum; and their sum, SenSubNum. SenScale is given by (3):

    SenScale = SenPosNum / SenSubNum                        (3)

and the document polarity is judged at the sentence level by (4):

    Textpol = -1  if SenScale < SL
            =  1  if SenScale > SH                          (4)
            =  0  otherwise
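The sentence-level aggregation of (3) and (4) can be sketched as follows; this is a minimal illustration with function names of our own, taking the per-sentence labels predicted by the ME classifier as input:

```python
def sen_scale(labels):
    # labels: per-sentence polarity predictions from the ME classifier,
    # one of "pos", "neg" or "neu".
    pos = labels.count("pos")
    neg = labels.count("neg")
    sub = pos + neg                    # SenSubNum: subjective sentences only
    return pos / sub if sub else None  # eq. (3); undefined with no subjective sentences

def doc_polarity(scale, low, high):
    # eq. (4): -1 negative, 1 positive, 0 neutral.
    if scale is None:
        return 0
    if scale < low:
        return -1
    if scale > high:
        return 1
    return 0

labels = ["pos", "neg", "neu", "pos", "pos"]
s = sen_scale(labels)                 # 3 positive / 4 subjective = 0.75
print(s, doc_polarity(s, 0.4, 0.6))   # 0.75 1
```

The same two functions, fed with word counts instead of sentence counts, implement the phrase-level rule of (1) and (2).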

In sentence-level sentiment analysis, the different strategies have their own characteristics, so different thresholds are set for each strategy. SL and SH are set to 0.6 and 0.8 in Strategy A, and to 0.4 and 0.6 in Strategies B, C, D and E.

D. The Combined Multi-level Model

Previous research in information retrieval [11] showed that combining passage retrieval with document retrieval often yields better results than any single retrieval model alone. The same idea can be applied to sentiment analysis. In this paper, phrase-level sentiment analysis forms the basis of the combination model, and a trade-off between the phrase and sentence levels is introduced, as in (5):

    TextScale = μ × WordScale + (1 - μ) × (2 × WordScale × SenScale) / (WordScale + SenScale)    (5)

TextScale is the parameter for judging the document's polarity with the combination model. µ is a weighting parameter in the interval [0, 1] that balances the scores of the phrase-level and sentence-level sentiment analyses; in this paper, µ is set to 0.8. The document polarity is then judged from TextScale by (6):

    Textpol = -1  if TextScale < TL
            =  1  if TextScale > TH                         (6)
            =  0  otherwise

where TL and TH are set to 0.6 and 0.8 respectively. This trade-off, which employs the sentence-level result to modify the phrase-level result, mitigates the term-window limitation of the phrase level.
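The second term of (5) is the harmonic mean of the two scales, so the combination interpolates between the phrase-level score and a symmetric blend of both levels. A minimal sketch (function names are ours, not from the paper; the guard for a zero denominator is our assumption):

```python
def text_scale(word_scale, sen_scale, mu=0.8):
    # eq. (5): weighted mix of the phrase-level score and the
    # harmonic mean of the phrase- and sentence-level scores.
    if word_scale + sen_scale == 0:
        return mu * word_scale       # harmonic mean undefined; assumed 0
    harmonic = 2 * word_scale * sen_scale / (word_scale + sen_scale)
    return mu * word_scale + (1 - mu) * harmonic

def text_polarity(ts, tl=0.6, th=0.8):
    # eq. (6) with the paper's thresholds TL = 0.6, TH = 0.8.
    if ts < tl:
        return -1
    if ts > th:
        return 1
    return 0

ts = text_scale(0.9, 0.7)            # 0.8*0.9 + 0.2*harmonic(0.9, 0.7)
print(round(ts, 4), text_polarity(ts))   # 0.8775 1
```

With µ = 0.8, WordScale dominates, which matches the paper's claim that the phrase level is the basis of the combination and the sentence level only modifies it.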

TABLE I. DOCUMENT SENTIMENT ANALYSIS BASED ON SENTENCE-LEVEL

Strategy | Metric | Positive | Negative | Neutral | Average
A        | P%     | 22.9091  | 15.5689  | 68.9922 | 35.8234
A        | R%     | 38.4146  | 24.0741  | 47.9354 | 36.8080
A        | F%     | 28.7016  | 18.9091  | 56.5678 | 34.7262
B        | P%     | 29.2835  | 31.3305  | 69.4545 | 43.3562
B        | R%     | 57.3171  | 67.5926  | 34.2908 | 53.0668
B        | F%     | 38.7629  | 42.8152  | 45.9135 | 42.4972
C        | P%     | 30.6931  | 32.7660  | 69.7595 | 44.4062
C        | R%     | 56.7073  | 71.2963  | 36.4452 | 54.8163
C        | F%     | 39.8287  | 44.8980  | 47.8774 | 44.2013
D        | P%     | 29.4479  | 33.1839  | 69.6429 | 44.0915
D        | R%     | 58.5366  | 68.5185  | 35.0090 | 54.0214
D        | F%     | 39.1837  | 44.7130  | 46.5950 | 43.4972
E        | P%     | 29.1793  | 35.0467  | 70.2797 | 44.8353
E        | R%     | 58.5366  | 69.4444  | 36.0862 | 54.6891
E        | F%     | 38.9452  | 46.5839  | 47.6868 | 44.4053

IV. EXPERIMENTS

In this section, we present empirical evaluation results to assess the effectiveness of our sentiment analysis technique at the different levels. In particular, we conducted experiments on 829 web texts downloaded from [10]. This collection, comprising 164 positive texts, 108 negative texts and 557 neutral texts, contains news, expository essays, movie reviews, product reviews and so on. The training data includes 1741 tagged sentiment words, 10147 sentiment sentences and 10000 sentences without emotion. We employed three performance metrics: Precision, Recall and F-measure. The Average of these metrics over the three orientations shows the overall classification performance. To achieve better results, we used the thresholds given in the previous sections for the different-level sentiment analyses and the combination model.

The results of the five strategies for sentence-level sentiment analysis are shown in Table I. From this table, we draw the following observations for sentence-level text sentiment analysis: (1) In positive text identification, Strategy C achieves the best precision and F-measure, while Strategies D and E achieve the best recall. For negative texts, Strategies C and E achieve the best results, and Strategies A and E obtain the best results in neutral analysis. These results suggest that Strategy A suits neutral text analysis, while Strategy C suits positive and negative sentence-level sentiment analysis. The Strategy C results illustrate that the negation word feature is important for sentiment analysis, especially for the recall of negative analysis. (2) The Average value is a comprehensive measure over the three orientations. Taken together, Strategy E improves on the others in precision and F-measure, which is in line with our expectation.
Strategy C achieves the best average recall, which indicates that the negation feature distinguishes the three orientations well. Strategy C also performs better than Strategy D; we conjecture that the negation terms feature is considerably more effective than the degree modifiers feature. When the two features are combined in Strategy E, the best average F-measure is obtained.

Table II shows the evaluation results of each combination model, compared with the phrase-level method. From Tables I and II, we can see that document sentiment analysis based on the phrase level significantly outperforms the system based on the sentence level: the improvements are statistically significant for precision, recall and F-measure, by a large margin. This implies that phrase-level document analysis is much more effective than sentence-level document analysis. Likewise, in information retrieval systems, the smaller the chunk, the more accurate the result; the multi-level sentiment analysis result accords with the multi-level information retrieval result.

The combination model of the phrase level with Strategy A shows the best average precision and average F-measure, with a significant improvement over the other combination models and the single-level document sentiment analysis. We conclude that the unigram and bigram features with Boolean weighting are complementary to the features used in phrase-level sentiment analysis, in line with the earlier discussion in part D of Section III. The negation terms feature is also helpful in the combination model for negative analysis.

V. CONCLUSION

In this paper, we propose a novel combination model that offers a new perspective on document sentiment analysis based on different levels, and we discuss how the different levels with different features complement each other. Our empirical evaluation shows that the combination of the phrase level with Strategy A is an effective model for improving document sentiment analysis results: the sentence-level features of Strategy A and the phrase-level features are complementary. In summary, our model is promising for document sentiment analysis, but it still has some weak points: (1) As shown in Table II, the combination model is not effective for every sentence-level strategy combined with the phrase level. (2) We have not exploited the position of a sentence within a document, from which useful information might be extracted. (3) It would also help to take the strength of opinion words and prior word polarity into account, given their contribution to improving the precision of sentiment analysis.

TABLE II. DOCUMENT SENTIMENT ANALYSIS BASED ON PHRASE-LEVEL AND THE COMBINATION MODEL

Model          | Metric | Positive | Negative | Neutral | Average
Phrase-level   | P%     | 66.8142  | 88.2353  | 93.8124 | 82.9539
Phrase-level   | R%     | 92.0732  | 83.3333  | 84.3806 | 86.5957
Phrase-level   | F%     | 77.4359  | 85.7143  | 88.8469 | 83.9990
Phrase-level+A | P%     | 82.6087  | 83.4783  | 89.2361 | 85.1077
Phrase-level+A | R%     | 69.5122  | 88.8889  | 92.2801 | 83.5604
Phrase-level+A | F%     | 75.4967  | 86.0987  | 90.7326 | 84.1093
Phrase-level+B | P%     | 60.0917  | 61.6352  | 90.4867 | 70.7379
Phrase-level+B | R%     | 79.8780  | 90.7407  | 73.4291 | 81.3493
Phrase-level+B | F%     | 68.5864  | 73.4082  | 81.0704 | 74.3550
Phrase-level+C | P%     | 60.2804  | 63.6943  | 90.6114 | 71.5287
Phrase-level+C | R%     | 78.6585  | 92.5926  | 74.5063 | 81.9191
Phrase-level+C | F%     | 68.2540  | 75.4717  | 81.7734 | 75.1664
Phrase-level+D | P%     | 60.7306  | 63.4615  | 91.1894 | 71.7939
Phrase-level+D | R%     | 81.0976  | 91.6667  | 74.3268 | 82.3637
Phrase-level+D | F%     | 69.4517  | 75.0000  | 81.8991 | 75.4503
Phrase-level+E | P%     | 60.9865  | 65.3595  | 92.0530 | 72.7997
Phrase-level+E | R%     | 82.9268  | 92.5926  | 74.8654 | 83.4616
Phrase-level+E | F%     | 70.2842  | 76.6284  | 82.5743 | 76.4956

In our future work, we plan to further improve and refine the combined multi-level model and to address the outstanding problems identified above.

ACKNOWLEDGMENT

This work was supported by National Natural Science Foundation of China No. 60905017.

REFERENCES

[1] J. Wiebe, "Tracking point of view in narrative," Computational Linguistics, vol. 20, pp. 233-287, 1994.
[2] Y. Choi, C. Cardie, E. Riloff and S. Patwardhan, "Identifying sources of opinions with conditional random fields and extraction patterns," in Proc. HLT/EMNLP, pp. 355-362, 2005.
[3] B. Chen, H. He and J. Guo, "Constructing maximum entropy language models for movie review subjective analysis," Journal of Computer Science and Technology, 23(2), pp. 231-239, 2008.
[4] N. Kaji and M. Kitsuregawa, "Building lexicon for sentiment analysis from massive collection of HTML documents," in Proc. EMNLP-CoNLL 2007, Prague, pp. 1075-1083, June 2007.
[5] T. Wilson, J. Wiebe and P. Hoffmann, "Recognizing contextual polarity: an exploration of features for phrase-level sentiment analysis," Computational Linguistics, vol. 35, pp. 399-433, 2009.
[6] B. Pang, L. Lee and S. Vaithyanathan, "Thumbs up? Sentiment classification using machine learning techniques," in Proc. EMNLP 2002, pp. 79-86, 2002.
[7] B. Pang and L. Lee, "A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts," in Proc. 42nd Meeting of the ACL, pp. 271-278, 2004.
[8] A. Abbasi, H. Chen and A. Salem, "Sentiment analysis in multiple languages: feature selection for opinion classification in web forums," ACM Transactions on Information Systems, vol. 26, June 2008.
[9] B. Pang and L. Lee, "Opinion mining and sentiment analysis," Foundations and Trends in Information Retrieval, vol. 2, pp. 1-135, 2008.
[10] http://www.searchforum.org.cn/tansongbo/corpus-senti.htm
[11] J. P. Callan, "Passage-level evidence in document retrieval," in Proc. 17th ACM SIGIR, pp. 302-310, 1994.
