A Simple and Efficient Algorithm for Lexicon Generation Inspired by Structural Balance Theory

Anis Yazidi(B), Aleksander Bai, Hugo Hammer, and Paal Engelstad

Institute of Information Technology, Oslo and Akershus University College of Applied Sciences, Oslo, Norway
[email protected]
Abstract. Sentiment lexicon generation is a major task in the field of sentiment analysis. In contrast to the bulk of research, which has focused almost exclusively on Label Propagation as the primary tool for lexicon generation, we introduce a simple, yet efficient algorithm for lexicon generation that is inspired by Structural Balance Theory. Our algorithm is shown to outperform the classical Label Propagation algorithm. A major drawback of Label Propagation is that words situated many hops away from the seed words tend to get low sentiment values, since the inaccuracy in the synonym relationship is not properly taken into account: the label of a word is simply the average of its neighbours. To circumvent this problem, we propose a novel algorithm that supports better transitive transfer of sentiment polarity from seed words to target words using Structural Balance Theory. The premise of the algorithm is exemplified by the principle that the enemy of my enemy is my friend, which preserves the transitivity structure captured by antonyms and synonyms. Thus, a low sentiment score indicates sentimental neutrality rather than merely reflecting that the word in question is located far away from the seeds. Lexicons based on thesauruses were built using different variants of our proposed algorithm. The lexicons were evaluated by classifying product and movie reviews, and the results show satisfying classification performance that outperforms Label Propagation. We consider Norwegian as a case study, but the algorithm can easily be applied to other languages.

Keywords: Sentiment analysis · Structural Balance Theory

1 Introduction
With the increasing amount of unstructured textual information available on the Internet, sentiment analysis and opinion mining have recently gained a groundswell of interest from the research community as well as among practitioners. In general terms, sentiment analysis attempts to automate the classification
of text materials as either expressing positive or negative sentiment. Such classification is particularly interesting for making sense of huge amounts of textual information and for extracting the "word of mouth" from product reviews, political discussions, etc.

Possessing a sentiment lexicon beforehand is a key element in the task of applying sentiment analysis at the phrase or document level. A sentiment lexicon is composed of sentiment words and sentiment phrases (idioms) characterized by sentiment polarity, positive or negative, and by sentiment strength. For example, the word 'excellent' has positive polarity and high strength, whereas the word 'good' is a positive word with lower strength. Once a lexicon is built and in place, a range of different approaches can be deployed to classify the sentiment in a text as positive or negative. The simplest approach computes the difference between the sum of the scores of the positive lexicon words and the sum of the scores of the negative lexicon words found in the text, and subsequently classifies the sentiment according to the sign of the difference.

In order to generate a sentiment lexicon, the most obvious and naive approach is manual generation. Nevertheless, manual generation is tedious and time consuming, rendering it impractical. Due to this difficulty, a significant amount of research has been dedicated to approaches for automatically building sentiment lexicons. To alleviate the task of lexicon generation, the research community has suggested a myriad of semi-automatic schemes that fall mainly under two families: dictionary-based and corpus-based. Both families are semi-automatic because the underlying idea is to bootstrap the generation from a short list of words with manually chosen polarity (seed words); however, they differ in the methodology for iteratively building the lexicon. The majority of dictionary-based approaches form a graph of the words in a dictionary, where the words correspond to nodes and where relations between the words (e.g. in terms of synonyms, antonyms and/or hyponyms) form the edges. A limited number of seed words are manually assigned a positive or a negative sentiment value (or label), and an algorithm, such as the Label Propagation mechanism proposed in [1], is used to automatically assign sentiment scores to the other, non-seed words in the graph.

The main contribution of this paper is the introduction of a novel algorithm that provides enhancements over existing work. Another latent motivation of this article is to investigate the potential of generating a lexicon in an automatic manner without any human intervention or refinement. We try to achieve this by increasing the sources of information, namely three different thesauruses, instead of solely relying on a single thesaurus as commonly done in the literature. In fact, we suggest that by increasing the number of thesauruses we can increase the quality of the generated lexicon. Finally, while most research in the field of sentiment analysis has been centered on the English language, little work has been reported for smaller languages where there is a shortage of good sentiment lists. In this paper, we tackle the problem of building a sentiment lexicon for a smaller language, and use the Norwegian language as an example.
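As a minimal illustration of the difference-of-sums classification mentioned above, consider the following sketch; the word scores, the tokenizer and the example text are hypothetical placeholders, not the lexicon built later in this paper.

```python
# Minimal sketch of lexicon-based sentiment classification: sum the scores of
# the lexicon entries found in a text and classify by the sign of the total.
# Lexicon contents are illustrative only.
lexicon = {"excellent": 0.9, "good": 0.5, "boring": -0.6, "terrible": -0.9}

def classify(text, lexicon):
    tokens = text.lower().split()
    score = sum(lexicon.get(t, 0.0) for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(classify("an excellent but slightly boring movie", lexicon))  # positive
```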
1.1 Background Work
Dictionary-based approaches were introduced by Hu and Liu in their seminal work [2]. They bootstrap from seed words and stop the generation when no more words can be added to the list. Mohammad, Dunne and Dorr [3] used a rather subtle and elegant enhancement of Hu and Liu's work [2,4] by exploiting antonym-generating prefixes and suffixes in order to include more words in the lexicon. In [5], the authors constructed an undirected graph based on adjectives in WordNet [6] and defined the distance between two words as the length of the shortest path between them in WordNet. The polarity of an adjective is then defined as the sign of the difference between its distance from the word "bad" and its distance from the word "good", while the strength of the sentiment depends on this difference as well as on the distance between the words "bad" and "good" themselves. Blair and his colleagues [7] employ a novel bootstrapping idea in order to counter the effect of neutral words in lexicon generation and thus improve the quality of the lexicon. The idea is to bootstrap the generation with neutral seeds in addition to positive and negative seeds; the neutral seeds are used to avoid positive and negative sentiment propagating through neutral words. Rao and Ravichandran [8] used semi-supervised techniques based on two variants of the Mincut algorithm [9] in order to separate the positive and negative words in the graph generated by means of bootstrapping. In simple words, Mincut algorithms [9] are used in graph theory to partition a graph into two parts while minimizing the number of strongly similar nodes that end up in different parts. Rao and Ravichandran [8] employed only the synonym relationship as a similarity metric between two nodes in the graph. The results are encouraging and show some advantages of using Mincut over Label Propagation. In [10], Hassan and Radev use elements from the theory of random walks for lexicon generation. The distance between two words in the graph is defined based on the notion of hitting time: the hitting time h(i|S) is the average number of hops it takes for a node i to reach a node in the set S. A word w is classified as positive if h(w|S+) < h(w|S−) and vice versa, where S+ denotes the set of positive seed words and S− the set of negative seed words. Kim and Hovy [11] resort to Bayesian theory for assigning the most probable label (here, polarity) to a bootstrapped word. In [12], a quite sophisticated approach was devised: the main idea is to iteratively bootstrap using different sub-sets of the seed words, and then to count how many times a word was found using a positive seed sub-set and how many times the same word was found using a negative seed sub-set. The sentiment score is then normalized within the interval [0, 1] using fuzzy set theory. It is worth mentioning that another research direction for building lexicons for foreign languages is based on exploiting the well-developed English sentiment lexicons. A representative work of such approaches is reported in [13], where Hassan and his co-authors employ the hitting-time-based bootstrapping ideas introduced in [10] to devise a general approach for generating a lexicon for a foreign language based on the English lexicon.
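To make the graph-distance idea of [5] concrete, here is a minimal sketch that assigns polarity from the sign of the distance difference. The adjective graph is built with networkx; the words and edges are small illustrative placeholders, not WordNet data.

```python
import networkx as nx

# Hypothetical adjective graph; in [5] edges connect WordNet synonyms.
G = nx.Graph()
G.add_edges_from([("good", "fine"), ("fine", "decent"),
                  ("decent", "mediocre"), ("mediocre", "awful"),
                  ("awful", "bad")])

def polarity(word, graph, pos_anchor="good", neg_anchor="bad"):
    # Polarity = sign of (distance to 'bad' - distance to 'good');
    # a word closer to 'good' than to 'bad' is labelled positive.
    try:
        d_neg = nx.shortest_path_length(graph, word, neg_anchor)
        d_pos = nx.shortest_path_length(graph, word, pos_anchor)
    except nx.NetworkXNoPath:
        return 0
    return (d_neg > d_pos) - (d_neg < d_pos)  # +1 positive, -1 negative, 0 tie

print(polarity("decent", G))  # +1: 'decent' is closer to 'good' than to 'bad'
```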
1.2 Transitive Relation and Structural Balance Theory
Structural Balance Theory models a social network as a graph where edges are annotated with positive or negative signs. A positive edge joining two nodes denotes a friendship relationship, while a negative edge denotes antagonism. Structural Balance Theory goes beyond the classical concept of the friend of my friend is my friend, and introduces the notion of the enemy of my enemy is my friend. Structural Balance Theory has recently found direct applications in social networks, resulting in the emergence of signed social networks that are able to describe both friendship and antagonism relationships [14,15], instead of solely relying on positive (friendship) relationships.

Tidal Trust Algorithm
In order to put our work in the right perspective, we shall review a representative work in the field of trust networks. The Tidal Trust algorithm is due to Golbeck [16]. It should be mentioned that it was applied in FilmTrust, an online social network for rating movies. The Tidal Trust algorithm belongs to the class of web-of-trust algorithms, and its philosophy is based on the existence of a chain of trust between the user and the movie. In Tidal Trust, each user assesses the confidence that he has in each of his acquaintances using a score between 1 and 10. The participants of the reputation system and their relationships are modeled in terms of a graph, with each node being a participant or an object being rated. An edge describes a relationship between two nodes, signifying either a level of trust or a rating, depending on the kind of nodes involved. The algorithm works as follows:
– If a node s has rated the movie m directly, it returns its rating for m.
– Else, the node asks its neighbors in the graph for their recommendations.
Later, Golbeck [16] introduced two improvements to the original Tidal Trust algorithm:
– The first enhancement was to consider only short paths from the source to the sink by specifying a maximum length of the trust chain. This improvement was motivated by the observation that trust values inferred through shorter paths tended to be more accurate.
– The second enhancement was to discard paths that involved untrustworthy agents, i.e., agents whose trust falls below a user-specified threshold. In this manner, the system maintains only the paths with strong trust values.
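A heavily simplified sketch in the spirit of Tidal Trust follows (this is not Golbeck's actual implementation; node names, trust values and ratings are hypothetical): a node returns its own rating if it has one, otherwise it returns the trust-weighted average of its neighbours' recommendations.

```python
# Simplified chain-of-trust recursion in the spirit of Tidal Trust (illustrative only).
# trust[a] maps node a to {neighbour: trust value 1..10};
# ratings[a] is a's direct rating of the movie, if any.
def recommend(node, trust, ratings, visited=None):
    visited = visited or set()
    if node in ratings:                       # direct rating available
        return ratings[node]
    visited = visited | {node}
    num, den = 0.0, 0.0
    for neigh, t in trust.get(node, {}).items():
        if neigh in visited:                  # avoid cycles
            continue
        r = recommend(neigh, trust, ratings, visited)
        if r is not None:
            num += t * r                      # trust-weighted average
            den += t
    return num / den if den else None

trust = {"alice": {"bob": 8, "carol": 4}}
ratings = {"bob": 9, "carol": 5}
print(recommend("alice", trust, ratings))     # (8*9 + 4*5) / (8 + 4) = 7.67
```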
2 The Novel Approach
A major shortcoming of the Label Propagation algorithm [1] is that while the seed values are propagated throughout the word graph of synonyms, the algorithm does not take into account that for each edge between two synonyms, there
is an introduction of inaccuracy. The source of inaccuracy is primarily the fact that the meanings of two synonyms typically do not overlap completely. A synonym of a word might carry a different and skewed meaning and sentiment compared to the meaning and sentiment of the word itself. Thus, the sentiment value of a word is not a 100% trustworthy indicator of the value that should be propagated to its synonyms. Problems are typically observed in the middle area between the positive and the negative seed words, i.e. typically several hops away from any seed word. While other works have introduced fixes to this problem (e.g. by introducing neutral seed words as described above), we take a different approach to the solution.

The implication of the inaccuracy introduced between two neighbouring synonyms in the word graph is that the value propagated from a seed word should have high trustworthiness, and thus high weight, in its close proximity in the word graph. Similarly, the value should be weighted comparably lower at longer synonym-hop distances from the seed word. To account for this effect, we propose a novel approach that combines ideas from Label Propagation and the Tidal Trust algorithm, focusing on the shortest path between a word and the seed words. This is described in the following.

Assume that a word w is related to a set of seed words S in the graph. Then the sentiment score score(w) for the word w is defined as:

$$\mathrm{score}(w) = \frac{\sum_{s \in S} W_{s \to w} \cdot (-1)^{N(s \to w)} \cdot \mathrm{score}(s)}{\sum_{s \in S} W_{s \to w} \cdot (-1)^{N(s \to w)}} \qquad (1)$$

where
– N(s → w) denotes the number of negative edges along the shortest path from seed s to node w.
– W_{s→w} is a weighting function. The weight function should capture the idea that the closer a seed is to the word, the higher its associated weight. In this sense, the weight of a seed should decay as the distance to the word increases.
– score(s) is the manually annotated score of the seed.
The above formula describes a multiplication of signs exemplified by the phrase the enemy of my enemy is my friend. For example, a chain of three positive edges exemplifies the principle that the friend of my friend is my friend, whereas chains with one positive and two negative edges capture the notions that the friend of my enemy is my enemy, the enemy of my friend is my enemy, and the enemy of my enemy is my friend. An even number of antonyms along the path preserves the polarity of the seed word when transferred to the target word to be annotated, since (−1)^{N(s→w)} equals 1 in this case, while an odd number of antonyms inverts the polarity of the seed word, since (−1)^{N(s→w)} equals −1. Furthermore, we borrow ideas from Trust theory so as to account for the decrease in "score certainty" as the distance between the seed and the word to be annotated increases.
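A minimal sketch of Equation (1) is given below, assuming the word graph is a networkx graph whose edges carry a 'sign' attribute (+1 for synonym edges, −1 for antonym edges); the weighting function W_{s→w} is passed in as a parameter and is discussed in the next subsection.

```python
import networkx as nx

def sentiment_score(word, seeds, graph, weight_fn):
    """Sketch of Eq. (1): signed, distance-weighted transfer of seed polarity.

    seeds: dict mapping each seed word to its manually annotated score(s).
    graph: undirected word graph; each edge has a 'sign' attribute (+1/-1).
    weight_fn: maps the path length l_{s->w} (in hops) to the weight W_{s->w}.
    """
    num, den = 0.0, 0.0
    for s, s_score in seeds.items():
        try:
            path = nx.shortest_path(graph, s, word)
        except (nx.NetworkXNoPath, nx.NodeNotFound):
            continue
        # N(s->w): number of negative (antonym) edges along the shortest path.
        n_neg = sum(1 for u, v in zip(path, path[1:])
                    if graph[u][v].get("sign", 1) < 0)
        w = weight_fn(len(path) - 1)          # path length in hops
        num += w * (-1) ** n_neg * s_score
        den += w * (-1) ** n_neg              # denominator follows Eq. (1) as stated
    return num / den if den else 0.0

# Example usage with multiplicative decay, alpha = 0.37 (see the weighting variants):
# sentiment_score("some_word", seeds, G, weight_fn=lambda l: 0.37 ** l)
```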
Different Variants of Weighting Functions
Let l_{s→w} denote the length of the shortest path from s to w in terms of number of hops. We propose different weighting functions that capture the idea that the weight decreases as the distance from the seed word increases.

Multiplicative Decay: W_{s→w} = α^{l_{s→w}}, where α < 1 is a factor that controls the decay. In the experimental results, we tried different values of α, namely 0.9, 0.7, exp(−0.5) ≈ 0.61, 0.5 and exp(−1) ≈ 0.37.

Inversely Proportional to Distance: a possible variant where the weight is inversely proportional to the distance from the seed, i.e. W_{s→w} = 1/l_{s→w}.

Blocking Contributions Based on a Distance Threshold: a variant of our algorithm where only shortest paths shorter than a certain distance l_min are considered. The motivation behind this variant is to block the contributions from far-away seeds. In the experimental results section, we report results for l_min = 4 and l_min = 3. The reader should note that this simple idea is also present in the Tidal Trust algorithm.
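The three weighting variants can be written as small functions of the hop distance l_{s→w}; this is a sketch, with the threshold variant returning a zero weight so that far-away seeds are blocked entirely.

```python
import math

def multiplicative_decay(alpha):
    # W_{s->w} = alpha ** l_{s->w}, with alpha < 1 (e.g. 0.9, 0.7, exp(-0.5), 0.5, exp(-1)).
    return lambda l: alpha ** l

def inverse_distance(l):
    # W_{s->w} = 1 / l_{s->w}; the seed itself (l = 0) gets full weight.
    return 1.0 / l if l > 0 else 1.0

def threshold(l_min):
    # Only paths shorter than l_min contribute; longer paths are blocked.
    return lambda l: 1.0 if l < l_min else 0.0

weight_fns = {"alpha=0.37": multiplicative_decay(math.exp(-1)),
              "inverse": inverse_distance,
              "thresh_3": threshold(3)}
```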
3 Linguistic Resources and Benchmark Lexicons for Evaluation
3.1 Building a Word Graph from Thesauruses
We built a large undirected graph of synonym and antonym relations between words, where the words were nodes and the synonym and antonym relations were edges. The graph was built from three Norwegian thesauruses [17]. The full graph consisted of a total of 6036 nodes (words) and 16475 edges. For all lexicons generated from the word graph, we started with a set of 109 seed words (51 positive and 57 negative). The seed words were manually selected; all of them are frequently used in the Norwegian language and span different dimensions of positive ('happy', 'clever', 'intelligent', 'love', etc.) and negative sentiment ('lazy', 'aggressive', 'hopeless', 'chaotic', etc.).
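A sketch of how such a signed word graph can be assembled with networkx follows; the thesaurus records shown are hypothetical examples, not the actual data behind the 6036-node graph.

```python
import networkx as nx

# Hypothetical thesaurus records: (word, related word, relation type).
thesaurus_entries = [
    ("glad", "lykkelig", "synonym"),    # 'happy' - 'joyful'
    ("glad", "trist", "antonym"),       # 'happy' - 'sad'
    ("flink", "dyktig", "synonym"),     # 'clever' - 'skilled'
]

G = nx.Graph()
for word, related, relation in thesaurus_entries:
    # Synonym edges carry sign +1, antonym edges sign -1.
    G.add_edge(word, related, sign=+1 if relation == "synonym" else -1)

print(G.number_of_nodes(), G.number_of_edges())
```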
3.2 Benchmark Lexicons
We generated sentiment lexicons from the word graph using the Label Propagation algorithm [1], which is the most popular method for generating sentiment lexicons. In the initial phase of the Label Propagation algorithm, each positive and negative seed word is given a score of 1 and −1, respectively, while all other nodes in the graph are given a score of 0. The algorithm then iterates over each non-seed word, updating its score using a weighted average of the scores of all neighboring nodes (nodes connected with an edge). When computing the weighted average, synonym and antonym edges are given weights 1 and −1, respectively. The algorithm is iterated until the change in score is below some threshold for all nodes. The resulting score for each node constitutes our sentiment lexicon.
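For reference, here is a minimal sketch of this Label Propagation baseline on the same signed graph (simplified; the convergence threshold and the ±1 edge weights follow the description above).

```python
def label_propagation(graph, seeds, tol=1e-4, max_iter=1000):
    # Seeds are fixed at +1 / -1; all other nodes start at 0 and are repeatedly
    # set to the sign-weighted average of their neighbours until scores converge.
    scores = {n: seeds.get(n, 0.0) for n in graph.nodes}
    for _ in range(max_iter):
        max_change = 0.0
        for node in graph.nodes:
            if node in seeds:
                continue
            neigh = list(graph[node])
            if not neigh:
                continue
            new = sum(graph[node][m]["sign"] * scores[m] for m in neigh) / len(neigh)
            max_change = max(max_change, abs(new - scores[node]))
            scores[node] = new
        if max_change < tol:
            break
    return scores
```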
As a complement to the graph-generated lexicons, we generated a benchmark sentiment lexicon by translating the well-known English sentiment lexicon AFINN [18] to Norwegian using machine translation (Google Translate). We also generated a second lexicon by manually checking and correcting several errors from the machine translation.

3.3 Text Resources for Evaluation
We tested the quality of the created sentiment lexicons using 15118 product reviews from the Norwegian online shopping sites www.komplett.no and mpx.no, and 4149 movie reviews from www.filmweb.no. Each product review contained a rating from one to five (five being the best), and each movie review a rating from one to six (six being the best).
4 Evaluation of Novel Approach Against Benchmark Lexicons
We generated a sentiment lexicon by applying our novel approach from Section 2 to the same word graph used for the Label Propagation benchmark lexicon, and by using the same seed words. The quality of this lexicon was evaluated by comparing it against the quality of the different benchmark sentiment lexicons, including the lexicon generated with Label Propagation. The quality of a sentiment lexicon is measured as the classification performance on the www.komplett.no, mpx.no and www.filmweb.no reviews.

For each lexicon, we computed the sentiment score of a review by simply adding together the scores of the sentiment words in the review that appear in the sentiment lexicon, which is the most common way to do it [19]. If the sentiment shifter 'not' ('ikke') appeared one or two words in front of a sentiment word, the sentiment score was switched; e.g. 'happy' ('glad') is given sentiment score 0.8, while 'not happy' ('ikke glad') is given score −0.8. Finally, the sum is divided by the number of words in the review, giving us the final sentiment score for the review. We also considered other sentiment shifters, e.g. 'never' ('aldri'), and other distances between sentiment word and shifter, but our approach seems to be the best for such lexicon approaches in Norwegian [20].

Classification Method
We divided the reviews into two parts, one part being training data and the other part being test data. We used the training data to estimate the average sentiment score of all reviews with each of the different ratings. The computed scores could look like Table 1.

Table 1. Average computed sentiment scores for reviews with different ratings
Rating                   1      2      3     4     5
Average sentiment score  −0.23  −0.06  0.04  0.13  0.24
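The review-scoring step described above can be sketched as follows; the tokenization, the lexicon entry and its score are illustrative.

```python
def review_score(tokens, lexicon, shifter="ikke", window=2):
    # Sum the lexicon scores; flip the sign if a shifter occurs one or two tokens
    # before the sentiment word; normalise by the review length.
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in lexicon:
            continue
        s = lexicon[tok]
        if shifter in tokens[max(0, i - window):i]:
            s = -s
        total += s
    return total / len(tokens) if tokens else 0.0

print(review_score("jeg er ikke glad".split(), {"glad": 0.8}))  # -0.8 / 4 = -0.2
```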
We classified a review from the test set by using the sentiment lexicon to compute a sentiment score for the test review and classifying it to the rating with the closest average sentiment score from the training set. E.g. if the computed sentiment score for the test review was −0.05 and the estimated averages were as given in Table 1, the review was classified to rating 2. In some rare cases, the estimated average sentiment scores were not monotonically increasing with the rating. Table 2 shows an example where the average for rating 3 is higher than for rating 4.

Table 2. Example where the sentiment scores were not monotonically increasing with the rating
Rating                   1      2      3     4     5
Average sentiment score  −0.23  −0.06  0.18  0.10  0.24

For such cases, the average of the two sentiment scores was computed, (0.10 + 0.18)/2 = 0.14, and the review was classified to 3 or 4 depending on whether its computed sentiment score was below or above 0.14, respectively.

Classification Performance
We evaluated the classification performance using the average absolute difference between the true and predicted rating for each review in the test set:
$$\text{Average abs. error} = \frac{1}{n}\sum_{i=1}^{n} |p_i - r_i| \qquad (2)$$
where n is the number of reviews in the test set, and p_i and r_i are the predicted and true ratings of review i in the test set. Naturally, a small average absolute error indicates that the sentiment lexicon performed well. Note that the focus of this article is not to achieve the best possible classification performance on the training material; if that were the goal, other more advanced and sophisticated techniques, such as machine learning based methods, would be used. Our goal is rather to evaluate and compare the performance of sentiment lexicons, and the framework described above is chosen with respect to that. For further details of this evaluation framework, see [17].
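A small sketch of the nearest-average rating prediction and the error measure of Eq. (2) is given below; the training averages and example ratings are placeholders (cf. Table 1).

```python
def predict_rating(score, rating_averages):
    # Classify to the rating whose average training-set sentiment score is closest.
    return min(rating_averages, key=lambda r: abs(rating_averages[r] - score))

def mean_abs_error(predicted, true):
    # Eq. (2): average absolute difference between predicted and true ratings.
    return sum(abs(p - r) for p, r in zip(predicted, true)) / len(true)

averages = {1: -0.23, 2: -0.06, 3: 0.04, 4: 0.13, 5: 0.24}   # cf. Table 1
print(predict_rating(-0.05, averages))                        # -> 2
print(mean_abs_error([2, 4, 5], [1, 4, 3]))                   # (1 + 0 + 2) / 3 = 1.0
```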
5 Results
This section presents the results of classification performance on product and movie reviews for different sentiment lexicons. The results are shown in Tables 3 and 4. AFINN and AFINN M refer to the translated and manually adjusted
AFINN sentiment lists, respectively. LABEL refers to the Label Propagation algorithm. The α values refer to the base in the multiplicative decay approach, TRESH refers to the blocking approach with a distance threshold, and INVERSE refers to the inversely proportional method.

Table 3. Classification performance for sentiment lexicons on komplett.no and mpx.no product reviews. The columns from left to right show the sentiment lexicon name, the number of words in the sentiment lexicon, the mean absolute error with standard deviation, and the 95% confidence interval for the mean absolute error.
Lexicon    N     Mean (Stdev)  95% conf. int.
AFINN      2161  1.4  (1.39)   (1.37, 1.43)
AFINN M    2260  1.45 (1.33)   (1.42, 1.48)
α = 0.37   6036  1.59 (1.47)   (1.55, 1.62)
α = 0.7    6036  1.59 (1.44)   (1.56, 1.63)
INVERSE    6036  1.6  (1.45)   (1.56, 1.63)
α = 0.61   6036  1.6  (1.46)   (1.57, 1.64)
α = 0.5    6036  1.61 (1.47)   (1.57, 1.64)
α = 0.9    6036  1.61 (1.47)   (1.58, 1.64)
TRESH 3    6036  1.64 (1.5)    (1.6, 1.67)
TRESH 4    6036  1.64 (1.49)   (1.6, 1.67)
LABEL      6036  1.67 (1.49)   (1.63, 1.7)

Training and test sets were created by
randomly adding an equal amount of reviews to both sets. All sentiment lexicons were trained and tested on the same training and test sets, making comparisons easier. This procedure was repeated several times, and every time the results were in practice identical to those in Tables 3 and 4, documenting that the results are independent of which reviews were added to the training and test sets.

For both review types, we observe some variation in classification performance, ranging from 1.39 to 1.67 for product reviews and from 1.98 to 2.25 for movie reviews. Comparing Tables 3 and 4, we see that the classification performance is poorer for movie reviews than for product reviews. It is known from the literature that sentiment analysis of movie reviews is normally harder than that of product reviews [19]; e.g. movie reviews typically contain a summary of the plot of the movie, which could contain many negative sentiment words (a sad movie), yet the movie can still get an excellent rating. For both review types, we see that the sentiment lexicon with a rapid multiplicative decay (α = 0.37) performs best among the automatically generated lists, which indicates that the quality of the information in the graph decays rapidly. It performs significantly better than the Label Propagation algorithm for both review types; paired t-tests give p-values of 1.32 · 10^{-4} and 0.018, respectively. Overall, the sentiment lexicons generated from Norwegian thesauruses using state-of-the-art graph methods do not outperform the automatically translated AFINN lists. This shows that machine translation of English linguistic resources can be used successfully for other, smaller languages such as Norwegian.
Table 4. Classification performance for sentiment lexicons on filmweb.no movie reviews. The columns from left to right show the sentiment lexicon name, the number of words in the sentiment lexicon, the mean absolute error with standard deviation, and the 95% confidence interval for the mean absolute error.
Lexicon    N     Mean (Stdev)  95% conf. int.
α = 0.37   6036  1.98 (1.15)   (1.93, 2.03)
AFINN M    2260  2.00 (1.12)   (1.96, 2.05)
α = 0.5    6036  2.03 (1.15)   (1.98, 2.08)
AFINN      2161  2.06 (1.14)   (2.01, 2.11)
LABEL      6036  2.06 (1.1)    (2.02, 2.11)
INVERSE    6036  2.1  (1.12)   (2.05, 2.15)
TRESH 4    6036  2.14 (1.18)   (2.09, 2.19)
α = 0.7    6036  2.16 (1.16)   (2.11, 2.21)
α = 0.61   6036  2.2  (1.15)   (2.15, 2.25)
TRESH 3    6036  2.21 (1.14)   (2.16, 2.26)
α = 0.9    6036  2.25 (1.13)   (2.2, 2.3)

6 Conclusion
In this paper, we have presented a simple but efficient algorithm for sentiment lexicon generation that is based on elements from Structural Balance Theory and Trust theory. In the same vein as Structural Balance Theory, the rationale of our algorithm is to enable transitive transfer of polarity from seed words through positive and negative relationships, which in this case are synonyms and antonyms. Furthermore, we borrow ideas from Trust theory so as to account for the decrease in "score certainty" as the distance between the seed and the word to be annotated increases. Thus, our algorithm possesses plausible properties that circumvent a major drawback of Label Propagation, namely that nodes far away from the seed words (typically in the close-to-zero-value area of the word graph) get values that are incorrect, since the inaccuracy in the synonym relationship is not properly taken into account. In addition, our algorithm requires only a single run, while Label Propagation must be run iteratively until the scores converge. We conducted extensive experiments on real-life data that demonstrate the feasibility of our approach and its superiority to Label Propagation. A promising research direction that we are currently investigating is to adopt our algorithm as an alternative to Label Propagation in other semi-supervised machine learning problems.
References
1. Zhu, X., Ghahramani, Z.: Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University (2002)
2. Hu, M., Liu, B.: Mining opinion features in customer reviews. In: Proceedings of AAAI, pp. 755–760 (2004)
3. Mohammad, S., Dunne, C., Dorr, B.: Generating high-coverage semantic orientation lexicons from overtly marked words and a thesaurus. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, vol. 2, pp. 599–608. Association for Computational Linguistics (2009)
4. Hu, M., Liu, B.: Mining and summarizing customer reviews. In: Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp. 168–177 (2004)
5. Kamps, J., Marx, M., Mokken, R.J., De Rijke, M.: Using WordNet to measure semantic orientations of adjectives (2004)
6. Miller, G.A.: WordNet: a lexical database for English. Communications of the ACM 38(11), 39–41 (1995)
7. Blair-Goldensohn, S., Hannan, K., McDonald, R., Neylon, T., Reis, G.A., Reynar, J.: Building a sentiment summarizer for local service reviews. In: WWW Workshop on NLP in the Information Explosion Era, p. 14 (2008)
8. Rao, D., Ravichandran, D.: Semi-supervised polarity lexicon induction. In: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pp. 675–682. Association for Computational Linguistics (2009)
9. Blum, A., Lafferty, J., Rwebangira, M.R., Reddy, R.: Semi-supervised learning using randomized mincuts. In: Proceedings of the Twenty-First International Conference on Machine Learning, p. 13. ACM (2004)
10. Hassan, A., Radev, D.: Identifying text polarity using random walks. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 395–403. Association for Computational Linguistics (2010)
11. Kim, S.M., Hovy, E.: Automatic identification of pro and con reasons in online reviews. In: Proceedings of the COLING/ACL on Main Conference Poster Sessions, pp. 483–490. Association for Computational Linguistics (2006)
12. Andreevskaia, A., Bergler, S.: Mining WordNet for a fuzzy sentiment: sentiment tag extraction from WordNet glosses. In: EACL, vol. 6, pp. 209–215 (2006)
13. Hassan, A., Abu-Jbara, A., Jha, R., Radev, D.: Identifying the semantic orientation of foreign words. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers, vol. 2, pp. 592–597. Association for Computational Linguistics (2011)
14. Leskovec, J., Huttenlocher, D., Kleinberg, J.: Predicting positive and negative links in online social networks. In: Proceedings of the 19th International Conference on World Wide Web, pp. 641–650. ACM (2010)
15. Kunegis, J., Lommatzsch, A., Bauckhage, C.: The Slashdot zoo: mining a social network with negative edges. In: Proceedings of the 18th International Conference on World Wide Web, pp. 741–750. ACM (2009)
16. Golbeck, J.A.: Computing and Applying Trust in Web-based Social Networks. PhD thesis, College Park, MD, USA (2005). AAI3178583
17. Hammer, H., Bai, A., Yazidi, A., Engelstad, P.: Building sentiment lexicons applying graph theory on information from three Norwegian thesauruses. In: Norwegian Informatics Conference (2014)
18. Nielsen, F.Å.: A new ANEW: evaluation of a word list for sentiment analysis in microblogs. CoRR abs/1103.2903 (2011)
19. Bing, L.: Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. Springer (2011) 20. Hammer, H.L., Solberg, P.E., Øvrelid, L.: Sentiment classification of online political discussions: a comparison of a word-based and dependency-based method. In: Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 90–96. Association for Computational Linguistics (2014)