
UMCC_DLSI-(SA): Using a ranking algorithm and informal features to solve Sentiment Analysis in Twitter

Yoan Gutiérrez, Andy González, Roger Pérez, José I. Abreu
University of Matanzas, Cuba
{yoan.gutierrez, roger.perez, jose.abreu}@umcc.cu, [email protected]

Antonio Fernández Orquín, Alejandro Mosquera, Andrés Montoyo, Rafael Muñoz
University of Alicante, Spain
[email protected], {amosquera, montoyo, rafael}@dlsi.ua.es

Franc Camara
Independent Consultant, USA
info@franccamara.com


Abstract

In this paper, we describe the development and performance of the supervised system UMCC_DLSI-(SA). This system uses corpora in which phrases are annotated as Positive, Negative, Objective, and Neutral to build new sentiment resources: word dictionaries with their associated polarities. The resulting sentiment inventories are applied, in conjunction with detected informal patterns, to tackle the challenges posed in Task 2b of the Semeval-2013 competition. Assessing the effectiveness of our application in sentiment classification, we obtained a 69% F-measure for neutral and an average of 43% F-measure for positive and negative on tweets and SMS messages.

1 Introduction



Textual information has become one of the most important sources of data from which to extract useful and heterogeneous knowledge. Texts can provide everything from factual information, such as descriptions, lists of characteristics, or instructions, to opinion-based information, such as reviews, emotions, or feelings. These facts have motivated work on the identification and extraction of opinions and sentiments in texts, a problem that requires special attention. Many researchers, such as (Balahur et al., 2010; Hatzivassiloglou and Wiebe, 2000; Kim and Hovy, 2006; Wiebe et al., 2005), have been working on this and related areas.

Several international competitions have been organized to assess Sentiment Analysis (SA) systems, among them Semeval-2010 (Task 18: Disambiguating Sentiment Ambiguous Adjectives, http://semeval2.fbk.eu/semeval2.php), the NTCIR Multilingual Opinion Analysis Task (MOAT, http://research.nii.ac.jp/ntcir/ntcir-ws8/meeting/), TASS (Workshop on Sentiment Analysis at SEPLN, http://www.daedalus.es/TASS/), and Semeval-2013 (Task 2: Sentiment Analysis in Twitter, http://www.cs.york.ac.uk/semeval-2013/task2/) (Kozareva et al., 2013). In this paper, we introduce a system for Task 2b of the Semeval-2013 competition.

1.1 Task 2 Description

In "Task 2: Sentiment Analysis in Twitter" of Semeval-2013, the goal was to take a given message and its topic and classify whether the message conveyed a positive, negative, or neutral sentiment towards the topic. For messages conveying both a positive and a negative sentiment toward the topic, the stronger of the two was taken as the classification. Task 2 included two subtasks. Our team focused on Task 2b, which provides two training corpora, described in Table 3, and two test corpora: 1) sms-test-input-B.tsv (with 2094 SMS messages) and 2) twitter-test-input-B.tsv (with 3813 Twitter messages).

The following section reviews some background approaches. Subsequently, section 3 describes the UMCC_DLSI-(SA) system used in Task 2b. Section 4 assesses the obtained resource on the sentiment classification task. Finally, conclusions and future work are presented in section 5.

2 Background


The use of sentiment resources has proven to be a necessary step for training and evaluating systems that implement sentiment analysis, including fine-grained opinion mining (Balahur, 2011).



In order to build sentiment resources, several studies have been conducted. One of the first relevant works is (Hu and Liu, 2004), which expanded a lexicon using the synonymy and antonymy relations provided by WordNet (Fellbaum, 1998; Miller et al., 1990). Related research by (Hu and Liu, 2004; Liu et al., 2005) obtained an Opinion Lexicon composed of a list of positive and negative opinion words or sentiment words for English (around 6800 words). A similar approach was used to build WordNet-Affect (Strapparava and Valitutti, 2004), which expands six basic categories of emotion, thus increasing the lexicon paths in WordNet.

Nowadays, many sentiment and opinion messages are provided by Social Media. To deal with the informalities present in these sources, intermediary systems are needed that improve the level of understanding of the messages. The following section describes this phenomenon and a tool to track it.

2.1 Text normalization

Several informal features are present in opinions extracted from Social Media texts, and some research has been conducted on lexical normalization for this kind of text. TENOR (Mosquera and Moreda, 2012) is a multilingual text normalization tool for Web 2.0 texts that aims to transform noisy and informal words into their canonical form, so that they can be easily processed by NLP tools and applications. TENOR works by identifying out-of-vocabulary (OOV) words such as slang, informal lexical variants, expressive lengthening, or contractions using a dictionary lookup, and replacing them with matching formal candidates in a word lattice using phonetic and lexical edit distances.
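The following toy sketch illustrates the kind of lookup-and-replace normalization scheme described above; the vocabulary, slang table, and matching threshold are illustrative assumptions, not TENOR's actual resources or implementation.

```python
import difflib

# Toy in-vocabulary dictionary; a real system would use a full lexicon.
VOCABULARY = {"tomorrow", "you", "before", "going", "to", "see", "great"}

# Small slang/contraction table (illustrative only).
SLANG = {"2moro": "tomorrow", "u": "you", "b4": "before", "gr8": "great"}

def strip_lengthening(word: str) -> str:
    """Collapse expressive lengthening, e.g. 'goooing' -> 'gooing' (heuristic)."""
    out = []
    for ch in word:
        if len(out) >= 2 and out[-1] == ch and out[-2] == ch:
            continue  # keep at most two consecutive repeats
        out.append(ch)
    return "".join(out)

def normalize(word: str) -> str:
    """Map an out-of-vocabulary token to a canonical candidate."""
    w = word.lower()
    if w in VOCABULARY:          # already in-vocabulary
        return w
    if w in SLANG:               # direct dictionary lookup
        return SLANG[w]
    w = strip_lengthening(w)
    # Fall back to the lexically closest in-vocabulary candidate.
    match = difflib.get_close_matches(w, VOCABULARY, n=1, cutoff=0.8)
    return match[0] if match else w

print([normalize(t) for t in "c u 2moro goooing gr8".split()])
# -> ['c', 'you', 'tomorrow', 'going', 'great']
```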

2.2 Construction of our own Sentiment Resource

Having analyzed the examples of SA described in section 2, we proposed building our own sentiment resource (Gutiérrez et al., 2013) by adding lexical and informal patterns to obtain classifiers that can deal with Task 2b of Semeval-2013. We proposed the use of a method named RA-SR (using Ranking Algorithms to build Sentiment Resources) (Gutiérrez et al., 2013).

RA-SR builds sentiment word inventories based on senti-semantic evidence obtained after exploring text with annotated sentiment polarity information. Through this process, a graph-based algorithm is used to obtain auto-balanced values that characterize sentiment polarities, a well-known technique in Sentiment Analysis. This method consists of three key stages: (I) building contextual word graphs; (II) applying a ranking algorithm; and (III) adjusting the sentiment polarity values. These stages are shown in Figure 1. The development of the sentiment resource starts from four corpora of annotated sentences (the first with neutral sentences, the second with objective sentences, the third with positive sentences, and the last with negative sentences).

[Figure 1. Resource walkthrough development process. Four sets of annotated phrases (positive, negative, neutral, objective) each pass through: (I) construction of a contextual word graph; (II) application of the ranking algorithm with reinforcement weights (words from the sentiment lexicons receive weight 1, all other words the default weight 1/N); and (III) adjustment of the sentiment polarity values.]

2.3 Building contextual word graphs

Initially, text preprocessing is performed by applying a POS-tagging tool (in this case Freeling version 2.2 (Atserias et al., 2006)) to convert all words to lemmas (the canonical forms of the words). After that, the obtained lists of lemmas are sent to RA-SR and divided into four groups: neutral, objective, positive, and negative candidates.

As the first set of results, four contextual graphs are obtained: Gneu, Gobj, Gpos, and Gneg, where each graph contains the words/lemmas from the neutral, objective, positive, and negative sentences respectively. These graphs are generated by connecting all words of each sentence within the individual sets of annotated sentences, in concordance with their annotations (Pos, Neg, Obj, Neu). Once the four graphs representing neutral, objective, positive, and negative contexts are created, we assign weights in order to apply graph-based ranking techniques that auto-balance the particular importance of each vertex v_i in Gneu, Gobj, Gpos, and Gneg. As the primary output of the graph-based ranking process, the positive, negative, neutral, and objective values are calculated using the PageRank algorithm and normalized with equation (1). For a better understanding of how the contextual graphs are built, see (Gutiérrez et al., 2013).

2.4 Applying a ranking algorithm

To apply a graph-based ranking process, it is necessary to assign weights to the vertices of the graph. Words involved in Gneu, Gobj, Gpos, and Gneg take a default weight of 1/N to define the weight vector v used by our ranking algorithm, where N is the maximum number of words in the current graph. Words identified in the sentiment repositories (see Table 4) as positive or negative, in relation to the respective graph, are instead assigned a weight of 1 (on a [0...1] scale). A graph-based ranking algorithm is then applied to structurally raise the voting power of the graph vertices. Once these reinforcement values are in place, the ranking algorithm is able to increase the significance of the words related to the empowered vertices.

Our work was inspired by the PageRank (Brin and Page, 1998) adaptation popularized by (Agirre and Soroa, 2009) for Word Sense Disambiguation, which has obtained relevant results. The main idea behind this algorithm is that, for each edge between v_i and v_j in graph G, a vote is cast from v_i to v_j, increasing the relevance of v_j. Moreover, the strength of the vote from i to j depends on the relevance of v_i: the more important the vertex, the more strength its votes have. PageRank can thus be seen as a random walk over the interconnections of G, where the final relevance of v_i represents the probability of the walk ending at v_i. In our system we use a damping factor c = 0.85 and, as in (Agirre and Soroa, 2009), 30 iterations. A detailed explanation of the PageRank algorithm can be found in (Agirre and Soroa, 2009).

After applying PageRank, in order to obtain standardized values for all graphs, we normalize the rank values by applying equation (1), where Max(Pr) is the maximum value in the rank vector Pr:

Pr_i = Pr_i / Max(Pr)    (1)
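The following is a minimal sketch of stages (I) and (II) for a single polarity graph, using networkx's personalized PageRank as a stand-in for the adaptation described above; the sentences and seed lexicon are toy assumptions.

```python
import networkx as nx

# Toy positive-annotated sentences, already lemmatized (stage I input).
positive_sentences = [
    ["movie", "be", "great", "fun"],
    ["great", "actor", "love", "movie"],
]
seed_positive = {"great", "love"}  # words found in the sentiment lexicons

# (I) Build the contextual graph: connect all lemma pairs that
# co-occur in the same annotated sentence.
G_pos = nx.Graph()
for sentence in positive_sentences:
    for i, w1 in enumerate(sentence):
        for w2 in sentence[i + 1:]:
            G_pos.add_edge(w1, w2)

# (II) Reinforcement weights: 1 for lexicon seeds, 1/N for the rest.
n = G_pos.number_of_nodes()
weights = {w: 1.0 if w in seed_positive else 1.0 / n for w in G_pos}

# Personalized PageRank with damping factor 0.85 and 30 iterations.
ranks = nx.pagerank(G_pos, alpha=0.85, personalization=weights, max_iter=30)

# Normalize by the maximum rank, as in equation (1).
max_rank = max(ranks.values())
pos_values = {w: r / max_rank for w, r in ranks.items()}
print(sorted(pos_values.items(), key=lambda kv: -kv[1]))
```

In the full method this is run once per graph (Gpos, Gneg, Gobj, Gneu), yielding four normalized values per lemma.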

2.5 Adjusting the sentiment polarity values

After applying the PageRank algorithm on Gneu, Gobj, Gpos, and Gneg, and normalizing their ranks, we obtain a final list of lemmas (named Lf) that avoids repeated elements. Each lemma Lf_i in Lf has, at this point, four assigned values: Neutral, Objective, Positive, and Negative, each corresponding to a rank calculated by the PageRank algorithm. For each lemma in Lf, the following equations are then applied to select its definitive subjective polarity:

Pos = Pos − Neg, if Pos > Neg; 0 otherwise    (2)
Neg = Neg − Pos, if Neg > Pos; 0 otherwise    (3)

where Pos is the Positive value and Neg the Negative value associated with each lemma in Lf. In order to standardize the Pos and Neg values again and make them more representative on a [0...1] scale, we apply a normalization process over the Pos and Neg values. From there, based on the objectivity features discussed by (Baccianella et al., 2010), we adopt the same premise to establish an alternative objective value for each lemma, using equation (4):

ObjAlt = 1 − |Pos − Neg|    (4)

where ObjAlt represents the alternative objective value.
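A minimal sketch of the polarity adjustment of equations (2)-(4) for a single lemma follows; the intermediate re-normalization of Pos and Neg over the whole lemma list is omitted for brevity.

```python
def adjust_polarity(pos: float, neg: float):
    """Equations (2)-(4): keep only the dominant polarity and derive
    the alternative objectivity value. Inputs are the normalized
    PageRank scores of one lemma in G_pos and G_neg."""
    pos_adj = pos - neg if pos > neg else 0.0   # equation (2)
    neg_adj = neg - pos if neg > pos else 0.0   # equation (3)
    obj_alt = 1.0 - abs(pos_adj - neg_adj)      # equation (4)
    return pos_adj, neg_adj, obj_alt

# The verb "to be" example from section 4: Pos = Neg = 1 after
# normalization, so both polarities cancel and ObjAlt becomes 1.
print(adjust_polarity(1.0, 1.0))   # -> (0.0, 0.0, 1.0)
```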

As a result, each word in the sentiment resource has an associated value for: positivity (Pos, see equation (2)), negativity (Neg, see equation (3)), objectivity (Real_obj, obtained by PageRank over Gobj and normalized with equation (1)), calculated objectivity (ObjAlt, hereafter cited as obj_measured), and neutrality (Neu, obtained by PageRank over Gneu and normalized with equation (1)).

3 System Description


The system takes annotated corpora as input, from which two models are created: one using only the data provided at Semeval-2013 (Restricted Corpora, see Table 3), and the other using extra data from other annotated corpora (Unrestricted Corpora, see Table 3). In all cases, the phrases are preprocessed with the Freeling 2.2 POS-tagger (Atserias et al., 2006), while a copy of the dataset is normalized using TENOR (described in section 2.1).

The system starts by extracting two sets of features. The Core Features (see section 3.1) are the sentiment measures, calculated for both the standard and the normalized phrase. The Support Features (see section 3.2) are based on regularities observed in the training dataset, such as emoticons, uppercase words, and so on.

The supervised models are created using Weka (http://www.cs.waikato.ac.nz/) and a Logistic classifier, which the system uses to predict the values of the test dataset. The classifier was selected after analyzing several alternatives, such as Support Vector Machines, J48, and REPTree; the Logistic classifier proved to be the best, increasing the results by around three percentage points. The test data is preprocessed in the same way as the training corpora, and the same feature extraction is applied. With the aforementioned features and the generated models, the system classifies the final values of Positivity, Negativity, and Neutrality.

3.1 The Core Features

The Core Features are a group of measures based on the resource created earlier (see section 2.2). The system takes a sentence preprocessed by Freeling 2.2 and TENOR.



For each lemma of the analyzed sentence, pos, neg, obj_measured, real_obj, and neu are calculated using the respective word values assigned by RA-SR. The obtained values correspond to the sum of the corresponding values of each word shared between the analyzed sentence (list of lemmas) and the resource obtained by RA-SR; these attributes are then normalized by dividing them by the number of words involved in the process. Other calculated attributes are pos_count, neg_count, obj_measured_count, real_obj_count, and neu_count, which count the involved words for each feature type (Pos, Neg, Real_obj, ObjAlt, and Neu respectively) whose respective value is greater than zero. The attributes cnp and cnn count the lemmas of the phrase contained in the positive and negative Sentiment Lexicons respectively. All 12 attributes described above are computed for both the original and the normalized (using TENOR) phrase, totaling 24 attributes. The Core Features are described in Table 1.

Feature Name | Description
pos, neg, obj_measured, real_obj, neu | Sum of the respective value of each word.
pos_count, neg_count, obj_measured_count, real_obj_count, neu_count | Counts the words whose respective value is greater than zero.
cnp (positive), cnn (negative) | Counts the words contained in the Sentiment Lexicons for the respective polarity.

Table 1. Core Features.
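As an illustration, the following sketch computes the pos and pos_count attributes for one phrase; the toy RESOURCE_POS dictionary stands in for the RA-SR resource, and the remaining value types are computed analogously.

```python
# Toy RA-SR resource: lemma -> positivity value (analogous dictionaries
# exist for neg, obj_measured, real_obj and neu).
RESOURCE_POS = {"great": 1.0, "love": 0.8, "movie": 0.3, "be": 0.0}

def core_pos_features(lemmas):
    """Compute the 'pos' and 'pos_count' attributes for one phrase."""
    hits = [RESOURCE_POS[w] for w in lemmas if w in RESOURCE_POS]
    pos_count = sum(1 for v in hits if v > 0)        # words with value > 0
    involved = len(hits)
    pos = sum(hits) / involved if involved else 0.0  # normalized sum
    return pos, pos_count

print(core_pos_features(["movie", "be", "great"]))   # -> (0.433..., 2)
```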

3.2 The Support Features

The Support Features are a group of measures based on characteristics of the phrases, which may help decide extreme cases. The emotPos and emotNeg values are the numbers of positive and negative emoticons found in the phrase, and exc and itr are the numbers of exclamation and question marks in the phrase. Table 2 shows the attributes that represent the Support Features.

Feature Name | Description
emotPos, emotNeg | Counts the respective emoticons.
exc (exclamation marks "!"), itr (question marks "?") | Counts the respective marks.
WORDS_count | Counts the uppercase words.
WORDS_pos, WORDS_neg | Sums the respective values of the uppercase words.
WORDS_pos_count_res (positivity), WORDS_neg_count_res (negativity) | Counts the uppercase words contained in the respective graph.
WORDS_pos_count_dict (positivity), WORDS_neg_count_dict (negativity) | Counts the uppercase words contained in the Sentiment Lexicons (see Table 4) for the respective polarity.
wooords_count | Counts the words with repeated characters.
wooords_pos, wooords_neg | Sums the respective values of the words with repeated characters.
wooords_pos_count_dict (positive lexical resource), wooords_neg_count_dict (negative lexical resource) | Counts the words with repeated characters contained in the respective lexical resource.
wooords_pos_count_res (positive graph), wooords_neg_count_res (negative graph) | Counts the words with repeated characters contained in the respective graph.

Table 2. The Support Features.
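A sketch of how a few of these surface attributes could be extracted; the emoticon sets and regular expressions are illustrative assumptions, not the system's actual definitions.

```python
import re

POS_EMOTICONS = {":)", ":-)", ":D", ";)"}   # illustrative sets
NEG_EMOTICONS = {":(", ":-(", ":'("}

def support_features(phrase: str) -> dict:
    """Extract a few of the Table 2 attributes from a raw phrase."""
    tokens = phrase.split()
    return {
        "emotPos": sum(t in POS_EMOTICONS for t in tokens),
        "emotNeg": sum(t in NEG_EMOTICONS for t in tokens),
        "exc": phrase.count("!"),
        "itr": phrase.count("?"),
        "WORDS_count": sum(t.isupper() and t.isalpha() for t in tokens),
        # words with expressive lengthening, e.g. "sooooo"
        "wooords_count": sum(bool(re.search(r"(\w)\1{2,}", t)) for t in tokens),
    }

print(support_features("SO happy :) sooooo good!!"))
# -> {'emotPos': 1, 'emotNeg': 0, 'exc': 2, 'itr': 0,
#     'WORDS_count': 1, 'wooords_count': 1}
```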

4 Evaluation


In the construction of the sentiment resource, we used the annotated sentences provided by the corpora described in Table 3. These resources were selected to test the functionality of our proposal for annotating words with subjectivity and objectivity. Note that the shaded rows correspond to the constrained-run corpora: tweeti-bsub.dist_out.tsv (dist) and b1_tweeti-objorneub.dist_out.tsv (objorneu), both from Semeval-2013 Task 2 (Sentiment Analysis in Twitter, subtask b), and twitter-dev-input-B.tsv (dev, http://www.cs.york.ac.uk/semeval-2013/task2/). The unconstrained-run corpora comprise all of the above plus Computational-intelligence (CI), a sentiment corpus obtained by applying techniques developed by the GPLSI department (see http://gplsi.dlsi.ua.es/gplsi11/allresourcespanel), and the stno corpus from the NTCIR Multilingual Opinion Analysis Task (MOAT, http://research.nii.ac.jp/ntcir/ntcir-ws8/meeting/). The sentiment lexicons used are the WordNetAffect_Categories (http://wndomains.fbk.eu/wnaffect.html) and opinion-words (http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html) files, detailed in Table 4.

Some issues were taken into account throughout this process. For instance, in a contextual graph G, factotum words are present in most of the involved sentences (e.g., the verb "to be"). This becomes very dangerous after applying the PageRank algorithm, because the algorithm


strengthens the nodes that possess many linked elements. For that reason, the subtractions Pos − Neg and Neg − Pos are applied: the most frequent words in all contexts obtain high values in every graph, and the subtraction acts as a damping factor. As an example, take the verb "to be". Before applying equation (1), it achieves the highest rank values in each subjective context graph (Gpos and Gneg), namely 9.94 and 18.67 respectively. Once equation (1) is applied, these values are normalized, both becoming Pos = 1 and Neg = 1 on the [0...1] scale. When the following steps are executed (equations (2) and (3)), the verb "to be" obtains Pos = 0 and Neg = 0, and therefore ObjAlt = 1. In this way, words that appear frequently in both contexts (Positive and Negative) are discarded as subjective words.

Corpus   | N    | P    | O   | Neu  | Obj or Neu | Unk   | T     | C | UC
dist     | 176  | 368  | 110 | 34   |            |       | 688   | X | X
objorneu | 828  | 1972 | 788 | 1114 | 1045       |       | 5747  | X | X
dev      | 340  | 575  |     |      | 739        |       | 1654  | X | X
CI       | 6982 | 6172 |     |      |            |       | 13154 |   | X
stno     | 1286 | 660  |     | 384  |            | 10000 | 12330 |   | X
Total    | 9272 | 9172 | 898 | 1532 | 1045       | 10000 | 31919 |   |

Table 3. Corpora used to apply RA-SR. Positive (P), Negative (N), Objective (O), Neutral (Neu), Unknown (Unk), Total (T), Constrained (C), Unconstrained (UC).

Sources | P | N | T
WordNet-Affects_Categories (Strapparava and Valitutti, 2004) | 629 | 907 | 1536
opinion-words (Hu and Liu, 2004; Liu et al., 2005) | 2006 | 4783 | 6789
Total | 2635 | 5690 | 8325

Table 4. Sentiment Lexicons. Positive (P), Negative (N) and Total (T).

Run  | C     | Inc  | Precision (%) P / N / Neu | Recall (%) P / N / Neu | Total (%) Prec / Rec / F1
Run1 | 8032  | 1631 | 80.7 / 83.8 / 89.9 | 90.9 / 69.5 / 86.4 | 84.8 / 82.3 / 82.9
Run2 | 19101 | 4671 | 82.2 / 77.3 / 89.4 | 80.7 / 81.9 / 82.3 | 83.0 / 81.6 / 80.4

Table 5. Training dataset evaluation using cross-validation (Logistic classifier, 10 folds). Constrained (Run1), Unconstrained (Run2), Correct (C), Incorrect (Inc).
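The models behind Table 5 were trained in Weka with a Logistic classifier and 10-fold cross-validation; the following sketch reproduces that evaluation setup with scikit-learn as an analogue, using random placeholder data in place of the real feature vectors.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# X: one row of core + support features per phrase; y: polarity labels.
# Random placeholders stand in for the real Semeval-2013 training data.
rng = np.random.default_rng(0)
X = rng.random((500, 24))
y = rng.integers(0, 3, 500)   # 0 = negative, 1 = neutral, 2 = positive

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=10, scoring="f1_macro")
print(f"10-fold macro F1: {scores.mean():.3f}")
```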

4.1 The training evaluation

In order to assess the effectiveness of our trained classifiers, we performed several evaluation tests. Table 5 shows the relevant results obtained after applying our system in a controlled environment (a specific domain). The best results were obtained with the restricted corpus, which suggests that the extra information used to increase the knowledge was unbalanced or perhaps of poor quality.

4.2 The test evaluation

The test dataset evaluation is shown in Table 6, where our system's results are compared with the best result in each case. We notice that the constrained run is better in almost every aspect, and in the few cases where it was lower, the difference was minimal. This again suggests that the information used to increase our sentiment resource was unbalanced (a large difference between the quantities of the tagged types of annotated phrases) or of poor quality.

Comparing these scores with the ones our system obtained on the training data, we notice that on the test dataset the results fell in the middle of the effectiveness scores. After seeing these results (Table 5 and Table 6), we conclude that our system performs better in a controlled environment (or specific domain); to make it more robust, the system must be trained with a bigger and more balanced dataset.

Table 6 shows the results obtained by our system alongside the best results of Task 2b of Semeval-2013. The difference between our runs and the best systems is around 20 percentage points.

Runs      | C    | Inc  | Precision (%) P / N / Neu | Recall (%) P / N / Neu | Total Prec / Rec / F1
1_tw      | 2082 | 1731 | 60.9 / 46.5 / 52.8 | 49.8 / 41.4 / 64.1 | 53.4 / 51.8 / 49.3
1_tw_cnd  | 2767 | 1046 | 81.4 / 69.7 / 67.7 | 66.7 / 60.4 / 82.6 | 72.9 / 69.9 / 69.0
2_tw      | 2026 | 1787 | 58.0 / 42.2 / 42.2 | 52.2 / 43.9 / 57.4 | 47.4 / 51.2 / 49.0
2_tw_ter  | 2565 | 1248 | 71.1 / 54.6 / 68.6 | 74.7 / 59.4 / 63.1 | 64.8 / 65.7 / 64.9
1_sms     | 1232 | 862  | 43.9 / 46.1 / 69.5 | 55.9 / 31.7 / 68.9 | 53.2 / 52.2 / 43.4
1_sms_cnd | 1565 | 529  | 73.1 / 55.4 / 85.2 | 73.0 / 75.4 / 75.3 | 71.2 / 74.5 / 68.5
2_sms     | 1023 | 1071 | 38.4 / 31.4 / 68.3 | 60.0 / 38.3 / 47.8 | 46.0 / 48.7 / 40.7
2_sms_ava | 1433 | 661  | 60.9 / 49.4 / 81.4 | 65.9 / 63.7 / 71.0 | 63.9 / 66.9 / 59.5

Table 6. Test dataset evaluation using official scores. Correct (C), Incorrect (Inc).

Table 6 run descriptions are as follows:
- UMCC_DLSI_(SA)-B-twitter-constrained (1_tw)
- NRC-Canada-B-twitter-constrained (1_tw_cnd)
- UMCC_DLSI_(SA)-B-twitter-unconstrained (2_tw)
- teragram-B-twitter-unconstrained (2_tw_ter)
- UMCC_DLSI_(SA)-B-SMS-constrained (1_sms)
- NRC-Canada-B-SMS-constrained (1_sms_cnd)
- UMCC_DLSI_(SA)-B-SMS-unconstrained (2_sms)
- AVAYA-B-sms-unconstrained (2_sms_ava)

As the training and testing evaluation tables show, our training-stage scores are higher than the best scores in Task 2b of Semeval-2013, which means that we need to identify the features that differ between the two datasets (training and testing). For that reason, we checked how many words our system (more concretely, our sentiment resource) missed. Table 7 shows that our system missed around 20% of the distinct words present in the test dataset.

Dataset            | hits  | miss | miss (%)
twitter            | 23807 | 1591 | 6.26%
sms                | 12416 | 2564 | 17.12%
twitter non-repeat | 2426  | 863  | 26.24%
sms non-repeat     | 1269  | 322  | 20.24%

Table 7. Quantity of words used by our system over the test dataset.

5 Conclusion and further work


Based on what we have presented, we can say that we developed a system able to address the SA challenge with promising results. The presented system demonstrated excellent performance on a specific domain (see Table 5), with results over 80%. Note also that, as part of the SA process, our system automatically builds sentiment resources from annotated corpora. For future research, we plan to evaluate RA-SR on different corpora, to address the number of neutral instances, and to find more words with which to evaluate the obtained sentiment resource.

Acknowledgments

This research work has been partially funded by the Spanish Government through the projects TEXT-MESS 2.0 (TIN2009-13391-C04), "Análisis de Tendencias Mediante Técnicas de Opinión Semántica" (TIN2012-38536-C03-03), and "Técnicas de Deconstrucción en las Tecnologías del Lenguaje Humano" (TIN2012-31224), and by the Valencian Government through the project PROMETEO (PROMETEO/2009/199).

References

Agirre, E. and A. Soroa. Personalizing PageRank for Word Sense Disambiguation. Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL-2009), Athens, Greece, 2009.

Atserias, J.; B. Casas; E. Comelles; M. González; L. Padró and M. Padró. FreeLing 1.3: Syntactic and semantic services in an open-source NLP library. Proceedings of LREC'06, Genoa, Italy, 2006.

Baccianella, S.; A. Esuli and F. Sebastiani. SENTIWORDNET 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. Proceedings of the 7th Language Resources and Evaluation Conference (LREC 2010), Valletta, Malta, 2010. 2200-2204 p.

Balahur, A. Methods and Resources for Sentiment Analysis in Multilingual Documents of Different Text Types. Department of Software and Computing Systems, University of Alicante, 2011. 299 p.

Balahur, A.; E. Boldrini; A. Montoyo and P. Martinez-Barco. The OpAL System at NTCIR 8 MOAT. Proceedings of the NTCIR-8 Workshop Meeting, Tokyo, Japan, 2010. 241-245 p.

Brin, S. and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 1998, 30(1-7): 107-117.

Fellbaum, C. WordNet: An Electronic Lexical Database. The MIT Press, 1998.

Gutiérrez, Y.; A. González; A. F. Orquín; A. Montoyo and R. Muñoz. RA-SR: Using a ranking algorithm to automatically build resources for subjectivity analysis over annotated corpora. Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA 2013), Atlanta, Georgia, 2013.

Hatzivassiloglou, V. and J. Wiebe. Effects of Adjective Orientation and Gradability on Sentence Subjectivity. Proceedings of the International Conference on Computational Linguistics (COLING-2000), 2000.

Hu, M. and B. Liu. Mining and Summarizing Customer Reviews. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), USA, 2004.

Kim, S.-M. and E. Hovy. Extracting Opinions, Opinion Holders, and Topics Expressed in Online News Media Text. Proceedings of the Workshop on Sentiment and Subjectivity in Text at COLING/ACL 2006, Sydney, Australia, 2006. 1-8 p.

Kozareva, Z.; P. Nakov; A. Ritter; S. Rosenthal; V. Stoyanov and T. Wilson. Sentiment Analysis in Twitter. Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval 2013). Association for Computational Linguistics, 2013.

Liu, B.; M. Hu and J. Cheng. Opinion Observer: Analyzing and Comparing Opinions on the Web. Proceedings of the 14th International World Wide Web Conference (WWW-2005), Japan, 2005.

Miller, G. A.; R. Beckwith; C. Fellbaum; D. Gross and K. Miller. Five papers on WordNet. Princeton University, Cognitive Science Laboratory, 1990.

Mosquera, A. and P. Moreda. TENOR: A Lexical Normalisation Tool for Spanish Web 2.0 Texts. Text, Speech and Dialogue - 15th International Conference (TSD 2012). Springer, 2012.

Strapparava, C. and A. Valitutti. WordNet-Affect: an affective extension of WordNet. Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004), Lisbon, 2004. 1083-1086 p.

Wiebe, J.; T. Wilson and C. Cardie. Annotating Expressions of Opinions and Emotions in Language. Kluwer Academic Publishers, Netherlands, 2005.

