Cogn Comput DOI 10.1007/s12559-017-9503-3

Sentence-Level Emotion Detection Framework Using Rule-Based Classification

Muhammad Zubair Asghar 1 & Aurangzeb Khan 2 & Afsana Bibi 1 & Fazal Masud Kundi 1 & Hussain Ahmad 1

Received: 23 December 2016 / Accepted: 1 August 2017 © Springer Science+Business Media, LLC 2017

Abstract Emotion detection and analysis aims at developing applications that can detect and analyse emotions expressed by the users in a given text. Such applications have received considerable attention from experts in computer science, psychology, communications and health care. Emotion-based sentiment analysis can be performed using supervised and unsupervised techniques. The existing studies using supervised and unsupervised emotion-based sentiment analysis are based on Ekman’s basic emotion model; have limited coverage of emotion-words, polarity shifters and negations; and lack emoticons and slang. The problems associated with existing approaches can be overcome by the development of an effective, sentence-level emotion-detection sentiment analysis system under a rule-based classification scheme with extended lexicon support and an enhanced model of emotion signals: emotion words, polarity shifters, negations, emoticons and slang. In this work, we propose a rule-based framework for emotion-based sentiment classification at the sentence level obtained from user reviews. The main contribution of this work is to integrate cognitive-based emotion theory (e.g. Ekman’s model) with sentiment analysis-based computational techniques (e.g. detection of emotion words, emoticons and slang) to detect and classify emotions from natural language text. The main focus is to improve the performance of state-of-the-art methods by including additional emotion-related signals, such as emotion words, emoticons, slang, polarity shifters and negations, to efficiently detect and classify emotions in user reviews. The improved results in terms of accuracy, precision, recall and F-measure demonstrate the superiority of the proposed method’s classification results compared with baseline methods. The framework is generalized and capable of classifying emotions in any domain.

Keywords Opinion mining · Sentiment analysis · Emotion detection · Emoticons · Slang

* Muhammad Zubair Asghar [email protected]

1 Institute of Computing and Information Technology (ICIT), Gomal University, Dera Ismail Khan, Pakistan

2 Department of Computer Science, University of Science and Technology, Bannu, Pakistan

Introduction

Cognitive science is an interdisciplinary area of research with a focus on addressing different cognitive processes and intelligent human behaviours such as perception, thinking, remembering, learning, reasoning and emotions. Among these faculties, emotions are very important for identifying human social behaviours. Computational methods for human emotion analysis have been addressed by many researchers for quite some time [1, 2]. Emotion detection and analysis from online text is a new and challenging area in natural language processing (NLP) and sentiment analysis (SA), drawing considerable attention from NLP and SA researchers in recent times. This area is not only theoretical; rather, it has several applications in disciplines such as computer science, psychology, communications, health and education. Because text-based emotion detection is a merger of cognitive science and human neurology, it can successfully bridge the gap between the abstractions of cognitive science and the more emergent area of emotion detection from user reviews posted on social media sites [2, 3].

An emotion is a feeling of joy, sorrow, fear or hate expressed by a person in a textual, visual or spoken form. Emotions are closely related to mood, temper, personality and incentive [2]. The basic aim of emotion-based SA systems is to detect and analyse emotions expressed by users in an online text with emphasis on assigning an accurate sentiment score to each emotion-bearing word and sentence [3]. An input text can be classified into a specific emotional category with the help of emotion signals present in the text [1, 4]. In emotion-based SA, we try to extract emotions expressed by users towards entities such as products, policies, services, issues and organizations.

Emotion-based sentiment analysis works at different levels, such as word, phrase and sentence [4]. In sentence-level emotion recognition, the emotion category is decided for each individual sentence in a given document. At this level, each sentence is considered an independent unit, expressing a single emotion type. Several studies have been conducted on sentence-level emotion detection with emphasis on improving the performance of classifiers in text [4–7]. In the context of biologically inspired emotion detection techniques, the detection and classification of emotions from text is both challenging and important.

An emotion-based sentiment analysis system has two basic building blocks, namely the acquisition of emotion-related resources (e.g. an emotion-related lexicon) and a module to detect and analyse emotions in user-generated text using the acquired emotion-related resources. Much work [5–9] has already been performed to build emotion-related resources. However, little work has been performed on how to use the acquired emotion-related lexical resources to analyse users’ emotions. Early studies conducted for emotion detection and analysis are generally based on supervised or unsupervised approaches, in which the supervised approaches largely depend upon the availability of large annotated datasets, and performance degrades in multiple domains due to slow training. The unsupervised approaches are dependent upon the availability of emotive lexicons.
Furthermore, such approaches use a basic emotion model having limited coverage of emotion words, polarity shifters, negations, emoticons and slang acting as emotion signals. Therefore, it becomes difficult to analyse emotions using previous techniques to compute sentiment strength of emotions from text.

The main challenges in developing sentence-level emotion-based sentiment analysis applications are as follows: (i) lack of sufficient coverage of emotion words: the existing Ekman’s basic emotion model¹ has a limited coverage of emotion categories and words; (ii) polarity shifters and negations: these are mostly given less attention in emotion-related sentiment classification, which often results in performance degradation of the emotion classifiers; (iii) emoticons: limited lexical resources for emotion-based emojis produce low accuracy of the system in detecting polarity of an emotion-based text at the sentence level; and (iv) slang terms: abbreviations and irregular terms used in social media posts often result in incorrect classification of emotion-based text.

¹ https://www.paulekman.com/wp-content/uploads/2013/07/Basic-Emotions.pdf

The aforementioned problems often result in inadequate scalability of the model, low precision and low accuracy in the classification of a user’s emotions expressed in social media text. In addition, it is difficult to address polarity shifters, negated emotions and slang in practice. Therefore, our work attempts to bridge this gap and to enhance the performance of the emotion-based sentiment analysis system using the acquired emotional resources and a set of rule-based classifiers for classifying different emotion signals, such as emotion words, emoticons and slang.

In this work, an integrated framework is proposed for emotion-based sentiment classification at the sentence level from user reviews using a rule-based classification scheme. The main focus is on detecting and classifying emotion words, emoticons, slang, polarity shifters and negations for emotion detection and classification of user reviews in different domains. The proposed framework is motivated by the prior studies [4, 6, 7] conducted on emotion-based sentiment analysis. Sun et al. [4] proposed a multifaceted cognitive model to analyse emotions with an emphasis on different emotional aspects. Das and Bandyopadhyay [6] introduced a sentence-level emotion tagging system, and Crossley et al. [7] proposed a sentiment and cognition engine-based tool for text analysis. The previous studies used a basic emotion model with limited coverage of emotion words, polarity shifters, emoticons and slang as informal emotion signals for classifying the emotions expressed by users in an online text.
However, we introduce an emotion classification approach at the sentence level using an emotion word classifier (EWC), emoticon classifier (EC), slang classifier (SC) and mixed-mode classifier (MMC) in a pipelined approach. This research aims at improving the performance of emotion-based sentiment classification by introducing a rule-based classification scheme at the sentence level.

The present work’s main contribution to the cognitive computation community is to use a cognitive and computational framework to detect and classify emotions from natural language text based on rule-based classification. Emotions represent an individual’s psychological state, such as his/her mental and physical attributes. Therefore, it is necessary to combine emotion theory (e.g. Ekman’s model) with natural language processing techniques, such as the detection and classification of emotion-related words, emoticons and slang. The emphasis is on the development of an emotive lexicon and an integrated emotion-based sentiment classification system based on a set of rule-based classifiers: emotion word classifier, emoticon classifier, slang classifier and mixed-mode classifier. A synopsis of the work’s contributions is as follows:

• Development of extended emotion lexicon: We develop an extended emotion sentiment lexicon (EESL) to store emotion words and their sentiment scores, which is an enhancement of the existing emotion model based on Ekman’s basic emotion model to provide maximum coverage of emotion-related words and their synsets. Furthermore, we propose two lexical resources for storing informal emotion signals, namely emoticons and slang, along with their sentiment scores.

• Emotion-word signal classifier: We propose a classification algorithm to detect and classify formal emotion signals such as emotion-words, polarity shifters and negations.

• Emoticon-slang signal classifier: We propose a classification algorithm to detect and classify informal emotion signals such as emoticons and slang.

• Mixed-mode classifier: We develop a mixed-mode emotion signal classifier that considers all types of emotion signals, such as emotion words, emoticons and slang.

In working towards achieving these aims, the intention is to contribute knowledge to the scientific literature by developing an enhanced emotion detection system. The main emphasis is on detecting emotion words, slang and emoticons and on finding the polarity class and score of emotion words and tweets in a given input text.

The contribution of this work is significant for the following reasons. First, it provides a sophisticated set of publicly available emotive resources: an extended emotion sentiment lexicon, an emoticon lexicon and a slang lexicon. These emotive resources could assist researchers in using cognitive-based SA applications. Second, the algorithms developed for the detection and classification of emotion words, polarity shifters, negations, emoticons and slang are simple and effective; this can help SA researchers develop more complex techniques in the field of cognitive computation. Finally, the mixed-mode emotive classification technique is generic in that it classifies all the emotive signals (emotion words, polarity shifters, negations, emoticons and slang) using a single classifier. This study’s contribution could provide an opportunity for cognitive-based SA researchers to develop a more sophisticated set of classifiers at the sentence level and document level.

The rest of the paper is structured as follows: the “Related Work” section reviews related work on emotion-based sentiment analysis. In the “Material and Methods” section, we describe the proposed method for creating different lexical resources and a framework for emotion-based sentiment analysis. The “Experiment Setup” section describes the dataset compilation and pre-processing and the evaluation of the developed system. The last section concludes the work with a discussion on how it can be expanded in the future.

Related Work

Emotion detection in sentiment analysis is an important and challenging task due to the ambiguous nature of emotions and the rich collection of emotion words. Several approaches have already been proposed for emotion recognition from text. In this section, an overview of the related work is presented and discussed.

Sun et al. [4] proposed a multifaceted cognitive model for the interpretation of emotions and other complex phenomena. It is comprised of different subsystems: (i) action centred, (ii) non-action centred, (iii) motivational and (iv) metacognitive. They identified a number of important emotion-related aspects and fitted the components into the Connectionist Learning with Adaptive Rule Induction Online (CLARION) framework. However, additional emotion-related aspects, such as polarity shifters, negations, emoticons and slang terms, need to be incorporated to address the social media content effectively.

Mohammad and Kiritchenko [5] performed emotion detection from tweets through emotion-word hashtags by using the datasets “Hashtag Emotion Corpus” and “Headlines”. They generated a large lexicon of word-emotion associations from an emotion-labelled tweet corpus. The experimental results obtained on six basic emotions using the SVM classifier show an improvement in emotion association accuracy. However, synonyms of emotion words and emotions associated with different morphological forms of the emotion words are not addressed.

Das and Bandyopadhyay [6] introduced a sentence-level emotion tagging system based on the Conditional Random Field model using different lexical resources, such as WordNet Affect, SentiWordNet (SWN) and SenticNet. The system proposed by [6] uses three emotion scoring methods and a post-processing module. It outperforms the lexicon-based methods. However, inclusion of other emotion signals, such as emoticons and slang, can improve the performance of this model.

Crossley et al. [7] introduced a sentiment and cognition engine-based tool to analyse text. It performs text negation, parts-of-speech tagging and sentiment scoring of input text using different dictionaries. Different word vectors are developed to measure the users’ sentiments, emotions and social relationships. However, such word vectors only deal with emotion and sentiment words, and there is no provision for quantifying emotions from emoticons and slang terms, which, if incorporated, can increase the efficiency of their system.


Sentic Computing [8] is an emerging approach for developing SA applications; it exploits the principles of computer science and the social sciences. It operates at the concept level and unveils the hidden content in texts. Sentic Computing integrates common sense reasoning and a novel emotion categorisation model [9] to classify the text at both the paragraph level and the sentence level.

Shaila and Vadivel [10] proposed a supervised learning method for detecting and classifying text into emotion tags at the sentence level. The Artificial Neural Network model is employed to isolate patterns of positive and negative emotions for capturing the psychology of a person. They reported that words and phrases play a pivotal role in emotion detection in sentences. Intensifiers are considered for emotion detection; however, other emotion triggers, such as emoticons and slang terms, are not considered, although incorporating them could produce more efficient results.

When working on affective computing, Cambria [11] identified two basic building blocks, namely (i) emotion recognition and (ii) sentiment detection. The emotion recognition module extracts emotion signals, and the sentiment detection aims at classifying text into positive and negative classes. These two tasks are applied in two phases in a pipeline approach; in the first phase, sentiments are detected, and in the next stage, the sentence is tagged to a particular emotion category. In most sentiment analysis systems, emotion classification is performed after sentiment classification.

In their work on emotion detection and analysis, Quan and Ren [12] used a polynomial kernel method, a supervised machine learning approach for computing similarities between sentences and basic sets of emotions. They used a corpus of Chinese emotions, namely Ren-CECps [13], and achieved 90% accuracy with respect to recognition of basic emotions. However, the system lacks the ability to handle more linguistic expressions such as negations, polarity shifters and punctuation.

Socher et al. [14] developed a parsed text corpus (treebank) by annotating sentences with syntactic and semantic structures at a fine-grained level. A Recursive Neural Tensor Network was introduced to estimate sentiments at the sentence level. They achieved an accuracy of approximately 90% for all phrases and negations by splitting the dataset into training and testing corpora. At the sentence level, they improved on the state-of-the-art methods, from 80 to 85.4%.

Poria et al. [15] proposed a method for the extraction of features from short multimedia content such as text, audio and video clips by using convolutional multiple kernel learning-based classifiers. Thus, activation values are applied in the inner layer of a Deep Convolutional Neural Network (DCNN) model. Experimental results show that they achieved an improvement of 14% when compared with other methods.

In their work on a convolutional neural network model, Severyn and Moschitti [16] performed sentiment analysis at a microblog level by using a deep learning technique. The proposed model accurately trains the seed words, which are subsequently used as input to the deep learning model. The major advantage of their system is that it does not need support features for training the model on the Twitter dataset. Results obtained show performance improvement over the baseline methods at both the phrase and sentence levels.

The supervised machine learning-based approaches generally aim at constructing a classification model on a large annotated corpus. The accuracy of such approaches is largely based on the quality of the annotation, and the training process takes a longer time. Moreover, when such algorithms are applied to other domains, the results are not efficient.

To capture contextual polarity at different granularity levels, Muhammad et al. [17] proposed a lexicon-based sentiment classification system. The system works at a fine-grained level of sentiment classification and overcomes the limitation of lexicon-based systems in terms of incorrect sentiment scoring. To address this issue, domain-centric vocabulary is incorporated to enhance the performance of sentiment classification.

Pensa et al. [18] proposed an integrated framework for modelling user behaviour on social media by using a concept-level graph. The users are identified along with their activities, opinions, behaviour and relationship with their contacts. Temporal analysis is performed by formulating the temporal relationships. The major limitation is the lack of event detection for automatic classification of hot topics in social media posts, which, if incorporated, could improve the results.

Chaumartin [19] proposed a knowledge-based approach for detecting six specific emotions by using different lexical resources, such as WordNet [20], WordNetAffect [21] and SentiWordNet [22]. They achieved high accuracy; however, the system can be extended to address a wide range of emotion categories along with polarity shifters and negations.

In [23], Agrawal and An proposed an unsupervised, context-based approach for detecting emotion from text at the sentence level. They classified the sentences using semantic and syntactic dependencies. The system did not need an annotated dataset and had no dependency upon an affect lexicon. However, proposing affective semantic relatedness measures and addressing polarity modifiers, negations and slang can enhance the performance of the system.

Gievska et al. [24] proposed a hybrid approach to categorize text emotions into Ekman’s six basic emotions. They extended the lexicon with the SenticNet [25] resources for contextual analysis to achieve better performance in emotion detection. However, more promising results can be achieved by incorporating valence shifters, negations and informal text constructs such as emoticons and slang.


To model user behaviour in advertising, Boratto et al. [26] proposed a technique to detect segments of users from different sources. For this purpose, user preferences are incorporated, because most of the user’s time is spent on formulating queries for information retrieval. Thereafter, retrieved content is analysed with respect to the user’s preferences, and the extracted words are compiled in vector form. Real-world datasets are used to evaluate the performance of the proposed system.

The existing approaches for the development of lexicon-based systems for the detection and analysis of emotions in user-generated reviews primarily rely on one or more publicly available sentiment lexicons, such as SWN, WordNet and WordNetAffect. Furthermore, the approaches have used a basic model of emotions that has limited coverage of textual, pictorial and slang-based emotions. To overcome the aforementioned problems associated with machine learning and lexicon-based approaches, we propose an enhanced rule-based classification system for the detection and analysis of emotions expressed in user-generated text.

From the overview of the aforementioned studies, it is clear that the existing supervised approaches for emotion detection are largely dependent upon the availability of large annotated datasets, and that performance degrades in multiple domains due to slow training processes. The problem with unsupervised approaches is due to two issues, namely (i) their dependency upon publicly available sentiment and emotive lexicons and (ii) their use of a basic emotion model with limited support for emoticons, slang, modifiers or negations. Therefore, there is a need for a more accurate and precise classification method that can classify emotions with improved performance.
To bridge this gap, there is a need to create a comprehensive emotion-based sentiment system based on a rule-based classification that can provide a sufficient coverage of emotion words, emoticons, slang, negations and polarity shifters with accurate polarity scores and at the same time achieve more robust results comparable to the performance results of supervised and unsupervised approaches.

Material and Methods

To address the problems associated with supervised and unsupervised techniques for emotion detection and to improve the performance of emotion-based sentiment analysis, we propose a rule-based approach for emotion detection at the sentence level. The proposed framework (Fig. 1) operates at the sentence level and uses five rule-based modules, namely (i) lexical resource generation, (ii) emotion word classifier (EWC), (iii) emoticon classifier (EC), (iv) slang classifier (SC) and (v) mixed-mode classifier (MMC). After applying pre-processing steps, the input text is passed through the aforementioned four classifiers for sentence-level emotion-based sentiment classification. Details of each module are presented in the following subsections.

Lexical Resources

In this study, we used and generated different lexical resources. Details are provided in the following.

Extended Emotion Lexicon

In the existing supervised and unsupervised methods for emotion-based sentiment analysis, there is a lack of sufficient coverage of emotion words because most systems are based on Ekman’s basic emotion model, which has a limited coverage of emotion categories and words. To address this problem, we developed an extended emotion lexicon (EEL) derived from Ekman’s model [27] of six primitive emotions, namely (i) fear, (ii) anger, (iii) happiness, (iv) disgust, (v) surprise and (vi) sadness. We propose to extend Ekman’s model by acquiring synsets of each basic emotion word of Ekman’s model from the NRC emotion association lexicon [28]. Furthermore, we propose to extend the basic categories of Ekman’s model by introducing two emotion categories, i.e. embarrassment and reactive [29]. The extended lexicon (available at: https://figshare.com/s/2581c9fd16b9b1826a93) contained 1000 affective words that are associated with different emotional categories: 135 words for fear, 125 for anger, 146 for happiness, 200 for disgust, 119 for surprise, 120 for sadness, 75 for reactive and 80 for embarrassment. A partial list of the extended emotion lexicon is presented in Table 1. Different sentiment lexicons are used to assign sentiment scores to emotion words. We present a description of two of the most widely used sentiment lexicons: SenticNet and SWN.

SenticNet Lexicon

SenticNet is a publicly available sentiment lexicon for concept-level sentiment analysis [30]. It is based on Sentic computing, a paradigm that combines Semantic Web and Artificial Intelligence approaches using graph mining and multidimensional scaling techniques.
This lexicon is comprised of more than 30,000 concepts and their sentiment scores, on a scale ranging from −1 to 1. Concepts may be single-word or multi-word expressions. The SenticNet lexicon provides concepts and their interrelated terms at a deeper level. However, our extended emotion lexicon contains words and their associated synsets without establishing any dependency relationship; therefore, to assign a polarity score to each emotion word in our extended lexicon, we use the SentiWordNet (SWN)-based scoring technique. The SWN lexicon uses a wide range of words and sentiment scores.

Fig. 1 The proposed system. Components shown: emotion review datasets; pre-processing (tokenization, emoticon and slang filtering, tweet cleansing, lemmatization, part-of-speech tagging, spell correction); sentence-level emotion detection using rule-based classification with the emotion word classifier (EWC), emoticon classifier (EC), slang classifier (SC) and mixed-mode classifier (MMC); lexicons (extended emotion lexicon (EEL), emoticon lexicon, slang lexicon, polarity shifter lexicon); output sentence tagged with emotion category.
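As a rough illustration of how a lexicon such as the EEL can be consulted, the following Python sketch indexes a handful of words from Table 1 by emotion category and counts the categories matched in a tokenized sentence. The names EEL_SAMPLE, build_index and emotion_counts are hypothetical, the word sample is only a small excerpt, and the counting step is an assumption made for illustration; it is not the classifier proposed in this work.

```python
from collections import Counter

# Hypothetical excerpt of the extended emotion lexicon (words taken from Table 1).
EEL_SAMPLE = {
    "fear": {"terror", "dread", "phobia", "fright", "horror"},
    "anger": {"rage", "fury", "wrath", "outrage", "resentment"},
    "happiness": {"joy", "delight", "elation", "pleasure"},
    "sadness": {"grief", "sorrow", "mourning", "weeping"},
    "embarrassment": {"shame", "bashfulness", "shyness"},
}


def build_index(lexicon):
    """Invert category -> words into word -> set of categories.

    A set is used because a word may be listed under several categories.
    """
    index = {}
    for category, words in lexicon.items():
        for word in words:
            index.setdefault(word, set()).add(category)
    return index


def emotion_counts(tokens, index):
    """Count how many tokens of a sentence match each emotion category."""
    counts = Counter()
    for token in tokens:
        for category in index.get(token.lower(), ()):
            counts[category] += 1
    return counts


INDEX = build_index(EEL_SAMPLE)
counts = emotion_counts("The horror and dread filled him with grief".split(), INDEX)
```

Here `counts` holds one entry per matched category, so a downstream rule could, for instance, pick the category with the largest count; that choice is likewise an assumption.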

SentiWordNet Lexicon

SWN is a general-purpose lexicon with more than 60,000 synsets obtained dynamically from WordNet [20]. An SWN-based sentiment score computation works as follows:

\[
\mathrm{sen\_score}_{\mathrm{swn}}(w_i) =
\begin{cases}
\mathrm{sen\_score}^{+} & \text{if } \max\left(\mathrm{sen\_score}^{+}, \mathrm{sen\_score}^{-}, \mathrm{sen\_score}^{0}\right) = \mathrm{sen\_score}^{+} \\
\mathrm{sen\_score}^{-} & \text{if } \max\left(\mathrm{sen\_score}^{+}, \mathrm{sen\_score}^{-}, \mathrm{sen\_score}^{0}\right) = \mathrm{sen\_score}^{-} \\
\mathrm{sen\_score}^{0} & \text{else}
\end{cases}
\tag{1}
\]

The score sen_score_swn(w_i) is positive if the average positive score (sen_score+) is greater than both the average negative (sen_score−) and neutral (sen_score0) scores. A negative polarity score is obtained by the same rule and is reported with a negative sign. The polarity score is considered neutral if the average positive and average negative polarity scores are identical or the average neutral polarity score is greater than both the positive and negative scores. For example, the sentiment score set {sen_score+, sen_score−, sen_score0} for the emotion word “terror” is {0.0625, 0.281, 0}; therefore, sen_score_swn(“terror”) = −0.281, which is negative. Using Eq. 1, a sentiment score for each emotion word in our extended emotion lexicon is computed and used for subsequent processing (Table 2).

Slang Lexicon

We develop a slang lexicon (SL) from two slang resources, namely “noslang.com” and “onlineslangdictionary.com”, by acquiring 500 slang terms and their meanings (available at: https://figshare.com/s/2581c9fd16b9b1826a93). We asked five human annotators to assign positive, negative and neutral polarity classes and scores (+0.5, +1.0, 0, −0.5 and −1.0) to each slang term. The manual annotation and scoring of the entire slang list yields five votes for each slang term. Based on a majority-voting scheme, the annotation with the maximum number of votes is deemed the winner. Thus, we acquired 55% positive, 40% negative and 5% neutral slang terms. A partial list of such slang terms is presented in Table 3.

Emoticon Lexicon

The existing emotion-based sentiment analysis systems face the problem of low accuracy in the detection and classification of emoticons. This problem arises due to limited lexical resources for emotion-related emoticons [24]. To address this problem, we developed an Emoticon Lexicon (EL) from the existing

Table 1 A sample list of emotion words taken from EEL

| No. | Emotion category | Emotion type | Extended emotion synsets/words (proposed) |
|---|---|---|---|
| 1 | Fear | Ekman’s basic emotion | Terror, dread, shock, phobia, apprehension, fright, horror, trepidation |
| 2 | Anger | Ekman’s basic emotion | Anger/annoyance, vexation, exasperation, crossness, irritation, irritability, indignation, pique, displeasure, resentment, rage, fury, wrath, outrage, temper, road rage, air rage, irascibility, ill temper, dyspepsia, waspishness, informal aggravation, literary ire, choler, bile |
| 3 | Happiness | Ekman’s basic emotion | Pleasure, joy, enjoyment, euphoria, amusement, ecstasy, orgasm, serenity, gladness, love, delight, elation, happiness, excitement, courage, pride, satisfaction, serene, calm, relaxed, relieved, passionate, interest, attraction, allure, appeal, charm, beauty |
| 4 | Disgust | Ekman’s basic emotion | Boredom, loathing, hate, hatred, revulsion, repugnance, abhorrence, antipathy, aversion, rejection, dislike, odium, discomfort, disapproval, dirty, wrong, badness, anger, contempt, irritation, despair, disappointed |
| 5 | Surprise | Ekman’s basic emotion | Astonishment, amazement, incredulity, bewilderment, stupefaction, wonder, confusion, disbelief, consternation, shock, rude awakening, eye-opener, turn up for the books, shocker, whammy, unexpected, unforeseen |
| 6 | Sadness | Ekman’s basic emotion | Grief, sorrow, pensiveness, mourning, lament, lamentation, crying, weeping, dirge, elegy, hurt, helplessness, powerlessness, worry |
| 7 | Embarrassment | (Proposed) | Shame, awkward situation, puzzle, mess, bashfulness, shyness, timidity, tangle |
| 8 | Reactive | (Proposed) | Sharp, soft-hearted, sensitive, receptive, sympathetic, active, conscious, kind-hearted, sensible, impressionable |

emoticon lists²,³, ignoring duplicate terms. We acquired a list of 450 emoticons (available at: https://figshare.com/s/2581c9fd16b9b1826a93). To assign a polarity class and score to the emoticons in our lexicon, we asked the five human annotators to assign scores of +0.5 (positive), +1.0 (positive), −0.5 (negative), −1.0 (negative) and 0 (objective). The score closest to the average of the five annotators’ scores is assigned to each emoticon. In 87.32% of the cases, the five annotators assigned similar scores to the emoticons. The annotators were told to categorize the emoticons into eight categories as proposed by the Human Machine Interaction Network on Emotion [28]. The acquired emoticon lexicon is used by the rule-based classifier proposed for detecting and classifying emotion-related emojis (see the “Emoticon Classifier” section). A sample set of entries from the emoticon lexicon is presented in Table 4.

² Emoticons, available at: http://netforbeginners.about.com/cs/netiquette101/a/bl_emoticons101.htm, last accessed on Nov 20, 2016.
³ Emoticons, available at: http://www.sharpened.net/emoticons/, last accessed on Nov 20, 2016.

Rule-Based Classification of Emotion Signals

Rule-based classification is used to classify the emotions in user reviews using a set of “if-then” rules [31]; the if clause is called the “rule antecedent”, and the then clause is called the “rule consequent”. We constructed 35 rules for emotion signal detection. Romanyshyn [32] proposed 20 rules for sentiment classification of user reviews in the Ukrainian language. However, we proposed 38 rules for the detection and classification of emotion signals, such as emotion words, emoticons and slang, with the support of different lexical resources as discussed in the “Lexical Resources” section. We categorize the emotion signals into three categories, namely (i) emotion words, (ii) emoticons and (iii) slang, and propose rule-based classifiers for the detection and classification of each emotion signal.

The proposed system works at the sentence level. At this level, each sentence is treated as an independent entity, expressing different emotions with specific thresholds based on the sentiment score computed for each emotion category. For this purpose, we tokenize each review into a set of individual sentences (see the “Pre-processing” section) that are stored in

Table 2 Partial list of emotion words with polarity scores taken from EEL

Emotion word | Sentiment score using Eq. 1
Best | +0.305
Awkward | −0.521
Weeping | −0.208
Pleasure | +0.2
Horror | −0.458
Shock | −0.213

Table 3 A partial list of slang terms with their sentiment class and scores

Emotion tags | Slang | Description | Polarity class | Polarity score
Fear | FOGC | Fear of getting caught | Negative | −1
Fear | FUD | Fear, uncertainty and doubt | Negative | −1
Sadness | CID | Crying in disgrace | Negative | −1
Sadness | CLAB | Crying like a baby | Negative | −1
Happiness | Happs | Happy | Positive | +1
Happiness | Hehehe | Laughing | Positive | +1
Anger | BIH | Burn in hell | Negative | −1
Disgust | BS | Bullshit | Negative | −1
Disgust | Smh | Shaking my head | Negative | −1
Surprise | Damn | Condemn/disbelief | Negative | −1
Surprise | Blower | A surprise; a big shock | Negative | −1
Surprise | Random | Unexpectedly great | Positive | +1
Embarrassment | Dac | “Embarrassed”. Pronounced das-ed or das | Negative | −1
Embarrassment | Moded | Embarrassed. Usually used after someone does something stupid | Negative | −1
Reactive | Notta | Not | Negative | −1
Reactive | Air | Alright | Positive | +1

the Excel file. In this work, we use the terms tweet and sentence interchangeably. Formally, let T be a set of tweets, represented as follows:

$$T = \{t_1, t_2, t_3, \ldots, t_n\}$$

Let Em be a list of basic emotion categories, “fear”, “anger”, “happiness”, “disgust”, “surprise” and “sadness”, represented as follows:

Em(e_i) = {list of emotion categories}
Em(S_i) = {emotion category assigned to sentence S}

Emotion Word Classifier

The EWC aims at classifying opinion words as emotion signals by utilizing a variety of lexical resources, namely SentiWordNet and an Extended Emotion Lexicon (EEL). The EWC (Fig. 2) works as follows: (i) computation and aggregation of the sentiment scores of emotion words, (ii) selection of the emotion category of emotion words with the highest sentiment score, (iii) polarity shifter classifier (PSC) and (iv) negation handling.

Computation and Aggregation of Sentiment Score of Emotion Words

The EEL is used to retrieve the sentiment scores of emotion words belonging to different emotion categories, such as “fear”, “anger”, “happiness”, “disgust”, “surprise”, “sadness”, “embarrassment” and “reactive”. Furthermore, sentiment scores of one or more emotion words pertaining to a specific category are computed and aggregated as follows:

$$\text{emotion\_words\_sent\_score}(\text{emotion\_catg})_i = \sum_{\substack{1 \le i \le m \\ 0 < j < n}} \text{pol\_score}_{ew}(i, j), \quad \text{if } ew_j \in S \,\wedge\, ew_j \in EEL \,\wedge\, ew_j \in (\text{emotion\_catg})_i \tag{2}$$

where “emotion_words_sent_score(emotion_catg)_i” is the aggregated sentiment score of all of the emotion words (j) in an input tweet pertaining to the ith emotion category (“fear”, “anger”, “happiness”, “disgust”, “surprise”, “sadness”, “embarrassment” and “reactive”), and “pol_score_ew(i, j)” is the polarity score of the jth emotion word ew in the ith emotion category. The if clause states that emotion word j belongs to sentence S, to the extended emotion lexicon EEL and to the ith emotion category. For example, consider the sentence “smh! I don’t have anything to do, I feel lost ☹, I’m bored”, which contains the emotion word “lost” (ew_j). Using Eq. 1, its sentiment score is computed as −0.45; using EEL, it belongs to the emotion category (emotion_catg) of “sadness”. Using Eq. 2, the sentiment scores of all of the emotion words belonging to an emotion category (sadness) are aggregated. In this case, there is a single word, “lost”, belonging to the “sadness” category; therefore, computation of Eq. 2 occurs as follows:

Table 4 Partial list of emoticons with their sentiment class and scores

Emotion category | Emotion synsets (from EEL) | Emoticon symbol | Sentiment score | Sentiment class
Happiness | Pleasure, joy, enjoyment, euphoria, amusement, ecstasy, orgasm, serenity, gladness, love, delight, elation, happiness, excitement, courage, pride, satisfaction, serene, calm, relaxed, relieved, passionate interest, attraction, allure, appeal, charm, beauty | :->, :-D, :), :-) =D, :3 :> :ˆ) :-3 =>:-V =v | +1.0 | Positive
Sadness | Hurt, grief, sorrow, pensiveness, mourning, … | :( | −1.0 | Negative
Reactive | …, receptive, sympathetic, active, conscious, kind-hearted, sensible, impressionable | (y), :p | −1.0/+1.0 | Negative/positive

Further emoticon symbols listed in the table with a sentiment score of −1.0 and sentiment class Negative: :O, :o, O.o, o.O, :$, :-$, =-O
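The annotation rule used for the emoticon lexicon (each emoticon receives the allowed score closest to the mean of the five annotators' scores) can be illustrated with a minimal sketch. The function names and the tie-breaking behaviour are assumptions made for illustration; the paper does not state how ties are resolved.

```python
from statistics import mean

# Allowed emoticon scores from the annotation scheme described in the text.
ALLOWED_SCORES = (-1.0, -0.5, 0.0, 0.5, 1.0)


def consensus_score(annotator_scores):
    """Return the allowed score closest to the mean of the annotators' scores.

    Assumption: on an exact tie, the lower candidate wins (not specified in the paper).
    """
    avg = mean(annotator_scores)
    return min(ALLOWED_SCORES, key=lambda s: abs(s - avg))


def polarity_class(score):
    """Map a numeric score to the polarity class names used in the text."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "objective"


if __name__ == "__main__":
    scores = [1.0, 1.0, 0.5, 1.0, 0.5]  # hypothetical annotations for one emoticon
    final = consensus_score(scores)
    print(final, polarity_class(final))
```

Snapping to the nearest allowed value keeps the lexicon on the same five-point scale the annotators used, rather than storing raw averages.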

Fig. 2 Flow diagram of emotion-word classifier (EWC). A sentence S having N emotion words (EW1, EW2, …, EWn) as emotion signals is processed as follows: using Eq. 2, the sentiment scores of the words are aggregated for each of the categories “Fear”, “Anger”, “Happiness”, “Disgust”, “Surprise”, “Sadness”, “Embarrassment” and “Reactive”; using Eq. 3, the emotion category E with the highest sentiment score is selected from the aggregated scores; using Eq. 4, sentence S is tagged with emotion category E.

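$$\text{emotion\_words\_sent\_score}(\text{emotion\_catg})_{\text{“Sadness”}} = \sum_{\substack{1 \le i \le m \\ 0 < j < n}} \text{“lost”} = \sum_{\substack{1 \le i \le m \\ 0 < j < n}} (-0.45)$$

$$\text{emotion\_words\_sent\_score}(\text{emotion\_catg})_{\text{“Sadness”}} = (-0.45)$$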

Similarly, “bored” is an emotion word (ew_j); using Eq. 1, its sentiment score is computed as −0.187, and using EEL, it belongs to the emotion category (emotion_catg) of “disgust”. Using Eq. 2, the sentiment scores of all of the emotion words belonging to an emotion category (disgust) are aggregated. In this case, there is a single word, “bored”, belonging to the “disgust” category; therefore, computation of Eq. 2 occurs as follows:

$$\text{emotion\_words\_sent\_score}(\text{emotion\_catg})_{\text{“Disgust”}} = \sum_{\substack{1 \le i \le m \\ 0 < j < n}} \text{“bored”} = \sum_{\substack{1 \le i \le m \\ 0 < j < n}} (-0.187)$$

$$\text{emotion\_words\_sent\_score}(\text{emotion\_catg})_{\text{“Disgust”}} = (-0.187)$$
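The per-category aggregation of Eq. 2 can be sketched in a few lines of Python. The tiny `EEL` dictionary below is a hypothetical excerpt holding only the two words and scores quoted in the worked example, and the whitespace tokenizer is a simplification rather than the paper's pre-processing step.

```python
from collections import defaultdict

# Hypothetical EEL excerpt: word -> (emotion category, polarity score from Eq. 1).
EEL = {
    "lost": ("sadness", -0.45),
    "bored": ("disgust", -0.187),
}


def aggregate_emotion_scores(sentence, lexicon=EEL):
    """Sum the polarity scores of the emotion words in a sentence, per category (Eq. 2)."""
    totals = defaultdict(float)
    for token in sentence.lower().split():
        word = token.strip(".,!?;:\"'")  # naive punctuation stripping (assumption)
        if word in lexicon:
            category, score = lexicon[word]
            totals[category] += score
    return dict(totals)


if __name__ == "__main__":
    s = "smh! I don't have anything to do, I feel lost, I'm bored"
    print(aggregate_emotion_scores(s))
```

Each category keeps its own running total, so a sentence with several words from one category contributes their sum to that category only.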

From the aforementioned computations, the sentiment scores of all of the emotion words (lost, bored) belonging to their respective emotion categories (sadness, disgust) are computed and aggregated. In the next step, we select the emotion category with the highest sentiment score.

Selecting the Emotion Category of Emotion Words with the Highest Sentiment Score

After computing the aggregate sentiment scores of each emotion category based on emotion words, we next find the emotion category with the highest aggregate sentiment score (considering magnitude only and ignoring the positive/negative signs) as follows:

$$\text{selected\_emotion\_words\_sent\_score}(ew_i) = \max_{1 \le i \le m} \left( \text{emotion\_words\_sent\_score}(\text{emotion\_catg})_i \right) \tag{3}$$

To find the emotion category with the highest sentiment score, we insert in the R.H.S. of Eq. 3 the sentiment scores of the emotion words of particular emotion categories (already computed in Eq. 2), as follows:

$$\text{selected\_emotion\_words\_sent\_score} = \max_{1 \le i \le m} \left( \operatorname{abs}(-0.45),\ \operatorname{abs}(-0.187) \right)$$

$$\text{selected\_emotion\_words\_sent\_score}(\text{“lost”}_{\text{“Sadness”}}) = -0.45$$

Tagging a Sentence/Tweet with a Particular Emotion Category Based on Emotion Words

Finally, the input sentence/tweet is tagged with a specific emotion category based on the emotion class with the highest sentiment score of emotion words, as follows:

$$\text{Emotion\_tag}(S_i) = (\text{emotion\_catg})_i, \quad \text{if } \text{selected\_emotion\_words\_sent\_score}(ew_i) = \text{emotion\_words\_sent\_score}(\text{emotion\_catg})_i \,\wedge\, i \in \{\text{“fear”}, \text{“anger”}, \text{“happiness”}, \text{“disgust”}, \text{“sadness”}, \text{“surprise”}, \text{“embarrassment”}, \text{“reactive”}\} \tag{4}$$

where “(emotion_catg)_i” is the ith emotion category belonging to any of the main emotion categories “fear”, “anger”, “happiness”, “disgust”, “surprise”, “sadness”, “embarrassment” or “reactive”; “selected_emotion_words_sent_score(ew_i)” is the highest sentiment score of a particular emotion category w.r.t. emotion words, obtained from Eq. 3; “emotion_words_sent_score(emotion_catg)_i” is the aggregated sentiment score of a particular emotion category computed using Eq. 2; and “Emotion_tag(S_i)” is the emotion tag assigned to a particular tweet/sentence.

From the above computation, the score of “lost” (sadness) is −0.45, which is greater in magnitude (comparison without signs) than that of “bored” (disgust), −0.187; therefore, sentence S is tagged as follows:

$$\text{Emotion\_tag}(S_i) = (\text{Sadness})$$
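The selection and tagging steps (Eqs. 3 and 4) amount to picking the category whose aggregated score has the largest magnitude. The sketch below illustrates this; returning "neutral" for a sentence without emotion words is an assumption made for the example, not a rule stated in this section.

```python
def select_emotion(category_scores):
    """Pick the category with the largest absolute aggregated score (Eqs. 3 and 4).

    category_scores: dict mapping emotion category -> aggregated sentiment score.
    Returns (category, signed score). Assumption: empty input yields ("neutral", 0.0).
    """
    if not category_scores:
        return ("neutral", 0.0)
    # Compare magnitudes only, ignoring the positive/negative sign.
    best = max(category_scores, key=lambda c: abs(category_scores[c]))
    return (best, category_scores[best])


if __name__ == "__main__":
    aggregated = {"sadness": -0.45, "disgust": -0.187}  # values from the worked example
    print(select_emotion(aggregated))
```

The signed score is returned alongside the tag so that later steps, such as polarity shifters or negation handling, can still see the direction of the sentiment.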

Polarity Shifter Classifier

Polarity shifters and negations are generally given less attention in emotion-related sentiment classification, which often degrades the performance of the classifiers [31]. For this purpose, a PSC module is proposed to play a pivotal role in the detection and classification of emotions. Polarity shifter words, such as “very”, “too”, “few” and “slightly”, enhance or reduce the sentiment strength of opinion words in a tweet/sentence. We adopted a list of 45 polarity shifters along with their sentiment scores from the work performed by [31]. A partial list of such words is presented in Table 5, in which enhancers are positive and reducers are negative. Let PS_pos_neg be a list of positive and negative polarity shifters represented as follows:

PS_pos_neg = {list of positive and negative polarity shifters}

If a word is found in the list of positive or negative polarity shifters, then the sentiment score of the nearest emotion word is calculated as follows:

$$\text{sent\_score}_{\text{pol-shifter}}(ew) = \text{sent\_score}(ew) + \text{sent\_score}(ew) \cdot \text{sent\_score}_{\text{pol-shifter}}(w_x), \quad \text{if } w_x \in PS_{pos\_neg} \tag{5}$$

Table 5 Partial list of positive and negative polarity shifters

Polarity shifter | Weight | Polarity shifter | Weight
Really | +0.15 | Minor | −0.3
Too | +0.4 | Less | −0.5
Completely | +1.0 | Hardly | −0.7
Very | +0.5 | Low | −0.2
Extremely | +0.8 | Some | −0.25
Soo | +1.0 | |

where w_x denotes a word that belongs to the list of positive and negative polarity shifters, ew is the sentiment word, and sent_score_pol-shifter(w_x) is the numeric score of the polarity shifter obtained from the polarity shifter dictionary. The polarity score of the nearest emotion word is obtained by multiplying the numeric score of the polarity shifter by the sentiment score of the sentiment word in Eq. 5 and then adding the product to the polarity score of the emotion word. For example, in the sentence “I extremely love Apple Mobile”, the polarity shifter “extremely” enhances the strength of the adjacent emotion word “love”. Therefore, using Eq. 5, the enhanced polarity score of the emotion word “love” is calculated as follows:

$$\text{sent\_score}_{\text{pol-shifter}}(\text{“extremely love”}) = \text{sent\_score}(\text{“love”}) + \text{sent\_score}(\text{“love”}) \cdot \text{sent\_score}_{\text{pol-shifter}}(\text{“extremely”}) = 0.388 + (0.388 \times 0.8) = 0.388 + 0.3104 = 0.698,$$

where 0.388 is the polarity score of the sentiment word “love”, retrieved and calculated using Eq. 1; 0.8 is the strength of the positive polarity shifter “extremely”, retrieved from Table 5; and 0.698 is the updated score of the sentiment word “love” after the polarity shifter is applied.
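A minimal sketch of the Eq. 5 update follows. The `SHIFTERS` dictionary holds a few of the weights listed in Table 5, and the function name is an illustrative assumption.

```python
# A few polarity shifter weights taken from Table 5 (enhancers positive, reducers negative).
SHIFTERS = {
    "really": 0.15,
    "too": 0.4,
    "very": 0.5,
    "extremely": 0.8,
    "minor": -0.3,
    "less": -0.5,
    "hardly": -0.7,
}


def apply_polarity_shifter(word_score, shifter, shifters=SHIFTERS):
    """Eq. 5: score + score * shifter weight; unknown shifters leave the score unchanged."""
    weight = shifters.get(shifter.lower())
    if weight is None:
        return word_score
    return word_score + word_score * weight


if __name__ == "__main__":
    # "I extremely love Apple Mobile": sent_score("love") = 0.388, weight("extremely") = 0.8
    print(round(apply_polarity_shifter(0.388, "extremely"), 3))
```

Because the weight multiplies the word's own score, an enhancer pushes a negative word further negative and a reducer pulls any word toward zero.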

Negation Handling

Negation words, such as “never”, “not”, “do not” and “did not”, often switch the polarity of sentiment words in a tweet/sentence. For example, the sentences “this book is interesting” and “this book is not interesting” have different sentiments. The first sentence carries positive polarity; however, in the second sentence, the negation term “not” flips the polarity of the sentiment word “interesting” from positive to negative. Therefore, negation words must be carefully considered for accurate polarity classification. In this work, negations are handled by creating a list of negation words, and each word in a sentence is looked up in this manually created negation list. Let Neg_list be a list of negation terms defined as follows:

Neg_list = {list of negation terms}

If a word is present in the negation list, then the sentiment score of the nearest sentiment word is switched by multiplying the score of the sentiment word by −1 as follows:

$$\text{sent\_score}_{\text{pol-neg}}(sw) = \text{sent\_score}(sw) \cdot (-1), \quad \text{if } sw_{-1} \in Neg\_list \tag{6}$$

where sw denotes the nearest opinion word and sw_{−1} denotes the word preceding the sentiment word, which belongs to the list of negation words, Neg_list. For example, using Eq. 6, the sentiment score of “not interesting” is calculated as follows:

$$\text{sent\_score}_{\text{pol-neg}}(\text{“not interesting”}) = \text{sent\_score}(\text{“interesting”}) \cdot (-1) = 0.375 \times (-1) = -0.375.$$
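The negation rule of Eq. 6 can be sketched as a check on the token immediately preceding each sentiment word. The negation list below is a small hypothetical sample, and only the directly preceding word is inspected, which is a simplification.

```python
# Hypothetical sample of the manually created negation list.
NEG_LIST = {"never", "not", "no"}


def negated_scores(tokens, word_scores, neg_list=NEG_LIST):
    """Eq. 6: flip the sign of a sentiment word's score when the previous token is a negation.

    tokens: list of lower-cased tokens of one sentence.
    word_scores: dict mapping sentiment word -> sentiment score.
    """
    result = {}
    for idx, token in enumerate(tokens):
        if token in word_scores:
            score = word_scores[token]
            if idx > 0 and tokens[idx - 1] in neg_list:
                score = score * -1
            result[token] = score
    return result


if __name__ == "__main__":
    scores = {"interesting": 0.375}  # score used in the worked example
    print(negated_scores("this book is not interesting".split(), scores))
    print(negated_scores("this book is interesting".split(), scores))
```

Multi-word negations such as “do not” and “did not” are covered by this check because their last token, “not”, is the one adjacent to the sentiment word.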

Emoticon Classifier

If emoticons are not properly identified and classified as emotion signals, emotion-based text in social media posts might be classified incorrectly. For this purpose, we propose a rule-based emoticon classifier (EC). The proposed classifier is inspired by early work performed by [31] on emoticon-related sentiment analysis. However, we introduce a rule-based classifier for the efficient detection and classification of emoticons as emotion signals. The proposed EC works as follows: (i) computation and aggregation of the sentiment scores of emoticons and (ii) selection of the emotion category of emoticons with the highest sentiment score (Fig. 3).

Computation and Aggregation of Sentiment Scores of Emoticons

In this module, sentiment scores of one or more emoticons pertaining to a specific category are aggregated:

$$\text{emoticons\_sent\_score}(\text{emotion\_catg})_i = \sum_{1 \le i \le m} \text{pol\_score}_e(i, j), \quad 0 < \ldots$$

… where tp (true positive) is the number of true positive classifications, fp (false positive) is the number of false positive classifications, tn (true negative) is the number of true negative classifications and fn (false negative) is the number of false negative classifications (Table 8).
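For reference, the standard definitions of precision, recall, F-measure and accuracy in terms of the tp, fp, tn and fn counts can be written as a short sketch. These are the textbook formulas; that the paper uses exactly these forms is an assumption, and the counts in the usage example are hypothetical.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of actual positives that are found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def accuracy(tp, fp, tn, fn):
    """Fraction of all classifications that are correct."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


if __name__ == "__main__":
    tp, fp, tn, fn = 8, 2, 6, 4  # hypothetical confusion-matrix counts
    print(precision(tp, fp), recall(tp, fn), f_measure(tp, fp, fn), accuracy(tp, fp, tn, fn))
```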

Table 8 Confusion matrix

Class | Actual | Predicted
Positive | tp | fn
Negative | fp | tn

Evaluation on News Dataset

In this experiment, two versions of the proposed system, namely (i) emotion words including polarity shifters and negations and (ii) emotion words with emoticons and slang, are compared with the two baseline methods (Table 9) in terms of precision, recall and F-measure. Entries in italic show the best performance of particular emotion categories listed in the first column. The two emotion categories “Embarrassment” and “Reactive” are additions to the existing state-of-the-art methods; therefore, the corresponding entries in the columns of the two baseline methods are left blank, whereas their entries are listed in the columns of the proposed method. In the averages of the three evaluation metrics, entries in bold show the best performance of a particular system/method across all of the emotion categories.

Evaluation on Mobile/Smart Phones Dataset

In this experiment, we analyse the performance of the proposed method in terms of precision (P), recall (R) and F-measure (F) w.r.t. the following parameters (emotion signals): (i) emotion words (words, polarity shifters and negations); (ii) emoticons; (iii) slang; (iv) emotion words (words, polarity shifters and negations) and emoticons; (v) emotion words (words, polarity shifters and negations) and slang; and (vi) emotion words (words, polarity shifters and negations), emoticons and slang. The results presented in Table 10 show that the proposed system yields the best performance for emotion detection when all of the emotion signals are considered: a precision of 83%, a recall of 65% and an F-measure of 77%.

Evaluation on ISEAR Dataset

On the ISEAR emotion benchmark dataset (footnote 9), we compare two versions of the proposed method with the NAVA unsupervised method [23] and the lexical baseline [24]. The lexical baseline works as follows. Each emotion word that is found in the WordNet-Affect lexicon is labelled with the respective emotion category. The emotion tag of a sentence is assigned by selecting the emotion with the highest score in the sentence. In the case of a tie, the selection is made randomly. Finally, the sentence is tagged as neutral if none of the emotion words is present in the lexicon. The NAVA unsupervised method uses a special list of words and applies a straightforward semantic relatedness measure. Table 11 reports the accuracy results of two versions of the proposed method along with the unsupervised method [23] and one other lexical baseline [24]. The results in Table 11 show that our method performs better than the lexical baseline and the NAVA unsupervised method.

Emoticon Analysis Experiment

In another experiment, we analyse the effect of emoticons in emotion classification by classifying the sentences using the emoticon classifier (EC). Figure 6 shows that inclusion of emoticons as emotion signals in the proposed method improves classification accuracy from 78.86 to 83.45%.

Slang Analysis Experiment

The slang analysis experiment investigates the effect of slang on the emotion-based SA of text. For efficient emotion classification, slang terms must be classified as carriers of emotion signals before the emotion-based sentiment classification of a user-generated text is performed. Table 12 shows the effect of slang on emotion detection and classification of text.

Sentiment Computation of Complete Sentences

This experiment addresses the computation of sentence-level emotion-based sentiment by considering negations and modifiers on the Pang and Lee dataset [34]. This setup is comparable to previous state-of-the-art work [14], which used only phrases and complete sentences for binary sentiment classification. Table 13 shows the results of emotion-based sentiment computation for different emotion categories at the sentence level.

9 https://github.com/bogdanneacsa/tts-master/tree/master/ISEAR
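The lexical baseline used in the ISEAR comparison (label emotion words via a lexicon, pick the top emotion, break ties randomly, fall back to neutral) can be sketched as follows. The `affect_lexicon` argument stands in for WordNet-Affect and is hypothetical here, and counting matched words as the "score" of an emotion is an assumption, since the text does not define the baseline's scoring.

```python
import random
from collections import Counter


def lexical_baseline(tokens, affect_lexicon, rng=random):
    """Tag a sentence with the emotion that has the most lexicon hits.

    tokens: list of lower-cased tokens.
    affect_lexicon: dict mapping word -> emotion category (stand-in for WordNet-Affect).
    rng: object with a choice() method, used for random tie-breaking.
    """
    counts = Counter(affect_lexicon[t] for t in tokens if t in affect_lexicon)
    if not counts:
        return "neutral"  # no emotion word found in the lexicon
    best = max(counts.values())
    tied = sorted(cat for cat, n in counts.items() if n == best)
    return rng.choice(tied)  # random selection among tied categories


if __name__ == "__main__":
    lexicon = {"afraid": "fear", "scared": "fear", "angry": "anger"}  # hypothetical entries
    print(lexical_baseline("i was afraid and scared".split(), lexicon))
    print(lexical_baseline("hello there".split(), lexicon))
```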

Qualitative Evaluation

In this section, we present a qualitative evaluation of the results obtained from this study. Four result categories are evaluated.

Emoticons

The detection and classification of emoticons as emotion signals is of paramount importance for automated emotion-based sentiment analysis systems. The results presented in Fig. 6 show that considering emoticons when classifying emotions in text increases classification accuracy from 78.86 to 83.45%. The results

Table 9 Performance evaluation on different emotion signals using News dataset

Columns: Emotion category | Hybrid approach [6]: P, R, F | Lexicon-based [5]: P, R, F | Proposed (emotion words only): P, R, F | Proposed (emotion words + emoticons + slang): P, R, F

Fear | 57.1, 80.0, 66.6 | 63.5, 43.7, 51.8 | 66.6, 76.9, 71.0 | 80.0, 66.6, 72.0
Anger | 62.5, 100.0, 76.9 | 49.3, 26.5, 34.5 | 60.0, 69.9, 70.0 | 70.0, 68.4, 76.0
Happiness | 61.5, 88.8, 72.7 | 54.0, 45.1, 49.1 | 56.3, 50.2, 55.0 | 60.8, 66.6, 69.0
Disgust | 60.0, 50.0, 54.5 | 42.1, 18.6, 25.8 | 50.6, 24.5, 30.3 | 70.5, 56.4, 75.0
Surprise | 71.4, 100.0, 83.3 | 44.3, 29.2, 35.2 | 52.6, 46.4, 48.0 | 80.0, 76.6, 76.5
Sadness | 83.3, 21.7, 34.4 | 52.5, 36.7, 43.2 | 75.2, 46.2, 53.0 | 78.0, 68.1, 75.3
Embarrassment | _, _, _ | _, _, _ | 86.6, 76.8, 78.0 | 88.8, 78.6, 80.1
Reactive | _, _, _ | _, _, _ | 84.3, 74.6, 79.3 | 85.7, 78.4, 82.3
Average | 65.9, 73.4, 64.7 | 50.9, 33.3, 39.9 | 66.3, 58.2, 60.5 | 76.7, 69.9, 75.7

obtained further show that the presence of emoticons in text plays a pivotal role in expressing a user’s emotions. Therefore, the poor performance of emotion-based sentiment analysis systems has been addressed by incorporating emoticons as emotion signals. However, the major limitation of our system is that it cannot correctly classify emotion text in which a user conveys contradictory emotions. For example, the text “She waited for the train ☺, the train was late ☹” is declared as neutral when passed through the emoticon classifier because the polarities of the positive and negative emoticons nullify each other in this particular text.

Slang

The detection and classification of slang as emotion signals in text has a major role in emotion-based sentiment analysis. Experiments performed on three datasets (Tables 10, 11 and 12) show that detection and classification of slang terms as emotion signals has a significant effect on emotion analysis. The evaluation results presented in Table 12 show that the SC module improves precision, recall, F-measure and accuracy. Therefore, we have satisfactorily addressed the issue of poor performance of emotion-based sentiment analysis systems due to not incorporating slang terms as emotion signals. However, the slang classifier has certain limitations. The Urban Dictionary includes several definitions for a single term; therefore, we select the first definition of a slang term. Devising an intelligent mechanism for disambiguating multiple meanings of a single slang term is necessary.

Polarity Shifters and Negations

Proper handling of polarity shifters and negations plays a significant role in efficient emotion-based sentiment classification. Experimental results presented in Table 10 show that consideration of polarity shifters and negations as emotion signals improves the performance of the system. However, classification of polarity shifters and negations in conjunction with emoticons and slang requires careful preprocessing and classification; otherwise, such classification might lead to incorrect results. Therefore, a set of linguistic rules must be devised to manage classifications in a more sophisticated fashion.

Table 10 Performance evaluation on different emotion signals using the mobile/smart phones dataset

Emotion signals | P | R | F
Emotion words (words, polarity shifters, negations) | 0.77 | 0.41 | 0.53
Emoticons | 0.73 | 0.57 | 0.59
Slang | 0.72 | 0.58 | 0.67
Emotion words (words, polarity shifters, negations) + emoticons | 0.79 | 0.59 | 0.68
Emotion words (words, polarity shifters, negations) + slang | 0.78 | 0.60 | 0.69
Emotion words (words, polarity shifters, negations) + emoticons + slang | 0.83 | 0.65 | 0.77

P, R and F are the evaluation metrics. Entries in italic show the best performance of particular emotion categories listed in the first column
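The emoticon limitation noted in the qualitative evaluation (contradictory emoticons cancelling out to a neutral label) is easy to reproduce with a minimal sketch. The two-entry score table is hypothetical and simply assigns +1.0 and −1.0 as in the emoticon lexicon; summing the scores is a simplification of the emoticon classifier described in the paper.

```python
# Hypothetical emoticon scores, following the +1.0 / -1.0 scale of the emoticon lexicon.
EMOTICON_SCORES = {"☺": 1.0, ":)": 1.0, "☹": -1.0, ":(": -1.0}


def emoticon_polarity(text, scores=EMOTICON_SCORES):
    """Sum emoticon scores in a text and map the total to a polarity label."""
    total = 0.0
    for token in text.split():
        symbol = token.strip(",.")  # drop trailing punctuation glued to the emoticon
        total += scores.get(symbol, 0.0)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # One positive and one negative emoticon nullify each other.
    print(emoticon_polarity("She waited for the train ☺, the train was late ☹"))
```

A per-clause rather than per-sentence aggregation would be one way to keep the two emotions apart, although the paper does not propose a specific remedy.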

Table 11 Evaluation results on ISEAR dataset

Method | Overall accuracy on six emotions
NAVA approach [23] | 60–65%
Lexical method [24] | 55%
Proposed | 73.73%

Table 12 Comparative results of SC module

With/without slang | P | R | F | A
Without slang | 0.76 | 0.68 | 0.80 | 0.76
With slang | 0.86 | 0.80 | 0.92 | 0.78

Lack of Domain-Specific Words

The presence of domain-specific words in text creates problems in the accurate classification of emotions. Many of the opinion words used in the health domain have incorrect sentiment scores in SWN [35, 36]. For example, the term “relax” is declared as objective by SWN, but it conveys a positive sentiment; therefore, the word “relax” should be placed in the positive class. For more accurate classification of emotion words, domain-specific words should be exploited in emotion-based sentiment analysis systems.

Complex and Compound Emotion Sentences

Complex and compound emotion sentences express multiple emotions, and the system often produces incorrect emotion classification results in these cases. For example, the sentence “Bilal couldn’t set up his mobile because the setting was complicated” is tagged as neutral by our system. A mechanism is needed to address complex and compound emotion sentences.

Scalability and Maintainability

The proposed rule-based framework can scale to process large amounts of user reviews and is not limited to the datasets used here. Furthermore, it scales well in terms of new rule integration, i.e. one can add new rules to the existing rule base proposed for emotion words, slang and emoticons. Concerning maintainability, the field of affective computing remains emergent, and the existing approaches produce good results for a particular dataset, but their maintainability is weak across multiple domains due to a dependency on a heavily annotated dataset (supervised techniques) or on a publicly available emotion lexicon (unsupervised/lexicon-based techniques). However, the proposed rule-based system can produce satisfactory results in multiple domains when tested on multiple datasets. For this purpose, consultation is needed with the leading industry experts in sentiment analysis, such as http://www.socialmediaexplorer.com/social-mediamonitoring/sentiment-analysis/

Conclusion and Future Work

This work presents the results of applying an emotion classification approach at the sentence level using an emotion word classifier (EWC), emoticon classifier (EC), slang classifier (SC) and mixed-mode classifier (MMC) in a pipelined approach to detect and classify emotions expressed by users in user-generated text. The proposed framework consists of the following tasks: (1) data acquisition in different domains from Twitter and benchmark datasets; (2) pre-processing of the acquired text; (3) use of an emotion-word classifier to detect and classify the emotion words, polarity shifters and negations expressed in text using different lexical resources; (4) application of an emoticon classifier to detect and classify the emotion-related emoticons expressed in text; (5) emotion-based sentiment classification of slang using a slang classifier based on a slang lexicon; (6) detection and classification of the aforementioned emotion signals (emotion words, emoticons and slang) using a mixed-mode classifier; and (7) sentiment classification of reviews with respect to specific emotion categories. The proposed technique assists in classifying emotion words (words, polarity shifters and negations) expressed in text/tweets, detects emotion-related emoticons and classifies

Fig. 6 Accuracy results of EC classifier. Accuracy (%): without emoticons 78.86; with emoticons 83.45

Table 13 Sentiment classification at the sentence level

System | Accuracy
Socher et al. [14] (fine-grained sentiment classification at the sentence level) | 80.46%
Proposed (emotion-based sentiment classification at the sentence level) | 86.31%

them using an emoticon dictionary, incorporates an SC module for classifying slang-related emotion sentences/tweets using slang as an emotion signal with scores retrieved from the slang lexicon, and performs emotion-based sentiment classification by focusing on three emotion signals (emotion words, emoticons and slang) in different domains. The improved results in terms of accuracy, precision, recall and F-measure show that the proposed method classifies emotions better than the baseline methods. The framework is generalized and is capable of classifying emotions in any domain. The possible limitations of the proposed approach are its inability to incorporate domain-specific words and to score such words automatically without performing a lookup operation in SWN; addressing them could increase classification accuracy. To minimize scoring inaccuracy in different domains due to the general-purpose nature of SentiWordNet, domain-specific techniques should be investigated. Another possible direction for improvement is the addition of personality identification features for effective classification of emotion-based text/tweets. Finally, analysing complex and compound sentences in emotion-based sentiment classification would be another interesting research direction.

Acknowledgements

We are grateful to Prof. Dr. Shakeel Ahmad, Institute of Computing, Gomal University, for providing licensed software and manuals during the execution of this project. We are also thankful to Furqan Khan for the execution and maintenance of the software needed for conducting the experiments in the Opinion Mining and Sentiment Analysis Lab.

Funding The authors received no specific funding for this work.

Compliance with Ethical Standards

Conflict of Interest The authors declare that they have no conflict of interest.

Informed Consent All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2008 (5). Additional informed consent was obtained from all patients for which identifying information is included in this article.

Human and Animal Rights This study did not involve any experimental research on humans or animals; hence, an approval from an ethics committee was not applicable in this regard. The data collected from the online forums are publicly available data and no personally identifiable information of the forum users were collected or used for this study.

References

1. Wang QF, Cambria E, Liu CL, Hussain A. Common sense knowledge for handwritten Chinese recognition. Cogn Comput. 2013;5(2):234–42.
2. Yang L, Lin H, Lin Y, Liu S. Detection and extraction of hot topics on Chinese microblogs. Cogn Comput. 2016;8(4):577–86.
3. Agarwal B, Poria S, Mittal N, Gelbukh A, Hussain A. Concept-level sentiment analysis with dependency-based semantic parsing: a novel approach. Cogn Comput. 2015;7(4):487–99.
4. Sun R, Wilson N, Lynch M. Emotion: a unified mechanistic interpretation from a cognitive architecture. Cogn Comput. 2016;8(1):1–14.
5. Mohammad SM, Kiritchenko S. Using hashtags to capture fine emotion categories from tweets. Comput Intell. 2015;31(2):301–26.
6. Das D, Bandyopadhyay S. Sentence-level emotion and valence tagging. Cogn Comput. 2012;4(4):420–35.
7. Crossley SA, Kyle K, McNamara DS. Sentiment Analysis and Social Cognition Engine (SEANCE): an automatic tool for sentiment, social cognition, and social-order analysis. Behav Res Methods. 2016;49:1–19.
8. Cambria E, Grassi M, Hussain A, Havasi C. Sentic computing for social media marketing. Multimed Tools Appl. 2012;59(2):557–77.
9. Cambria E, Hussain A, Havasi C, Eckl C. Common sense computing: from the society of mind to digital intuition and beyond. In: LNCS, 5707. Berlin: Springer; 2009. p. 252–259.
10. Shaila SG, Vadivel A. Cognitive based sentence level emotion estimation through emotional expressions. In: Progress in systems engineering. Springer International Publishing; 2015. p. 707–713.
11. Cambria E. Affective computing and sentiment analysis. IEEE Intell Syst. 2016;31(2):102–7.
12. Quan C, Ren F. Sentence emotion analysis and recognition based on emotion words using Ren-CECps. Int J Adv Intel. 2010;2(1):105–17.
13. Li J, Ren F. Creating a Chinese emotion lexicon based on corpus Ren-CECps. In: 2011 IEEE International Conference on Cloud Computing and Intelligence Systems (CCIS). IEEE; 2011. p. 80–84.
14. Socher R, Perelygin A, Wu JY, Chuang J, Manning CD, Ng AY, Potts C. Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the conference on empirical methods in natural language processing (EMNLP); 2013. p. 1631–1642.
15. Poria S, Chaturvedi I, Cambria E, Hussain A. Convolutional MKL based multimodal emotion recognition and sentiment analysis. In: 2016 IEEE 16th International Conference on Data Mining (ICDM), Barcelona. p. 439–448. doi:10.1109/ICDM.2016.0055.
16. Severyn A, Moschitti A. Twitter sentiment analysis with deep convolutional neural networks. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM; 2015. p. 959–962.
17. Aminu M, Nirmalie W, Robert L. Contextual sentiment analysis for social media genres. Knowl-Based Syst. 2016; doi:10.1016/j.knosys.2016.05.032.
18. Pensa RG, Sapino ML, Schifanella C, Vignaroli L. Leveraging cross-domain social media analytics to understand TV topics popularity. IEEE Comput Intell Mag. 2016;11(3):10–21.
19. Chaumartin FR. UPAR7: a knowledge-based system for headline sentiment tagging. In: Proceedings of the 4th International Workshop on Semantic Evaluations. Association for Computational Linguistics; 2007.
20. WordNet Domains, available at: http://wndomains.fbk.eu/index.html, last accessed on April 12, 2016.
21. Wordnet-Affect, available at: http://wndomains.fbk.eu/index.html, last accessed on May 20, 2016.
22. SentiWordNet, available at: http://sentwordnet.isti.cnr.it/, last accessed on August 5, 2016.
23. Agrawal A, An A. Unsupervised emotion detection from text using semantic and syntactic relations. In: Web Intelligence and Intelligent Agent Technology (WI-IAT), IEEE/WIC/ACM International Conferences on, Vol. 1. 2012. p. 346–353.
24. Gievska S, Koroveshovski K, Chavdarova T. A hybrid approach for emotion detection in support of affective interaction. In: 2014 IEEE International Conference on Data Mining Workshop. IEEE; 2014. p. 352–359.
25. Cambria E, Olsher D, Rajagopal D. SenticNet 3: a common and common-sense knowledge base for cognition-driven sentiment analysis. In: Twenty-eighth AAAI conference on artificial intelligence, 2014.
26. Boratto L, Carta S, Fenu G, Saia R. Using neural word embeddings to model user behavior and detect user segments. Knowl-Based Syst. 2016; doi:10.1016/j.knosys.2016.05.002.
27. Ekman’s basic emotions, available at: https://www.paulekman.com/wp-content/uploads/2013/07/Basic-Emotions.pdf, last accessed on September 24, 2016.
28. NRC emotion lexicon, available at: http://saifmohammad.com/WebDocs/NRC-Emotion-Lexicon-v0.92-InManyLanguages-web.xlsx, last accessed on November 10, 2016.
29. Douglas-Cowie E, Cowie R, Sneddon I, Cox C, Lowry O, Mcrorie M, Martin JC, Devillers L, Abrilian S, Batliner A, Amir N. The HUMAINE database: addressing the collection and annotation of naturalistic and induced emotional data. In: International Conference on Affective Computing and Intelligent Interaction. Springer Berlin Heidelberg; 2007. p. 488–500.

30. Cambria E, Poria S, Bajpai R, Schuller B. SenticNet 4: a semantic resource for sentiment analysis based on conceptual primitives. In: COLING; 2016. p. 2666–2677.
31. Asghar MZ, Khan A, Ahmad S, Qasim M, Khan IA. Lexicon-enhanced sentiment analysis framework using rule-based classification scheme. PLoS One. 2017;12(2):e0171649. doi:10.1371/journal.pone.0171649.
32. Romanyshyn M. Rule-based sentiment analysis of Ukrainian reviews. Int J Artif Intell Appl. 2013;4(4):103.
33. Kundi F, Ahmad S, Khan A, Asghar. Detection and scoring of internet slangs for sentiment analysis using SentiWordNet. Life Sci J. 2014;11(9):66–72.
34. Pang B, Lee L. Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. In: ACL; 2005. p. 115–124.
35. Asghar MZ, et al. A unified framework for creating domain dependent polarity lexicons from user generated reviews. PLoS One. 2015;10(10):e0140204.
36. Asghar MZ, Ahmad S, Qasim M, Zahra R, Kundi FM. SentiHealth: creating health-related sentiment lexicon using hybrid approach. SpringerPlus. 2016;5(1):1–23. doi:10.1186/s40064-016-2809-x.
