algorithms were proposed to analyze the sarcasm in tweets of Twitter. ... Tendulkar fan who likes to post the tweet about Sachin and his Twitter account consist a.
Sarcastic Sentiment Detection Based on Types of Sarcasm Occurring in Twitter Data *Santosh Kumar Bharti National Institute of Technology Rourkela Ramkrushna Pradhan National Institute of Technology, Rourkela Korra Sathya Babu National Institute of Technology, Rourkela Sanjay Kumar Jena National Institute of Technology, Rourkela
ABSTRACT In Natural Language Processing (NLP), sarcasm analysis in the text is considered as the most challenging task. It has been broadly researched in recent years. The property of sarcasm that makes it harder to detect is the gap between the literal and its intended meaning. It is a particular kind of sentiment which is capable of flipping the entire sense of a text. Sarcasm is often expressed verbally through the use of high pitch with heavy tonal stress. The other clues of sarcasm are usage of various gestures such as gently sloping of eyes, hands movements, shaking heads, etc. However, the appearances of these clues for sarcasm are absent in textual data which makes the detection of sarcasm dependent upon several other factors. In this article, six algorithms were proposed to analyze the sarcasm in tweets of Twitter. These algorithms are based on the possible occurrences of sarcasm in tweets. Finally, the experimental results of the proposed algorithms were compared with some of the existing state-of-the-art. Keywords: Opinion mining, Parsing, POS tagging, Sarcasm, Sentiment analysis, Twitter data
INTRODUCTION Sentiment analysis is a procedure to extract the attitude of a speaker or a writer concerning some target (Pang, 2002, PP. 79-86). Social media and social networking have fueled the online space, including Facebook, Amazon, Twitter, etc. as ratings, reviews, comments, etc. are everywhere. Social media content is growing rapidly with every passing day. With the large volumes of information generating daily, identifying clear and most consumer reliable information about their preferences has become very tough task nowadays. To get the correct and trusted information, individuals are showing interest towards analysis of social networking content. For every business, online reviews have become the deciding factor which can break or make a product in the marketplace. Sentiment analyzer is used as a tool to understand how consumers are reacting to an event or a new product through auditing social networking posts and comments.
Sarcasm is derived from the French word “Sarcasmor” that means “tear flesh” or “grind the teeth”. In simple words, sarcasm is a way to speak bitterly. Macmillan Dictionary defines, "sarcasm is an activity of saying or writing the opposite of what you mean, or of speaking in a way intended to make someone else feel stupid or show them that you are angry” (Macmillan, 2007). For example: “it feels great being bored”. In this example, the literal meaning of the sentence is different than what the speaker intends to say using sarcasm. While writing sarcasm in text, people often use only positive words to convey a negative opinion instead of real negative. However, the average human reader face problem in sarcasm detection for social media content such as tweets, reviews, blogs, online forums, etc. In this paper, the tweets of Twitter are used as the dataset for sarcasm detection. Twitter is a microblogging social networking site where a user can read and post the messages. The Twitter allow posting a message of a limited length of 140 characters. Due to the limitations, the users often use symbolic and figurative texts to express their feelings such as @username, smilies, emoji, exclamation mark and interjection words. Recognition of these symbolic and figurative texts in tweets are the most arduous task in NLP. In the realm of Twitter, the author has observed several types of sarcastic tweets as shown in Table 1 that occurred frequently. Table 1. Various types of sarcastic tweets. T1 T2 T3 T4 T5 T6 T7
Sarcasm as a contradiction between positive sentiment and negative situation. Sarcasm as a contradiction between negative sentiment and positive situation. Tweets that starts with interjection word. Sarcasm as a contradiction between likes and dislikes. Sarcasm as a contradiction between tweet and the universal facts. Sarcasm as a contradiction between tweet and its temporal facts. Positive tweet that contains a word and its antonym pair.
The author has already discussed sarcasm type T1, T2 and T3 in their previous article (Bharti, 2015). The remaining types of sarcastic tweets are as follows:
Type T4 depicts the users’ likes and dislikes behavior while posting the tweets on Twitter. To learn the behavioral habit of a particular Twitter user, one can analyze and observe his likes and dislikes habit. Using these likes and dislikes lists of a particular user, one can analyze the sarcastic tweets from the particular user's account. For example, if any Sachin Tendulkar fan who likes to post the tweet about Sachin and his Twitter account consist a tweet like “I love to see Sachin’s failure in batting” which contradict his like’s habit so, one can be easily identify that given tweet is sarcastic. Type T5 says about tweets contradicting universal facts. For example, if any user has posted a tweet “sun is revolving around the earth” and the fact is earth revolves around the sun. In such cases, user is intentionally negating the universal facts in their tweet, and there is a high probability that the tweet is sarcastic. Type T6 says about tweets that contradict temporal facts, which may change over a period. For example, "India had an amazing win in cricket world cup 2015 final", while the fact is Australia won the cricket world cup in 2015 and India won the cricket world cup in 2011. If this type of contradiction occurs, then the tweet will be sarcastic. Type T7 discuss about the positive tweets that contains an antonym pair of either adjective, verb, noun or adverb. For example, “when I am online, she is offline, and when
78 tweets that based on universal facts
31
1
23
23
135 tweets that contains antonym pair
105
3
5
22
91 tweets that based on time dependent facts
41
3
25
22
Table 5: A comparison of proposed approaches with state-of-the-art. Approaches Precision Recall (%) (%) Barbieri system 87 86 Tungthamthiti system 75 75 Riloff with positive verb 27 44 with negative situation 28 37 Contrast (+VPs, -situation)unordered 10 55 Contrast (+VPs, -situation)ordered 08 69 Contrast (+preds, -situation) 12 62 Liebrecht system with 50/50 74 with 25/75 negative, positive ratio 55 PBALN with sarcastic tweets 82 90 PBALN without sarcastic tweets 65 76 TSIW sarcastic tweets 86 97 TSIW without sarcastic tweets 78 74 TCULD from first user's account 72 92 TCULD from second user's account 91 77 TCULD from third user's account 92 73 CTUFs based on universal facts 57 96 PTCAP tweets that contains antonym pair 97 95 CTTFs based on time dependent facts 62 93
F- measure (%) 87 75 34 32 17 14 21 85 70 91 75 81 84 82 72 96 74
The author made a comparison of the proposed algorithms (PBALN, TSIW, TCULD, CTUFs, PTCAP, and CTTFs) with some of the state-of-the-arts as given in Table 5 and observed that the obtained result attains considerable improvements. Results of algorithms such as TSIW, TUCF, PTCAP and CTTFs are proposed for the first time.
CONCLUSION There is no formal structure of sarcasm. Due to this, analysis of sarcasm is difficult for researchers in any form of data such as text or voice. However, continuous research is going on in this era for betterment. In this paper, the author proposed six algorithms and tried to provide complete solutions for sarcasm detection in Twitter data i.e. tweets. They mainly focused on tweet structure and observed various types of sarcasm occurrences in tweets as shown in Table 3. Based on these sarcasm types, the author has proposed set of algorithms. Experimental results of the proposed algorithm attained significant accuracy as PBALN attains 0.90 precision and 0.85
F1-measure. Testing results of TSIW prove Lunando’s statement “if tweet contains any interjection word such as “wow, aha, yay, etc.” then that tweet has a higher chance to be classified into sarcastic as it attains 0.97 recalls and 0.91 F1-measure. TCULD attains 0.92 precision and 0.84 F1-measure. CTUFs and TCTDC reach the precision value 0.96 and 0.93 respectively, while PTCAP achieves 0.97, 0.95, 0.96 precision, recall, and F1-measure respectively. Sarcasm detection in the images, audio clip and videos are still the uncharted areas to continue the research. In the future, the author will target in this field with some regional languages.
REFERENCES Barbieri, F., Saggion, H., & Ronzano, F. (2014). Modelling Sarcasm in Twitter, a Novel Approach. ACL 2014, 50. Bharti, S. K., Babu, K. S., & Jena, S. K. (2015, August). Parsing-based Sarcasm Sentiment Recognition in Twitter Data. In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015 (pp. 1373-1380). ACM. Bharti, S. K., Vachha, B., Pradhan, R. K., Babu, K. S., & Jena, S. K. (2016). Sarcastic sentiment detection in tweets streamed in real time: a big data approach. Digital Communications and Networks 2016, 2(3), 108-121. Bharti, S. K., Pradhan, R. K., Babu, K. S., & Jena, S. K. (2017). Sarcasm Analysis on Twitter Data Using Machine Learning Approaches. Trends in Social Network Analysis: Information Propagation, User Behavior Modeling, Forecasting, and Vulnerability Assessment 2017, 51-76. Carvalho, P., Sarmento, L., Silva, M. J., & De Oliveira, E. (2009, November). Clues for detecting irony in user-generated contents: oh...!! It’s so easy ;-). In Proceedings of the 1st international CIKM workshop on Topic-sentiment analysis for mass opinion (pp. 53-56). ACM. Chaumartin, F. R. (2007, June). UPAR7: A knowledge-based system for headline sentiment tagging. In Proceedings of the 4th International Workshop on Semantic Evaluations (pp. 422425). Association for Computational Linguistics. Davidov, D., Tsur, O., & Rappoport, A. (2010, July). Semi-supervised recognition of sarcastic sentences in twitter and amazon. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning (pp. 107-116). Association for Computational Linguistics. Filatova, E. (2012, May). Irony and Sarcasm: Corpus Generation and Analysis Using Crowdsourcing. In LREC (pp. 392-398). González-Ibánez, R., Muresan, S., & Wacholder, N. (2011, June). Identifying sarcasm in Twitter: a closer look. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers-Volume 2 (pp. 581586). Association for Computational Linguistics. Justo, R., Corcoran, T., Lukin, S. M., Walker, M., & Torres, M. I. (2014). Extracting relevant knowledge for the detection of sarcasm and nastiness in the social web. Knowledge-Based Systems, 69, 124-133. Kreuz, R. J., & Roberts, R. M. (1995). Two cues for verbal irony: Hyperbole and the ironic tone of voice. Metaphor and symbol, 10(1), 21-31.
Kreuz, R. J., & Caucci, G. M. (2007, April). Lexical influences on the perception of sarcasm. In Proceedings of the Workshop on computational approaches to Figurative Language (pp. 1-4). Association for Computational Linguistics. Kunneman, F., Liebrecht, C., van Mulken, M., & van den Bosch, A. (2014). Signaling sarcasm: From hyperbole to hashtag. Information Processing & Management. Liebrecht, C., Kunneman, F., & Bosch, A. (2013). The perfect solution for detecting sarcasm in tweets\# not. In proceedings of the Association for Computational Linguistics, pp. 29-37. Liu, P., Chen, W., Ou, G., Wang, T., Yang, D., & Lei, K. (2014). Sarcasm Detection in Social Media Based on Imbalanced Classification. In Web-Age Information Management (pp. 459-471). Springer International Publishing. Lukin, S., & Walker, M. (2013, June). Really? Well apparently bootstrapping improves the performance of sarcasm and nastiness classifiers for online dialogue. In Proceedings of the Workshop on Language Analysis in Social Media (pp. 30-40). Lunando, E., & Purwarianti, A. (2013, September). Indonesian social media sentiment analysis with sarcasm detection. In Advanced Computer Science and Information Systems (ICACSIS), 2013 International Conference on (pp. 195-198). IEEE. Maynard, D., & Greenwood, M. A. (2014, May). Who cares about sarcastic tweets? Investigating the impact of sarcasm on sentiment analysis. In Proceedings of LREC. Marcus, M. P., Marcinkiewicz, M. A., & Santorini, B. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational linguistics, 19(2), 313-330. Pang, B., Lee, L., & Vaithyanathan, S. (2002, July). Thumbs up? : Sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10 (pp. 79-86). Association for Computational Linguistics. Pennebaker, J. W., Francis, M. E., & Booth, R. J. (2001). Linguistic inquiry and word count: LIWC 2001. Mahway: Lawrence Erlbaum Associates, 71, 2001. Rajadesingan, A., Zafarani, R., & Liu, H. (2015, February). Sarcasm detection on Twitter: A behavioral modeling approach. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining (pp. 97-106). ACM. Riloff, E., Qadir, A., Surve, P., De Silva, L., Gilbert, N., & Huang, R. (2013). Sarcasm as Contrast between a Positive Sentiment and Negative Situation. In EMNLP (pp. 704-714). Rusu, D., Dali, L., Fortuna, B., Grobelnik, M., & Mladenic, D. (2007, July). Triplet extraction from sentences. In Proceedings of the 10th International Multiconference" Information SocietyIS (pp. 8-12). Strapparava, C., & Valitutti, A. (2004, May). WordNet Affect: an Affective Extension of WordNet. In LREC (Vol. 4, pp. 1083-1086). Tsur, O., Davidov, D., & Rappoport, A. (2010, May). ICWSM-A Great Catchy Name: SemiSupervised Recognition of Sarcastic Sentences in Online Product Reviews. In ICWSM. Tayal, D. K., Yadav, S., Gupta, K., Rajput, B., & Kumari, K. (2014, March). Polarity detection of sarcastic political tweets. In Computing for Sustainable Global Development (INDIACom), 2014 International Conference on (pp. 625-628). IEEE.
Tungthamthiti, P., Shirai, K., & Mohd, M. (2014). Recognition of Sarcasm in Tweets Based on Concept Level Sentiment Analysis and Supervised Learning Approaches. In Proceedings of Pacific Asia Conference on Language, Information and Computing, Phuket, Thailand. Utsumi, A. (2000). Verbal irony as implicit display of ironic environment: Distinguishing ironic utterances from nonirony. Journal of Pragmatics, 32 (12), 1777-1806. Verma, N., & Bhattacharyya, P. (2004). Automatic lexicon generation through WordNet. GWC 2004, 226.