Classification of Amazon Book Reviews Based on Sentiment Analysis

K. S. Srujan, S. S. Nikhil, H. Raghav Rao, K. Karthik, B. S. Harish, and H. M. Keerthi Kumar

Department of Information Science and Engineering, Sri Jayachamarajendra College of Engineering, Mysuru 570006, Karnataka, India
{snkr.is.2017, ssnikhilkumar, raghav.rao9, jckarthikk, hmkeerthikumar}@gmail.com, [email protected]
Abstract. Since the dawn of the internet, e-shopping vendors like Amazon have grown in popularity. Customers express their opinion or sentiment by giving feedback in the form of text. Sentiment analysis is the process of determining whether the opinion or feeling expressed is positive, negative or neutral. Capturing the exact sentiment of a review is a challenging task. In this paper, various preprocessing techniques, such as removal of HTML tags and URLs, punctuation, whitespace and special characters, and stemming, are used to eliminate noise. The preprocessed data is represented using term frequency–inverse document frequency (TF–IDF). Classifiers such as K-Nearest Neighbour (KNN), Decision Tree (DT), Support Vector Machine (SVM), Random Forest (RF) and Naive Bayes (NB) are used to classify the sentiment of Amazon book reviews. Finally, we present a comparison of (i) the accuracy of the various classifiers, (ii) the time elapsed by each classifier and (iii) the sentiment scores of various books.
1 Introduction
Amazon, one of the most popular e-commerce companies, has a presence across the globe. It allows customers to express their opinions by rating and reviewing the products they purchase. By virtue of its popularity, it gathers a huge amount of data every day. These reviews can be in the form of text or photos. Analysis of these reviews attracts researchers from all over the world. Sentiment analysis is the process of determining the opinion or feeling of a piece of text [1,2]. The opinion can be positive, negative or neutral [3]. Sentiment analysis uses techniques from natural language processing (NLP) and computational linguistics to identify subjective information. Opinion mining or sentiment analysis is carried out using lexicon-based, machine learning-based or hybrid/combined approaches. Lexicon-based approaches rely on a dictionary for predicting sentiment, whereas machine learning-based approaches test unseen data against trained data to predict opinion. Hybrid approaches combine
features of both. Human beings are good at determining sentiment: we can look at a review and immediately know whether it is negative or positive. Companies across the world have implemented machine learning to perform sentiment analysis automatically, which is useful for gaining insight into customer opinions. After analyzing the reviews, we can identify customers' opinions about a product [4–6]. Through sentiment analysis, companies can build recommendation systems or run better-targeted marketing campaigns. The complexity in sentiment analysis lies in removing noisy data from the raw dataset, selecting suitable features for representation and choosing an appropriate classifier. In this paper, we apply various preprocessing methods and use different classifiers to classify book reviews into either the positive or the negative class. The main objective of this work is to present a comparative study of different classifiers based on accuracy and on the processing time taken by each classifier. Finally, we focus on comparing the sentiment scores of different books. There are two basic sentiments, positive and negative, and eight basic emotions, namely joy, anger, sadness, trust, surprise, disgust, fear and anticipation. The rest of the paper is organized as follows: in Sect. 2, we give a brief overview of the literature related to sentiment analysis. The preprocessing steps and classification models are presented in Sect. 3. The detailed experimental analysis is presented in Sect. 4. Finally, Sect. 5 concludes with future work.
2 Literature Survey
Plenty of research has been undertaken in the domain of sentiment analysis and text classification. Categorization of sentiment polarity is one of the fundamental problems in sentiment analysis [7–10]. Given a piece of text, the task is to categorize whether it is positive or negative. There are different levels of sentiment polarity classification, namely the entity or aspect level, the sentence level and the document level [11]. The entity level focuses on what exactly people like or dislike from their opinions. The document level considers the polarity, i.e. positive or negative sentiment, of the entire document, while the sentence level deals with each sentence's sentiment categorization. Many researchers [3,12–16] across the globe have conducted research based on supervised, semi-supervised and unsupervised machine learning approaches. Bhatt et al. [14] proposed a system for sentiment analysis of iPhone 5 reviews. The methodology integrates various preprocessing techniques to reduce noisy data such as HTML tags, punctuation and numbers. The features are extracted using a part-of-speech (POS) tagger, and rule-based methods are applied to classify the reviews into different polarities. Haddi et al. [15] explore the role of text preprocessing on online movie reviews. Various preprocessing techniques such as data cleaning, stemming and removal of HTML tags are used to remove noisy data. Irrelevant features are eliminated using the chi-square feature selection technique. A Support Vector Machine (SVM) is used to classify the reviews into positive or negative classes. In [3], Tripathy et al. presented a comparison
of different classifiers based on accuracy for a movie review dataset. The methodology incorporated various preprocessing techniques to reduce noisy data, such as whitespace, numbers, stop words and vague information. The features are extracted and represented using a count vectorizer and TF–IDF. Naive Bayes (NB) and Support Vector Machine (SVM) are used to classify the data as positive or negative; when the two are compared, SVM achieves the higher accuracy of 94%. Turney et al. [13,17] presented an unsupervised learning algorithm for rating a review as thumbs up or thumbs down. The algorithm uses a POS tagger to extract the adjectives and adverbs from Epinions customer reviews. The pointwise mutual information and information retrieval (PMI-IR) algorithm is used to calculate the semantic orientation of each phrase, and the review is classified based on the average semantic orientation of its phrases. Mohammad et al. [12] developed a system to generate a word–emotion association lexicon. The system incorporates Amazon's crowdsourcing platform, Mechanical Turk. The emotion lexicon consists of a list of words and their associations with two sentiments (negative and positive) and eight emotions (sadness, joy, trust, surprise, disgust, anger, fear and anticipation). The syuzhet package [18] in the R language implements the emotion lexicon developed by the National Research Council (NRC), Canada. In this work, we present a comparative study of (i) the accuracy of various classifiers, (ii) the time elapsed by each classifier and (iii) the sentiment scores of various books. In the next section, we discuss the methodology and the classifiers used in our experiments.
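For reference, the semantic orientation (SO) measure at the heart of PMI-IR, in its standard formulation following [13], scores a phrase by how strongly it co-occurs with a positive versus a negative reference word:

\[ \mathrm{SO}(\text{phrase}) = \mathrm{PMI}(\text{phrase}, \text{"excellent"}) - \mathrm{PMI}(\text{phrase}, \text{"poor"}), \qquad \mathrm{PMI}(w_1, w_2) = \log_2 \frac{p(w_1, w_2)}{p(w_1)\, p(w_2)} \]

A review is then labelled thumbs up when the average SO of its extracted phrases is positive, and thumbs down otherwise.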
3 Methodology
The Amazon book reviews dataset [19] is considered for our analysis. The methodology comprises preprocessing, representation and classification. The unstructured Amazon book reviews are preprocessed using various techniques such as data cleaning, URL/HTML tag removal, punctuation and number removal, stop word removal and stemming. The preprocessed text is represented using the TF–IDF representation model. Classifiers such as K-Nearest Neighbours (KNN), Random Forest (RF), Naive Bayes (NB), Decision Tree and Support Vector Machine (SVM) are used to classify the dataset into the positive and negative classes.
3.1 Preprocessing
Data preprocessing involves transforming raw data into a coherent format and is a proven method for refining data. A raw record looks like “4.0/gp/customer-reviews/R2BBCQKO693KA4?ASIN=1491590173 FiveStars Great read beginning to end. Mr. Weir knows his science”, which contains noise and vague information and hence needs to be cleaned. The following preprocessing techniques are applied to the Amazon book review dataset.

Data cleaning: this is the process of finding and eliminating inaccurate, useless and corrupt records from the Amazon book review dataset. Abstract contents like “gp/customer-reviews//R2BBCQKO693KA4?ASIN” were eliminated from the dataset as they do not depict any sentiment.

Removal of HTML tags and URLs: the Amazon book review dataset contains many HTML tags and URLs with prefixes like “http”, “ftp” and “https”. They do not convey any sentiment and hence are removed.

Punctuation and special character removal: punctuation marks such as the full stop (.), comma (,) and brackets (), which are used in writing to separate sentences, are removed as they do not denote any sentiment. Along with these, special characters like “%”, “#” and “$”, used to denote percentage, comments and cost, are also eliminated.

Removal of numbers and whitespace: the Amazon book reviews dataset contains page numbers, dates, etc., which are eliminated as they do not convey any sentiment. Whitespace, including “\t” (tab), is eliminated as it does not indicate any sentiment.

Stemming: stemming is used to extract the root of a word. For instance, the root word or stem of words such as “satisfaction”, “satisfied” and “satisfying” is “satisfy”. Stemming reduces the indexing size by around 30–50% [20].

Stop word removal: stop words are mostly prepositions, articles and conjunctions like the, is, at, which and on, which appear repeatedly in a sentence but do not denote any sentiment. Hence these stop words, which do not convey any meaning independently, are removed. We also convert all letters to lower case, e.g. “STAR” to “star”, which eliminates redundant words in the dataset.

By applying these preprocessing methods we remove noisy data; the preprocessed text is then given a suitable representation for classification.
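A minimal sketch of this pipeline in R, using the tm and SnowballC packages (the file name and column name are placeholders for the dataset in [19]):

```r
library(tm)
library(SnowballC)  # supplies the Porter stemmer used by stemDocument()

# Hypothetical input: one raw review text per element
reviews <- read.csv("amazon_book_reviews.csv", stringsAsFactors = FALSE)$review_text

# Strip URLs and HTML markup before building the corpus
reviews <- gsub("(http|https|ftp)\\S+", " ", reviews)
reviews <- gsub("<[^>]+>", " ", reviews)

corpus <- VCorpus(VectorSource(reviews))
corpus <- tm_map(corpus, content_transformer(tolower))       # "STAR" -> "star"
corpus <- tm_map(corpus, removePunctuation)                  # . , ( ) % # $
corpus <- tm_map(corpus, removeNumbers)                      # page numbers, dates
corpus <- tm_map(corpus, removeWords, stopwords("english"))  # the, is, at, which, on
corpus <- tm_map(corpus, stripWhitespace)                    # tabs and repeated spaces
corpus <- tm_map(corpus, stemDocument)                       # Porter stem, e.g. "satisfied" -> "satisfi"
```

Running the URL and HTML cleaning before corpus construction keeps the regular-expression work on plain character vectors, which is simpler than wrapping it in content_transformer.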
3.2 Representation
Representation is an important step in sentiment classification [21]. Generally, raw data contains noise and is refined by applying the preprocessing methods above. The preprocessed data is converted into a term document matrix (TDM), which records the frequency of each word. Feature extraction methods like bag of words and TF–IDF are applied on the TDM. TF–IDF combines two factors, TF and IDF; multiplying them gives the TF–IDF score of a word. TF assigns weight to the words occurring most frequently in a book review, while IDF is a scaling factor that favours words appearing in few reviews. Words that are either very rare or spread across almost all reviews therefore receive lower TF–IDF scores than other words, and we can eliminate them by ignoring words with low TF–IDF scores.
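In the standard textbook formulation, the score of a term t in a review d over a collection of N reviews is

\[ \mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \times \log \frac{N}{\mathrm{df}(t)}, \]

where tf(t, d) is the number of times t occurs in d and df(t) is the number of reviews that contain t. The logarithm drives the weight of a term that appears in nearly every review towards zero, which is what lets the low-scoring terms be discarded.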
3.3 Classification
Classification is the task of grouping data instances into different target classes. The focus of the proposed work is on sentiment analysis. We use five classifiers: K-Nearest Neighbours (KNN), Random Forest (RF), Naive Bayes (NB), Decision Tree and Support Vector Machine (SVM). The various classifiers are evaluated to give a comparative study of their accuracies in grouping the dataset into the positive or
negative class. Each classifier has its own significance and utility, but some perform better than others on a given dataset. For example, NB is a probabilistic classifier which uses Bayes' theorem under the assumption of strong independence between features [22]. One advantage of this classifier is that it requires only a small amount of training data to estimate the parameters needed for prediction. The SVM classifier represents each review, in vectorized form, as a data point in space; the model is trained on the complete vectorized data with the intent of finding a separating hyperplane [13]. KNN is a non-parametric method used for classification: it takes the k closest training samples as input and outputs a class membership [10]. Random Forest is an ensemble method which constructs multiple decision trees during training and outputs the class label [23].
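As an illustration, two of these classifiers can be fitted on a TF–IDF weighted matrix built from the preprocessed corpus of Sect. 3.1 (a sketch assuming the corpus object from that section and a factor of review labels with levels "positive" and "negative"):

```r
library(tm)
library(randomForest)
library(e1071)

# TF-IDF weighted document-term matrix; drop terms absent from 99% of reviews
dtm <- DocumentTermMatrix(corpus, control = list(weighting = weightTfIdf))
dtm <- removeSparseTerms(dtm, 0.99)

x <- as.matrix(dtm)
colnames(x) <- make.names(colnames(x))  # guard against awkward term names

rf_model <- randomForest(x = x, y = labels, ntree = 100)  # ensemble of decision trees
nb_model <- naiveBayes(x = x, y = labels)                 # Gaussian NB on TF-IDF values

head(predict(rf_model, x))
```

Which of the five classifiers works best is exactly the question the experiments in Sect. 4 address.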
4 Experimental Evaluation
In this section, we present a detailed description of the dataset, the experimental procedure carried out and a discussion of the results obtained.
4.1 Dataset Description
Our work focuses on the Amazon book review dataset [19] available at the UCI repository. The dataset consists of 213,335 book reviews of eight popular books, namely “Gone Girl”, “The Girl on the Train”, “The Fault in Our Stars”, “Fifty Shades of Grey”, “Unbroken”, “The Hunger Games”, “The Goldfinch” and “The Martian”.

Table 1. Number of positive class and negative class reviews for each book

Book                      Negative   Neutral   Positive   Total
The Martian                    523      3010      19038    22571
The Goldfinch                 3708      5388      13765    22861
Fifty Shades of Grey         11939      2494      18544    32977
Gone Girl                     9504      5011      27459    41974
The Fault in Our Stars        4410      1206      30228    35844
Unbroken                       322      3110      22444    25876
The Hunger Games              7400      4163      25576    37139
The Girl on the Train          628      3185      20214    24027
Table 1 depicts the number of positive class, negative class and neutral reviews for each book, together with the total number of reviews per book. Each entry in the dataset contains four attributes (review score, tail of the review URL, review title and HTML of the review text), and entries are separated by a newline character. An example of a review is shown in Sect. 3.
4.2 Experimentation
In the experiment, we consider a two-class problem, i.e. positive and negative classes. Two thousand sample reviews for each book are selected randomly such that about 50% carry positive opinions and 50% carry negative opinions. Each review consists of a rating and the text written by the user. The reviews are grouped into the two classes, positive and negative, based on their ratings: a review with a rating of 1 or 2 is placed in the negative class, and reviews with ratings of 4 or 5 are grouped as positive. The corpus is divided in the ratio of 60:40 for training and testing, respectively. The various preprocessing techniques (data cleaning, URL and HTML tag removal, whitespace removal, stop word removal and stemming) are applied to the dataset. The training and testing corpora are converted to individual term document matrices (TDM), and the frequency of each word is computed. We use the bag of words model to build a corpus from the preprocessed data, and TF–IDF is used for representation. The TDM is then fed as input to the various classifiers: KNN, SVM, Naive Bayes, Decision Tree and Random Forest. The accuracy of each classifier is computed by dividing the number of correct classifications by the total number of tuples in the test set:

\[ \text{Accuracy} = \frac{\text{Number of correct classifications}}{\text{Number of tuples in the test set}} \tag{1} \]
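A sketch of this protocol for a single book and a single classifier, continuing the R objects introduced in Sect. 3 (the ratings vector and the seed are assumptions for illustration):

```r
# ratings: 1-5 star score per review; x: TF-IDF matrix aligned with the reviews
keep   <- ratings != 3                                  # ignore 3-star (neutral) reviews
labels <- factor(ifelse(ratings[keep] <= 2, "negative", "positive"))
xk     <- x[keep, ]

set.seed(42)
n         <- nrow(xk)
train_idx <- sample(seq_len(n), size = round(0.6 * n))  # 60:40 train/test split

rf_model <- randomForest::randomForest(x = xk[train_idx, ], y = labels[train_idx])
pred     <- predict(rf_model, xk[-train_idx, ])

# Accuracy as in Eq. (1)
accuracy <- mean(pred == labels[-train_idx])
```

Repeating the fit and prediction for each of the five classifiers and each of the eight books gives a comparison of the kind reported in Table 2.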
In Table 2, we observe that Random Forest gives better results for the two-class classification problem on this dataset. The TDM consists of feature vector values, called points. Significant improvements in classification accuracy result from building an ensemble of trees and letting them vote for the most popular class; to grow these ensembles, random vectors are often generated. On average, Random Forest classified about 88.86% of the reviews correctly across the eight books and reached a maximum accuracy of 94.72% for the book “Unbroken”. “The Goldfinch” and “Gone Girl” obtained their highest accuracy with KNN. This is because KNN is non-parametric, i.e. it makes no assumption about the data distribution, and the reviews of these books have points close to their neighbours, hence the higher accuracy.

Table 2. Accuracy (%) achieved using various classifiers for different books

Book                        KNN     Decision Tree   Naive Bayes   SVM     Random Forest
1. The Hunger Games         85.50   88.44           88.44         89.24   89.64
2. The Girl on the Train    84.51   82.60           82.60         82.60   86.20
3. The Goldfinch            90.00   75.40           81.20         84.20   84.00
4. Gone Girl                84.91   79.68           79.68         82.45   66.88
5. The Martian              86.58   53.01           85.94         85.94   91.16
6. Fifty Shades of Grey     86.22   50.80           52.20         65.80   86.60
7. Unbroken                 71.90   84.64           94.62         94.60   94.72
8. The Fault in Our Stars   87.10   57.00           93.80         93.80   94.40
We also compare the time elapsed for the two-class classification problem. The time taken by each classifier to group reviews into the positive and negative classes for the different books is shown in Fig. 1. A customer can write a review as a single sentence or as a paragraph; when reviews contain multiple sentences, the time taken to classify them is significantly higher, so some books take more time than others. Figure 1 reports the time in seconds for classifying reviews as positive or negative for each book. Amongst all the classifiers implemented, Random Forest takes the most execution time, because it needs to grow multiple trees for its ensemble.
Fig. 1. Time taken by classifier
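The per-classifier timings behind such a comparison can be collected with R's system.time, as in this small sketch (continuing the objects from the previous snippet):

```r
# Wall-clock seconds for fitting one classifier on the training portion
rf_time <- system.time(
  randomForest::randomForest(x = xk[train_idx, ], y = labels[train_idx], ntree = 100)
)["elapsed"]

nb_time <- system.time(
  e1071::naiveBayes(x = xk[train_idx, ], y = labels[train_idx])
)["elapsed"]

c(random_forest = rf_time, naive_bayes = nb_time)
```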
Sentiment analysis involves the study of the various sentiments expressed by customers. These sentiments can relate to anger, sadness, joy, trust, surprise, disgust, fear and anticipation [10]. The syuzhet package [18] in the R language is used for our analysis. The input data is preprocessed using various data cleaning techniques, and a term document matrix (TDM) is created which indicates the frequency of each word in the text. After constructing the TDM, we fetch sentiment words from the text using the National Research Council (NRC), Canada, emotion lexicon, then count the sentiment words by category and finally generate a sentiment score for each book. Figure 2 shows the sentiment score of the book “The Hunger Games”, which is an adventure novel, thus evoking emotions such as positive, anticipation and emotions related to trust. A single word can be mapped to multiple emotions: for example, “abandoned” is mapped to anger, fear and sadness, whereas “abandonment” is mapped to anger, fear, sadness and surprise. This mapping is present in the NRC emotion dictionary. The share of each emotion in a book is computed as

\[ \text{Percentage of emotion} = \frac{\text{Word count for a particular emotion}}{\text{Total word count for that book}} \tag{2} \]
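A sketch of this step with syuzhet (assuming the same reviews vector as before; get_nrc_sentiment returns one row per review with the eight emotion columns plus negative and positive):

```r
library(syuzhet)

nrc <- get_nrc_sentiment(reviews)   # NRC word-emotion counts per review

emotion_counts <- colSums(nrc)      # total word count per category for this book
emotion_pct    <- 100 * emotion_counts / sum(emotion_counts)
round(emotion_pct, 2)
```

Here the denominator is taken as the combined count over all ten categories, an assumption that makes the percentages for a book sum to 100 as in Table 3.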
Fig. 2. Sentiment score for the book “The Hunger Games”

Table 3 illustrates the percentage of each emotion evoked amongst its readers by a book. The percentage is obtained by dividing the word count of the different emotions, such as joy, anger, surprise, anticipation, sadness and trust, expressed by the readers by the total word count for a particular book. It clearly shows that thrillers such as “Gone Girl” and “The Girl on the Train” are high on words used to express surprise, fear and anticipation, and that books such as “Fifty Shades of Grey” and “Gone Girl” have generally evoked negative sentiment, anger and disgust amongst their readers. Novels such as “The Fault in Our Stars” and “Unbroken” made the readers feel sadness, positivity and trust. Adventure books such as “The Hunger Games”, “The Martian” and “The Goldfinch” garnered more words related to positive sentiment and anticipation.

Table 3. Percentage distribution of each emotion evoked amongst readers by a book; the columns cover the eight books (“The Hunger Games”, “The Girl on the Train”, “The Goldfinch”, “The Fault in Our Stars”, “Gone Girl”, “The Martian”, “Fifty Shades of Grey” and “Unbroken”) and the rows the eight NRC emotions together with the positive and negative sentiment categories

4.3 Discussion
In this paper, we have evaluated various preprocessing and feature extraction techniques for classifying text into either the positive or the negative class. We present a comparative study of (i) the accuracy of the classifiers, (ii) the time elapsed by each classifier and (iii) the sentiment scores of various books. On the basis of the results obtained, we observe that Random Forest, compared to Naive Bayes, offers consistent and marked improvements in accuracy but requires more processing time. If we have many points in a low-dimensional space, then KNN is considered a good choice. Thus, we infer that SVM with a kernel and Random Forest (RF) are the best choices for most of the problems. Random Forest is known to be a reliable and efficient algorithm; it is versatile, robust and performs well on classification tasks, and in general requires very little feature engineering and parameter tuning. By computing sentiment scores, we can learn what kind of emotion a book evokes in its readers, and this comparison helps in judging books based on emotion. Thus, the proposed model
can be incorporated along with recommendation algorithms to give online customers better insights into a product.
5 Conclusion and Future Scope
Sentiment analysis is important for any online retail company that wants to understand customers' responses. Reviews need to be analysed to build recommendation algorithms that target customers based on their needs. Amazon, one of the e-commerce giants, generates a huge amount of feedback data. In our work, we present an analysis of customer book reviews from amazon.com. This work presents a comparative study of the accuracy of various classifiers. Random Forest was able to give an average accuracy of 90.15% for six books. In terms of processing time, Random Forest ranked the highest among the classifiers analysed. Sentiment analysis using the NRC Emotion Lexicon has helped us to gauge the predominant sentiment amongst the readers of various bestselling books on Amazon. The sentiments were classified on the basis of different emotions. After classification, we also analysed the results graphically and arrived at conclusions as to which kinds of emotion predominate amongst the readers of each book. The scope of this work can be extended by incorporating various feature selection techniques, such as mutual information (MI), the chi-squared test and information gain, for better representation. We can also use hybrid classifiers, such as SVM in combination with other classifiers, to enhance accuracy. Better recommendation algorithms can be built by considering the emotions evoked by customer reviews.
References

1. Pang, B., Lee, L.: A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, p. 271 (2004)
2. Pang, B., Lee, L.: A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, p. 271 (2004)
3. Tripathy, A., Agarwal, A., Rath, S.K.: Classification of sentimental reviews using machine learning techniques. Procedia Computer Science, Vol. 57, pp. 821–829 (2015)
4. Smithikrai, C.: Effectiveness of teaching with movies to promote positive characteristics and behaviors. Procedia - Social and Behavioral Sciences, Vol. 217, pp. 522–530 (2016)
5. Shruti, T., Choudhary, M.: Feature based opinion mining on movie review. International Journal of Advanced Engineering Research and Science, Vol. 3, pp. 77–81 (2016)
6. Hu, M., Liu, B.: Mining and summarizing customer reviews. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, pp. 168–177 (2004)
7. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, Vol. 2, No. 1–2, pp. 1–135 (2008)
8. Chesley, P., Vincent, B., Xu, L., Srihari, R.K.: Using verbs and adjectives to automatically classify blog sentiment. Training, Vol. 580, No. 263, p. 233 (2006)
9. Choi, Y., Cardie, C.: Adapting a polarity lexicon using integer linear programming for domain-specific sentiment classification. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Vol. 2, pp. 590–598 (2009)
10. Tan, L.K.W., Na, J.C., Theng, Y.L., Chang, K.: Sentence-level sentiment polarity classification using a linguistic approach. In: International Conference on Asian Digital Libraries, Springer, Berlin, Heidelberg, pp. 77–87 (2011)
11. Liu, B.: Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, Vol. 5, No. 1, pp. 1–167 (2012)
12. Mohammad, S.M., Turney, P.D.: Crowdsourcing a word–emotion association lexicon. Computational Intelligence, Vol. 29, No. 3, pp. 436–465 (2013)
13. Turney, P.D.: Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 417–424 (2002)
14. Bhatt, A., Chheda, H., Gawande, K.: Amazon review classification and sentiment analysis. International Journal of Computer Science and Information Technologies, Vol. 6, No. 6, pp. 5107–5110 (2015)
15. Haddi, E., Liu, X., Shi, Y.: The role of text pre-processing in sentiment analysis. Procedia Computer Science, Vol. 17, pp. 26–32 (2013)
16. Anand, D., Naorem, D.: Semi-supervised aspect based sentiment analysis for movies using review filtering. Procedia Computer Science, Vol. 84, pp. 86–93 (2016)
17. Mohammad, S.M., Turney, P.D.: Emotions evoked by common words and phrases: using Mechanical Turk to create an emotion lexicon. In: Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, Association for Computational Linguistics, pp. 26–34 (2010)
18. https://cran.r-project.org/web/packages/syuzhet/syuzhet.pdf
19. https://archive.ics.uci.edu/ml/datasets/Amazon+book+reviews
20. Vijayarani, S., Ilamathi, M.J., Nithya, M.: Preprocessing techniques for text mining - an overview. International Journal of Computer Science and Communication Networks, Vol. 5, No. 1, pp. 7–16 (2015)
21. Trstenjak, B., Mikac, S., Donko, D.: KNN with TF-IDF based framework for text categorization. Procedia Engineering, Vol. 69, pp. 1356–1364 (2014)
22. McCallum, A., Nigam, K.: A comparison of event models for naive Bayes text classification. In: AAAI-98 Workshop on Learning for Text Categorization, Citeseer, Vol. 752, pp. 41–48 (1998)
23. Ho, T.K.: Random decision forests. In: Proceedings of the Third International Conference on Document Analysis and Recognition, IEEE, Vol. 1, pp. 278–282 (1995)