
Ensemble of Classification Algorithms for Subjectivity and Sentiment Analysis of Arabic Customers' Reviews

1* Nazlia Omar, 2 Mohammed Albared, 3 Adel Qasem Al-Shabi, 4 Tareq Al-Moslmi
1,2,3,4 Center for Artificial Intelligence, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia.
1* [email protected], [email protected], 3 [email protected], [email protected]

Abstract
Sentiment analysis is a challenging and important task that involves natural language processing, web mining and machine learning. To date, few studies have addressed sentiment classification for Arabic because of the lack of resources for handling sentiments or opinions, such as sentiment lexicons and opinion corpora. The main obstacle in Arabic sentiment analysis is that the phrases and words that Arabic web users employ to express sentiment are highly subject to usage trends. In addition, the use of dialectal phrases and words adds ambiguity to the analysis of Arabic sentiments and opinions. To address this shortage, this study proposes a framework based on an ensemble of machine learning classifiers for subjectivity and sentiment analysis of Arabic customer reviews. First, three well-known text classification algorithms, Naive Bayes, the Rocchio classifier and support vector machines, are adopted as base classifiers. Second, we make a comparative study of two kinds of ensemble methods, namely fixed combination and meta-classifier combination. The experimental results show that the ensemble of classifiers improves classification effectiveness in terms of macro-F1 at both levels. The best results obtained for subjectivity analysis and sentiment classification in terms of macro-F1 are 97.13% and 90.95% respectively.

Keywords: Sentiment Analysis, Supervised Approaches, Ensemble Technique, Arabic Customers' Reviews

1. Introduction
The rapid growth of user-generated content on the Web has provided a rich data source for mining opinions. People can now learn what others think about different topics from their reviews on the web, even without knowing them. People can also write about their own opinions and sentiments regarding a specific product or service. Such opinions can be used by both individuals and organizations. Before buying a product, people nowadays often search online for customer reviews of it, and their decisions are affected by those reviews. Using the same reviews, companies and organizations can get feedback on their services or products, which helps them improve. For this reason, interest in systems that extract sentiment from texts on the web has grown into a major research area. That work includes a large variety of methodological approaches and potential applications within natural language processing and data mining. Researchers have proposed many different approaches and variations of solutions to the task of extracting sentiments from user-generated text and analyzing them.

Sentiment analysis solutions can be divided according to the scope of the input into document-level, sentence-level and word-level sentiment analysis. They can also be divided, according to the approach used to solve the sentiment classification problem, into two main categories. The first category treats the problem as a text classification task and solves it using machine learning techniques. The second category relies mainly on unsupervised techniques and is more concerned with investigating the text linguistically to identify words, phrases or sentences that have semantic orientation [1].

International Journal of Advancements in Computing Technology(IJACT) Volume5, Number14, October 2013


This paper proposes a framework based on an ensemble of machine learning classifiers for subjectivity and sentiment analysis of Arabic customer reviews. First, three well-known text classification algorithms, namely Naïve Bayes, the Rocchio classifier and support vector machines, are used as base classifiers. Second, we make a comparative study of two forms of ensemble methods, namely fixed combination and meta-classifier combination. Three meta-classifier combination ensemble methods, with Naïve Bayes, logistic regression and SVM as meta-classifiers, are investigated. The rest of the paper is arranged as follows: section two reviews related literature, section three presents the methodology, section four describes the experimental setup, section five gives the results of the experiments and section six concludes the paper.

2. Related Work
Subjectivity and sentiment analysis is an interdisciplinary challenge. It involves three components, namely natural language processing, web mining and machine learning. It is a complex task that encompasses several separate subtasks. These subtasks can be carried out at several levels, such as document level, sentence level and word level. Generally, two main machine learning approaches are used for sentiment analysis: supervised and unsupervised [2][3]. Most work on sentiment analysis has concentrated on English and other Indo-European languages. Even though Arabic is among the ten most used languages on the Internet according to Internet World Stats [4], only a few studies have been performed on Arabic.

Rushdi et al. [5] apply machine learning classifiers to both Arabic and English corpora. They employ two classifiers, namely Support Vector Machines (SVMs) and Naive Bayes (NB). Their results show that SVMs outperform the NB classifier and that there is no big difference between term frequency (TF) and term frequency-inverse document frequency (TF-IDF) as weighting methods. Elhawary and Elfeky [6] present a sentiment analysis method for Arabic business reviews. In part of their system, a multi-label classifier assigns a tag from the set {review, forum, blog, news, shopping store} to a document. Abdul-Mageed et al. [7] designed an SVM-based system for subjectivity and sentiment analysis of Arabic social media genres. Their corpus encompasses data in both Modern Standard Arabic and dialectal Arabic. Their results suggest that solutions need to be developed for each domain and task.

To date, only a few levels of sentiment analysis have been studied for Arabic. Subjectivity and sentiment analysis still needs to be studied in many areas, such as customers' reviews of products, services and movies, and political reviews. In addition, all levels of sentiment analysis still need to be studied, especially the document level, the word/phrase level and the feature level. In this work, we study subjectivity and sentiment analysis of customers' reviews at the document level.

3. Methodology
Machine learning approaches are well known and widely used for sentiment analysis. Their main benefit is the high accuracy they can achieve when given a high-quality training corpus [8]. The subjectivity analysis task is a text classification problem with two labels: relevant or irrelevant. The sentiment polarity classification, in contrast, is a three-class text classification problem with the labels positive, negative and neutral. In this study, we use a two-level classification for Arabic subjectivity and sentiment analysis. In the first stage, a subjectivity analyzer based on supervised approaches and ensemble techniques filters the reviews into two categories: relevant and irrelevant. In the second stage, a sentiment analyzer, also based on supervised approaches and ensemble techniques, classifies the relevant reviews as positive, negative or neutral. At each of these levels, the supervised approaches and the ensemble technique are built from Naïve Bayes (NB), Support Vector Machine (SVM) and Rocchio classifiers. The main reasons for combining individual classifiers are that each classifier produces different types of errors and that the combination takes advantage of their individual strengths. Ensemble methods have become increasingly popular because they help overcome the weaknesses of single supervised approaches. In previous sentiment analysis studies, the combined classifier has been reported to exceed the best of its individual members [9-14]. The following subsections describe the corpus, the preprocessing and the individual classifiers used in this work.

3.1. Corpus
One of the problems in subjectivity and sentiment analysis of Arabic text is the unavailability of free corpora built for this purpose. We therefore decided to create our own Arabic data annotated for subjectivity and sentiment analysis. The data was collected from Arabic review websites, mainly from jeeran.com. This website contains Arabic reviews of different services and products from different regions of the Arab world. Two raters, both native speakers of Arabic, were asked to annotate the sentiment of the reviews. The two raters showed a high level of agreement in their classification of the reviews. For the reviews that confused the raters, a college-educated native speaker of Arabic was asked to judge the final sentiment. Following [15, 16], the collected data is distributed over four categories: a) objectivity review, b) subjectivity-positive review, c) subjectivity-negative review, and d) subjectivity-neutral review. Table 1 shows the distribution of reviews among these categories.

Table 1. The review categories and their training and test sets

Type of Data    Objectivity Reviews    Sub-Pos Reviews    Sub-Neg Reviews    Sub-Neu Reviews
Training set    1500                   673                525                302
Testing set     225                    101                79                 45

3.2. Preprocessing
Both subjectivity analysis and sentiment analysis with a supervised machine learning approach require preprocessing of the raw text to extract features that machine learning algorithms can use easily. For each review in the corpus, the system performs the following tasks: tokenization, function word removal and stemming. After that, dialect and slang words are processed. Arabic has many dialects [17], and reviewers mostly write their reviews in their own dialect, using different words to express the same opinion. To handle this problem, a lexicon containing dialectal words and their standard Arabic equivalents has been created and used. Table 2 shows a sample of the dialect words and the equivalent words in Modern Standard Arabic (MSA). In addition to the use of the lexicon, the machine learning classifiers are also trained on data from both Modern Standard Arabic and Arabic dialects.

Table 2. Sample of the dialectal words and their standard Arabic equivalents

Dialect/slang words              Equivalent word in MSA    Meaning in English
خوش، كويس، منيح، وايد حلو          جيد                        Good
زاكي، باهي، طعيم، مرة حلو          لذيذ                        Delicious
زفرة، أرف، وصاخة، وصخ             متسخ                        Dirty

Finally, the collected reviews are represented using the standard Term Frequency-Inverse Document Frequency (TF-IDF) weight, a term weighting scheme commonly used in text classification. A term can be a single word, two words or even a complete phrase. The TF-IDF weight is given by the following formula:


A =

f  ∑ f

∗ log 

N n

(1)

Where fij is the frequency of the term i in review j, ni denotes the number of reviews in which the term i occurs at least once, and N stands for the number of reviews in the data set.
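As a minimal sketch of Equation (1), the weights can be computed from tokenized reviews as shown below. The function name `tfidf_weights`, the toy reviews and the use of the natural logarithm are assumptions made for illustration; the paper does not state the log base or give an implementation, and the term frequency is normalized here by the total term count of the review.

```python
import math
from collections import Counter

def tfidf_weights(reviews):
    """Map a list of token lists to a list of {term: weight} dicts."""
    n_reviews = len(reviews)  # N in Equation (1)
    # n_i: number of reviews in which term i occurs at least once
    doc_freq = Counter(term for tokens in reviews for term in set(tokens))
    weighted = []
    for tokens in reviews:
        counts = Counter(tokens)      # f_ij for every term i of review j
        total = sum(counts.values())  # sum of the frequencies in review j
        weighted.append({
            term: (freq / total) * math.log(n_reviews / doc_freq[term])
            for term, freq in counts.items()
        })
    return weighted

# Hypothetical toy data (already tokenized), not taken from the corpus.
toy_reviews = [
    ["good", "food", "good"],
    ["bad", "food"],
    ["good", "service"],
]
weights = tfidf_weights(toy_reviews)
```

With this weighting, a term that occurs in every review receives a weight of zero, because log(N / n_i) = log(1) = 0.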

3.3. Subjectivity Analysis and Sentiment Analysis
Subjectivity analysis and sentiment analysis together form two stages of classification. The first stage is subjectivity filtering, in which we determine whether a comment is a relevant review or not. To do so, a binary classifier is used to classify comments as relevant or irrelevant. This classifier learns from the content of an extensive collection of comments that have already been labeled as relevant or irrelevant. The reviews classified as relevant at the first level are passed to the second level of classification. This second stage (sentiment analysis) determines the sentiment polarity of the review: positive, negative or neutral. The general process of sentiment analysis is to induce a classifier from a set of training data with manually assigned category labels (positive, negative or neutral) and then apply it to predict labels for uncategorized reviews. The following subsection briefly describes the classifiers used in both stages. These classifiers were selected because they are effective in sentiment analysis and because they produce different types of errors [9-14].
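The two-stage process described above can be sketched as a simple cascade. The function `classify_review` and the two keyword-based stand-in classifiers are hypothetical and serve only to show the control flow; in the paper, both stages are trained classifiers or ensembles of them.

```python
def classify_review(tokens, subjectivity_clf, sentiment_clf):
    """Two-stage cascade: filter by relevance, then assign a polarity."""
    # Stage 1: subjectivity filter (binary: "relevant" or "irrelevant")
    if subjectivity_clf(tokens) != "relevant":
        return "irrelevant"
    # Stage 2: polarity, applied only to reviews that passed stage 1
    return sentiment_clf(tokens)

# Hypothetical stand-ins for the trained classifiers (illustration only).
POSITIVE_WORDS = {"good", "delicious"}
NEGATIVE_WORDS = {"bad", "dirty"}

def toy_subjectivity(tokens):
    has_opinion = bool((POSITIVE_WORDS | NEGATIVE_WORDS) & set(tokens))
    return "relevant" if has_opinion else "irrelevant"

def toy_sentiment(tokens):
    score = sum(t in POSITIVE_WORDS for t in tokens) \
        - sum(t in NEGATIVE_WORDS for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

label = classify_review(["the", "food", "was", "delicious"],
                        toy_subjectivity, toy_sentiment)
```

One consequence of this design is that an error in the first stage cannot be corrected later: a review wrongly filtered out as irrelevant never reaches the sentiment classifier.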

3.4. The Classifiers Used

3.4.1. Naïve Bayes
Naïve Bayes is an effective classification algorithm that is widely used for sentiment analysis and document classification. As a probabilistic model, the Naïve Bayes classifier uses the joint probabilities of terms and categories to estimate the probabilities of the categories given a test review. The main advantages of Naïve Bayes are that it is simple, easy to implement and performs well. Two NB models are used for text classification: the Multinomial model and the Multivariate Bernoulli model [18, 19]. The NB model used in sentiment classification is based on Bayes' formula:

$$P(C_i \mid d) = \frac{P(C_i)\,P(d \mid C_i)}{P(d)} \qquad (2)$$

where $P(C_i \mid d)$ is the posterior probability of class $C_i$ given a new document $d$, and $P(C_i)$ is the prior probability of class $C_i$.

3.4.2. Rocchio Algorithm
The Rocchio algorithm is an instance-based learning algorithm [20]. In the training phase, it builds a single centroid for each category from all the documents in that category. In the online test phase, the similarity between the new, unknown review and the centroid of each category is calculated, and the category with the closest centroid is assigned to the review. The centroids are calculated as follows:

$$Q_{new} = \alpha \frac{1}{|R|} \sum_{d_j \in R} w_{kj} - \beta \frac{1}{|\bar{R}|} \sum_{d_j \notin R} w_{kj} \qquad (3)$$

where $|R|$ is the number of documents included in the category, $|\bar{R}|$ is the number of documents not included in the category, $\alpha$ and $\beta$ are two control parameters and $w_{kj}$ is the weight of term $t_k$ in document $d_j$. The centroid of each category is thus calculated by summing the positive documents and subtracting the negative documents. After the centroid of each class has been calculated, the similarity between each centroid and the new, unknown review is computed and the category with the closest centroid is assigned to the review. This work investigates two versions of the Rocchio algorithm, one with cosine similarity and another with Jaccard similarity.

Cosine similarity: the similarity between two documents, such as a training document $D_i$ and a test document $D_j$, is measured by cosine similarity:

$$Sim(D_i, D_j) = \frac{\sum_{k=1}^{m} W_{ik} \times W_{jk}}{\sqrt{\sum_{k=1}^{m} W_{ik}^{2}} \times \sqrt{\sum_{k=1}^{m} W_{jk}^{2}}} \qquad (4)$$

where $k = 1, 2, 3, \ldots, m$ indexes the terms that occur in the training and testing documents, and $W_{ik}$ is the weight of term $k$ in document $D_i$.

Jaccard similarity: the similarity between two documents, such as a training document $D_i$ and a test document $D_j$, is measured by Jaccard similarity:

$$Sim(D_i, D_j) = \frac{\sum_{k=1}^{m} W_{ik} \times W_{jk}}{\sum_{k=1}^{m} W_{ik}^{2} + \sum_{k=1}^{m} W_{jk}^{2} - \sum_{k=1}^{m} W_{ik} \times W_{jk}} \qquad (5)$$

where $k = 1, 2, 3, \ldots, m$ indexes the terms in the training and testing documents.

3.4.3. Support Vector Machine
The Support Vector Machine (SVM) is a discriminative classifier defined by a separating hyperplane. Put simply, given labeled training data, the algorithm outputs the optimal hyperplane, which is then used to categorize new examples [21]. The SVM algorithm works by finding the hyperplane that gives the largest minimum distance to the training examples. This distance is known as the margin in SVM theory. The optimal separating hyperplane therefore maximizes the margin of the training data. Suppose that X is a set of labeled training points (feature vectors) $(x_1, y_1), \ldots, (x_n, y_n)$, where each training point $x_i \in \mathbb{R}^N$ is given a label $y_i \in \{-1, +1\}$, for $i = 1, \ldots, n$. The vector $w$ and the bias $b$ determine the separating hyperplane, which consists of the points $x$ that satisfy:

$$w \cdot x + b = 0 \qquad (6)$$

3.4.4. Ensemble Technique
Ensemble methods can be categorized into fixed rules and trained methods. Fixed-rule methods combine the individual outputs in a fixed way, such as a voting rule; in contrast, trained methods (meta-classifiers) learn the combination through training on a testing dataset. Ensemble techniques have been reported to outperform individual techniques.

Voting method: The voting rule counts the outputs of the individual classifiers and assigns a testing sample x to the class that receives the most predictions from the participating classifiers [11]:

$$O_j = \sum_{k=1}^{K} I\left(\arg\max_{i}(O_{ki}) = j\right) \qquad (7)$$

where $I(\cdot)$ is the indicator function, $O_{ki}$ is the output of classifier $k$ for class $i$, and $K$ is the number of participating classifiers. The sample is assigned to the class $j$ with the largest vote count $O_j$.

Meta-classifier: When a meta-classifier is used for combination, the outputs of the participating classifiers for all class labels are used as features for meta-learning. In our work, naïve Bayes, linear SVM and logistic regression are adopted as meta-classifiers.
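The two combination schemes can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the names `majority_vote` and `stacking_features` are hypothetical, ties in the vote are broken by the order of the base classifiers (the paper does not specify a tie rule), and the per-class scores below are invented numbers.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by most base classifiers for one review."""
    counts = Counter(predictions)
    best = max(counts.values())
    # Assumption: ties go to the classifier listed first.
    for label in predictions:
        if counts[label] == best:
            return label

def stacking_features(class_scores):
    """Concatenate the per-class outputs of every base classifier into one
    feature vector that a meta-classifier can be trained on."""
    return [score for scores in class_scores for score in scores]

# Labels from three hypothetical base classifiers (e.g. NB, Rocchio, SVM).
voted = majority_vote(["positive", "negative", "positive"])

# Invented per-class scores (positive, negative) from the same classifiers.
meta_input = stacking_features([[0.9, 0.1], [0.4, 0.6], [0.7, 0.3]])
```

The voting rule needs no extra training, whereas the stacked feature vectors have to be paired with gold labels to train the meta-classifier (naïve Bayes, linear SVM or logistic regression in this work).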


4. Experiments Setting and Evaluation Metrics
In order to evaluate the performance of the subjectivity and sentiment analysis of Arabic customer reviews, two levels of experiments are conducted. In the first level, experiments are performed on the Arabic corpus annotated for subjectivity analysis. Every dataset is divided equally into five folds, and all results are obtained with 5-fold cross validation. In every loop of the cross validation, four folds are used as training data and the remaining fold is used as test data. The performance reported in all of the following tables is the average over the five folds. We first evaluate four machine learning classification techniques for Arabic subjectivity analysis: 1) Naive Bayes; 2) Support Vector Machine; 3) Rocchio classifier; and 4) ensemble techniques. Different models of the three machine learning classifiers (Naive Bayes, Support Vector Machine, Rocchio) are evaluated first. The best three models are then selected to be combined. The performance of the Arabic sentiment analysis is evaluated using the same experimental setting, except that at this level we use the sentiment analysis corpus.

The performance of the subjectivity classification and sentiment classification systems is measured using precision P, recall R and macro-averaged F1. The F-measure is the most common measure, and it combines precision and recall to evaluate the classification methods. For ease of comparison, the macro-averaged F-measure (Macro-F1) is used. It is the arithmetic mean of the F-measures computed for each category. The macro-average gives equal weight to every category regardless of how many documents it contains, so it is not dominated by the performance of the system on the most frequent categories.

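$$Precision = \frac{TP}{TP + FP}$$

$$Recall = \frac{TP}{TP + FN}$$

$$F_1 = \frac{2 \times Recall \times Precision}{Recall + Precision}$$

$$F_1^{macro} = \frac{1}{m} \sum_{i=1}^{m} F_1(i)$$

Here TP, FP and FN are the numbers of true positives, false positives and false negatives for a category, and $m$ is the number of categories.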
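The following sketch computes these measures from per-category counts. The function names and the confusion counts are invented for illustration and do not come from the paper's experiments; returning 0.0 when a denominator is zero is also an assumption, since the paper does not discuss that case.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 for one category from its counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * recall * precision / denom if denom else 0.0
    return precision, recall, f1

def macro_f1(per_class_counts):
    """Arithmetic mean of the per-category F1 values."""
    f1_scores = [precision_recall_f1(*counts)[2] for counts in per_class_counts]
    return sum(f1_scores) / len(f1_scores)

# Invented (TP, FP, FN) counts for a two-category problem.
counts = [(90, 10, 5), (95, 5, 10)]
p, r, f1_first = precision_recall_f1(*counts[0])
macro = macro_f1(counts)
```

Because every category contributes one term to the mean, a small category affects the macro-averaged score as much as a large one.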

5. Results Discussion
In the first-level classification experiments, the corpus consisting of relevant and irrelevant reviews is used to train the classifier. In the second-level classification experiments, the corpus consisting of positive and negative reviews is used to train the classifier. All algorithms are evaluated using 5-fold cross validation. In each loop of the cross validation, four folds are used as training data and the remaining fold is used as test data. The results in terms of per-class precision, per-class recall, F-measure and macro-averaged F-measure are the values averaged across all five folds of the cross validation experiments.

5.1. Results of Individual Classifiers
We examine the individual classification algorithms on both levels. All classification algorithms, i.e. NB, the Rocchio algorithm and SVM, are evaluated on both subjectivity analysis and sentiment analysis. The results in terms of per-class precision, per-class recall, F-measure and macro-averaged F-measure of the individual classification algorithms on subjectivity analysis and sentiment analysis are reported in Table 3 and Table 4, respectively.


First, we concentrate on the performance of the different classifiers on subjectivity analysis. For simplicity, we report the results averaged over the five folds. When the performances of the classifiers are compared, the Bernoulli Naive Bayes algorithm performs better than the other algorithms. The other methods (90.11-94.61%) remain effective, although they score below Bernoulli Naive Bayes (96.51%). Second, we focus on the performance of the different classifiers on sentiment analysis. When the classifier performances are compared, the SVM classifier shows a higher performance (89.81%) than the other algorithms. However, the results also show that the performance of the Bernoulli Naive Bayes algorithm on sentiment analysis (89.78%) is only slightly lower than that achieved by SVM.

Table 3. The performance of the individual classification algorithms on subjectivity analysis

Method                     Class          Precision   Recall    F-Measure   Macro-F
Multinomial Naive Bayes    Relevant       97.03%      91.59%    94.23       94.59
                           Non-relevant   92.62%      97.41%    94.95
Bernoulli Naive Bayes      Relevant       95.47%      97.66%    96.55       96.51
                           Non-relevant   96.81%      96.12%    96.46
Rocchio with Jaccard       Relevant       90.48%      88.79%    89.63       90.11
                           Non-relevant   89.83%      91.38%    90.60
Rocchio with Cosine        Relevant       92.49%      92.06%    92.27       92.59
                           Non-relevant   92.70%      93.10%    92.90
SVM Classifier             Relevant       93.98%      94.86%    94.42       94.61
                           Non-relevant   95.22%      94.40%    94.81

Table 4. The performance of the individual classification algorithms on sentiment analysis

Method                     Class          Precision   Recall    F-Measure   Macro-F
Multinomial Naive Bayes    Positive       91.87%      85.61%    88.63       89.04
                           Negative       86.62%      92.48%    89.45
Bernoulli Naive Bayes      Positive       94.12%      84.85%    89.24       89.78
                           Negative       86.30%      94.74%    90.32
Rocchio with Jaccard       Positive       79.14%      82.58%    80.82       80.83
                           Negative       82.02%      79.70%    80.84
Rocchio with Cosine        Positive       84.50%      82.58%    83.53       83.77
                           Negative       83.09%      84.96%    84.01
SVM Classifier             Positive       88.32%      91.67%    89.96       89.81
                           Negative       91.41%      87.97%    89.66

5.2. Results of Ensemble of Classification Algorithms
In this part, we examine ensembles of classification algorithms. Experiments are performed on both levels. There are three participating classification algorithms: NB, the Rocchio algorithm and SVM. The results in terms of per-class precision, per-class recall, F-measure and macro-averaged F-measure of the ensembles of classification algorithms on subjectivity analysis and sentiment analysis are reported in Table 5 and Table 6, respectively.

First, we focus on the performance of the different ensembles of classification algorithms on subjectivity analysis. When the performances in Table 3 and Table 5 are compared, most of the ensemble methods improve on the individual classifiers. The meta-learner ensemble technique with logistic regression yields the best performance. Second, we focus on the performance of the different ensembles of classification algorithms on sentiment analysis. When the performances in Table 4 and Table 6 are compared, some of the ensemble methods improve on the individual classifiers. The meta-learner ensemble technique with logistic regression again yields the best performance. In terms of the effectiveness of the ensemble strategies, we conclude that ensembles of classification algorithms generally perform better than the individual classifiers.

Table 5. Performance of the ensembles of classification algorithms on subjectivity analysis

Method                                      Class          Precision   Recall    F-Measure   Macro-F
Ensemble (voting)                           Relevant       95.85%      97.20%    96.52       96.63
                                            Non-relevant   97.38%      96.12%    96.75
Ensemble (Stacking, NB)                     Relevant       95.87%      97.66%    96.76       96.86
                                            Non-relevant   97.81%      96.12%    96.96
Ensemble (Stacking, logistic regression)    Relevant       98.60%      96.35%    97.46       97.53
                                            Non-relevant   96.55%      98.68%    97.60
Ensemble (Stacking, SVM)                    Relevant       97.85%      97.20%    97.52       97.13
                                            Non-relevant   97.38%      96.12%    96.75

Table 6. Performance of the ensembles of classification algorithms on sentiment analysis

Method                                      Class          Precision   Recall    F-Measure   Macro-F
Ensemble (voting)                           Positive       91.41%      88.64%    90.00       90.19
                                            Negative       89.05%      91.73%    90.37
Ensemble (Stacking, NB)                     Positive       93.10%      81.82%    87.10       87.87
                                            Negative       83.89%      93.98%    88.65
Ensemble (Stacking, logistic regression)    Positive       90.91%      90.91%    90.91       90.95
                                            Negative       90.98%      90.98%    90.98
Ensemble (Stacking, SVM)                    Positive       92.13%      88.64%    90.35       90.56
                                            Negative       89.13%      92.48%    90.77

6. Conclusion and Future Work
In this study, we carried out a comparative study of the effectiveness of individual supervised classifiers and ensemble methods for subjectivity and sentiment analysis of Arabic customers' reviews. The ensemble method was applied to sentiment classification tasks with the aim of integrating classification algorithms efficiently into a classification procedure that is more accurate than its individual members. First, three common text classification algorithms were employed as base classifiers: naive Bayes, the Rocchio classifier and support vector machines. Second, we made a comparative study of two types of ensemble methods, namely voting and meta-classifier combination. The results of the individual classifiers on both the subjectivity and the sentiment analysis subtasks showed that the Bernoulli Naive Bayes and SVM algorithms performed better than the other methods. The results also showed that the ensemble of classification algorithms with the meta-learner ensemble technique performed better than all the individual classifiers. Our future efforts will be targeted at developing an Arabic SentiWordNet and at investigating the implementation of semantic orientation approaches. Finally, we are also interested in constructing hybrid models suitable for Arabic sentiment classification.

7. References
[1] Afnan A. Al-Subaihin, Hend S. Al-Khalifa and AbdulMalik S. Al-Salman, "A Proposed Sentiment Analysis Tool For Modern Arabic Using Human-Based Computing", ACM-iiWAS2011, pp. 543-546.
[2] Wen Li, Yuefeng Chen, Weili Wang, "Fine-Grained Sentiment Classification based on HowNet", JCIT: Journal of Convergence Information Technology, vol. 7, no. 19, pp. 86-92, 2012.
[3] F. S. Mohammed, L. Zakaria, N. Omar, M.Y. Albared, "Automatic Kurdish Sorani Text Categorization Using N-Gram Based Model", International Conference on Computer & Information Science (ICCIS), pp. 392-395, 2012.
[4] Mohammed Korayem, David Crandall, and Muhammad Abdul-Mageed, "Subjectivity and Sentiment Analysis of Arabic: A Survey", Springer, 2012.
[5] Mohammed Rushdi-Saleh, M. Teresa Martín-Valdivia, L. Alfonso Ureña-López, and José M. Perea-Ortega, "OCA: Opinion corpus for Arabic", Journal of the American Society for Information Science and Technology, vol. 62, no. 10, pp. 2045-2054, 2011.
[6] Mohamed Elhawary and Mohamed Elfeky, "Mining Arabic Business Reviews", IEEE International Conference on Data Mining Workshops, pp. 1108-1113, 2010.
[7] Muhammad Abdul-Mageed, Sandra Kubler, and Mona Diab, "SAMAR: A System for Subjectivity and Sentiment Analysis of Arabic Social Media", WASSA 2012, pp. 19, 2012.
[8] Bin Lu, Benjamin K. Tsou and Oi Yee Kwong, "Supervised Approaches and Ensemble Techniques for Chinese Opinion Analysis at NTCIR-7", Proceedings of NTCIR-7 Workshop Meeting, pp. 218-225, 2008.
[9] Songbo Tan and Jin Zhang, "An Empirical Study Of Sentiment Analysis For Chinese Documents", Expert Systems with Applications, vol. 34, no. 4, pp. 2622-2629, 2008.
[10] Qiang Ye, Ziqiong Zhang, and Rob Law, "Sentiment Classification Of Online Reviews To Travel Destinations By Supervised Machine Learning Approaches", Expert Systems with Applications, vol. 36, no. 3, pp. 6527-6535, 2009.
[11] Rudy Prabowo and Mike Thelwall, "Sentiment Analysis: A Combined Approach", Journal of Informetrics, vol. 3, no. 2, pp. 143-157, 2009.
[12] Rui Xia, Chengqing Zong and Shoushan Li, "Ensemble Of Feature Sets And Classification Algorithms For Sentiment Classification", Information Sciences, vol. 181, no. 6, pp. 1138-1152, 2011.
[13] Rui Xia and Chengqing Zong, "A Pos-Based Ensemble Model For Cross-Domain Sentiment Classification", Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP-2010), 2011.
[14] James A. McCart, Dezon K. Finch, Jay Jarman, Edward Hickling, Jason D. Lind, Matthew R. Richardson, Donald J. Berndt, and Stephen L. Luther, "Using Ensemble Models to Classify the Sentiment Expressed in Suicide Notes", Biomedical Informatics Insights, vol. 5, no. Suppl. 1, pp. 77, 2012.
[15] Janyce M. Wiebe, Rebecca F. Bruce, and Thomas P. O'Hara, "Development And Use Of A Gold-Standard Data Set For Subjectivity Classifications", In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics, pp. 246-253, 1999.
[16] Muhammad Abdul-Mageed, Mona T. Diab, and Mohammed Korayem, "Subjectivity And Sentiment Analysis Of Modern Standard Arabic", In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 587-591, 2011.
[17] Mohammed Albared, Nazlia Omar, and Mohd. Juzaiddin Ab Aziz, "Developing A Competitive HMM Arabic POS Tagger Using Small Training Corpora", Intelligent Information and Database Systems, pp. 288-296, Springer Berlin/Heidelberg, 2011.
[18] Andrew McCallum and Kamal Nigam, "A Comparison Of Event Models For Naive Bayes Text Classification", In AAAI-98 Workshop on Learning for Text Categorization, pp. 41-48, 1998.
[19] Lv Pin and Zhong Luo, "Naive Bayesian Text Classifier Based on Different Probability Model", JDCTA: International Journal of Digital Content Technology and its Applications, AICIT, vol. 6, no. 12, pp. 464-471, 2012.
[20] Kang Hyuk Lee, Judy Kay, Byeong Ho Kang, and Uwe Rosebrock, "A Comparative Study On Statistical Machine Learning Algorithms And Thresholding Strategies For Automatic Text Categorization", PRICAI 2002: Trends in Artificial Intelligence, pp. 55-67, 2002.
[21] Zhao Da-peng, "Research on the Vector Space Model Based Text Automatic Classification System", JDCTA: International Journal of Digital Content Technology and its Applications, AICIT, vol. 7, pp. 381-388, 2013.
