Li XC, Mao WJ, Zeng D et al. Performance evaluation of machine learning methods in cultural modeling. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 24(6): 1010–1017 Nov. 2009

Performance Evaluation of Machine Learning Methods in Cultural Modeling

Xiao-Chen Li1, Wen-Ji Mao1,∗, Senior Member, CCF, Member, ACM, Daniel Zeng1,2, Senior Member, IEEE, Peng Su1,3, and Fei-Yue Wang1, Senior Member, CCF, Fellow, IEEE

1 Key Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
2 Department of Management Information Systems, University of Arizona, Tucson AZ 85721, U.S.A.
3 School of Management Engineering, Shandong Jianzhu University, Jinan 250013, China

E-mail: {xiaochen.li, wenji.mao, dajun.zeng, peng.su}@ia.ac.cn; [email protected]

Received March 8, 2009; revised July 20, 2009.

Abstract    Cultural modeling (CM) is an emergent and promising research area in social computing. It aims to develop behavioral models of human groups and to analyze the impact of cultural factors on human group behavior using computational methods. Machine learning methods, in particular classification, play a critical role in such applications. Since various culture-related data sets possess different characteristics, it is important to gain a computational understanding of how various machine learning methods perform on them. In this paper, we investigate the performance of seven representative classification algorithms on a benchmark cultural modeling data set and analyze the experimental results with respect to group behavior forecasting.

Keywords    cultural modeling (CM), machine learning, classification, group behavior forecasting

1 Introduction

Cultural and ideological conflicts have always been topics of great interest. How to investigate the impact of culture on human behavior or social change has become a popular research issue[1−5]. Meanwhile, significant opportunities exist for cultural research given easily accessible content such as that from the Web. However, traditional methods rely heavily on human expert analysis, which is impractical for massive data. New methodologies to efficiently analyze cultural dynamics from enormous data are in great demand. Cultural modeling (CM) is increasingly becoming an emergent and promising research area in social computing[6−8]. It aims to apply computational techniques to build behavioral models of organizations and to analyze the impact of cultural factors on organizational behavior. Cultural modeling has raised interesting questions such as the discovery of correlations between cultural factors and organizational behavior, the efficient and effective identification of behavioral patterns of organizations from massive culture-related data, and the prediction of organizational behavior based on the cultural context.

The idea of cultural modeling was first proposed in [1]. Subrahmanian[1] suggested that real-time computational models can use various cultural factors to help policy-makers predict the behavior of political, economic, and social groups. Subrahmanian et al.[2] then developed a concept model called CARA (Cognitive Architecture for Reasoning about Adversaries) as a first cultural model. Moreover, they[3−4] further proposed some effective algorithms to improve their model. Although their work is mainly applied in the terrorism domain, the scope of cultural modeling is wider and extends to other domains.

The core issue in CM is the automatic construction of group behavior models based on relevant cultural data. Machine learning methods play a critical role in such applications. The quality of the learned behavioral models, as well as the accuracy of predictions, relies heavily on the machine learning methods selected. Hence, the selection of a proper machine learning method is a crucial problem for cultural modeling.

Regular Paper
This work is supported in part by the National Natural Science Foundation of China under Grant Nos. 60621001, 60875028, 60875049, and 70890084, the Ministry of Science and Technology of China under Grant No. 2006AA010106, and the Chinese Academy of Sciences under Grant Nos. 2F05N01, 2F08N03 and 2F07C01.
∗Corresponding Author


The characteristics of cultural modeling data sets pose some major challenges from a machine learning perspective. Generally, CM data sets are the historical behavioral records of organizations, often stored as one relational table per organization. When a new organizational behavior or event happens, the data items capture the current time, the related cultural factors, and the state of the organization and the behavior. Typically, the frequency of these events is low; in most cases, for a given organization, there are only dozens of records each year. Hence, the size of the data sets is always very small. At the same time, when a certain behavior happens, only a few attributes are identified, while the number of attributes has to be very large in order to analyze organizational behavior effectively.

In this paper, we focus on the comparison of major classification methods in cultural modeling, including Naive Bayesian (NB), support vector machines (SVMs), artificial neural networks, k-nearest neighbor (kNN), decision trees, random forest (RF) and associative classification (AC). We evaluate the performance of these classification methods in modeling culture-related data. Specifically, we use the MAROB (Minorities at Risk Organizational Behavior) benchmark cultural data sets[9].

The rest of the paper is organized as follows. Section 2 reviews the related work. Section 3 briefly introduces the classification methods we use for comparison. Section 4 presents the experimental studies and analyzes the results we obtained from the experiment. We further discuss the related learning and modeling issues in Section 5, and conclude the paper in Section 6.

2 Related Work

A typical method in cultural modeling is CARA[1−2]. CARA provides a conceptual model for reasoning about how diverse cultural, political and industrial organizations make decisions. CARA comprises four components, namely, a Semantic Web extraction engine to elicit organization data, an opinion-mining engine that captures the group's opinions, an algorithm to correlate culturally relevant variables with the actions the organization takes, and a simulation environment within which analysts and users can experiment with hypothetical situations. When a new event occurs, CARA can forecast the k most probable sets of actions the group might take within a few minutes.

To further develop CARA, Khuller and his colleagues proposed two algorithms, HOP and SemiHOP, mainly based on probabilistic logic programming[3]. By solving a set of linear programs, the algorithms can find the most probable actions (or sets of actions) of a certain organization in a possible-world model. SemiHOP can significantly reduce the number of variables in the linear programs and is thus faster than HOP. Nevertheless, there are still some weaknesses in these algorithms. For example, the improvement in computational efficiency is achieved partly at the cost of forecasting accuracy. Furthermore, the algorithms were not evaluated with any quantifiable measure, including accuracy.

To better predict group behavior, Martinez et al.[4] presented a new algorithm called CONVEX based on the MAROB data sets, which is more computationally efficient and accurate than HOP and SemiHOP. CONVEX views each instance as a pair of two vectors, namely a context vector and an action vector. The context vector contains the values of the context variables associated with the group, while the action vector contains the values of the action variables of the group. To predict the action vector, CONVEX uses distance functions in metric spaces to calculate the similarity between a given context vector and the other context vectors. Therefore, CONVEX is essentially a variant of the kNN method. The authors found from their experiments that when k = 2, using the Hamming or Manhattan distance, CONVEX yields the best performance with an accuracy of 0.969. Although this result seems very good, it is not clear how well the same algorithm performs under other meaningful measures. Furthermore, given the specific characteristics of cultural data sets, does the kNN-based method always yield the best performance under different measures? In this paper, we investigate the answers to these questions through experimental studies.

3 Machine Learning Methods

3.1 Naive Bayesian

Naive Bayesian (NB)[10] is a classical probabilistic classifier based on Bayes' theorem. An NB classifier can be trained very efficiently in a supervised learning setting owing to the simple form of its probability model. It is the optimal classifier when, given the class, there is no dependency between a particular feature and the other features. Although simple, it has proven to be a successful classifier in a wide variety of domains.
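As an illustration only (the paper does not give implementation code), an NB classifier over nominal attributes can be sketched with scikit-learn; the matrix X and labels y below are synthetic stand-ins for one organization's records:

```python
# Minimal sketch of naive Bayes on nominal (integer-coded) attributes,
# assuming a MAROB-like table of yearly records; X and y are synthetic.
import numpy as np
from sklearn.naive_bayes import CategoricalNB

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(60, 20))  # ~60 records, 20 nominal attributes
y = rng.integers(0, 2, size=60)        # binary behavioral class

nb = CategoricalNB()
nb.fit(X, y)
print(nb.predict(X[:5]), nb.predict_proba(X[:5])[:, 1])
```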

3.2 Support Vector Machines

SVMs[11] are based on statistical learning theory and the Vapnik-Chervonenkis (VC) dimension. An SVM views the examples to be classified as two sets of vectors in an n-dimensional space and then builds a separating hyperplane in that space that maximizes the margin between the two data sets. In this study we use SMO, the first effective training algorithm for SVMs, with the polynomial kernel.
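A rough analogue of this setup, sketched with scikit-learn rather than the implementation used in the paper: nominal attributes are one-hot encoded and fed to a polynomial-kernel SVM (libsvm's solver is itself SMO-style):

```python
# Sketch: polynomial-kernel SVM over one-hot encoded nominal attributes.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVC

svm = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    SVC(kernel="poly", probability=True),  # probability=True enables AUC
)
# svm.fit(X_train, y_train); svm.predict(X_test)
```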

3.3 Artificial Neural Networks

An artificial neural network[12−13] is composed of many simple processing elements (called artificial neurons) whose functionality is loosely based on the animal neuron. It learns via a process of adjusting the connections between the processing elements and the element parameters. As one of the most frequently used neural network architectures, the multilayer perceptron (MLP)[13] is selected as the representative artificial neural network method for our experiment. We set the number of nodes in the hidden layer to the sum of the attribute count and the class count, divided by 2.
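The hidden-layer rule above can be sketched as follows; the attribute and class counts are illustrative MAROB-like values, not figures from the paper:

```python
# Sketch: an MLP whose single hidden layer has
# (attribute count + class count) / 2 units, as described above.
from sklearn.neural_network import MLPClassifier

n_attributes, n_classes = 140, 2  # illustrative sizes
hidden_units = (n_attributes + n_classes) // 2
mlp = MLPClassifier(hidden_layer_sizes=(hidden_units,), max_iter=1000)
# mlp.fit(X_train, y_train)
```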

3.4 k-Nearest Neighbor

kNN is a simple but effective classification method. For a new instance to be classified, the k nearest neighbors of the instance are selected, and the majority class of the k neighbors is then assigned to the new instance. When the kNN method is employed, the choice of k has a crucial effect on its classification performance. Martinez et al.[4] proposed a kNN-based algorithm (i.e., CONVEX) for this classification task. They found that CONVEX performed best on the MAROB data sets when k is set to 2 and the Manhattan distance is chosen. In our experiment, we adopted the same parameter values to implement CONVEX as the representative of the kNN method.
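This configuration maps directly onto a standard kNN implementation; a minimal sketch with the parameter values stated above:

```python
# Sketch: kNN with k = 2 and Manhattan distance, the CONVEX-style
# configuration reported as best in [4].
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=2, metric="manhattan")
# knn.fit(X_train, y_train); knn.predict(X_test)
```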

3.5 Decision Trees

The most widely used classifier in machine learning is probably the decision tree. Decision trees are tree structures that classify instances by sorting them based on feature values[14]. Within a decision tree, a node denotes the selected feature used to split the input data, and branches denote the values of that node. Over the last few years, C4.5 has become the most popular decision tree method. In our experiment, we chose C4.5 as the representative of decision trees.
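A hedged sketch of a decision tree learner; note that scikit-learn implements CART rather than C4.5 (Weka's J48 is the closer match), so this only approximates the classifier used in the paper:

```python
# Sketch: a decision tree stand-in for C4.5, using information-gain splits.
from sklearn.tree import DecisionTreeClassifier

dt = DecisionTreeClassifier(criterion="entropy")  # entropy ~ information gain
# dt.fit(X_train, y_train)
```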

3.6 Random Forest

Random forest[15] is a classifier that evolves from decision trees; it actually consists of many decision trees. To classify a new instance, each decision tree provides a classification for the input data; random forest collects these classifications and chooses the most voted prediction as the result. The input of each tree is a sample drawn from the original data set. In addition, a subset of features is randomly selected from the available features to grow the tree at each node, and each tree is grown without pruning. Essentially, random forest enables a large number of weak or weakly-correlated classifiers to form a strong classifier.
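The voting-and-random-subset scheme just described corresponds to the standard random forest interface; a minimal sketch (the parameter values are illustrative):

```python
# Sketch: a random forest of unpruned trees, each grown on a bootstrap
# sample with a random feature subset considered at every split.
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=100,     # number of voting trees (illustrative)
    max_features="sqrt",  # random feature subset per split
    bootstrap=True,       # each tree sees a resampled data set
)
# rf.fit(X_train, y_train); rf.predict(X_test) returns the majority vote
```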

3.7 Associative Classification

Associative classification (AC)[16] is a branch of data mining research which combines association rule mining with classification. Associative classification is a special case of association rule discovery in which only the class attribute is considered in the rule's right-hand side (consequent)[16]. After a large set of rules is generated, AC selects a subset of high-quality rules via rule pruning and ranking. Based on the reduced rule set, AC can then build an effective classifier. Associative classification is usually more accurate than the decision tree method in practice. The L3 associative classification algorithm[17] is chosen for the experiment.
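To make the rule format concrete, the toy sketch below mines single-attribute class association rules and ranks them by confidence. Real AC systems such as L3 mine multi-attribute rules and apply far more elaborate pruning and ranking, so this is only an illustration of the idea:

```python
# Toy sketch of class association rules: mine rules of the form
# (attribute = value) -> class under support/confidence thresholds.
from collections import Counter

def mine_car(X, y, min_support=0.1, min_confidence=0.8):
    n = len(y)
    pair_counts, class_pair_counts = Counter(), Counter()
    for row, label in zip(X, y):
        for j, v in enumerate(row):
            pair_counts[(j, v)] += 1            # antecedent frequency
            class_pair_counts[(j, v, label)] += 1  # antecedent + class
    rules = []
    for (j, v, label), cnt in class_pair_counts.items():
        support, confidence = cnt / n, cnt / pair_counts[(j, v)]
        if support >= min_support and confidence >= min_confidence:
            rules.append(((j, v), label, confidence))
    return sorted(rules, key=lambda r: -r[2])   # rank by confidence
```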

4 Experiment and Analysis

4.1 Data Sets

In our experiment, we mainly use the MAROB cultural data sets for anti-terrorism research. The MAROB data set is a subsidiary of the Minorities at Risk (MAR) Project[18], and covers 118 terrorist organizations in the Middle East from 1980 to 2004. In the MAROB data set, the historical data of each organization are represented as a relational database table, with the rows denoting years and the columns denoting behavioral attributes and cultural attributes of the organization. Behavioral attributes consist of violent behavior and nonviolent behavior. Violent behavior involves attack targets (e.g., attacks on civilians or government infrastructure) and different types of attack (e.g., kidnapping, armed attack, etc.); violent behavior constitutes the class for prediction in our experiment. Nonviolent behavior involves different kinds of normal behavior, for instance, negotiation with the government, soliciting external support and so on. Cultural attributes include religious, economic, or political factors associated with an organization, such as the religious belief and political grievances of the organization.

As typical cultural modeling data sets, the MAROB data have some distinct characteristics. First of all, the data of each organization are rather sparse and the distribution of attribute values is imbalanced. Second, each data set is small (fewer than 100 instances per organization) while the number of attributes is extremely high (more than 140 attributes per record). Our test is based on the 22 largest organization data sets in MAROB.
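For concreteness, one organization's table might be laid out as in the following sketch; every column name and value here is invented for illustration and is not an actual MAROB code:

```python
# Hypothetical sketch of one organization's MAROB-style table: rows are
# years; columns mix cultural attributes and behavioral attributes.
import pandas as pd

records = pd.DataFrame({
    "year":                [2001, 2002, 2003],
    "religious_belief":    [1, 1, 1],    # cultural attribute (nominal code)
    "political_grievance": [2, 2, 3],    # cultural attribute
    "negotiation":         [0, 1, 0],    # nonviolent behavior
    "attack_civilians":    [0, 0, 1],    # violent behavior = class to predict
})
X = records.drop(columns=["year", "attack_civilians"])
y = records["attack_civilians"]
```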

4.2 Evaluation Measures

According to the characteristics of the experimental data and the domain features of CM, we choose three measures to evaluate the performance of the classifiers, namely accuracy, Recall and AUC, under 10-fold cross-validation. Accuracy is commonly used to evaluate the quality of classification. However, the accuracy measure has some limitations in the cultural modeling domain. For example, accuracy cannot well reflect a classifier's performance when the data distribution is imbalanced. In group behavior forecasting, we mainly focus on forecasting the occurrence of organizational behavior, so the classification of positive examples is much more important than that of negative ones. In this case, the overall high accuracy of an algorithm does not ensure its good performance in classifying positive examples. Therefore, in addition to accuracy, we consider two other evaluation measures, Recall and AUC. Recall is generally used for data in which the misclassification costs of positive examples are much higher than those of negative ones. In our experiment, we focus on evaluating Recall for the terrorist behaviors that lead to destructive consequences. AUC is the area under the ROC curve. It is a statistically consistent and more discriminating measure than accuracy for balanced or unbalanced data sets[19]. We predict the organizational behavior with the three evaluation measures. For each algorithm, we report the average value of each measure and its standard deviation.
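A minimal sketch of this evaluation protocol, assuming a scikit-learn-style classifier clf that exposes probability estimates (needed for AUC):

```python
# Sketch: accuracy, Recall and AUC under 10-fold cross-validation,
# reporting the average and standard deviation of each measure.
from sklearn.model_selection import cross_validate

def evaluate(clf, X, y):
    scores = cross_validate(clf, X, y, cv=10,
                            scoring=("accuracy", "recall", "roc_auc"))
    return {m: (scores["test_" + m].mean(), scores["test_" + m].std())
            for m in ("accuracy", "recall", "roc_auc")}

# evaluate(CategoricalNB(), X, y) -> {'accuracy': (avg, sd), ...}
```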

4.3 Experimental Results

Table 1 shows the average values and standard deviations of the accuracy, Recall and AUC measures for the seven experimental algorithms. The AVG (average value) marked with an asterisk corresponds to the algorithm with the highest average value, and the SD (standard deviation) marked with an asterisk corresponds to the algorithm with the lowest standard deviation. Tables 2 and 3 show the accuracy and AUC values of the prediction on different attribute sets. For each algorithm, the lowest accuracy and AUC values are marked with an asterisk.

4.4 Observations and Analysis

It can be seen from Table 1 that all the algorithms achieve high accuracies of more than 90%, whereas the Recalls are rather low and the AUCs vary considerably from classifier to classifier. This is partly due to the skewed distribution of the experimental data. Most of the data sets have a large ratio of negative examples in contrast to a small percentage of positive examples, which results in high accuracy yet a low level of Recall.

Table 1. Overall Experimental Results of the Seven Algorithms

Measure    Statistic    NB       SMO      MLP      CONVEX   C4.5     RF       L3
Accuracy   AVG          0.969*   0.941    0.952    0.944    0.929    0.940    0.929
           SD           0.044*   0.060    0.053    0.052    0.069    0.056    0.063
AUC        AVG          0.978*   0.697    0.915    0.854    0.530    0.788    0.866
           SD           0.054*   0.216    0.168    0.249    0.401    0.227    0.154
Recall     AVG          0.778*   0.420    0.525    0.344    0.362    0.428    0.430
           SD           0.303*   0.434    0.409    0.405    0.404    0.404    0.431

Table 2. Accuracies of Prediction on Different Attribute Sets (%)

Attribute sets                                NB       SMO      MLP      CONVEX   C4.5     RF       L3
Attack types                                  96.465*  93.228*  94.102*  92.932*  92.151*  92.744   92.127
Transnational violence outside the country    98.118   94.665   95.293   94.964   92.645   92.192*  91.097*
Transnational violence within the country     99.275   96.823   97.938   96.870   94.658   96.493   95.620
Domestic violence                             96.878   95.564   95.237   95.785   93.553   94.646   94.191

Table 3. AUCs of Prediction on Different Attribute Sets

Attribute sets                                NB      SMO     MLP     CONVEX  C4.5    RF      L3
Attack types                                  0.964*  0.662   0.890*  0.839   0.351   0.757   0.840*
Transnational violence outside the country    0.995   0.616   0.945   0.806*  0.354   0.696   0.887
Transnational violence within the country     0.995   0.578*  0.915   0.859   0.340*  0.651*  0.908
Domestic violence                             0.977   0.681   0.925   0.881   0.464   0.746   0.872

The higher the degree of class imbalance and the smaller the overall size of the training set, the more sensitive classifiers are to the class imbalance problem[20].

Naive Bayesian significantly outperforms the other algorithms in terms of accuracy, AUC and Recall, with the lowest standard deviations. Although the attributes in our data sets do not satisfy the independence assumption, there are two main reasons for the top performance of Naive Bayesian. First, as pointed out above, the MAROB data sets are very small (with only 30 to 60 instances per organization), and Naive Bayesian often performs exceedingly well on smaller data sets[21]. The other reason is that zero-one loss is used as the loss function in our experiment, and Naive Bayesian can be optimal under zero-one loss (misclassification rate) even when the independence assumption is violated by a wide margin[21].

From Table 1, the C4.5 algorithm is almost always dominated by the other algorithms. This is not surprising, because decision trees are prone to errors in classification problems with many classes and relatively small numbers of training examples[22]. As a variant of the decision tree algorithm, random forest achieves better Recall and much better AUC than C4.5, indicating that an ensemble of classifiers may be superior to a single classifier. The associative classifier outperforms C4.5 by 0.068 in Recall and by 0.336 in AUC. As the AUC measure represents the overall performance of a classifier, this is expected, because associative classification has been shown to perform much better than C4.5[17,23−25].

Note that kNN (the CONVEX algorithm used in the related work is actually a variant of kNN) performs the worst in terms of average Recall. One explanation is that kNN is dominant on relatively dense data sets[26], but on highly sparse data sets its performance suffers because it is unable to form reliable neighborhoods. Our experimental data are indeed sparse, and the proportion of positive examples is much smaller than that of the negative ones. The positive examples are so sparse that it is very difficult for kNN to find positive neighbors for positive examples. As a result, kNN's Recall can be very low. Furthermore, kNN seems more sensitive to the class imbalance problem; research also shows that SVMs and MLP are not as sensitive to imbalance as kNN[20,27−29].

As shown in Tables 2 and 3, the accuracy and AUC performances of the seven classifiers on different attribute sets reveal interesting results. Five out of the seven classifiers give their lowest accuracy on the Attack types attribute set. In addition, three out of the seven classifiers give their lowest AUC performance on the same attribute set. In our experiment, the attributes used for prediction are grouped into four types. Attack types refers to the attack modes of the terrorists, for instance, bombing, hijacking, and kidnapping. Transnational violence outside the country means that the organization uses violence against transnational entities outside its own country as a strategy. Transnational violence within the country means that the organization uses violence against transnational entities within its own country as a strategy. Domestic violence means that the organization uses violence domestically as a strategy. The poor prediction performance on Attack types indicates that the attack types of the terrorists are hard to predict from the cultural attributes defined in the experimental data. Alternatively, we could use the occurrence of attacks, the severity of attacks and some other attributes in place of Attack types.

5 Discussions

Our experimental studies show that the prediction results in cultural modeling can be further improved. Below we discuss the related research issues of constructing data sets, selecting features and learning algorithms, model interpretability, and incorporating domain knowledge.

• Cultural Data Set Construction. The first step and one key difficulty in cultural modeling is the construction of structured data sets from open sources. The difficulty lies in two facts. First, collecting relevant raw data from massive Web information manually is rather painstaking; it needs the participation of many people and great human effort. Second, the selection of appropriate attributes related to an organization is quite difficult, and some attributes in cultural data sets are hard to quantify, for example, the friendship between two countries. These tasks require substantial domain knowledge and usually rely on the efforts of domain experts. Due to these difficulties, an agent-based approach can be a promising alternative for representing and modeling social cultural situations.

• Attribute Selection. We notice that some cultural factors remain constant in an organization's data set; for example, religious belief always remains the same for one organization. In this case, these attributes do not contribute to the classification. These cultural factors are inherent properties of the organizations. They should constitute the cultural context of the organizations rather than be used for classification. Along this line of thinking, each organization should have its own cultural context composed of various cultural attributes. The cultural context belongs to the cultural level of an organization, beyond its historical record. We can further learn an organization's behavioral pattern under certain cultural factors by comparing different organizations' behavioral patterns and cultural contexts.

• Best Performance of Classifiers. Given the fundamental data characteristics in cultural modeling, the performance of classifiers varies according to the other specific features of the data sets. Take the optimality condition of Naive Bayesian as an example. It has been proved that NB is optimal for any two-class concept with nominal features under zero-one loss[31−33]. As all the environmental attributes in the MAROB data sets are nominal and all the forecast behavioral attributes are two-class concepts, it is not surprising that NB performs best in the experiment. However, for more complex cultural data sets where the optimality condition of NB is no longer satisfied, NB may not be the best classifier under all measures.

• Handling the Class Imbalance Problem. The experimental results show that in the cultural modeling domain, the class imbalance problem has a significant impact on the performance of the learning algorithms. Research in [20, 28, 30] has proposed several methods to tackle this problem, such as oversampling, undersampling and cost-modifying. Oversampling and undersampling are mainly based on resampling the original data sets, and these two methods are more effective on large data sets. The cost-modifying method modifies the relative costs associated with misclassifying the positive and the negative classes so as to compensate for the imbalance ratio of the two classes[20] (see the sketch at the end of this section). Since our data sets are rather small, we plan to adopt the cost-modifying method to handle the class imbalance problem in our future work.

• Model Interpretability. An important issue in cultural modeling research is how to interpret the resulting models. We need to know the association between cultural factors and organizational behavior in an explainable manner. Classifiers should not only derive statistical models for behavior prediction, but also provide reasonable explanations of the behavior occurrence. Therefore, rule-based classifiers (e.g., C4.5 and the associative classifier in our study) are preferable for cultural modeling. C4.5 can induce decision rules for expert evaluation. The associative classifier can generate more association rules, and its comprehensibility is better than that of C4.5. Despite their mediocre performance, these two classifiers are superior to the others in the interpretability of their models.

• Incorporation of Domain Knowledge. As the results show, the performance of the evaluated learning methods is still far from ideal, and the Recalls of all the algorithms are quite low. This shows that pure data-driven approaches may not be sufficient for predicting organizational behavior. We can explicitly utilize domain knowledge to improve the prediction results in cultural modeling; related work in data mining suggests similar improvements[34−35]. The incorporation of domain knowledge will also improve the interpretability of the prediction models.

• Cultural and Social Dynamics of Behavioral Patterns. The culture-related data we adopt are sorted by year. It is possible that during this time span, the cultural and social backgrounds underlying the data have changed greatly. As a result, behavioral patterns extracted from the historical data may not correctly reflect current and future trends of group behavior. How to effectively track the evolution of behavioral patterns becomes an important research issue[36].
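As referenced in the class imbalance item above, the following is a minimal sketch of cost-modifying via class weights; the 10:1 cost ratio is purely illustrative and not a value from the paper:

```python
# Sketch: cost-modifying via class weights, compensating for imbalance
# without resampling a small data set.
from sklearn.svm import SVC

clf = SVC(kernel="poly", class_weight={0: 1, 1: 10})  # rare positives cost 10x
# class_weight="balanced" would instead derive weights from class frequencies.
```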

6 Conclusion

Cultural modeling has increasingly gained attention in recent social computing research. In this paper, we compared the performance of seven machine learning methods for organizational behavior prediction in cultural modeling. We tested these classifiers on culture-related data sets, and used three measures to evaluate and analyze the experimental results. The results show that Naive Bayesian performs best across all configurations. However, all the algorithms we investigated (including the variant of kNN used in the related work) have relatively low Recalls, and their AUCs vary from classifier to classifier. This suggests that pure data-driven classification methods may not fully meet the performance needs in cultural modeling. These findings motivate us to work on the incorporation of domain knowledge and logical reasoning into group behavior prediction. In addition, we plan to explore cultural modeling in other domains in our future work, such as e-commerce, public health and other social computing applications.

References

[1] Subrahmanian V S. Computer science: Cultural modeling in real time. Science, 2007, 317(5844): 1509–1510.
[2] Subrahmanian V S, Albanese M, Martinez M V, Nau D, Reforgiato D, Simari G I, Sliva A, Wilkenfeld J, Udrea O. CARA: A cultural-reasoning architecture. IEEE Intelligent Systems, 2007, 22(2): 12–16.
[3] Khuller S, Martinez V, Nau D, Simari G, Sliva A, Subrahmanian V S. Finding most probable worlds of logic programs. In Proc. the First International Conference on Scalable Uncertainty Management, Washington DC, USA, October 10–12, 2007, pp.45–59.
[4] Martinez V, Simari G I, Sliva A, Subrahmanian V S. CONVEX: Context vectors as a paradigm for learning group behaviors based on similarity. IEEE Intelligent Systems, 2008, 23(4): 51–57.
[5] Wang F Y. Is culture computable? IEEE Intelligent Systems, 2009, 24(2): 2–3.
[6] Wang F Y, Carley K M, Zeng D, Mao W. Social computing: From social informatics to social intelligence. IEEE Intelligent Systems, 2007, 22(2): 79–83.
[7] Wang F Y. Toward a paradigm shift in social computing: The ACP approach. IEEE Intelligent Systems, 2007, 22(5): 65–67.
[8] Zeng D, Wang F Y, Carley K M. Social computing. IEEE Intelligent Systems, 2007, 22(5): 20–22.
[9] Minorities at Risk Organizational Behavior Dataset. Minorities at Risk Project, University of Maryland. College Park: Center for International Development and Conflict Management, 2008, http://www.cidcm.umd.edu/mar.
[10] Hand D J, Yu K. Idiot's Bayes — Not so stupid after all? International Statistical Review, 2001, 69(3): 385–398.
[11] Vapnik V N. Support vector estimation of functions. In Statistical Learning Theory, Haykin S (ed.), Springer-Verlag, 1998, pp.375–570.
[12] Jain A K, Mao J, Mohiuddin K M. Artificial neural networks: A tutorial. Computer, 1996, 29(3): 31–44.
[13] Bishop C M. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[14] Kotsiantis S, Zaharakis I, Pintelas P. Machine learning: A review of classification and combining techniques. Artificial Intelligence Review, 2006, 26(3): 159–190.
[15] Breiman L. Random forests. Machine Learning, 2001, 45(1): 5–32.
[16] Thabtah F. A review of associative classification mining. The Knowledge Engineering Review, 2007, 22(1): 37–65.
[17] Baralis E, Garza P. A lazy approach to pruning classification rules. In Proc. the Second IEEE International Conference on Data Mining, Maebashi, Japan, December 9–12, 2002, pp.35–42.
[18] Minorities at Risk Project. College Park, MD: Center for International Development and Conflict Management, 2005, http://www.cidcm.umd.edu/mar/.
[19] Ling C X, Huang J, Zhang H. AUC: A statistically consistent and more discriminating measure than accuracy. In Proc. the Eighteenth International Joint Conference on Artificial Intelligence, Acapulco, Mexico, August 9–15, 2003, pp.329–341.
[20] Japkowicz N, Stephen S. The class imbalance problem: A systematic study. Intelligent Data Analysis, 2002, 6(5): 429–450.
[21] Kohavi R, Wolpert D H. Bias plus variance decomposition for zero-one loss functions. In Proc. the Thirteenth International Conference on Machine Learning, Bari, Italy, July 3–6, 1996, pp.275–283.
[22] Govindarajan M. Text mining technique for data mining application. Proceedings of World Academy of Science, Engineering and Technology, 2007, 26(104): 544–549.
[23] Liu B, Hsu W, Ma Y. Integrating classification and association rule mining. In Proc. the Fourth International Conference on Knowledge Discovery and Data Mining, New York, USA, August 27–31, 1998, pp.80–86.
[24] Li W, Han J, Pei J. CMAR: Accurate and efficient classification based on multiple class-association rules. In Proc. the First IEEE International Conference on Data Mining, San Jose, USA, November 29–December 2, 2001, pp.369–376.
[25] Yin X, Han J. CPAR: Classification based on predictive association rules. In Proc. the Third SIAM International Conference on Data Mining, San Francisco, USA, May 1–3, 2003, pp.369–376.
[26] Grčar M, Mladenič D, Fortuna B, Grobelnik M. Data sparsity issues in the collaborative filtering framework. In Proc. the Seventh International Workshop on Knowledge Discovery on the Web, Chicago, USA, August 21, 2005, pp.58–76.
[27] Japkowicz N. Class imbalances: Are we focusing on the right issue? In Proc. the ICML 2003 Workshop on Learning from Imbalanced Data Sets, Washington DC, USA, August 21, 2003.
[28] Japkowicz N. The class imbalance problem: Significance and strategies. In Proc. the Second International Conference on Artificial Intelligence, Las Vegas, USA, June 26–29, 2000, pp.111–117.
[29] Zhang J. kNN approach to unbalanced data distributions: A case study involving information extraction. In Proc. the ICML 2003 Workshop on Learning from Imbalanced Data Sets, Washington DC, USA, August 21, 2003.
[30] Barandela R, Sanchez J S, Garcia V, Rangel E. Strategies for learning in class imbalance problems. Pattern Recognition, 2003, 36(3): 849–851.
[31] Rish I, Hellerstein J, Thathachar J. An analysis of data characteristics that affect naive Bayes performance. Technical Report, IBM T.J. Watson Research Center, 2001.
[32] Domingos P, Pazzani M. On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 1997, 29(2/3): 103–130.
[33] Martinez-Arroyo M, Sucar L E. Learning an optimal naive Bayes classifier. In Proc. the Eighteenth International Conference on Pattern Recognition, Hong Kong, China, August 20–24, 2006, p.958.
[34] Zhang C, Yu P S, Bell D. Domain-driven data mining. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(2): 301.
[35] Cao L, Zhang C. Domain-driven, actionable knowledge discovery. IEEE Intelligent Systems, 2007, 22(4): 78–88.
[36] Nau D, Wilkenfeld J. Computational cultural dynamics. IEEE Intelligent Systems, 2008, 23(4): 18–19.

Xiao-Chen Li is a Ph.D. candidate at the Institute of Automation, Chinese Academy of Sciences. His research interests mainly focus on social computing and agent-based modeling.

Wen-Ji Mao received her Ph.D. degree in computer science from the University of Southern California in 2006. She is an associate professor at the Institute of Automation, Chinese Academy of Sciences. Prof. Mao is a member of ACM and AAAI, and a senior member of the China Computer Federation. Her research interests include artificial intelligence, multi-agent systems and social modeling.

Daniel Zeng received his Ph.D. degree in industrial administration from Carnegie Mellon University in 1998. He is a research professor at the Institute of Automation, Chinese Academy of Sciences. He is also affiliated with the University of Arizona as an associate professor and the director of the Intelligent Systems and Decisions Laboratory. Prof. Zeng is a senior member of IEEE. His research interests include software agents and multi-agent systems, intelligence and security informatics, social computing and recommender systems.

Peng Su is a Ph.D. candidate at the Institute of Automation, Chinese Academy of Sciences. His research interests mainly focus on machine learning and social computing.

Fei-Yue Wang received his Ph.D. degree in computer and systems engineering from Rensselaer Polytechnic Institute in 1990. He is the director of the Key Laboratory of Complex Systems and Intelligence Science at the Chinese Academy of Sciences. He is also a professor in the University of Arizona's Systems & Industrial Engineering Department and the director of the university's Program in Advanced Research of Complex Systems. Prof. Wang is a fellow of IEEE, INCOSE, IFAC, ASME, and AAAS. His current research interests include social computing, Web and services science, modeling, analysis and control of complex systems, especially social and cyber-physical systems.