ICGST AIML-11 Conference, Dubai, UAE, 12-14 April 2011
Employing Neural Network and Naive Bayesian Classifier in Mining Data for Car Evaluation

S. Makki, A. Mustapha, J. M. Kassim, E. H. Gharayebeh, M. Alhazmi

Dept. of Computer Science, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, 43400 UPM Serdang, Malaysia
[email protected], [email protected], [email protected], [email protected], [email protected]
Abstract
In data mining, classification is a form of data analysis that can be used to extract models describing important data classes. Two of the well-known algorithms used in data mining classification are the Backpropagation Neural Network (BNN) and the Naïve Bayesian (NB) classifier. This paper investigates the performance of these two classification methods using the Car Evaluation dataset. Two models were built, one for each algorithm, and the results were compared. Our experimental results indicate that the BNN classifier yields higher accuracy than the NB classifier, but it is less efficient because it is time-consuming and difficult to analyze due to its black-box implementation.

Keywords: Data mining, Backpropagation Neural Network, Naïve Bayesian Classifier, Classification.
1. Introduction
Data mining techniques give researchers new power to explore and manipulate the large volumes of existing data. The data mining process discovers interesting information hidden in the data, which can be used for future prediction and/or intelligent summarization of the data. Data mining techniques have been applied successfully to various areas such as engineering, marketing, medicine, finance, and car manufacturing. The design and manufacturing domain is a natural candidate for data-mining applications because it contains extensive data. Besides enhancing innovation, data-mining methods can reduce the risks associated with conducting business and improve decision-making [1]. Especially in profiling practices such as surveillance and fraud detection, a target dataset must be assembled before data mining algorithms can be applied. Because data mining can only uncover patterns already present in the data, the target dataset must be large enough to contain a sufficient number of patterns while remaining concise enough to be mined within an acceptable timeframe. A common source of data is a data mart or data warehouse. Because data marts and data warehouses are significant repositories, preprocessing of the multivariate datasets is essential before any clustering or data mining task is performed [2].

Data mining tasks such as clustering, association rule mining, sequence pattern mining, and classification are used in many applications. Some of the widely-used data mining algorithms for classification include Bayesian algorithms and neural networks.

Backpropagation neural network (BNN): A neural network is a computer program that recognizes patterns. It is designed to take a pattern of data and generalize from it. An essential feature of this technology is that it improves its performance on a particular task by gradually learning a mapping between input and output patterns. There are no set rules or
sequence of steps to follow in generalizing patterns of data. The network is designed to learn a nonlinear mapping between the input and output data, and generalization is used to predict the possible outcome for a particular task. This process involves two phases, known as the training (learning) phase and the testing phase. Standard supervised backpropagation neural network learning methodology is used for the experiments in this paper. A subset of the available data is used as training samples for the network. Training a backpropagation network focuses on obtaining an optimal value for the learning rate and estimating the number of hidden layers and the number of nodes in each layer. The overall error is tracked until a minimum is obtained by altering these parameters. The trained network can then be used to classify the test data.

Naïve Bayesian (NB): The naïve Bayesian classifier works as follows. Let D be a training set of tuples and their associated class labels. As usual, each tuple is represented by an n-dimensional attribute vector, X = (x1, x2, ..., xn), depicting n measurements made on the tuple from n attributes, respectively, A1, A2, ..., An. Suppose that there are m classes, C1, C2, ..., Cm. Given a tuple X, the classifier predicts that X belongs to the class having the highest posterior probability, conditioned on X. That is, the naïve Bayesian classifier predicts that tuple X belongs to class Ci if and only if

P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i.

Thus, we maximize P(Ci|X). The class Ci for which P(Ci|X) is maximized is called the maximum a posteriori hypothesis.
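To make this decision rule concrete, the posterior can be expanded with Bayes' theorem and the naïve conditional-independence assumption. The following derivation is a standard formulation consistent with the description above, not additional material from the original study:

```latex
% Bayes' theorem applied to the posterior of class C_i given tuple X
P(C_i \mid X) = \frac{P(X \mid C_i)\, P(C_i)}{P(X)}

% P(X) is the same for every class, so only the numerator matters.
% The "naive" assumption factorizes the likelihood over the attributes:
P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i)

% Decision rule: predict the class maximizing the unnormalized posterior
\hat{C} = \arg\max_{1 \le i \le m} \; P(C_i) \prod_{k=1}^{n} P(x_k \mid C_i)
```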
The aim of this research is to create a model in data mining for the domain of car evaluation, which is useful in a car marketing prediction system. This study also determines which technique is more accurate in this domain: the Backpropagation Neural Network (BNN) classifier or the naïve Bayesian (NB) classifier. The objectives of this paper are as follows:

• To create a model for the BNN classifier in mining data for car evaluation.
• To create another model for the NB classifier in mining data for car evaluation.
• To analyze both models.
The remainder of the paper is organized as follows: Section 2 reviews the related work, Section 3 presents the methodology, Section 4 describes the experiments, Section 5 presents the results and discussion, and finally Section 6 concludes the paper.
2. Related Work
An important component of many data mining projects is finding a good classification algorithm, a process that requires very careful thought about experimental design. If the selection is not performed thoroughly, comparative studies across data mining tasks and algorithms may easily produce statistically invalid conclusions. This is especially true when data mining techniques are used to analyze very large databases, which inevitably contain some statistically unlikely data.

Bayesian classifiers have been evaluated extensively, both across their variants, for example comparisons among Naïve-Bayes, tree-augmented Naïve-Bayes, BN-augmented Naïve-Bayes, and general Bayesian networks, and within specific fields of data mining, such as in [3]. Comparisons with other data mining algorithms have also been conducted on specific applications such as spam filtering, for example in [4]. On the other hand, Artificial Neural Networks (ANN) are also widely used as classifiers in data mining across a number of applications. [5] uses ANN and compares its performance against a decision tree mining algorithm to develop prediction models for breast cancer. [6] performs a comparison between ANN and Support Vector Machines (SVM) for drug/nondrug classification.

Several comparative studies have been conducted between Bayesian and ANN classifiers. In image processing, a Bayesian classifier has been compared with a multilayer feed-forward neural network using the example of plant/weed/soil discrimination. Another study compared ANN against statistical methods based on a Bayesian classifier for classifying multisource remote sensing and geographical data [7]. Other comparative studies on variations of Bayesian and ANN classifiers range across a vast set of domains, such as motion pictures [8], breast tissue [9], web pages [10], Internet traffic [11], and Skype network traffic [12].
3. Methodology
This paper performs an analytical study to compare the Backpropagation Neural Network (BNN) and the naïve Bayesian (NB) classifier in the domain of car evaluation. The dataset is obtained from the UCI Machine Learning Repository [13], which is supplied by the University of California. The dataset has been used as the source for many Bayesian classification studies, such as [14] and [15].

The car evaluation database was originally derived from a simple hierarchical decision model. The model evaluates cars according to the following concept structure:

CAR: car acceptability
. PRICE: overall price
. . buying: buying price
. . maint: price of maintenance
. TECH: technical characteristics
. . COMFORT: level of comfort
. . . doors: number of doors
. . . persons: capacity in terms of passengers
. . . lug_boot: the size of the luggage boot
. . safety: estimated safety of the car

PRICE, TECH, and COMFORT are three intermediate concepts. Every concept is related to its lower-level descendants by a set of examples. The car evaluation database contains examples with the structural information removed, i.e., it directly relates CAR to the six input attributes: buying, maint, doors, persons, lug_boot, and safety. There are 1,728 instances that completely cover the attribute space over the 6 attributes (no missing attribute values), as follows:

• buying: v-high, high, med, low (Figure 1)
• maint: v-high, high, med, low (Figure 2)
• doors: 2, 3, 4, 5-more (Figure 3)
• persons: 2, 4, more (Figure 4)
• lug_boot: small, med, big (Figure 5)
• safety: low, med, high (Figure 6)

Figure 1. Buying. Figure 2. Maint. Figure 3. Doors. Figure 4. Persons. Figure 5. Lug_boot. Figure 6. Safety.

The class distribution, i.e., the number of instances per class, is shown in Table 1 and Figure 7 accordingly.

Table 1. Class distribution

Class Name | Number of instances per class | Percentage (%)
Unacc | 1210 | 70.023
Acc | 384 | 22.222
Good | 69 | 3.993
Vgood | 65 | 3.762
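As an illustrative sketch, the dataset can be loaded and the class distribution of Table 1 reproduced as follows. This is not part of the original study: the pandas-based approach and the UCI file URL are assumptions, and the column names follow the UCI repository documentation.

```python
# Minimal sketch: load the UCI Car Evaluation dataset and reproduce
# the class distribution of Table 1. Adjust the URL if the mirror moves.
import pandas as pd

URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data"
COLUMNS = ["buying", "maint", "doors", "persons", "lug_boot", "safety", "class"]

df = pd.read_csv(URL, header=None, names=COLUMNS)

counts = df["class"].value_counts()            # instances per class
percent = (counts / len(df) * 100).round(3)    # percentage of 1,728 instances

print(pd.DataFrame({"instances": counts, "percent": percent}))
# Expected (Table 1): unacc 1210 (70.023%), acc 384 (22.222%),
#                     good 69 (3.993%), vgood 65 (3.762%)
```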
Figure 7. Class distribution
4. Experiments
Two classifier models are constructed for the purpose of classification, namely a Backpropagation Neural Network (BNN) classifier and a Naïve Bayesian (NB) classifier, using the Waikato Environment for Knowledge Analysis (WEKA) version 3.6.3, based on a within-subject experimental design [16]. All subjects received two different treatments, and the potential effects of each treatment were observed. Figure 8 shows the NB algorithm as well as the Multilayer Perceptron algorithm in WEKA, which are used to construct the NB and BNN classifiers, respectively.
Nonetheless, we did not treat missing data in the dataset; the interested reader can refer, for instance, to [17]. We also did not apply boosting to the NB classifier, since the literature indicates that boosting does not improve the accuracy of naïve Bayesian classification as expected in a set of natural domains [18]. The dataset is separated into two parts: one part is used as the training set to produce the prediction model, and the other part is used as the testing set to measure the model's accuracy. We used the 10-fold cross-validation method, which means that every instance in the dataset is used in the training set and the testing set in turn. We also used the percentage-split validation method with different parameters.

Backpropagation neural network (BNN): Standard supervised BNN learning methodology is used for these experiments. A subset of the available data is used as training samples for the network. The overall error is tracked until a minimum is obtained by altering the parameters mentioned earlier. The trained network is then used to classify the test data. For the input layer, the number of nodes is equal to the total number of values over the six attributes (4+4+4+3+3+3 = 21). For the output layer, the number of nodes is equal to the four values of the class label, as shown in Figure 9.
Figure 8. BNN and NB algorithms in WEKA
Figure 9. Layers of the proposed BNN
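The layer sizing described above (one input node per attribute value, one output node per class) can be sketched as follows. This is an illustrative scikit-learn analogue rather than the authors' WEKA configuration; the hidden-layer size is an assumption, and the learning rate mirrors WEKA's MultilayerPerceptron default.

```python
# Sketch of the BNN sizing: one-hot encoding the six categorical
# attributes yields 21 binary input features, and the four class labels
# give four output nodes. MLPClassifier stands in for WEKA's
# MultilayerPerceptron.
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

bnn = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),   # 21 input nodes after encoding
    MLPClassifier(hidden_layer_sizes=(10,),   # hidden size is an assumption
                  learning_rate_init=0.3,     # WEKA's default learning rate
                  max_iter=500,
                  random_state=0),
)
# Usage: bnn.fit(X_train, y_train); bnn.predict(X_test)
```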
Naïve Bayesian (NB): For NB, a subset of the available data is used as training samples to compute the prior and the posterior probabilities. The rest of the data is then used as a testing set with different percentage parameters, together with the same 10-fold cross-validation method. Figure 10 shows one possible structure of the NB classifier during training.

Figure 10. Example of the NB structure (nodes: buying, maint, doors, persons, lug_boot, safety)
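To make "computing the prior and posterior probabilities" concrete, here is a minimal hand-rolled training sketch; it is an illustration rather than the WEKA implementation, and the Laplace smoothing is an assumption added to avoid zero-probability counts.

```python
# Minimal naive Bayes training sketch for categorical data: estimate
# class priors P(Ci) and per-attribute conditionals P(xk | Ci) by
# counting, with Laplace smoothing (an assumption) for unseen values.
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """rows: list of attribute-value tuples; labels: class label per row."""
    n = len(rows)
    class_counts = Counter(labels)
    priors = {c: class_counts[c] / n for c in class_counts}

    cond = defaultdict(int)        # (attr_index, value, cls) -> count
    values = defaultdict(set)      # attr_index -> distinct values seen
    for row, cls in zip(rows, labels):
        for k, v in enumerate(row):
            cond[(k, v, cls)] += 1
            values[k].add(v)

    def posterior_score(x, cls):
        # Unnormalized posterior: P(cls) * prod_k P(x[k] | cls)
        score = priors[cls]
        for k, v in enumerate(x):
            num = cond[(k, v, cls)] + 1                      # Laplace
            den = class_counts[cls] + len(values[k])
            score *= num / den
        return score

    def predict(x):
        return max(priors, key=lambda cls: posterior_score(x, cls))

    return predict

# Usage: predict = train_nb(train_rows, train_labels); predict(test_row)
```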
5. Results and Discussions

Table 2 and Table 3 show the results for the Multilayer Perceptron and the Naïve Bayesian classifier, respectively, using different models and under different experimental settings. A number of comparative measures are also presented.

Table 2. Results for Multilayer Perceptron

Test Mode | Time Taken in Training (Build Model) | Time Taken in Testing | Number of Instances | Correctly Classified Instances | Incorrectly Classified Instances
10-fold cross-validation | Around 10 seconds | Very high | 1728 | 1720 (99.537%) | 8 (0.463%)
Split 10% train, remainder test | Around 10 seconds | High | 1555 | 1382 (88.875%) | 173 (11.125%)
Split 66% train, remainder test | Around 10 seconds | High | 588 | 583 (99.150%) | 5 (0.850%)
Split 90% train, remainder test | Around 10 seconds | High | 173 | 172 (99.422%) | 1 (0.578%)
Table 3. Results for Naïve Bayesian

Test Mode | Time Taken in Training and Testing | Number of Instances | Correctly Classified Instances | Incorrectly Classified Instances
10-fold cross-validation | Milli-seconds (almost instant) | 1728 | 1481 (85.706%) | 247 (14.294%)
Split 10% train, remainder test | Milli-seconds (almost instant) | 1555 | 1270 (81.672%) | 285 (18.328%)
Split 66% train, remainder test | Milli-seconds (almost instant) | 588 | 486 (82.653%) | 102 (17.347%)
Split 90% train, remainder test | Milli-seconds (almost instant) | 173 | 144 (83.237%) | 29 (16.763%)
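The evaluation protocol behind Tables 2 and 3 (10-fold cross-validation plus 10%, 66%, and 90% percentage splits) can be reproduced in outline as follows. This is an assumed scikit-learn analogue of the authors' WEKA runs, so the absolute accuracies will differ; the dataset URL and model choices repeat the assumptions stated earlier.

```python
# Sketch of the evaluation protocol: compare an MLP (BNN analogue) and
# categorical naive Bayes under 10-fold CV and three percentage splits.
import pandas as pd
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import CategoricalNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data"
df = pd.read_csv(URL, header=None,
                 names=["buying", "maint", "doors", "persons",
                        "lug_boot", "safety", "class"])
X, y = df.drop(columns="class"), df["class"]

models = {
    "BNN": make_pipeline(OneHotEncoder(handle_unknown="ignore"),
                         MLPClassifier(max_iter=500, random_state=0)),
    "NB":  make_pipeline(OrdinalEncoder(), CategoricalNB()),
}

for name, model in models.items():
    cv_acc = cross_val_score(model, X, y, cv=10).mean()   # 10-fold CV
    print(f"{name} 10-fold CV accuracy: {cv_acc:.3f}")
    for train_frac in (0.10, 0.66, 0.90):                 # percentage splits
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=train_frac, random_state=0, stratify=y)
        acc = model.fit(X_tr, y_tr).score(X_te, y_te)
        print(f"{name} split {int(train_frac*100)}% train: {acc:.3f}")
```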
Recall that we tested the BNN and NB classifiers using 10-fold cross-validation and percentage splits with different percentages (10%, 66%, and 90%). These led us to a number of observations.

One, in terms of accuracy, BNN always gives more accurate results than NB, with the highest accuracy obtained using the 10-fold cross-validation technique.

Two, in terms of time, NB always performs at a faster rate: building the model (training) and testing are executed almost instantly. On the other hand, BNN is much slower; building the model takes almost the same time in every mode (11 seconds in our case), while the testing time varies from one test mode to another. We found that in 10-fold cross-validation the testing mode was time-consuming, while in the percentage splits, the smaller the training set, the longer it took to complete the test, since more instances remain to be classified.

Three, we investigated the consequence of changing the order of the data in the dataset, while preserving the attribute values, for both algorithms. We observed that the percentage-split testing mode in both algorithms yielded different results and the number of correctly classified instances changed. On the other hand, the results did not change when the 10-fold cross-validation testing mode was used.

Four, concerning incompleteness of the knowledge through omission of some data, we observed that WEKA ignored any tuple containing missing data before applying the algorithm, as a kind of preprocessing, rather than replacing the missing data through some estimation technique.

Five, by inspecting the raw data and the corresponding classification results, NB is self-explanatory, but the BNN implementation can be considered a black box. This means that we are not able to explain how the network reached the correct solution.

Six, in terms of incrementability, we can easily add new tuples or rules to the NB model by performing simple calculations, but in the BNN it is hard to add or modify anything, because the whole network must be retrained even for a very minor modification.

Seven, in terms of modifiability, rules can easily be changed independently of each other in NB, but modification is harder in BNN.
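Observation six can be illustrated with a small sketch: because an NB model is just a table of counts, adding a new training tuple is a constant-time count update with no retraining pass. This hand-rolled illustration extends the counting scheme sketched in Section 4 and is an assumption about the general technique, not the WEKA mechanism.

```python
# Sketch of NB incrementability: adding one labelled tuple only bumps a
# handful of counters; priors and conditionals are recomputed from the
# counts on demand, so no retraining is needed.
from collections import Counter, defaultdict

class IncrementalNB:
    def __init__(self):
        self.n = 0
        self.class_counts = Counter()
        self.cond = defaultdict(int)      # (attr_index, value, cls) -> count
        self.values = defaultdict(set)    # attr_index -> distinct values

    def add(self, row, cls):
        """Incorporate one new training tuple in O(number of attributes)."""
        self.n += 1
        self.class_counts[cls] += 1
        for k, v in enumerate(row):
            self.cond[(k, v, cls)] += 1
            self.values[k].add(v)

    def predict(self, row):
        def score(cls):
            s = self.class_counts[cls] / self.n                 # prior
            for k, v in enumerate(row):                         # likelihoods
                s *= ((self.cond[(k, v, cls)] + 1) /            # Laplace
                      (self.class_counts[cls] + len(self.values[k])))
            return s
        return max(self.class_counts, key=score)
```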
6. Conclusions and Future Works
The comparative analysis in the present work supports our assumptions that BNN is slower, more opaque, and more difficult to manipulate than NB. Nonetheless, BNN shows remarkably high accuracy. The backpropagation algorithm also sometimes causes feedforward neural networks to converge to a local minimum of the error surface. The research also investigated other factors, such as changing the order of the patterns applied to the network and the incompleteness of the data, for both models. Repeating the experiments on different datasets with varying settings is suggested as future work.
7. References
[1] A. Kusiak and M. Smith. Data Mining in Design of Products and Production Systems. Annual Reviews in Control, 31(1):147-156, 2007.
[2] J. Han and M. Kamber. Data Mining: Concepts and Techniques, 2nd Edition. Morgan Kaufmann, 2008.
[3] Y. Yang and G. I. Webb. A Comparative Study of Discretization Methods for Naive-Bayes Classifiers. In Proceedings of the 2002 Pacific Rim Knowledge Acquisition Workshop, Tokyo, Japan, pp. 159-173, 2002.
[4] I. Androutsopoulos, G. Paliouras, V. Karkaletsis, G. Sakkis, C. D. Spyropoulos and P. Stamatopoulos. Learning to Filter Spam E-Mail: A Comparison of a Naïve Bayesian and a Memory-based Approach. In Proceedings of the Machine Learning and Textual Information Access Workshop, pp. 1-13, 2000.
[5] D. Delen, G. Walker, and A. Kadam. Predicting Breast Cancer Survivability: A Comparison of Three Data Mining Methods. Artificial Intelligence in Medicine, pp. 121-130, 2004.
[6] J. A. Marchant and C. M. Onyango. Comparison of a Bayesian Classifier with a Multilayer Feed-Forward Neural Network using the Example of Plant/Weed/Soil Discrimination. Computers and Electronics in Agriculture, 39:3-22, 2003.
[7] J. A. Benediktsson and P. H. Swain. Neural Network Approaches versus Statistical Methods in Classification of Multisource Remote Sensing Data. IEEE Transactions on Geoscience and Remote Sensing, 28(8), 1990.
[8] R. Russo. Bayesian and Neural Networks for Motion Picture Recommendation. Technical Report, Boston College, 2006.
[9] Y. Wu and S. C. Ng. Combining Neural Learners with the Naive Bayes Fusion Rule for Breast Tissue Classification. In Proceedings of the 2nd IEEE Conference on Industrial Electronics and Applications, pp. 709-713, 2007.
[10] D. Xhemali, C. J. Hinde, and R. G. Stone. Naive Bayes vs. Decision Trees vs. Neural Networks in the Classification of Training Web Pages. International Journal of Computer Science, 4(1):16-23, 2009.
[11] T. Auld, A. W. Moore, and S. F. Gull. Bayesian and Neural Networks for Internet Traffic Classification. IEEE Transactions on Neural Networks, 18(1):223-239, 2007.
[12] M. Jalali. Skype Traffic Classification: Naive Bayes or Neural Networks. Technical Report, University of Toronto, 2010.
[13] UCI Machine Learning Repository. http://archive.ics.uci.edu/ml/. Accessed on 11 November 2010.
[14] F. Pernkopf. Bayesian Network Classifiers versus Selective k-NN Classifier. Pattern Recognition, 38(1):1-10, 2005.
[15] D. M. Kline and C. S. Galbraith. Performance Analysis of the Bayesian Data Reduction Algorithm. International Journal of Data Mining, Modelling and Management, 1(3):223-236, 2009.
[16] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The WEKA Data Mining Software: An Update. SIGKDD Explorations, 11(1), 2009.
[17] P. Liu, L. Lei, and N. Wu. A Quantitative Study of the Effect of Missing Data in Classifiers. Computer and Information Technology, pp. 28-33, 2005.
[18] K. M. Ting and Z. Zheng. A Study of AdaBoost with Naive Bayesian Classifiers: Weakness and Improvement. Computational Intelligence, 19(2):186-200, 2003.
Biographies
Sarmad Makki received the B.Sc. and M.Sc. degrees in Computer Science from the University of Baghdad in 1995 and 2000, respectively. He is currently a staff member at the University of Baghdad, College of Science, and is preparing his Ph.D. at Universiti Putra Malaysia (UPM) with the Intelligent Computing Research Group. His research interests include Data Mining, Clustering, and Evolutionary Computation.

Aida Mustapha received the B.Sc. degree in Computer Science from Michigan Technological University and the M.IT degree in Computer Science from UKM, Malaysia, in 1998 and 2004, respectively. She received her Ph.D. in Artificial Intelligence, focusing on dialogue systems. She is currently an active researcher in the areas of Computational Linguistics, Text Mining, and Agent Systems in Social Networks.
Junaidah Mohamed Kassim received the B.IT degree in Information Science from Universiti Kebangsaan Malaysia (UKM) and the M.Sc. degree in Computer Science from Universiti Teknologi Malaysia (UTM) in 1999 and 2002, respectively. She is currently pursuing her Ph.D. in Knowledge Management, focusing on decision making, at Universiti Putra Malaysia (UPM).
Ealaf H. Gharaybeh received his B.Sc. degree in Software Engineering from Philadelphia University, Jordan, in 2009. He is currently pursuing an M.Sc. in Software Engineering at Universiti Putra Malaysia. His main research interests include Data Mining, Software Architecture, Software Quality, and Software Evaluation.

Mohammed Alhazmi received his Bachelor's degree in Information Technology (Hons) in Security Technology from Multimedia University in 2009 and his Master's degree in Computer Science (Software Engineering) from Universiti Putra Malaysia the following year, in 2010. He currently works at Yanabea Technology Trading Est. as a Team Leader in the Development Department in Riyadh, Saudi Arabia. The company develops and moderates the e-government project for the Ministry of Higher Education in Saudi Arabia.