Mobile Package Recommendation using Classification with Feature Discretization and Threshold-based Ensemble Technique

Boonyarit Soonsiripanichkul, Nattapong Tongtep, Thanaruk Theeramunkong
School of Information, Computer, and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University
131 Moo 5, Bangkadi, Muang, Pathum Thani, Thailand 12000
[email protected], {nattapong,thanaruk}@siit.tu.ac.th
http://www.siit.tu.ac.th

Abstract— Advising customers on which package they should select is a challenging task, since varied needs and the range of available packages can create complicated conditions. For instance, two packages may be identical for voice-only use yet differ in sales channel, such as business versus individual customers. This paper presents a comparative study of feature discretization and an ensemble technique. The applied classification models are naïve Bayes classifiers and their ensembles on different types of features, with feature discretization and a performance threshold-based ensemble. The model has been tested on a collection of approximately 82,000 SIM card activation records from a Thai telecommunication company. Classification accuracy improves from 86.00% to 90.00% after discretization, and rises to 91.97% after the threshold-based ensemble is applied.

Keywords— Naïve Bayes Classification, Comparative Study, Feature Discretization, Mobile Package Recommendation, Threshold-based Ensemble.

I. INTRODUCTION

Nowadays, tremendous electronic collections of data exist that a company can use to understand what customers need and to recommend appropriate products to them. As one interesting application, telecommunication organizations can analyse past transactions to acquire rules for advising suitable packages to their customers. However, it is difficult for staff to advise the appropriate promotion, because various factors influence the selection of a mobile package, such as the cellular data limit, call duration, and others. It is therefore a challenging topic to automate the recommendation process by applying data mining techniques.

Harno [1] noted that since 3G technology was promoted, mobile data service has grown while voice service has been declining. He studied flat-rate pricing of mobile data packages and observed that the number of customers using smart devices increases every day; the problem, therefore, is how to offer the right promotion to the right customer. Hui and Jha [2] reported that customer service support is one of the important business functions for improving customer satisfaction; they used data mining techniques in an automatic advisory system to solve frequently occurring problems. Nuddamongkol [3] noted that some data mining techniques struggle with numerical values, a problem that can be solved by discretization, which reduces the detail of the data by dividing it into sub-ranges in order to decrease processing time and improve accuracy. Güvenir and Çakır [4] concluded that a voting feature-based classifier can improve the accuracy of predicting bank financial distress. Tripoliti et al. [5] reported that a voting technique combined with the random forest algorithm gives a significant improvement in predictive performance.

In this paper, we present a comparative study on feature discretization and an ensemble technique. The applied classification models are naïve Bayes classifiers and their ensembles on different types of features, with feature discretization and a performance threshold-based ensemble. In Section 2, we propose a multi-stage framework composed of four stages: a preprocessing stage, a training stage, a testing stage, and an ensemble stage. The experimental settings are described in Section 3. Section 4 presents the experimental results and discusses feature discretization and the threshold-based ensemble technique. Conclusions and future work are given in Section 5.

II. THE FRAMEWORK

A. Framework

The framework consists of four stages, as shown in Figure 1. In the preprocessing stage, we clean the data by removing noisy records (Cleaning Data), i.e., SIM activations whose label is not defined or activations made for internal company usage. Records with missing values are kept, because the naïve Bayes classifier tolerates missing values in both training and testing; this yields the verified data. The next step, feature selection (Selecting Feature), filters only the features valuable for our scope of work: we keep the attributes with a good relation to the class by predicting the class with each single feature alone and observing the correct prediction rate. After that, we select only the sales channel in focus, which is the shop channel (Filtering Data); this step reduces the data from around 120,000 entities to just over 80,000 entities. We then transform the data, i.e., rename classes and extract one column into two (Transforming Data, Extracting Data); this is done for privacy reasons, and the result is formatted data ready for discretization. Next, the distinct values of the numerical features "price range", "3G flat rate", and "voice" are discretised by the equal-frequency method, except for two special values, 0 (no usage) and 1 (unlimited usage), which are each separated into their own group. Grouping the values in this step (Discretising Data) reduces complexity, and we then list all possible sets of discretization. A brief code sketch of this stage is given after Figure 1.

In the training stage, after discretising the features and creating the set of combinations, we prepare eight training sets. We train each with the naïve Bayes classifier (NB Learning) and obtain eight models for the test data, Model Set 1 to 8. In the testing stage, eight test sets are produced by the same preprocessing stage; each test set is classified by the model of its set, and the candidate relations are tested by the NB model. The last stage is the threshold-based ensemble stage: the predicted results of the sets are combined by voting on the result scores to accept or reject the suggested mobile phone package, giving the final evaluation result.

Figure 1: The proposed framework
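To make the preprocessing stage concrete, here is a minimal pandas sketch. The file and column names (sim_activations.csv, label, internal_use, channel) are hypothetical; the paper's actual pipeline was built with Weka.

```python
import pandas as pd

df = pd.read_csv("sim_activations.csv")  # hypothetical input file

# Cleaning Data: drop records with an undefined label and records
# activated for internal company usage; rows that merely contain
# missing feature values are kept, since naive Bayes tolerates them.
df = df[df["label"].notna() & ~df["internal_use"]]

# Filtering Data: keep only the shop sales channel (~120k -> ~80k rows).
df = df[df["channel"] == "shop"]

# Transforming Data: rename classes for privacy (Class 1 ... Class 16).
codes = df["label"].astype("category").cat.codes
df["label"] = "Class " + (codes + 1).astype(str)
```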

B. Discretization Description

A learning algorithm usually yields a poor result, in accuracy or in the rate of correct prediction, when it is faced with many irrelevant features [6]. Feature discretization is therefore a procedure to improve predictive performance by grouping relevant ranges of numerical data together. Discretization is a preprocessing method that reduces learning complexity and improves the dependence between the data and the class: it transforms data from numerical values to nominal values, decreasing the number of distinct possible values, and it is also good for noise reduction and better representation [7]. The discretization method applied in this experiment is the equal-frequency discretization (EFD) method. The properties of EFD are as follows [8]:
1. The discretization model relies on data sorted from the smallest to the highest value.
2. EFD separates the value range into sub-ranges.
3. Each sub-range holds roughly the same number of data points, i.e., the values are divided by frequency into intervals.
To select an interval boundary, EFD uses the halfway point ((a+b)/2) between the two values on either side of the boundary [9]. A small sketch of this procedure follows.
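The sketch below illustrates EFD with the special groups for 0 (no usage) and unlimited usage described above; the function name and implementation details are ours, not the paper's Weka-based implementation.

```python
import numpy as np

def equal_frequency_bins(values, n_bins, special=(0, 1)):
    """Equal-frequency discretization. Values listed in `special`
    (e.g. 0 = no usage, 1 = unlimited usage) each keep their own group;
    the rest are split into n_bins intervals of roughly equal count,
    with cut points placed halfway between neighbouring sorted values."""
    values = np.asarray(values, dtype=float)
    regular = np.sort(values[~np.isin(values, special)])
    idx = (np.arange(1, n_bins) * len(regular)) // n_bins
    cuts = [(regular[i - 1] + regular[i]) / 2 for i in idx]

    def bin_of(v):
        if v in special:                         # special values keep their own group
            return f"special_{v:g}"
        return f"bin_{np.searchsorted(cuts, v)}"

    return [bin_of(v) for v in values]

# e.g. voice minutes: 0 and 1 go to their own groups, the rest into 3 bins
print(equal_frequency_bins([0, 10, 30, 30, 55, 80, 120, 1], n_bins=3))
```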

C. Classification

In general, there are various classification techniques, each suited to different situations. We select naïve Bayes (NB) because it is a statistical classifier that directly matches the feature discretization issue, and it is among the most intuitive and simple classifiers [10]. Naïve Bayes also copes with missing data in the dataset, and it assumes that attribute values are conditionally independent of one another, given the class label of the object [7]. Features F1 to F6 are the input attributes, while the last feature, F7, is the output with multiple alternative values, class 1 to class 16, according to Table II. We can formulate the NB model as follows. The probability that a record $r = (r_1, r_2, \ldots, r_6)$, characterized by a set of feature values $F_r = ((F_1 = r_1), (F_2 = r_2), \ldots, (F_6 = r_6))$, belongs to the positive class ($F_7 = +1$) can be defined as:

$$p(F_7 = +1 \mid F_r) = \frac{p^*(F_7 = +1 \mid F_r)}{p^*(F_7 = +1 \mid F_r) + p^*(F_7 = -1 \mid F_r)} \qquad (1)$$

$$p^*(F_7 = +1 \mid F_r) = p(F_7 = +1) \prod_{i=1}^{6} p(F_i = r_i \mid F_7 = +1) \qquad (2)$$

$$p^*(F_7 = -1 \mid F_r) = p(F_7 = -1) \prod_{i=1}^{6} p(F_i = r_i \mid F_7 = -1) \qquad (3)$$

Here, $p(F_7 = +1 \mid F_r)$ is the probability that the record $r$ belongs to the positive class, $p^*(F_7 = +1 \mid F_r)$ and $p^*(F_7 = -1 \mid F_r)$ are the unnormalized probabilities of the positive and negative class, and $p(F_7 = +1)$ (or $p(F_7 = -1)$) is the prior probability of the positive (or negative) class. $p(F_i = r_i \mid F_7 = +1)$ (or $p(F_i = r_i \mid F_7 = -1)$) is the probability of observing the feature value $F_i = r_i$ when the class is positive (or negative).
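A toy sketch of how Eqs. (1)-(3) combine priors and conditional probabilities; the function name and sample data are ours, and the code omits the smoothing discussed next.

```python
def nb_scores(record, data, labels):
    """Sketch of Eqs. (1)-(3): for each class, the unnormalized score is
    the prior times the product of per-feature conditional probabilities;
    the scores are then normalized. Toy code without smoothing."""
    scores = {}
    for c in set(labels):
        rows = [x for x, y in zip(data, labels) if y == c]
        score = len(rows) / len(data)              # prior p(F7 = c)
        for i, v in enumerate(record):
            match = sum(1 for x in rows if x[i] == v)
            score *= match / len(rows)             # p(Fi = ri | F7 = c), Eq. (2)/(3)
        scores[c] = score
    total = sum(scores.values())                   # Eq. (1): normalize
    return {c: s / total for c, s in scores.items()} if total else scores

# two nominal features, two classes
data = [("mon", "talk"), ("mon", "smart"), ("tue", "talk")]
labels = ["+1", "+1", "-1"]
print(nb_scores(("mon", "talk"), data, labels))
```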

For continuous attributes $F_i$, a Gaussian distribution with smoothing can be applied as follows:

$$p(F_i = r_i \mid F_7 = +1) = \frac{1}{\sigma_+ \sqrt{2\pi}}\, e^{-\frac{(r_i - \mu_+)^2}{2\sigma_+^2}} + \epsilon \qquad (4)$$

$$p(F_i = r_i \mid F_7 = -1) = \frac{1}{\sigma_- \sqrt{2\pi}}\, e^{-\frac{(r_i - \mu_-)^2}{2\sigma_-^2}} + \epsilon \qquad (5)$$

Here, $\mu_+$ (or $\mu_-$) is the mean of the positive (or negative) class, $\sigma_+$ (or $\sigma_-$) is the standard deviation of the positive (or negative) class, and $\epsilon$ is a small positive constant used for smoothing to resolve the sparseness problem; it is set to 0.000001 in the experiments [7].
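Eqs. (4)-(5) can be coded directly; a minimal sketch follows, in which the additive placement of $\epsilon$ reflects our reading of the smoothing constant and the function name is ours.

```python
import math

def gaussian_likelihood(r_i, mu, sigma, eps=1e-6):
    """p(Fi = r_i | class) per Eqs. (4)-(5): Gaussian density with a
    small additive constant eps to avoid zero probabilities."""
    return (1.0 / (sigma * math.sqrt(2 * math.pi))) * \
           math.exp(-((r_i - mu) ** 2) / (2 * sigma ** 2)) + eps
```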

D. Threshold-based Ensemble

The threshold-based ensemble is an upper layer applied after the test sets are evaluated. We construct it by comparing the F-measure (performance) of the class predicted by each trained model and then applying a majority vote. If the F-measure of the predicted class is greater than 0.5, that predicted class is kept as the model's vote; otherwise class 0, meaning "cannot vote", is recorded. Finally, the system counts the most frequently occurring class to accept or reject the suggested package; it rejects when class 0 occurs most often. The process flowchart is shown in Figure 2 and the algorithm in Figure 3. A rejection means the predicted class is not confident enough, so we avoid answering with a predicted class for that input set.

ALGORITHM threshold-based ensemble
  n                      # number of sets
  INPUT classSet[n]      # the class predicted for one instance by each set
  INPUT fMeasureSet[n]   # the F-measure of each set's predicted class
  voteCount[0..K] = 0    # one counter per class; index 0 counts "cannot vote"

  i = 0
  WHILE i < n
    IF fMeasureSet[i] > 0.5
      voteCount[classSet[i]]++     # confident set votes for its class
    ELSE
      voteCount[0]++               # F-measure too low: cannot vote
    ENDIF
    i++
  ENDWHILE

  answer = argmax over classes of voteCount    # majority vote
  IF answer == 0
    RETURN reject                  # "cannot vote" occurs most often
  ELSE
    RETURN answer
  ENDIF
END threshold-based ensemble

Figure 3: Our proposed threshold-based ensemble algorithm
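For reference, a runnable Python version of Figure 3 (function name ours); the demo call uses the votes of instance 1 from Table V, which the ensemble rejects.

```python
from collections import Counter

def threshold_vote(class_set, f_measure_set, threshold=0.5):
    """Figure 3 in Python: a set whose predicted class has an F-measure
    above the threshold votes for that class; the rest vote 0 (cannot vote).
    Returns the majority class, or None (reject) when 0 wins."""
    votes = [c if f > threshold else 0
             for c, f in zip(class_set, f_measure_set)]
    winner, _ = Counter(votes).most_common(1)[0]
    return None if winner == 0 else winner

# Instance 1 of Table V: six of eight sets fall below the threshold,
# so "cannot vote" wins and the ensemble rejects (returns None).
print(threshold_vote([4, 4, 4, 4, 4, 4, 4, 4],
                     [0.157, 0.158, 0.415, 0.181, 0.417, 0.182, 0.525, 0.525]))
```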

III. EXPERIMENTAL SETTING

A. Software

Weka is data mining software that provides multiple tools for mining data, including data preprocessing, classification, clustering, association rules, and visualization [11]. The Weka application was used to explore the large data set, in this case the SIM card activation data, and to discover the most appropriate classification model. The tested version of Weka is 3.6.10, running on a MacBook Pro (early 2011 model). Weka is an open-source Java application developed by the University of Waikato, New Zealand. It offers a number of features and a user-friendly interface

Figure 2: Threshold-based ensemble flowchart

through which many of the standard algorithms are available [12]-[14]. Additionally, it supports the comma-separated values (.csv) file format for preformatted data sets.

B. Evaluation

After the naïve Bayes technique is applied to the dataset, the evaluation focuses on accuracy for the overall prediction, while the F-measure is also considered for the reliability of each output class. The accuracy rate (Acc) evaluates the proportion of correct (true) predictions and is computed as:

$$Acc = \frac{TP + TN}{TP + FP + FN + TN}$$

Note that TP = true positive, TN = true negative, FP = false positive, and FN = false negative. Precision measures the probability that a returned prediction is correct [15]:

$$precision = \frac{\text{the number of retrieved correct answers}}{\text{the number of retrieved answers}}$$

Recall (R) evaluates the effectiveness of the classifier for each class. Also known as the true positive rate, it increases as the classifier obtains more of the correct classifications:

$$recall = \frac{\text{the number of retrieved correct answers}}{\text{the number of correct answers}}$$

The F-measure is a combined measure of recall and precision. A constant $\alpha$ controls the relative weight given to recall and precision; when $\alpha$ is set to 0.5, recall is considered as important as precision. The F-measure is defined as [7]:

$$f\text{-}measure = \frac{recall \times precision}{\alpha \times recall + (1 - \alpha) \times precision}$$
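The four measures can be computed directly from the confusion-matrix counts; a short sketch (function name ours), where $\alpha = 0.5$ reduces the weighted F-measure to the usual harmonic mean:

```python
def metrics(tp, tn, fp, fn, alpha=0.5):
    """Accuracy, precision, recall and weighted F-measure for one class."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # correct answers / retrieved answers
    recall = tp / (tp + fn)      # correct answers / all correct answers
    f = (recall * precision) / (alpha * recall + (1 - alpha) * precision)
    return acc, precision, recall, f

# e.g. 90 TP, 880 TN, 10 FP, 20 FN -> Acc 0.97, P 0.90, R ~0.82, F ~0.86
print(metrics(90, 880, 10, 20))
```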

C. Feature Description

As a preparation step, we consider post-pay SIM activation history. There are six features and one concept (label). Three attributes are nominal: day of week, usage, and region. The first, day of week, has seven distinct values, Monday through Sunday. The next is the type of customer usage, which separates into six types: BB, smart, talk, internet, business, and TV. The third nominal feature is region; region is considered because a domain expert advised that different places may produce different results [16]. We separate the area of Thailand into 14 regions. The remaining three attributes are numeric: price in Baht, voice in minutes, and Internet in Mbps. All of these numeric features are grouped by feature discretization (see Table I).

TABLE I
FEATURE DISCRETIZATION DETAILS

Entity   | Description                                       | # of Original Distinct | # of Discretized Distinct
---------|---------------------------------------------------|------------------------|--------------------------
Price    | The price of the package in Thai Baht             | 50                     | 5
Voice    | The maximum calls in minutes of that package      | 32                     | 7
Internet | The flat rate of 3G service (data usage in Mbps)  | 15                     | 7

D. Class Description

The SIM card package is a nominal attribute; in this case, 16 possible packages are classified. The details of the classes are shown in Table II, together with the number of instances of each class. The total is 82,168 records.

TABLE II
DATASET STATISTICS

Label    | Description                                             | # of Records
---------|---------------------------------------------------------|-------------
Class 8  | The package for calls and surfing the Internet, type 1  | 36,752
Class 5  | Package for Internet only                                | 12,166
Class 16 | The cross-selling promotion                              | 12,098
Class 14 | Package for calls only, type 4                           | 7,428
Class 9  | Package for calls only, type 2                           | 5,018
Class 4  | High flat rate of mobile Internet usage                  | 2,802
Class 7  | The promotion for the iPhone device                      | 2,760
Class 6  | The promotion for the iPad device                        | 1,165
Class 11 | Package for calls only, type 3                           | 912
Class 12 | The package for calls and surfing the Internet, type 2   | 428
Class 10 | Free SIM card                                            | 151
Class 2  | Package for calls only, type 1                           | 139
Class 1  | The promotion for the Blackberry device                  | 138
Class 3  | The mobile package available only business-to-business   | 97
Class 13 | Package for SME business                                 | 73
Class 15 | The event SIM card                                       | 41
Total    | Total records of all device packages                     | 82,168

E. Setting

This section describes how the experiment to classify the mobile package with feature discretization and threshold-based ensemble evaluation is constructed. Following preliminary experiments, we clean the data, select only the attributes necessary for this experiment, and filter only the instances sold through the shop channel, which is our scope of work. We then modify some details, such as renaming attributes for privacy and security reasons. The final preprocessing step is discretising the numerical features according to Table III. Equal-frequency separation is applied, and we manually add two special bins, 0 (no usage) and -1 (unlimited usage); this gives seven bins for voice and Internet, and 5 bins for price, because price has neither a zero nor an unlimited value. For feature discretization, we form subsets of the three numeric features as shown in Table I.

We then evaluate with the naïve Bayes classifier in eight experiments, one for each feature combination, and record the results. The feature combinations of the NB model sets are shown in Table III, where f1 to f6 denote day of week, voice, Internet, price, usage, and region, respectively, and [A], [B], and [C] denote the discretised versions of voice, Internet, and price.

TABLE III
TABLE OF FEATURE COMBINATIONS

Set | Features
 1  | f1  f2   f3   f4   f5  f6
 2  | f1  [A]  f3   f4   f5  f6
 3  | f1  f2   [B]  f4   f5  f6
 4  | f1  f2   f3   [C]  f5  f6
 5  | f1  [A]  [B]  f4   f5  f6
 6  | f1  [A]  f3   [C]  f5  f6
 7  | f1  f2   [B]  [C]  f5  f6
 8  | f1  [A]  [B]  [C]  f5  f6
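Since each of the three numeric features can appear raw or discretised, the eight sets are exactly the 2^3 combinations; a small sketch (ours, not part of the paper's toolchain) reproduces Table III with its numbering:

```python
# Which of voice (A), Internet (B), and price (C) is discretised per set.
discretised = [set(), {"A"}, {"B"}, {"C"},
               {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
for set_no, d in enumerate(discretised, start=1):
    feats = ["f1",
             "[A]" if "A" in d else "f2",   # voice
             "[B]" if "B" in d else "f3",   # Internet
             "[C]" if "C" in d else "f4",   # price
             "f5", "f6"]
    print(f"Set {set_no}: {'  '.join(feats)}")
```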

We construct Model Sets 1 to 8 using the naïve Bayes classifier and perform NB classification with a percentage split: the data set is divided into a training set and a test set of 67% and 33%, respectively [17]. We select the percentage split because the number of instances is quite large; it gives slightly lower prediction accuracy but takes far less time than 10-fold cross-validation. After executing each model on its test set, we record the F-measure (F) and accuracy (Acc). We then evaluate the confidence of each predicted class using the threshold-based technique, accepting only the sets whose F-measure exceeds 0.5; the remaining sets cannot vote. The process is shown in Figure 2.
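As an illustration only (the paper used Weka 3.6.10), a scikit-learn sketch of one such experiment with the 67/33 percentage split; the inputs X and y and the encoding step are hypothetical stand-ins for the paper's data.

```python
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder
from sklearn.metrics import accuracy_score, f1_score

def run_experiment(X, y):
    """One of the eight experiments: encode nominal features, split
    67%/33%, train naive Bayes, and report Acc and per-class F-measures."""
    X_enc = OrdinalEncoder().fit_transform(X)    # nominal -> integer codes
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_enc, y, train_size=0.67, random_state=0)
    # min_categories guards against codes unseen in the training split
    model = CategoricalNB(min_categories=X_enc.max(axis=0).astype(int) + 1)
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    return accuracy_score(y_te, pred), f1_score(y_te, pred, average=None)
```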

We accept the result only if the most frequently occurring class outnumbers the no-vote (class 0) count. Finally, we compare the accuracy before and after applying the ensemble.

IV. EXPERIMENTAL RESULT AND DISCUSSION

The F-measure of each package under each set is shown in Table IV. "None" means the original data were used without feature discretization; "A", "B", and "C" mean the corresponding feature was discretised before running naïve Bayes. For each class, the maximum F-measure among the eight sets marks the best-performing combination. According to Table IV, the most accurate model is Set 5, the combination of discretised voice and Internet, at 90.00%, an increase of 3.86 percentage points over the non-discretised data; reducing the detail of the voice and Internet attributes therefore improves accuracy. Although this setting gives an attractive overall prediction rate, it does not give the best prediction rate for every class. The highest average F-measure belongs to Set 8, which also attains the per-class maximum F-measure for up to eight classes. Second is Set 7, with the best prediction for up to six classes, and third is Set 5, with five classes.

Another aspect is the relationship between classes and discretization. Classes 1 and 16 are not affected by the feature discretization technique; the patterns of these two classes may depend on the three nominal features, or it is possible that none of the features available in this experiment affects them. Surprisingly, the best F-measure of class 7, 0.61, comes from the non-discretised features, which suggests that feature discretization is not suitable for class 7. The most outstanding case where feature discretization makes prediction far more effective is class 12, whose F-measure rockets from 0 to 0.98, a predictive improvement of more than 90.00%. Most of the other classes also improve when the feature discretization technique is applied (see Table IV).

TABLE IV
THE F-MEASURE OF EACH CLASS USING THE NAÏVE BAYES CLASSIFIER

                  Set 1   Set 2   Set 3   Set 4   Set 5   Set 6   Set 7   Set 8
Discretization    None    A       B       C       A, B    A, C    B, C    A, B, C
Accuracy (%)      86.14   88.48   87.67   87.10   90.00   88.52   88.42   89.87
F-measure
Class 1           1.00    1.00    1.00    1.00    1.00    1.00    0.99    0.99
Class 2           0.71    0.67    0.80    0.80    0.67    0.40    0.90    0.40
Class 3           0.49    0.49    0.81    0.44    0.76    0.40    0.58    0.75
Class 4           0.16    0.16    0.42    0.18    0.42    0.18    0.53    0.53
Class 5           0.86    0.86    0.88    0.86    0.88    0.86    0.87    0.87
Class 6           0.33    0.33    0.45    0.35    0.45    0.35    0.45    0.45
Class 7           0.61    0.61    0.61    0.60    0.61    0.60    0.61    0.61
Class 8           0.95    0.95    0.96    0.95    0.96    0.95    0.96    0.96
Class 9           0.72    0.91    0.72    0.85    0.91    0.91    0.85    0.91
Class 10          0.57    0.00    0.55    0.00    0.00    0.00    0.00    0.00
Class 11          0.64    0.87    0.63    0.66    0.87    0.90    0.66    0.90
Class 12          0.00    0.05    0.98    0.00    0.98    0.11    0.98    0.98
Class 13          0.69    0.67    0.62    0.60    0.60    0.72    0.67    0.78
Class 14          0.82    0.93    0.82    0.84    0.93    0.93    0.84    0.93
Class 15          0.10    0.11    0.12    0.33    0.00    0.00    0.36    0.00
Class 16          0.99    1.00    0.99    1.00    1.00    1.00    1.00    1.00
Average           0.86    0.88    0.88    0.87    0.90    0.88    0.89    0.91

According to the F-measure results, the classes that certainly cannot pass the ensemble process are classes 4, 6, 10, and 15, because their F-measure is below 0.5 in more than half of the sets. These four classes are therefore rejected in this evaluation, which improves prediction accuracy under the threshold-based, most-occurrences ensemble model. The ensemble gives a satisfying accuracy improvement, from 90.00% with discretised numerical features to 91.97%. Examples of the ensemble procedure on individual instances are shown in Table V, where each set entry is a triple x, y, z: x is the predicted class, y is the F-measure of that predicted class, and z is the vote after the ensemble threshold (0 when the F-measure is below 0.5).

TABLE V
TABLE OF VOTING PROCEDURES (x, y, z)

Instance | Set 1       | Set 2         | Set 3       | Set 4         | Set 5         | Set 6         | Set 7         | Set 8         | Occurrences | Answer
1        | 4, 0.157, 0 | 4, 0.158, 0   | 4, 0.415, 0 | 4, 0.181, 0   | 4, 0.417, 0   | 4, 0.182, 0   | 4, 0.525, 4   | 4, 0.525, 4   | 6           | 0 (Reject)
2        | 9, 0.715, 9 | 16, 0.995, 16 | 9, 0.721, 9 | 13, 0.603, 13 | 16, 0.995, 16 | 13, 0.721, 13 | 13, 0.667, 13 | 16, 1.000, 16 | 3           | 16 (Accept)
3        | 5, 0.857, 5 | 5, 0.857, 5   | 4, 0.415, 0 | 5, 0.859, 5   | 4, 0.417, 0   | 4, 0.182, 0   | 4, 0.525, 4   | 4, 0.525, 4   | 3           | 5 (Accept)

V. CONCLUSION AND FUTURE WORK

This paper presented an approach to improve the accuracy of recommending mobile package promotions to customers. We improve accuracy in two steps. First, in the preprocessing stage, we apply equal-frequency discretization to the data, transforming numerical values into nominal values. Second, in the prediction decision process, we apply a most-occurrences ensemble to decide whether to answer with a prediction or to reject. Although the combination of all discretised features gives the best average F-measure, it does not guarantee the best accuracy for every class. Moreover, we conclude that discretization not only improves the representation of the data but also smooths noise. The ensemble procedure improves prediction accuracy by taking the most frequently occurring answer, including the no-vote answer. The weak point of this ensemble technique is that when many sets have a performance lower than 0.5 for a class, that class is rejected by the voter because "cannot vote" occurs most often. Nevertheless, the threshold-based process improves prediction confidence: when several feature-combination sets answer with the same predicted class and a high F-measure, the weight of that answer increases. We believe the classes rejected by the ensemble procedure are those for which the features we consider are insufficient or incompatible for prediction, which is why they show a low F-measure in most of the classified sets. Future work includes minimizing the rejection cases while still improving accuracy, applying our ensemble process to other models, and improving the accuracy of mobile package prediction with other discretization methods and class balancing techniques in the preprocessing stage.

REFERENCES
[1] J. Harno, "Impact of 3G and beyond technology development and pricing on mobile data service provisioning, usage and diffusion", Telematics and Informatics, 27(3), 2010, pp. 269-282.
[2] S. C. Hui and G. Jha, "Data mining for customer service support", Information & Management, 38(1), 2000, pp. 1-13.
[3] S. Nuddamongkol, "Knowledge Discovery in Database Based on Kohonen Self-organizing Map Algorithm", 1999.
[4] H. Altay Güvenir and M. Çakır, "Voting features based classifier with feature construction and its application to predicting financial distress", Expert Systems with Applications, 37(2), March 2010, pp. 1713-1718.
[5] E. E. Tripoliti, D. I. Fotiadis, and G. Manis, "Modifications of the construction and voting mechanisms of the random forests algorithm", Data & Knowledge Engineering, 87, September 2013, pp. 41-65.
[6] G. John, R. Kohavi, and K. Pfleger, "Irrelevant features and the subset selection problem", in Proc. 11th Int. Machine Learning Conf., Morgan Kaufmann, NJ, 1994, pp. 121-129.
[7] T. Theeramunkong, "Introduction to Concepts and Techniques in Data Mining and Application to Text Mining", 1st edition, 2011.
[8] M. Boulle, "Optimal bin number for equal frequency discretizations in supervised learning", Intelligent Data Analysis, 9(2), 2005, pp. 175-188.
[9] D. G. Sullivan, "Preparing the Data", lecture document, Boston University, 2013.
[10] A. Balamurugan, "NB+: An improved Naïve Bayesian algorithm", Knowledge-Based Systems, 24, 2010, pp. 563-569.
[11] P. Ozer, "Data Mining Algorithms for Classification", BSc thesis, Artificial Intelligence, January 2008.
[12] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd edition, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005.
[13] S. Drazin and M. Montag, "Decision Tree Analysis using Weka", Machine Learning - Project II, University of Miami, 2012, pp. 1-3.
[14] Weka Data Mining Java Software. Available at: http://www.cs.waikato.ac.nz/~ml/weka/
[15] B. Pant, K. Pant, and K. R. Pardasani, "Decision tree classifier for classification of plant and animal micro RNA's", in Computational Intelligence and Intelligent Systems, 2009, pp. 443-451.
[16] C. Lambert and D. Mizerski, "Purchase of a Fast Food Cartoon Character Toy Premium Targeted to Young Children", American Academy of Advertising Conference Proceedings, 2010, pp. 90-92.
[17] S. Barbosa, P. Chen, A. Cuzzocrea, X. Du, J. Filipe, O. Kara, I. Kotenko, K. M. Sivalingam, D. Slezak, T. Washio, and X. Yang, Communications in Computer and Information Science, Volume 51, 2009, pp. 433-451.