International Conference on Information Society (i-Society 2014)
An Ensemble based Decision Support Framework for Intelligent Heart Disease Diagnosis
1Saba Bashir, 2Usman Qamar, 3M. Younus Javed
College of Electrical and Mechanical Engineering, National University of Sciences & Technology (NUST), Islamabad, Pakistan
{1saba.bashir, 2usmanq, 3myjaved}@ceme.nust.edu.pk

Abstract - The large amount of medical data creates a need for intelligent data mining tools that can extract useful knowledge. Researchers have been using several statistical analysis and data mining techniques to improve disease diagnosis accuracy in healthcare. Heart disease has been considered the leading cause of death worldwide over the past 10 years. Several researchers have introduced different data mining techniques for heart disease diagnosis. Using a single data mining technique gives an acceptable level of accuracy for disease diagnosis. Recently, more research has been carried out on hybrid models, which show considerable improvement in heart disease diagnosis accuracy. The objective of the proposed research is to predict heart disease in a patient more accurately. The proposed framework uses a novel majority vote based classifier ensemble to combine different data mining classifiers. The UCI heart disease dataset is used for results and evaluation. Analysis of the results shows that the sensitivity, specificity and accuracy of the ensemble framework match or exceed those of the individual techniques. We obtained 82% accuracy, 74% sensitivity and 93% specificity for the heart disease dataset.

Keywords - Data mining; Naïve Bayes; Decision tree; Support vector machine; Majority voting; Ensemble technique
I. INTRODUCTION
Computational intelligence plays a vital role in medical diagnosis and intelligent decision making. A large number of medical applications and diagnostic procedures can be framed as computational classification tasks. Heart disease, also referred to as coronary artery disease (CAD), is a general term that can be used for any symptom related to the heart [1]. CAD is the most common form of cardiovascular disease, in which the coronary arteries narrow and sometimes widen, which can lead to a heart attack. The death rate due to cardiovascular disease is increasing rapidly worldwide, and almost 500,000 women are killed every year by heart attacks [1]. There are many symptoms related to heart disease. Some people feel chest pain and fatigue, whereas almost 50% of people feel nothing until a heart attack occurs. According to statistical data from the World Health Organization (WHO), CAD may result in abnormality or permanent disability in many men and women; and almost one third of the worldwide population died in developing countries by 2010 [1]. The prediction of a disease made by a healthcare physician is not 100% accurate [2]. Computational biology is often used to translate clinical biology into clinical practice and it also helps
978-1-908320-38/4/$25.00©2014 IEEE
in identification of disease from clinical data. The most effective role of computational biology is to help in the discovery of biomarkers (attributes) in heart disease. The diagnostic process may involve combining many different data fields and developing predictive models. There are many statistical analysis and data mining techniques that can be used to accomplish the disease diagnosis task [1, 3].

Data mining plays an important role in the field of heart disease prediction. The study of classification involves discovering hidden patterns in existing clinical data to identify the boundary between healthy individuals and individuals with heart disease [4]. Heart disease classification is critical for the therapy of patients. This research paper presents a heart disease prediction system using different data mining techniques, namely Naïve Bayes (NB), Decision Tree (DT) and Support Vector Machine (SVM). The results show that each technique has its own advantages in achieving the objectives of heart disease diagnosis with high accuracy. The proposed system can answer complex what-if queries that were not possible with traditional decision support systems. The likelihood of a patient getting heart disease is determined using attributes such as age, sex, trestbps, chol, thalach and oldpeak. Significant knowledge related to heart disease, such as patterns and relationships between different attributes for a certain type of disease, can be established from these medical factors.

A. Research Objective
The main objective of the proposed research is to develop an intelligent heart disease diagnosis and prediction system using a novel ensemble framework based on heterogeneous classifiers, namely NB, DT based on Gini Index (DT-Gini) and SVM. These classifiers are combined using a majority vote based scheme.
The proposed framework performs better than the individual classifiers on the heart disease dataset and achieves high prediction accuracy and reliable performance. The proposed system uses the UCI benchmark heart disease dataset to identify, discover and extract the hidden knowledge associated with heart disease in a patient. It can differentiate between healthy and sick individuals with high accuracy. The proposed clinical decision support framework can help medical practitioners make intelligent medical decisions more accurately than was possible with traditional decision support systems. Moreover, the treatment cost can also be reduced by providing effective treatment. The proposed framework presents a novel combination of classifiers that has not been explored before.
The rest of the paper is organized as follows: section 2 presents the dataset information, section 3 describes the related work, section 4 covers the proposed framework and section 5 describes the proposed technique. Experiments and results are given in section 6, and finally the conclusion and future work are presented in section 7.

II. DATASET INFORMATION
The standard heart disease dataset from the UCI repository [5] has been used for training and testing. The database contains 76 attributes, but we have used only 14 of them in order to obtain accurate results with a smaller feature space. Table 1 shows the selected heart disease dataset attributes.

TABLE I. HEART DISEASE DATASET INFORMATION

The dataset contains a total of 303 instances, of which 164 are healthy and 139 have heart disease. 297 rows are complete, whereas six rows have missing values, which are represented by -9 and are replaced by the attribute mean. No outliers have been detected since the data has already been processed by the respective publishers. Some of the important computations are given in Table 2.
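The replacement of missing values by the attribute mean can be illustrated with a minimal sketch. It assumes that the -9 sentinel is excluded when the mean is computed, which the text does not state explicitly. The function name `impute_with_attribute_mean` and the toy records are hypothetical and are not taken from the UCI data.

```python
from statistics import mean

MISSING = -9  # sentinel that marks a missing value in the dataset


def impute_with_attribute_mean(rows, missing=MISSING):
    """Return a copy of rows in which every missing entry is replaced by
    the mean of the observed (non-missing) values of the same attribute."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] != missing]
        # Assumption: if an attribute has no observed value, keep the sentinel.
        col_means.append(mean(observed) if observed else missing)
    return [
        [col_means[j] if value == missing else value
         for j, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy records with three attributes (age, chol, thalach).
toy_rows = [
    [63, 233, 150],
    [67, -9, 108],
    [37, 250, 187],
    [41, 204, -9],
]
cleaned = impute_with_attribute_mean(toy_rows)
print(cleaned[1][1])  # mean of 233, 250 and 204
```

The input list is left unchanged, so the raw and cleaned versions can be compared when checking how many rows were affected.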
Chen, A.H. et al. [6] proposed a heart disease prediction system (HDPS) for identifying heart disease in patients more efficiently and accurately. A predictive model is used to diagnose the disease in combination with data and knowledge. Statistics and machine learning are the two main approaches used in the framework. The algorithm has three steps: data selection, an artificial neural network model for classification and a user-friendly HDPS. Learning Vector Quantization (LVQ) is then applied to the dataset for training. An ROC curve is used to check the accuracy of the results. Finally, the accuracy of the predicted results is 80%. There is no description of the proposed framework. The paper applies the LVQ technique but does not mention any implementation details. Also, there is no comparison of this research with similar past research to clearly identify its superiority over other techniques.

Jabbar et al. [7] used an associative classification algorithm for heart disease prediction. The algorithm uses a genetic approach for prediction, which results in higher accuracy and interestingness. First, associative classification is used to classify the dataset with labeled classes, and rules are collected from the training dataset. These rules are then organized to construct a classifier. The genetic algorithm solves the optimization problem efficiently. Selection, crossover and mutation are the three genetic operators used for generating new strings. The proposed algorithm is presented as a list of steps which require further details.

Pattekari and Parveen [8] present a heart disease prediction system in the form of a web based application. The proposed algorithm is based on Naïve Bayes to predict hidden patterns in a given dataset. The results of complex queries are displayed in tabular as well as PDF form.
The advantage of using the Naïve Bayes algorithm is that it requires only a small amount of data for training and uses independent attributes for predicting the occurrence of a particular disease.

Das, R., Turkoglu, I., Sengur, A. [9] proposed an ensemble based method (neural networks) using SAS software for heart disease diagnosis. The ensemble method combines individual neural networks trained on the same task, which increases generalization. The model is based on posterior probabilities (predicted values) from multiple predecessors. The neural network components are independent multilayer feed-forward networks used to classify the feature space. The paper does not present results in a detailed manner. Also, there is no comparison of this research with similar past research to clearly identify its superiority over other techniques.
A brief overview of some related work is presented in Table 2.
Sotelsek-Margalef, A., Villena-Román, J. [10] proposed an expert system named MIDAS for medical diagnosis from patients' records. The system is based on information extraction and machine learning approaches which use the record histories of previously diagnosed patients for future diagnosis. Natural Language Processing (NLP) techniques have been used for text classification. MIDAS is considered the first expert system in history used for infected blood disease diagnosis. The data is obtained from the Cincinnati Children's Hospital and contains well represented
classes. The Weka tool is used in combination with the C4.5 decision tree algorithm and the k-Nearest-Neighbor classifier. The results show high accuracy for medical disease diagnosis. The evaluation of the proposed framework is based on ICD-code prediction accuracy.

Aronsky, D., Haug, P.J. [11] presented a real-time diagnostic system based on a Bayesian network which detects patients eligible for a pneumonia guideline. The proposed system has two main components: a diagnostic component and a management component. The diagnostic component is based on a Bayesian network and identifies the probability of pneumonia in patients, whereas the management component evaluates the severity index. The system proved to be very helpful for physicians with the behavioral bottleneck, that is, the necessity of a patient's hospitalization. The paper states that the proposed system computes and updates results every five minutes; however, there is no detail as to how these results were computed.

TABLE II. DATA MINING APPROACHES IN HEALTHCARE

Fujita, H., Katafuchi, T., Uehara, T., Nishimura, T. [12] proposed a system that aids radiologists in the detection and classification of CAD. A multilayer feed-forward neural network approach with the back propagation algorithm is used to analyze the Bull's eye images. The neural network is trained with image data and the desired output data. The results show that automated diagnosis is better than a radiology resident but worse than an experienced radiologist. There are no details of the learning, recognition and diagnosis steps presented in the proposed architecture.

Lisboa, P.J., Taktak, A.F.G. [13] presented a systematic review of the significance of artificial neural networks (ANN) in the field of cancer. The review also draws conclusions about study design in order to improve the studies to be followed up in future research. It is also clear from the survey that rigorous methodologies need to be applied more extensively. Decision support needs to move away from simple database queries towards more complex systems involving neural networks. Clinical systems will shift from advising to informing clinicians by presenting the inferred risk of a particular disease, for example as a colored bar showing the share of a given population expected to have that disease.

Covit, A.B., Familant, M.E., Covit, S. [14] proposed a software module or computer engine that automatically assigns medical codes such as ICD codes (ICD 9 or ICD 10 and other codes as well) to unformatted medical reports, discharge sheets, medical notes etc. The input to the system is a document; the system reads it and assesses it against the diagnoses associated with the codes. It identifies the diagnosis as well as the language associated with the medical diagnosis. The system then decides whether or not to assign the ICD code to that particular document on the basis of syntactic and semantic usage. This method is used to apply the codes to different mediums such as database entries, attachments to documents, emails sent to different owners, and electronic and paper forms. The paper is more focused on the flow of the application than on the proposed research.

In the light of the success of different data mining techniques, and specifically ensemble techniques, it is very beneficial to consider ensemble techniques for disease diagnosis and prediction. Therefore, we have proposed an ensemble framework based on a majority voting scheme that combines individual classifiers and achieves higher accuracy for heart disease diagnosis.

III. PROPOSED FRAMEWORK

The proposed framework is based on a novel combination of three heterogeneous classifiers: NB, DT-Gini and SVM. These classifiers are combined using a majority voting scheme [23].
A. Explanation of base classifiers
The key function of a classifier is to perform a mapping from a feature space X (discrete or continuous) to a set of labels Y. Classifiers have wide applications in areas ranging from medicine, finance, mobile phones, computer vision and voice recognition to biomedicine and data mining.

1) Decision tree using Gini index
A decision tree is like a flowchart in which every non-leaf node is a test on an attribute, every branch of a node represents an outcome of the test and every leaf node is a class label. The root node
represents all the data at the start [24]. A decision tree classifier does not require any domain knowledge and uses a tree-like graph. It calculates the conditional probabilities for analysis, chooses the best alternative while traversing from root to leaf and indicates a unique class separation [25]. It can be used in the medical field for disease classification and prediction. The attribute with the lowest Gini index is used for rule generation [26].

$$ Gini(t) = 1 - \sum_{i=0}^{c-1} [p(i,t)]^2 $$

where p(i,t) represents the probability of class i in node t and c is the total number of classes. The splitting criterion is based on the crisp set of rules generated from the decision tree using the Gini index.

2) Naïve Bayes Classifier
The Naïve Bayes classifier is based on the rule that the presence or absence of a disease depends on each feature by itself. It assumes that features are independent of each other [27]. A supervised learning algorithm can be used to train the probability model of the Naïve Bayes classifier [28]. The classification decisions of Naïve Bayes are quite good despite the fact that its probability estimates are of low quality [29]. The following Bayesian rule is used to calculate the class probability for a given dataset:
$$ P(C_k \mid X) = P(C_k) \times \frac{P(X \mid C_k)}{P(X)} \tag{1} $$

where X is an instance that needs to be classified and C_k is the respective class. P(C_k|X) represents the probability that vector X belongs to class C_k. The direct estimation of P(C_k|X) is not possible due to sparseness of data. Therefore, P(X|C_k) can be decomposed and calculated as follows:

$$ P(X \mid C_k) = \prod_{j=1}^{n} P(X_j \mid C_k) \tag{2} $$

where X_j denotes the jth element of vector X and n is the number of elements in X. Combining equations (1) and (2), we obtain:

$$ P(C_k \mid X) = P(C_k) \times \frac{\prod_{j=1}^{n} P(X_j \mid C_k)}{P(X)} \tag{3} $$
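As a rough illustration of equations (1) to (3), the sketch below estimates the class prior and the per-attribute conditional probabilities by simple counting and then normalizes over the classes to obtain P(C_k|X). It is not the implementation used in this paper: the function names, the categorical toy attributes and the absence of smoothing are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels):
    """Estimate P(C_k) and the counts behind P(X_j | C_k) by counting."""
    class_counts = Counter(labels)
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    # cond_counts[c][j][v]: number of class-c samples whose j-th attribute is v
    cond_counts = defaultdict(lambda: defaultdict(Counter))
    for x, c in zip(samples, labels):
        for j, value in enumerate(x):
            cond_counts[c][j][value] += 1
    return priors, cond_counts, class_counts


def posterior(x, priors, cond_counts, class_counts):
    """Return P(C_k | X) for every class, following equation (3)."""
    joint = {}
    for c, prior in priors.items():
        likelihood = 1.0
        for j, value in enumerate(x):
            # P(X_j | C_k) as a relative frequency (no smoothing, an assumption)
            likelihood *= cond_counts[c][j][value] / class_counts[c]
        joint[c] = prior * likelihood  # numerator of equation (3)
    evidence = sum(joint.values())  # P(X), obtained by summing over the classes
    return {c: p / evidence for c, p in joint.items()}


# Hypothetical toy data: (chest pain, high cholesterol) and the class label.
samples = [("yes", "yes"), ("yes", "no"), ("no", "yes"),
           ("no", "no"), ("no", "yes"), ("yes", "no")]
labels = ["sick", "sick", "sick", "healthy", "healthy", "healthy"]

model = train_naive_bayes(samples, labels)
post = posterior(("yes", "yes"), *model)
print(post)
```

For the toy record ("yes", "yes") the numerators are 0.5 × (2/3) × (2/3) for the sick class and 0.5 × (1/3) × (1/3) for the healthy class, so the normalized posterior favors the sick class. An attribute value never seen with any class would make P(X) zero here, which is why practical implementations usually add smoothing.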
Only a small amount of training data is required to estimate the parameters, such as the central tendency or spread of the input parameters. Due to attribute independence, only the per-class parameters of each attribute are required instead of the entire covariance matrix.

3) Support Vector Machine
The basic concept of SVM is based on statistical learning theory. Initially, SVMs were designed for binary classification, but they can also be extended to multiclass problems [30]. An SVM classifier creates a hyperplane or a set of hyperplanes in a high dimensional space that can be used for classification and regression analysis. Kernel functions are used for nonlinear mapping of training samples into a high dimensional space.
Different kernel functions, such as polynomial, Gaussian and sigmoid, are used for mapping and for maximizing the separation between data points. The following classification rule is used for the SVM classifier [30]:

$$ \operatorname{sgn}(f(x, w, b)) $$

$$ f(x, w, b) = \langle w \cdot x \rangle + b $$

where (w, b) are the parameters of the separating hyperplane and x is an example to be classified. The ultimate purpose is to minimize ||w|| subject to the set of constraints

$$ y_i (\langle w \cdot x_i \rangle + b) \geq 1 $$

We have used heart disease dataset attributes to classify the data into two classes: healthy individuals and heart disease patients.

IV. PROPOSED TECHNIQUE
The proposed technique is based on a majority vote based classifier ensemble. It involves data acquisition, pre-processing and then classifier training. The trained classifiers are then used to classify heart disease data, and a majority voting scheme is used to reach a consensus between the different alternatives and obtain high-accuracy results. The proposed framework is shown in Fig 2. Data acquisition involves obtaining heart disease data from the UCI repository and then performing data partition and variable selection. The data partition step divides the data into a training set and a test set. Pre-processing is then applied to the training data. It involves missing value replacement, outlier detection and removal, feature selection and class label identification.

A. Proposed Majority vote based Ensemble
The proposed ensemble approach is divided into two main steps: first, generate each individual classifier's decisions for the training set, and second, combine the decisions to obtain a new model based on a majority voting scheme. Selected parameters from the heart disease dataset are used to train the individual classifiers. Let N denote the number of individual classifiers, represented by C1, ..., CN, and let M represent the number of output classes. In this research there are three classifiers and two classes (healthy and sick), so N=3 and M=2. The ensemble classifier can be defined as follows: find the vote matrix V, a boolean array of size N×M, that represents the binary vote based ensemble. V(i,j)=1 if the ith classifier has voted for the jth class and 0 otherwise. As a result, if 2 out of 3 classifiers vote for the same class, then the final decision of the ensemble is that class. The main focus of the proposed ensemble framework is to combine several heterogeneous classifiers that differ in their properties and results. The combined decisions then form a classifier model with more accurate decisions.
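The vote matrix V and the majority decision described above can be sketched in a few lines. This is only an illustration of the voting rule: the function names are hypothetical, and the three predictions stand in for the outputs of trained NB, DT-Gini and SVM classifiers, which are not implemented here.

```python
def vote_matrix(predictions, n_classes):
    """Build V of size N x M, with V[i][j] = 1 if classifier i voted for class j."""
    return [[1 if pred == j else 0 for j in range(n_classes)]
            for pred in predictions]


def majority_vote(predictions, n_classes=2):
    """Return the class that received the most votes.

    With N=3 classifiers and M=2 classes a tie cannot occur; in other
    settings this sketch breaks ties in favor of the lowest class index.
    """
    v = vote_matrix(predictions, n_classes)
    votes_per_class = [sum(row[j] for row in v) for j in range(n_classes)]
    return max(range(n_classes), key=lambda j: votes_per_class[j])


# Hypothetical outputs of NB, DT-Gini and SVM for one patient
# (0 = healthy, 1 = sick).
predictions = [1, 0, 1]
V = vote_matrix(predictions, n_classes=2)
decision = majority_vote(predictions)
print(V, decision)
```

Here two of the three classifiers vote for class 1, so the ensemble labels the patient as sick even though one classifier disagrees.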
The following steps are used for disease prediction with the proposed technique.
STEP 1: The first step in the proposed technique is data cleaning. Outliers are detected and removed, missing attribute values are replaced with the attribute mean, and feature selection is performed to obtain a reduced set of attributes.
STEP 2: The next step is to divide the data into a training set and a test set.
STEP 3: The NB classifier is then trained on the training set to construct the Naïve Bayes classifier (NBC). The heart disease data is given to the classifier for training. We have used 14 attributes for training.
STEP 4: DT training is then performed to construct the Decision Tree classifier (DTC) model based on the Gini index. A crisp set of rules is generated as a result of tree construction. These rules are used to classify heart disease data into healthy and sick classes.
STEP 5: SVM training is then performed to construct the SVM classifier. The trained classifier can classify heart disease data into two classes, healthy and sick.
STEP 6: The testing data is then used to check the performance of the trained classifiers. The ensemble architecture using the NBC, DTC and SVM classifiers is used to classify the test data. The data is classified into two classes: patients having or not having heart disease (1=Yes, 0=No).
STEP 7: The testing data is fed to each classifier for classification. Each of the three classifiers of the ensemble architecture outputs class 0 (healthy patient) or class 1 (patient with heart disease). The final decision is made by majority voting over the classifiers' decisions with respect to their output class.
STEP 8: The testing tuple is assigned to the class with the highest number of votes (two classifiers having the same result).
STEP 9: Compute the accuracy, sensitivity and specificity of the proposed ensemble framework.

Fig 2. Proposed framework for heart disease prediction (flowchart components: Heart database, Data partition, Variable selection, Naïve Bayes classifier, Decision tree classifier, Support vector machine classifier, Majority voting, Result)

V. EXPERIMENTS AND RESULTS

We have used three classifiers and trained them using the heart disease dataset to classify individuals as healthy (0) or sick (1). The accuracy, sensitivity and specificity of the classifiers are measured to compare the performance of the proposed ensemble framework with that of the individual classifiers. Sensitivity indicates the proportion of sick patients that are correctly classified as sick, whereas specificity indicates the proportion of healthy persons that are correctly classified as healthy. Mathematically:

$$ \text{Sensitivity} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} $$

$$ \text{Specificity} = \frac{\text{True Negatives}}{\text{True Negatives} + \text{False Positives}} $$

Accuracy measures the proportion of correct predictions made by the proposed framework against the actual class labels of the test data. Mathematically:

$$ \text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{True Positives} + \text{False Positives} + \text{True Negatives} + \text{False Negatives}} $$

The decision tree generates crisp rules that are used to classify data into healthy or sick individuals, whereas Naïve Bayes and SVM are first trained and the trained classifiers then classify the test data into two classes. The Naïve Bayes and Decision tree classifiers were fed with a dataset of 13 attributes, whereas the SVM classifier was fed with only 2 attribute values (those with the highest Information Gain). Table 3 shows the confusion matrices, whereas the sensitivity, specificity and accuracy of the proposed ensemble classifier and the three individual classifiers are given in Table 4. The graphical comparison of the proposed ensemble framework with the individual classifiers is shown in Fig 3. It is clear from the comparison that the proposed ensemble framework has high accuracy, sensitivity and specificity values for the heart disease dataset.

TABLE III. CONFUSION MATRIX OF PROPOSED ENSEMBLE TECHNIQUE WITH OTHER TECHNIQUES

Classifier            Class      Healthy   Sick
Naïve Bayes           Healthy    119       9
                      Sick       55        120
Decision Tree         Healthy    110       18
                      Sick       65        110
SVM                   Healthy    101       27
                      Sick       46        129
Proposed Framework    Healthy    120       8
                      Sick       46        129
TABLE IV. COMPARISON OF PROPOSED ENSEMBLE TECHNIQUE

Classifier       Accuracy   Sensitivity   Specificity
Naïve Bayes      78.79%     68.42%        92.86%
Decision Tree    72.73%     63.16%        85.71%
SVM              75.76%     73.68%        78.57%
Proposed         81.82%     73.68%        92.86%
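The three evaluation measures can be computed directly from confusion matrix counts, as in the sketch below. The counts are read from Table III under two assumptions that the paper does not state explicitly: rows are actual classes and columns are predicted classes, and "sick" is the positive class. The function name `evaluate` is hypothetical.

```python
def evaluate(tp, fn, tn, fp):
    """Compute sensitivity, specificity and accuracy from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Counts taken from Table III, assuming rows = actual class,
# columns = predicted class, and sick = positive.
table_iii = {
    "Naive Bayes": dict(tp=120, fn=55, tn=119, fp=9),
    "Decision Tree": dict(tp=110, fn=65, tn=110, fp=18),
    "SVM": dict(tp=129, fn=46, tn=101, fp=27),
    "Proposed": dict(tp=129, fn=46, tn=120, fp=8),
}

results = {name: evaluate(**counts) for name, counts in table_iii.items()}
for name, m in results.items():
    print(f"{name}: accuracy={m['accuracy']:.2%}, "
          f"sensitivity={m['sensitivity']:.2%}, "
          f"specificity={m['specificity']:.2%}")
```

Under this reading, every rate computed from Table III falls within about one percentage point of the corresponding value in Table IV, which suggests, but does not confirm, that the assumed orientation is the intended one.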
Fig 3. Graphical comparison of proposed ensemble technique (y-axis: Results Percentage (%), 0% to 100%; x-axis: Evaluation Measures, namely Accuracy, Sensitivity and Specificity; legend, Classifiers: Naïve Bayes, SVM, Decision Tree, Proposed Ensemble)

VI. CONCLUSION AND FUTURE WORK
The objective of the proposed research is to make a more accurate prediction of heart disease for a patient. To this end, three classifiers, namely Naïve Bayes, DT-Gini and SVM, are used to predict the heart disease of a patient given a dataset of 13 attributes. Inconsistencies and missing values were resolved before model construction. Moreover, the prediction of heart disease is also computed for the proposed ensemble technique using majority voting. The results show that the accuracy of the proposed ensemble technique (81.82%) is higher than that of each individual technique (72.73% to 78.79%). The technique can be extended to identify the intensity of heart disease. Fuzzy learning models can be applied to predict the intensity of cardiac disease. Moreover, the same framework can be used for multi-disease prediction, such as diabetes, breast cancer and liver disease diagnosis.

REFERENCES
[1] Rajkumar, A. and Reena, G.S.: Diagnosis of Heart Disease Using Datamining Algorithm. In: Global Journal of Computer Science and Technology, Vol. 10 (2010).
[2] Subbalakshmi, G.: Decision Support in Heart Disease Prediction System using Naive Bayes. Indian Journal of Computer Science and Engineering.
[3] Thuraisingham, B.M.: A Primer for Understanding and Applying Data Mining. IT Professional, pp. 28-31 (2000).
[4] Palaniappan, S., Awang, R.: Intelligent Heart Disease Prediction System Using Data Mining Techniques. 978-1-4244-1968-5/08/ ©IEEE (2008).
[5] http://archive.ics.uci.edu/ml/datasets/Heart+Disease (last accessed: 11th June 2014).
[6] Chen, A.H., Huang, S.Y., Hong, P.S., Cheng, C.H., Lin, E.J.: HDPS: Heart Disease Prediction System. Computing in Cardiology (2011).
[7] Jabbar, M.A., Chandra, P., Deekshatulu, B.L.: Heart Disease Prediction System using Associative Classification and Genetic Algorithm. International Conference on Emerging Trends in Electrical, Electronics and Communication Technologies-ICECIT (2012).
[8] Pattekari, S.A., Parveen, A.: Prediction System for Heart Disease Using Naive Bayes. International Journal of Advanced Computer and Mathematical Sciences, ISSN 2230-9624, Vol. 3, Issue 3, pp. 290-294 (2012).
[9] Das, R., Turkoglu, I., Sengur, A.: Effective diagnosis of heart disease through neural networks ensembles. Expert Systems with Applications 36 (2009) 7675–7680.
[10] Sotelsek-Margalef, A., Villena-Román, J.: MIDAS: An Information-Extraction Approach to Medical Text Classification. Procesamiento del Lenguaje Natural, pp. 97-104 (2008).
[11] Aronsky, D., Haug, P.J.: Automatic Identification of Patients Eligible for a Pneumonia Guideline. Dept. of Medical Informatics, LDS Hospital, University of Utah, Salt Lake City, Utah, 1067-5027, AMIA (2000).
[12] Fujita, H., Katafuchi, T., Uehara, T., Nishimura, T.: Application of Artificial Neural Network to Computer-Aided Diagnosis of Coronary Artery Disease in Myocardial SPECT Bull's-eye Images. The Journal of Nuclear Medicine, Vol. 33, No. 2, February 1992.
[13] Lisboa, P.J., Taktak, A.F.G.: The use of artificial neural networks in decision support in cancer: A systematic review. Science Direct, Neural Networks 19 (2006) 408–415.
[14] Covit, A.B., Familant, M.E., Covit, S.: System and method for automatic assignment of medical codes to unformatted data. United States patent application publication, 11/106,817, Apr. 15, 2005.
[15] H. Yan, “Development of a decision support system for heart disease diagnosis using multilayer perceptron”, Proceedings of the 2003 International Symposium, vol. 5, (2003), pp. V-709-V-712.
[16] Andreeva, “Data Modelling and Specific Rule Generation via Data Mining Techniques”, International Conference on Computer Systems and Technologies - CompSysTech, (2006).
[17] A. Hara and T. Ichimura, “Data Mining by Soft Computing Methods for the Coronary Heart Disease Database”, Fourth International Workshop on Computational Intelligence & Application, IEEE SMC Hiroshima Chapter, Hiroshima University, Japan, (2008) December 10-11.
[18] V. A. Sitar-Taut, “Using machine learning algorithms in cardiovascular disease risk evaluation”, Journal of Applied Computer Science & Mathematics, (2009).
[19] C. L. Chang and C. H. Chen, “Applying decision tree and neural network to increase quality of dermatologic diagnosis”, Expert Systems with Applications, Elsevier, vol. 36, (2009), pp. 4035-4041.
[20] K. Srinivas, B. K. Rani and A. Govrdhan, “Applications of Data Mining Techniques in Healthcare and Prediction of Heart Attacks”, International Journal on Computer Science and Engineering (IJCSE), vol. 02, no. 02, (2010), pp. 250-255.
[21] Y. Kangwanariyakul, C. Nantasenamat, T. Tantimongcolwat and T. Naenna, “Data Mining of Magneto cardiograms For Prediction of Ischemic Heart Disease”, EXCLI Journal, (2010).
[22] M. J. Abdi and D. Giveki, “Automatic detection of erythemato-squamous diseases using PSO–SVM based on association rules”, Engineering Applications of Artificial Intelligence, vol. 26, (2013), pp. 603-608.
[23] Shouman, M., Turner, T., Stocker, R.: Applying k-Nearest Neighbour in Diagnosing Heart Disease Patients. In: International Journal of Information and Education Technology, Vol. 2, No. 3 (2012).
[24] Goharian & Grossman, Data Mining Classification, Illinois Institute of Technology, http://ir.iit.edu/~nazli/cs422/CS422-Slides/DMClassification.pdf, (2003).
[25] Apte & S.M. Weiss, Data Mining with Decision Trees and Decision Rules, T.J. Watson Research Center, http://www.research.ibm.com/dar/papers/pdf/fgcsaptewe issue_with_cover.pdf, (1997).
[26] Shouman, M., Turner, T., Stocker, R.: Using Decision Tree for Diagnosing Heart Disease Patients. In: Proceedings of the 9th Australasian Data Mining Conference, Ballarat, Australia (2011).
[27] Sotelsek-Margalef, A., Villena-Román, J.: MIDAS: An Information-Extraction Approach to Medical Text Classification. In: Procesamiento del Lenguaje Natural, pp. 97-104 (2008).
[28] Zhang, H.: The optimality of Naïve Bayes. American Association of Artificial Intelligence (2004).
[29] Manning, D., Raghavan, P., Schutze, H.: Introduction to Information Retrieval. Cambridge University (2008).
[30] Wang, S., Mathew, A., Chen, Y., Xi, L., Ma, L., Lee, J.: Empirical Analysis of Support Vector Machine Ensemble Classifiers. In: Expert Systems with Applications (2009).