A Rule Based Classification Model to Predict Colon Cancer Survival
Reza Abbasi1, Mitra Montazeri2,3, Mohammad Zare4, Majid Amiri gharghani5
1.
Master of Science in Health Information Technology, Researcher at the Medical Informatics Research Center, Department of Health Information Technology, School of Management and Information, Scientific Research Center, Kerman University of Medical Sciences, Kerman, Iran
2.
Medical Informatics Research Center, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran
3.
Computer Engineering Department, Shahid Bahonar University, Kerman, Iran
4.
Master of Science in Epidemiology, staff of Namazi Hospital, Shiraz, Iran
5.
Master of Science in Environmental Health, Department of Environmental Health, Faculty of Health, Kerman University of Medical Sciences, Kerman, Iran.
Abstract
Introduction: Colon cancer is the second most common cancer in the world and the fourth most common cancer in both sexes in Iran, accounting for 8.12% of all cancers in the country. Predicting cancer outcome, and having basic clinical data on the disease, is very important, and data mining techniques can be used to predict cancer outcome. In Iran, colon cancer has not been covered by data mining studies as extensively as lung or breast cancer. Identifying the factors that influence survival, and modifying them, may increase the survival of colon cancer patients. Given the high rate of colon cancer and the benefits of data mining for survival prediction, this study examined the factors influencing the survival of these patients.
Methods: We used a dataset with four attributes containing the records of 570 patients, of whom 327 (57.4%) were male and 243 (42.6%) were female. Trees Random Forest (TRF), AdaBoost (AD), RBF Network (RBFN), and Multilayer Perceptron (MLP) machine learning techniques were applied in the proposed model with 10-fold cross-validation to predict colon cancer survival. The performance of the techniques was evaluated by accuracy, precision, sensitivity, specificity, and the area under the ROC curve (AUC).
Results: Of the 570 patients, 338 were alive and 232 had died. Among these techniques, TRF appeared to give better results than the others (AD, RBFN and MLP), with an accuracy of 0.76, a sensitivity of 0.808, a specificity of 0.70, and an AUC of 0.83.
Conclusions: In this study, the Trees Random Forest (TRF) model, a rule-based classification model, appeared to be the best model, with the highest accuracy. This model is therefore recommended as a useful tool for colon cancer survival prediction and for medical decision making.
Keywords: Colon Cancer, Survival Prediction, Data Mining
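The abstract evaluates the classifiers with 10-fold cross-validation and reports accuracy, sensitivity and specificity. The Python sketch below illustrates how these quantities are conventionally obtained. It is not the authors' code, and `Confusion`, `metrics`, `implied_accuracy` and `stratified_folds` are hypothetical helpers. The abstract does not say whether the folds were stratified, so stratification is an assumption here. The sketch also checks that the reported TRF figures are mutually consistent. If "alive" (338 patients) is assumed to be the positive class, a sensitivity of 0.808 and a specificity of 0.70 imply an accuracy of about 0.764, which matches the reported 0.76. Treating "dead" as positive would instead imply about 0.744. AUC is not computed because it requires per-patient scores, which the abstract does not provide.

```python
import random
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for a binary classifier (hypothetical helper)."""
    tp: int  # positive cases predicted positive
    fn: int  # positive cases predicted negative
    tn: int  # negative cases predicted negative
    fp: int  # negative cases predicted positive


def metrics(c: Confusion) -> dict[str, float]:
    """Threshold-based metrics named in the abstract (AUC needs scores, so it is omitted)."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": c.tp / (c.tp + c.fp),
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate (recall)
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
    }


def implied_accuracy(sens: float, spec: float, n_pos: int, n_neg: int) -> float:
    """Accuracy equals the class-size-weighted mean of sensitivity and specificity."""
    return (sens * n_pos + spec * n_neg) / (n_pos + n_neg)


def stratified_folds(labels: list[str], k: int = 10, seed: int = 0) -> list[list[int]]:
    """Assign record indices to k folds so each fold keeps roughly the class ratio.

    Stratification is an assumption; the abstract only says "10-fold".
    """
    rng = random.Random(seed)
    by_class: dict[str, list[int]] = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds: list[list[int]] = [[] for _ in range(k)]
    offset = 0
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal this class round-robin, continuing where the previous class
        # stopped, so that total fold sizes also stay balanced.
        for j, i in enumerate(idxs):
            folds[(j + offset) % k].append(i)
        offset += len(idxs)
    return folds


if __name__ == "__main__":
    labels = ["alive"] * 338 + ["dead"] * 232  # class sizes reported in the abstract
    folds = stratified_folds(labels)
    print([len(f) for f in folds])  # → [57, 57, 57, 57, 57, 57, 57, 57, 57, 57]
    # Reported TRF figures, with "alive" assumed to be the positive class.
    print(round(implied_accuracy(0.808, 0.70, 338, 232), 3))  # → 0.764
```

In each of the 10 rounds, one fold serves as the test set and the other nine are used for training. The per-round confusion counts are then pooled or averaged before the metrics are computed. Dealing shuffled indices round-robin within each class is only one simple way to stratify. scikit-learn's `StratifiedKFold` provides an equivalent off-the-shelf splitter.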