International Journal of E-Health and Medical Communications Volume 8 • Issue 4 • October-December 2017

Application of Machine Learning Algorithms to a Well Defined Clinical Problem: Liver Disease

Sakshi Takkar, Lovely Professional University, Phagwara, India
Aman Singh, Lovely Professional University, Phagwara, India
Babita Pandey, Lovely Professional University, Phagwara, India

ABSTRACT

Liver diseases represent a major health burden worldwide. Machine learning (ML) algorithms have been used extensively to diagnose liver disease. This study aims to apply various individual and integrated ML algorithms to distinct liver disease datasets and evaluate their diagnostic performance, to integrate a dimensionality reduction method with the ML algorithms and analyze the variation in results, to find the best classification model, and to analyze the merits and demerits of these algorithms. KNN and PCA-KNN emerged as the top individual and integrated models, respectively. The study also concluded that no single algorithm gives the best results for all types of datasets and that integrated models do not always perform better than individual ones. No algorithm is perfect: the performance of an algorithm depends on the type and structure of the dataset, its number of observations, its dimensions, and the decision boundary.

Keywords: Classification, Discriminant Analysis, Feature Extraction, Integrated Models, KNN, Liver Diagnosis, PCA, SVM

DOI: 10.4018/IJEHMC.2017100103
Copyright © 2017, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

INTRODUCTION

The liver is the largest internal organ of the body. It plays a significant role in the transfer of blood throughout the body and regulates the levels of most chemicals in the blood. It helps metabolize alcohol and drugs and destroys toxic substances. The liver can be infected by parasites and viruses, which cause inflammation and diminish its function (Pandey & Singh, 2014). It can maintain its customary function even when part of it is damaged. However, early diagnosis of liver disease is important because it can increase the patient's survival rate. Diagnosis requires expert physicians and various examination tests, yet even these cannot guarantee a correct result. Computer-aided diagnosis is therefore needed to support correct prediction of liver disease, and it also helps in dealing with large and cumbersome data.

Research interest is growing in ML and knowledge discovery as a way to explore knowledge in large volumes of data. Data stored in databases contains valuable hidden knowledge that helps to enhance decision making. Supervised classification, in which a set of training examples is known in advance, is one of the main methods for extracting knowledge from databases (Dankerl et al., 2013; Kumar, Moni, & Rajeesh, 2013). Classification is a two-phase process. In the training phase, a classifier algorithm uses the training dataset to train the classifier. In the testing phase, the classifier's performance is analyzed on samples of the test set. Prediction accuracy is a criterion for evaluating the performance of a classifier; classification accuracy is the percentage of instances that are correctly classified. Classification algorithms include SVM, discriminant analysis, and nearest neighbor algorithms, among others. These algorithms are applied to medical datasets both small and large, and learning from scanty datasets is an arduous task.

Some datasets contain too many attributes, and selecting an adequate subset of attributes or features is a significant question. Two dimension reduction techniques are available for this. The first, feature selection, reduces the dimensions by selecting relevant features from the existing ones. The second, feature extraction, designs a new, reduced set of features based on some transformation function (Guyon & Elisseeff, 2006; Jenke, Peer, & Buss, 2014). These techniques may be supervised or unsupervised, depending on whether they use the output information.

Principal Component Analysis (PCA) is one of the most effective and extensively used feature extraction methods. PCA is unsupervised, as it does not use the output information. It reduces the number of features for effective data representation by discarding the linear combinations that have small variances and retaining only those that have large variances. The method orthogonally transforms the original n coordinates into a new set of n coordinates called principal components (Bro & Smilde, 2014). As a result of the transformation, the first principal component has the greatest possible variance, and each subsequent component is orthogonal to the preceding components. In most previous research, different algorithms have been applied to liver datasets to find the best algorithm for accurate diagnosis of the disease.
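The PCA transformation described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the helper names (`center`, `covariance`, `dominant_eigenpair`, `pca`) and the six sample rows are hypothetical, and power iteration with deflation is used here only as one simple way to obtain the leading eigenvectors of the covariance matrix.

```python
import math

def center(rows):
    """Subtract each column mean so every feature has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dominant_eigenpair(m, iters=500):
    """Power iteration: (largest eigenvalue, unit eigenvector) of symmetric m."""
    v = [1.0 / (i + 1) for i in range(len(m))]  # arbitrary non-zero start
    for _ in range(iters):
        w = [dot(row, v) for row in m]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # matrix is (numerically) zero, nothing left to find
            break
        v = [x / norm for x in w]
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    return dot(v, [dot(row, v) for row in m]), v

def pca(rows, k):
    """Return the first k principal components, their variances, and the scores."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components, variances = [], []
    for _ in range(k):
        lam, v = dominant_eigenpair(cov)
        components.append(v)
        variances.append(lam)
        # Deflation: remove the direction just found, so the next pass
        # returns the best remaining direction, orthogonal to this one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    scores = [[dot(r, c) for c in components] for r in centered]
    return components, variances, scores

# Made-up values for illustration only; not taken from any liver dataset.
rows = [
    [2.5, 2.4, 0.5],
    [0.5, 0.7, 1.9],
    [2.2, 2.9, 0.8],
    [1.9, 2.2, 1.1],
    [3.1, 3.0, 0.3],
    [2.3, 2.7, 0.9],
]
components, variances, scores = pca(rows, k=2)
print("variance along each component:", variances)
print("dot product of components 1 and 2:", dot(components[0], components[1]))
```

Under these assumptions the first component carries the largest variance and the second is orthogonal to it, which is the property the paragraph above relies on. Keeping only the first few columns of `scores` gives the reduced representation that a classifier such as KNN could then be trained on.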
The aim of this study is to apply different classification algorithms to various liver datasets to determine the applicability of the algorithms, to integrate the PCA approach in order to analyze the variation in results, and to find the best proposed method. The remainder of this paper is structured as follows. Section 2 reviews previous studies on liver disease diagnosis using classification algorithms. Section 3 describes the detailed procedure of all algorithms. Section 4 presents the results and discussion for three liver datasets, and Section 5 concludes the paper.

LITERATURE REVIEW

Many researchers have implemented ML approaches to predict the presence of liver disease. Branch et al. (Branch, Azad, Branch, & Azad, 2015) applied different classification algorithms to the Indian Liver Patient Dataset (ILPD) and the BUPA liver disorder dataset from the UCI machine learning repository. Preprocessing and creation of the predictive model were done with the RAPIDMINER software. On the ILPD dataset, the support vector machine (SVM) model had the highest prediction accuracy; on the BUPA dataset, the SVM model and the regression model did. Gulia et al. (Gulia, Vohra, & Rani, 2014) performed classification on the same ILPD dataset, considering J-48, multilayer perceptron, SVM, random forest, and Bayesian network. First, the classification algorithms were applied to the full dataset; next, a relevant subset of features was selected and the algorithms were applied to it; finally, a comparative analysis of both datasets was carried out. Ramana et al. (Ramana, Surendra, Babu, & Venkateswarlu, 2011) considered C4.5 decision tree, naïve Bayes classification (NBC), back propagation, SVM, and KNN, comparing performance on two datasets, the Andhra Pradesh (AP) liver dataset and the BUPA liver disorder dataset, on the basis of accuracy, precision, sensitivity, and specificity using the open source Weka data mining software. Accuracy for AP dataset is high as
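The four measures named in the review (accuracy, precision, sensitivity, and specificity) all follow from the counts of a binary confusion matrix. The sketch below shows one way to compute them; the function names and the ten labels are hypothetical, with 1 standing for "disease present", and it does not reproduce any figure from the studies cited above.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision, sensitivity (recall), and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": safe_div(tp, tp + fp),
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
    }

# Made-up labels for illustration only.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
metrics = evaluate(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters for medical data, because a classifier can reach high accuracy on an imbalanced dataset while still missing many diseased patients.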
