Comparative Study of Classification Techniques Used in Skin Lesion Detection Systems

Uzma Jamil*, Shehzad Khalid+
*,+ Department of Computer Engineering, Bahria University, Islamabad, Pakistan
* Government College University, Faisalabad, Pakistan
Email: *[email protected], +[email protected]

Abstract--Classification is the major branch of data mining used to assign raw data to a particular group. It is a method for predicting the group membership of data objects. Medical imaging deals with the design of automated systems that help physicians in diagnosis. In this paper, we present a comprehensive study of classification techniques used in medical imaging. Several types of classification methods are used for this purpose, including Support Vector Machines, Bayesian networks, neural networks, the k-nearest neighbor classifier, and fuzzy logic techniques. This study provides a wide-ranging review of classification techniques used in medical diagnostic systems for diabetic retinopathy, foot ulcers, and other medical conditions.

I. INTRODUCTION

Machine learning is one of the leading branches of artificial intelligence and is concerned with analyzing and learning from available data. It predicts unknown values and their properties, performs feature extraction, and relates the results to known data. Data mining is the term used for extracting and discovering information from raw data. The two fields intersect at the point of data extraction and analysis, because data mining uses machine learning methods for information exploration and elicitation. Data mining digs deep into the data to build valuable knowledge. It relies on three major components of machine learning to perform its tasks: 1) classification, 2) clustering, and 3) association.

Our skin is the largest human organ. It serves as a shield that protects us from pressure and impact, heat and cold, the sun's UV radiation, and bacteria. Among the many reasons for the increase in skin cancer in the general population, one of the main ones is the change in behavior and attitudes toward sun exposure. These lesions (cancers) are primarily caused by UV-damaged melanocytes in the skin. The damaged cells grow out of control and can spread to other parts of the body. To identify lesions, experienced dermatologists rely first on standard pattern recognition approaches, second on the history of the disease, and finally on laboratory parameters. Physicians use clinical methods [1, 2] such as the ABCD rule, Menzies' method, and the 7-point checklist, as well as various classification methods, to diagnose lesions and classify data objects. There is therefore a strong need for a system that starts from a digital image and detects lesions through a series of processing steps.

One of the most recent techniques for non-invasive diagnosis of skin cancer is digital dermoscopy, which is based on the epiluminescence microscopy principle. It allows the evaluation of high-resolution dermoscopic images of skin lesions using a color video camera equipped with lenses that provide high magnification and better visualization of subsurface structures. Digital dermoscopy offers several advantages:

- mole mapping;
- good identification of different and subtle morphological structures (pigment network, dots, globules, blue-white veils, blotches, and streaks);
- long-term observation and documentation of melanocytic lesions;
- independence from the investigator, which reduces errors;
- screening that can differentiate difficult-to-diagnose lesions from benign ones.

Dermoscopy offers higher quality and diagnostic accuracy than the clinical view alone, and it has therefore become the primary tool for skin lesion diagnosis [3]. The idea of using a computer-assisted tool for skin lesion diagnosis was first proposed around 1985. The approach first detects the lesion area and then extracts features automatically. Classification and diagnosis steps follow, based on these features. Over time, systems for automatic acquisition, referencing, and digital storage of images (e.g., PhotoFinder and MoleMax) have been developed, but the performance of their embedded software and applications is still low. Numerous methods have now been developed for automatic lesion detection in dermatoscopy images.
Recent approaches apply fuzzy C-means clustering algorithms [4-5], gradient flow snake methods [6], thresholding followed by region growing [7], statistical region merging techniques [8], and contrast enhancement combined with k-means

Fig. 1. Topology of the expository literature on automated lesion detection system classifiers.

Classification is the task of categorizing a test sample using already available training data. Supervised (well-known and well-defined) algorithms are used to find the class of the unknown data value, and different classification techniques exist for this purpose. The following sections give a critical view of these techniques: K-Nearest Neighbor, the Naïve Bayesian algorithm, Support Vector Machine, the neural network classifier, and M-mediod modeling. Fig. 1 presents the topology of the steps of an automated lesion detection system, together with the classifiers that different researchers have used in the classification phase.

II. K-NEAREST NEIGHBOR

K-Nearest Neighbor is also known as a "lazy learning algorithm". The algorithm has the simplest

Fig. 2. Percentage results and comparison on the Heart-1 dataset, from "The Utility of Feature Weighting in Nearest-Neighbor Algorithms".

The KNN classifier gives good results only if proper features are used. As shown in Table I and Fig. 2, when the KNN algorithm is enhanced with additional properties such as feature weighting, it gives remarkably good results. KNN fails when a subset of the features is irrelevant.
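The following Python sketch illustrates two points: KNN is sensitive to irrelevant features, and feature weighting counters this. It is a minimal illustration under stated assumptions. The toy two-feature data, the hand-set weight vector, and the function name `weighted_knn_predict` are hypothetical. The weights are fixed by hand rather than learned as in the feature-weighting study referenced in Fig. 2.

```python
import math
from collections import Counter


def weighted_knn_predict(train_X, train_y, query, k=3, weights=None):
    """Majority vote among the k training samples nearest to `query`.

    Distance is a weighted Euclidean distance: each squared feature
    difference is multiplied by its weight, so weight 0 ignores a feature.
    """
    if weights is None:
        weights = [1.0] * len(query)
    dists = []
    for x, label in zip(train_X, train_y):
        d = math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, query)))
        dists.append((d, label))
    dists.sort(key=lambda pair: pair[0])  # nearest first
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: feature 0 separates the classes,
# feature 1 is large-scale noise that carries no class information.
train_X = [(0.1, 50), (0.2, 5), (0.0, 90), (0.9, 10), (1.0, 95), (0.8, 55)]
train_y = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]
query = (0.95, 52)

print(weighted_knn_predict(train_X, train_y, query))                      # benign
print(weighted_knn_predict(train_X, train_y, query, weights=[1.0, 0.0]))  # malignant
```

Without weighting, the noisy second feature dominates the distance and the query is misclassified. Setting its weight to zero restores the correct neighbors.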



This algorithm allows the retrieval and interpretation of the cases at hand. This is very helpful for dermatologists, because they can directly compare unknown lesions with other lesion classes.

III. BAYESIAN ALGORITHM

The naive Bayesian algorithm is a simple, feature-based probabilistic model. It relies on the probabilities of the instances and assumes that all features are independent given the class. The algorithm applies Bayes' theorem in three steps:

1) Compute the prior probability P(C_i) of each class from the training data.
2) Multiply it by the likelihood P(X|C_i).
3) Divide the product by the evidence P(X), which is the sum of these products over all classes.

The class with the maximum value is then assigned to the unknown sample. The algorithm is fast and accurate compared with KNN. It is not sensitive to noisy and irrelevant features, and it is applicable to real-valued as well as discrete data. It is robust to isolated points and handles missing values. For continuous features it uses Gaussian parameters, namely the mean and standard deviation. The algorithm has three known issues: the independence assumption, the zero-probability problem, and the resulting need for a Laplacian estimator [14, 15, 17]. The naive Bayes algorithm is suitable for any sort of feature, which helps when comparing unknown lesions with known lesion classes. It is also more conservative in estimating performance. It avoids over-fitting and handles sampling error across different lesion datasets.

IV. SUPPORT VECTOR MACHINE

Support vector machines (SVMs) are algorithms that evaluate data and identify the patterns hidden in it by applying certain rules. The main concept of SVM is that it takes the input and predicts, for each data point individually, which of two possible groups it belongs to. For this reason it is known as a non-probabilistic binary linear classifier. The algorithm constructs a model that assigns every new data element to its group. The data are represented as points in a space, and the points are separated so that the classes are divided by a clear gap that is as wide as possible. New data points are then mapped into that same space, and their category is predicted from the side of the gap on which they fall [1]. Using the kernel concept, SVMs also perform very well in non-linear classification. Grouping data into classes is common practice in medical imaging and many other fields, and assigning each data item to the group to which it belongs is the basic goal of this classifier. Many different hyperplanes can separate the data. The logical choice is the one that gives the maximum margin between the classes, that is, the hyperplane whose distance to the nearest data point on each side is largest [3, 14].

A. Linear SVM

The linear SVM algorithm is highly efficient on large data sets and can handle even several million samples. It is flexible enough to deal with multiclass classification problems and can easily manage thousands of features. In its basic form, an SVM is trained with samples from two classes. It remains efficient for a very large number of transactions.

B. Nonlinear classification

Beyond the linear classifier, nonlinear classifiers can be created by applying the kernel trick [13] to maximum-margin hyperplanes [14]. This algorithm is similar to the linear one, except that every data point is considered as an
individual kernel function. This is why the nonlinear classifier can easily handle an irregular feature space. The transformed feature space is high-dimensional because of the kernel conversion. Even so, the classifier remains non-linear in its original input space. These maximum-margin classifiers provide the flexibility to deal with high-dimensional feature spaces. SVMs have a number of advantages over more classical classifiers such as neural networks, Bayesian classifiers, and decision trees. The main idea of SVM is the optimization of a cost function, which is why SVMs are less prone to over-fitting. A further advantage of SVM is that it provides a unified framework in which different machine learning architectures can be generated [24, 25, 26]. When no probability of class membership is provided, the results of SVMs are biased. This is the major disadvantage of SVM.

V. M-MEDIOD CLASSIFIER

In this technique, a pattern is modelled by a set of cluster centres of evenly disjunctive mediods, so that a class containing n members is represented by m mediods. The strength of this technique is its ability to model complex patterns without imposing any restriction on the shape of the class. It supports arbitrarily shaped clusters and works well even in the presence of outliers. Once the m-mediods model has been learnt for all classes, classification and anomaly detection can be performed by checking how close a sample is to the different classes, using a hierarchical classifier [11]. This classifier works best for differently shaped data groups and gives excellent accuracy in the presence of noise. It also copes with high variation in density within a group of data elements. It identifies the normality range and works in multivariate settings. It can model clusters of any shape and works in any feature space.
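The sketch below shows a heavily simplified version of the m-mediods idea: each class is modelled by m medoids, and a sample is classified by its nearest medoid. It is an assumption-laden illustration, not the authors' algorithm. It uses plain alternating k-medoids per class instead of the hierarchical classifier of [11]. The names `k_medoids`, `fit_m_mediods`, and `classify`, the anomaly `threshold`, and the toy data are all hypothetical.

```python
import math


def k_medoids(points, m, iters=20):
    """Alternating k-medoids: assign points to the nearest medoid, then move
    each medoid to the member with the smallest total distance to its cluster."""
    medoids = list(points[:m])  # deterministic start: the first m points
    for _ in range(iters):
        clusters = [[] for _ in medoids]
        for p in points:
            nearest = min(range(len(medoids)), key=lambda i: math.dist(p, medoids[i]))
            clusters[nearest].append(p)
        new_medoids = [
            min(members, key=lambda c: sum(math.dist(c, q) for q in members))
            if members else old
            for members, old in zip(clusters, medoids)
        ]
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return medoids


def fit_m_mediods(data_by_class, m):
    """Model each class by m medoids (hypothetical simplification of m-mediods)."""
    return {label: k_medoids(pts, m) for label, pts in data_by_class.items()}


def classify(model, x, threshold=None):
    """Nearest-medoid label; 'anomaly' if even the nearest medoid is too far."""
    label, d = min(
        ((lab, min(math.dist(x, c) for c in meds)) for lab, meds in model.items()),
        key=lambda pair: pair[1],
    )
    if threshold is not None and d > threshold:
        return "anomaly"
    return label


data = {
    "A": [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0)],  # two separate lobes
    "B": [(5, 5), (5, 6), (6, 5)],
}
model = fit_m_mediods(data, m=2)
print(classify(model, (9, 1)), classify(model, (5, 4)))  # A B
print(classify(model, (20, 20), threshold=3.0))          # anomaly
```

Class A ends up with one medoid per lobe. A single class mean, by contrast, would lie in the empty space between the lobes, which illustrates why several medoids per class can follow arbitrarily shaped classes.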
This classifier is used in many medical fields, such as diabetic retinopathy, and gives very good results.

VI. NEURAL NETWORK CLASSIFIER

The classifiers generated by neural networks are complex mathematical functions that are rather incomprehensible and opaque to humans. They follow selective rules to classify data, but this opacity limits their use in many real-life applications where both accuracy and comprehensibility are required, such as medical diagnosis and credit risk evaluation [12]. Their powerful data fitting (function approximation) capability also makes them vulnerable to over-fitting.

Combining several neural networks can improve performance [11, 22, 23]. In many fields, ANNs give good results only if proper features are used. There have been many attempts to extract different types of features from microscopic images. Objects of interest in microscopic images vary widely in content, size, intensity distribution, position, organization, and shape. As a result, it is hard to extract a suitable set of features that covers all possible situations. Fig. 3 shows the percentage of existing lesion detection systems that use each classification method.
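The sketch below implements the Bayesian decision rule of Section III for categorical features: the prior is multiplied by the likelihood, and the product is divided by the evidence summed over all classes. It also uses a Laplacian (add-alpha) estimator to avoid zero probabilities. The features, data, and function names (`train_naive_bayes`, `posterior`) are hypothetical. Continuous features would instead use the Gaussian mean and standard deviation mentioned in Section III.

```python
from collections import Counter, defaultdict


def train_naive_bayes(X, y, alpha=1.0):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(y)
    priors = {c: n / len(y) for c, n in class_counts.items()}
    counts = {c: defaultdict(Counter) for c in class_counts}  # counts[c][j][v]
    values = defaultdict(set)                                  # distinct values of feature j
    for xi, c in zip(X, y):
        for j, v in enumerate(xi):
            counts[c][j][v] += 1
            values[j].add(v)
    return priors, counts, values, class_counts, alpha


def posterior(model, x):
    """P(C|X) = P(X|C) P(C) / P(X), assuming features independent given C."""
    priors, counts, values, class_counts, alpha = model
    joint = {}
    for c, prior in priors.items():
        p = prior
        for j, v in enumerate(x):
            # Laplacian (add-alpha) estimate of P(x_j = v | C = c); never zero
            p *= (counts[c][j][v] + alpha) / (class_counts[c] + alpha * len(values[j]))
        joint[c] = p
    evidence = sum(joint.values())  # P(X) = sum over classes of P(X|C) P(C)
    return {c: p / evidence for c, p in joint.items()}


# Hypothetical categorical lesion attributes: (symmetry, border)
X = [("asym", "irreg"), ("asym", "irreg"), ("asym", "reg"),
     ("sym", "reg"), ("sym", "reg"), ("sym", "irreg")]
y = ["malignant"] * 3 + ["benign"] * 3
post = posterior(train_naive_bayes(X, y), ("asym", "irreg"))
print(max(post, key=post.get), round(post["malignant"], 3))  # malignant 0.857
```

No benign training sample is "asym". Without the alpha term, the benign posterior would therefore be exactly zero; with it, the benign class keeps a small non-zero probability.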

[Graph representation. Categories: Artificial Neural Networks, Bayesian, K Nearest Neighbours, Support Vector Machines]

Fig. 3. Illustration of classification methods as used by existing diagnostic systems.
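The maximum-margin idea of Section IV can be sketched as a soft-margin linear SVM. The sketch trains it by stochastic subgradient descent on the regularized hinge loss, using a Pegasos-style update. This is an illustrative choice, not the solver used by any system cited here. The names `train_linear_svm` and `predict`, the toy data, and the settings `lam=0.01` and `epochs=200` are hypothetical. A kernel SVM would replace the dot products with kernel evaluations.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Soft-margin linear SVM via stochastic subgradient descent (Pegasos-style).

    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - y * <w, x>)), with labels
    y in {+1, -1}. A constant 1.0 feature is appended so the last weight acts
    as the bias (for simplicity it is regularised like the other weights).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)                  # decreasing step size
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regulariser: shrink w
            if margin < 1.0:                          # inside margin or misclassified
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Side of the separating hyperplane on which x falls."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in [(4, 4), (-4, -4)]])  # [1, -1]
```

Only samples that violate the margin move the weights. The final hyperplane is therefore determined by the points nearest the boundary, which are the support vectors.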

VII. FEATURES FOR COMPARISON

The features used to compare these classifiers are described in Table 1. They include accuracy, speed and scalability, robustness, and interpretability.

Accuracy: Accuracy = No. of correct classifications / Total no. of instances in the data set
Speed and scalability: time to construct the model; time to use the model; efficiency in disk-resident databases
Robustness: handling noise, missing values, irrelevant features, and streaming data
Interpretability: understanding and insight provided by the model
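The accuracy criterion can be computed with a small helper, and a second helper can take the time-to-construct and time-to-use measurements listed under speed and scalability. The function names `accuracy` and `timed` and the example labels are hypothetical.

```python
import time


def accuracy(y_true, y_pred):
    """No. of correct classifications / total no. of instances."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("expected two non-empty label lists of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def timed(fn, *args, **kwargs):
    """Return (result, elapsed seconds), e.g. to time model construction or use."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


y_true = ["benign", "malignant", "benign", "malignant"]
y_pred = ["benign", "malignant", "malignant", "malignant"]
acc, seconds = timed(accuracy, y_true, y_pred)
print(acc)  # 0.75
```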



Tables III, IV, V, and VI give a detailed comparison of the classifiers.

TABLE III
WORKING ACCURACY OF CLASSIFIERS

Algorithm | Working accuracy
K-Nearest Neighbor | Accuracy decreases for condensed data values and increases as the number of data values decreases.
Bayesian | Exhibits high accuracy when applied to large databases.
Support Vector Machine | Provides good accuracy in both linear and nonlinear classification.
M-Mediod | Provides a good accuracy rate for both circular and arbitrarily shaped clusters; works for general and other frameworks.

TABLE IV

Algorithm | Interpretability
K-Nearest Neighbor | Easy to understand. Complexity is exponential in the number of data values.
Bayesian | Simple and naive.
Support Vector Machine | Linear complexity.
M-Mediod | Works in multivariate settings; understands and models clusters of any shape.

TABLE V

Algorithm | Speed and scalability
K-Nearest Neighbor | The time to classify an instance increases as the total number of instances increases.
Bayesian | Bayesian classifiers have also exhibited high speed when applied to large databases.
Support Vector Machine | Scales well to high-dimensional data.
M-Mediod | Efficient because it works on medoids.

TABLE VI

Algorithm | Robustness
K-Nearest Neighbor | Insensitive to noise and irrelevant transformations.
Bayesian | Robust to isolated noise. Handles missing values by ignoring the instance during probability estimate calculation.
Support Vector Machine | Generally insensitive.
M-Mediod | Abnormality detection is done.

Classification techniques are well-designed algorithms that generally perform well on some of these features but do not fully accommodate the whole set. It therefore cannot simply be said that a given algorithm is deficient. Rather, the problem and its environment determine the choice of classifier, and sometimes more than one classifier is needed to obtain accurate results in a specific problem domain. In medical imaging, ensemble classifiers also give very good results. They are used in dermatology, diabetic retinopathy, and many other automated medical imaging systems. Researchers have also experimented with more sophisticated classifiers that deal specifically with outliers and are less sensitive to noise and other artifacts.
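Ensemble classification combines several classifiers (or several neural networks, as in Section VI) so that their individual errors can offset one another. This idea can be sketched with simple majority voting. The sketch is a hypothetical illustration: `majority_vote`, `ensemble_predict`, and the three threshold rules standing in for trained classifiers are assumptions for demonstration, not classifiers from the cited systems.

```python
from collections import Counter


def majority_vote(labels):
    """Most frequent label; ties go to the label listed first."""
    counts = Counter(labels)
    top = max(counts.values())
    return next(label for label in labels if counts[label] == top)


def ensemble_predict(classifiers, x):
    """Ask every classifier for a label and combine them by majority vote."""
    return majority_vote([clf(x) for clf in classifiers])


# Hypothetical stand-ins for trained classifiers on a 2-feature sample.
classifiers = [
    lambda x: "malignant" if x[0] > 0.5 else "benign",
    lambda x: "malignant" if x[1] > 0.5 else "benign",
    lambda x: "malignant" if x[0] + x[1] > 0.8 else "benign",
]
print(ensemble_predict(classifiers, (0.6, 0.3)))  # malignant
print(ensemble_predict(classifiers, (0.2, 0.3)))  # benign
```

In a real system, each member would be a trained model, for example the KNN, Bayesian, or SVM classifiers compared above.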

