Comparative Study of Classification Techniques Used in Skin Lesion Detection Systems

Uzma Jamil*, Shehzad Khalid+
*,+ Department of Computer Engineering, Bahria University, Islamabad, Pakistan
* Government College University, Faisalabad, Pakistan
Email: [email protected], [email protected]

Abstract--Classification is the major branch of data mining used to assign raw data to a particular group; it is a method used to forecast the group membership of data objects. Medical imaging deals with the design of automated systems that help physicians with diagnosis. In this paper, we present a comprehensive study of classification techniques used in medical imaging. Several types of classification methods, including Support Vector Machines, Bayesian networks, neural networks, the k-nearest neighbor classifier, and fuzzy logic techniques, are used for this purpose. This study provides a wide-ranging review of the classification techniques used in medical diagnostic systems such as those for diabetic retinopathy, foot ulcers, and other medical applications.

I. INTRODUCTION

Machine learning is one of the leading branches of artificial intelligence; it is about analyzing and examining available data, predicting unknown values and their properties, performing feature extraction, and relating them to known data. Data mining is the term used for extracting and discovering pieces of information from raw data. These two major fields intersect at the point of data extraction and analysis, as data mining utilizes the methodologies of machine learning for information exploration and elicitation. Data mining is used to dig deep into the data and build valuable insight. It uses three major components of machine learning to perform its tasks: 1) classification, 2) clustering, and 3) association.

Our skin is the largest human organ; it serves as a shield that protects us from pressure and strokes, heat and cold, the sun's UV radiation, and bacteria. Among many other reasons, one of the main reasons for the increase in skin cancer among the general population is the change in behavior and attitude toward sun exposure. Skin cancer (lesions) is primarily caused by UV-damaged melanocytes in our skin; these cells then grow out of control and can spread to other parts of the body.


To identify lesions, knowledgeable dermatologists rely initially on standard approaches of pattern recognition, second on the history of the disease, and later on laboratory parameters. Physicians use the clinical [1, 2] ABCD rule, Menzies' method, the 7-point checklist, and various classification methods to diagnose and classify data objects. Hence there is a strong desire to establish a system that starts with a digital image and arrives at lesion detection after passing through a series of steps.

One of the most recent techniques for non-invasive diagnosis of skin cancer is digital dermoscopy, based on the epiluminescence microscopy principle. It allows the evaluation of high-resolution dermoscopic images of skin lesions using a color video camera equipped with lenses providing high magnification factors and better visualization of subsurface structures. The advantages of digital dermoscopy are mole mapping, good identification of different and subtle morphological structures (pigmented network, dots, globules, blue-white veils, blotches and streaks), long-term observation and documentation of melanocytic lesions, independence of the investigator, reduced errors, and the ability of screening to differentiate between difficult-to-diagnose lesions and benign lesions. Due to its higher quality and diagnostic accuracy compared to the clinical view alone, dermoscopy has become the primary tool for skin lesion diagnosis [3].

The idea of using a computer-assisted tool to help with skin lesion diagnosis was proposed for the first time around 1985. This approach involves, as a first step, lesion area detection, followed by an automatic feature extraction step; based on these features, classification and diagnosis steps follow. Over time, automatic acquisition, reference and digital storage of images (e.g., PhotoFinder and MoleMax) have been developed, but embedded software and application performance are still low. Nowadays, numerous methods have been developed for automatic lesion detection in dermatoscopy images.


Recent approaches consist of applying fuzzy C-means clustering algorithms [4-5], gradient flow snake methods [6], thresholding followed by region growing approaches [7], statistical region merging techniques [8], contrast enhancement combined with k-means clustering [9], and classifiers such as Support Vector Machines, Neural Networks [10], or genetic algorithms.

[Fig. 1 depicts the pipeline of automated skin lesion detection systems — pre-processing, lesion segmentation, feature extraction and lesion classification — together with the classifiers reported in the literature for the classification stage: fuzzy logic engine (Begelman et al., 2004), naive Bayesian classifier, support vector machine, neural network classifier, and graph representation for classification, citing Q. Abbas et al., 2012; Naik et al., 2007, 2008; Lebrun et al., 2007; Wang et al., 2008; Kononenko, 2001; Chen & Lee, 1997; Jeong et al., 2009; Colantonio et al., 2007; Boland & Murphy, 2001; Velliste & Murphy, 2002; Meftah et al., 2010; Ta et al., 2009; Gunduz-Demir et al., 2010.]

Fig. 1. Topology of expository literature on automated lesion system classifiers.

Classification is the task of categorizing a test sample using already available training data, with the help of supervised (well known and well defined) algorithms that find the class of an unknown data value. Different classification techniques are used for this purpose. In the next sections, this paper focuses on and provides a critical view of the following techniques: K-Nearest Neighbor, Naive Bayesian algorithm, Support Vector Machine, Neural Network classifier, and M-Mediod modeling. Fig. 1 presents the topology of the steps of an automated lesion detection system and the classifiers used in the classification phase by different researchers.

II. K-NEAREST NEIGHBOR

K-Nearest Neighbor is also known as a "lazy learning algorithm" and has the simplest logic. All the data set elements are instances in some dimensional space, and the unknown sample is placed as a point in the same space. The distance from the sample to each point in the space is calculated with any distance formula, such as the Euclidean, Minkowski, Mahalanobis, city-block, or chessboard distance. The next step is to sort the distances in ascending order and find the k nearest points; k should be odd so that a majority decision is possible. The class held by the majority of the nearest neighbors decides the class of the unknown sample. The KNN algorithm is sensitive to garbage and irrelevant features and has a slow processing time, but it is quite simple and easy to implement. It also manages to handle mutually related features of any arbitrary shape [27, 28, 29, 30], although it becomes computationally intensive as the size of the training data set increases. To further improve KNN accuracy, additional techniques are combined with the algorithm, such as Weighted KNN (associating weights with attributes), Backward Elimination (deleting and restoring attributes) and Instance-weighted KNN (gradient descent) [3]. Table I and Fig. 2 describe the results of these techniques in tabular form and with the help of a bar chart; they clearly show that when these rules and techniques are combined with KNN, the results are greatly improved compared with traditional KNN classifiers.
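As a minimal sketch of the procedure just described — the toy feature vectors, class labels, the helper name knn_predict and the choice of Euclidean distance are illustrative assumptions, not details taken from the paper — a plain KNN classifier can be written as follows:

import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # 1. compute the distance from the query to every training point
    #    (Euclidean here; Minkowski, city-block or chessboard distance could be substituted)
    dists = [(math.dist(query, x), label) for x, label in zip(train_X, train_y)]
    # 2. sort the distances in ascending order and keep the k nearest labels
    dists.sort(key=lambda pair: pair[0])
    nearest = [label for _, label in dists[:k]]
    # 3. the majority class among the k neighbours (k odd) decides the label
    return Counter(nearest).most_common(1)[0][0]

# toy usage with hypothetical 2-D lesion feature vectors
train_X = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
train_y = ["benign", "benign", "melanoma", "melanoma"]
print(knn_predict(train_X, train_y, (0.85, 0.75), k=3))   # -> "melanoma"

The weighted variants mentioned above would modify step 3, for example by letting each neighbour vote with a weight inversely proportional to its distance.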

Fig. 2. Chart presenting the percentage results and comparison for the Heart-1 dataset - the utility of feature weighting in nearest-neighbor algorithms.

The KNN classifier gives good results only if the proper features are used. As shown in Table I and Fig. 2, if the KNN algorithm is augmented with some additional properties, it gives remarkably good results. KNN fails when the feature set contains a subset of irrelevant features.


TABLE I
RESULTS & COMPARISON OF HEART-1 DATASET - THE UTILITY OF FEATURE WEIGHTING IN NEAREST-NEIGHBOR ALGORITHMS.

Wine Data Set     | K | Learning Rate | # of Examples | # of Training Examples | # of Testing Examples | # of Attributes | # of Classes | Accuracy
KNN               | 3 | NA            | 178           | 146                    | 32                    | 13              | 3            | 87.5
Back Elimination  | 3 | NA            | 178           | 146                    | 32                    | 10              | 3            | 84.38
Attribute WKNN    | 3 | 0.005         | 178           | 146                    | 32                    | 13              | 3            | 87.5
Instance WKNN     | 3 | 0.005         | 178           | 146                    | 32                    | 13              | 3            | 62.5

This algorithm allows the retrieval and understanding of the cases that are at hand. This is very helpful for dermatologists, who can directly compare unknown lesions with other lesion classes.

III. BAYESIAN ALGORITHM

The naive and simple Bayesian algorithm is a feature-based probability model. The algorithm's working depends upon the probability of the instance, and all features are assumed to be independent. Using Bayes' theorem on the training data, the prior probability P(Ci) of each class is computed and multiplied by the likelihood P(X|Ci), and the result is divided by the evidence P(X), i.e. the sum over all classes. The class with the maximum value is then assigned to the unknown sample. The algorithm is fast and accurate compared to KNN. It is not sensitive to garbage and irrelevant features, is applicable to real-valued as well as discrete data, is robust to isolated points, and handles missing values. For continuous attributes it uses Gaussian parameters, namely the mean and standard deviation. The issues with the Bayesian algorithm are the independence assumption among features, the zero-probability problem, and the need for the Laplacian estimator [14, 15, 17]. The Naive Bayes algorithm is good for any sort of features, an ability that is helpful when comparing unknown lesions with known lesion classes. This algorithm is more conservative in estimating performance; it avoids over-fitting and handles sampling error across different lesion datasets.
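The following is a rough, self-contained illustration of that computation for continuous features; the Gaussian likelihoods, the helper names fit_gaussian_nb / predict_gaussian_nb and the toy data are assumptions made for the example only:

import math
from collections import defaultdict
from statistics import mean, pstdev

def fit_gaussian_nb(X, y):
    """Estimate the prior P(Ci) and a per-feature Gaussian (mean, std) for each class."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    model = {}
    for c, rows in by_class.items():
        cols = list(zip(*rows))                              # feature columns for class c
        model[c] = (len(rows) / len(X),                      # prior P(Ci)
                    [mean(col) for col in cols],             # per-feature means
                    [max(pstdev(col), 1e-6) for col in cols])  # per-feature stds
    return model

def predict_gaussian_nb(model, x):
    """Assign x to the class maximising P(Ci) * product of P(xj|Ci), done in log space."""
    def log_gauss(v, m, s):
        return -0.5 * math.log(2 * math.pi * s * s) - (v - m) ** 2 / (2 * s * s)
    return max(model, key=lambda c: math.log(model[c][0]) +
               sum(log_gauss(v, m, s)
                   for v, m, s in zip(x, model[c][1], model[c][2])))

# toy usage with hypothetical 2-D lesion feature vectors
X = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
y = ["benign", "benign", "melanoma", "melanoma"]
print(predict_gaussian_nb(fit_gaussian_nb(X, y), (0.15, 0.15)))   # -> "benign"

Note that the evidence P(X) can be dropped in practice because it is the same for every class and does not affect the argmax.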


IV. SUPPORT VECTOR MACHINE

Support vector machines (SVMs) are algorithms that evaluate data and identify patterns hidden in that data by applying some rules. The major concept of the SVM is that, for each given input, it predicts which of two possible groups forms the output, which makes it a non-probabilistic binary classifier. This efficient algorithm constructs a model that assigns every new data element to its group. The algorithm represents the data as points in a space, and these points are grouped so that the given classes are divided by a clear gap that is as wide as possible. Using the kernel concept, SVMs are also very good for non-linear classification. New data points are then mapped into that same space, and a category is predicted based on which side of the gap they fall on [1]. Grouping data into its particular class is a common practice used in medical imaging and many other fields, and the basic goal of this classifier is to assign data to the relevant group to which it belongs. There are many different hyperplanes that could be used to separate the data; one logical choice of the best hyperplane is the one that gives the maximum border, or margin, between the classes, i.e. the hyperplane that is farthest from the nearest data point on each side [3, 14].

A. Linear SVM

The linear support vector machine algorithm is highly efficient for large data sets and can handle even several million data points. It is flexible enough to deal with multiclass classification problems and can easily manage thousands of features. A basic SVM is trained with samples from two classes, and it remains highly efficient for a tremendous number of transactions.

B. Nonlinear classification

After the linear classifier, a way was suggested to create nonlinear classifiers by applying the kernel trick [13] to maximum-margin hyperplanes [14]. The algorithm is similar to the linear one, except that every data point is mapped through a kernel function. This is the reason that the non-linear classifier can easily handle an irregular, non-linearly separable feature space. The resulting space is high dimensional due to the kernel conversion, yet the classifier remains non-linear with respect to its original input space. These are maximum-margin classifiers that provide the flexibility to deal with high-dimensional feature spaces. SVMs have a number of advantages over more classical classifiers such as neural networks, Bayesian classifiers and decision trees. The main idea of the SVM is the optimization of a cost function, which is the reason that SVMs are less prone to over-fitting. An advantage of the SVM is that it provides a unified framework in which different machine learning architectures can be generated [24, 25, 26]. However, no probability of class membership is provided, so the results of support vector machines are biased; this is the big disadvantage of SVM.
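As a hedged sketch of how such a classifier is typically trained and used — scikit-learn is an external library chosen purely for illustration (the paper does not prescribe a particular implementation), and the feature vectors, labels and kernel choices are toy assumptions:

from sklearn.svm import SVC

# hypothetical lesion feature vectors and their classes
X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y = [0, 0, 1, 1]

# linear maximum-margin classifier: finds the separating hyperplane
# that is farthest from the nearest point of each class
linear_clf = SVC(kernel="linear").fit(X, y)

# non-linear classifier via the kernel trick (RBF kernel)
rbf_clf = SVC(kernel="rbf", gamma="scale").fit(X, y)

print(linear_clf.predict([[0.85, 0.75]]))   # expected: [1]
print(rbf_clf.predict([[0.15, 0.15]]))      # expected: [0]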

V. M-MEDIOD CLASSIFIER

In this technique, a pattern is modelled by a set of cluster centres of evenly disjunctive mediods; a class containing n members is represented by m mediods. The strength of this technique is its ability to model complex patterns without imposing any restriction on the shape of the class: it supports arbitrarily shaped clusters and works very well even in the presence of outliers. Once the m-mediods model for all classes has been learnt, classification and anomaly detection can be performed by checking the closeness of a sample to the different classes using a hierarchical classifier that works in a specific hierarchy [12]. This classifier works best for differently shaped data groups and gives excellent accuracy in the presence of noise, even when high inconsistency in density resides within one group of data elements. It also identifies the normality range, works in multi-variate settings, can model clusters of any shape, and works in any feature space. It is used in many medical fields, such as diabetic retinopathy, and gives very good results.
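The details of the m-mediods learning procedure are given in [12]; the fragment below is only a rough nearest-mediod sketch, assuming the m mediods of each class have already been learnt (the mediod values, the anomaly threshold and the helper name classify_m_mediods are hypothetical):

import math

def classify_m_mediods(class_mediods, x, anomaly_threshold=None):
    """Assign x to the class whose closest mediod is nearest; optionally flag anomalies."""
    best_class, best_dist = None, float("inf")
    for c, mediods in class_mediods.items():
        # closeness of x to class c = distance to its nearest mediod
        d = min(math.dist(x, m) for m in mediods)
        if d < best_dist:
            best_class, best_dist = c, d
    if anomaly_threshold is not None and best_dist > anomaly_threshold:
        return "anomaly"          # sample lies outside the normality range
    return best_class

# toy usage with hypothetical, pre-learnt mediods (m = 2 per class)
mediods = {"benign":   [(0.10, 0.20), (0.20, 0.15)],
           "melanoma": [(0.85, 0.80), (0.90, 0.90)]}
print(classify_m_mediods(mediods, (0.12, 0.18), anomaly_threshold=0.5))  # -> "benign"
print(classify_m_mediods(mediods, (0.50, 2.00), anomaly_threshold=0.5))  # -> "anomaly"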

VI. NEURAL NETWORK CLASSIFIER

The classifiers generated by neural networks are described by complex mathematical functions that are rather opaque and dense to humans. Because the rules they follow to classify data are hard to inspect, this cloudiness limits them in many real-life applications where both accuracy and comprehensibility are required, such as medical diagnosis and credit risk evaluation [11]. Their powerful data-fitting, or function approximation, capability also makes them vulnerable to the over-fitting problem. Combining several neural networks can be used to improve performance [11, 22, 23]. In many fields the ANN gives good results only if the proper features are used. In microscopic images there have been many attempts to extract different types of features; since in the microscopic field the objects of interest present a high variability of content, size, intensity distribution, position, organization and shape, it is hard to extract a suitable set of features which covers all possible situations.
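For completeness, a small multi-layer perceptron can be sketched as below; scikit-learn is again an external library used purely for illustration, and the features, labels and network size are toy assumptions:

from statistics import mode
from sklearn.neural_network import MLPClassifier

# hypothetical lesion feature vectors and labels
X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y = [0, 0, 1, 1]

# a single small network
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(X, y)
print(clf.predict([[0.15, 0.15]]))          # expected: [0]

# combining several networks (here by majority vote over differently seeded
# models) is one simple way to improve performance, as noted above
ensemble = [MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                          random_state=seed).fit(X, y) for seed in range(3)]
print(mode(int(m.predict([[0.85, 0.80]])[0]) for m in ensemble))   # expected: 1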

Fig. 3 presents the percentage illustration of the classification methods used by existing lesion detection systems.

[Fig. 3 is a chart showing the share of support vector machines, K-nearest neighbours, Bayesian classifiers, artificial neural networks, graph representation, and other methods.]

Fig. 3. Illustration of classification methods as used by existing diagnostic systems.

VII. FEATURES FOR COMPARISON

The set of features used for comparing these classifiers is described in Table II and includes accuracy, speed and scalability, robustness, and interpretability.

TABLE II
FEATURES FOR COMPARISON

Feature              | Detail
Accuracy             | Accuracy = no. of correct classifications / total no. of instances in the data set
Speed & Scalability  | Time to construct the model; time to use the model; efficiency in disk-resident databases
Robustness           | Handling noise, missing values, irrelevant features, and streaming data
Interpretability     | Understanding and insight provided by the model


In the following, Table III, Table IV, Table V and Table VI give a detailed description of the classifier comparison.

TABLE III
WORKING ACCURACY OF CLASSIFIERS

Algorithm          | Working Accuracy
K-Nearest Neighbor | Decreased accuracy for condensed data values; accuracy increases as the data values decrease.
Bayesian           | Exhibits high accuracy when applied to large databases.
SVM                | Provides good accuracy in both linear and non-linear classification.
M-Mediod           | Provides a good rate of accuracy for both circular and arbitrarily shaped clusters; works for general and other frameworks.

TABLE IV
MEASUREMENT OF SPEED AND SCALABILITY

Algorithm          | Speed & Scalability
K-Nearest Neighbor | The time to classify an instance increases as the total number of instances increases.
Bayesian           | Bayesian classifiers have also exhibited high speed when applied to large databases.
SVM                | Scales well to high-dimensional data.
M-Mediod           | Efficient because it works on medoids.

TABLE V
ROBUSTNESS COMPARISON

Algorithm          | Robustness
K-Nearest Neighbor | Insensitive to noise and irrelevant transformations.
Bayesian           | Robust to isolated noise; handles missing values by ignoring the instance during probability estimate calculation.
SVM                | Generally insensitive.
M-Mediod           | Abnormality detection is performed.

TABLE VI
INTERPRETABILITY

Algorithm          | Interpretability
K-Nearest Neighbor | Easy to understand; complexity is exponential in the number of data values.
Bayesian           | Simple and naive.
SVM                | Linear complexity.
M-Mediod           | Works in multi-variate settings; can understand and model any shape of cluster.

VIII. CONCLUSION

Classification techniques are well-designed algorithms that generally provide good results on some of these features but do not fully accommodate the whole set of features, so it cannot simply be said that an algorithm is lacking in some way. It is actually the problem and its environment that determine the selection of the correct classifier, and the use of more than one classifier may even be involved to obtain accurate results in a specific problem domain. In medical imaging, ensemble classifiers are also giving very good results; they are used in dermatology, diabetic retinopathy and many other automated medical imaging systems. More sophisticated classifiers are being investigated by researchers that specifically deal with outliers and are not very sensitive to noise and other artifacts.

ACKNOWLEDGMENT

The authors wish to express their gratitude to Ms. Uzma Sattar, who gave heart-warming moral support during the completion of this research article. We really appreciate her cooperation and are very grateful for her kind help in every aspect.

REFERENCES

[1] Johr RH (2002), "Dermoscopy: alternative melanocytic algorithms—the ABCD rule of dermatoscopy, Menzies scoring method, and 7-point checklist", Clin Dermatol, 20: 240–247.
[2] Blum A, Luedtke H, Ellwanger U, Schwabe R, Rassner G, Garbe C (2004), "Digital image analysis for diagnosis of cutaneous melanoma. Development of a highly effective computer algorithm based on analysis of 837 melanocytic lesions", Br J Dermatol, 151: 1029–1038.
[3] A. Sultana, M. Ciuc, T. Radulescu, L. Wanyu and D. Petrache (2012), "Preliminary work on dermatoscopic lesion segmentation", 20th European Signal Processing Conference (EUSIPCO), ISSN 2076-1465, pp. 2273–2277.
[4] P. Schmid (1999), "Segmentation of digitized dermatoscopic images by two-dimensional color clustering", IEEE Trans Medical Imaging.
[5] H. Zhou, G. Schaefer, A. Sadka and M.E. Celebi (2009), "Anisotropic mean shift based fuzzy C-means segmentation of dermoscopy images", IEEE J Selected Topics Signal Process, 3: 26–34.
[6] B. Erkol, R.H. Moss, R.J. Stanley, et al. (2005), "Automatic lesion boundary detection in dermoscopy images using gradient vector flow snakes", Skin Res Technol, 11: 17–26.
[7] H. Iyatomi, H. Oka, M.E. Celebi, et al. (2008), "An improved internet-based melanoma screening system with dermatologist-like tumor area extraction algorithm", Comput Med Imaging Graph, 32: 566–579.
[8] M. E. Celebi, H. A. Kingravi, H. Iyatomi, Y. A. Aslandogan, W. V. Stoecker, R. H. Moss, J. M. Malters, J. M. Grichnik, A. A. Marghoob, H. S. Rabinovitz, and S. W. Menzies (2008), "Border detection in dermoscopy images using statistical region merging", Skin Res Technol, 14(3): 347–353, doi: 10.1111/j.1600-0846.2008.00301.x.
[9] D.D. Gómez, C. Butakoff, B.K. Ersbøll, W. Stoecker (2008), "Independent histogram pursuit for segmentation of skin lesions", IEEE Trans Biomed Eng, 55(1): 157–61.
[10] S. Kia, S. Setayeshi, M. Shamsaei, M. Kia (2012), "Computer-aided diagnosis (CAD) of the skin disease based on an intelligent classification of sonogram using neural network", Neural Comput & Applic (2013) 22: 1049–1062, doi: 10.1007/s00521-012-0864-y, Springer-Verlag London Limited.
[11] Geetika M., Sunint K. (2006), "Comparative study of ANN for pattern classification", WSEAS Int. Conf. on Mathematical Methods and Computational Techniques in Electrical Engineering, Bucharest, October.
[12] S. Khalid (2010), "Activity classification and anomaly detection using m-mediods based modelling of motion patterns", Pattern Recognition, 43, 3636–3647.
[13] Aizerman, Mark A.; Braverman, Emmanuel M.; and Rozonoer, Lev I. (1964), "Theoretical foundations of the potential function method in pattern recognition learning", Automation and Remote Control, 25: 821–837.
[14] Boser, Bernhard E.; Guyon, Isabelle M.; and Vapnik, Vladimir N. (1992), "A training algorithm for optimal margin classifiers", in Haussler, David (editor), 5th Annual ACM Workshop on COLT, pages 144–152, Pittsburgh, PA, ACM Press.
[15] Theodoridis, Sergios; and Koutroumbas, Konstantinos (2009), "Pattern Recognition", 4th Edition, Academic Press, ISBN 978-1-59749-272-0.
[16] Jeong M.R., Ko B.C., Nam J.Y. (2009), "Overlapping Nuclei Segmentation Based on Bayesian Networks and Stepwise Merging Strategy", Journal of Microscopy, Vol. 235, 2, 188–198.
[17] Kononenko I. (2001), "Machine Learning for Medical Diagnosis: History, State of the Art and Perspective", Artificial Intelligence in Medicine, Vol. 23, 1, 89–109.
[18] Lebrun G., Charrier C., Lezoray O., Meurie C., Cardot H. (2007), "A Fast and Efficient Segmentation Scheme for Cell Microscopic Image", Cellular and Molecular Biology, Vol. 53, 2, 51–61.
[19] Naik S., Doyle S., Agner S., Madabhushi A., Feldman M.D., Tomaszewski J. (2008), "Automated Gland and Nuclei Segmentation for Grading of Prostate and Breast Cancer Histopathology", in Proceedings of the 5th IEEE International Symposium on Biomedical Imaging, 284–287.
[20] Begelman G., Gur E., Rivlin E., Rudzsky M., Zalevsky Z. (2004), "Cell Nuclei Segmentation Using Fuzzy Logic Engine", International Conference on Image Processing, Vol. 5, 2937–2940.
[21] Ta V.T., Lezoray O., El Moataz A., Schupp S. (2009), "Graph-Based Tools for Microscopic Cellular Image Segmentation", Pattern Recognition, Vol. 42, 6, 1113–1125.
[22] C. M. Bishop (1995), "Neural Networks for Pattern Recognition", Oxford University Press, London, UK.
[23] L. Fausett (1994), "Fundamentals of Neural Networks: Architectures, Algorithms and Applications", Prentice Hall, Englewood Cliffs, NJ, USA.
[24] M. E. Celebi, H. A. Kingravi, B. Uddin et al. (2007), "A methodological approach to the classification of dermoscopy images", Computerized Medical Imaging and Graphics, vol. 31, no. 6, pp. 362–373.
[25] V. N. Vapnik (1998), "Statistical Learning Theory", Wiley, New York, NY, USA.
[26] N. Cristianini and J. Shawe-Taylor (2000), "An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods", Cambridge University Press, Cambridge, UK.
[27] B. V. Dasarathy (1991), "Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques".
[28] M. Burroni, R. Corona, G. Dell'Eva et al. (2004), "Melanoma computer-aided diagnosis: reliability and feasibility study", Clinical Cancer Research, vol. 10, no. 6, pp. 1881–1886.
[29] E. A. El-Kwae, J. E. Fishman, M. J. Bianchi, P. M. Pattany, and M. R. Kabuka (1998), "Detection of suspected malignant patterns in three-dimensional magnetic resonance breast images", Journal of Digital Imaging, vol. 11, no. 2, pp. 83–93.
[30] B. D. Ripley (1996), "Pattern Recognition and Neural Networks", Cambridge University Press, Cambridge, UK.
