Classification of Ophthalmologic Images Using an Ensemble of Classifiers

Giampaolo L. Libralao, Osvaldo C. P. Almeida, and Andre C. P. L. F. Carvalho

Institute of Mathematics and Computer Science, University of Sao Paulo - USP
Av. Trabalhador Sao-carlense, 400 - CEP 13560-970, Sao Carlos, Sao Paulo, Brazil
[email protected], [email protected], [email protected]
Abstract. The human eye may present refractive errors such as myopia, hypermetropia and astigmatism. This article presents the development of an ensemble of classifiers as part of a Refractive Errors Measurement System. The system analyses Hartmann-Shack images of human eyes in order to identify the refractive errors associated with myopia, hypermetropia and astigmatism. The most relevant data in these images are extracted using the Gabor wavelet transform, and Machine Learning techniques are then employed to carry out the image analysis. The ensemble combines three different Machine Learning techniques: Artificial Neural Networks, Support Vector Machines and the C4.5 algorithm, and has been shown to improve the performance achieved by the individual classifiers.

Key words: Classifiers Combination, Ocular Refractive Errors, Machine Learning, Expert Systems, Hartmann-Shack Technique, Optometry.
1 Introduction
Frequently, a human eye presents refractive errors, like myopia, hypermetropia and astigmatism. Although there are several procedures able to diagnose these errors, previous studies have shown that they are not efficient enough[15]. The available devices for refractive error detection require frequent calibrations. Besides, the maintenance of the current devices is usually expensive and may require technical support from experts[17]. In order to overcome these limitations, this paper presents a new approach, based on Machine Learning (ML). The authors believe that this new approach allows the development of an efficient diagnosis solution. ML is concerned with the development and investigation of techniques able to extract concepts (knowledge) from samples[7]. In this work, ML techniques are applied to the classification of eye images. The proposed system employs images generated by the Hartmann-Shack (HS) technique. Before their use by the ML techniques, the images are pre-processed.
The authors acknowledge the support received from FAPESP (State of Sao Paulo Research Funding Agency).
The pre-processing is performed in order to eliminate image imperfections introduced during the acquisition process. Next, features are extracted from the image through the Gabor Wavelet Transform[5][3]. The use of the Gabor transform reduces the number of input data (image pixels) to be processed by the ML algorithms, while assuring that relevant information is not lost. Thus, a new data set is obtained in which each sample is represented by a set of feature values. Finally, ML algorithms are trained to diagnose eye images using this new data set. In order to improve the performance achieved in the classification of eye images, the authors combined different ML techniques in a committee. This article describes the proposed ensemble and a set of experiments performed to evaluate the performance gain due to the combination in the Refractive Errors Measurement System (REMS). The article is organised as follows: Section 2 presents a brief review of the Machine Learning (ML) techniques used in the classifiers ensemble; Section 3 discusses the main features of the ensemble investigated; Section 4 explains the proposed Refractive Errors Measurement System; Section 5 describes the tests performed and shows the experimental results obtained; finally, Section 6 presents the main conclusions.
2 Machine Learning Techniques
One of the main goals of ML is the development of computational methods able to extract concepts (knowledge) from samples[7]. In general, ML techniques are able to learn how to classify previously unseen data after undergoing a training process. The classification of samples that were not seen during the training phase is named generalization. ML algorithms are in general inspired by other research areas[13]: biological systems (ANNs), cognitive processes (Case Based Reasoning), symbolic learning (Decision Trees), and statistical learning theory (Support Vector Machines).
2.1 Artificial Neural Networks
One of the ANN models used in this work is the MLP network[11], one of the most popular ANN models. MLP networks present at least one hidden layer, one input layer and one output layer. The hidden layers work as feature extractors: their weights codify features of the input patterns, creating a more complex representation of the training data set. There is no rule to specify the number of neurons in the hidden layers. MLP networks are usually trained by the Backpropagation learning algorithm[12]. The other ANN model investigated in this paper, the RBF network, was proposed by Broomhead and Lowe[2]. A typical RBF network has a single hidden layer whose neurons use radial basis activation functions, in general Gaussian functions. RBF networks are usually trained by hybrid methods, composed of an unsupervised and a supervised stage. The former determines the number of radial functions and their parameters; the latter calculates the neuron weights. In general, the K-means algorithm is used for the first stage. For the second stage, a linear algorithm is usually employed to calculate the values of the weights. RBF networks have been successfully employed in several pattern recognition problems[1].
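A minimal sketch of this hybrid scheme is given below, assuming Gaussian basis functions; the function names, the number of centers and the width heuristic are illustrative choices, not taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_rbf(X, y, n_centers=10, seed=0):
    """Hybrid RBF training. Unsupervised stage: K-means finds the
    centers of the radial functions. Supervised stage: a linear
    least-squares solution gives the output weights."""
    km = KMeans(n_clusters=n_centers, n_init=10, random_state=seed).fit(X)
    centers = km.cluster_centers_
    # A common width heuristic: the mean distance between centers.
    sigma = np.linalg.norm(centers[:, None] - centers[None, :],
                           axis=-1).mean()
    # Gaussian activations of the hidden layer.
    phi = np.exp(-np.linalg.norm(X[:, None] - centers[None, :],
                                 axis=-1) ** 2 / (2 * sigma ** 2))
    # y is expected one-hot encoded for classification.
    weights, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return centers, sigma, weights

def rbf_predict(X, centers, sigma, weights):
    phi = np.exp(-np.linalg.norm(X[:, None] - centers[None, :],
                                 axis=-1) ** 2 / (2 * sigma ** 2))
    return phi @ weights
```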
2.2 Support Vector Machines
SVMs are learning algorithms based on statistical learning theory, through the principle of Structural Risk Minimization (SRM). They deal with pattern recognition problems in two different ways. In the first, classification mistakes are not considered: patterns that do not fit the typical values of their class move the separation hyperplane, so that these patterns are classified in the correct class. In the second, extra (slack) variables are introduced, so that patterns that do not fit the typical values of their group can be excluded, depending on the amount of slack allowed, thus reducing the probability of classification errors. The high generalization capacity achieved by SVMs results from the use of statistical learning theory, a principle developed in the 1960s and 1970s by Vapnik and Chervonenkis[18].
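The difference between the two formulations can be seen in the role of the penalty parameter C, which bounds the slack variables. The sketch below uses scikit-learn rather than the SVMTorch package employed in the paper, with synthetic data; it is an illustration of the idea, not the paper's setup.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=0)

# A very large C approximates the first formulation: slack variables
# are so heavily penalised that the hyperplane moves to classify every
# training pattern correctly.
hard_margin = SVC(kernel="rbf", C=1e6).fit(X_tr, y_tr)

# A moderate C gives the second (soft-margin) formulation: slack
# variables let atypical patterns fall on the wrong side of the margin,
# which usually improves generalization.
soft_margin = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)

print(hard_margin.score(X_te, y_te), soft_margin.score(X_te, y_te))
```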
2.3 C4.5 Algorithm
The C4.5 algorithm is a symbolic learning algorithm that generates decision trees from a training data set. It is one of the successors of the ID3 algorithm[10]. The ID3 algorithm is a member of a more general group of techniques, known as Top-Down Induction of Decision Trees (TDIDT). To build the decision tree, one of the attributes from the training set is selected. The training set patterns are then divided according to their values for this particular attribute. For each subset, another attribute is chosen to perform another division. This process goes on until each subset contains only samples of the same class, at which point a leaf node is created and labelled with the respective class.
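The sketch below illustrates this basic TDIDT scheme in Python, with ID3-style splitting by information gain on categorical attributes; C4.5 extends this with gain ratio, continuous attributes and pruning. The data structures and names are illustrative, not from the paper.

```python
import numpy as np
from collections import Counter

def entropy(labels):
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def build_tree(X, y, attributes):
    # Stop when the subset contains only samples of the same class:
    # create a leaf labelled with that class.
    if len(set(y)) == 1:
        return y[0]
    if not attributes:
        return Counter(y).most_common(1)[0][0]  # majority-class leaf
    # Select the attribute with the highest information gain.
    def gain(a):
        total = entropy(y)
        for v in set(x[a] for x in X):
            sub = [y[i] for i, x in enumerate(X) if x[a] == v]
            total -= len(sub) / len(X) * entropy(sub)
        return total
    best = max(attributes, key=gain)
    # Divide the patterns by their value of the chosen attribute and
    # recurse on each subset.
    tree = {best: {}}
    for v in set(x[best] for x in X):
        idx = [i for i, x in enumerate(X) if x[best] == v]
        tree[best][v] = build_tree([X[i] for i in idx],
                                   [y[i] for i in idx],
                                   [a for a in attributes if a != best])
    return tree

# Toy usage: samples are dicts of categorical attributes.
X = [{"size": "big", "color": "red"}, {"size": "small", "color": "red"},
     {"size": "big", "color": "blue"}]
y = ["A", "B", "A"]
print(build_tree(X, y, ["size", "color"]))  # {'size': {'big': 'A', 'small': 'B'}}
```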
3 Ensembles
Ensembles of classifiers aim to improve the overall performance obtained in a pattern recognition task by combining several individually trained classifiers[9]. Usually, such a combination leads to more stable classifiers. However, it presents advantages and disadvantages, as any other classification strategy. The main disadvantage of ensembles is the increase in problem complexity, which can be reduced by employing techniques to partition the problem among the classifiers. The choice of the number of classifiers to be combined depends on the main features of the problem investigated and on the number of classes used.
The main emphasis of classifier combination is the exploration of the similarities and differences associated with each classifier. It is also very important to take into consideration the generalization capacity of, and the dependency among, the classifiers belonging to the combined set. Classifiers that produce similar errors are not recommended for combination. Ensembles of classifiers can present lower classification error rates than those obtained by each classifier employed individually, as the stacking sketch below illustrates.
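One common way to realise such a combination, and the one that matches the architecture used later in Section 5 (base classifiers whose outputs feed a final classifier), is stacking. The sketch below is an illustrative Python rendition, with scikit-learn models standing in for the Weka and SVMTorch classifiers of the paper.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def fit_stack(base_models, final_model, X, y):
    """The predictions of the individually trained base classifiers
    become the inputs of the final classifier. Cross-validated
    predictions avoid leaking training labels into the meta level."""
    meta = np.column_stack([cross_val_predict(m, X, y, cv=5)
                            for m in base_models])
    for m in base_models:
        m.fit(X, y)
    final_model.fit(meta, y)

def stack_predict(base_models, final_model, X):
    meta = np.column_stack([m.predict(X) for m in base_models])
    return final_model.predict(meta)

# In the spirit of module 3 of Section 5: two tree classifiers and an
# SVM, combined by an SVM as final classifier (decision trees stand in
# for C4.5 here).
bases = [DecisionTreeClassifier(max_depth=5),
         DecisionTreeClassifier(), SVC()]
final = SVC()
```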
4 Refractive Errors Measurement System
This section presents the main features of the REMS (Refractive Errors Measurement System) proposed by Netto[17]. The REMS system has four modules:

1. Image Acquisition Module. The acquisition of the HS images was carried out by Prof. Dr. Larry Thibos, from the Optometry group of Indiana University (USA), using an equipment built by his group, known as an aberrometer;
2. Image Pre-processing Module. The ophthalmic images are generated in a format that does not allow their direct use by ML techniques. First the image data are normalized; then the image is filtered by a pre-processing method to eliminate noise that may affect the feature extraction process;
3. Feature Extraction Module. This module aims at extracting the main features of an image in order to reduce the amount of input data for the analysis module. The extraction process uses a technique named the Gabor Wavelet Transform;
4. Analysis Module. This module analyses the patterns provided by the feature extraction module. RBF and MLP networks, SVMs and the C4.5 algorithm were used to implement the analysis module; all these techniques are explained in Section 2. The classifier combination developed is also part of this module.

The proposed computational system processes an image obtained by the HS technique and then analyses it, extracting relevant information for an automatic diagnosis, by an ML technique, of the refractive errors that may exist in the eye. Once the images are obtained, they are filtered by a pre-processing method, which eliminates image imperfections introduced during the acquisition process. This method is based on histogram analysis and spatial-geometrical information of the application domain[14]. The eye image dataset covers 100 patients, with six images per patient (three of the right eye and three of the left eye), resulting in 600 images. Each image is associated with three measurements (spherical (S), cylindrical (C) and cylindrical axis (A)), which are used to determine refractive errors. The data set covers the following measurement spectrum: spherical, from -1.75D (Dioptres) to +0.25D; cylindrical, from 0.0D to 1.25D; and cylindrical axis, from 0° to 180°. Negative spherical values correspond to myopia; positive spherical values indicate hypermetropia.
The resolution of a commercial auto-refractor is 0.25D for the spherical (myopia and hypermetropia) and cylindrical (astigmatism) measurements, and 5° for the cylindrical axis (astigmatism). The resolution adopted for the REMS is the same as that of commercial auto-refractors, and the experimental data used in the experiments also have this resolution. The allowed error for this kind of application is ±0.25D for S and C, and ±5° for A, the same resolution found in commercial auto-refractors. The auto-refractor is a fast and precise equipment for the analysis of refractive errors. The measurements of the original data set were divided into classes, according to a fixed interval based on a commercial auto-refractor's resolution. For spherical (S), 9 classes were created (varying between -1.75D and +0.25D with an interval of 0.25D); for cylindrical (C), 6 classes were created (varying between 0.0D and +1.25D with an interval of 0.25D); and for cylindrical axis (A), 25 classes were created (varying between 0° and 180° with an interval of 5°). Table 1 shows the distribution among classes for the C measurement; note that the adopted criterion does not allow superposition between the classes created, because it is based on a commercial auto-refractor's resolution (a sketch of one plausible binning rule is given after Table 1).

Table 1. Quantity of exemplars for measurement C.

C Measurement | Quantity of exemplars | Distribution among classes (%)
0.00          | 30                    | 7.04%
0.25          | 229                   | 53.76%
0.50          | 113                   | 26.52%
0.75          | 31                    | 7.28%
1.00          | 15                    | 3.52%
1.25          | 8                     | 1.88%
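The binning step can be made concrete with a small sketch. The rounding rule below is an assumption (the paper only states the class intervals); it maps each raw measurement to the nearest multiple of the instrument resolution, which guarantees non-overlapping classes.

```python
def to_class(value, step):
    """Map a raw measurement to the nearest multiple of the instrument
    resolution (0.25 D for S and C, 5 degrees for A). Rounding to the
    nearest bin is an assumption, not stated in the paper."""
    return round(value / step) * step

print(to_class(-1.63, 0.25))   # spherical   -> -1.75 (D)
print(to_class(0.30, 0.25))    # cylindrical -> 0.25 (D)
print(to_class(87.0, 5.0))     # axis        -> 85 (degrees)
```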
Before the image analysis, the features of each image are extracted using the Gabor wavelet transform[5], which allows an image to be represented by its most relevant features, storing the majority of the image information in a reduced data set. The use of Gabor filters has shown good results for the extraction of the most relevant features of images, as they are capable of minimizing data noise in the space and frequency domains[4]. The analysis module then uses these extracted features as inputs for the proposed techniques, largely reducing the amount of information processed. Thus, the input data for the classifier combination modules are the vectors created by the Gabor transform, resulting in a final vector with 200 features; this vector is normalized before being presented to the ML techniques analyzed. Details of the Gabor transform and the implemented algorithm can be found in Netto[17] and Daugman[4].
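A minimal sketch of Gabor-based feature extraction follows. The exact filter bank that yields the paper's 200-feature vector follows Daugman[4] and is not specified here, so the kernel size, wavelengths, orientations and summary statistics below are illustrative assumptions.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """2-D Gabor kernel: a Gaussian envelope modulating a plane wave,
    localised in both the space and frequency domains."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    x_r = xs * np.cos(theta) + ys * np.sin(theta)
    y_r = -xs * np.sin(theta) + ys * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + y_r ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * x_r / wavelength)
    return envelope * carrier

def gabor_features(image, n_orientations=4, wavelengths=(4, 8, 16, 32)):
    """Filter the image at several orientations and scales and keep
    summary statistics of each response as the feature vector.
    Assumes the image is larger than the 31x31 kernel."""
    feats = []
    for k in range(n_orientations):
        theta = k * np.pi / n_orientations
        for lam in wavelengths:
            kern = gabor_kernel(31, lam, theta, sigma=lam / 2)
            # Frequency-domain convolution keeps the sketch
            # dependency-free.
            resp = np.real(np.fft.ifft2(np.fft.fft2(image) *
                                        np.fft.fft2(kern, s=image.shape)))
            feats.extend([resp.mean(), resp.std()])
    v = np.asarray(feats)
    return (v - v.mean()) / v.std()   # normalise before the classifiers
```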
5 Tests and Results
The authors investigated combinations of the ML techniques that presented the best individual performance. For the experiments, the Weka simulator (http://www.cs.waikato.ac.nz/ml/weka/index.html, accessed January 2004), from the University of Waikato, New Zealand, and the SVMTorch simulator[16] were used. It is important to highlight that three different sub-modules were developed for each studied technique, in order to independently analyse each type of measurement (S, C and A): one set of experiments was devoted to interpreting the data for S, another for C and the last for A. The best-performing configurations of the ML techniques (MLPs, RBFs, SVMs and the C4.5 algorithm) were combined in four different manners, and their results were presented to a final classifier, in order to obtain final results better than those previously obtained by the system. For training, the random resampling method was applied. The data set (426 examples, after setting aside the patterns that presented measurement problems) was divided into 10 different random partitions. These 10 partitions were randomly generated, but keeping a uniform class distribution for each measurement analyzed (S, C or A). For the ANNs (MLPs and RBFs) and the C4.5 algorithm, each partition was subdivided into three subsets: one for training, with 60% of the samples; another for validation, with 20%; and the last for tests, with the remaining 20%. For the SVMs, each partition was subdivided into two subsets: one for training and validation, with 80% of the samples, and another for tests, with 20% (a stratified-sampling sketch of this scheme is given below). The results obtained by the combined techniques were presented to a final classifier responsible for generating the final result of each module. The four modules developed are composed of the following ML techniques:

– Module 1 combines two SVMs and one C4.5 classifier, with a C4.5 as final classifier;
– Module 2 has one SVM, one C4.5 classifier and one RBF, with an SVM as final classifier;
– Module 3 has two C4.5 classifiers and one SVM, combined by a new SVM as final classifier;
– Module 4 has an MLP as final classifier of two SVMs and one C4.5 classifier.

The best results were obtained by modules 2 and 3, in which the final classifier was an SVM; they can be seen in Tables 2 and 3. The C4.5 algorithm and the MLP networks did not achieve good performance as final classifiers, so their results are omitted. Table 2 shows the performance of the SVM as final classifier in the second module: the SVM combination of the individual classifiers is more efficient than any of the individual classifiers. Table 3 presents the results generated by module 3, which reinforce those of Table 2, since the SVM again achieves high performance when acting as final classifier. To determine the superiority of a particular technique, a statistical test was carried out[6]. The results obtained were used to decide which of the techniques presented better performance with, for example, 95% certainty.
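The sketch below illustrates the partitioning scheme described above; stratified sampling stands in for the paper's uniform-distribution requirement, and all names are illustrative rather than taken from the original experiments.

```python
from sklearn.model_selection import train_test_split

def make_partition(X, y, seed, with_validation=True):
    """One of the 10 random partitions. Stratification keeps a uniform
    class distribution for the measurement (S, C or A) analysed."""
    # 20% of the samples are always reserved for tests.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.20, stratify=y, random_state=seed)
    if not with_validation:
        # SVM case: 80% for training and validation, 20% for tests.
        return (X_rest, y_rest), (X_test, y_test)
    # ANN / C4.5 case: 60/20/20. A quarter of the remaining 80%
    # equals 20% of the whole set.
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)

# X: the 426 Gabor feature vectors; y: the classes of one measurement.
# partitions = [make_partition(X, y, seed) for seed in range(10)]
```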
Table 2. Results for the second combination module (SVM as final classifier).

Measurement | Total of Exemplars | % Error | Test Standard Deviation
S           | 82                 | 32.35%  | ±1.76%
C           | 83                 | 19.20%  | ±1.43%
A           | 70                 | 36.50%  | ±2.20%
Table 3. Results for the third combination module (SVM as final classifier).

Measurement | Total of Exemplars | % Error | Test Standard Deviation
S           | 82                 | 29.40%  | ±2.01%
C           | 83                 | 19.40%  | ±1.70%
A           | 70                 | 36.05%  | ±2.14%
The main task is to determine whether the difference between two techniques, As and Ap, is relevant or not, assuming a normal distribution of the error rates[19]. For this, the mean and the standard deviation of the error rates are calculated according to Equations 1 and 2, respectively; the absolute difference of the standard deviations is obtained by Equation 3[8].

\[ \mathrm{mean}(As - Ap) = \mathrm{mean}(As) - \mathrm{mean}(Ap) \qquad (1) \]

\[ \mathrm{sd}(As - Ap) = \sqrt{\frac{\mathrm{sd}(As)^2 + \mathrm{sd}(Ap)^2}{2}} \qquad (2) \]

\[ t_{calc} = \mathrm{ad}(As - Ap) = \frac{\mathrm{mean}(As - Ap)}{\mathrm{sd}(As - Ap)} \qquad (3) \]
The initial null hypothesis is H0: As = Ap, with the alternative hypothesis H1: As ≠ Ap. If ad(As − Ap) > 0, then Ap is better than As; if ad(As − Ap) ≥ 2.00 (boundary of the acceptance region), then Ap is better than As with 95% certainty. On the other hand, if ad(As − Ap) ≤ 0, then As is better than Ap, and if ad(As − Ap) ≤ −2.00, then As is better than Ap with 95% certainty. The acceptance region AR: (−2.00, 2.00) for these experiments is based on the Student's t distribution table[6]. In order to compare the efficiency of the classifier combination, two statistical tests were made, comparing the performance of modules 2 and 3, which presented the best results, with the SVMs, which presented the best individual results in the experiments reported in Netto[17]. Table 4 presents the statistical test comparing the second combination module (Table 2) with the best results obtained by the individual SVM in Netto[17]. These results show that the SVM-based combination module achieved better results than the individual SVM, with more than 95% certainty, for the three measurements (S, C and A) analyzed.
Table 4. Results of the statistical comparison of the individual SVM and the second combination module.

Average error        | S            | C            | A
SVM (As)             | 0.622 ±0.011 | 0.421 ±0.010 | 0.814 ±0.016
SVM combination (Ap) | 0.323 ±0.017 | 0.193 ±0.014 | 0.365 ±0.022

Measurement | ad(As − Ap) | Certainty | Acceptance region | Hypothesis H1
S           | 20.88       | 95%       | (−2.00, 2.00)     | Accepted
C           | 18.74       | 95%       | (−2.00, 2.00)     | Accepted
A           | 23.34       | 95%       | (−2.00, 2.00)     | Accepted
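As a check, the sketch below reproduces the test of Table 4 for the S measurement, directly from Equations (1)-(3); the numbers are taken from the table, everything else is illustrative.

```python
import numpy as np

def compare(mean_as, sd_as, mean_ap, sd_ap, boundary=2.00):
    """Statistical test of Section 5: H0: As = Ap vs. H1: As != Ap,
    with the acceptance region (-2.00, 2.00)."""
    mean_diff = mean_as - mean_ap                      # Equation (1)
    sd_diff = np.sqrt((sd_as ** 2 + sd_ap ** 2) / 2)   # Equation (2)
    ad = mean_diff / sd_diff                           # Equation (3)
    if ad >= boundary:
        return ad, "Ap better than As with 95% certainty"
    if ad <= -boundary:
        return ad, "As better than Ap with 95% certainty"
    return ad, "difference not significant at 95%"

# S measurement of Table 4: individual SVM (As) vs. module 2 (Ap).
ad, verdict = compare(0.622, 0.011, 0.323, 0.017)
print(round(ad, 2), verdict)   # 20.88 -> Ap better with 95% certainty
```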
Table 5 presents the statistical test comparing the third combination module (Table 3) with the best results obtained by the individual SVM in Netto[17]. These results show that the SVM-based combination module again achieved better results than the individual SVM, with more than 95% certainty, for the three measurements (S, C and A) analyzed.

Table 5. Results of the statistical comparison of the individual SVM and the third combination module.

Average error        | S            | C            | A
SVM (As)             | 0.622 ±0.011 | 0.421 ±0.010 | 0.814 ±0.016
SVM combination (Ap) | 0.294 ±0.020 | 0.194 ±0.017 | 0.360 ±0.021

Measurement | ad(As − Ap) | Certainty | Acceptance region | Hypothesis H1
S           | 20.32       | 95%       | (−2.00, 2.00)     | Accepted
C           | 16.27       | 95%       | (−2.00, 2.00)     | Accepted
A           | 24.31       | 95%       | (−2.00, 2.00)     | Accepted
The results of the two statistical tests show that the classifier combination modules developed, based on an SVM as final classifier, improved the performance of the REMS in the analysis of myopia, hypermetropia and astigmatism when compared to each classifier applied individually.
6 Conclusions
This article reports the application of classifier combination to improve the performance of the REMS described in Netto[17]. The classifier combination uses ML techniques to carry out the analysis and improve the final performance achieved by the Analysis Module. The Analysis Module directly affects the system, so the performance of this module is critical. Classifier combination allowed this module, and thus the whole system, to become more refined.

The data set used in these experiments, HS images from the Optometry group of Indiana University (USA), presents limitations. The images cover reduced measurement spectra: spherical (S) varies between -1.75D and +0.25D and cylindrical (C) between 0.0D and 1.25D (both with a resolution of 0.25D), with the axis (A) varying between 5° and 180° (with a resolution of 5°). Within these spectra there are few exemplars of each class. Another important limitation of the data set is that images of the same eye of the same patient had differences in the S, C and A measurements, possibly caused by errors in the acquisition process. The authors believe that a new data set without measurement errors, and with a larger number of representative exemplars uniformly distributed over the possible spectra of measurements (for example, S varying between -17.00D and +17.00D and C between 0.0D and 17.00D), would improve the performance obtained by the ML techniques individually and, consequently, by the classifier combination. Moreover, the set of images should have similar numbers of exemplars for each class. The absence of previous studies of this kind does not allow a comparison between the REMS proposed in this article and similar systems. Nevertheless, the results show that the quality of the data set is crucial for the analysis performance. In spite of the limitations of the data set used, it is relevant to notice that the classifier combination achieved its objective, increasing the general performance of the proposed system. The results obtained were relevant and may encourage future research investigating new approaches to improve the performance of the Analysis Module even further.
References

1. Bishop, C. M.: Neural Networks for Pattern Recognition. Oxford University Press (1996).
2. Broomhead, D. S., Lowe, D.: Multivariable functional interpolation and adaptive networks. Complex Systems 2 (1988) 321-355.
3. Chang, T., Kuo, C. J.: Texture Analysis and Classification with Tree-Structured Wavelet Transform. IEEE Transactions on Image Processing 2(4) (1993) 429-441.
4. Daugman, J. G.: Complete Discrete 2-D Gabor Transforms by Neural Networks for Image Analysis and Compression. IEEE Transactions on Acoustics, Speech, and Signal Processing 36(7) (1988) 1169-1179.
5. Gabor, D.: Theory of Communication. Journal of the Institution of Electrical Engineers 93 (1946) 429-457.
6. Mason, R., Gunst, R., Hess, J.: Statistical Design and Analysis of Experiments. John Wiley and Sons (1989) 330.
7. Mitchell, T.: Machine Learning. McGraw Hill (1997).
8. Monard, M., Baranauskas, J.: Concepts of machine learning. In: Intelligent Systems: Foundations and Applications, chapter 4. Ed. Manole (2003) 512 (in Portuguese).
9. Prampero, P. S., Carvalho, A. C. P. L. F.: Reconhecimento de imagens de navios utilizando Combinacao de Classificadores. Anais do IV Workshop de Teses e Dissertacoes - ICMC/USP (1999) 170-177.
10. Quinlan, J. R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA (1993).
11. Rumelhart, D. E., Hinton, G. E., Williams, R. J.: Learning internal representations by error propagation. In: Rumelhart, D. E., McClelland, J. L. (eds.): Parallel Distributed Processing, vol. 1. MIT Press (1986).
12. Haykin, S.: Neural Networks: A Comprehensive Foundation. 2nd edn. Prentice Hall (1999).
13. Smola, A. J., Bartlett, P., Scholkopf, B., Schuurmans, D.: Introduction to Large Margin Classifiers. Chapter 1 (1999) 1-28.
14. Sonka, M., Hlavac, V., Boyle, R.: Image Processing, Analysis, and Machine Vision. 2nd edn. PWS Publishing (1999).
15. Thibos, L. N.: Principles of Hartmann-Shack aberrometry. Wavefront Sensing Congress, Santa Fe (2000). (http://www.opt.indiana.edu/people/faculty/thibos/VSIA/VSIA2000_SH_tutorial_v2/index.htm)
16. Collobert, R., Bengio, S.: SVMTorch: Support Vector Machines for Large-Scale Regression Problems. Journal of Machine Learning Research 1 (2001) 143-160. (http://www.idiap.ch/learning/SVMTorch.html)
17. Netto, A. V.: Image processing and analysis for measuring ocular refraction errors. Doctoral thesis, University of Sao Paulo, Sao Carlos, SP, August 2003 (in Portuguese).
18. Vapnik, V. N., Chervonenkis, A.: On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications 16 (1968) 262-280.
19. Weiss, S. M., Indurkhya, N.: Predictive Data Mining: A Practical Guide. Morgan Kaufmann, San Francisco, CA (1998).