2012 IEEE EMBS International Conference on Biomedical Engineering and Sciences | Langkawi | 17th - 19th December 2012
Leukaemia Screening Based on Fuzzy ARTMAP and Simplified Fuzzy ARTMAP Neural Networks
A. S. Abdul Nasir, M. Y. Mashor
Electronic & Biomedical Intelligent Systems (EBItS) Research Group, School of Mechatronic Engineering, Universiti Malaysia Perlis, Campus Pauh Putra, 02600 Pauh, Perlis, Malaysia. Email: [email protected]
R. Hassan
Department of Haematology, School of Medical Sciences, Health Campus, Universiti Sains Malaysia, 16150 Kubang Kerian, Kelantan, Malaysia. Email: [email protected]
Abstract-Leukaemia is a life-threatening disease that has caused more deaths among patients below 20 years of age than the other types of cancer. Accurate and early detection of leukaemia is the main key to curing this disease effectively. Therefore, prompt analysis of the blood cells is essential for leukaemia screening. Currently, leukaemia screening is performed by haematologists, who analyze the blood cells under the microscope. Since the recognition of the blood cells is performed manually, it is a time-consuming and laborious procedure. As a step towards solving this problem, this paper presents the classification of three different types of white blood cells (WBCs), namely lymphoblast, myeloblast and normal cell, from Acute Lymphoblastic Leukaemia (ALL), Acute Myelogenous Leukaemia (AML) and normal blood samples by using the Fuzzy ARTMAP (FAM) and Simplified Fuzzy ARTMAP (SFAM) neural networks. Here, a total of 24 input features covering size, shape and colour have been extracted from each WBC nucleus and fed to both the FAM and SFAM networks. The performance of the two networks has been compared in order to find the classifier that classifies the WBCs best. Overall, the results indicate that the SFAM network produced the highest testing accuracy, 92.00%, using all of the extracted features, compared with 90.63% for the FAM network.

Keywords-Leukaemia; white blood cells; classification; fuzzy ARTMAP neural network; simplified fuzzy ARTMAP neural network

I. INTRODUCTION

Acute leukaemia is a fast-growing cancer involving the blood and bone marrow. This disease is characterized by a large number of abnormal WBCs in the body [1]. Both children and adults can develop acute leukaemia and, at this time, there is no real means of prevention for this disease. Acute leukaemia consists of two main types, which are ALL and AML [1]. Leukaemia has been reported to be the most common childhood cancer under the age of 15 in Malaysia, representing 47.6% and 45.5% of the ten most frequent cancers in males and females, respectively [2]. Moreover, this disease has caused more deaths among patients below 20 years of age than the other types of cancer [2].

The procedure for early detection of leukaemia is called a screening test. A screening test is conducted by haematologists to identify the presence of a cancer before an individual has any symptoms relating to the cancer itself [3]. In general, screening of leukaemia usually starts with a repeated complete blood count (CBC) [4]. If the count produces an abnormal result, observation of the WBCs under the light microscope is required to confirm the presence of leukaemia cells. Specific morphological features of the WBCs are analyzed to differentiate the types of acute leukaemia [4]. During this observation, the presence of lymphoblasts in the blood sample indicates ALL, while the presence of myeloblasts indicates AML [1]. Since it is performed manually, the process is exhausting and burdensome, with about 20 minutes needed to examine each slide.

With the aim of improving the reliability of diagnosis as well as decreasing the dependence on human interpretation, several previous studies have used the artificial neural network (ANN) for medical diagnosis. These applications include classification of blood cells [5] as well as diagnosis of cancers such as lung [6] and ovarian [7] cancers. In addition, several studies on classifying the types of leukaemia using ANN are also available. For instance, Aus et al. [8] proposed the classification of blast cells by using 11 different types of leukaemia. The classification result showed that approximately 70% of the leukaemia samples were classified correctly. Azuaje [9] proposed using the growing cell structures (GCS) and SFAM networks to differentiate between normal and diffuse large B-cell lymphoma subjects. The result showed that a recognition rate of 76% was achieved.
978-1-4673-1666-8/12/$31.00 ©2012 IEEE
Mohapatra and Patra [10] proposed an automated leukaemia detection method that utilizes fuzzy-based blood image segmentation and a support vector machine (SVM) to classify the lymphocytic cell nucleus as either lymphocyte or lymphoblast. Here, fuzzy-based two-stage colour segmentation was used to segment the WBCs in microscopic blood images. For feature extraction, four different types of features, namely fractal dimension, shape, texture and colour, were obtained from the WBC nucleus. The final result showed that an accuracy of 93% was achieved. Owing to the need for rapid analysis of blood cells for leukaemia, the current study utilizes artificial neural networks to classify individual WBCs as lymphoblast, myeloblast or normal cell based on features extracted from ALL, AML and normal blood samples.

II. ARTIFICIAL NEURAL NETWORK

The ARTMAP network was first developed in 1987 and allowed the supervised classification of binary input patterns [11]. The ARTMAP system was then refined by incorporating Fuzzy ART and named Fuzzy ARTMAP (FAM) [12]. In 1993, Kasuba proposed the Simplified Fuzzy ARTMAP (SFAM) network, which is a simplification of the FAM network. The network is a step ahead of FAM in reducing the computational and architectural redundancy of the FAM network [13]. Thus, this paper explores the potential of the SFAM network for classification as compared to the FAM network.

A. Fuzzy ARTMAP Neural Network
FAM incorporates four basic components: a pair of Fuzzy ART modules (ARTa and ARTb), an associative learning network, an internal controller module and a map field, Fab [12]. Fig. 1 represents the architecture of the FAM network. The FAM network is based on the ART algorithm with fuzzy logic operations embedded into its neurons. ARTa and ARTb produce compressed recognition codes to represent the class of their input vectors a and b, respectively. Vector a is the measured data vector and vector b is the prediction of a.

Figure 1. Architecture of Fuzzy ARTMAP [12].

During the learning phase, the input vector, I0, is presented to ARTa and the desired output vector, O0, is presented to ARTb [12]. The ARTa and ARTb modules classify the input and desired output vectors into categories, and the association from the ARTa category to the ARTb category is made by Fab. If O0 has been predicted wrongly, a mechanism called match tracking is triggered. This mechanism increases the vigilance parameter of ARTa, ρa, by a minimum value, forcing the input to be re-classified into an appropriate node in the output layer. ρa is set back to the baseline vigilance parameter at every learning trial.

B. Simplified Fuzzy ARTMAP Neural Network
The SFAM network has a simplified architecture when compared to the original Fuzzy ARTMAP network [11]. The main architectures of the SFAM network for classification of three categories or classes (C1, C2 and C3) are illustrated by the diagrams in Fig. 2 and 3. During learning, input data are presented to a complement coder, which involves normalization and complementation of the input. The expanded input vector, x, is then passed to the input layer. Once an input pattern is presented, an output node is formed to represent it, as shown in Fig. 2.

Figure 2. SFAM network after the first input pattern has been learned [9].

During the learning phase, the network weights form the associations between the input patterns and their associated categories over a number of adaptation steps. If a node in the output layer does not match the teaching category described in the category layer, a reset signal is generated at the output layer. If the expected output node does not exist, a new output node is created to classify the input, as shown in Fig. 3.

Figure 3. SFAM network after a number of learning steps [9].
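The SFAM learning steps described in this section (complement coding, creation of an output node for the first pattern, a reset when the winning node disagrees with the teaching category, and creation of a new node when no suitable node exists) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the class name `SimplifiedFuzzyARTMAP`, the choice parameter `alpha`, the learning rate `beta`, the match-tracking increment `epsilon` and the use of fast learning (`beta = 1`) are illustrative choices, the input features are assumed to be already normalized to [0, 1], and the toy data at the end are invented for demonstration only.

```python
import numpy as np


def complement_code(a):
    """Return the expanded vector [a, 1 - a] for features scaled to [0, 1]."""
    a = np.asarray(a, dtype=float)
    return np.concatenate([a, 1.0 - a])


class SimplifiedFuzzyARTMAP:
    """Illustrative SFAM classifier (hypothetical names and defaults)."""

    def __init__(self, vigilance=0.75, alpha=1e-6, beta=1.0, epsilon=1e-6):
        self.rho_base = vigilance   # baseline vigilance parameter
        self.alpha = alpha          # choice parameter (assumed value)
        self.beta = beta            # learning rate; 1.0 means fast learning
        self.epsilon = epsilon      # small increment used by match tracking
        self.weights = []           # one weight vector per output node
        self.labels = []            # category assigned to each output node

    def _activations(self, x):
        w = np.array(self.weights)
        overlap = np.minimum(x, w).sum(axis=1)          # |x AND w_j|
        choice = overlap / (self.alpha + w.sum(axis=1))
        match = overlap / x.sum()
        return choice, match

    def train_one(self, a, label):
        x = complement_code(a)
        rho = self.rho_base                 # vigilance restarts at baseline
        if self.weights:
            choice, match = self._activations(x)
            for j in np.argsort(-choice):   # strongest node first
                if match[j] < rho:
                    continue                # fails the vigilance test
                if self.labels[j] == label:
                    self.weights[j] = (self.beta * np.minimum(x, self.weights[j])
                                       + (1.0 - self.beta) * self.weights[j])
                    return
                rho = match[j] + self.epsilon   # match tracking after a reset
        # No existing node is suitable: create a new output node.
        self.weights.append(x.copy())
        self.labels.append(label)

    def fit(self, X, y, epochs=1):
        for _ in range(epochs):
            for a, label in zip(X, y):
                self.train_one(a, label)
        return self

    def predict(self, X):
        out = []
        for a in X:
            choice, _ = self._activations(complement_code(a))
            out.append(self.labels[int(np.argmax(choice))])
        return np.array(out)


# Toy data (invented): two normalized features, three categories.
X = np.array([[0.10, 0.20], [0.15, 0.25], [0.80, 0.90],
              [0.85, 0.80], [0.50, 0.10], [0.55, 0.15]])
y = np.array([0, 0, 1, 1, 2, 2])
model = SimplifiedFuzzyARTMAP(vigilance=0.75).fit(X, y, epochs=2)
```

On this toy set the sketch creates one output node per category. In practice the number of nodes depends on the vigilance value and on the order in which the patterns are presented.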
III. METHODOLOGY FOR LEUKAEMIA SCREENING
This section discusses the methods and procedures that have been developed for leukaemia screening. It comprises four main steps, as described in the following subsections.

A. Image Acquisition
In this research, 800 digital images have been acquired from the ALL, AML and normal blood samples, comprising 200, 300 and 300 images, respectively. The blood samples were obtained from Hospital Universiti Sains Malaysia (HUSM). The ALL, AML and normal slides were analyzed using a Leica DLMA microscope. The images were then captured in BMP format at a resolution of 800 x 600 pixels using the Infinity-2 digital camera.

B. Image Segmentation
Based on visual analysis of the WBCs in several leukaemia images, it can be observed that characteristics such as the size and shape of the nucleus are predominantly used for WBC characterization. Based on this observation, the nucleus of the WBC is segmented from the image background. The current study utilizes the methodology for nucleus segmentation described in [14]. In brief, colour image segmentation of the WBCs is performed in the HSI (hue, saturation, intensity) colour space in order to utilize the colour content of the image. Here, a combination of dark stretching and thresholding techniques is applied to obtain the fully segmented WBC nucleus from the leukaemia image.
Figure 4. Original images: (a) Normal_1, (b) Normal_2, (c) ALL, (d) AML.

Figure 5. Results of images for segmented WBC nucleus: (a) Normal_1, (b) Normal_2, (c) ALL, (d) AML.

Figure 6. A single WBC in blood image: (a) normal WBC, (b) lymphoblast, (c) myeloblast.

Fig. 4(a)-(d) show the original ALL, AML and normal blood images. Meanwhile, the results of the segmented WBC nucleus for the ALL, AML and normal blood images are shown in Fig. 5(a)-(d). A single normal WBC, lymphoblast and myeloblast are depicted in Fig. 6(a), (b) and (c), respectively. Based on Fig. 6, differences in size and shape can be observed among the three different WBCs.

C. Feature Extraction of the White Blood Cells
For this study, three main categories of features have been used for classification of the WBCs, namely the size, shape and colour features. These three categories provide useful information for classifying the WBC as lymphoblast, myeloblast or normal cell. In brief, the features covered by the three categories are as follows:

Size Features: Area and perimeter of the fully segmented WBC nucleus.
Shape Features: 2nd, 3rd and 4th central moments and affine invariant moments of the segmented WBC nucleus.
Colour Features: Mean and standard deviation of the red, green, blue and intensity colour components of the segmented WBC nucleus.

According to haematologists, the shape of the nucleus is one of the important features used for characterization of the WBC. For the shape features, the central moments and affine invariant moments are used to analyze the shape. Here, the 2nd to 4th order central moments are extracted based on the following equation [15]:

$$\mu_{pq} = \sum_{x=1}^{X} \sum_{y=1}^{Y} (x - x_c)^{p} (y - y_c)^{q} f(x, y) \qquad (1)$$

where

$$x_c = \frac{1}{A_n} \sum_{x=1}^{X} \sum_{y=1}^{Y} x f(x, y) \qquad (2)$$

$$y_c = \frac{1}{A_n} \sum_{x=1}^{X} \sum_{y=1}^{Y} y f(x, y) \qquad (3)$$
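As an illustration of the shape features, the sketch below evaluates the central moments of Equations 1 to 3 on a segmented nucleus image with NumPy. Several points are assumptions rather than details confirmed by this paper: `f` is taken to be a 2-D array (for example a binary nucleus mask), the normalizing term A_n is taken to be the sum of f(x, y) (the nucleus area for a binary mask), x indexes columns and y indexes rows starting from 1, and the two affine invariant moments are written in the standard Flusser and Suk form. The function names are hypothetical. Enumerating every central moment of order 2, 3 and 4 gives 12 values, which together with the two affine invariants yields 14 shape features; this matches the number of shape features used in this study, but the exact enumeration used by the authors is not stated.

```python
import numpy as np


def central_moment(f, p, q):
    """Central moment mu_pq of a 2-D image f about its centroid."""
    f = np.asarray(f, dtype=float)
    # x runs over columns 1..X and y over rows 1..Y (assumed convention).
    x, y = np.meshgrid(np.arange(1, f.shape[1] + 1),
                       np.arange(1, f.shape[0] + 1))
    a_n = f.sum()                  # normalizing term A_n (assumed sum of f)
    xc = (x * f).sum() / a_n       # centroid x-coordinate
    yc = (y * f).sum() / a_n       # centroid y-coordinate
    return ((x - xc) ** p * (y - yc) ** q * f).sum()


def affine_invariants(f):
    """First two affine moment invariants (standard Flusser-Suk form, assumed)."""
    def mu(p, q):
        return central_moment(f, p, q)

    m00 = mu(0, 0)
    i1 = (mu(2, 0) * mu(0, 2) - mu(1, 1) ** 2) / m00 ** 4
    i2 = (mu(3, 0) ** 2 * mu(0, 3) ** 2
          - 6 * mu(3, 0) * mu(2, 1) * mu(1, 2) * mu(0, 3)
          + 4 * mu(3, 0) * mu(1, 2) ** 3
          + 4 * mu(2, 1) ** 3 * mu(0, 3)
          - 3 * mu(2, 1) ** 2 * mu(1, 2) ** 2) / m00 ** 10
    return i1, i2


def shape_features(f):
    """All central moments of order 2 to 4 plus the two affine invariants."""
    moments = [central_moment(f, p, order - p)
               for order in (2, 3, 4) for p in range(order + 1)]
    return np.array(moments + list(affine_invariants(f)))
```

For a binary mask, `central_moment(mask, 0, 0)` equals the pixel count of the nucleus, so the same helper also gives the area used as a size feature.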
In Equation 1, the moment order is given by the sum of the powers (p + q). Meanwhile, the 1st and 2nd affine invariant moments can be computed by using Equations 4 and 5, respectively [5].

$$L_1 = \frac{1}{\mu_{00}^{4}} \left( \mu_{02}\mu_{20} - \mu_{11}^{2} \right) \qquad (4)$$

$$L_2 = \frac{1}{\mu_{00}^{10}} \left( \mu_{03}^{2}\mu_{30}^{2} - 6\mu_{03}\mu_{12}\mu_{21}\mu_{30} + 4\mu_{12}^{3}\mu_{30} + 4\mu_{03}\mu_{21}^{3} - 3\mu_{12}^{2}\mu_{21}^{2} \right) \qquad (5)$$

The number of categories, C, has been set to 3 for classifying the three different types of WBC in the blood sample. Here, the three category nodes have been set as lymphoblast (0), myeloblast (1) and normal cell (2). For each classification using the FAM and SFAM networks, two important training parameters are required to obtain the optimum classification performance, namely the number of epochs and the vigilance parameter. Here, the optimal number of epochs and the optimal value of the vigilance parameter are those at which the networks attain their highest classification performance. The analysis for each classification begins by searching for the epoch number that gives the best classification performance, with the vigilance parameter set to 0.75. After the optimal epoch number has been obtained, the analysis continues by finding a suitable value of the vigilance parameter.
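The two-stage tuning procedure used in this study (first search for the best number of epochs with the vigilance parameter fixed at 0.75, then search for the best vigilance parameter at that number of epochs) can be sketched generically as below. The function `two_stage_search` and the stand-in scoring function `toy_accuracy` are hypothetical; in a real experiment `evaluate` would train a FAM or SFAM network with the given settings and return its testing accuracy. The candidate ranges (1 to 20 epochs, vigilance from 0 to 1 in steps of 0.01) are assumptions chosen only for this illustration.

```python
def two_stage_search(evaluate, epochs, vigilances, initial_vigilance=0.75):
    """Pick the best epoch count at a fixed vigilance, then the best vigilance."""
    best_epoch = max(epochs, key=lambda e: evaluate(e, initial_vigilance))
    best_vigilance = max(vigilances, key=lambda v: evaluate(best_epoch, v))
    return best_epoch, best_vigilance, evaluate(best_epoch, best_vigilance)


def toy_accuracy(epoch, vigilance):
    # Hypothetical stand-in for "train a network and return testing accuracy (%)".
    return 90.0 - abs(epoch - 3) - 10.0 * abs(vigilance - 0.65)


epoch_grid = range(1, 21)                        # assumed candidate epochs
vigilance_grid = [i / 100 for i in range(101)]   # assumed candidate vigilances
best = two_stage_search(toy_accuracy, epoch_grid, vigilance_grid)
```

Because the two parameters are tuned one after the other, this procedure explores far fewer settings than a full grid search, at the cost of possibly missing combinations in which a different epoch number would pair better with the final vigilance value.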
Colour is one of the important features that humans perceive while performing the screening process. Accordingly, the current study has also classified the WBCs using colour features. Eight colour features have been extracted from the WBC nucleus, consisting of the mean and standard deviation of the red, green, blue and intensity colour components.
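A minimal sketch of how the eight colour features could be computed from a segmented nucleus is given below. It assumes an RGB image array `rgb` of shape (height, width, 3) and a Boolean nucleus mask `mask` of shape (height, width); both names are hypothetical. The intensity component is assumed to be the HSI intensity I = (R + G + B) / 3, and the population standard deviation is used. Neither choice is stated explicitly in this paper.

```python
import numpy as np


def colour_features(rgb, mask):
    """Mean and standard deviation of R, G, B and intensity inside the mask."""
    rgb = np.asarray(rgb, dtype=float)
    pixels = rgb[np.asarray(mask, dtype=bool)]       # shape (n_pixels, 3)
    intensity = pixels.mean(axis=1, keepdims=True)   # I = (R + G + B) / 3 (assumed)
    channels = np.hstack([pixels, intensity])        # columns: R, G, B, I
    # Feature order: four means followed by four standard deviations.
    return np.concatenate([channels.mean(axis=0), channels.std(axis=0)])
```

The returned vector has eight entries, ordered as the means of R, G, B and I followed by the corresponding standard deviations.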
D. Classification of the White Blood Cells
After the feature extraction process, the extracted size, shape and colour features are further used for classification of the WBCs by two types of ANN, namely the Fuzzy ARTMAP and Simplified Fuzzy ARTMAP neural networks. The performance of the FAM and SFAM networks is compared based on the analyses carried out during the classification process. In this study, 2000 segmented WBCs have been obtained from 800 blood images. Here, the first 1200 WBCs have been used as training data, while the remaining 800 WBCs have been used as testing data. The distributions of the training and testing data are tabulated in Table I.

TABLE I. TRAINING AND TESTING DATA

| Classification of WBC | Training Data | Testing Data |
|---|---|---|
| Lymphoblast | 466 | 309 |
| Myeloblast | 459 | 307 |
| Normal | 275 | 184 |
| Total | 1200 | 800 |

The number of input nodes, I, for both FAM and SFAM networks varies depending on the number of input features that will be used.

IV. RESULTS AND DISCUSSIONS

The WBCs have been classified as lymphoblast, myeloblast and normal cell by two classifiers, namely the FAM and SFAM networks. The performance of the two networks has been compared in order to find the neural network that classifies the WBCs best. Here, the performance analysis for both networks is based on accuracy. In this study, a total of 24 extracted features covering size, shape and colour have been obtained from each WBC nucleus of the ALL, AML and normal blood images.

The classification results for the size, shape, colour and overall extracted features are shown in Fig. 7 to Fig. 10. Here, Fig. 7 and 8 show the classification results of the FAM network for different epoch numbers and different values of the vigilance parameter, respectively. Meanwhile, Fig. 9 and 10 show the classification results of the SFAM network for different epoch numbers and different values of the vigilance parameter, respectively. In these figures, the red, blue, pink and black lines indicate the classification performance for the size, shape, colour and overall extracted features, respectively.
Figure 7. Accuracy of testing versus different epoch numbers for classification between lymphoblast, myeloblast and normal WBC using FAM network. [Plot "Testing Accuracy vs Epoch": testing accuracy (%) against epoch (0 to 20) for the size, shape, colour and overall features.]

Figure 8. Accuracy of testing versus different values of vigilance parameter for classification between lymphoblast, myeloblast and normal WBC using FAM network. [Plot "Testing Accuracy vs Vigilance Parameter": testing accuracy (%) against vigilance parameter (0 to 1).]

Figure 9. Accuracy of testing versus different epoch numbers for classification between lymphoblast, myeloblast and normal WBC using SFAM network. [Plot "Testing Accuracy vs Epoch": testing accuracy (%) against epoch (0 to 20).]

Figure 10. Accuracy of testing versus different values of vigilance parameter for classification between lymphoblast, myeloblast and normal WBC using SFAM network. [Plot "Testing Accuracy vs Vigilance Parameter": testing accuracy (%) against vigilance parameter (0 to 1).]

TABLE II. THE PERFORMANCE OF CLASSIFICATION FOR THE THREE MAIN FEATURES BY USING THE FAM NETWORK

| Analyses | Size | Shape | Colour |
|---|---|---|---|
| Numbers of input | 2 | 14 | 8 |
| Numbers of epoch | 1 | 5 | 4 |
| Vigilance parameter | 0.77 | 0.55 | 0.70 |
| Training accuracy (%) | 82.92 | 99.33 | 100.00 |
| Testing accuracy (%) | 76.25 | 78.50 | 88.00 |
| Overall accuracy (%) | 80.25 | 91.00 | 95.20 |

TABLE III. THE PERFORMANCE OF CLASSIFICATION FOR THE THREE MAIN FEATURES BY USING THE SFAM NETWORK

| Analyses | Size | Shape | Colour |
|---|---|---|---|
| Numbers of input | 2 | 14 | 8 |
| Numbers of epoch | 6 | 6 | 4 |
| Vigilance parameter | 0.84 | 0.80 | 0.80 |
| Training accuracy (%) | 99.42 | 99.83 | 99.92 |
| Testing accuracy (%) | 78.25 | 78.75 | 90.50 |
| Overall accuracy (%) | 91.35 | 91.40 | 96.15 |

Tables II and III tabulate the classification performance for the three main feature categories using the FAM and SFAM networks, respectively. A total of 2, 14 and 8 extracted features representing the size, shape and colour categories, respectively, have been used for classification of the WBCs. Based on the classification results provided by the FAM network, the colour features produced the highest testing accuracy, 88.00%, followed by the shape features with 78.50% and the size features with 76.25%. In the classification results provided by the SFAM network, the colour features again produced the highest testing accuracy, 90.50%, followed by the shape features with 78.75% and the size features with 78.25%.

TABLE IV. THE PERFORMANCE OF COMPARISON BETWEEN THE FAM AND SFAM NETWORKS

| Analyses | Fuzzy ARTMAP | Simplified Fuzzy ARTMAP |
|---|---|---|
| Numbers of input | 24 | 24 |
| Numbers of epoch | 3 | 2 |
| Vigilance parameter | 0.65 | 0.78 |
| Training accuracy (%) | 99.75 | 98.58 |
| Testing accuracy (%) | 90.63 | 92.00 |
| Overall accuracy (%) | 96.10 | 95.95 |
Comparing the classification results for the three main feature categories in Tables II and III, the highest testing accuracy has been produced by the colour features. These are followed by the shape and size features, which differ only slightly in testing accuracy. In addition, it can be noticed that the classification results produced by the SFAM network are better than those provided by the FAM network. The current study also provides the results obtained when all 24 extracted features are given to the FAM and SFAM networks simultaneously. Table IV shows the performance comparison between the FAM and SFAM networks using the overall extracted features.
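The overall accuracy reported alongside the training and testing accuracies can be related to them by a short calculation. The sketch below assumes that overall accuracy is the sample-weighted mean over the 1200 training and 800 testing WBCs. This definition is an assumption (it is not spelled out in the paper), but it reproduces the Table IV values of 96.10% for FAM and 95.95% for SFAM. The function name `overall_accuracy` is hypothetical.

```python
def overall_accuracy(train_acc, test_acc, n_train=1200, n_test=800):
    """Sample-weighted mean of training and testing accuracy (both in %)."""
    return (train_acc * n_train + test_acc * n_test) / (n_train + n_test)


fam = overall_accuracy(99.75, 90.63)    # Table IV, Fuzzy ARTMAP
sfam = overall_accuracy(98.58, 92.00)   # Table IV, Simplified Fuzzy ARTMAP
print(f"FAM: {fam:.2f}%  SFAM: {sfam:.2f}%")
```

Under this assumption, the slightly higher overall accuracy of FAM comes from its higher training accuracy, whereas SFAM is ahead on the testing data, which is the measure used for comparison in this study.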
Referring to the classification results in Table IV, both networks have provided good classification performance, with testing accuracy above 90%. Here, the FAM network achieved its optimal performance, a testing accuracy of 90.63%, at 3 epochs and a vigilance parameter of 0.65. Meanwhile, the SFAM network achieved its optimal performance, a testing accuracy of 92.00%, at 2 epochs and a vigilance parameter of 0.78. Thus, the classification performance of the SFAM network is slightly better than that of the FAM network. In addition, the combination of the size, shape and colour features for classification of the three different WBCs has provided better classification results than each feature category alone. Overall, the SFAM network has proven to be the best, giving good classification performance while reducing the structural redundancy of the FAM network.

V. CONCLUSIONS

The FAM and SFAM networks have been used to classify the WBCs into three categories, namely lymphoblast, myeloblast and normal cell. A total of 24 extracted features covering size, shape and colour have been obtained from each WBC nucleus of the ALL, AML and normal blood images and used as input data to the FAM and SFAM networks. Overall, the results indicate that the SFAM network performs best, based on the testing accuracy obtained from the size, shape, colour and overall extracted features compared with the results provided by the FAM network. The results also show that both the FAM and SFAM networks are capable of classifying the WBCs into three categories for the purpose of leukaemia screening.

ACKNOWLEDGMENT
The authors would like to acknowledge and thank the members of the leukaemia research team at Universiti Malaysia Perlis (UniMAP) for making this research achievable, and Universiti Sains Malaysia (USM) for providing the acute leukaemia blood samples and validating the results.

REFERENCES
[1] S. Miwa, Atlas of Blood Cells. Bunkodo Co., Ltd., 1998.
[2] G. C. C. Lim, S. Rampal, and H. Yahaya, “Cancer incidence in Peninsular Malaysia,” The Third Report of the National Cancer Registry, Malaysia, 2008.
[3] T. N. Robinson. (2002). Screening [online]. Available: http://www.healthline.com/galecontent/screening.
[4] C. Reta, L. Altamirano, J. A. Gonzalez, R. Diaz, and J. S. Guichard, “Segmentation of bone marrow cell images for morphological classification of acute leukemia,” in Proceedings of the Twenty-Third International Florida Artificial Intelligence Research Society Conference (FLAIRS 2010), Florida, USA, 2010, pp. 86–91.
[5] A. S. Abdul Nasir, M. Y. Mashor, and H. Rosline, “Detection of acute leukaemia cells using variety of features and neural networks,” in The 5th Kuala Lumpur International Conference on Biomedical Engineering (BioMed 2011), Kuala Lumpur, Malaysia, 2011, pp. 40–46.
[6] T. Kondo, J. Ueno, and S. Takao, “Medical image diagnosis of lung cancer by hybrid multi-layered GMDH-type neural network using knowledge base,” in 2012 ICME International Conference on Complex Medical Engineering (CME), Japan, 2012, pp. 663–668.
[7] A. Thakur, V. Mishra, and S. K. Jain, “Feed forward artificial neural network: tool for early detection of ovarian cancer,” Scientia Pharmaceutica, vol. 79, pp. 493–505, 2011.
[8] H. M. Aus, H. Harms, V. T. Meulen, and U. Gunzer, “Statistical evaluation of computer extracted blood cell features for screening populations to detect leukemias,” in Pattern Recognition Theory and Applications, Springer-Verlag Berlin, Heidelberg, 1987, pp. 509–518.
[9] F. Azuaje, “Making genome expression data meaningful: prediction and discovery of classes of cancer through a connectionist learning approach,” in Proceedings of the 2000 IEEE International Symposium on Bio-Informatics and Biomedical Engineering, 2000, pp. 208–213.
[10] S. Mohapatra and D. Patra, “Automated leukemia detection using hausdorff dimension in blood microscopic images,” in 2010 International Conference on Emerging Trends in Robotics and Communication Technologies (INTERACT), Chennai, India, 2010, pp. 64–68.
[11] S. Rajasekaran and G. A. V. Pai, “Image recognition using simplified fuzzy ARTMAP augmented with a moment based feature extractor,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 14, pp. 1081–1095, 2000.
[12] G. A. Carpenter, S. Grossberg, N. Markuzon, J. H. Reynolds, and D. B. Rosen, “Fuzzy ARTMAP: a neural network architecture for incremental supervised learning of analog multidimensional maps,” IEEE Transactions on Neural Networks, vol. 3, pp. 698–713, 1992.
[13] P. Venkatesan and M. L. Suresh, “Classification of renal failure using simplified fuzzy adaptive resonance theory map,” International Journal of Computer Science and Network Security, pp. 129–134, 2009.
[14] A. N. Aimi Salihah, M. Y. Mashor, N. H. Harun, A. A. Abdullah, and H. Rosline, “Improving colour image segmentation on acute myelogenous leukaemia images using contrast enhancement techniques,” in 2010 IEEE EMBS Conference on Biomedical Engineering & Sciences (IECBES 2010), Kuala Lumpur, Malaysia, 2010, pp. 246–251.
[15] R. C. Gonzalez and R. E. Woods, Digital Image Processing. 3rd ed., Prentice Hall, 2007.