White Blood Cell Segmentation and Classification in Microscopic Bone Marrow Images
Nipon Theera-Umpon
Department of Electrical Engineering, Faculty of Engineering, Chiang Mai University, Chiang Mai 50200, Thailand
[email protected]
Abstract. An automatic segmentation technique for microscopic bone marrow white blood cell images is proposed in this paper. The technique segments each cell image into three regions, i.e., nucleus, cytoplasm, and background. We evaluate its performance by comparing its results with cell images manually segmented by an expert, using the probability of error in image segmentation as the evaluation measure. In the experiments, we achieve good segmentation performance for both the entire cell and the nucleus. The six-class cell classification problem is also investigated using the automatically segmented images. We extract four features from the segmented images: the cell area, the peak location of the pattern spectrum, and the first and second granulometric moments of the nucleus. Even though the boundaries between cell classes are not well defined and there are classification variations among experts, we achieve a promising classification performance using neural networks with five-fold cross validation.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3614, pp. 787–796, 2005. © Springer-Verlag Berlin Heidelberg 2005

1 Introduction
Differential counts, i.e., the counts of the different types of white blood cells, provide invaluable information to doctors in the diagnosis of several diseases. The traditional manual method of differential counting by an expert is tedious and time-consuming. Therefore, an automatic differential counting system is preferred. White blood cells are classified according to their maturation stages. Even though maturation is a continuous variable, white blood cells are classified into discrete classes. Because the boundaries between classes are not well defined, counts vary among different experts and even for the same expert. Cells in the myelocytic (or granulocytic) series can be classified into six classes, i.e., myeloblast, promyelocyte, myelocyte, metamyelocyte, band, and polymorphonuclear (PMN), ordered from the youngest to the oldest cells [1–2]. Samples of all six classes of white blood cells in the myelocytic series are shown in Figure 1. As the figure shows, many characteristics of cells change during their maturation. Most of the previously proposed methods followed the traditional manual procedures performed by an expert, i.e., locating a cell, extracting its features, classifying the applied to peripheral blood only. The differential counting problem in bone marrow
Fig. 1. Examples of cells in the myelocytic or granulocytic series: (a) Myeloblast, (b) Promyelocyte, (c) Myelocyte, (d) Metamyelocyte, (e) Band, and (f) PMN.
is much more difficult due to the high density of cells. Moreover, immature white blood cells are normally seen only in the bone marrow [2], and many types of bone marrow white blood cells may not be found in the blood. Therefore, differential counts in peripheral blood may not be enough for doctors to diagnose certain diseases. Our previous works were all applied to the problem in bone marrow, but were based on the assumption that manually-segmented images were available [9–13]. More specifically, we developed mixing theories of mathematical morphology and applied them to the bone marrow white blood cell differential counting problem [9–10]. We also developed a new training algorithm for neural networks in order to count the numbers of cells in different classes without classification [11,12].

There are several other studies on cell segmentation in the literature. Common techniques used in cell segmentation include thresholding [14,15], cell modeling [15–17], filtering and mathematical morphology [18], watershed clustering [6,17], and fuzzy sets [19]. It should be noted that only the segmentation techniques in [5], [6], and [19] are applied to bone marrow; the other techniques mentioned are applied to peripheral blood. It should also be noted that most studies focus on either segmentation or classification only. Just a few of them perform both segmentation and classification.

In this paper, we propose a technique to segment the nucleus and cytoplasm of bone marrow white blood cells. We generate patches in cell images by applying the fuzzy C-means (FCM) algorithm to oversegment cells. The patches in each oversegmented image are then combined to form three segments, i.e., nucleus, cytoplasm, and background.
The segmentation errors are evaluated by comparing the automatically segmented images to the corresponding images segmented by an expert using the probability of error in image segmentation. We also apply the outputs of the automatic segmentation technique to the cell classification problem using neural networks with five-fold cross validation. Four features are extracted from each automatically segmented image based on the area of the cell and the shape and size of its nucleus.

This paper is organized as follows. The fuzzy C-means clustering, morphological operations, and morphological granulometries are briefly introduced in the next section. The bone marrow white blood cell data set is described in Section 3. Section 4 explains the proposed segmentation technique and feature extraction. The experimental results are shown and discussed in Section 5. The conclusion is drawn in the final section.
2 Methodology
In this research, we apply the fuzzy C-means (FCM) algorithm and mathematical morphology to segment white blood cells. The FCM algorithm is applied to oversegment each cell image into patches. Cell and nucleus smoothing and small patch removal are done using binary morphological operations. Morphological granulometries are also applied to extract the shape and size of an object.

2.1 Fuzzy C-Means Algorithm
The fuzzy C-means clustering method is a well-known fuzzy clustering technique that is widely described in the literature [e.g., 20,21]. We briefly introduce it here. Consider a set of data X = {x1, x2, …, xn}, where xk is a vector. The goal is to partition the data into c clusters. Assume that we have a fuzzy pseudopartition P = {A1, A2, …, Ac}, where Ai contains the membership grades of all xk to cluster i. The centers of the c clusters can be calculated by

$$ \mathbf{v}_i = \frac{\sum_{k=1}^{n} [A_i(\mathbf{x}_k)]^m \, \mathbf{x}_k}{\sum_{k=1}^{n} [A_i(\mathbf{x}_k)]^m}, \quad i = 1, 2, \ldots, c, \tag{1} $$
where m > 1 is a real number that controls the effect of the membership grades. The performance index of a fuzzy pseudopartition P is defined by

$$ J_m(P) = \sum_{k=1}^{n} \sum_{i=1}^{c} [A_i(\mathbf{x}_k)]^m \, \|\mathbf{x}_k - \mathbf{v}_i\|^2, \tag{2} $$
where ‖·‖ is some inner product-induced norm. The clustering goal is to find a fuzzy pseudopartition P that minimizes the performance index Jm(P). The solution to this optimization problem was given by Bezdek in [21] and is now widely available in several textbooks.

2.2 Mathematical Morphology
Mathematical morphology was first introduced by Matheron in the context of random sets [22,23]. Morphological methods are used in many ways in image processing, for example, enhancement, segmentation, restoration, edge detection, texture analysis, and shape analysis [24,25]. Morphological operations are nonlinear, translation-invariant transformations. Because we consider only binary images in this research, we describe only binary morphological operations. The basic morphological operations involving an image S and a structuring element E are

$$ \text{erosion:} \quad S \ominus E = \bigcap \{ S - e : e \in E \}, \tag{3} $$

$$ \text{dilation:} \quad S \oplus E = \bigcup \{ E + s : s \in S \}, \tag{4} $$
where ∩ and ∪ denote set intersection and union, respectively, and A + x denotes the translation of a set A by a point x. The closing and opening operations, derived from erosion and dilation, are defined by

$$ \text{closing:} \quad S \bullet E = (S \oplus (-E)) \ominus (-E), \tag{5} $$

$$ \text{opening:} \quad S \circ E = (S \ominus E) \oplus E, \tag{6} $$
where –E = {–e: e ∈ E} denotes the 180° rotation of E about the origin. To compute a granulometry, we successively apply the opening operation to an image with a structuring element of increasing size, which progressively diminishes the image. Let Ω(t) be the area of the opening S ∘ tE, where t is a real number and Ω(0) is the area of S. Ω(t) is called a size distribution. The normalized size distribution Φ(t) = 1 – Ω(t)/Ω(0) and its derivative dΦ(t)/dt are called the granulometric size distribution or pattern spectrum of S. The moments of the pattern spectrum are called granulometric moments.
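As a concrete illustration of Section 2.1, the following is a minimal sketch of the standard alternating FCM updates for scalar data such as gray levels, using equation (1) for the centers and the usual membership update that minimizes equation (2). The function names `fcm_1d` and `performance_index`, the evenly spaced initialization, the iteration limit, and the stopping tolerance are assumptions made for this example and are not taken from the paper.

```python
def fcm_1d(data, c, m=2.0, max_iter=100, tol=1e-6):
    """Fuzzy C-means for scalar data; returns (centers, memberships)."""
    lo, hi = min(data), max(data)
    # Assumed initialization: centers spread evenly over the data range.
    centers = [lo + (hi - lo) * (i + 0.5) / c for i in range(c)]
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(max_iter):
        # Membership update: A_i(x_k) = 1 / sum_j (d_ik / d_jk)^(2/(m-1)).
        memberships = []
        for x in data:
            dists = [abs(x - v) for v in centers]
            if min(dists) == 0.0:
                # x coincides with a center: full membership in that cluster.
                hit = dists.index(0.0)
                row = [1.0 if i == hit else 0.0 for i in range(c)]
            else:
                row = [1.0 / sum((d_i / d_j) ** power for d_j in dists)
                       for d_i in dists]
            memberships.append(row)
        # Center update, equation (1).
        new_centers = []
        for i in range(c):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            if total == 0.0:
                new_centers.append(centers[i])  # empty cluster: keep old center
            else:
                new_centers.append(
                    sum(w * x for w, x in zip(weights, data)) / total)
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


def performance_index(data, centers, memberships, m=2.0):
    """Equation (2) for scalar data with the Euclidean norm."""
    return sum(memberships[k][i] ** m * (x - centers[i]) ** 2
               for k, x in enumerate(data) for i in range(len(centers)))
```

For an image, `data` would be the list of filtered pixel intensities; each pixel can then be assigned to the cluster in which its membership grade is largest.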
3 White Blood Cell Data Set
In the experiments, we use grayscale bone marrow images collected at the University of Missouri Ellis-Fischel Cancer Center. Each white blood cell image was cropped manually to form a single-cell image. Then, each single-cell image was segmented manually into nucleus, cytoplasm, and background regions. The images were classified by Dr. C. William Caldwell, Professor of Pathology and Director of the Pathology Labs at the Ellis-Fischel Cancer Center. The data set consists of six classes of white blood cells from the myelocytic series: myeloblast, promyelocyte, myelocyte, metamyelocyte, band, and PMN. After eliminating the images that do not contain entire cells, we end up with 20, 9, 116, 31, 38, and 162 manually-segmented images for the six cell classes, respectively, i.e., 376 images in total. Each manually-segmented image is composed of three regions (nucleus, cytoplasm, and background) with gray levels of 0, 176, and 255, respectively. The manually-segmented images corresponding to the cells shown in Figure 1 are shown in Figure 2.
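The encoding of the manually-segmented images described above can be decoded with a few lines of code. The sketch below assumes an image is given as a list of rows of gray levels; the names `GRAY_TO_REGION`, `CLASS_COUNTS`, and `region_masks` are hypothetical and chosen only for this illustration.

```python
# Gray levels used in the manually-segmented images (from the text above).
GRAY_TO_REGION = {0: "nucleus", 176: "cytoplasm", 255: "background"}

# Number of manually-segmented images per class (from the text above).
CLASS_COUNTS = {
    "myeloblast": 20,
    "promyelocyte": 9,
    "myelocyte": 116,
    "metamyelocyte": 31,
    "band": 38,
    "PMN": 162,
}


def region_masks(image):
    """Split a manually-segmented image into one boolean mask per region."""
    return {name: [[pixel == level for pixel in row] for row in image]
            for level, name in GRAY_TO_REGION.items()}
```

The entire-cell mask used later in the evaluation is the union of the nucleus and cytoplasm masks.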
Fig. 2. Corresponding manually-segmented images of cells shown in Figure 1: (a) Myeloblast, (b) Promyelocyte, (c) Myelocyte, (d) Metamyelocyte, (e) Band, and (f) PMN.
4 Proposed Techniques
We propose a white blood cell segmentation technique that segments an image into three regions, i.e., nucleus, cytoplasm, and background. In this research, everything
in an image except the cell of interest is considered background. We also introduce the features extracted from each segmented cell image.

4.1 White Blood Cell Segmentation Technique
We first apply a 15×15 median filter to each cell image to reduce the intensity inconsistency within each region of a cell. This inconsistency is a significant problem in this data set because the images are grayscale. The filtered images are then oversegmented using FCM clustering. We heuristically set the parameter m to 2 and the number of clusters c to 10. Each patch is formed by connected pixels that belong to the same cluster.

After oversegmentation, the patches are combined to form images with three segments: nucleus, cytoplasm, and background. The patch combining is based on the FCM centers. If the center of a patch is less than 60% of the mean of all centers (dark), the patch is labeled as nucleus. If the center of the patch is greater than 60% but less than 150% of the mean of the FCM centers (somewhat dark), the patch is labeled as cytoplasm. Otherwise (bright), it is labeled as background. It should be noted that the list of FCM centers is dynamic. If a patch is considered nucleus or cytoplasm but touches the image border, it is labeled as background (such a patch is taken to belong to another cell) and the corresponding FCM center is discarded from the list. This helps in cases where the cell of interest is brighter than the surrounding cells. Finally, morphological operations, i.e., opening followed by closing, both with a disk structuring element with a diameter of five pixels, are applied to remove small patches and smooth the edges.
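The patch-combining rule described above can be sketched as follows. This is one possible reading of the rule, not the author's implementation: it assumes that each patch is represented by a hypothetical pair `(cluster_index, touches_border)`, that the mean is recomputed over the centers remaining in the list each time a patch is visited, and that a center exactly at a threshold falls into the brighter category. The function name `label_patches` is also an assumption.

```python
def label_patches(centers, patches):
    """Label patches as 'nucleus', 'cytoplasm', or 'background'.

    centers: FCM cluster centers (gray levels).
    patches: list of (cluster_index, touches_border) pairs.
    Returns the labels in the same order as `patches`.
    """
    active = dict(enumerate(centers))  # the dynamic list of FCM centers
    labels = [None] * len(patches)
    # Visit patches from the darkest cluster center to the brightest.
    order = sorted(range(len(patches)), key=lambda p: centers[patches[p][0]])
    for p in order:
        cluster, touches_border = patches[p]
        center = centers[cluster]
        mean = sum(active.values()) / len(active)
        if center < 0.6 * mean:
            label = "nucleus"
        elif center < 1.5 * mean:
            label = "cytoplasm"
        else:
            label = "background"
        if label != "background" and touches_border:
            # Taken to belong to another cell: relabel as background and
            # discard its center from the list used to compute the mean.
            label = "background"
            if len(active) > 1:
                active.pop(cluster, None)
        labels[p] = label
    return labels
```

For example, with centers `[20, 70, 100, 200]` and a dark border-touching patch in the first cluster, discarding the center 20 raises the mean from 97.5 to about 123.3, so the patch with center 70 is labeled nucleus rather than cytoplasm. This is the situation the text describes, in which the cell of interest is brighter than its neighbors.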
The algorithm of the proposed technique is summarized as follows:
  Apply median filter to input image
  Apply FCM algorithm to the filtered image
  Sort FCM centers in ascending order
  For each patch corresponding to sorted FCM centers (from dark to bright)
    If (FCM center of patch) < (60% of mean of centers), then label patch as nucleus
    If (60% of mean of centers) < (FCM center of patch) < (150% of mean of centers), then label patch as cytoplasm
    If patch is labeled as nucleus or cytoplasm but it touches image border, then label patch as background and discard the FCM center from the list
  End (For each patch)
  Apply opening followed by closing to nucleus region
  Apply opening followed by closing to cytoplasm region
  Combine nucleus and cytoplasm regions

4.2 Features of Segmented Cell Images
After segmenting each cell image, four features are extracted from the segmented image to form a feature vector for a classifier. Because a cell becomes smaller and the size and shape of its nucleus change as it matures, we extract the features accordingly. One feature is extracted from the entire cell segmentation, i.e., the entire cell area. The three remaining features are extracted from the pattern spectrum of the nucleus of each cell, i.e., the pattern spectrum peak location, the first
granulometric moment, and the second granulometric moment. These last three features carry the size and shape information of the cell's nucleus. In the experiments, we use a small disk with a diameter of four pixels as the structuring element.
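The nucleus features can be illustrated with a set-based sketch that follows equations (3), (4), and (6) directly, representing a binary region as a set of (row, column) pixels. Several choices here are assumptions made for illustration and are not stated in the paper: the structuring elements are digital disks of integer radius t = 0, 1, 2, ..., the derivative of Φ(t) is replaced by a unit-step difference, and the moments are raw (non-central) moments. All function names are hypothetical.

```python
def disk(radius):
    """Digital disk of integer radius centered at the origin."""
    r = int(radius)
    return {(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
            if dy * dy + dx * dx <= r * r}


def erode(S, E):
    """Equation (3): all points x such that x + e is in S for every e in E."""
    e0 = next(iter(E))
    candidates = {(s[0] - e0[0], s[1] - e0[1]) for s in S}
    return {x for x in candidates
            if all((x[0] + e[0], x[1] + e[1]) in S for e in E)}


def dilate(S, E):
    """Equation (4): union of the translates E + s over s in S."""
    return {(s[0] + e[0], s[1] + e[1]) for s in S for e in E}


def opening(S, E):
    """Equation (6): erosion followed by dilation."""
    return dilate(erode(S, E), E)


def pattern_spectrum(S, max_radius):
    """Discrete pattern spectrum Phi(t) - Phi(t - 1) for t = 1..max_radius.

    S must be non-empty.
    """
    area0 = len(S)
    omega = [len(opening(S, disk(t))) for t in range(max_radius + 1)]
    phi = [1.0 - area / area0 for area in omega]
    return [phi[t] - phi[t - 1] for t in range(1, max_radius + 1)]


def nucleus_features(nucleus, max_radius):
    """Peak location and first and second raw moments of the spectrum."""
    spectrum = pattern_spectrum(nucleus, max_radius)
    sizes = range(1, max_radius + 1)
    peak = max(sizes, key=lambda t: spectrum[t - 1])
    m1 = sum(t * p for t, p in zip(sizes, spectrum))
    m2 = sum(t * t * p for t, p in zip(sizes, spectrum))
    return peak, m1, m2
```

Under the same representation, the fourth feature, the entire cell area, is simply the number of pixels in the cell region (nucleus plus cytoplasm).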
5 Experimental Results
Figure 3 shows examples of the outputs at each stage of the proposed cell segmentation technique. The original grayscale image, the corresponding oversegmented image, and the final automatically segmented image are depicted in Figure 3(a)-(c), respectively. We also show examples of automatic segmentation results along with the original grayscale and the expert's manually-segmented images of all cell classes in Figure 4. Visually, we achieve good overall segmentation results. In some cases, our results differ from the expert's manually-segmented images but are still acceptable. For example, for the promyelocyte shown in Figure 4, it is hard to define the true boundary of the nucleus. To evaluate the segmentation technique numerically, we use the probability of error (PE) in image segmentation, defined as

$$ PE = P(O)\,P(B \mid O) + P(B)\,P(O \mid B), \tag{7} $$
where P(O) and P(B) are the a priori probabilities of objects and background in images, P(B|O) is the probability of error in classifying objects as background, and P(O|B) is the probability of error in classifying background as objects [26],[27]. This is basically the degree of disagreement between the algorithm and an expert. In the experiment, we compute the PE of each automatically segmented image against the corresponding expert's manually-segmented image. We consider two objects of interest, i.e., the nucleus and the entire cell (nucleus + cytoplasm). The overall segmentation error is calculated by averaging the errors of all 376 cell images. We achieve overall PEs of 9.62% and 8.82% for nucleus and entire cell segmentation, respectively. To evaluate the segmentation performance in each cell class, we calculate the class-wise PE by averaging the errors in each class. The segmentation errors for nucleus and entire cell segmentation in each class are shown in Tables 1 and 2, respectively.
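Equation (7) can be computed per image as in the sketch below. It assumes that both segmentations are given as equally sized binary masks (lists of rows, with a truthy value marking the object) and that the priors P(O) and P(B) are estimated from the expert's mask; the paper does not state how the priors are obtained, so this is an assumption, as is the function name `probability_of_error`.

```python
def probability_of_error(expert, algorithm):
    """PE of equation (7) for two equally sized binary masks."""
    pairs = [(bool(e), bool(a))
             for row_e, row_a in zip(expert, algorithm)
             for e, a in zip(row_e, row_a)]
    n = len(pairs)
    n_obj = sum(1 for e, _ in pairs if e)
    n_bg = n - n_obj
    # Object pixels that the algorithm labeled as background.
    missed = sum(1 for e, a in pairs if e and not a)
    # Background pixels that the algorithm labeled as object.
    false_obj = sum(1 for e, a in pairs if not e and a)
    p_o, p_b = n_obj / n, n_bg / n
    p_b_given_o = missed / n_obj if n_obj else 0.0
    p_o_given_b = false_obj / n_bg if n_bg else 0.0
    return p_o * p_b_given_o + p_b * p_o_given_b
```

With the priors estimated this way, PE reduces to the fraction of pixels on which the two masks disagree, which matches its interpretation in the text as the degree of disagreement between the algorithm and the expert.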
Fig. 3. Examples of (a) grayscale image of a myelocyte, (b) corresponding oversegmented image, and (c) corresponding algorithm's segmented image
[Figure 4 panels: rows Myeloblast, Promyelocyte, Myelocyte, Metamyelocyte, Band, PMN; columns Grayscale image, Expert's manually-segmented image, Algorithm's segmented image]
Fig. 4. Examples of grayscale, corresponding manually-segmented, and automatically segmented images of six classes of bone marrow white blood cells

Table 1. Class-wise probability of error in nucleus segmentation (%)
Cell class   Myeloblast   Promyelocyte   Myelocyte   Metamyelocyte   Band   PMN
PE           10.01        16.75          13.89       6.69            9.26   7.60
Table 2. Class-wise probability of error in entire cell segmentation (%)
Cell class   Myeloblast   Promyelocyte   Myelocyte   Metamyelocyte   Band   PMN
PE           6.88         8.77           8.38        8.90            9.15   9.29
From Table 1, the PE in nucleus segmentation tends to be smaller for more mature classes. This is because the nucleus boundary of a more mature cell is better defined than that of a younger cell; the intensity contrast between nucleus and cytoplasm is higher when a cell becomes more mature. The PEs in entire cell segmentation are similar among all six classes. However, the overall PE in entire cell segmentation is smaller than that in nucleus segmentation. This is because, in entire cell segmentation, we discriminate the entire cell from the background, and the similarity between a cell region and the background is less than that between nucleus and cytoplasm. It should be noted that, in this case, background means everything except the cell of interest. Hence, parts of other cells and red cells can also cause nucleus and entire cell segmentation errors.

Good segmentation performance is not our final goal. We further apply the automatic segmentation results to automatic cell classification. To justify the use of the automatically segmented images, we classify the cells using one of the most popular classifiers, i.e., neural networks. The neural networks used in the experiments consist of one hidden layer with ten hidden neurons. Because the cell data set is not divided into training and test sets, we perform five-fold cross validation. We calculate the pattern spectrum of the nucleus of each segmented image. Four features, i.e., cell area, pattern spectrum's peak location, and first and second granulometric moments, as described in Section 4.2, are extracted. The classification rates achieved using the automatically segmented images are 70.74% and 65.69% on the training and test sets, respectively, while those achieved using the manually-segmented images are 71.81% and 69.68% on the training and test sets, respectively. The classification rates achieved using the automatically segmented images are thus close to those achieved using the images segmented manually by the expert: 1.07 percentage points lower on the training sets and 3.99 percentage points lower on the test sets. These results show promising classification performance based on the results of the automatic cell segmentation.
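The five-fold cross validation used above can be organized as in the following sketch. The paper does not say how the folds were formed, so the random shuffling, the fixed seed, the absence of class stratification, and the function name `k_fold_indices` are all assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed: shuffle before splitting
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

With the 376 images of this data set, each fold holds out 75 or 76 images for testing and trains on the rest; the reported training and test rates would then be aggregated over the five folds.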
6 Conclusion
We develop an automatic segmentation technique for microscopic bone marrow white blood cell images, which is an important step in automatic white blood cell differential counting. Each cell image is segmented into three regions, i.e., nucleus, cytoplasm, and background. The proposed segmentation technique is evaluated by comparing its results with images manually segmented by an expert, using the probability of error (PE) in image segmentation. We consider the entire cell and its nucleus as the objects of interest. In the experiments, we achieve good segmentation performance, with an overall PE of less than 10% in both entire cell and nucleus segmentation. We further investigate the application of the automatically segmented images to the classification problem. Neural networks are chosen as our classifier, with four features extracted from the segmented images: the cell area, the peak location of the pattern spectrum, and the first and second granulometric moments of the nucleus. Promising performance is achieved for this six-class classification problem, in which cells from adjacent classes overlap heavily because the class boundaries are weakly defined. One possible improvement is the acquisition of color microscopic images, which would ease the segmentation problem considerably and, therefore, ease the classification problem.
Acknowledgments
This work is supported by the Ministry of University Affairs and the Thailand Research Fund under Contract MRG4680150. The author would like to thank Dr. C. William Caldwell of Ellis-Fischel Cancer Center, University of Missouri, for providing the data and the ground truth. We acknowledge the contribution of Dr. James Keller and Dr. Paul Gader through many technical discussions on this research.
References
1. Diggs L.W., Sturm D., Bell A.: The Morphology of Human Blood Cells. Abbott Laboratories, Abbott Park (1985)
2. Minnich V.: Immature Cells in the Granulocytic, Monocytic, and Lymphocytic Series. American Society of Clinical Pathologists Press, Chicago (1982)
3. Beksaç M., Beksaç M.S., Tipi V.B., Duru H.A., Karakas M.U., Çakar A.N.: An Artificial Intelligent Diagnostic System on Differential Recognition of Hematopoietic Cells From Microscopic Images. In: Cytometry. Vol. 30. (1997) 145–150
4. Harms H., Aus H., Haucke M., Gunzer U.: Segmentation of Stained Blood Cell Images Measured at High Scanning Density With High Magnification and High Numerical Aperture Optics. In: Cytometry. Vol. 7. (1986) 522–531
5. Park J., Keller J.: Fuzzy Patch Label Relaxation in Bone Marrow Cell Segmentation. In: IEEE Intl Conf on Syst, Man, Cybern. (1997) 1133–1138
6. Park J., Keller J.: Snakes on the Watershed. In: IEEE Trans Pattern Anal Mach Intell. Vol. 23. No. 10. (2001) 1201–1205
7. Poon S.S.S., Ward R.K., Palcic B.: Automated Image Detection and Segmentation in Blood Smears. In: Cytometry. Vol. 13. (1992) 766–774
8. Sohn S.: Bone Marrow White Blood Cell Classification. Master's Project, University of Missouri-Columbia (1999)
9. Theera-Umpon N., Gader P.D.: Counting White Blood Cells Using Morphological Granulometries. In: Journal of Electronic Imaging. Vol. 9. No. 2. (2000) 170–177
10. Theera-Umpon N., Dougherty E.R., Gader P.D.: Non-Homothetic Granulometric Mixing Theory with Application to Blood Cell Counting. In: Pattern Recognition. Vol. 34. No. 12. (2001) 2547–2560
11. Theera-Umpon N., Gader P.D.: Training Neural Networks to Count White Blood Cells via a Minimum Counting Error Objective Function. In: Proc 15th Intl Conf on Pattern Recog. (2000) 299–302
12. Theera-Umpon N., Gader P.D.: System Level Training of Neural Networks for Counting White Blood Cells. In: IEEE Trans Systems, Man, and Cybern Part C: App and Reviews. Vol. 32. No. 1. (2002) 48–53
13. Theera-Umpon N.: Automatic White Blood Cell Classification using Biased-Output Neural Networks with Morphological Features. In: Thammasat Intl Journal of Sci and Tech. Vol. 8. No. 1. (2003) 64–71
14. Cseke I.: A Fast Segmentation Scheme for White Blood Cell Images. In: Proc 11th IAPR Intl Conf on Image, Speech and Signal Analysis. (1992) 530–533
15. Liao Q., Deng Y.: An Accurate Segmentation Method for White Blood Cell Images. In: IEEE Intl Sym on Biomedical Imaging. (2002) 245–248
16. Nilsson B., Heyden A.: Model-Based Segmentation of Leukocytes Clusters. In: Proc 16th Intl Conf on Pattern Recognition. (2002) 727–730
17. Jiang K., Liao Q., Dai S.: A Novel White Blood Cell Segmentation Scheme Using Scale-Space Filtering and Watershed Clustering. In: Proc 2nd Intl Conf on Machine Learning and Cybern. (2003) 2820–2825
18. Anoraganingrum D.: Cell Segmentation with Median Filter and Mathematical Morphology Operation. In: Proc Intl Conf on Image Anal and Proc. (1999) 1043–1046
19. Sobrevilla P., Montseny E., Keller J.: White Blood Cell Detection in Bone Marrow Images. In: Proc 18th Intl Conf of the North American Fuzzy Info Proc Soc (NAFIPS). (1999) 403–407
20. Klir G.J., Yuan B.: Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall, New Jersey (1995)
21. Bezdek J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York (1981)
22. Matheron G.: Random Sets and Integral Geometry. Wiley, New York (1975)
23. Serra J.: Image Analysis and Mathematical Morphology. Academic Press, New York (1983)
24. Dougherty E.R.: An Introduction to Morphological Image Processing. SPIE Press, Bellingham, Washington (1992)
25. Dougherty E.R.: Random Processes for Image and Signal Processing. SPIE Press, Bellingham, Washington, and IEEE Press, New York (1999)
26. Lee S.U., Chung S.Y., Park R.H.: A Comparative Performance Study of Several Global Thresholding Techniques for Segmentation. In: Computer Vision, Graphics, and Image Processing. Vol. 52. No. 2. (1990) 171–190
27. Zhang X.-W., Song J.-Q., Lyu M.R., Cai S.-J.: Extraction of Karyocytes and Their Components from Microscopic Bone Marrow Images Based on Regional Color Features. In: Pattern Recognition. Vol. 37. No. 2. (2004) 351–361