Swarnalatha.P. et. al. / International Journal of Engineering Science and Technology Vol. 2(11), 2010, 6771-6778
Microscopic Image Segmentation and Recognition on Cancerous Cells

Prof. Swarnalatha. P, Assistant Professor (Selection Grade), School of Computing Science and Engineering, VIT University, Vellore-632 014, India.
Prof. Prabu S, Assistant Professor (Selection Grade), School of Computing Science and Engineering, VIT University, Vellore-632 014, India.

Abstract: Cancer is known as a somatic genetic disease, in which an abnormal form of a gene appears in some part of the body. Early diagnosis is one of the most important and most difficult tasks of the pathologist. Existing approaches are mostly manual, time-consuming and subjective. Because a pathologist generally wants to select the region of interest personally, an interactive system is highly preferred, and a partially automated system of this kind is well received. This paper deals with feature-based classification of cancer cells. Since the features of individual cells are measured, a cytology image of lung cancer is taken as input. A semi-automated algorithm, Iterative Marker-based Watershed Segmentation (IMWS), is used to segment the cancerous cells. Meaningful features are then extracted from the segmented regions using moments. A Back Propagation Neural Network (BPNN) is trained with these features to distinguish benign cells from malignant cells with a reduced error rate.

Keywords: Iterative Marker-Based Watershed Segmentation, Feature-Based Classification, Back Propagation Neural Network.
1. Introduction
The main aim of this paper is to develop a feasible, semi-automated system with which cancer cell identification can be done at low cost and with minimum delay. Benign cells are differentiated from malignant ones using morphometry. Many earlier systems for cancer cell identification are fully automated. In this work, a semi-automated system has been developed instead, because fully replacing the doctor or pathologist is not possible. A robust segmentation technique is used so that the area of interest can be separated accurately. Such a system is needed because the number of new patients has been increasing rapidly in recent years, and because early diagnosis is essential: treatment can then start without further delay, before the cancer advances to the next stage.

1.1. Motivation
Cancer is one of the major diseases that humans have not yet fully conquered, and it still has no specific diagnostic technique. The survival rate of patients is much higher if the cancer is detected and treatment is started at an early stage. The main motivation for this system is to serve poor and needy patients by designing an efficient and capable semi-automated cancer detection system. As the number of new patients keeps increasing, such a system can help ordinary people, since its main objectives are low cost and fast response, so that progression of the cancer to a later stage can be avoided.

2. Literature Survey
Segmenting the image into the relevant objects and the background is a crucial step in almost all image analysis tasks, and in many cases one of the more difficult ones. Numerous segmentation techniques exist in medical imaging, depending on the region of interest.
Thresholding is the most basic technique; it separates pixels into different classes depending on their grey level. In previous work, Adaptive Min-Distance Segmentation (AMDS) [8] was used. Min-distance segmentation is based on the distance from each point being segmented to every subset; the point is merged into the subset at minimum distance. Given N subsets with respective sample centres $\omega_1, \omega_2, \dots, \omega_N$, the segmentation rule for a point P is:

$$d_i(P) < d_j(P)\ \ \forall j \neq i \;\Rightarrow\; P \in \omega_i, \qquad i, j = 1, 2, \dots, N \qquad (1)$$
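Rule (1) is a nearest-centre assignment. The Python sketch below is a minimal illustration, not the AMDS implementation of [8]: it assumes each sample subset is given as an array of feature vectors (for example grey values), takes the subset mean as the centre $\omega_i$, and uses the Euclidean distance for $d_i(P)$. The function names and sample values are hypothetical.

```python
import numpy as np


def estimate_centres(sample_subsets):
    """Estimate one centre omega_i per subset as the mean of its samples (assumption)."""
    return np.stack([np.asarray(s, dtype=float).mean(axis=0) for s in sample_subsets])


def min_distance_segment(points, centres):
    """Assign each point P to the subset i with the smallest Euclidean d_i(P).

    points: array of shape (num_points, num_features)
    centres: array of shape (N, num_features)
    Ties, which rule (1) leaves undefined, go to the lowest index because
    np.argmin returns the first minimum.
    """
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - centres[None, :, :]  # (num_points, N, num_features)
    distances = np.linalg.norm(diff, axis=2)         # d_i(P) for every point and subset
    return np.argmin(distances, axis=1)


# Hypothetical grey-level samples for a background subset and a nucleus subset.
centres = estimate_centres([[[20.0], [30.0], [40.0]], [[190.0], [200.0], [210.0]]])
labels = min_distance_segment([[10.0], [120.0], [180.0]], centres)
print(labels)  # [0 1 1]
```

The pixel at 120 is closer to the nucleus centre (200) than to the background centre (30), so it is assigned to subset 1.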
ISSN: 0975-5462
Here, $d_i(P)$ is the Euclidean distance from point P to subset i. First, the subset of samples is segmented to estimate the distribution of the samples. Second, based on this estimate, the distance from each point to be segmented to every subset is calculated and used as the segmentation criterion. Because the result of sample segmentation directly affects the final segmentation result, an adaptive version, AMDS, was proposed. However, this algorithm has a few drawbacks: (a) a clustering algorithm must be used, which slows down the process; (b) building the tree is difficult, as the criteria on which it should be built are not clear; and (c) chain codes must be calculated, which is a complex task.

The watershed transform can be classified as a region-based segmentation approach and is the method of choice for image segmentation in the field of mathematical morphology [8]. There are many types of watershed transform; the immersion-based watershed algorithm [10] is considered here. It can be illustrated by imagining the gradient magnitude of the (smoothed) original image as a relief, with the 'height' at each pixel position given by its grey value. Imagine water flooding the relief from the bottom (grey level 0). Every time the water reaches a minimum, which corresponds to a region in the original image, a catchment basin starts to grow. When two neighbouring catchment basins eventually meet, a dam is built to prevent water spilling from one basin into the other. When the water reaches the maximum grey value, the union of all dams forms the watershed segmentation.
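In the immersion picture, every regional minimum of the relief seeds its own catchment basin, so the number of strict local minima gives a lower bound on the number of regions an unmarked watershed produces. The following Python sketch, a minimal illustration under stated assumptions (a synthetic bowl-shaped relief, 8-connectivity, and an arbitrary Gaussian noise level, all hypothetical), counts strict local minima in a smooth relief and in the same relief with noise added.

```python
import numpy as np


def count_local_minima(img):
    """Count pixels strictly lower than all 8 neighbours (border padded with +inf)."""
    padded = np.pad(np.asarray(img, dtype=float), 1, constant_values=np.inf)
    rows, cols = img.shape
    centre = padded[1:-1, 1:-1]
    is_min = np.ones((rows, cols), dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[1 + dy:1 + dy + rows, 1 + dx:1 + dx + cols]
            is_min &= centre < neighbour
    return int(is_min.sum())


# Hypothetical relief: a single smooth basin with its minimum at (32, 32).
yy, xx = np.mgrid[0:64, 0:64]
bowl = (yy - 32.0) ** 2 + (xx - 32.0) ** 2
rng = np.random.default_rng(0)
noisy = bowl + rng.normal(scale=50.0, size=bowl.shape)  # assumed noise level

print(count_local_minima(bowl))  # 1
print(count_local_minima(noisy))
```

The smooth relief yields a single basin, while the noisy copy produces many spurious minima near the flat bottom, each of which would become a separate catchment basin.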
The main problem with this basic method is that real images, which are often noisy, contain many local minima, each of which creates its own catchment basin; this leads to over-segmentation. In the usual visualisation, each catchment basin is shown in a different colour and the watersheds appear as white lines.

Snakes [3] are curves defined within the image domain that move under internal forces within the curve itself and external forces derived from the image data. Two general types of active contour model are described in the literature: parametric active contours and geometric active contours. Parametric active contour models [4] are widely used in many applications, including edge detection, object recognition, shape modelling and motion tracking. Image gradients can be used as the external forces in parametric active contour models; examples include the traditional snake, the balloon snake, the pressure-force model and the Gradient Vector Flow (GVF) model [2].

Parametric Snake Model
A traditional snake is a curve $\mathbf{x}(s) = [x(s), y(s)]$, $s \in [0, 1]$, that moves through the spatial domain of an image to minimize the energy functional
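$$E = \int_0^1 \left[ \frac{1}{2}\left( \alpha \lvert \mathbf{x}'(s) \rvert^2 + \beta \lvert \mathbf{x}''(s) \rvert^2 \right) + E_{\text{ext}}(\mathbf{x}(s)) \right] ds \qquad (2)$$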
where α and β are weighting parameters that control the snake's tension and rigidity, respectively, and $\mathbf{x}'(s)$ and $\mathbf{x}''(s)$ denote the first and second derivatives of $\mathbf{x}(s)$ with respect to s. The external energy function $E_{\text{ext}}$ is derived from the image so that it takes on smaller values at the features of interest, such as boundaries. Given a grey-level image I(x, y), viewed as a function of the continuous position variables (x, y), typical external energies designed to lead an active contour toward step edges [2] are

$$E_{\text{ext}}^{(1)}(x, y) = -\lvert \nabla I(x, y) \rvert^2 \qquad (3)$$

$$E_{\text{ext}}^{(2)}(x, y) = -\lvert \nabla \left( G_\sigma(x, y) * I(x, y) \right) \rvert^2 \qquad (4)$$

where $G_\sigma(x, y)$ is a two-dimensional Gaussian function with standard deviation σ and ∇ is the gradient operator. If the image is a line drawing (black on white), appropriate external energies include

$$E_{\text{ext}}^{(3)}(x, y) = I(x, y) \qquad (5)$$

$$E_{\text{ext}}^{(4)}(x, y) = G_\sigma(x, y) * I(x, y) \qquad (6)$$
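The step-edge energy (4) and the corresponding external force $-\nabla E_{\text{ext}}$ can be sketched in Python with NumPy. This is an illustrative sketch, not the authors' implementation: the Gaussian is truncated at 3σ and applied separably with zero padding, gradients use central differences (`np.gradient`), and the force is scaled to a maximum magnitude of one. The synthetic square image is hypothetical.

```python
import numpy as np


def gaussian_kernel(sigma):
    """1-D Gaussian G_sigma truncated at 3*sigma and normalised to sum to 1."""
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()


def gaussian_smooth(image, sigma):
    """G_sigma * I via separable convolution (zero padding at the borders)."""
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)


def step_edge_energy(image, sigma):
    """E_ext^(2)(x, y) = -|grad(G_sigma * I)|^2, equation (4)."""
    smoothed = gaussian_smooth(np.asarray(image, dtype=float), sigma)
    gy, gx = np.gradient(smoothed)  # derivatives along rows (y) and columns (x)
    return -(gx ** 2 + gy ** 2)


def external_force(energy):
    """Force -grad E_ext, scaled so that the largest force magnitude is 1."""
    ey, ex = np.gradient(energy)
    fx, fy = -ex, -ey
    peak = np.hypot(fx, fy).max()
    if peak > 0:
        fx, fy = fx / peak, fy / peak
    return fx, fy


# Hypothetical test image: a bright 12x12 square on a dark background.
image = np.zeros((32, 32))
image[10:22, 10:22] = 1.0
energy = step_edge_energy(image, sigma=1.0)
fx, fy = external_force(energy)
print(energy[10, 16] < energy[16, 16])  # True: the energy is lowest on the edge
print(fy[5, 16] > 0, fy[2, 16] == 0.0)  # True True
```

Just outside the square the force points toward the edge, but a few pixels further away it is exactly zero; this is the limited capture range that a larger σ (or a GVF field) is meant to extend.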
It is easy to see from these definitions that a larger σ will blur the boundaries. Such large values of σ are often necessary, however, to increase the capture range of the active contour. A snake that minimizes E must satisfy the Euler equation
$$\alpha \mathbf{x}''(s) - \beta \mathbf{x}''''(s) - \nabla E_{\text{ext}} = 0 \qquad (7)$$
This can be viewed as a force balance equation
$$\mathbf{F}_{\text{int}} + \mathbf{F}_{\text{ext}}^{(p)} = 0 \qquad (8)$$
where $\mathbf{F}_{\text{int}} = \alpha \mathbf{x}''(s) - \beta \mathbf{x}''''(s)$ and $\mathbf{F}_{\text{ext}}^{(p)} = -\nabla E_{\text{ext}}$. The internal force discourages stretching and bending, while the external potential force pulls the snake toward the desired image edges. To find a solution to (7), the snake is made dynamic by treating x as a function of time t as well as s, i.e., $\mathbf{x}(s, t)$. The partial derivative of x with respect to t is then set equal to the left-hand side of (7):

$$\mathbf{x}_t(s, t) = \alpha \mathbf{x}''(s, t) - \beta \mathbf{x}''''(s, t) - \nabla E_{\text{ext}} \qquad (9)$$

When the solution $\mathbf{x}(s, t)$ stabilizes, the term $\mathbf{x}_t(s, t)$ vanishes and a solution of (7) is obtained.

The drawbacks of traditional snakes are poor convergence and a limited capture range, because the magnitude of the external forces dies out quite rapidly away from the object boundary. In practice, the external forces are normalized so that their maximum magnitude equals one, and a unit temporal step size is used for all the experiments. Colour GVF [2] snakes can be obtained by exploiting prior knowledge of the specific application. In this particular microscopy application there are two regions of interest: the nuclei and the cytoplasm surrounding each nucleus. The robust colour GVF snake, like many other high-level image segmentation approaches, is therefore tuned to provide a reliable means of segmenting imaged hematopathology specimens.

Feature extraction is a crucial step in most cytometry [6] studies. The feature sets described are divided into morphometric [9], [8], densitometric [11], textural [7] and structural [5] features. Unfortunately, in spite of more than 30 years of intense effort in this field, there is still no generally accepted and applied set of feature definitions and measurement standards. For more complex features derived from statistical properties, either from histograms (first-order statistics) or from two-dimensional distributions (second-order statistics), as with most texture features, this lack of standards is not acceptable, because statistical features depend strongly on parameters and configuration decisions. In [11], lymphocytes are classified automatically, in agreement with their visual pre-classification, using a well-balanced combination of geometric parameters that describe some of the more complex morphologic features of their cell nuclei.
To evaluate tissue sections more completely, not only global photometric parameters but also geometric measurements characterizing segregated image objects have been considered, including the topologic relationships between these objects (e.g., inside, surrounding, regularity in its different meanings). Hence, various lymphocytes can be classified properly on the basis of the features considered. The drawbacks of this system are that algorithms which identify "relevant structures" in grey-value images of histologic specimens, as needed for automated analysis of tissue sections, have not been developed, and that whether the subclusters obtained after clustering can be correlated with functionally defined subpopulations has not been considered at all. The system is also fully automated, which conflicts with the requirements of the user.

A Computer-Aided Diagnosis (CAD) system based on a two-level Artificial Neural Network (ANN) architecture is presented in [12]. It was trained, tested and evaluated specifically on the problem of detecting lung cancer nodules in digitized chest radiographs. The first ANN detects suspicious regions in a low-resolution image, and the inputs to the second ANN are the curvature peaks computed for all pixels in each suspicious region. This automatic system for the CAD of nodular structures in radiographic images of the thorax maintains the system's level of sensitivity. It is highly robust, because only a small number of criteria are used for making decisions, taking advantage of the natural generalization capacity of ANNs. However, its results for early nodules are not much better, so it is not very efficient. Individual neural networks in an ensemble that learn different samples perform differently on the same input data.
In the conventional ensemble method the weights are fixed, which may reduce the contribution of individual networks that perform better but have lower weights, and so degrade the performance of the whole ensemble. An automatic pathological diagnosis procedure named Neural Ensemble-based Detection (NED) has been described and realized in an early-stage Lung Cancer Diagnosis System (LCDS) [1], which uses an artificial neural network ensemble to identify lung cancer cells in images of needle-biopsy specimens. The core of NED is a two-level ensemble architecture built from heterogeneous ensembles, which not only comprise individual networks with different numbers of output units but also employ different methods to combine the individual predictions. To reduce false-negative identifications, a full voting scheme is used in the first-level ensemble. NED achieves both a high overall identification rate and a low false-negative rate. Its drawbacks are that a neural network is used at each phase, which delays the output, and that the system does not deal with overlapped cells. It is also fully automated, which conflicts with the requirements of the user.
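To make the contrast between full voting and ordinary majority voting concrete, here is a minimal Python sketch. It assumes that "full voting" in the first-level ensemble means a cell is labelled normal only if every individual network votes normal, so a single "cancer" vote decides; the labels and function names are hypothetical.

```python
from collections import Counter


def full_voting(votes):
    """Label a cell 'normal' only if every network says 'normal' (assumed rule)."""
    return "normal" if all(v == "normal" for v in votes) else "cancer"


def majority_voting(votes):
    """Conventional plurality vote, shown for comparison."""
    return Counter(votes).most_common(1)[0][0]


votes = ["normal", "cancer", "normal"]  # hypothetical outputs of three networks
print(full_voting(votes))      # cancer
print(majority_voting(votes))  # normal
```

Under this assumption, full voting trades a higher false-positive rate for fewer missed cancer cells, which matches the stated goal of reducing false negatives.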
3. Working Principle: Framework for Lung Cancer Cell Identification
Fig. 1. System Framework: Digital cytological lung image → Preprocessing → IMWS → ROI → Feature Extraction → Classification & Recognition → Benign or Malignant Cell
The framework of the Cancer Cell Identification System is shown in Fig. 1. In this project, lung cancer cells are identified from the microscopic images fed into the system. The system is modelled as a linear pipeline, so the output of each module is the input to the next; it consists of six modules. A cytology image of a lung with cancer is given as input. In the pre-processing step, the background is suppressed with the top-hat and bottom-hat transforms. A threshold is then calculated on the pre-processed image using a histogram approach, and the image is fed into the segmentation module. There, semi-automated segmentation is carried out using the Iterative Marker-based Watershed Segmentation (IMWS) approach [10]. Because the segmentation is semi-automated, the user can select the area of interest interactively and obtain an accurate segmentation of it. The segmented image is cropped to obtain the Region of Interest (ROI), whose features are calculated in the Feature Extraction module [9]. By comparing these features with those of a normal cell, one can determine whether the given cell is malignant.

4. Design of Cancer Cell Identification System
4.1. Details of each module
4.1.1. Image Acquisition
Microscopic images are acquired in two steps: first, a physical device captures the image, and second, a digitizer converts it into digital form. In this project, a cytology image of lung cells is taken as input, as shown in Fig. 2. Cytology [6], also known as cell biology, studies cell structure, cell composition, and the interaction of cells with other cells and with the larger environment in which they exist.
Cytology can also refer to cytopathology, which analyses cell structure to diagnose disease. Recognizing the similarities and differences between cells is of the utmost importance in cytology, and microscopic examination can help identify different cell types or the abnormalities present in cells. Cytopathology [6] is one of the main diagnostic tools for detecting cancer, and the recognition and study of cells have brought major improvements in medical care and diagnostics. Since this work targets early-stage detection of cancer and measures the features of individual cells, a cytology image is the most suitable input.
Fig. 2. Image Acquisition: cytology lung image viewed under a microscope (CCD, fluorescence)
Images produced by a microscope have both advantages and disadvantages. The major advantage of a microscopic image is that individual cell morphology and tumour architecture can be well differentiated. The main disadvantages are that such images are frequently distorted by noise and by blurring due to the Point Spread Function (PSF) of the imaging system, and may show ringing around sharp edges as well as negative pixel values, which can occur with linear algorithms.

4.1.2. Pre-Processing
The main function of this step is to improve the image so that it can be used properly in all later stages. Pre-processing of microscopic images can be done with several available algorithms that enhance contrast, remove noise and isolate the area of interest. The major task of pre-processing is to derive a representation of the cells that makes subsequent classification computationally efficient and insensitive to environmental changes, by providing the classifier with only the information essential for recognition. In these images, many objects of different sizes touch each other. To maximize the contrast of the objects of interest in the microscopic image of the cell, the cells need to be enhanced. The contrast-enhancement technique used here combines the top-hat and bottom-hat transforms, as shown in Fig. 3.

Fig. 3. Pre-Processing: Microscopic image → Top-hat transform (Top = foreground parts of the image) and Bottom-hat transform (Bot = background parts of the image) → (Top + I) − Bot → background-suppressed, pre-processed image
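The enhancement (Top + I) − Bot, followed by a histogram-based threshold, can be sketched in Python with NumPy. This is an illustrative sketch under stated assumptions, not the authors' Matlab code: it uses a flat square structuring element whose size is a hypothetical parameter, implements grey-scale erosion and dilation with sliding windows (NumPy 1.20 or later), and uses Otsu's method as a stand-in for the unspecified "histogram approach".

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _window_filter(img, size, reduce, pad_value):
    """Apply min or max over a flat size x size window (size must be odd)."""
    if size % 2 == 0:
        raise ValueError("structuring element size must be odd")
    r = size // 2
    padded = np.pad(img, r, mode="constant", constant_values=pad_value)
    return reduce(sliding_window_view(padded, (size, size)), axis=(-2, -1))


def erode(img, size):
    return _window_filter(img, size, np.min, np.inf)   # padding never wins a min


def dilate(img, size):
    return _window_filter(img, size, np.max, -np.inf)  # padding never wins a max


def enhance(img, size):
    """(Top + I) - Bot with Top = white top-hat and Bot = bottom-hat (black top-hat)."""
    img = np.asarray(img, dtype=float)
    opened = dilate(erode(img, size), size)
    closed = erode(dilate(img, size), size)
    top = img - opened   # bright details smaller than the structuring element
    bot = closed - img   # dark gaps smaller than the structuring element
    return (img + top) - bot


def otsu_threshold(img, bins=256):
    """Histogram threshold maximising the between-class variance (Otsu)."""
    hist, edges = np.histogram(img, bins=bins)
    p = hist / hist.sum()
    centres = (edges[:-1] + edges[1:]) / 2
    omega = np.cumsum(p)          # probability of the lower class
    mu = np.cumsum(p * centres)   # first moment of the lower class
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu[-1] * omega - mu) ** 2 / (omega * (1.0 - omega))
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    k = int(np.argmax(between))
    return edges[k + 1]           # pixels >= this value form the upper class


# Hypothetical image: background 100 with a small 3x3 bright object at 150.
image = np.full((24, 24), 100.0)
image[10:13, 10:13] = 150.0
enhanced = enhance(image, size=5)
mask = enhanced >= otsu_threshold(enhanced)
print(enhanced[11, 11], enhanced[0, 0])  # 200.0 100.0
print(int(mask.sum()))                   # 9
```

Because the 5 × 5 opening removes the 3 × 3 object, the top-hat isolates it and adding it back doubles its contrast against the background, which makes the subsequent threshold trivial in this toy case.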
The top-hat image contains the "peaks" of objects that fit the structuring element, whereas the bottom-hat image shows the gaps between the objects of interest. To maximize the contrast between the objects and the gaps that separate them, the bottom-hat image is subtracted from the "original + top-hat" image. The resulting image is then used for segmentation.

4.1.4. Segmentation
Image segmentation is the process of isolating objects in the image from the background, i.e., partitioning the image into disjoint regions such that each region is homogeneous with respect to some property, such as grey value or texture. It can be defined as the partitioning of an input image into its constituent parts or objects, from which the required characteristics of the image can be obtained. Segmentation simplifies the image, makes it easier for a human to understand, and allows certain image attributes to be calculated automatically. The purpose of segmentation is to determine the boundary of every subset; the more precise the boundary, the closer the result is to reality. The watershed transform [10] can be classified as a region-based segmentation approach and is the method of choice for image segmentation in the field of mathematical morphology [8]. The intuitive idea underlying this method comes from geography: a landscape or topographic relief is flooded by water, and the watersheds are the divide lines between the domains of attraction of rain falling over the region. An alternative approach is to imagine the landscape being immersed in a lake, with holes pierced in the local minima. Basins (also called 'catchment basins') fill up with water starting at these local minima, and dams are built at the points where water coming from different basins would meet.
When the water level has reached the highest peak in the landscape, the process stops. As a result, the landscape is partitioned into regions or basins separated by dams, called watershed lines or simply watersheds.

Circumference of the Circle (14)
Area of Nucleus / Area of Cytoplasm
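The flooding procedure of Section 4.1.4, started from user-chosen markers instead of from every local minimum, can be sketched as marker-controlled watershed in Python. This is a minimal illustration of the general technique (priority-queue flooding), not the authors' IMWS implementation: the interactively selected markers are represented by a labelled array, 4-connectivity is assumed, and every pixel is assigned to some basin, so no explicit watershed-line pixels are produced. In an iterative, interactive setting the user would adjust the markers and rerun the function; the relief would typically be the gradient magnitude of the pre-processed image.

```python
import heapq

import numpy as np


def marker_watershed(relief, markers):
    """Flood `relief` from labelled markers; 0 in `markers` means unlabelled.

    Pixels leave a priority queue in order of increasing relief value, and
    each unlabelled 4-neighbour inherits the label of the pixel that
    reached it first.
    """
    relief = np.asarray(relief, dtype=float)
    labels = np.asarray(markers, dtype=int).copy()
    rows, cols = relief.shape
    heap = []
    order = 0  # insertion counter: breaks ties between equal relief values
    for r, c in zip(*np.nonzero(labels)):
        heapq.heappush(heap, (relief[r, c], order, int(r), int(c)))
        order += 1
    while heap:
        _, _, r, c = heapq.heappop(heap)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr, nc] == 0:
                labels[nr, nc] = labels[r, c]
                heapq.heappush(heap, (relief[nr, nc], order, nr, nc))
                order += 1
    return labels


# Hypothetical relief with two basins separated by a ridge in column 3.
relief = np.array([[0, 1, 2, 9, 2, 1, 0],
                   [0, 1, 2, 9, 2, 1, 0]], dtype=float)
markers = np.zeros(relief.shape, dtype=int)
markers[0, 0] = 1  # e.g. a marker placed inside one cell
markers[0, 6] = 2  # a marker placed inside a neighbouring cell
labels = marker_watershed(relief, markers)
print(labels)  # the ridge column goes to whichever basin reaches it first
```

Because only the two markers seed basins, noise-induced minima inside either basin cannot create extra regions, which is how marker control avoids the over-segmentation of the plain immersion algorithm.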
4.1.7. Classification
The images are classified so that the cancerous cells are categorized on the basis of several factors. In this project, a BPNN classifier [8] is used to classify the images; it is robust and offers a high degree of accuracy and efficacy. The BPNN is a multilayer perceptron with a different threshold function in the artificial neuron and a more robust and capable learning rule.

The Back-Propagation Algorithm
Figure No: 3.13: Back-Propagation Neural Network (input layer of cell features such as size; hidden layer; output layer)
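A one-hidden-layer network of the kind shown in the figure, trained by back-propagation, can be sketched in Python with NumPy. This is an illustrative sketch rather than the authors' Matlab network: the sigmoid activation, the squared error E = ½Σ(y − d)², the layer size, learning rate, epoch count and the two-feature toy data are all assumptions. The comments map each quantity to the EA, EI and EW terms used in the description of the algorithm.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_bpnn(X, d, n_hidden=4, lr=0.5, epochs=5000, seed=0):
    """Batch back-propagation for a one-hidden-layer sigmoid network."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, d.shape[1]))
    b2 = np.zeros(d.shape[1])
    for _ in range(epochs):
        h = sigmoid(X @ W1 + b1)       # hidden activities (forward pass)
        y = sigmoid(h @ W2 + b2)       # output activities
        ea_out = y - d                 # EA of outputs: actual minus desired
        ei_out = ea_out * y * (1 - y)  # EI: EA times the sigmoid's slope
        ea_hid = ei_out @ W2.T         # hidden EA: weights times output EIs, summed
        ei_hid = ea_hid * h * (1 - h)
        # EW = EI of the receiving unit times the activity on the connection
        W2 -= lr * h.T @ ei_out
        b2 -= lr * ei_out.sum(axis=0)
        W1 -= lr * X.T @ ei_hid
        b1 -= lr * ei_hid.sum(axis=0)
    return W1, b1, W2, b2


def predict(params, X):
    W1, b1, W2, b2 = params
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)


# Hypothetical normalised features (e.g. size, N/C ratio); 0 = benign, 1 = malignant.
X = np.array([[0.2, 0.1], [0.3, 0.2], [0.8, 0.9], [0.9, 0.7]])
d = np.array([[0.0], [0.0], [1.0], [1.0]])
params = train_bpnn(X, d)
print((predict(params, X) > 0.5).astype(int).ravel())
```

With sigmoid units the EA of a unit must be multiplied by the unit's slope y(1 − y) to obtain its EI before it is propagated backward; for purely linear units the two quantities coincide.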
To train a neural network to perform a task, the weights of each unit must be adjusted so that the error between the desired output and the actual output is reduced. This requires the network to compute the error derivative of the weights (EW); in other words, it must calculate how the error changes as each weight is increased or decreased slightly. The back-propagation algorithm is the most widely used method for determining the EW. It is easiest to understand if all the units in the network are linear. The algorithm computes each EW by first computing the EA, the rate at which the error changes as the activity level of a unit is changed. For output units, the EA is simply the difference between the actual and the desired output. To compute the EA for a hidden unit in the layer just before the output layer, we first identify all the weights between that hidden unit and the output units to which it is connected. We then multiply those weights by the EAs of those output units and add the products; this sum equals the EA of the chosen hidden unit. After all the EAs in the hidden layer just before the output layer have been calculated, the EAs of the other layers are calculated in the same way, moving from layer to layer in the direction opposite to the one in which activities propagate through the network. This is what gives back-propagation its name. Once the EA has been computed for a unit, it is straightforward to compute the EW for each incoming connection of the unit: the EW is the product of the EA and the activity through the incoming connection. For non-linear units, the EA must first be converted into the EI, the rate at which the error changes as the total input received by a unit is changed, before back-propagating.

After the neural network has been trained with the features, the cancer cells are classified, and the final recognition of the cancer cells is done by the pathologist or doctor, because the semi-automated system designed here is an aid to the user, not a full replacement for the doctor or pathologist.

5. Experimentation and optimization
With regard to speed, the system runs on Matlab version 6.5.0.180913a (R13) on a standard PC with an Intel Pentium 4 1.6 GHz processor and 256 MB of RAM; a single training process usually takes about 30–40 min. After the ANN is trained, a 256 × 165 image requires 1–3 min of processing time, depending on the boundaries of the cells present. This is judged to be acceptable for many applications.

Acknowledgments
I take this opportunity to convey my regards to the second author. I thank the VIT administration for supporting this research.

First A. Author: Member of IACSIT, Vellore, pursuing a Ph.D. (Intelligent Systems). Swarnalatha Purushotham, Assistant Professor (Selection Grade) in the School of Computing Sciences and Engineering, VIT University, Vellore, India, has published more than 15 papers and has guided many UG and PG students. She has 10 years of teaching experience. She is associated with CSI, ACM, IACSIT and IEEE (WIE). Her current research interests include Image Processing, Neural Networks, Pattern Recognition and Remote Sensing.
Second B. Author: Prabu S, Assistant Professor (Selection Grade) in the School of Computing Sciences and Engineering, VIT University, Vellore, India, has published 20 papers and is guiding two Ph.D. students. He has more than 6 years of teaching experience. He is associated with CSI and IEEE, and serves as an editor for many national and international journals. His current research interests include Image Processing, Neural Networks, Pattern Recognition and Remote Sensing.

References
[1] Bingjie Liu, Changhua Hu, "Adaptive Neural Network Ensemble Algorithm", Proceedings of the 6th World Congress on Intelligent Control and Automation, June 21-23, 2006, Dalian, China.
[2] C. Xu and J. L. Prince, "Snakes, Shapes, and Gradient Vector Flow", IEEE Trans. on Image Processing, Vol. 7, No. 3, pp. 359-369, March 1998.
[3] Cotran, Kumar, Collins, Robbins, Pathologic Basis of Disease, 6th Edition.
[4] E Leymaric, M. Levine, "Tracking deformable objects in the plane using an active contour model", IEEE Trans. Pattern Anal. Mach. Intell., Vol. 17, No. 6, pp. 617-634, Dec. 1993.
[5] E Schnorrenberg, N. Tsapatsoulis, et al., "Improved Detection of Breast Cancer Nuclei Using Modular Neural Networks", IEEE Engineering in Medicine and Biology, pp. 48-63, Jan./Feb. 2000.
[6] G. A. Meijer, J. A. M. Belien, P. J. van Diest, J. P. A. Baak, "Origins of...Image analysis in clinical pathology", Journal of Clinical Pathology, 1997;50:365-370.
[7] François Chaumette, "Image Moments: A General and Useful Set of Features for Visual Servoing", IEEE Transactions on Robotics, Vol. 20, No. 4, August 2004.
[8] Hongyuan Wang, Xiaogang Wang, et al., "The Researches of Microscopic Image Segmentation and Recognition on the Cancer Cells Fallen into Peritoneal Effusion", Computer Vision Lab, Dept. of Computer, Nanjing Univ. of Sci. & Tech., Nanjing, China, 2001.
[9] Jean-Philippe Thiran and Benoit Macq, "Morphological Feature Extraction for the Classification of Digital Images of Cancerous Tissues", IEEE Transactions on Biomedical Engineering, Vol. 43, No. 10, October 1996.
[10] L. Vincent and P. Soille, "Watersheds in digital spaces: An efficient algorithm based on immersion simulations", IEEE Transactions Pattern Anal. Mach. Intell., Vol. 13, No. 6, pp. 583-598, June 1991.
[11] Manfred Pfoch and Wolfgang Kade, "Automated Classification of Cells in Electron Microscopic Images of Lymphoreticular Tissue", The Journal of Histochemistry and Cytochemistry, Vol. 25, No. 7, pp. 655-661, 1977.
[12] M. G. Penedo, M. J. Carreira, A. Mosquera, et al., "Computer-Aided Diagnosis: A Neural Network Based Approach to Lung Nodule Detection", IEEE Transactions on Medical Imaging, Vol. 17, No. 6, pp. 872-880, Dec. 1998.
[13] M. Kass, A. Witkin and D. Terzopoulos, "Snakes: active contour models", Int. J. Comp. Vis., Vol. 1, No. 4, pp. 321-331, 1987.
[14] M. V. Boland, R. F. Murphy, "A neural network classifier capable of recognizing the patterns of all major subcellular structures in fluorescence microscope images of HeLa cells", Bioinformatics, Vol. 17, pp. 1213-1223, 2001.
[15] Per Jesper Sjöström, Beata Ras Frydel, and Lars Ulrik Wahlberg, "Artificial Neural Network-Aided Image Analysis System for Cell Counting", Cytometry, Vol. 36, pp. 18-26, 1999.
[16] Rafael C. Gonzalez, Richard E. Woods, Digital Image Processing, 2nd Edition.
[17] R. M. Haralick, S. R. Sternberg, and X. Zhuang, "Image analysis using mathematical morphology", IEEE Trans. Pattern Anal. Machine Intell., Vol. 9, No. 4, pp. 532-549, July 1987.
[18] S. X. Liao and M. Pawlak, "On image analysis by moments", IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 18, pp. 254-266, 1996.
[19] T. W. Nattkemper, H. Ritter and W. Schubert, "A neural classifier enabling high-throughput topological analysis of lymphocytes in tissue sections", IEEE Trans. Info. Tech. Biomedicine, Vol. 5, No. 2, pp. 138-149, 2001.
[20] Ying-Lun Fok, Joseph C. K. Chan, and Roland T. Chin, "Automated Analysis of Nerve-Cell Images Using Active Contour Models", IEEE Trans. Medical Imaging, Vol. 15, No. 3, pp. 353-368, 1996.
[21] Zhou Zhihua, Jiang Yuan, Yang Yubin, et al., "Lung cancer cell identification based on artificial neural network ensembles", Artificial Intelligence in Medicine, Vol. 24, No. 1, pp. 25-36, 2002.