The Automated Identification of Tubercle Bacilli using ... - CiteSeerX

11 downloads 60853 Views 365KB Size Report
Sep 4, 1998 - E-mail: joannj@mail.saimr.wits.ac.za. Abstract. Tuberculosis is currently the world's leading cause of adult death from a single. infectious ...
Proceedings of the 8th International Conference on Artificial Neural Networks, vol 2, pp 797-802 Skövde, Sweden, 2-4 September 1998

The Automated Identification of Tubercle Bacilli using Image Processing and Neural Computing Techniques K. Veropoulos, C. Campbell Department of Engineering Mathematics, University of Bristol Bristol, United Kingdom E-mail: [email protected]

G. Learmonth, B. Knight Department of Anatomical Pathology, University of Cape Town Cape Town, South Africa E-mail: [email protected]

J. Simpson Provincial TB Laboratory, South African Institute for Medical Research Western Cape Province, South Africa E-mail: [email protected]

Abstract Tuberculosis is currently the world's leading cause of adult death from a single infectious disease. Sputum examination remains the cornerstone of diagnosis in epidemic situations. To improve the diagnostic process we are developing an automated method for the detection of tubercle bacilli in clinical specimens, principally sputum smears. A preliminary investigation is presented here, which makes use of image processing techniques and neural network classifiers for the automatic identification of TB bacilli on Auramine stained sputum specimens. Currently, the developed system shows a sensitivity of 93.5% for the identification of individual bacilli. As there are usually fairly numerous TB bacilli in the sputum of patients with active pulmonary TB, the overall diagnostic accuracy for sputum smear positive patients is expected to be very high. Potential benefits of automated screening for TB are rapid and accurate diagnosis, increased screening of the population, and reduced health risk to staff processing slides.

1

Introduction

Tuberculosis (TB) is currently the world's leading cause of adult death from a single infectious disease and in developing countries it is responsible for more adult deaths than AIDS and malaria combined. The disease is now on the increase in both developing and developed countries with a projected increase in annual mortality of about one million by 2004. This increase has stemmed from a number of factors such

Proceedings of the 8th International Conference on Artificial Neural Networks, vol 2, pp 797-802 Skövde, Sweden, 2-4 September 1998

as the increased multiple drug resistance of the disease, inadequate control programmes, co-infection with HIV, increased migration, and the increased number of young adults in the world - the age group with the highest mortality from the disease. Globally, more than 50 million individuals may now be infected with multi-drug resistant TB. In 1993, due to the increasing incidence of multi-drug resistance and the highly infectious nature of TB, the WHO declared tuberculosis to be a global emergency, the only declaration of this kind in the history of WHO. Since TB remains largely treatable the key to controlling the disease is through correct initial diagnosis and subsequent monitoring. WHO guidelines suggest diagnosis of TB by viewing and screening Ziehl-Neelsen (ZN) stained specimens under a light microscope or Rhodamine/Auramine stained specimens under a fluorescence microscope [5]. Fluorescence microscopy is more sensitive and thus considered to be superior to light microscopy. However, manual screening for TB is labour intensive and has a high false negative rate. This investigation is the first attempt to automatically identify TB bacilli in sputum smears and its aim is the detection of tubercle bacilli using image processing and recognition techniques.

2

Method

The method is mainly divided into three steps: image capture, image processing and analysis, and image recognition using neural network methods.

2.1

Data Preparation & Image Capture

The first stage involves visual identification of the TB bacilli on Auramine stained sputum specimens, examined under a fluorescence microscope. Images of the bacilli are then captured using a digital camera attached to the microscope. The data used for this investigation was drawn out of a set of 1000 randomly chosen Auramine stained slides, prepared at the TB Control Laboratory at the South African Institute for Medical Research (SAIMR) in Cape Town, South Africa. The bacilli were identified firstly at low power magnification (x100) and high power magnification (x400). For image capture x630 magnification was used. The digital camera used for capturing the images contained a CCD chip with a resolution of 720x512 pixels (72 dpi). The image in Figure 1 shows numerous fluorescent TB bacilli in a background that is free of debris and contaminants. However, in the majority of smears there are only a few bacilli scattered over the entire slide (Figure 3 - left). Moreover, the bacilli may be faint, occluded, obscured by cells or remnants, or inside macrophages – this imparts a hazy outline to the bacilli, which may cause oversights in recognition. Also, the background can be complex due to debris and other features in the sputum, making recognition even more difficult. During the course of infection the number of bacilli in 4 the sputum may fluctuate close to the estimate of 10 /ml needed for a smear-positive result. A few of the images used for this study contained a very high density of bacilli (Figure 3 - right). These images were discarded from the training and test sets in the

Proceedings of the 8th International Conference on Artificial Neural Networks, vol 2, pp 797-802 Skövde, Sweden, 2-4 September 1998

investigation, because the bacilli are so numerous that they are frequently superimposed on each other or indistinguishable from each other. However, this was not regarded as a drawback since these images generally contain a sufficient number of bacilli for reliable identification of tuberculous infection.

2.2

Image Processing & Analysis

Image processing and analysis is the second stage of the method. Recognition of TB bacilli is solely based on the shape of bacilli. The image processing techniques used in this investigation help in enhancing the images and highlighting features needed for the shape description of each bacillus in the image. Separate entities in the images, such as TB bacilli, will be referred to as objects (each object being considered as a separate region). 1. Edge Detection. A number of edge detection algorithms have been investigated [8,10] and it was found that the most appropriate edge detector for the captured images is the Canny operator [2,10]. The sensitivity of the Canny filter must be adjusted according to the amplification gain and resolution used during the original image capture. 2. Region Labelling & Region Removal. The next step is to label regions and remove some of them based on their size. The region removal process enables the system to examine only regions that belong to a certain size-range excluding regions that could not possibly be bacilli due to their size – this eliminates some unnecessary processing. In this step and the boundary tracing below, 8-connectivity is used [10]. 3. Edge Pixel Linking. After region labelling, edge pixel linking [4] needs to be applied on objects with possible open edges. All points (pixels) in the image - that are similar in terms of magnitude and angle - are linked, forming a closed boundary of pixels. This step is necessary to improve segmentation of the image, since none of the existing edge detection operators yields optimal segmentation. 4. Boundary Tracing. When segmentation is complete and nearly optimal, boundary tracing techniques are used to find the boundaries of each object in the image (Figure 2). In this investigation the inner boundary tracing algorithm is used which results in a shape closer in size to the size of the object being examined [10]. This step is necessary for the derivation of shape descriptors needed for the identification of each object. 5. Shape Description. Having found the regions and their boundaries, shape descriptors can then be derived from the boundary curve of each region. After experimentation, a total of 15 Fourier descriptors was found to be sufficient to represent each region [4,7,9,10].

2.3

Image Recognition

The shape descriptors were fed into a classifier to identify any relevant regions as bacilli. A number of classification techniques were used at this stage, e.g. discriminant methods, neural networks, etc [6]. The approach that has performed optimally in terms of overall accuracy is a multi-layered neural network with one hidden layer, trained by using standard Back-Propagation.

Proceedings of the 8th International Conference on Artificial Neural Networks, vol 2, pp 797-802 Skövde, Sweden, 2-4 September 1998

3

Results

A training set consisting of 900 image objects, a validation set of 100 objects and a test set of 147 objects (bacilli and non-bacilli) were used to train and test a neural network. There were 15 input values corresponding to the 15 Fourier descriptors, and one output value (positive/negative) for labelling each image object. 10-fold cross validation was used in our experiments and the average overall accuracy on the test set is shown in Table I.

Accuracy

KNN(5) 91.80%

RBF 88.06%

BP 97.57%

KR 95.24%

Table I: The expected class is the category assigned to regions by manual identification. The detected class is the category found by the computer.

The best performing algorithm was back-propagation (BP) which showed little variability for 0-5 hidden nodes (the figure quoted is for no hidden nodes). K-nearest neighbours (KNN) and RBF networks were poorer overall. The Kernel Regression (KR) is a new algorithm, which emulates Support Vector Machines [1]. For back-propagation the sensitivity was 93.53% and the specificity 98.79% where sensitivity is the ratio of true positive decisions against the number of positive cases and specificity is the ratio of true negative decisions against the total number of negative cases [11]. Table I shows the detection rate for individual bacilli present. However, on most positive smears, there will be more than one bacillus present. Positive diagnosis of TB is not usually made unless three or more bacilli are detected [3]. Thus the diagnostic accuracy for these smears will be very high.

4

Discussion

In this paper it is shown that automated detection of TB bacilli is certainly feasible and it could be a cost effective approach in areas with high incidence rates such as the Western Cape region of South Africa [12]. It has been claimed [13] that current screening methods may miss 33-50% of active cases and automation may reduce this error rate since a greater number of fields will be investigated by the machine. Automation would also enable the screening of a larger proportion of the population and the increased monitoring of patients currently receiving treatment. Recognition rates using ZN (light microscopy) and Papanicolaou (fluorescence microscopy) stained specimens are currently being investigated. An extension of the work presented here is to introduce a confidence measure for the classification of image objects. The use of bacilli counts and time elapsed since onset of treatment is also being investigated as a means of distinguishing multi-drug resistant and nonresistant infection. These and other results will be presented elsewhere.

Proceedings of the 8th International Conference on Artificial Neural Networks, vol 2, pp 797-802 Skövde, Sweden, 2-4 September 1998

5

Acknowledgements

The authors wish to thank Dr. Mark Jepson and the staff of the Cell Imaging Facility at the Biomedical Sciences School (Bristol) and Ms. Marlein Bosman and the staff at SAIMR (Cape Town) for their technical assistance.

References [1] Campbell C, Cristianini Nello, Veropoulos K. (preprint in preparation) [2] Canny JF. A computational approach to edge detection. IEEE Trans Patt Anal & Mach Intell 1986; 8:679-698 [3] Collins CH, Grange JM, Yates MD. Organisation and practice in tuberculosis bacteriology. Butterworths, London, 1985 [4] Gonzalez RC, Woods RE. Digital image processing. Addison-Wesley, USA, 1992 [5] Kupper TH, Steffen K, Wekle G, Richartz G, Pfitzer P. Morphological study of bacteria of the respiratory system using fluorescence microscopy of Papanicolaou stained smears with special regard to identification of Mycobacteria in sputum. Cytopath 1995; 6:388-402 [6] Mitchie D, Spiegelhalter DJ, Taylor CC. Machine learning, neural and statistical classification. Ellis Horwood, New York, 1994 [7] Persoon E, Fu K. Shape discrimination using Fourier descriptors. IEEE Trans Sys, Man, and Cyber 1977; 7(3):170-178 [8] Phillips D. Image processing in C: analyzing & enhancing digital images. R&D Publications, Kansas, 1994 [9] Reeves AP, Prokop RJ, Andrews SE, Kuhl FP. Three-dimensional shape analysis using th moments and Fourier descriptors. Proc 7 Int Conf Patt Rec 1984; 447-450 [10] Sonka M, Hlavac V, Boyle R. Image processing, analysis and machine vision. Chapman and Hall, London, 1993 [11] Veropoulos K. The Probabilistic Neural Network. MSc thesis, University of the West of England, Bristol, 1995 [12] Wilkinson D, de Cock KM. Opinion: tuberculosis control in South Africa – time for a new paradigm? South African Med J 1996; 86:33-35 [13] WHO Tuberculosis Diagnostics Workshop (ed. Foulds J). Product development guidelines. WHO, Cleveland, Jul 1997

Proceedings of the 8th International Conference on Artificial Neural Networks, vol 2, pp 797-802 Skövde, Sweden, 2-4 September 1998

Figure 1: An image with a number of distinct bacilli present.

Figure 2: The image of Figure 1after edge detection, region labelling/removal, edge pixel linking, and boundary tracing have been applied.

Figure 3: A few bacilli may be scattered among a large number of fields (left); some fields have a very high densities of bacilli (right).

Suggest Documents