ZESZYTY NAUKOWE POLITECHNIKI ŁÓDZKIEJ Nr xxx
ELEKTRYKA, z. 1xx
200x
WOJCIECH BIENIECKI
Technical University of Łódź
Computer Engineering Department

NEW IMAGE PROCESSING ALGORITHMS IN COMPUTER VISION SYSTEMS FOR PATHOMORPHOLOGIC DIAGNOSTICS

Reviewer: prof. dr hab. Dominik Sankowski
Manuscript received: 2-11-2005
This paper summarizes the main contributions of the author's Ph.D. dissertation. The aim of the research was to develop two image processing systems for biomedicine: a system for automatic classification of breast cancer nuclei and a system for measuring the immunoenzymatic lymphocyte response. Standard image processing algorithms proved insufficient for these tasks, so new algorithms had to be proposed. These include noise removal from color images, an enhanced binarization method, a pixel classification algorithm and a watershed-based segmentation algorithm.
1. INTRODUCTION
Breast cancer is still one of the most dangerous diseases among women. In Poland the death rate is still increasing, mainly as a result of insufficient diagnostics. One of the methods of monitoring the growth of the cancer is observation of cancer tissue images and quantitative analysis of the cancer cells. The aim of the pathomorphological examination is the detection of estrogen and progestagen receptors in cancer tissue nuclei, which are responsible for tissue growth. One of the detection techniques is immunohistochemical staining of the tissue. Nuclei with receptors react with a dye and turn red-brown, while the others remain pale blue. The fraction of coloured nuclei in a specific area gives an important hint for choosing the proper pharmacological therapy. Manual classification of the images may be replaced with labour-saving
automatic analysis performed on a PC.

Another important problem of present-day medicine is transplantation, most frequently of kidneys. Before the operation, specialized donor and recipient examinations are carried out in order to minimize the risk of an adverse graft outcome. One of the examination methods is observation of a microscope image of the recipient's lymphocyte response against donor antigens. In the ELISPOT method the image analysis requires evaluation of specific morphological parameters of objects appearing as round spots. Quantitative analysis of the images is possible with commercial systems, but the time and cost of such an examination are unacceptable for scientific research.

[Figure annotations: cytoplasm; acquisition artifacts; badly stained nucleus; air bubble; 7 jammed nuclei; positively reacted nucleus; negatively reacted nucleus; partly positive nucleus; non-nuclear structures; scale mark ~6 mm]
Fig. 1 A sample image from biopsy to be segmented and recognized (magnification 300x)
Fig. 2 A microscope image of lymphocyte response in ELISPOT method
In both cases automating the image analysis with a computer vision system can speed up the process and make the diagnosis more reliable. The author of the dissertation took up this task and developed flexible image processing software: PATO, a system for automatic counting and classification of breast cancer nuclei [2], and SPOTVIEW, a system for measuring the immunoenzymatic lymphocyte response [5]. The systems are the result of collaboration with two medical research institutes: the Research Institute of Polish Mother's Memorial Hospital and the Department of Nephrology and Transplantation Medicine of the Medical University in Wroclaw. PATO and SPOTVIEW follow a common data
processing flow:
- Image acquisition and archiving. Inspected images (Figs. 1 and 2) are taken by a digital or analog camera connected to an optical microscope with the proper magnification. The images are stored as color JPEG or TIFF files with a resolution of 0.5 to 3.2 MPix. Additional interfaces for camera control are supported, but a photograph database system is not.
- Image preprocessing. This step comprises the following tasks: extraction of the ROI (region of interest) [7], scale evaluation and preliminary filtering for segmentation [6].
- Segmentation. Segmentation is usually defined as a procedure that splits the whole image into connected, non-overlapping components that correlate strongly with real objects. In both systems this goal is achieved in two steps: identification of object and background pixels [3] and region-oriented segmentation [1]. Once the segmentation parameters are set, this approach enables unattended processing of all images.
- Object identification and measurement. All segmented particles are examined by means of selected morphological parameters. Individual and summary statistics are exported to a workbook.
Specific algorithms are presented in the following sections.
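The two-step segmentation described above (pixel identification followed by region-oriented grouping) can be illustrated with a minimal Python sketch. It is not the PATO or SPOTVIEW implementation: a plain global threshold stands in here for the pixel classifiers of [3], 4-connected component labelling stands in for the region-oriented step of [1], and the names `label_components` and `two_step_segmentation` are hypothetical.

```python
from collections import deque


def label_components(mask):
    """Label 4-connected foreground regions of a binary mask (rows of 0/1).

    Returns (labels, count): labels holds 0 for background and 1..count
    for the connected, non-overlapping components.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one component
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and mask[ny][nx] and labels[ny][nx] == 0:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def two_step_segmentation(gray, threshold):
    """Assumed stand-in pipeline: dark pixels are objects, then label regions."""
    # Step 1: object/background pixel identification (assumption: global threshold).
    mask = [[1 if value < threshold else 0 for value in row] for row in gray]
    # Step 2: region-oriented grouping of the object pixels.
    return label_components(mask)
```

Once the threshold is fixed, the same call can be repeated for every image without user interaction, which is the property the processing flow relies on.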
2. THE PATO SYSTEM
To satisfy the requirements, the author of the dissertation concentrated on constructing efficient image processing algorithms. Fig. 3 presents the complete image processing algorithm, covering all steps from image acquisition to object identification. Several groups of operations may be distinguished in the algorithm:
- image segmentation by thresholding: operations (a), (b), (d), (f);
- region-based segmentation with the watershed algorithm: (a), (e), (g);
- segmentation and object identification with a hybrid method, watershed + binarization: (f) + (g);
- supplementing color information and classification with the k-nearest neighbor method and a decision tree, and additionally preparation of statistics: (a), (b), (c), (i), (j), (k).
The PATO system has been verified against analysis carried out manually by an experienced lab assistant. The results of the manual analysis were assumed to be the reference. The 144 images were taken from microscope slides of biopsy, histological extracts and frozen plate imprints.
[Figure 3 flowchart, steps labelled (a) to (k). IMAGE PREPROCESSING: input image; de-interlace filter; 2pMLF filter; smoothed color image; background subtraction; contrast adjustment; desaturation; saturated color image; grayscale conversion; mathematical morphology; monochrome image. SEGMENTATION: thresholding (binary image); watershed (indexed image of objects); morphology. IDENTIFICATION: color classification with a training set; decision tree; morphometry; classified objects.]
Fig. 3 The summary image processing algorithm for the system PATO
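The color classification step with a training set can be illustrated by a plain k-nearest neighbor vote in RGB space. This is only a sketch under assumptions: the paper does not specify the color space, the distance measure or the value of k, so squared Euclidean distance in RGB and k = 3 are chosen here for illustration, and the function name and the training colors are hypothetical.

```python
from collections import Counter


def knn_classify(pixel, training_set, k=3):
    """Classify one RGB pixel by majority vote among its k nearest samples.

    training_set is a list of ((r, g, b), label) pairs. Distance is the
    squared Euclidean distance in RGB (an assumption made for this sketch).
    """
    ranked = sorted(
        training_set,
        key=lambda sample: sum((a - b) ** 2 for a, b in zip(pixel, sample[0])),
    )
    votes = Counter(label for _, label in ranked[:k])
    # On a tie the label met first, i.e. that of the nearer sample, is returned.
    return votes.most_common(1)[0][0]


# Hypothetical training colors: red-brown (positive), pale blue (negative),
# and near-white background.
TRAINING = [
    ((150, 80, 50), "positive"), ((140, 70, 60), "positive"), ((160, 90, 40), "positive"),
    ((170, 190, 230), "negative"), ((180, 200, 235), "negative"), ((160, 185, 225), "negative"),
    ((240, 240, 240), "background"), ((235, 238, 242), "background"), ((245, 245, 245), "background"),
]
```

Applying `knn_classify` to every pixel yields a class map from which both object counts and stained-area fractions can be derived.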
The lab assistant marked the nuclei in a view area and recorded the number of all nuclei found and the number of nuclei with a positive response (with receptors). The same analysis was carried out with PATO. Additionally, the program evaluated the proportion of the stained area (the area of nuclei with a positive reaction) to the whole area of found nuclei (without segmenting nucleus compartments). It has been assumed that this measurement is equivalent to measuring the fraction of stained nuclei by count. The results were compared and the mean deviation was evaluated as a normalized mean square error (NMSE). The following values were compared:
1. the total count of all cell nuclei found in the view area manually and by PATO (mT vs. pT);
2. the count of positively stained cell nuclei found in the view area manually and by PATO (mP vs. pP);
3. the percentage of stained nuclei among all nuclei in the view area found manually and by PATO (mP% vs. pP%);
4. the percentage of stained nuclei among all nuclei in the view area found manually and the percentage of pixels classified as stained among all object pixels in the image evaluated by PATO (mP% vs. pA%);
5. the percentage of stained nuclei among all nuclei in the view area found by PATO and the percentage of pixels classified as stained among all object pixels in the image evaluated by PATO (pP% vs. pA%).
The results were compiled for each type of microscope slide. The most accurate results were expected for histological extracts because of the highest image quality. The results are given in Table 1.

Table 1. The values of NMSE for the measurements grouped by preparation method
No. | Comparison description | Comparison symbol | Histology | BAC | Imprint | Avg.
1 | Count of all nuclei | mT vs. pT | 5.5% | | |
2 | Count of positive nuclei | mP vs. pP | 13.8% | 9.3% | 20.8% | 13.9%
3 | Fraction of count of positive nuclei | mP% vs. pP% | 8.0% | 13.0% | 19.7% | 13.9%
4 | Fraction of count of positive nuclei (manual) vs. fraction of area of positive nuclei | mP% vs. pA% | 10.0% | 14.1% | 18.1% | 14.3%
5 | Fraction of count of positive nuclei (automatic) vs. fraction of area of positive nuclei | pP% vs. pA% | | | |
Remaining values of rows 1 and 5, in source order (cell positions not recoverable): 2.5%, 3.7%, 2.4%, 4.9%, 2.4%, 4.1%, 3.0%.
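The paper does not state the formula used for the NMSE. The sketch below therefore assumes one common normalization, the root of the summed squared differences divided by the summed squared manual values; both this definition and the function name `nmse` are assumptions, not the author's stated method.

```python
import math


def nmse(manual, program):
    """Normalized error between manual and program measurements.

    Assumed definition: sqrt(sum((m - p)^2) / sum(m^2)), taken over the
    per-image values (for example mT and pT for each analyzed image).
    """
    if len(manual) != len(program) or not manual:
        raise ValueError("inputs must be non-empty and of equal length")
    reference_energy = sum(m ** 2 for m in manual)
    if reference_energy == 0:
        raise ValueError("manual values must not all be zero")
    squared_error = sum((m - p) ** 2 for m, p in zip(manual, program))
    return math.sqrt(squared_error / reference_energy)
```

With per-image manual counts `[100, 200]` and program counts `[110, 190]`, this definition gives an error of about 6.3%.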
The deviation of the total count of nuclei measures the reliability of the segmentation algorithm, while the identification error indicates the reliability of the classification algorithms. Table 1 shows that the segmentation algorithms work properly in most cases. An error level of 5% is acceptable in medical applications. The error increases for images of histological extracts, probably because of the greater number of nuclei compartments in a single image. Table 2 presents the average count of nuclei found within a view area. PATO systematically returns a lower value than manual analysis. The results for the pP% vs. pA%
examination suggest that for an individual image there is a strong correlation between relative count and relative area of detected positively reactive nuclei.

Table 2. Average count of cancer cell nuclei in a view area for manual and automatic analysis
Examination | mT (average) | pT (average)
Histopathology | 190 | 219.2
BAC | 76.2 | 72.3
Imprint | 69.5 | 62.1
This correlation occurs because nuclei photographed at a given magnification have practically the same area. It allows pictures to be analyzed even when the segmentation fails, i.e. when the algorithm does not correctly separate overlapping objects.
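The argument above can be checked on a toy example. The sketch below computes both the count fraction (pP%) and the area fraction (pA%) from a list of segmented objects; the data layout and the function name are hypothetical and serve only to illustrate why the area fraction survives a failed separation of overlapping nuclei.

```python
def stained_fractions(objects):
    """Return (count_fraction, area_fraction) of positively stained objects.

    objects is a list of (pixel_count, is_positive) pairs, one per
    segmented object (a layout assumed for this sketch).
    """
    if not objects:
        raise ValueError("at least one object is required")
    total_area = sum(area for area, _ in objects)
    positive = [area for area, is_positive in objects if is_positive]
    count_fraction = len(positive) / len(objects)
    area_fraction = sum(positive) / total_area
    return count_fraction, area_fraction


# Four nuclei of equal area, two of them positive: both fractions agree.
separated = [(100, True), (100, False), (100, False), (100, True)]
# Same image, but the two positive nuclei were not split by the segmentation.
merged = [(200, True), (100, False), (100, False)]
```

For `separated` both fractions equal 0.5. For `merged` the count fraction drops to 1/3 while the area fraction stays at 0.5.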
3. THE SPOTVIEW SYSTEM
In the SPOTVIEW system, inspected images are taken by a digital camera with a macro function that enables 20x magnification. Image segmentation is carried out by binarization with adaptive thresholding [4]. For all connected components found, the following measures are computed:
- area: a value proportional to the number of pixels in the component, computed using the scale information;
- perimeter: evaluated by contour extraction and expressed only in pixel scale. This value is used to compute the shape descriptors: roundness, compactness and aspect ratio.
In this step the segmented image is filtered to rule out components that do not appear to be spots:
- delete all components that touch the border of the ROI;
- delete all components smaller than 0.03 mm;
- delete all components whose compactness is less than 50%.
In fact, the spots are not exactly round, as the stain distribution within a spot is not exactly uniform. Because of the compound nature of the color distribution (actual secretion level, lateral diffusion and dissociation), the amount of cytokine is clearly not directly proportional to the spot area. To perform a more accurate measurement, the author of the thesis introduced another descriptor called weighted area. The measure adds up the intensity values of the pixels of the original image within an individual connected component.
\[
W_k := \sum_{(x,y) \in S_k} \frac{L_{\max} - l(x,y)}{L_{\max} - L_{\min}} = \frac{n_k L_{\max} - \sum_{(x,y) \in S_k} l(x,y)}{L_{\max} - L_{\min}} \qquad (1)
\]
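A minimal Python sketch of the weighted area of Eq. (1) and of the spot filtering rules follows. Several points are assumptions, because the text does not define them: S_k is read as the pixel set of component k, l(x, y) as the pixel intensity, L_max and L_min as the maximum and minimum intensity, and n_k as the number of pixels in S_k. Compactness is taken as 4πA/P², and the size limit is passed as a pixel count, `min_area_px`, that the caller derives from the image scale. All function names are hypothetical.

```python
import math


def weighted_area(intensities, l_max, l_min):
    """Eq. (1): sum of (l_max - l) / (l_max - l_min) over one component's pixels.

    Dark (strongly stained) pixels contribute close to 1, bright pixels close to 0.
    """
    if l_max <= l_min:
        raise ValueError("l_max must be greater than l_min")
    return sum((l_max - l) / (l_max - l_min) for l in intensities)


def compactness(area_px, perimeter_px):
    """Assumed definition 4*pi*A / P**2, which equals 1.0 for an ideal disc."""
    return 4.0 * math.pi * area_px / (perimeter_px ** 2)


def looks_like_spot(area_px, perimeter_px, touches_roi_border,
                    min_area_px, min_compactness=0.5):
    """Apply the three rejection rules to one connected component."""
    if touches_roi_border:          # rule 1: component touches the ROI border
        return False
    if area_px < min_area_px:       # rule 2: component is too small
        return False
    # rule 3: compactness below 50% means the shape is far from round
    return compactness(area_px, perimeter_px) >= min_compactness
```

For a component with intensities `[100, 100, 200]`, `l_max = 200` and `l_min = 100`, the weighted area is 2.0 although the plain area is 3 pixels. This is the difference between the two measures that the comparison in Table 5 relies on.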
The author of the dissertation carried out a statistical comparison of the results obtained by the proposed segmentation algorithm and by the one used in the ImmunoSpot® system, courtesy of Cellular Technology Ltd, USA. Table 3 contains the number of spots (N) extracted from 10 images by these two applications. The proposed algorithm seems to be more sensitive to small, hardly visible objects, which are omitted by the commercial software.

Table 3. The comparison of counts N and distribution parameters: mean value μ and variance σ² for 10 samples
Sample no. | B4 | B5 | B6 | C4 | C7 | C8 | D6 | D12 | E8 | G12
N Immunospot | 459 | 31 | 306 | 589 | 37 | 399 | 319 | 14 | 289 | 114
N SPOTVIEW | 773 | 310 | 647 | 1008 | 325 | 643 | 597 | 282 | 445 | 548
g. of fit σ², 95% | YES | YES | YES | YES | NO | YES | YES | NO | YES | NO
g. of fit μ, 95% | YES | YES | YES | YES | X | YES | YES | X | YES | X
This results in a large difference between N for some images. All analyzed images have been manually inspected for segmentation errors: skipped spots, or image noise or dirt classified as spots. Neither program recognized false spots, but some spots were not detected correctly (Fig. 4).
Fig. 4 The spots extracted from the image with use of: a) Immunospot®, b) SpotView
It can be seen that SPOTVIEW "perceives" tighter contours of the spots. In Immunospot® the output data are available as histograms of spot area, with the values rescaled to log mm². SPOTVIEW, on the other hand, outputs the area value of each spot individually, but for compatibility the histograms were built in the same way. For each histogram the mean and the variance are computed. Variances were compared statistically with the Fisher-Snedecor variance test, and for the pairs that passed it the mean values were tested with Student's t-test. Despite some differences between the numbers of detected spots, the distributions (which are in fact the most important data) fit in 7 cases. Table 4 presents the comparison of spot area distributions after the smallest objects (diagonal less than 0.12 mm) were filtered out.

Table 4. The comparison of counts N and distribution parameters: mean value μ and variance σ² for 10 filtered samples
Sample no. | B4 | B5 | B6 | C4 | C7 | C8 | D6 | D12 | E8 | G12
N Immunospot | 226 | 6 | 110 | 257 | 17 | 202 | 124 | 7 | 156 | 49
N SPOTVIEW | 277 | 8 | 149 | 302 | 20 | 226 | 152 | 8 | 175 | 60
g. of fit σ², 95% | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
g. of fit μ, 95% | NO | YES | YES | YES | YES | YES | YES | NO | YES | YES
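The two-stage comparison used for Tables 3 and 4 (a variance test first, a mean test only for the pairs that pass it) can be sketched in Python as follows. The sketch computes the test statistics from summary values and leaves the critical values to the caller, who would read them from statistical tables for the chosen confidence level and degrees of freedom. The pooled-variance form of the t statistic, the function names and the numbers in the usage note are assumptions made for illustration, not values from the study.

```python
import math


def f_statistic(var_a, var_b):
    """Variance-ratio (Fisher-Snedecor) statistic, larger variance on top."""
    return max(var_a, var_b) / min(var_a, var_b)


def pooled_t_statistic(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """Two-sample t statistic with pooled variance (equal variances assumed)."""
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled * (1.0 / n_a + 1.0 / n_b))


def distributions_fit(mean_a, var_a, n_a, mean_b, var_b, n_b,
                      f_critical, t_critical):
    """Return (variance_ok, mean_ok).

    mean_ok is None when the variance test fails, mirroring the 'X'
    entries in the tables: the means are then not compared at all.
    """
    if f_statistic(var_a, var_b) > f_critical:
        return False, None
    t = pooled_t_statistic(mean_a, var_a, n_a, mean_b, var_b, n_b)
    return True, abs(t) <= t_critical
```

For example, `distributions_fit(1, 1, 10, 0, 1, 10, f_critical=3.18, t_critical=2.10)` passes the variance test (F = 1) but fails the mean test (t ≈ 2.24), which corresponds to a YES/NO column in the tables.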
The experiment was based on the assumption that Immunospot® does not detect the smallest spots. The results show that for medium and big spots both programs return the same area distributions. Table 5 shows the tests of the "weight" and "area" measures for 10 images.

Table 5. Distribution parameters comparison for weight and area logarithm for selected samples
Sample | B4 | B5 | B6 | C4 | C7 | C8 | D6 | D12 | E8 | G12
N SPOTVIEW | 773 | 310 | 647 | 1008 | 325 | 643 | 597 | 282 | 445 | 548
Weight σ² | 0.63 | 0.144 | 0.617 | 0.416 | 0.431 | 0.732 | 0.622 | 0.334 | 0.648 | 0.676
Area σ² | 0.63 | 0.171 | 0.386 | 0.395 | 0.235 | 0.603 | 0.368 | 0.202 | 0.701 | 0.462
Weight μ | -3.08 | -4.30 | -3.76 | -3.38 | -4.35 | -2.95 | -3.68 | -4.39 | -2.86 | -3.73
Area μ | -2.4 | -2.75 | -2.46 | -2.44 | -2.69 | -2.24 | -2.45 | -2.67 | -2.15 | -2.42
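The distribution parameters in Table 5 are the mean and variance of the logarithm of a per-spot measure. A minimal sketch of that computation is given below. The base of the logarithm is not stated in the paper, so base 10 is assumed here, as is the use of the sample (n - 1) variance; the function name is hypothetical.

```python
import math
import statistics


def log_distribution_parameters(values):
    """Return (mean, variance) of log10 of per-spot areas or weighted areas.

    Assumptions: base-10 logarithm and sample variance; at least two
    strictly positive values are required.
    """
    logs = [math.log10(v) for v in values]
    return statistics.mean(logs), statistics.variance(logs)
```

For spot areas of 0.001, 0.01 and 0.1 mm², the function returns a mean of -2 and a variance of 1 under these assumptions.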
The weight, intuitively, better approximates the degree of immunological
reaction. We showed that the area and weighted area distributions differ significantly, which suggests that the plain area is not an appropriate indicator of lymphocyte secretory activity. The comparison of the results obtained with SPOTVIEW against those of Immunospot®, a respected system for ELISPOT image processing and analysis, supports the claim that the proposed image processing algorithms are appropriate for this application.
CONCLUSIONS
PATO enabled accurate analysis of hundreds of images at the Institute of Polish Mother's Memorial Hospital, which conducts research to evaluate the reliability of three different techniques of microscope slide preparation. Moreover, the system may be used as a valuable tool for routine examination, where its advantage is speed. The analysis time for a single image may be reduced to a few seconds, while a qualified lab assistant needs a few minutes. Reliable automatic segmentation is achieved for almost 90% of all tested images, which was confirmed statistically. The remaining, poor-quality images may be analyzed by evaluating the percentage of pixels classified as stained among all object pixels in the image. Such an evaluation is feasible only with an image analysis program; in routine manual examination poor-quality images are skipped.
The second system developed as part of the dissertation was SPOTVIEW, a system for segmentation and identification of images obtained in the ELISPOT examination. The introduction of the new binarization algorithm enabled the analysis of all ELISPOT images. The software has been successfully compared with the commercial system formerly used by the Department of Nephrology. The development of SPOTVIEW supported research on monitoring the risk of an adverse outcome for a grafted kidney.
REFERENCES
[1] Bieniecki W.: Oversegmentation avoidance in watershed-based algorithms for color images. Proceedings of IEEE International Conference TCSET 2004, Lviv-Slavsk, Ukraine 2004, pp. 169–172.
[2] Bieniecki W., Grabowski Sz., Sekulska J.: A system for pathomorphological microscopic image analysis. KOSYR 2003 Computer Recognition Systems, Wrocław 2003, pp. 21–28.
[3] Bieniecki W., Grabowski Sz.: Nearest Neighbor classifiers for color image segmentation. Proceedings of IEEE International Conference TCSET 2004, Lviv-Slavsk, Ukraine 2004, pp. 209–212.
[4] Bieniecki W., Grabowski Sz.: Multi-pass approach to adaptive thresholding based image segmentation. Proceedings of the 8th International IEEE Conference CADSM 2005, Lviv-Polyana, Ukraine 2005, pp. 418–423.
[5] Bieniecki W., Grabowski Sz., Sankowski D., Kościelska-Kasprzak K., Bernat B., Klinger M.: An Efficient Processing and Analysis Algorithm for Images Obtained from Immunoenzymatic Visualization of Secretory Activity. Proceedings of the 8th International IEEE Conference CADSM 2005, Lviv-Polyana, Ukraine 2005, pp. 458–460.
[6] Grabowski Sz., Bieniecki W.: A two-pass median-like filter for impulse noise removal in multi-channel images. KOSYR 2003 Computer Recognition Systems, Wrocław 2003, pp. 195–200.
[7] Wójcicki D., Bieniecki W., Grabowski Sz., Kościelska-Kasprzak K.: Algorytmy przetwarzania wstępnego obrazów mikroskopowych w badaniu aktywności wydzielniczej limfocytów [Preprocessing algorithms for microscope images in the study of lymphocyte secretory activity, in Polish]. Krajowa Konferencja Sieci i Systemy Informatyczne, Łódź, October 2005.
NEW IMAGE PROCESSING ALGORITHMS IN COMPUTER VISION SYSTEMS SUPPORTING PATHOMORPHOLOGICAL DIAGNOSTICS

Summary
The paper presents the most important achievements of the author's doctoral dissertation. The aim of the work was to design two automatic image processing systems for biomedicine: a system for recognizing breast cancer tissue cells and a system for examining immune system cells after kidney transplantation. Existing image processing algorithms proved insufficient for the task. The author proposed modified algorithms: impulse noise removal from color images, a multi-pass image binarization algorithm, a pixel classification method and a segmentation algorithm based on the watershed method.

Supervisor: prof. dr hab. inż. Dominik Sankowski
Reviewers of the doctoral dissertation:
1. prof. dr hab. inż. Andrzej Napieralski
2. prof. dr hab. Kazimierz Wiatr