Proceedings of the Eighth IASTED International Conference Visualization, Imaging, and Image Processing (VIIP 2008) September 1-3, 2008 Palma de Mallorca, Spain

RECOGNIZE AND CLASSIFY FISH OOCYTES IN HISTOLOGICAL IMAGES

E. Cernadas
Dpto. de Electrónica e Computación
Campus Sur s/n, 15782 Santiago de Compostela, A Coruña (Spain)
email: [email protected]

P. Carrión and A. Formella
Dpto. de Informática, E.S.E.I.
Universidade de Vigo
Campus Universitario As Lagoas s/n, 32004 Ourense (Spain)

R. Domínguez and F. Saborido-Rey
Instituto de Investigacións Mariñas
CSIC, Vigo (Spain)

ABSTRACT

The study of the biology and population dynamics of fish species requires the estimation of fecundity in individual fish in many fisheries laboratories. The traditional procedure used in fisheries research is to count manually the oocytes in a subsample of known weight of the ovary, and to measure a few oocytes under a binocular microscope. This process can also be done on a computer using an interactive tool to count and measure oocytes. In both cases, the task is very time consuming, which implies that fecundity studies are rarely conducted routinely. We attempt to design a computer vision system which is able to recognize and classify the oocytes in a histological image. The boundaries of the oocytes are detected using an algorithm based on edge information. Afterwards, the oocytes are classified into cells with and without nucleus. A statistical evaluation of both stages reveals a correct detection and classification of 65% when an 80% overlap is demanded.

KEY WORDS
Image analysis, classification, segmentation, fish oocytes, fecundity, histological images.

1 Introduction

The description of reproductive strategies and the assessment of fecundity are fundamental topics in the study of the biology and population dynamics of fish species [1]. Studies on reproduction, including fecundity, the assessment of size at maturity, the duration of the reproductive season, daily spawning behavior and the spawning fraction, permit a quantification of the reproductive capacity of individual fish [2]. This information increases the knowledge that fisheries researchers need to improve the standard assessments of many commercially valuable fish species. To estimate female fish fecundity, researchers need to count the number of oocytes (i.e., the ovarian cells that are precursors of the ovules) that are developing within the breeding season. There is a threshold diameter at which the developing oocytes can be separated from the non-developing ones, which varies from species to species. Thus, it is critical to know at which developing stage the ovary is analyzed. This is normally achieved by measuring the oocyte diameter and

obtaining a size distribution of the developing oocytes. In recent years, with the development of computer image analysis software, researchers in this field have used stereological techniques [3] on histological sections to estimate fecundity and other reproductive parameters. Figure 1 shows an example of a histological section of a fish ovary and its corresponding set of oocytes of interest (annotated oocytes). The method most often used to count the oocytes is Weibel stereometry [4], which counts the number of points of a fixed grid that lie upon sections of the objects in question (the oocytes). Weibel stereometry needs a large number of sample points for reliable estimation, so it can be a tedious technique to apply in practice. Also, the size distribution of the developing oocytes is required to account for the variation of particle sizes, which can introduce a systematic error into the method. To automate this process, each oocyte in the histological section must be identified and the real diameter of the oocytes has to be estimated. Because the nucleus of an oocyte is always situated in its centre, only the diameters of those oocytes where the nucleus is visible should be used: in oocytes where the nucleus is not visible, the measured diameter underestimates the real one. Thus, for a full automation of the process, the presence of the nucleus of the oocytes should also be identified.

Our aim is the design of a computer vision system which automatically recognizes and classifies oocytes in histological images and can run on-line. In preliminary works [5, 6], we only focused on extracting the silhouettes of the oocytes. In this paper, we improve the method to recognize oocytes in the histological image and we propose an algorithm to discriminate between oocytes with a visible nucleus and oocytes without a visible nucleus.

The paper is organized as follows: Section 2 describes the technical details of the image acquisition. Section 3 describes the proposed methods for the different stages of the system and Section 4 discusses the statistical results obtained. Finally, Section 5 draws the conclusions.


2 Image acquisition

Ovaries were embedded and sectioned with standard histological procedures. The sections were stained with Haematoxylin-Eosin, which produces a wide range of stained structures usually exhibiting good contrast. The images were obtained from the histological sections with a LEICA® DRE research microscope with a total magnification of 6.3 and a calibration of 0.547619 microns per pixel. A LEICA® DFC320 digital camera, directly connected to the microscope and combined with the Leica IM50© software, was used to digitize the images. The camera resolution was set to 3.3 Mpixels (2088 × 1550 pixels). The camera produces square pixels of 3.45 μm. The exposure time and color balance were set automatically.
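For orientation, and assuming the pixel thresholds of Section 4 refer to images at this calibration, the minimum oocyte diameter used later corresponds to roughly

$$D^s_{min} = 75\ \text{pixels} \times 0.547619\ \mu\text{m/pixel} \approx 41\ \mu\text{m}.$$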



3 Methods

The computer vision system analyses the histological images of the oocytes in the following stages: first, the oocytes of interest are recognized in the histological image; second, the cells extracted in the previous stage are divided into cells with and without nucleus; and third, the number of cells of each type is counted and their diameters are measured.

Recognizing the silhouettes of the cells in the image implies a segmentation process to extract the boundaries of the cells. There is a broad literature on image segmentation [7, 8]. The simplest and most popular image segmentation technique is thresholding [9]. Threshold methods often fail to produce accurate segmentations on images containing shading, highlights, non-uniform illumination or texture. Other segmentation methods are based on two basic properties of pixels in relation to their local neighborhood: discontinuity and similarity. Pixel discontinuity gives rise to boundary-based methods [10] and pixel similarity gives rise to region-based methods [11, 12]. Muñoz et al. [13] reviewed several strategies to integrate region and boundary information for segmentation, and many such algorithms have been proposed in the literature [14, 15].

We applied some of these proposals in our previous works [5, 6] on a limited dataset of images. The results were quite encouraging in relation to the correct detection of oocytes (approximately 75% of the oocytes are recognized when an overlapping area of at least 75% between the machine detection and the true oocyte is demanded). The main drawback was the large number of noisy cells: the methods employed extract all cells in the image, but we are only interested in matured cells. The method needed to be improved to incorporate criteria that allow distinguishing the matured cells from other kinds of oocytes or artefacts. The rest of this section describes the methods we propose to implement the different stages of the computer vision system.

Figure 1. Typical histological image of fish ovary (upper) and its corresponding sets of annotated oocytes with nucleus (middle) and oocytes without nucleus (bottom).

3.1 Recognizing cells

Taking into account our previous experience working with region- and edge-based methods to detect oocytes in histological images [5, 6], we propose a method based on an edge-based approach. Region-growing approaches behave very well in relation to the precision of the detection of the cell boundaries. Nevertheless, the results depend strongly on the tuning parameters that the algorithm needs in order to be applied (grey-level thresholds and seed positions). This is an important drawback for the design of a robust algorithm that should work with different samples or species. These reasons led us to focus only on the edge-based approach.



The proposed algorithm improves on the previous work: it works on color images instead of grey-level images, adds a post-processing step to reduce the number of noisy cells, and improves the accuracy of the boundary detection. Biologists estimate the fecundity from the number of oocytes with a diameter larger than a given threshold, the minimum diameter $D^s_{min}$. Moreover, the objects we are interested in recognizing in the image have a rounded shape, as can be seen in Figure 1. From a statistical analysis of the cells annotated by one of the authors, an expert in histology, we conclude that the features of interest have a roundness lower than a threshold, which we call $\rho^s_{max}$. The roundness of a polygon $P$ is defined as:

$$\rho_P = \frac{L_P^2}{4 \alpha \pi A_P} \qquad (1)$$

where $L_P$ is the length of the polygon contour, $A_P$ is its area and $\alpha$ is a correction factor fixed to 1.064.
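As an illustration, a minimal Python sketch of Eq. (1) on a vertex list follows; the function name and the use of NumPy are our own choices, not part of the original system.

```python
import numpy as np

ALPHA = 1.064  # correction factor of Eq. (1)

def roundness(contour: np.ndarray) -> float:
    """Roundness of a closed polygon (Eq. 1): perimeter^2 / (4*alpha*pi*area).

    `contour` is an (N, 2) array of (x, y) vertices; a near-circular shape
    yields a value close to 1, elongated shapes yield larger values.
    """
    # perimeter of the closed contour: sum of edge lengths, wrapping around
    diffs = np.diff(np.vstack([contour, contour[:1]]), axis=0)
    perimeter = np.sum(np.hypot(diffs[:, 0], diffs[:, 1]))
    # shoelace formula for the enclosed area
    x, y = contour[:, 0], contour[:, 1]
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    return perimeter ** 2 / (4.0 * ALPHA * np.pi * area)
```

A candidate polygon would then be kept only if its roundness stays below $\rho^s_{max}$ (1.2 for redfish, cf. Section 4).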

Another parameter is the minimum area of the cells with nucleus, $A^s_n$. All parameters ($D^s_{min}$, $\rho^s_{max}$ and $A^s_n$) are intrinsic characteristics of each species $s$. The main steps of the algorithm to recognize oocytes are:

1. Capture the color image of the histological sample. This process was described in Section 2.

2. Compute edges: compute a set of edges, which are candidates for the contours of the oocytes, applying first and second-order derivative filters.

3. Analyse edges: take the edges computed in the previous step as input, remove the noisy edges and link the edges that belong to the same cell.

4. Merge information: merge the information provided by the different components of the image.

3.1.1 Compute edges

Let $I$ be a color RGB histological image and $I^l$ a component of the original image. $I^l$ is a one-plane image that could be a transformation of the original image or a single plane of the color image. In our approach, $l = r, g, b$ represents the red, green and blue bands of the image. Let $z_k = (x_k, y_k)$ be the coordinates of the $k$-th point in the image and $C^l_j$ the $j$-th edge of image $I^l$ for $j = 1, \ldots, N^l_C$ ($C^l_j$ contains the set of points $z_k$ that compose the $j$-th edge). Each color band $I^l$ is processed independently, and eventually the resultant edges of all color bands are merged.

Initially, first and second-order derivatives are applied to $I^l$ to detect the edges present in the image. To compute the first-order derivative, we use the implementation of the Canny filter provided by the module osl of the VXL library¹ with the default parameters (see Figure 2.a). To compute the second-order derivative, we use an implementation from the module gevd of VXL with default parameters, considering only the peaks (see Figure 2.b). Both methods incorporate an edge-linking step and report the set of edges found; the sets of edges given by both detectors are added. Let $S^l_C = \{C^l_j, j = 1, \ldots, N^l_C\}$ be the set of edges detected in image $I^l$, and let $S^l_P$ be the corresponding set of polygons, $P^l_j$ being the polygon defined by the points $z_k \in C^l_j$.
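The paper relies on the osl and gevd modules of VXL. As a rough stand-in, the following hedged Python/OpenCV sketch applies a first-order (Canny) and a simple second-order (Laplacian-of-Gaussian zero-crossing) detector to each color band and adds the two edge sets; the thresholds and smoothing parameters are illustrative assumptions, not the VXL defaults.

```python
import cv2
import numpy as np

def edges_per_band(image_bgr: np.ndarray) -> list:
    """Per-band edge maps: union of a first-order (Canny) and a
    second-order (LoG zero-crossing) detector, as a stand-in for
    the VXL osl/gevd modules used in the paper."""
    edge_maps = []
    for band in cv2.split(image_bgr):            # process each band independently
        first = cv2.Canny(band, 50, 150)         # first-order derivative edges
        log = cv2.Laplacian(cv2.GaussianBlur(band, (5, 5), 1.5), cv2.CV_32F)
        # zero crossings of the LoG: sign change between 4-neighbours
        zc = np.zeros_like(first)
        zc[:-1, :][np.sign(log[:-1, :]) != np.sign(log[1:, :])] = 255
        zc[:, :-1][np.sign(log[:, :-1]) != np.sign(log[:, 1:])] = 255
        edge_maps.append(cv2.bitwise_or(first, zc))  # add both edge sets
    return edge_maps
```

In this setting, cv2.findContours could approximate the edge-linking step, turning each connected edge chain into a polygon $P^l_j$.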

3.1.2 Analyse edges

For all components of the image, many undesired edges are detected. They can be classified into two types: edges due to the appearance of textured inner parts of the oocytes, and outlines of undesired cells (cells which are not of interest for fecundity estimation) or artefacts. Some heuristic rules are applied to remove the undesired edges using prior knowledge of the application. This process is critical, because some interesting edges are broken and may be confused with noise edges.

Taking into account the above ideas, we first apply a filter to the set of edges $S^l_C$ to remove the noisy edges. This filter removes the edges which are enclosed in a box of size $d \times d$ pixels. We adopt a conservative criterion to preserve true cell edges that are broken: the value of $d$ is determined from the application as $d = D^s_{min}/6$. This step removes edges that enclose small areas, normally associated with the textured inner content of the cells. Edges due to a long piece of the boundary of a cell of interest or to the cytoplasm are preserved by this filter.

Once the noisy edges are removed, broken edges should be linked in order to extract the silhouettes of the cells of interest. To this end, we propose to join polygons when their intersection is significant, i.e., let $S^l_P = \{P^l_j, j = 1, \ldots, N^l_C\}$ be the set of polygons of component $l$, and let $A(P^l_j)$ be the area enclosed by the polygon $P^l_j$. We define a new set of polygons $S^l_{\tilde{P}} = \{\tilde{P}^l_{ij}, i, j = 1, \ldots, N^l_{\tilde{P}}\}$ where:

$$\tilde{P}^l_{ij} = \begin{cases} P^l_i \cup P^l_j & \text{if } A(P^l_i \cap P^l_j) > a \times A(P^l_i) \text{ or } A(P^l_i \cap P^l_j) > a \times A(P^l_j) \\ P^l_i & \text{otherwise} \end{cases} \qquad (2)$$

where $a$ is the overlapping rate, for which we used the value 0.05. The value of this parameter is critical: higher values of $a$ do not link broken edges of the same cell, and lower values may link edges belonging to different cells. Afterwards, the convex hulls of all polygons of the resulting set are calculated to fill holes inside the cells (see Figure 2.c). Finally, a post-processing step is applied to remove the cells whose diameter is smaller than $D^s_{min}$ pixels or whose roundness is greater than $\rho^s_{max}$.
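A sketch of the noise filter and of the linking rule of Eq. (2), using the shapely geometry library for polygon areas and intersections, might look as follows; the function and parameter names are ours, and the quadratic merge loop is only meant to mirror the rule, not the authors' implementation.

```python
from shapely.geometry import Polygon

def filter_and_link(polygons: list, d_min: float, a: float = 0.05) -> list:
    """Noise filter (d x d box) plus the merging rule of Eq. (2)."""
    d = d_min / 6.0                       # box size derived from D^s_min
    kept = []
    for p in polygons:
        minx, miny, maxx, maxy = p.bounds
        if (maxx - minx) > d or (maxy - miny) > d:
            kept.append(p)                # keep edges not enclosed in a d x d box
    merged = True
    while merged:                         # repeat until no pair merges
        merged = False
        for i in range(len(kept)):
            for j in range(i + 1, len(kept)):
                inter = kept[i].intersection(kept[j]).area
                if inter > a * kept[i].area or inter > a * kept[j].area:
                    kept[j] = kept[i].union(kept[j])   # P_i ∪ P_j of Eq. (2)
                    del kept[i]
                    merged = True
                    break
            if merged:
                break
    return [p.convex_hull for p in kept]  # fill holes inside the cells
```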

3.1.3 Merge information

In this step, the cells detected from each component of the image are merged.

¹ VXL is a collection of open-source C++ libraries designed for computer vision research (vxl.sourceforge.net).


Figure 2. An example of the visual performance of the algorithm on image tl–31–39 (the black overlays represent the detected edges): (a) edges after applying the first–order derivative filter; (b) edges after applying the second–order derivative filter; (c) after removing edges enclosed in a box of size d × d; (d) after removing cells with an area lower than 2d × 2d.

The following cases may occur in the merging process: i) a cell is detected in all components but the accuracy of the cell outline varies among components; ii) there is over- or under-segmentation in the detection of a cell in some component; and iii) a cell is detected in some component of the image and is missed in other ones. Let $S^l_{\hat{P}} = \{\hat{P}^l_i, i = 1, \ldots, N^l_{\hat{P}}\}$ be the set of convex-hull polygons that represent the cell boundaries of image $I^l$. Let $S^m_P = \{P^m_j, j = 1, \ldots, C^m\}$, $m = 1, \ldots, M$, be the sets of detected cell candidates that represent the true cell $P^m$ in image $I$. A detected cell $\hat{P}^l_i \in S^l_{\hat{P}}$ is included in the set $S^m_P$ if it significantly overlaps any cell of $S^m_P$, $m = 1, \ldots, M$. Otherwise, a new set $S^{M+1}_P$ is formed containing the cell $\hat{P}^l_i$. An overlap of 50% is demanded, i.e., $A(\hat{P}^l_i \cap P^m_j) > 0.5\,A(\hat{P}^l_i)$ and $A(\hat{P}^l_i \cap P^m_j) > 0.5\,A(P^m_j)$, $j = 1, \ldots, C^m$.
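The grouping of per-band detections by mutual overlap can be sketched as below, again with shapely polygons; the 0.5 threshold is the 50% overlap demanded above, and the helper names are our own.

```python
from shapely.geometry import Polygon

def same_cell(p: Polygon, q: Polygon, t: float = 0.5) -> bool:
    """Mutual-overlap test: the intersection must cover more than a
    fraction `t` of BOTH polygon areas."""
    inter = p.intersection(q).area
    return inter > t * p.area and inter > t * q.area

def group_detections(per_band: list) -> list:
    """Group the per-band detections into candidate sets S_P^m; each set
    gathers the detections that represent one true cell. Sketch only."""
    groups = []
    for band in per_band:                  # detections of each color band
        for cell in band:
            for g in groups:
                if any(same_cell(cell, other) for other in g):
                    g.append(cell)         # joins an existing set S_P^m
                    break
            else:
                groups.append([cell])      # forms a new set S_P^{M+1}
    return groups
```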

Once the sets $S^m_P$, $m = 1, \ldots, M$, for image $I$ are built, each set is analysed to determine the outline of the cell or cells. Two cases are distinguished:

• If the number of cells in the set is lower than or equal to the number of components of the image, we assume that over- or under-segmentation is not present and that all the detected cells represent the same true cell, so the detected cell with the lowest roundness is selected.

• Otherwise, the set may contain more than one true cell. If there exist $P_j, P_k \in S^m_P$, $k \neq j$, such that $P_k \cap P_j = \emptyset$, then both $P_k$ and $P_j$ are considered as detected cells. If several cells verify the criterion, we take the ones with the lowest roundness.

After this process, a set of detected cells $S_P = \{P_i, i = 1, \ldots, N\}$ for image $I$ has been calculated. Figure 3 illustrates the cells detected for each component of an image and the result after merging.
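Following the two cases above, the selection of the final outline(s) within each candidate set could be sketched as follows (a sketch under the assumption that the detections are shapely polygons, as in the previous examples; names are ours):

```python
import math
from shapely.geometry import Polygon

def poly_roundness(p: Polygon) -> float:
    # Eq. (1) on a shapely polygon: perimeter^2 / (4 * alpha * pi * area)
    return p.length ** 2 / (4.0 * 1.064 * math.pi * p.area)

def pick_representatives(groups: list, n_bands: int = 3) -> list:
    """Outline selection within each candidate set S_P^m."""
    cells = []
    for g in groups:
        if len(g) <= n_bands:
            # assume one true cell: keep the detection with lowest roundness
            cells.append(min(g, key=poly_roundness))
        else:
            # the set may contain several true cells: scan by increasing
            # roundness, keeping members disjoint from those already kept
            kept = []
            for p in sorted(g, key=poly_roundness):
                if all(p.intersection(q).area == 0.0 for q in kept):
                    kept.append(p)
            cells.extend(kept)
    return cells
```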



Figure 3. An example of the visual performance of the algorithm on image tl–31–39 (the black overlays represent the detected cells): (a), (b) and (c) show the detected cells in the red, green, and blue band of the image; (d) after merging the information from all color bands.

3.2 Classifying oocytes

Once the cells in the image are recognized, we need to discriminate between cells with and without nucleus. For this stage, we propose an algorithm to recognize the nucleus of the cell based on region-growing techniques. The tuning parameters, the seed point and the threshold to grow the region, are calculated from the detected cell and the statistical characteristics of the image. In this process, we only use the green band of the color image. Let $P_i$ be the detected cell $i$ of image $I$. The seed points are the centroids $z_i$ of the detected cells. The similarity criterion to grow the region is determined from the statistical characteristics of the detected cell: a threshold $T_i = \mu_i - \sigma_i/2$ is calculated for each cell, where $\mu_i$ is the mean grey level of the pixels of cell $i$ in $I^g$ (the green band of the image) and $\sigma_i$ is its standard deviation. The region-growing process starts at the seed point $z_i$ and grows, using the 4-connected neighborhood, until the grey level of the image falls below the threshold $T_i$. Eventually, we take the convex hull of the resulting region. We assume that a cell has a nucleus if the area of the nucleus is greater than 5% of the cell area.
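A minimal sketch of this nucleus test is given below, assuming that growth includes pixels at or above $T_i$ (the growth stops once the grey level falls below $T_i$) and that the area test on the raw grown region stands in for the convex-hull step; the data layout and names are our own.

```python
from collections import deque
import numpy as np

def has_nucleus(green: np.ndarray, cell_mask: np.ndarray) -> bool:
    """Region-growing nucleus test: seed at the cell centroid, threshold
    T = mu - sigma/2 from the cell's grey levels, 4-connected growth,
    nucleus present if the region exceeds 5% of the cell area.

    `green` is the green band I^g; `cell_mask` a boolean mask of one cell.
    """
    ys, xs = np.nonzero(cell_mask)
    mu, sigma = green[cell_mask].mean(), green[cell_mask].std()
    t = mu - sigma / 2.0                              # threshold T_i
    seed = (int(ys.mean()), int(xs.mean()))           # centroid z_i
    region = np.zeros_like(cell_mask)
    queue = deque([seed])
    while queue:                                      # 4-connected growth
        y, x = queue.popleft()
        if not (0 <= y < green.shape[0] and 0 <= x < green.shape[1]):
            continue
        if region[y, x] or not cell_mask[y, x] or green[y, x] < t:
            continue                                  # stop below threshold
        region[y, x] = True
        queue.extend([(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)])
    return region.sum() > 0.05 * cell_mask.sum()      # 5% area criterion
```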

4 Results and discussion

4.1 Data set

Our data set contains 10 images of the species redfish. The total number of oocytes, identified manually, is 378, ranging from 27 to 53 per image. The outlines of the oocytes were marked by one of the authors, an expert in histology, in order to evaluate the performance of the segmentation algorithm. This author also classified the oocytes into cells with and without nucleus. Figure 1 shows an image and its corresponding set of annotated oocytes.


4.2 Results and discussion

Before applying the method described in Section 3, some application-dependent parameters must be fixed. In particular, the minimum diameter of the oocytes of the species redfish is set to 75 pixels, the maximum roundness of the cells to 1.2, and the minimum area of the cells with nucleus to 15000 pixels.

The performance of the system is evaluated using statistical measures. In previous work [5, 6], we used the metrics proposed by Hoover et al. [16] (correct detection, over-segmentation, under-segmentation, noise instance and missed instance) to measure the performance of the segmentation algorithm. We showed that the numbers of over-segmented and under-segmented instances were insignificant for the edge-based approach. In this paper, we propose to jointly evaluate the performance of the segmentation and the classification algorithm. We measure the instances that are correctly detected and classified, the instances that are correctly recognized but erroneously classified (misclassified), the noisy instances and the missed instances (see the Appendix for definitions). We calculate the rate of instances for each image in relation to each tolerance $T$ and average the results over all images.

Figure 4 shows the rate of instances for all cells, cells with nucleus and cells without nucleus for different tolerances. A correct detection and classification of 65% is achieved when an 80% overlap is demanded. For this tolerance, only 5% of the cells are detected but misclassified and 15% of the cells are noise. We also tested the influence of the minimum-area criterion for cells with nucleus (15000 pixels): if this criterion is not applied, the rates are preserved with the exception of the number of noisy instances, which increases up to 42%.

The performance of the proposed method is quite independent of whether the cells have a nucleus or not (see graphs b and c of Figure 4). The main difference is the rate of noisy instances, which is 4.2% when classifying the cells with nucleus and increases up to 30% for cells without nucleus. This implies that almost all noisy detected cells are classified as cells without nucleus.

We also tested the performance of the method when only a single component of the color image (the green, red, or blue band, or the grey-level image) is used to detect the oocyte boundaries. The best results are provided by the green band. Its behavior is similar to that of the color image, but its rates of correct detection and classification are slightly lower.

Many of the missed instances are due to the incomplete cells annotated at the borders of the image (see the middle and bottom images of Figure 1). The histologists use those cells in their daily work to count the number of oocytes in a sample, but not to measure their diameters. The computer algorithm misses many of these cells because they do not fulfill the roundness criterion. In the future, we will define a method to estimate fecundity without taking into account the incomplete cells at the border of the image; in this case, the rate of correct detection and classification will increase significantly.

5 Conclusion

We proposed a method to recognize and classify the matured oocytes in histological images. The algorithm comprises two stages: the detection of matured cells, and their classification into cells with and without a visible nucleus. Our current results on a limited dataset of images reveal a correct detection and classification of 65% when an 80% overlap is demanded (although this rate is pessimistically biased, as commented above). For this tolerance, only 5% of the cells are detected but misclassified and 15% of the cells are counted as noise. Given the current interest of fisheries management in a proper assessment of fish reproductive potential, the automation of the process of routine estimation of fecundity using stereological approaches is in demand, and this study provides useful tools in that direction.

Appendix: Statistical measures

Let $D_i$ be the number of recognized cells and $A_i$ the number of annotated cells in image $I_i$, and let $C$ be the number of classes into which the annotated cells are classified. Let the number of pixels of each recognized cell $R^i_d$, $d = 1, \ldots, D_i$, be called $P^i_d$. Similarly, let the number of pixels of each annotated cell $R^i_a$, $a = 1, \ldots, A_i$, be called $P^i_a$. Let $O^i_{da} = |R^i_d \cap R^i_a|$ be the number of overlapping pixels between $R^i_d$ and $R^i_a$. Thus, if there is no overlap between the two regions, $O^i_{da} = 0$, while if the overlap is complete, $O^i_{da} = P^i_a = P^i_d$. Let a threshold $T$ be a measure of the strictness of the overlap ($0.5 \le T \le 1$). A region can be classified into the following types:

• A pair of regions $R^i_d$ and $R^i_a$ is classified as an instance of correct detection if $O^i_{da} \ge P^i_d T$ (at least a fraction $T$ of the pixels of region $R^i_d$ in the detected image overlap with $R^i_a$) and $O^i_{da} \ge P^i_a T$.

  – If the class in which $R^i_d$ is classified is the true class of its corresponding annotated cell, the cell is counted as an instance of correct detection and classification.

  – Otherwise, the cell is counted as an instance of correct detection but misclassification.

• A region $R^i_a$ that does not participate in any instance of correct detection is classified as missed.

• A region $R^i_d$ that does not participate in any instance of correct detection is classified as noise.
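For one image, the four instance types can be counted as in the following sketch, where detections and annotations are given as boolean pixel masks with class labels; the function and variable names are our own illustrative choices.

```python
import numpy as np

def match_instances(det_masks, det_labels, ann_masks, ann_labels, t=0.8):
    """Count the Appendix instance types for one image at tolerance `t`:
    correct detection requires the mutual pixel overlap O >= T * P for
    both the detected and the annotated region."""
    matched_det, matched_ann = set(), set()
    correct, misclassified = 0, 0
    for d, dm in enumerate(det_masks):               # dm: boolean pixel mask
        for a, am in enumerate(ann_masks):
            overlap = np.logical_and(dm, am).sum()   # O_da of the Appendix
            if overlap >= t * dm.sum() and overlap >= t * am.sum():
                matched_det.add(d)
                matched_ann.add(a)
                if det_labels[d] == ann_labels[a]:   # nucleus / no nucleus
                    correct += 1
                else:
                    misclassified += 1
    noise = len(det_masks) - len(matched_det)        # unmatched detections
    missed = len(ann_masks) - len(matched_ann)       # unmatched annotations
    return correct, misclassified, noise, missed
```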

References

[1] J. R. Hunter, B. J. Macewicz, N. C. H. Lo, and C. A. Kimbrell. Fecundity, spawning, and maturity of female Dover sole, Microstomus pacificus, with an evaluation of assumptions and precision. Fishery Bulletin, 90:101–128, 1992.


Figure 4. Rate of instances of correct detection and classification (∗), correct detection and misclassification, noise, and missed (◦): a) all cells; b) cells with nucleus; and c) cells without nucleus.

[2] H. Murua and F. Saborido-Rey. Female reproductive strategies of marine fish species of the North Atlantic. Journal of Northwest Atlantic Fishery Science, 33:23–31, 2003.

[3] H. Murua, G. Kraus, F. Saborido-Rey, P. Witthames, A. Thorsen, and S. Junquera. Procedures to estimate fecundity of marine fish species in relation to their reproductive strategy. Journal of Northwest Atlantic Fishery Science, 33:33–54, 2003.

[4] E. R. Weibel and D. M. Gómez. A principle for counting tissue structures on random sections. Journal of Applied Physiology, 17:343, 1962.

[5] Comparison of region and edge segmentation approaches to recognize fish oocytes in histological images. Lecture Notes in Computer Science, 4142:853–864, 2006.

[6] Combining region and edge information to extract fish oocytes in histological images. In IASTED International Conference on Visualization, Imaging, and Image Processing, pages 82–87, 2007.

[7] M. Sonka, V. Hlavac, and R. Boyle. Image Processing, Analysis, and Machine Vision. International Thomson Publishing (ITP), 1999.

[8] W. K. Pratt. Digital Image Processing. Wiley-InterScience, 2001.

[9] P. K. Sahoo, S. Soltani, and A. K. C. Wong. A survey of thresholding techniques. Computer Vision, Graphics and Image Processing, 41:233–260, 1988.

[10] J. F. Canny. A computational approach to edge detection. IEEE Trans. on Pattern Analysis and Machine Intelligence, 8(6):679–698, 1986.

[11] R. Adams and L. Bischof. Seeded region growing. IEEE Trans. on Pattern Analysis and Machine Intelligence, 16(6):641–647, 1994.

[12] J. Fan, G. Zeng, M. Body, and M.-S. Hacid. Seeded region growing: an extensive and comparative study. Pattern Recognition Letters, 26:1139–1156, 2005.

[13] X. Muñoz, J. Freixenet, X. Cufí, and J. Martí. Strategies for image segmentation combining region and boundary information. Pattern Recognition Letters, 24:375–392, 2003.

[14] E. Cernadas, M. L. Durán, and T. Antequera. Recognizing marbling in dry-cured Iberian ham by multiscale analysis. Pattern Recognition Letters, 23:1311–1321, 2002.

[15] Chen Guang Zhao and Tian Ge Zhuang. A hybrid boundary detection algorithm based on watershed and snake. Pattern Recognition Letters, 26:1256–1265, 2005.

[16] A. Hoover, G. Jean-Baptiste, X. Jiang, P. J. Flynn, H. Bunke, D. B. Goldgof, K. Bowyer, D. W. Eggert, A. Fitzgibbon, and R. B. Fisher. An experimental comparison of range image segmentation algorithms. IEEE Trans. on Pattern Analysis and Machine Intelligence, 18(7):673–689, 1996.
