Cell identification in Differential Interference Contrast microscope images using template matching
D. Young1, C.A. Glasbey2, A.J. Gray1 and N.J. Martin3
1 Department of Statistics and Modelling Science, University of Strathclyde, 26 Richmond Street, Glasgow G1 1XH, Scotland, UK. email: [email protected]
2 Scottish Agricultural Statistics Service, JCMB, King’s Buildings, Edinburgh EH9 3JZ, Scotland, UK. email: [email protected]
3 Department of Biochemical Sciences, Scottish Agricultural College, Auchencruive, Ayr KA6 5HW, Scotland, UK
Abstract Some issues are considered which arise when template matching is used to identify cells in differential interference contrast (DIC) microscope images. An automatic method of counting and sizing the cells in such images is proposed and two examples given. Template matching is complex due to the nature of DIC images, namely the way in which light is captured across the image. Further complications arise due to different cell sizes and orientations. The method used compensates for these problems.
1 Motivation
The work described here arises from research into high rate algal ponds, an environmentally important development in applied microbiology. These are simple, energy-efficient, low-technology waste treatment systems (see e.g. [3]). Operating such systems at optimum efficiency relies on knowledge of the biomass of algae and bacteria in the mixed microbial populations of the pond. This is determined by viewing pond samples under a microscope, counting the cells and measuring their sizes, and using standard formulae to estimate biomass from these measurements. The current methods for identifying, counting and measuring cells are at best semi-automatic and slow. No fully automatic method (e.g. edge-detection algorithms) has so far proved successful. This is due to the complex nature of the microscope images: different types and shapes of cells are present, out-of-focus cells cause blurring, detritus material is present, and so on. In addition, algal cells are typically clustered and/or overlapping, so a method is needed for accurately separating, identifying and counting individual cells in a sample while ignoring noise. Overcoming these problems using image analysis will be the first step in developing automatic methods of estimating biomass directly from the microscope image. Ideally any method developed would be applicable to all microscope modalities, e.g. brightfield, differential interference contrast (DIC) (see e.g. [2], [8]), phase contrast and epifluorescence microscopy (see e.g. [7]). Template matching is investigated here as a means of achieving this.
2 Introduction
Template matching methods (see e.g. [4]) find matches of a sub-image with grey levels w(x, y) within an image of grey levels f(x, y), based on a ‘goodness of match’ statistic evaluated at each pixel in the image. Here the statistic used is the covariance, taking the value g(i, j) at pixel location (i, j) within the image, where

$$g(i, j) = \sum_{k=-m}^{m} \sum_{l=-m}^{m} \bigl(f(i+k,\, j+l) - \bar{f}(i, j)\bigr)\,\bigl(w(k, l) - \bar{w}\bigr),$$

using a (2m + 1) × (2m + 1) template (i.e. with ‘window size’ 2m + 1), centred at (i, j), and where w̄ is the template mean grey level and f̄(i, j) is a local image mean evaluated over the area being matched. High covariance values indicate a good match.
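As an illustration only, the covariance statistic above can be sketched in plain Python. The function names `covariance_at` and `covariance_map` are hypothetical, images are assumed to be lists of rows of grey levels, and pixels where the template does not fit inside the image are given a fill value (the paper does not state how borders were treated).

```python
def covariance_at(image, template, i, j):
    """Covariance g(i, j) between the template and the image window centred at (i, j)."""
    size = len(template)              # template is (2m + 1) x (2m + 1)
    m = size // 2
    n = size * size
    offsets = [(k, l) for k in range(-m, m + 1) for l in range(-m, m + 1)]
    window = [image[i + k][j + l] for k, l in offsets]
    weights = [template[k + m][l + m] for k, l in offsets]
    f_bar = sum(window) / n           # local image mean over the matched area
    w_bar = sum(weights) / n          # template mean grey level
    return sum((f - f_bar) * (w - w_bar) for f, w in zip(window, weights))


def covariance_map(image, template, fill=float("-inf")):
    """Evaluate g(i, j) wherever the template fits; other pixels keep `fill` (an assumption)."""
    rows, cols = len(image), len(image[0])
    m = len(template) // 2
    g = [[fill] * cols for _ in range(rows)]
    for i in range(m, rows - m):
        for j in range(m, cols - m):
            g[i][j] = covariance_at(image, template, i, j)
    return g
```

Plain lists keep the sketch free of dependencies; an array library would be the natural choice for images of realistic size, since the double loop costs on the order of (2m + 1)² operations per pixel.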
3 Summary of Method
The stages below indicate the procedure developed for counting and sizing the cells in a DIC image (of most interest here):

(a) semi-automatic part
1. construct templates corresponding roughly in size to the largest, middle-sized and smallest cells in the image. (Different templates are required for different types of cell.) Choose a shape which corresponds approximately to the shape of the cells, e.g. circular algae, ellipsoidal yeast. Also record the size of the cells within these templates.
2. apply differencing to the templates in the approximate direction that the light hits the cells. Modify these templates if necessary by using weighted differencing or thresholding to match the cells in the image more closely (see below). Add Gaussian blur to the templates to correspond more closely to the image.

(b) automatic part
– cell identification
1. apply the largest template. If rotation is required (for non-circular cells) take the maximum value across rotations of the goodness-of-fit statistic at each pixel location. Choose cell centres as points with g(i, j) values greater than those of their 8 neighbours and greater than or equal to 80% of the maximum g(i, j) value. (Note that for some matching statistics, a good fit corresponds to a minimum, so values less than or equal to 20% of the maximum value could be chosen.) If there is a difference of 200 or more between consecutive ordered statistic values then increase the cut-off value to cut above the lower value.
2. repeat the previous step for the middle-sized and smallest templates in turn. The algorithm automatically removes cells as they are identified.
– cell sizing
1. site a fuller range of templates at each cell centre, choosing the best fitting template to estimate the cell size. These templates are automatically created by the algorithm. For images in which the templates require rotating, a range of orientations is considered, to improve the accuracy of sizing.
2. output the number, size and location of the identified cells.

The application and development of these ideas is considered using two examples of DIC images: Candida yeast cells (Section 4) and algal cells (Section 5).
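The centre-selection rule in the cell identification stage might be sketched as follows. This is one reading of the rule, not necessarily the authors' implementation. The names `local_maxima` and `select_centres` are hypothetical. It is assumed that the maximum g(i, j) value is positive, and that the ‘200’ rule is applied to the largest gap between consecutive ordered values among the points that pass the 80% cut.

```python
def local_maxima(g):
    """Return (value, i, j) for interior points strictly greater than their 8 neighbours."""
    peaks = []
    for i in range(1, len(g) - 1):
        for j in range(1, len(g[0]) - 1):
            neighbours = [g[i + di][j + dj]
                          for di in (-1, 0, 1) for dj in (-1, 0, 1)
                          if (di, dj) != (0, 0)]
            if all(g[i][j] > value for value in neighbours):
                peaks.append((g[i][j], i, j))
    return peaks


def select_centres(g, fraction=0.8, gap=200.0):
    """Apply the '80% - 200' rule to a map of goodness-of-match values."""
    g_max = max(max(row) for row in g)
    cutoff = fraction * g_max                      # e.g. 80% of the maximum value
    kept = sorted((p for p in local_maxima(g) if p[0] >= cutoff), reverse=True)
    # Differences between consecutive ordered values, highest first.
    diffs = [kept[n][0] - kept[n + 1][0] for n in range(len(kept) - 1)]
    if diffs and max(diffs) >= gap:
        # Assumption: cut just above the lower value of the largest gap.
        kept = kept[:diffs.index(max(diffs)) + 1]
    return [(i, j) for _, i, j in kept]
```

For a statistic in which a good fit is a minimum, the comparisons would be reversed; that variant is not shown.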
4 Yeast Cells
Fig. 1 shows a 256 × 256 grey-level image of transparent Candida yeast cells, a relatively simple example chosen for algorithm development.
4.1 Template Construction
The yeast cell templates were constructed using knowledge of the theory of DIC microscopy [2]. The cells are assumed to be ellipsoidal - purely from visual, not biological, information. A template was constructed by assigning grey levels to pixels within an ellipse according to their distance from the edge, the centre pixel receiving intensity of 1, points at the edge and beyond, intensity of 0. A 63 × 63 template with major axis length 25 and minor axis length 15 was constructed using (1).
$$f(i, j) = 1 - \left\{ \left(\frac{i}{\sigma_x}\right)^{2} + \left(\frac{j}{\sigma_y}\right)^{2} \right\} \qquad (1)$$
where σx = 25, σy = 15 and if f(i, j) < 0 then f(i, j) is set to 0. The grey levels were then scaled to the full range (255, 0). The result is shown in Fig. 2. To mimic the light in the DIC image, which is a first derivative image of optical specimen density, first-order differencing was applied to Fig. 2 at the angle at which the light hits the specimen. Here it appears to come from the top left, so differencing was done at 45° in this direction, taking w′(i, j) = w(i, j) − w(i − 1, j − 1) as the differenced template value at location (i, j). (To adequately represent the cells in an image, it may be necessary to apply a weighting factor to the differencing; see Section 5.) The light has quite a striking effect on the cells in the image (possibly because the cells are higher in the centre). This was reproduced in the template by thresholding grey-level values of the differenced template above 200 to white and below 3 to black. Gaussian blur (with variance 16/3) was also applied, chosen to correspond visually with the blur in the image. The final template is shown in Fig. 3.

For cell identification three square templates of size 63, 49 and 39 pixels wide were used. They were rotated in steps of 10° from 0° to 170° (to match roughly the orientations of cells in the image), then differenced and thresholded. The maximum covariance value at each pixel, for each template, was recorded and this information used to extract the cell centres. The result of choosing points with g(i, j) values greater than those of their 8 neighbours, ordering these values and then cutting off at the level of the maximum difference is shown in Fig. 4, where a black dot represents a ‘centre’. The actual range of covariance values in this image is from −723.034 to 2221.59.
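The first two steps of the template construction, equation (1) and the diagonal differencing, could be sketched as below. The function names are hypothetical. Taking i as the row offset from the template centre and treating values outside the template as 0 are assumptions. The rotation, thresholding and Gaussian blur steps are not shown.

```python
def ellipse_template(size, sigma_x, sigma_y):
    """Grey levels from equation (1), clipped at 0 and scaled so the centre is 255."""
    m = size // 2
    template = []
    for i in range(-m, m + 1):            # assumption: i is the row offset
        row = []
        for j in range(-m, m + 1):
            value = 1.0 - ((i / sigma_x) ** 2 + (j / sigma_y) ** 2)
            row.append(255.0 * max(value, 0.0))
        template.append(row)
    return template


def difference_45(template, weight=1.0):
    """w'(i, j) = w(i, j) - weight * w(i - 1, j - 1); outside the template w is taken as 0."""
    size = len(template)
    out = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            previous = template[i - 1][j - 1] if i > 0 and j > 0 else 0.0
            out[i][j] = template[i][j] - weight * previous
    return out


# Usage with the values quoted in the text: 63 x 63 window, sigma_x = 25, sigma_y = 15.
yeast = ellipse_template(63, 25, 15)
yeast_differenced = difference_45(yeast)
```

The `weight` argument allows for the weighted differencing mentioned in the text; with `weight=1.0` the function reduces to the unweighted first-order difference.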
4.2 Cell Identification
The method identifies 20 cells in the image. The centres of the smaller cells appear to be fairly accurate; however, the larger cells have multiple centres and a point has also been picked between two touching cells (see Fig. 4). A good fit corresponds to a high covariance, indicating that the 11 points with the highest values should correspond to the 11 cell centres. Fig. 5 shows these 11 best fitting points. These do not all occur at the centres of cells. While this suggests that the templates may not be very good representations of cells, clearly they more closely match a cell than any other area within the image. The reason for larger cells giving multiple centres is that smaller templates have matched up better within a cell or even across a cell. Similarly, two touching cells may lead to a match which results in a ‘centre’ at or near their border (see Fig. 4). This reasoning was justified by applying the smallest (39 × 39) template to the image. In theory this should give maximum fits at the smallest cells. The range of g(i, j) values for this template is from −1347.76 to 2135.98. The points given by the highest three g(i, j) values are shown as black points on the cells in Fig. 1. The template correctly identifies the cell with one centre, at an angle of 90°. The cell with the two centres identified had two matches across the cell at angle 150°, while the actual cell is rotated at approximately 70°. Any fully automatic method must allow for this. If smaller templates are not allowed to lie inside bigger cells the problem should not occur. The method proposed applies the templates individually (starting with the largest template), choosing centres after each template has been fitted, then eliminating these larger identified cells from the image before running the next largest template. The criterion used for choosing centres was to record the maximum covariance values from the largest rotated templates.
The points with covariance greater than that of their 8 neighbours were identified, and then reduced to those with values greater than or equal to 80% of the maximum. A further constraint was added to ensure that only the best fits were chosen, namely that if there was a difference in value of more than 200 between any of the top 20% of values then only points above this threshold were chosen. In this case the maximum value was 1688.54, giving a cut-off level of 1350. The maximum difference between the ordered values was 178 so the cut-off remains at 1350 and 5 cells are identified in the image. These 5 cells are then removed from the image, by centring an ellipse (equal in size to the ellipse part of the template) on these points and setting pixels within this ellipse to a grey-level similar to the average grey-level of the background (in this example taken as the average pixel value in the image, i.e. 157). Alternatively, the area of match could simply be masked out without modifying the image.

The middle sized (49 × 49) template was applied to the remaining cells and the same criterion used for choosing the centres. Cutting at the 80% level (covariance value 1529) identified a further 5 cells in the image, which were also then removed. The smallest (39 × 39) template was then applied and the remaining cells identified using the 80% rule, with the difference of 200 between covariance values now being applicable. The image with all centres identified is shown in Fig. 6. This fully automatic method of identifying centres gives an accurate cell count.

The centres were then successfully used as a basis for estimating cell sizes (results not shown) by fitting at the centres all templates from smallest to largest (width 35, 37, ..., 63 pixels) rotated at angles from 4° less to 4° more than the recorded orientation. The size is then estimated from the best fitting template.
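The removal step described above, in which an identified cell is overwritten with the background grey level before the next template is applied, might look like the following sketch. The name `remove_cell`, the use of semi-axis lengths, and the angle convention (rotation of the major axis away from the row direction) are assumptions made for illustration.

```python
import math


def remove_cell(image, centre, semi_major, semi_minor, angle_deg, background):
    """Overwrite the ellipse centred at `centre` with the background grey level, in place."""
    ci, cj = centre
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    reach = int(math.ceil(max(semi_major, semi_minor)))
    for i in range(max(0, ci - reach), min(len(image), ci + reach + 1)):
        for j in range(max(0, cj - reach), min(len(image[0]), cj + reach + 1)):
            di, dj = i - ci, j - cj
            # Rotate the offset into the ellipse's own axes.
            u = di * cos_t + dj * sin_t
            v = -di * sin_t + dj * cos_t
            if (u / semi_major) ** 2 + (v / semi_minor) ** 2 <= 1.0:
                image[i][j] = background
    return image
```

The alternative mentioned in the text, masking the matched area without modifying the image, would replace the assignment with an entry in a separate boolean mask that later matching passes consult.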
5 Algal Cells
The method developed for the yeast image is now applied to a 512 × 512 DIC image of algal cells with 256 grey-levels. Templates were constructed similarly to the yeast cell template, but assuming that cells are circular (so rotation is unnecessary). The differencing was applied at 45° from the top right using a weighting factor with w′(i, j) = w(i, j) − 1.5 × w(i − 1, j − 1), to more adequately represent the cells in the image. The algal cells are semitransparent and the weighting was chosen to remove the brightfield image component in the DIC image. Three sizes of template were now applied, namely squares of 35, 29 and 23 pixels wide. The results from the largest template, applying the ‘80% – 200’ rule, identified 9 cells (cut-off at 1915, i.e. 80% of maximum covariance value). After removing these cells (average value of background taken to be 144), the middle sized template identified another two cells (using cut-off at difference of 200). The remaining two cells were found by the smallest template (cut-off at 80% value of 1694). All the centres are shown in Fig. 7. Cells were then sized using square templates of width 23, 25, ..., 35 pixels.
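The sizing step, in which a range of templates is sited at each identified centre and the best fitting one is kept, can be sketched as follows. The names `disc_template` and `best_fitting_size` are hypothetical. The binary discs are simple stand-ins for the differenced and blurred templates used in the paper, and the raw covariance is compared across template sizes without normalisation, which is an assumption about how ‘best fitting’ was judged.

```python
def covariance_at(image, template, i, j):
    """Covariance between the template and the image window centred at (i, j)."""
    m = len(template) // 2
    n = len(template) ** 2
    offsets = [(k, l) for k in range(-m, m + 1) for l in range(-m, m + 1)]
    window = [image[i + k][j + l] for k, l in offsets]
    weights = [template[k + m][l + m] for k, l in offsets]
    f_bar, w_bar = sum(window) / n, sum(weights) / n
    return sum((f - f_bar) * (w - w_bar) for f, w in zip(window, weights))


def disc_template(width, radius):
    """Stand-in template: 1 inside a disc of the given radius, 0 elsewhere."""
    m = width // 2
    return [[1.0 if k * k + l * l <= radius * radius else 0.0
             for l in range(-m, m + 1)]
            for k in range(-m, m + 1)]


def best_fitting_size(image, centre, templates):
    """`templates` maps a recorded cell size to its template; return the best-scoring size."""
    i, j = centre
    scores = {size: covariance_at(image, t, i, j) for size, t in templates.items()}
    return max(scores, key=scores.get)


# Usage: a synthetic 'cell' of radius 5 at the centre of a 41 x 41 image.
image = disc_template(41, 5)
templates = {3: disc_template(11, 3), 5: disc_template(15, 5), 8: disc_template(21, 8)}
estimated = best_fitting_size(image, (20, 20), templates)
```

For non-circular cells the same idea would extend to a dictionary keyed by (size, orientation) pairs.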
6 Computational Issues
After construction (and rotation if required) of the appropriate templates to represent the cells in the DIC image, the method described is fully automatic. It is fairly accurate but very computer intensive. For example, the CPU time taken to apply the largest (35 × 35) template to the algal cell image was 896 seconds on a SUN SPARCstation 10.

Various approaches to speeding up template matching have been taken in the literature. Parallel processing architectures and algorithms for template matching are considered, for instance, in [11] and [12]. For non-parallel computation, Fast Fourier Transforms may produce faster computation of the degree of match, depending on the chosen similarity criterion and on image and template size. This is currently being implemented.

The number of computations may also be reduced by ordered matching. In [9] probabilistic information (e.g. a grey level histogram) from the image and template is used to order computation of the matching criterion so as to minimise the number of computations required before the threshold value is exceeded. The degree of computational saving depends on the shape of the grey level distribution. However, this approach is not directly applicable to the examples considered here as the threshold is not fixed.

A different approach to speed-up is multi-stage matching. A two-stage approach is taken in [6], using a sub-template initially to determine good candidate locations for a match (i.e. with similarity measure values beyond a chosen threshold), then applying the whole template in the second stage only to locations identified by the sub-template. The degree of speed-up (and matching success) depends on sub-template size and the stage one threshold value. In [10] the initial stage is used to search for matches of a template feature which occurs only rarely in the image, exhaustive template matching then being applied in the vicinity of the identified feature locations. Hierarchical matching algorithms generalise these ideas to matching at multiple resolutions. See e.g. [1], which also considers template orientation, as does Goshtasby [5], which uses normalised invariant moments as similarity measures in a two-stage approach similar to that of [6]. The need to rotate templates adds greatly to the computational burden of the algorithm described in this paper. It would be useful to investigate the applicability of [5] to the algal and yeast images.

The method described has also been applied to other microscope images, namely a 512 × 768 DIC image of algal and bacterial cells and two 512 × 512 brightfield images (without template differencing), one of rod-shaped algal cells and the other of an insect-like organism comprising four distinct but touching parts. The results were less successful for images in which objects overlap, and also disappointing for very blurred images.
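To make the two-stage idea discussed in this section concrete, a minimal sketch is given below. It is not the algorithm of [6]. The central crop used as the sub-template, the covariance used as the similarity measure in both stages, and the names `central_crop` and `two_stage_match` are all assumptions chosen for illustration.

```python
def covariance_at(image, template, i, j):
    """Covariance between the template and the image window centred at (i, j)."""
    m = len(template) // 2
    n = len(template) ** 2
    offsets = [(k, l) for k in range(-m, m + 1) for l in range(-m, m + 1)]
    window = [image[i + k][j + l] for k, l in offsets]
    weights = [template[k + m][l + m] for k, l in offsets]
    f_bar, w_bar = sum(window) / n, sum(weights) / n
    return sum((f - f_bar) * (w - w_bar) for f, w in zip(window, weights))


def central_crop(template, width):
    """Sub-template: the central `width` x `width` block of the full template."""
    m, s = len(template) // 2, width // 2
    return [row[m - s:m + s + 1] for row in template[m - s:m + s + 1]]


def two_stage_match(image, template, sub_width, stage_one_threshold):
    """Return {(i, j): full covariance} for locations that pass the sub-template test."""
    sub = central_crop(template, sub_width)
    m = len(template) // 2
    results = {}
    for i in range(m, len(image) - m):
        for j in range(m, len(image[0]) - m):
            # Stage one: cheap test with the small sub-template.
            if covariance_at(image, sub, i, j) > stage_one_threshold:
                # Stage two: full template only at candidate locations.
                results[(i, j)] = covariance_at(image, template, i, j)
    return results
```

The saving comes from the first stage touching `sub_width`² pixels per location rather than the full window. A stage one threshold set too high risks discarding true matches before the full template is ever applied.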
Further work will involve assessment of accuracy of sizing via simulation, after issues of computational efficiency have been addressed.
Acknowledgement The first author was supported by an Earmarked Studentship from the Engineering and Physical Sciences Research Council.
References
[1] Anisimov, V.A. and Gorsky, N.D. (1993). Fast hierarchical matching of an arbitrarily oriented template. Pattern Recognition Letters, 14, 95-101.
[2] Cogswell, C.J. and Sheppard, C.J.R. (1990). Confocal differential interference contrast (DIC) microscopy: including a theoretical analysis of conventional and confocal DIC imaging. Journal of Microscopy, 165, 81-101.
[3] Fallowfield, H.J. and Martin, N.J. (1990). The operation, performance and computer design of high rate algal ponds. Institute of Chemical Engineering Symposium Series, 111.
[4] Gonzalez, R.C. and Woods, R.E. (1992). Digital Image Processing. Addison-Wesley, Massachusetts.
[5] Goshtasby, A. (1985). Template Matching in Rotated Images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 7, 338-344.
[6] Goshtasby, A., Gage, S.H. and Bartholic, J.F. (1984). A two-stage cross correlation approach to template matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 374-378.
[7] Herman, B. and Jacobson, K. (1990). Optical Microscopy for Biology. Wiley, New York.
[8] Holmes, T.J. and Levy, W.J. (1987). Signal-processing characteristics of differential interference contrast microscopy. Applied Optics, 26, 3929-3939.
[9] Margalit, A. and Rosenfeld, A. (1990a). Using probabilistic domain knowledge to reduce the expected computational cost of template matching. Computer Vision, Graphics and Image Processing, 51, 219-234.
[10] Margalit, A. and Rosenfeld, A. (1990b). Using feature probabilities to reduce the expected computational cost of template matching. Computer Vision, Graphics and Image Processing, 52, 110-123.
[11] Prasanna Kumar, V.K. and Krishnan, V. (1989). Efficient parallel algorithms for image template matching on Hypercube SIMD machines. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11, 665-669.
[12] Sid-Ahmed, M.A. (1990). Serial architectures for the implementation of 2-D digital filters and for template matching in digital images. IEEE Transactions on Acoustics, Speech and Signal Processing, 38, 853-857.
Figure 1: Yeast-cell image
Figure 2: Ellipse template
Figure 3: Final template with added Gaussian blur
Figure 4: Centres identified by the method
Figure 5: 11 highest covariance values
Figure 6: Centres of cells identified by proposed method
Figure 7: Algal cell centres identified