This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2015.2504422, IEEE Journal of Biomedical and Health Informatics

JBHI-00416-2015


Boundary-to-Marker Evidence Controlled Segmentation and MDL-Based Contour Inference for Overlapping Nuclei

Jie Song, Liang Xiao, Member, IEEE, and Zhichao Lian

Abstract: This paper presents a novel method for automated morphology delineation and analysis of cell nuclei in histopathology images. Combining the initial segmentation information with concavity measurement, the proposed method first segments clusters of nuclei into individual pieces, avoiding the segmentation errors introduced by scale-constrained Laplacian-of-Gaussian filtering. After that, a nuclear-boundary-to-marker evidence computation is introduced to delineate individual objects after the refined segmentation process. The obtained evidence set is then modeled by periodic B-splines with the minimum description length principle, which achieves a practical compromise between the complexity of the nuclear structure and its coverage of the fluorescence signal and thereby avoids both underfitting and overfitting. The algorithm is computationally efficient and has been tested on a synthetic database as well as on 45 real histopathology images. Compared with several state-of-the-art methods, the proposed method shows superior recognition performance in the experiments, which indicates its potential for analyzing the intrinsic features of nuclei morphology.

Index Terms: Morphology delineation and analysis, concavity measurement, nuclear-boundary-to-marker evidence set, periodic B-splines, minimum description length

I. INTRODUCTION

This work was supported in part by the Natural Science Foundation of China under Grant 61171165, Grant 11431015, and Grant 61571230, in part by the Fundamental Research Funds for the Central Universities under Grant 30915012204, in part by the National Scientific Equipment Developing Project of China under Grant 2012YQ050250, in part by the Natural Science Foundation of Jiangsu Province under Grant BK20150784, in part by the Research Innovation Program for College Graduates of Jiangsu Province under Grant KYLX15_0377, and in part by the Six Top Talents Project of Jiangsu Province under Grant 2012DZXX-036.
J. Song is with the School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China (e-mail: [email protected]). L. Xiao is with the School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Nanjing University of Science and Technology, Nanjing 210094, China ([email protected]). Z. Lian is with the School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China (e-mail: [email protected]).

The morphology analysis of cell nuclei in histopathology specimens is a compelling need for both biologists and medical scientists. According to extensive empirical studies on 2-D histological images, quantitative micrograph analysis depends heavily on the surface morphology of nuclei, so precise measures of this morphology enable reliable characterization of individual nuclear features. Capturing precise morphology requires not only separating connected clusters of nuclei into individual pieces, but also inferring the missing partial contours occluded by overlaps and providing a smooth but compactly supported representation of nuclear boundaries. Today's labor-intensive image analysis of cell nuclei in histological images is a significant bottleneck for efficient material characterization owing to the low accuracy of the image processing tools, and most current pathology diagnosis is based on the subjective opinion of pathologists. Although a large body of related literature has appeared, the performance of existing methods is still unsatisfactory because of the unique features of 2-D stained histopathology sections. Processing such images raises a new set of challenges, such as partially imaged nuclei, a large number of objects to be processed, special overlapping structures, and noise (see Fig. 2(a) for an example).

Partially imaged nuclei are caused by many factors, such as incomplete staining of cell nuclei, information loss during sectioning, and superimposed nuclear structures. It is difficult to restore the whole nuclear content when the object is imaged blurrily or not at all. It is worth noting that, in the 2-D case, content may be missing not only around the boundary of an object but also near its center, so a more robust method is needed to capture the correct contour. In addition, a large number of overlapping nuclei seriously affects the segmentation. A typical 2-D histopathological micrograph of cell nuclei contains hundreds to thousands of objects, so a large percentage of the nuclei are densely clustered in the limited image space. Such complicated nuclear structure and complex cytoplasm make it very challenging for most algorithms to recognize and segment individual nuclei accurately from overlapping agglomerates.
Furthermore, nonuniform staining of cell nuclei and noise can also hinder segmentation and lead to excessive, missed, and false detections of nuclei. The main goal of this paper is to design an advanced segmentation and contour inference algorithm for large numbers of partially overlapping nuclei in real histopathological images. In recent years, many methods have been proposed for nuclei segmentation in microscopic images. A comprehensive review of nuclei detection, segmentation, and classification can be found in [39], which presents the advantages and disadvantages of existing approaches and reports their quantitative performance under different evaluation parameters. Here, we briefly review the line of research that is directly related to multiple nuclei

2168-2194 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


segmentation, which is necessary to decompose clusters into individual nuclei. Three classes of techniques are commonly used for this purpose: active contour models, marker-controlled methods, and model-based segmentation algorithms.

To segment multiple objects, active contours based on level sets [1], multiphase active contours [2], and integrated active contours [3] have been proposed. In addition, geometric active contours based on level sets [4] are becoming increasingly popular because they neither require explicit parameterization nor suffer from the topological constraints of snakes, and they can be used to separate overlapping nuclei. In this approach, each cell is represented by its own level-set function, and a coupling constraint prevents neighboring contours from overlapping each other. Recently, Plissiti and Nikou [43] proposed a deformable model driven by physical principles that helps to delineate the borders of overlapping nuclei. The main concerns for this line of methods are undersegmentation errors when an image region contains a large number of overlapping objects, and the demanding computation.

Marker-controlled methods usually use predefined markers as approximate object locations and segment an image region into the influence zones of the markers. The watershed algorithm is the most classical scheme among them; it automatically separates particles that touch. However, watershed segmentation only works well for smooth convex objects that do not overlap too much, and it usually leads to oversegmentation. To handle these problems, the h-minima/h-maxima transform [5]-[7] can be applied to suppress undesired minima or maxima. Another technique for detecting markers before segmentation uses the Laplacian-of-Gaussian (LoG) filter and its variants [8]-[10]. In particular, Al-Kofahi et al. [10] combine the LoG filter with automatic and adaptive scale selection to improve the accuracy of marker locations. More recently, Park et al. [11] developed a modified ultimate erosion process to decompose a mixture of particles into markers. In their work, each marker corresponds to one particle, and the occlusion characteristics of clustered particles are revealed by inferring the missing contours between them. However, this method does not work well on histopathology images because of their unsatisfactory signal-to-noise ratio.

Model-based algorithms form another class of cell segmentation methods; they decompose overlapping nucleus clusters into individual nuclei by building a model from a priori information about nuclear properties. A large set of these algorithms uses the convexity of connected components to guide nuclei segmentation, including a rule-based approach [12], nonparametric concavity analysis [13], ellipse fitting [14], [41], and Delaunay triangulation [15]. Furthermore, Cloppet et al. [42] introduce prior information about the usual shape of cell nuclei to optimize the selection of the markers from which flooding starts during watershed-based segmentation. In contrast, Arslan et al. [16] model cell nucleus boundaries for segmentation and use the boundary primitives for nucleus localization and region growing. Recently, an elegant set of algorithms based on iterative radial voting was presented in [17] and has been used in several papers [18], [19]. These algorithms locate nucleus centers by having pixels iteratively vote at a given radius and orientation specified by predefined kernels. The model-based methods generally work well, but they do not provide any inference on the occluded parts.

Fig. 1. Flowchart of the proposed nuclear segmentation method.

This paper presents an automatic algorithmic framework for large-scale nuclei segmentation in 2-D histological images that addresses the challenges mentioned above. In the first stage, we segment each nucleus from clusters of nuclei by extracting the boundary information of a single nucleus from each connected component. Marker detection is one of the key techniques in the segmentation stage, and in this paper we adopt the multiscale LoG filter with adaptive scale selection. The concavity measurement is then applied to the LoG filter response. To extract the desired split lines and reasonable markers inside the nuclear clumps, we define three types of split point candidates on the binary silhouette of the histological nuclei image and search for the optimized split point pairs based on iterative erosion operations to refine the detected split lines. In the second stage, the boundary evidence set obtained for each nucleus by the nuclear-boundary-to-marker association calculation is used to estimate the latent contour with a periodic B-spline regression model based on the minimum description length (MDL) methodology. Fig. 1 illustrates the flowchart of the proposed method. Benefiting from this flexible segmentation framework and the MDL methodology, we are able to infer the occluded parts and to trade off the structural complexity of each nucleus against its data coverage on the occluded one.
Furthermore, with additional preprocessing such as cytoplasm separation and spectral unmixing, our method is applicable to histopathological images both with and without cytoplasm. Experiments on both kinds of real histopathological data demonstrate that our method achieves outstanding segmentation accuracy and outperforms the existing approaches.

II. METHODOLOGY

The proposed method aims at segmenting overlapping nuclei and inferring the contour of each nucleus. Thus, the first step of our method is to separate the connected region of nuclei into


individual pieces, which in this work involves two steps: marking each nucleus with a marker and obtaining the nuclear boundary evidences based on the marker. After that, a smooth contour representation of each nucleus is obtained using a periodic B-spline fitting model, and an MDL-based prior is introduced to infer the complete contour of each cell nucleus, describing nuclear structures such as shapes, sizes, and orientations.

Fig. 2. Illustrating the proposed method: (a) original subimage, (b) obtained binary image, (c) distance map, (d) 2-D surface plot of response R(x, y), (e) initial segmentation based on R(x, y), (f) detected split lines, (g) intersection points shown in magenta, (h) concave points shown in red, (i) three types of split point candidates with the joint points shown in green, (j) refined markers, (k) boundary evidences for each nucleus, (l) complete contour inferences.

A. Rough Marker Extraction using Distance-Map-Constrained Multiscale LoG Filter

As illustrated in Fig. 2(a), blurred boundaries and intensity variations exist between and within nuclei for several reasons, such as nonuniform absorption of the stains, the effect of the tissue fluid, microscopic imaging, and spectral unmixing. These variations make it very difficult to detect markers accurately and to obtain a desirable binary silhouette, as shown in Fig. 2(b), from digitized histopathological images. Therefore, based on the characteristics of histological specimens, the method described in [10] is applied to binarize the image based on the graph-cuts algorithm [20]-[22], with automatic learning using minimum error thresholding. As expected, the binarization refinement removes irregular noise effectively while retaining the details of the nuclei. Thus, it enables us to detect the markers correctly in the next steps.

The rest of this subsection describes how to extract rough markers for the nuclei from the optimized binary silhouette. The direct use of a multiscale LoG filter may cause a cluster of two or more small nuclei to be detected falsely as a single larger blob, which may also encroach on smaller blobs in its vicinity [10]. We therefore exploit the shape and size cues available in the Euclidean distance map D(x, y) to constrain the maximum scale $\sigma_{\max}$ to a local value $\sigma_{\mathrm{MAX}}$. Specifically, given a binary image generated by the above method, the responses of the scale-normalized LoG filter are computed across the scales $\{\sigma_{\min}, \sigma_{\min}+1, \ldots, \sigma_{\mathrm{MAX}}\}$, and the response surface R(x, y) corresponding to the largest LoG response is chosen, which can be expressed in convolution form as

$$ R(x, y) = \max_{\sigma \in [\sigma_{\min},\, \sigma_{\mathrm{MAX}}]} \mathrm{LoG}_{\mathrm{norm}}(x, y; \sigma) * I(x, y) \qquad (1) $$

where $\mathrm{LoG}_{\mathrm{norm}}(x, y; \sigma) = \sigma^{2}\left(\partial^{2} g(x, y; \sigma)/\partial x^{2} + \partial^{2} g(x, y; \sigma)/\partial y^{2}\right)$ is the scale-normalized LoG filter, $g(x, y; \sigma)$ is a Gaussian kernel, $\sigma_{\mathrm{MAX}} = \max\{\sigma_{\min}, \min\{\sigma_{\max}, 2 D(x, y)\}\}$, and I represents the original nuclei image. The response R(x, y) can be considered as a topographical surface whose peaks indicate the locations of nuclear markers, shown in 2-D form in Fig. 2(d). Using R(x, y) and the detected markers as inputs to a local-maximum-based clustering algorithm [24], we obtain an initial segmentation result, as illustrated in Fig. 2(e).

B. Split-Point-Candidates-based Segmentation Refinement

The initial segmentation boundaries extracted by the above process are prone to several types of segmentation errors, since the method fails on heterogeneous clusters of nuclei with irregular shapes and weak separating edges. Therefore, the R(x, y) information alone is not enough to delineate the latent edges of individual nuclei when nuclei are clustered in a histological image, and further refinement with prior knowledge about nucleus properties is required. In this section, we first define the split point candidates by incorporating concavity measurement into the initial segmentation results. Then, according to the defined split point candidates and our predefined constraints, we propose a novel heuristic method that detects the optimized split point pairs with an iterative morphological erosion algorithm and thus determines the final refined split lines. The flow of the proposed method is illustrated in Fig. 3, and the details are given below.

1) The Definition of Split Point Candidates: First, given a set of split lines obtained from R(x, y) (refer to Fig. 2(f)) and the boundaries of the connected regions in the binary silhouette found by the Canny method, we denote the intersection points where the split lines cross the boundary as the first type of split point candidates, shown in Fig. 2(g).
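As a rough illustration of the distance-map-constrained response in (1), from which the split lines above are derived, the following Python sketch evaluates a sampled scale-normalized LoG kernel at integer scales and keeps the largest response per pixel, with the upper scale capped at min(σ_max, 2 D(x, y)). It is not the authors' implementation: the function names, the brute-force distance map, the 3σ kernel truncation, the zero padding at the image border, and the sign flip that makes bright nuclei produce positive peaks are all assumptions made for this example.

```python
import math

def distance_map(mask):
    """Brute-force Euclidean distance from each foreground pixel to the nearest background pixel."""
    h, w = len(mask), len(mask[0])
    background = [(y, x) for y in range(h) for x in range(w) if not mask[y][x]]
    dist = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] and background:
                dist[y][x] = min(math.hypot(y - by, x - bx) for by, bx in background)
    return dist

def log_norm_kernel(sigma):
    """Sampled scale-normalized LoG, sigma^2 * (g_xx + g_yy), truncated at 3 * sigma (assumption)."""
    radius = math.ceil(3 * sigma)
    kernel = {}
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            r2 = dx * dx + dy * dy
            g = math.exp(-r2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
            kernel[(dy, dx)] = (r2 - 2 * sigma ** 2) / sigma ** 2 * g
    return kernel

def constrained_log_response(image, mask, sigma_min=1, sigma_max=4, bright_nuclei=True):
    """Per-pixel maximum of the LoG response over scales sigma_min..sigma_MAX(x, y)."""
    h, w = len(image), len(image[0])
    dist = distance_map(mask)
    kernels = {s: log_norm_kernel(s) for s in range(sigma_min, sigma_max + 1)}
    # Assumption: flip the sign so that bright blobs give positive peaks.
    sign = -1.0 if bright_nuclei else 1.0
    response = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # sigma_MAX = max{sigma_min, min{sigma_max, 2 * D(x, y)}}
            local_max = max(sigma_min, min(sigma_max, 2 * dist[y][x]))
            best = -math.inf
            for s in range(sigma_min, int(local_max) + 1):
                acc = 0.0
                for (dy, dx), k in kernels[s].items():
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:  # zero padding outside the image
                        acc += k * image[yy][xx]
                best = max(best, sign * acc)
            response[y][x] = best
    return response
```

A small synthetic image containing one bright disk is a convenient check: the response should peak near the disk center, which is where a rough marker would be placed. Pixels close to the region boundary are only evaluated at small scales, which is the effect the distance-map constraint is meant to have.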
Secondly, we observed that reasonable split lines are always located between two points with the deepest concavity. Hence, we introduce a concavity


Fig. 3. Flowchart of the split-point-candidates-based segmentation refinement with iterative erosion algorithm

Fig. 4. Illustrating why we use iterative morphological erosion to split nuclei clumps. (a) A simple synthetic image. (b) Concavity-point-based clump splitting without iterative erosion operations, where Pi denotes a concavity point, and (c) with iterative erosion operations. (d) Another synthetic image depicting a clump with holes. (e) Initial concavity points of (d). (f) Clump decomposition with iterative erosion operations. Labels in the panels: R, O, C1 to C3, and P1 to P3.

measurement to extract the second type of split point candidates, called concavity points (shown in Fig. 2(h)). These are the pixels on the boundary of each connected region $R_{\mathrm{bin}}$ in the binary silhouette that have the largest perpendicular distance from the corresponding edge of the convex hull O [25]. Specifically, we formulate the definition of concavity points as follows. Let C be the set of connected subregions obtained from $O \setminus R_{\mathrm{bin}}$, satisfying

$$ C = \bigcup_{i} C_i, \quad \text{s.t. } C_i \subseteq O \setminus R_{\mathrm{bin}}; \ \forall i \neq j,\ C_i \cap C_j = \emptyset \qquad (2) $$

From the above definition, C is composed of several parts $\{C_i\}$ that do not overlap with each other, as illustrated in Fig. 4(a). If we denote the boundary of the ith connected subregion $C_i$ by $B_i$, a concavity point can be defined by

$$ P_i = \arg \delta(B_i \cap R, B_i \cap O) = \arg\max_{x \in B_i \cap R} d(x, B_i \cap O) = \arg\max_{x \in B_i \cap R}\ \min_{y \in B_i \cap O} \| x - y \|_2^2 \qquad (3) $$
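The definition in (3) can be sketched for a polygonal boundary as follows. This is only an illustration under stated assumptions: the boundary is given as an ordered list of distinct vertices of a simple closed polygon, the convex hull is computed with Andrew's monotone chain (a standard technique chosen here, not necessarily the one used in [25]), and `min_depth` is a hypothetical threshold that discards negligible concavities. For each hull edge (the chord) the code scans the boundary arc between its endpoints and returns the arc point farthest from the chord.

```python
import math

def convex_hull_indices(points):
    """Andrew's monotone chain; returns indices of the hull vertices."""
    order = sorted(range(len(points)), key=lambda i: points[i])

    def cross(o, a, b):
        (ox, oy), (ax, ay), (bx, by) = points[o], points[a], points[b]
        return (ax - ox) * (by - oy) - (ay - oy) * (bx - ox)

    def half(seq):
        chain = []
        for i in seq:
            # Pop vertices that make a clockwise or collinear turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], i) <= 0:
                chain.pop()
            chain.append(i)
        return chain

    lower, upper = half(order), half(reversed(order))
    return lower[:-1] + upper[:-1]

def point_segment_distance(p, a, b):
    """Minimum Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def concavity_points(contour, min_depth=1.0):
    """Return (point, depth) for the deepest point of every concave arc of the contour."""
    n = len(contour)
    hull = sorted(convex_hull_indices(contour))  # hull vertices in contour order
    result = []
    for a, b in zip(hull, hull[1:] + [hull[0] + n]):
        arc = [k % n for k in range(a + 1, b)]   # boundary arc strictly between two hull vertices
        if not arc:
            continue
        depth, idx = max(
            (point_segment_distance(contour[k], contour[a], contour[b % n]), k) for k in arc
        )
        if depth >= min_depth:
            result.append((contour[idx], depth))
    return result

# A square with one notch: the notch vertex (5, 4) lies 6 units below the chord y = 10.
notched = [(0, 0), (10, 0), (10, 10), (5, 4), (0, 10)]
print(concavity_points(notched))  # -> [((5, 4), 6.0)]
```

On a pixel boundary the same idea applies with the traced boundary pixels in place of polygon vertices; the squared norm in (3) does not change which point attains the maximum.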

Here, $B_i \cap R$ is an arc of the closed curve $B_i$, and $B_i \cap O$ is a straight line segment that is a chord of $B_i$. We denote the maximum distance from $B_i \cap R$ to $B_i \cap O$ by $\delta(B_i \cap R, B_i \cap O)$ and the minimum distance from x to $B_i \cap O$ by $d(x, B_i \cap O)$. Finally, we define the third type of split point candidates as joint points (green points in Fig. 2(i)), each of which is both an intersection point and a concavity point.

2) Iterative Morphological Erosion: Concavity-measure-based segmentation methods usually split a clump by joining pairs of concavity points. When the clumps are irregular or complex, however, it is common for these methods to suffer

from undersplitting (refer to Fig. 4(b)). Another problem is that they depend on several user-defined parameters to detect the corresponding point for each concavity point. This degrades their performance, since it is difficult or even impossible to optimize the parameters in practice. Finally, an important issue that is not addressed well in earlier concavity-based methods is that holes tend to appear within a clump as the number of objects in the clump increases (see Fig. 4(e)). Undersplitting will therefore occur if these holes are not taken into account when finding the split lines.

In this section, we propose a novel method for refined segmentation based on automatic iterative morphological erosion. The morphological erosion is performed by applying the Minkowski subtraction to the binary image $I_{\mathrm{bin}}$ with respect to a closed ball of radius one [26] in $\mathbb{R}^2$, which is equivalent to peeling one pixel off the boundary of $I_{\mathrm{bin}}$. For each connected region $R_k$ of $I_{\mathrm{bin}}$ at the tth iteration, we first compute its maximum concavity size and erode the region if its maximum concavity is greater than $T_{\mathrm{ero}}$ (refer to Fig. 3), but stop otherwise. The iterations continue until $I_{\mathrm{bin}}^{(t)} = I_{\mathrm{bin}}^{(t-1)}$.

Fig. 4 illustrates the advantages of clump splitting with the iterative erosion algorithm. In the first case, the missing split point is found when the nuclear region in Fig. 4(b) erodes into two disconnected regions in Fig. 4(c), so that undersplitting is avoided. In the second case, we are able to detect more concavity points at the boundaries of the holes after several erosion operations and to split the clump correctly together with the initial concavity points in Fig. 4(e), as illustrated in Fig. 4(f). Thus, the iterative algorithm helps us to find more correct split lines, since a connected region breaks into several regions after a number of erosion iterations, so

Fig. 5. Illustrating the construction of split point candidates in different iterations and the refinement procedure based on the split point candidates in these iterations. Panels: (a) 1st iteration, (b) 1st iteration, (c) 15th iteration, (d) 21st iteration. Legend: intersection point, concavity point, joint point. In the first row, the optimized pairs that satisfy the three constraints in Section II-B3 are indicated by gray solid lines and the others by dashed lines. In the second row, the initial and refined split lines are shown in magenta and black, respectively. Notice that the third column includes the search procedures of the 6th, 8th, 12th, 13th, and 15th iterations, and the line between two optimized split point candidates corresponds to the refined segmentation determined by each iteration. Similarly, all the correct split lines are identified after the 35th iteration.

that the number of concavity points increases.

3) Optimized Split Point Pairs Search: Based on the defined split point candidates and the three predefined constraints introduced below, we propose a novel heuristic method to detect the optimized split point pairs with an iterative morphological erosion algorithm. In each iteration, the three types of split point candidates are detected according to their definitions. Afterwards, split point pairs are searched for on each connected region to identify reasonable split lines. The refined segmentation extracts only the split point pairs that satisfy the three constraints and leaves the others to later iterations, which results in greater segmentation precision.

In this algorithm, let SPC be the set of split point candidates, and let $E_{\mathrm{ini}}$ represent the edges generated from the initial segmentation result. To construct a valid split line between a split point pair (p, q), p, q ∈ SPC, we introduce the following constraints: 1) the pair of split point candidates forming a candidate line is located in high-concavity regions; 2) the two point candidates are as close to each other as possible; 3) the two subregions $C_i$ and $C_j$ are oppositely aligned. Note that an optimized split point pair should satisfy all three criteria. Taking Fig. 4(b) as an example, the point pair (P2, P3) can be easily identified as the optimized split point pair for the given nuclei clump. Introducing the intersection points helps to locate ideal concavity points quickly when several pixels on the boundary of a connected subregion attain the concavity maximum after some erosion operations.

The proposed search for the optimized split point pairs is illustrated in Fig. 5. We start by searching for split point pairs between two intersection points and then delete the corresponding point candidates and edges from the point set SPC and the edge set $E_{\mathrm{ini}}$, respectively (as illustrated in Fig. 5(a)), which greatly reduces the computational cost of the search. After that, we identify the point pairs in each connected region that satisfy the three constraints, form a straight line joining the two points, and update the sets SPC and $E_{\mathrm{ini}}$. The search at each iteration continues until no pair that satisfies the constraints remains, and the undetermined pairs are left in the sets for the next iteration. Fig. 5(c) and Fig. 5(d) illustrate the optimized split point pairs search in different iterations. For example, "6th iter." in Fig. 5(c) indicates that two split point pairs can be identified as optimized ones at the 6th erosion iteration. Notice that a new concavity point may be detected after several erosion iterations, since there is more than one convex hull at that time, and thus an intersection point would be updated to a joint point. As seen from the second row of Fig. 5, we generate refined split lines to separate nuclei clumps into individual pieces and are finally able to compute new marker locations as the average values of the individual refined nuclear boundaries (see Fig. 2(j)).

C. Boundary Evidences Computation for Each Nucleus

Once the refined markers are obtained, the boundary evidences for each nucleus can be defined based on a nuclear-boundary-to-marker association in order to infer the complete contour of each nucleus, as illustrated in Fig. 2(k). Given N markers $\{m_1, \ldots, m_N\}$ and all the boundary pixel coordinates from the binary image, denoted by $b = \{b_1, \ldots, b_n\}$, $n \geq N$, we first apply the Euclidean distance to measure the association between b and a certain $m_j$. We look for the boundary pixels close to the relevant marker and use these minimum distances to define the distance measure. Note that a mask should be defined from the binary image before this measure is calculated, to eliminate the effect of nuclear boundaries that are close to $m_j$ but irrelevant to it:

$$ B(b_i, m_j) = \begin{cases} 1, & \overline{b_i m_j} \subset I_{\mathrm{bin}}^{f} \\ \infty, & \text{otherwise} \end{cases} \qquad (4) $$

where $b_i \in b$ and $I_{\mathrm{bin}}^{f}$ denotes the foreground of the binary mask.
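The mask in (4) amounts to a line-of-sight test between a boundary pixel and a marker. A minimal Python sketch is given below, assuming the binary image is a list of rows with truthy foreground pixels, that points are (row, column) pairs, and that sampling the segment at half-pixel steps with rounding to the nearest pixel is an adequate discretization. The function name `visibility` and the sampling scheme are choices made for this example, not details taken from the paper.

```python
import math

def visibility(mask, b, m):
    """Return 1.0 if the straight segment from b to m stays inside the foreground, else infinity."""
    (by, bx), (my, mx) = b, m
    # Sample at roughly half-pixel steps along the longer axis (assumption).
    steps = 2 * max(abs(my - by), abs(mx - bx), 1)
    for k in range(steps + 1):
        t = k / steps
        y = int(round(by + t * (my - by)))
        x = int(round(bx + t * (mx - bx)))
        if not mask[y][x]:
            return math.inf
    return 1.0

# Two foreground blocks separated by a one-pixel background gap in column 3.
mask = [[1, 1, 1, 0, 1, 1, 1] for _ in range(3)]
print(visibility(mask, (1, 0), (1, 2)))  # same block: 1.0
print(visibility(mask, (1, 0), (1, 5)))  # crosses the gap: inf
```

Returning infinity for blocked pairs means that any distance multiplied by this value becomes infinite, so a boundary pixel that cannot "see" a marker through the foreground is never associated with it on distance grounds.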

2168-2194 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2015.2504422, IEEE Journal of Biomedical and Health Informatics

JBHI-00416-2015

m308

bi

6 D.MDL-Based Contour Inference

 θ307 -gr θ308

m307 ed

(a)

(b)

(c)

Fig. 6. Illustrating (a) the definition of nuclear-boundary-to-marker evidences with (b) only distance measure; (c) distance and angle measure.

This characteristic function will be 1 when the line between bi f and mj is entirely located in I bin , and ∞ otherwise. Then, the distance measure normalized to (0,1] can be defined as:



dist  b, m j   1 1  Rcri   B(bi , m j)  ed  b, m j  

2



(5)

where ed (b, mj)  minbib || bi  mj ||2 is the Euclidean distance between a boundary pixel b i and the marker mj, and Rcri  1 / 150 is a threshold being set according to the critical value of nuclear radii. Fig. 6(b) illustrates the result of distance measure for a selected nucleus cluster. However, if considering the defined distance only, it may cause some errors. Taking the black circle in Fig. 6(b) as an example, some pixels in the boundary of 308th nucleus are miss-classified into the 307th nuclear boundary. Hence, to avoid such error, we introduce a second measure called angle measure, which defines an angle between the  direction  gr ( bi ) of negative intensity gradient at b i and the  direction li (mj , bi ) of line from mj to b i. The usage of a negative gradient is based on an observation that the image intensity of nuclear regions is higher than the one of the background in our histopathology images so that the direction of negative gradient at b i diverges from mj. Since the contours between any two overlapping nuclei are concave as mentioned in Section II-B,   the angle between  gr ( bi ) and li (mj, bi) is lower than the angle   between  gr ( bi ) and li (mk , bi) if b i is located at the boundary of jth nucleus, as shown in Fig. 6(a). Therefore, we can define the angle measure, normalize it to (0,1] as follows:      gr  bi   li  m j , bi    ang  m j , b   min    bi  b  gr  bi   li  m j , bi   2 

    1   2  

2 (6)

Thus, nuclear-boundary-to-marker evidence sets can be calculated by combining the distance measure and angle measure, as illustrated in Fig. 6(c). In practice, these two terms can be weighted differently to adjust the tradeoff between them. For this, we introduce an additional parameter α for such adjustment. The α-adjusted association measure is given by:

The methods mentioned above provide reliable split results for nuclei clumps and enable us to identify each nucleus accurately. However, there still remain three huge challenges to perform automated contour inferring for each nucleus with high accuracy, as illustrated in Fig. 7. Firstly, only partial contour of each nucleus can be obtained since the nuclei touch or overlap with each other. In addition, a jagged contour appears around the edge of nuclei clump due to original images containing blurred nuclear boundaries and intensity variations. Furthermore, various nuclear intrinsic features cannot be captured by a fitting model with global parameters although the jaggedness can be alleviated by this model. Hence, it is necessary for us to propose a novel method to solve the aforementioned issues. Given N targeted markers corresponding to all the nuclei in a histological image, we have a set of ni (1 ≤ i ≤ N) edge points as the boundary evidences for the ith nucleus, denoted as {b i,1, bi,2,..., bi,ni}, where bi,j = (xi,j, yi,j)T∈  2 , and bi,j is a 2×1 vector. To achieve the expected effect (refer to Fig. 7(e)), the contour for a nucleus is required to be smooth, so an application of periodic B-spline fitting to these evidences is naive. We want to infer a contour, fitted to the known evidences and restricted by some prior knowledge. Since the advantages of MDL principle [27]-[36], we propose a MDL-based periodic B-spline representation of contours, with adaptive estimation of parameters. 1) Periodic Model Description: Typically, the spline contour on a digital image is a discretized curve, and if we now have determined the final optimal location and length of spline parameters, then a periodic B-spline can be represented through data sampling among the knots. 
Given a periodic set of knots $\{t_{i,0} < t_{i,1} < \dots < t_{i,l}\} \subset [t_{i,0}, t_{i,l}]$ for the $i$th fitted contour of a cell nucleus, and the knot sequence sampled at equal spacing to give an $n_i \times 2$ array $s^{\hat\Theta}_{i,j} = \{(x_{i,j}, y_{i,j}),\ 1 \le j \le n_i\}$, where $\hat\Theta$ is an optimal set of parameters estimated by the MDL principle and $n_i \gg m$, a coordinate on the periodic spline curve can be calculated as

$$s^{\hat\Theta}_{i,j} = [x_{i,j}\ \ y_{i,j}] = \left[\, x\!\left(t_{i,0} + \frac{j\,(t_{i,l-1} - t_{i,0})}{n_i}\right)\ \ y\!\left(t_{i,0} + \frac{j\,(t_{i,l-1} - t_{i,0})}{n_i}\right) \right] \tag{8}$$

To build the periodic spline model, we assume that the initial blending function is $\phi_{k,d}(t)$ inside each interval $[t_{i,d}, t_{i,l-d}]$ and that its periodic version is $\phi^{(per)}_{k,d}(t)$ with $t_{i,j} = t_{i,\,j \bmod l}$:

$$\phi^{(per)}_{k,d}(t) = \sum_{j} \phi_{k + j\,(t_{i,l} - t_{i,0}),\,d}(t)$$

These periodic functions [36] still satisfy the so-called partition-of-unity property: $\sum_k \phi^{(per)}_{k,d}(t) = 1$, $\phi^{(per)}_{k,d}(t) \ge 0$. Therefore, a linear system of the $l$ periodic blending functions for the $l$-knot periodic fitted curve can be written in matrix form as

$$\mathrm{rel}(m_j, b) = \alpha \cdot \mathrm{dist}(b, m_j) + (1 - \alpha) \cdot \mathrm{ang}(m_j, b) \tag{7}$$

$$s_i^{\Theta}(t) = \sum_{k=0}^{l-1} \phi^{(per)}_{k,d}(t)\, c_{i,k} = B_i c_i \tag{9}$$

Fig. 2(k) shows a result of the above method for the case α = 1/2. After the above process, we obtain a set of boundary evidences for each nucleus. Next, we infer the complete nuclear contours from these boundary evidence sets using MDL-constrained periodic B-spline curves.
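The α-weighted association in (7) can be illustrated with a minimal sketch. The definitions of the distance and angle terms are not restated in this part of the text, so the sketch assumes a distance term scaled to [0, 1] and an angle term (1 − cos θ)/2, where θ is the angle between an inward unit normal at the boundary point and the direction to the marker. It also assumes that a smaller score means a stronger association. The function name `associate` and its arguments are hypothetical.

```python
import numpy as np

def associate(boundary, normals, markers, alpha=0.5):
    """Assign each boundary point to the marker with the smallest score.

    boundary: (n_b, 2) edge points; normals: (n_b, 2) inward unit normals
    (an assumed input); markers: (n_m, 2) marker positions.
    """
    diff = markers[None, :, :] - boundary[:, None, :]   # (n_b, n_m, 2)
    d = np.linalg.norm(diff, axis=2)                    # point-to-marker distances
    dist = d / d.max()                                  # assumed scaling to [0, 1]
    unit = diff / np.maximum(d, 1e-12)[..., None]       # unit directions to markers
    cos = np.einsum("bmk,bk->bm", unit, normals)
    ang = (1.0 - cos) / 2.0                             # assumed angle term in [0, 1]
    rel = alpha * dist + (1.0 - alpha) * ang            # weighted sum as in (7)
    return rel.argmin(axis=1), rel

markers = np.array([[0.0, 0.0], [10.0, 0.0]])
boundary = np.array([[-2.0, 0.0], [12.0, 0.0], [4.0, 0.0]])
normals = np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0]])
labels, _ = associate(boundary, normals, markers, alpha=0.5)
print(labels)
```

With α = 1 only the distance term is used, so the third point goes to the nearer marker; with α = 1/2 the angle term moves it to the marker its normal points toward.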

where $B_i$ is a $2n_i \times 2l$ matrix with $\phi^{(per)}_{k,d}\!\left(t_{i,0} + j\,(t_{i,l-1} - t_{i,0})/n_i\right) I_2$ as its $(j, k)$th submatrix, and $c_i$ is a $2l \times 1$ column vector containing the coordinates of the control points. In real problems, however, $\{l, c_i\}$ are unknown and need to be chosen optimally. Based on the


Fig. 7. Complete contour inference based on the MDL-constrained B-spline model. (a) nuclei clump detected from the image, (b) overlaid with the binary edges and markers in white. (c) generated boundary evidences for each nucleus. (d) initial fitting result, and (e) final inferred contours constrained by MDL criterion illustrated in blue, overlaid with the initial fitting result shown as red curve in dashed form.

observed boundary evidences of nuclei, we can directly specify the model degree; cubic B-splines are typically used in this work. We bind all the relevant parameters in a vector $\Theta$ and seek the optimal model for each nucleus through adaptive selection of control points, as described in the following subsection.

2) MDL-Constrained Spline Model Configuration: For the time being, assume that the knot vector is known; we determine an adaptively optimal fitted spline based on an MDL criterion. We now have a coordinate sequence of boundary evidences $X_i = \{b_{i,j},\ 1 \le j \le n_i\}$, $1 \le i \le N$, as the observed data. The MDL principle states that the encoding scheme is based on a model $s_i^{\Theta}(t)$ with the corresponding $\Theta$ estimated from $X_i$, and that the encoding of $X_i$ can be done in two parts [28]. The first part is the parameter description length $L(\Theta)$, which encodes the fitted model $s_i^{\Theta}(t)$; the second part is the data description length $L(X_i \mid \Theta)$, which encodes $X_i$ conditioned on the fitted model. Hence, the total description length (TDL) is defined by $TDL(X_i, \Theta) = L(X_i \mid \Theta) + L(\Theta)$, and the joint MDL estimate of $\Theta$ can be written as

$$\hat\Theta = \arg\min_{\Theta} \left\{ L(X_i \mid \Theta) + L(\Theta) \right\} \tag{10}$$

where the first term on the right-hand side measures the deviation of the fitted curve from the observed data $X_i$ and is equivalent to $-\log p(X_i \mid \Theta)$ [29], and $L(\Theta)$ represents the conciseness of the spline. Suppose that $b_{i,j} \in X_i$ is a noisy observation of $s_i^{\Theta}(t)$ at the parameter value $t_{i,0} + j\,(t_{i,l-1} - t_{i,0})/n_i$ of the periodic B-spline, and let $\varepsilon_{i,j}$ denote the error between them. To reduce computation, we assume that the errors are independent and identically distributed zero-mean Gaussian variables with variance $\sigma^2$. Then, the likelihood of $s_i^{\Theta}(t)$ given $X_i$ is

$$p(X_i \mid \Theta) = p\!\left(X_i \mid c_i^{(l)}, \sigma^2\right) = \mathcal{N}\!\left(X_i \mid B_i^{(l)} c_i^{(l)},\ \sigma^2 I_{2n_i}\right) \tag{11}$$

where the superscript $l$ is used to emphasize that there are $l$ control points. The corresponding log-likelihood is

$$\log p\!\left(X_i \mid c_i^{(l)}, \sigma^2\right) = -\frac{n_i}{2} \log\!\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2} \left\| X_i - B_i^{(l)} c_i^{(l)} \right\|_2^2 \tag{12}$$

As regards the parameter description length, the second term of (10), it consists of $l$ control points, each of which is assumed to have the same description length when we compute the complexity. Let $\tilde{c}_i^{(l)}$ represent an approximation of $\hat{c}_i^{(l)}$ such that $|B_i^{(l)} \tilde{c}_i^{(l)} - B_i^{(l)} \hat{c}_i^{(l)}| = \varepsilon_i$, and assume that $\|\varepsilon_i\| \le \gamma$, so that $\gamma$ is the precision for the spline curve $B_i^{(l)} \hat{c}_i^{(l)}$. Usually $\gamma = 1$, since $B_i^{(l)} \hat{c}_i^{(l)}$ is a discretized curve as described above. Thus, we have

$$\left\| B_i^{(l)} \tilde{c}_i^{(l)} - B_i^{(l)} \hat{c}_i^{(l)} \right\| \le \left\| B_i^{(l)} \right\| \left\| \tilde{c}_i^{(l)} - \hat{c}_i^{(l)} \right\| \le \gamma \ \Rightarrow\ \left\| \tilde{c}_i^{(l)} - \hat{c}_i^{(l)} \right\| \le \frac{\gamma}{\left\| B_i^{(l)} \right\|} \tag{13}$$

where $\|B_i^{(l)}\| = \max_j \sum_k \phi^{(per)}_{k,d}\!\left(t_{i,0} + j\,(t_{i,l-1} - t_{i,0})/n_i\right) = 1$, due to the partition-of-unity property. Hence, the control-point description length precision is $\gamma / \|B_i^{(l)}\|$ [31]-[33]. Each nucleus is located inside a specific subregion of the image plane, so using the pixel integers of the entire image to represent every control point (e.g., for the $i$th nucleus) is unreasonable. Thus, we create a local window for each nucleus according to the convex hull of the corresponding evidence set. Specifically, given a convex hull, we can estimate the length $h$ and width $w$ of the local window of the corresponding nucleus by finding the maximum and minimum $x$ ($y$) values of the region and computing the difference between them. Suppose that the size of the window for the $i$th nucleus is $h_i \times w_i$; the MDL estimate for the spline parameters is then

$$L\!\left(\hat{c}_i^{(l)}, \hat\sigma^2, l\right) = \left( \log_2(h_i \cdot w_i) - \log_2 \frac{\gamma}{\|B_i^{(l)}\|} \right) \cdot l = \left( \log_2 \frac{h_i \cdot w_i \cdot \|B_i^{(l)}\|}{\gamma} \right) \cdot l = \left( \log_2(h_i \cdot w_i) \right) \cdot l \tag{14}$$

which guarantees a concise contour regulated by the control points. Substituting (12) and (14) into (10), we can find the minimum with respect to the unknown $l$, $c_i^{(l)}$, and $\sigma^2$, written as

$$\left(\hat{c}_i^{(\hat{l})}, \hat\sigma^2, \hat{l}\right) = \arg\min_{l} \left\{ L\!\left(c_i^{(l)}, \sigma^2, l\right) - \max_{c_i^{(l)},\, \sigma^2} \log p\!\left(X_i \mid c_i^{(l)}, \sigma^2\right) \right\} \tag{15}$$

This problem can be solved by an alternating minimization framework (see Algorithm 1). Fixing the parameter $l$, we can


obtain the optimal values:

$$\hat{c}_i^{(l)} = \left( (B_i^{(l)})^T B_i^{(l)} \right)^{-1} (B_i^{(l)})^T X_i, \qquad \hat\sigma^2_{(l)} = \frac{\left\| X_i - B_i^{(l)} \left( (B_i^{(l)})^T B_i^{(l)} \right)^{-1} (B_i^{(l)})^T X_i \right\|_2^2}{n_i} \tag{16}$$

Fixing the parameters $c_i^{(l)}$ and $\sigma^2$ at these values, we can simplify the corresponding log-likelihood term $\max_{c_i^{(l)},\, \sigma^2} \{\log p(X_i \mid c_i^{(l)}, \sigma^2)\}$. Thus (15) reduces to

$$\hat{l} = \arg\min_{l} \left\{ \left( \log_2(h_i \cdot w_i) \right) \cdot l + n_i \log_2 \hat\sigma^2_{(l)} \right\} \tag{17}$$

Here, $\hat\sigma^2_{(l)}$ is a function of the parameter $l$. Since $\hat\sigma^2$ depends on $l$, a direct derivation of the minimizing $l$ is infeasible. Hence, we adopt a heuristic method to obtain this minimum as follows: given a set $\Omega = \{l_{\min}, \dots, l_{\max}\}$, starting from $l_{\min}$ and proceeding to $l_{\max}$, we in turn calculate the values of $\hat{c}_i^{(l)}$, $\hat\sigma^2_{(l)}$, and

the corresponding TDL based on (15). This produces a set of contour estimates, from which we compute the series $\{TDL(l)\}_{l = l_{\min}}^{l_{\max}}$ and find its minimum with respect to $l$ according to (15). Fig. 8(b) plots the total description length $TDL(X_i, \Theta) = L(X_i \mid \Theta) + L(\Theta)$ in (17) for each given nucleus (with a different region size) as the number of control points varies. The smoothed contour estimates (blue curves) in Fig. 8(a) correspond to the minima of the plots in Fig. 8(b), representing an optimal tradeoff between the model's conciseness and its fitting quality. Two conclusions can be drawn from Fig. 8. First, because the boundary evidences are insufficient, for given $n_i$ and $h_i \times w_i$, increasing $l$ decreases $n_i \log_2(\hat\sigma^2_{(l)})$ but increases $(\log_2(h_i \cdot w_i)) \cdot l$ dramatically. Second, the more evidences we have for a nucleus, the more control points we need.

The MDL-based estimation on a periodic B-spline model described in Algorithm 1 makes it possible to infer missing boundaries and capture special nuclear structures. However, one key element must not be missing: the direct correspondence between the observed data (boundary evidences) and the phenotypic value points of the spline. As (8) implies, we can take an equispaced sample of the optimal estimated knots. An example of the final contour inference (in blue) for individual nuclei is shown in Fig. 2(l), together with the results (red dashed curves) obtained without the MDL criterion. As seen, the MDL-based B-spline model infers complete, smooth contours adaptively while capturing the nuclear structures.

III. EXPERIMENTS AND RESULTS

A. Applications to Benchmark Set of Synthetic Images

To test and validate the proposed method, we first used a benchmark set of synthetic images of fluorescence-stained cell populations with realistic properties, generated with the SIMCEP simulation tool [37].
They were generated using the package of files, downloadable from: http://www.cs.tut.fi/sgn/

Fig. 8. (a) Boundary evidences (red points), marker (black point), control points (small squares), and estimated spline (blue solid line) for each nucleus. (b) Plots of the description lengths (minimum at 8, 5, and 6).

Algorithm 1: MDL-Constrained Periodic B-Spline Inference (for each nucleus)

Input: Nuclear boundary evidences $X_i$, an initial $\Theta_0$, local window size $h_i$, $w_i$, and $\Omega = \{l_{\min}, \dots, l_{\max}\}$.

Output: The inferred complete contour $s_i^{\hat\Theta}(t)$.

Initialization: Set $d = 3$, estimate a contour $s_i^{\Theta_0}(t)$, then let $k = 0$.

Iterations:

Step (1): Build $B_i^{(l_{\min})}$ and compute the Kronecker product $B_i^{(l_{\min})} \otimes I_2$.

Step (2): Compute the optimal parameters using the ML estimate, based on $s_i^{\Theta_k}(t)$: $\hat{c}_i^{(l_{\min})} = \left( (B_i^{(l_{\min})})^T B_i^{(l_{\min})} \right)^{-1} (B_i^{(l_{\min})})^T X_i$ and $\hat\sigma^2_{(l_{\min})} = \left\| X_i - B_i^{(l_{\min})} \left( (B_i^{(l_{\min})})^T B_i^{(l_{\min})} \right)^{-1} (B_i^{(l_{\min})})^T X_i \right\|_2^2 / n_i$.

Step (3): Given $\hat{c}_i^{(l_{\min})}$ and $\hat\sigma^2_{(l_{\min})}$, compute and record the corresponding TDL from $TDL(l_{\min}) = (\log_2(h_i \cdot w_i)) \cdot l_{\min} + \frac{n_i}{2} + \frac{n_i}{2} \log_2(2\pi) + n_i \log_2 \hat\sigma^2_{(l_{\min})}$, and estimate a contour $s_i^{\Theta_{k+1}}(t)$ using $l_{\min}$, $\hat{c}_i^{(l_{\min})}$, and $\hat\sigma^2_{(l_{\min})}$.

Step (4): Update $l_{\min} = l_{\min} + 1$ and go back to Step (1). Termination criterion: $l_{\min} = l_{\max} + 1$.

Step (5): Compute $\hat{l} = \arg\min \{TDL(l_{\min}), \dots, TDL(l_{\max})\}$.

Step (6): Generate the inferred complete contour $s_i^{\hat\Theta}(t) = B_i \hat{c}_i$ for the $i$th nucleus according to the estimated $\hat{l}$, $\hat{c}_i^{(\hat{l})}$, and $\hat\sigma^2_{(\hat{l})}$.
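The search over the number of control points in Algorithm 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a uniform periodic cubic B-spline basis, assumes the evidence points are ordered along the contour and equispaced in the spline parameter as in (8), takes the bounding box of the evidences as the local window, and keeps only the two $l$-dependent TDL terms from (17). Solving the least-squares problem per coordinate is equivalent to using the Kronecker product with $I_2$. The names `periodic_cubic_basis` and `select_control_points` are hypothetical.

```python
import numpy as np

def periodic_cubic_basis(n, l):
    """n x l matrix of uniform periodic cubic B-spline weights
    evaluated at n equispaced parameter values (each row sums to 1)."""
    B = np.zeros((n, l))
    u = np.arange(n) * l / n                  # parameters in [0, l)
    i = np.floor(u).astype(int)
    f = u - i
    w = np.stack([(1 - f) ** 3,
                  3 * f ** 3 - 6 * f ** 2 + 4,
                  -3 * f ** 3 + 3 * f ** 2 + 3 * f + 1,
                  f ** 3], axis=1) / 6.0
    rows = np.arange(n)
    for s in range(4):                        # four neighbouring control points
        B[rows, (i + s - 1) % l] += w[:, s]   # wrap indices for periodicity
    return B

def select_control_points(X, h, w, candidates):
    """Return (l_hat, c_hat, tdl) minimising the description length over l."""
    n = X.shape[0]
    tdl, fits = {}, {}
    for l in candidates:
        B = periodic_cubic_basis(n, l)
        c, *_ = np.linalg.lstsq(B, X, rcond=None)             # least-squares control points
        sigma2 = max(np.sum((X - B @ c) ** 2) / n, 1e-12)     # residual variance
        tdl[l] = np.log2(h * w) * l + n * np.log2(sigma2)     # l-dependent terms of (17)
        fits[l] = c
    l_hat = min(tdl, key=tdl.get)
    return l_hat, fits[l_hat], tdl

# Synthetic evidences: a noisy ellipse sampled in order along the contour.
rng = np.random.default_rng(0)
theta = np.linspace(0.0, 2.0 * np.pi, 80, endpoint=False)
X = np.column_stack([30 + 12 * np.cos(theta), 25 + 8 * np.sin(theta)])
X = X + rng.normal(0.0, 0.5, X.shape)
h = X[:, 1].max() - X[:, 1].min()
w = X[:, 0].max() - X[:, 0].min()
l_hat, c_hat, tdl = select_control_points(X, h, w, range(4, 16))
contour = periodic_cubic_basis(200, l_hat) @ c_hat            # densely sampled contour
print(l_hat, contour.shape)
```

The constants $n_i/2$ and $(n_i/2)\log_2(2\pi)$ from Step (3) are omitted because they do not depend on $l$ and so do not change the minimizer.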

csb/simcep/, with different parameters. The entire set contains simulated images of cell populations with the corresponding ground truth images in which the cells are represented as binary superpixels for validation. To simulate the real microscopic images and test the performance of our method, we created an image set which contains images with cell clumps of varying sizes and complexity. Hence, the generated images consist of overlapping cells with three different values of clustering probabilities as well as additive noise. For each clustering probability, we used three different values for the amount of


Fig. 9.¹ Result of our proposed segmentation method. (a) Synthetic microscopy image containing cell clumps from the SIMCEP simulation tool with cellular clustering probability = 0.4 and amount of overlap = 0.2; (b) detected markers (shown in red) using the proposed split-point-candidates-based method; (c) boundary evidences for individual cells calculated by the nuclear-boundary-to-marker association; (d) results of the MDL-constrained contour inference.

Fig. 10. Results for '>300 nuclei': (a) original nuclei image; (b) detected markers (red points); (c) nuclear boundary evidences; (d) MDL-constrained contour inference.

TABLE I
THE ESTABLISHED BENCHMARK SYNTHETIC IMAGE SET CONTAINING CELL CLUMPS WITH INCREASING CLUSTERING PROBABILITY AND AMOUNT OF OVERLAP FROM THE SIMCEP TOOL.

| Parameter | Value |
|---|---|
| Probability of clustering | 0.4, 0.5, 0.6 |
| Amount of overlap | 0.2, 0.3, 0.4 |
| Number of image sets | 9 |
| Images per set | 50 |
| Cells per image | 200, 300 |
| Total number of cells for one set | 12500 |

overlap and simulated 50 images (altogether 450 images constituting 9 image sets), each of which contains either 200 cells or 300 cells, with approximately 5 cell clumps per image. Table I shows the parameters and values used for generating the image set, and Fig. 9 illustrates the results of our proposed method on a synthetic example.

¹ You can increase the zoom value to 400% to see greater detail of the results produced by our method, similar to Fig. 10 and Fig. 11.

B. Applications to Real Histopathology Micrographs

In this section, we first evaluated our proposed method on 5 real histopathology images obtained from the FARSIGHT toolkit (www.farsight-toolkit.org), in which the cytoplasm has already been removed by spectral unmixing; they contain nearly 2000 nuclei in all. To evaluate the practical performance of the proposed method, we categorized these images into three groups according to their number of nuclei: ">200 nuclei", ">300 nuclei", and ">600 nuclei". The degree of overlap also differs between groups. For example, the nuclei in the ">200 nuclei" images touch only slightly, whereas those in the ">300 nuclei" images overlap more severely, which makes overlapping structures the salient regions. The main results of the proposed method for the ">300 nuclei" case are displayed in Fig. 10 as an example. In addition, to better test the robustness of the proposed algorithm, we performed a quantitative evaluation on a subset of 10 whole slide images (WSIs) of kidney renal clear cell carcinoma


Fig. 11. Sample segmentation results on the whole slide images of kidney renal clear cell carcinoma.

(KIRC) from the TCGA data portal (https://tcga-data.nci.nih.gov/tcga/), where the cytoplasm is present in the image. In addition to the above-mentioned challenges, color variations exist between and within nuclei in these images due to nonuniform absorption of the stains, stain fading, and other factors [40]. Therefore, spectral unmixing [10] was first used to obtain histopathological images with the cytoplasm removed, and our method was then applied to get the final results. Fig. 11 illustrates the qualitative results of our proposed method on four tiles. The detailed results for all the real histopathology data will be attached in the supplementary file.

C. Performance Evaluation of Segmentation

The proposed split-point-candidates-based segmentation refinement correctly recognizes the total number of nuclei and detects one marker per nucleus in almost all cases, even at different noise levels. As illustrated in Fig. 9(b) and Fig. 10(b), the red points are the markers detected for individual nuclei by the proposed refinement method. In general, when the nuclei overlap drastically, most existing concavity-based methods have difficulty detecting the concave points correctly. Without iterative erosion operations, undersegmentation may occur, since the concavities are not constantly updated. On the other hand, the termination criterion of our algorithm for searching the optimized split point pairs ensures that the final erosion result produces only one connected region for each nucleus, thereby avoiding oversegmentation of nuclei.

The nuclear-boundary-to-marker association method suits our needs well, since we infer a contour for each nucleus after calculating the corresponding refined markers. Boundary evidences are first extracted by Canny's edge detection method and then associated with the markers detected using the procedure in Section II-B. The results calculated by combining the distance measure and the angle measure are consistent with the visual results, although some noisy edge pixels have been classified as valid contour evidences, as illustrated in Fig. 9(c) and Fig. 10(c). As seen from Fig. 10(d) and Fig. 11, the MDL-based periodic B-spline model gives reasonable results, although a few nuclei are not treated appropriately. The inferred contours match the original images well: they not only capture the desired nuclear structures but also estimate the degree of occlusion among the nuclei satisfactorily. In cases where the boundary evidences are insufficient, the MDL criterion forces the estimated missing contours to deviate from the correct nuclear structures as little as possible, and makes

the recovered contours look good. We want to emphasize that such a prior also achieves a good compromise between algorithm performance and computational complexity.

The proposed method was applied to all the image sets, consisting of both synthetic images and real histopathology micrographs. For the synthetic images, we evaluated our method using precision and recall analysis, comparing the segmentation results produced by the proposed method with the corresponding auto-generated binary masks (ground truth database). We obtained the true positives (TP), false positives (FP), and false negatives (FN), from which the precision (PR) and recall (RC) rates are computed as

$$PR = \frac{TP}{TP + FP}, \qquad RC = \frac{TP}{TP + FN} \tag{18}$$

where PR is the percentage of cells detected by the proposed method that are actual cells in the corresponding ground truth image, and RC is the percentage of objects in the ground truth image that are detected by our method. In addition to the average precision and recall, the average F-measure, the harmonic mean of precision and recall, is compared over the entire ground truth database; it is defined as

$$F_1 = \frac{2 \cdot PR \cdot RC}{PR + RC} \tag{19}$$
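The measures in (18) and (19) can be computed directly from the detection counts. The short sketch below uses hypothetical function names and made-up counts for illustration; the last line checks the formula against one precision and recall pair reported in Table II (Set300, S06,04).

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F-measure from detection counts, as in (18)-(19)."""
    pr = tp / (tp + fp)
    rc = tp / (tp + fn)
    return pr, rc, f_measure(pr, rc)

def f_measure(pr, rc):
    """Harmonic mean of precision and recall."""
    return 2.0 * pr * rc / (pr + rc)

# Illustrative counts (not taken from the paper).
pr, rc, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(pr, 3), round(rc, 3), round(f1, 3))

# Reported precision and recall for Set300, S06,04 in Table II.
print(round(f_measure(0.989, 0.849), 3))
```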

The performance parameters obtained for the 18 image sets are given in detail in Table II, where the subscripts 200 and 300 denote synthetic images containing 200 cells and 300 cells, respectively. As seen, the proposed method performs accurately when the probability of clustering and the amount of overlap are small. On the one hand, when the amount of overlap increases to 0.30, an increase in the probability of clustering degrades the performance of the proposed method. On the other hand, as the number of cells increases, more cells marked in the ground truth cannot be detected by our method, especially when the amount of overlap is 0.40. Nevertheless, the worst F-measure obtained by the proposed method is 0.914, indicating its high performance. To take full advantage of the real histopathology images, the proposed framework is evaluated in two steps according to the source of the data. For the FARSIGHT data, we evaluated the segmentation results visually and quantitatively with a common metric for quantitative evaluation: accuracy, which measures how close the segmentation is to the gold standard. In


TABLE II
PERFORMANCE PARAMETERS FOR THE ESTABLISHED IMAGE DATABASE OBTAINED BY THE PROPOSED METHOD. IN SET200, THE ENTIRE SYNTHETIC DATA SET CONTAINS 45000 CELLS WHILE IN SET300, THE TOTAL NUMBER OF CELLS IS 67500. TEXT IN BOTH COLUMNS "SET200" AND "SET300" IS INTERPRETED AS IMAGE SET, PROBABILITY OF CLUSTERING, AND AMOUNT OF OVERLAP IN TURN.

| Set200 | PR | RC | F |
|---|---|---|---|
| S04,02 | 0.998 | 0.997 | 0.997 |
| S05,02 | 0.997 | 0.993 | 0.995 |
| S06,02 | 0.997 | 0.990 | 0.993 |
| S04,03 | 0.996 | 0.986 | 0.991 |
| S05,03 | 0.996 | 0.974 | 0.985 |
| S06,03 | 0.994 | 0.942 | 0.967 |
| S04,04 | 0.993 | 0.951 | 0.972 |
| S05,04 | 0.992 | 0.923 | 0.956 |
| S06,04 | 0.992 | 0.901 | 0.944 |

| Set300 | PR | RC | F |
|---|---|---|---|
| S04,02 | 0.996 | 0.991 | 0.993 |
| S05,02 | 0.995 | 0.987 | 0.991 |
| S06,02 | 0.993 | 0.985 | 0.989 |
| S04,03 | 0.991 | 0.969 | 0.980 |
| S05,03 | 0.991 | 0.952 | 0.971 |
| S06,03 | 0.990 | 0.917 | 0.952 |
| S04,04 | 0.990 | 0.903 | 0.945 |
| S05,04 | 0.989 | 0.896 | 0.940 |
| S06,04 | 0.989 | 0.849 | 0.914 |

Fig. 12. Nuclei segmentation results. (a) Original image; (b) proposed method; (c) Vese and Chan [2]; (d) Al-Kofahi et al. [10]; (e) Park et al. [11]; (f) Arslan et al. [16]. The red curves correspond to the contours generated by our proposed method and four other methods.

TABLE III
COMPARISON OF SEGMENTATIONS BY THE PROPOSED METHOD AND REPRESENTATIVE METHODS ON NUCLEI IMAGES (AVERAGE VALUE OF THREE CASES)

| Nuclei segmentation | Correctly seg. | Overseg. | Underseg. | Miss-seg. | False-seg. | Correctly seg. from clusters |
|---|---|---|---|---|---|---|
| Our proposed method | 89.37% | 1.53% | 6.38% | 0.83% | 1.89% | 89.17% |
| The method of Al-Kofahi et al. | 85.23% | 3.73% | 5.21% | 0.03% | 5.80% | 65.79% |
| The method of Park et al. | 50.12% | 9.62% | 15.67% | 23.16% | 1.43% | 66.09% |
| The method of Arslan et al. | 68.10% | 18.46% | 3.31% | 3.31% | 6.82% | 58.77% |

our work, we followed an approach similar to [38] to measure accuracy. To better demonstrate the segmentation performance of the proposed method and the efficacy of clump identification, we compare it with four representative works using the 5 real histopathology micrographs from the FARSIGHT toolkit and list the results in Table III. To segment as many nuclei as possible with the method of Arslan et al., we set the primitive length threshold to tsize = 20, the percentage threshold used in the iterative search to tperc = 0.1, and the standard deviation threshold used in localization to tstd = 6. Examples of nuclei segmentations for these methods are displayed in Fig. 12. As seen, the result of Vese and Chan's method suffers from severe undersegmentation and fails to distinguish overlapping nuclei with similar intensities, since their algorithm is guided by image intensities. Al-Kofahi et al.'s algorithm performs comparably well, but it also struggles when the nuclei are aggregated: because their method relies on controlling the LoG scale, it cannot rule out neighborhood encroachment, and so cannot prevent undersegmentation and false-segmentation errors. The difficulty of applying Park et al.'s algorithm to histopathology images has two aspects. First, the comparably low signal-to-noise ratio and the uneven intensity distribution inside and outside nuclei limit the number of detected nuclei, leading to an unsatisfactory miss-segmentation rate. Second, the ultimate erosion for convex sets used in their work fails here, since it is difficult to measure the size of the concavity in some clustered places, so a certain amount of undersegmentation appears. As Fig. 12(f) and Table III show, segmenting enough nuclei with Arslan et al.'s method inevitably produces many oversegmentations, because it is difficult to predefine the sizes of nuclei in real applications. Meanwhile, a smaller value of the percentage threshold also yields more false detections. In contrast, the proposed method outperforms all four of these methods and is able to segment most touching nuclei correctly from the connected clumps, as shown in Fig. 12. Furthermore, our method can produce the complete contours of almost all nuclei by inferring the latent missing parts, even in complex clumps (such as a clump with holes) and in low-contrast images. The performance assessments of our segmentation and the others are shown in Table III, corresponding to the average


value of three cases. A nucleus is said to be correctly segmented if it is matched to an annotated nucleus in the gold standard and vice versa, where a segmented nucleus A is matched to an annotated nucleus B if at least half of A's area overlaps with B. Note that a correct segmentation must be a one-to-one match. Likewise, an annotated nucleus is oversegmented if two or more segmented nuclei match it, and annotated nuclei are undersegmented if two or more of them match the same segmented nucleus. A segmented nucleus is a false detection if it does not match any annotated nucleus, and an annotated nucleus is missed if it does not match any segmented nucleus. Similarly, we can find the number of nuclei correctly segmented from clumps. Fig. 13 gives a bar graph of the correctly segmented nuclei for all three cases, i.e., '>200 nuclei', '>300 nuclei', and '>600 nuclei', which illustrates the segmentation performance of our method. On average, the '>200 nuclei' images contain 240 nuclei, the '>300 nuclei' images contain 337 nuclei, and the '>600 nuclei' images contain 676 nuclei.

For the TCGA KIRC data, quantitative performance is assessed using the average Dice Coefficient (DC) values, measured as

$$DC = \frac{2\,|A(S) \cap A(G)|}{|A(S)| + |A(G)|}$$

with a specific threshold, where the minimal DC threshold is above 0.6. We also report the pixel-based true positive rate (TPR), false negative rate (FNR), and false positive rate (FPR) for each cell at the pixel level using this DC threshold. The details of our algorithm evaluation on real histopathology data are explained in the following.

In this experiment, we considered the expert's annotations as ground truth and compared the proposed nuclei segmentation method with that of Irshad et al. [40], which used different TCGA KIRC data, as well as that of Al-Kofahi et al. [10]. In terms of pixel-based nuclei segmentation, we achieve the best DC value of 0.7012, compared with [40], which reports a DC value of 0.6848, and [10], with a DC value of 0.6728, when we considered annotations from research fellows as GT. The quantitative assessment on the TCGA KIRC data is listed in Table IV, and the qualitative evaluation of our proposed methodology is shown in Fig. 11. Note that, in addition to the performance improvement in nuclei segmentation, our proposed method infers the complete contour of each nucleus and captures the latent nuclear structures, unlike the methods mentioned.

Fig. 13. Number of correctly segmented nuclei for '>200 nuclei', '>300 nuclei', and '>600 nuclei', respectively.

TABLE IV
RESULTS (MEAN AND STANDARD DEVIATION IN BRACKETS) ON THE TCGA KIRC DATA IN TERMS OF THE TPR, FNR, FPR, AND DC

| Nuclei segmentation | TPR % | FNR % | FPR % | DC % |
|---|---|---|---|---|
| Our proposed method | 72.36 (5.83) | 27.64 (5.83) | 10.46 (8.31) | 70.12 (4.58) |
| The method of Irshad et al. | 66.35 (9.75) | 33.65 (9.75) | 9.11 (9.27) | 68.48 (5.20) |
| The method of Al-Kofahi et al. | 76.26 (4.90) | 23.74 (4.90) | 15.03 (10.91) | 67.28 (5.92) |
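The pixel-level measures used for the TCGA KIRC evaluation can be sketched on binary masks as follows. This is a minimal illustration, assuming that the segmented and ground-truth nuclei are given as boolean arrays of equal shape; the function names are hypothetical. The FPR is left out because its exact definition is not stated in the text. The last function encodes the matching rule described for the FARSIGHT evaluation (at least half of the segmented area overlaps the annotated nucleus).

```python
import numpy as np

def dice(seg, gt):
    """Dice coefficient 2|S and G| / (|S| + |G|) of two boolean masks."""
    seg, gt = np.asarray(seg, dtype=bool), np.asarray(gt, dtype=bool)
    total = seg.sum() + gt.sum()
    return 2.0 * np.logical_and(seg, gt).sum() / total if total else 1.0

def tpr_fnr(seg, gt):
    """Pixel-based true positive and false negative rates (they sum to 1).

    Assumes gt contains at least one foreground pixel.
    """
    seg, gt = np.asarray(seg, dtype=bool), np.asarray(gt, dtype=bool)
    tpr = np.logical_and(seg, gt).sum() / gt.sum()
    return tpr, 1.0 - tpr

def is_matched(seg, gt):
    """True if at least half of the segmented area overlaps the annotation."""
    seg, gt = np.asarray(seg, dtype=bool), np.asarray(gt, dtype=bool)
    return np.logical_and(seg, gt).sum() >= 0.5 * seg.sum()

# Two 8-pixel masks on a 4 x 4 grid that share one row (4 pixels).
seg = np.zeros((4, 4), dtype=bool); seg[0:2, :] = True
gt = np.zeros((4, 4), dtype=bool); gt[1:3, :] = True
print(dice(seg, gt), tpr_fnr(seg, gt), is_matched(seg, gt))
```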

D. Computational Complexity

The average running time of our algorithm on the real histopathology database (5 FARSIGHT images and 40 TCGA KIRC images) was calculated and compared with those of the existing segmentation approaches. The ARGraphs-based method [16] takes the shortest time, 9 seconds, with mixed Matlab and C programming; MPAC [2] takes 1026 seconds using Matlab code; the method in [10] takes 83 seconds with C++ programming; and the approach in [11] takes 311 seconds with Matlab programming. Although our method, which takes 456 seconds in Matlab, is not the fastest among the five methods, it produces more accurate segmentation and complete contour inference.

IV. CONCLUSION

In this paper, a novel algorithm has been proposed for automatic segmentation of cell nuclei in histopathology images. The proposed method applies a distance-map-constrained multiple LoG filter for initial segmentation and then refines the segmentation by searching for optimized split point pairs. The refined segmentation incorporates intensity information and concavity information, and thus avoids oversegmentation and undersegmentation problems. After the refinement process, nuclear-boundary-to-marker evidences are adopted to delineate the contour of each nucleus, and a periodic B-spline curve with the minimum description length (MDL) is introduced to infer that contour. The final results provide smooth and complete nuclear contour representations, solving the issues caused by nuclei clumps, the jaggedness of nuclear boundary evidences, and the various geometric shapes and sizes of nuclei in a histopathological specimen. Hence, the proposed MDL-constrained periodic B-spline model achieves an optimal compromise between the conciseness of the spline and its fitting quality. Evaluated on a synthetic database and 45 real microscopy images containing hundreds to thousands of cell nuclei, the proposed method shows superior performance compared with several state-of-the-art methods.
Moreover, our method is more robust to noise when separating clustered nuclei and provides a complete contour for each nucleus. Hence, we obtain higher accuracy in nuclear delineation without any prior information.

Histopathology images are the principal interest of this paper. However, the method could also be applied to segment other images, such as images of overlapping cervical cells. Moreover, the proposed method can be used to extract various intrinsic features of each nucleus, i.e., measurements quantifying aspects of the morphology and/or appearance of objects in a single image. These intrinsic features will play an important role in further morphology analysis tasks such as classification and detection. Thus, the automatic extraction of intrinsic measures for each object and the subsequent morphology analysis could be a future research direction.

REFERENCES

[1] T. F. Chan and L. A. Vese, “Active contours without edges,” IEEE Trans. Image Process., vol. 10, no. 2, pp. 266–277, Feb. 2001.
[2] L. A. Vese and T. F. Chan, “A multiphase level set framework for image segmentation using the Mumford and Shah model,” Int. J. Comput. Vis., vol. 50, no. 3, pp. 271–293, 2002.
[3] S. Ali and A. Madabhushi, “An integrated region-, boundary-, shape-based active contour for multiple object overlap resolution in histological imagery,” IEEE Trans. Med. Imag., vol. 31, no. 7, pp. 1448–1460, 2012.
[4] C. Zimmer and J. C. Olivo-Marin, “Coupled parametric active contours,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 11, pp. 1838–1842, Nov. 2005.
[5] J. Cheng and J. C. Rajapakse, “Segmentation of clustered nuclei with shape markers and marking function,” IEEE Trans. Biomed. Eng., vol. 56, no. 3, pp. 741–748, Mar. 2009.
[6] C. Jung, C. Kim, S. W. Chae, and S. Oh, “Unsupervised segmentation of overlapped nuclei using Bayesian classification,” IEEE Trans. Biomed. Eng., vol. 57, no. 12, pp. 2825–2832, Dec. 2010.
[7] C. Jung and C. Kim, “Segmenting clustered nuclei using h-minima transform-based marker extraction and contour parameterization,” IEEE Trans. Biomed. Eng., vol. 57, no. 10, pp. 2600–2604, Oct. 2010.
[8] T. Lindeberg, “Feature detection with automatic scale selection,” Int. J. Comput. Vis., vol. 30, no. 2, pp. 79–116, Nov. 1998.
[9] J. Y. Byun, M. R. Verardo, B. Sumengen, G. P. Lewis, B. S. Manjunath, and S. K. Fisher, “Automated tool for the detection of cell nuclei in digital microscopic images: Application to retinal images,” Mol. Vis., vol. 12, no. 105–107, pp. 949–960, Aug. 2006.
[10] Y. Al-Kofahi, W. Lassoued, W. Lee, and B. Roysam, “Improved automatic detection and segmentation of cell nuclei in histopathology images,” IEEE Trans. Biomed. Eng., vol. 57, no. 4, pp. 841–852, Apr. 2010.
[11] C. Park, J. Z. Huang, J. X. Ji, and Y. Ding, “Segmentation, inference, and classification of partially overlapping nanoparticles,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 3, pp. 669–681, Mar. 2013.
[12] S. Kumar, S. H. Ong, S. Ranganath, T. C. Ong, and F. T. Chew, “A rule-based approach for robust clump splitting,” Pattern Recognit., vol. 39, pp. 1088–1098, 2006.
[13] M. Farhan, O. Yli-Harja, and A. Niemisto, “A novel method for splitting clumps of convex objects incorporating image intensity and using rectangular window-based concavity point-pair search,” Pattern Recognit., vol. 46, pp. 741–751, 2013.
[14] S. Kothari, Q. Chaudry, and M. D. Wang, “Automated cell counting and cluster segmentation using concavity detection and ellipse fitting technique,” in Proc. IEEE Int. Symp. Biomed. Imag.: From Nano to Macro, 2009, pp. 795–798.
[15] Q. Wen, H. Chang, and B. Parvin, “A Delaunay triangulation approach for segmenting clumps of nuclei,” in Proc. IEEE Int. Symp. Biomed. Imag.: From Nano to Macro, 2009, pp. 9–12.
[16] S. Arslan, T. Ersahin, R. Cetin-Atalay, and C. Gunduz-Demir, “Attributed relational graphs for cell nucleus segmentation in fluorescence microscopy images,” IEEE Trans. Med. Imag., vol. 32, no. 6, pp. 1121–1131, 2013.
[17] Q. Yang and B. Parvin, “Perceptual organization of radial symmetries,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2004, pp. 320–325.
[18] B. Parvin, Q. Yang, J. Han, H. Chang, B. Rydberg, and M. H. Barcellos-Hoff, “Iterative voting for inference of structural saliency and characterization of subcellular events,” IEEE Trans. Image Process., vol. 16, no. 3, pp. 615–623, Mar. 2007.
[19] H. Chang, Q. Yang, and B. Parvin, “Segmentation of heterogeneous blob objects through voting and level set formulation,” Pattern Recognit. Lett., vol. 28, no. 13, pp. 1781–1787, Oct. 1, 2007.
[20] Y. Boykov and G. Funka-Lea, “Graph cuts and efficient N-D image segmentation,” Int. J. Comput. Vis., vol. 70, no. 2, pp. 109–131, Nov. 2006.
[21] Y. Boykov and V. Kolmogorov, “An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 9, pp. 1124–1137, Sep. 2004.
[22] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 11, pp. 1222–1239, Nov. 2001.
[23] J. Gill, H. Breu, D. Kirkpatrick, and M. Werman, “Linear time Euclidean distance transform algorithms,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 17, no. 5, pp. 529–533, May 1995.
[24] X. W. Wu, Y. D. Chen, B. R. Brooks, and Y. A. Su, “The local maximum clustering method and its application in microarray gene expression data analysis,” EURASIP J. Appl. Signal Process., vol. 2004, no. 1, pp. 53–63, Jan. 1, 2004.
[25] A. Rosenfeld, “Measuring the sizes of concavities,” Pattern Recognit. Lett., vol. 3, no. 1, pp. 71–75, 1985.
[26] R. Schneider, Convex Bodies: The Brunn-Minkowski Theory. Cambridge Univ. Press, 1993.
[27] J. Rissanen, “Modeling by shortest data description,” Automatica, vol. 14, no. 5, pp. 465–471, Sep. 1978.
[28] P. Grunwald, J. Myung, and M. Pitt, “Advances in minimum description length: Theory and applications,” Cambridge Univ. Press, 2004.
[29] M. H. Hansen and B. Yu, “Model selection and the principle of minimum description length,” J. Amer. Stat. Assoc., vol. 96, no. 454, pp. 746–774, 2001.
[30] T. J. Cham and R. Cipolla, “Automated B-spline curve representation incorporating MDL and error-minimizing control point insertion strategies,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 1, pp. 49–53, Jan. 1999.
[31] T. C. M. Lee, “An introduction to coding theory and the two-part minimum description length principle,” Int. Statist. Rev., vol. 69, no. 2, pp. 169–183, 2001.
[32] M. Figueiredo, J. Leitao, and A. Jain, “Unsupervised contour representation and estimation using B-splines and a minimum description length criterion,” IEEE Trans. Image Process., vol. 9, no. 6, pp. 1075–1086, 2000.
[33] D. Lolive, N. Barbot, and O. Boeffard, “Melodic contour estimation with B-spline models using a MDL criterion,” in Proc. SPECOM, 2006, pp. 333–338.
[34] D. Lolive, N. Barbot, and O. Boeffard, “B-spline model order selection with optimal MDL criterion applied to speech fundamental frequency stylization,” IEEE J. Sel. Topics Signal Process., vol. 4, no. 3, Jun. 2010.
[35] X. Yuan, J. T. Trachtenberg, S. M. Potter, and B. Roysam, “MDL constrained 3-D grayscale skeletonization algorithm for automated extraction of dendrites and spines from fluorescence confocal images,” Neuroinform., vol. 7, no. 4, pp. 213–232, 2009.
[36] M. Flickner, J. Hafner, E. Rodriguez, and J. Sanz, “Periodic quasi-orthogonal spline basis and applications to least squares over curve fitting of digital images,” IEEE Trans. Image Process., vol. 5, pp. 71–88, Jan. 1996.
[37] A. Lehmussola, P. Ruusuvuori, J. Selinummi, H. Huttunen, and O. Yli-Harja, “Computational framework for simulating fluorescence microscope images with cell populations,” IEEE Trans. Med. Imag., vol. 26, no. 7, pp. 1010–1016, Jul. 2007.
[38] J. K. Udupa, V. R. LeBlanc, Y. Zhuge, C. Imielinska, H. Schmidt, L. M. Currie, B. E. Hirsch, and J. Woodburn, “A framework for evaluating image segmentation algorithms,” Comput. Med. Imag. Graph., pp. 75–87, 2006.
[39] H. Irshad, A. Veillard, L. Roux, and D. Racoceanu, “Methods for nuclei detection, segmentation, and classification in digital histopathology: A review-Current status and future potential,” IEEE Rev. Biomed. Eng., vol. 7, pp. 97–114, 2014.
[40] H. Irshad, L. Montaser-Kouhsari, G. Waltz, O. Bucur, J. A. Nowak, F. Dong, N. W. Knoblauch, and A. H. Beck, “Crowdsourcing image annotation for nucleus detection and segmentation in computational pathology: Evaluating experts, automated methods, and the crowd,” in Pacific Symposium on Biocomputing, 2014, pp. 294–305.
[41] X. Bai, C. Sun, and F. Zhou, “Splitting touching cells based on concave points and ellipse fitting,” Pattern Recognit., vol. 42, no. 11, pp. 2434–2446, 2009.
[42] F. Cloppet and A. Boucher, “Segmentation of complex nucleus configurations in biological images,” Pattern Recognit. Lett., vol. 31, no. 8, pp. 755–761, 2010.
[43] M. Plissiti and C. Nikou, “Overlapping cell nuclei segmentation using a spatially adaptive active physical model,” IEEE Trans. Image Process., vol. 21, no. 11, pp. 4568–4580, 2012.


Jie Song received the B.S. degree in computer science and technology from Nanjing University, Nanjing, China, in 2009 and the M.E. degree in computer science and engineering from Yunnan University, Yunnan, China, in 2013. He is currently pursuing the Ph.D. degree in computer science and technology at Nanjing University of Science and Technology, Nanjing, China. His research interests include biomedical image processing, machine learning, and pattern recognition.

Zhichao Lian received the Bachelor's and Master's degrees in computer science from Jilin University, Changchun, China, in 2005 and 2008, respectively. He received the Ph.D. degree from Nanyang Technological University in 2013. From 2012 to 2014, he was a Postdoctoral Associate in the Department of Statistics at Yale University. Dr. Lian is currently an Associate Professor with the School of Computer Science and Engineering, Nanjing University of Science and Technology, China. His research areas include biomedical image processing, pattern recognition, and neuroimaging.

Liang Xiao (M’11) was born in Hunan, China, in 1976. He received the B.S. degree in applied mathematics and the Ph.D. degree in computer science from the Nanjing University of Science and Technology (NJUST), Nanjing, China, in 1999 and 2004, respectively. From 2006 to 2008, he was a Postdoctoral Research Fellow with the Pattern Recognition Laboratory, NJUST. From 2009 to 2010, he was a Postdoctoral Fellow with Rensselaer Polytechnic Institute, Troy, NY, USA. Dr. Xiao is currently a Professor with the School of Computer Science and Engineering, NJUST. Since 2013, he has served as the deputy director of the Key Laboratory of Spectral Imaging Intelligent Perception of Jiangsu Province, China. Since 2014, he has served as the second director of the Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of the Ministry of Education. His fields of interest include biomedical image processing, signal processing, image modeling and computer vision, machine learning, and pattern recognition.
