Nonlinear Multiscale Graph Theory based Segmentation of Color Images

I. Vanhamel and H. Sahli
Vrije Universiteit Brussel, ETRO-IRIS
Pleinlaan 2, 1040 Brussels, Belgium
{iuvanham,hsahli}@etro.vub.ac.be

I. Pratikakis
Institute of Informatics and Telecommunications, NCSR ‘Demokritos’
15310 Athens, Greece
[email protected]
Abstract
This paper addresses image segmentation within the framework of nonlinear multiscale watersheds combined with graph theory based techniques. First, a graph is created that decomposes the image in scale and space using the concept of multiscale watersheds. Next, the obtained graph is partitioned using recursive graph cuts in a coarse-to-fine manner. In this way, scale and feature measures can be combined flexibly: the feature set used to measure the dissimilarities may change as we progress in scale. We employ the Earth Mover’s Distance on a feature set that combines color, scale and contrast features to measure the dissimilarity between the nodes in the graph. Experimental results demonstrate the efficiency of the proposed method for natural scene images.

1. Introduction
Image segmentation is an important task in computer vision that aims at partitioning the image into physically meaningful regions. Several approaches to the segmentation problem have adopted tools from mathematical morphology, energy minimization, partial differential equations, graph theory, or a combination of these. A graph theory based image segmentation consists of two main steps: (i) the graph creation/valuation and (ii) the graph partitioning. Usually these algorithms are applied on the pixel graph, where the nodes correspond to the pixels and the arcs to their connections. The weight associated with an arc expresses the (dis)similarity of the pair of nodes it connects. Early graph partitioning relied on thresholds and local measures, followed by methods based on concepts such as the minimum spanning tree (MST) [6, 16]. Recent methods include graph cuts such as the minimum cut [18], the normalized cut [14], and the average cut [4]. Graph partitioning in the region adjacency graph (RAG) of an oversegmented image has been advocated in [14] to reduce the computational burden and to resolve particular problems concerning the approximation of the cut algorithm. Multiscale graph cuts are proposed in [1], where an algebraic multigrid method is adopted for solving the normalized cut criterion efficiently. The method employs recursive graph coarsening to produce an irregular pyramid that encodes region-based grouping cues. Cour et al. [3] apply normalized cuts to a multiscale graph decomposition, and in [8] a multiscale clustering followed by a bottom-up graph partitioning approach was presented.

This paper investigates the application of graph cuts to a graph that corresponds to the hierarchy of partitions obtained by a multiscale segmentation [15]. The main difference with graph partitioning applied to the pixel graph and the RAG is that this graph contains two very different types of arcs: the intra-scale and the inter-scale arcs. The former are similar to the arcs in the RAG; the latter connect the nodes of successive scales and are subject to a causality restriction. Figure 1.a shows the intra-scale arcs with thick lines and the inter-scale arcs with thin ones.

This paper is organized as follows: Section 2 covers the creation of the multiscale RAG, where we briefly summarize the methodologies and corresponding parameters. The valuation of the arcs in the RAG is covered in Section 3. Section 4 introduces the multiscale graph cut algorithm. Section 5 gives experimental results and conclusions.
2. Multiscale region adjacency graph
The creation of the multiscale region adjacency graph (MS-RAG) consists of three steps: (i) Scale-space stack generation by vector-valued nonlinear diffusion filtering [15]. (ii) Linking: at each scale the vector-valued gradient is estimated using the gradient-tensor method. At the localization scale, the watershed transformation is performed to identify the position of all the contours in the image. At the higher scales, the duality between the regional minima
0-7695-2521-0/06/$20.00 (c) 2006 IEEE
of the gradient and the catchment basins of the watershed is exploited to make a robust region-based parent-child linking scheme. These two steps form an iterative process which is completed when there is only one segment left or when a predefined criterion is satisfied. The resulting multiscale RAG (MS-RAG), depicted in Figure 1.a, is the combination of a multiscale tree and the corresponding RAGs at each scale.

Figure 1. (a) Multiscale region adjacency graph and (b) Linkage list.

Scale-Space Generation. The scale-space stack is generated using vector-valued anisotropic diffusion [15]. It is based upon the regularized Perona and Malik anisotropic diffusion scheme [11] and it uses a system of coupled diffusion equations. The method adopts the Additive Operator Splitting (AOS) numerical scheme [17], which is computationally more efficient than most other schemes. In its PDE formulation, it is given by:

$$ u_t^{(c)} = \mathrm{div}\left( \left[ 1 + \frac{|\nabla_\sigma u|^2}{K^2} \right]^{-1} \frac{\nabla u^{(c)}}{|\nabla u^{(c)}|} \right) \tag{1} $$

where $u_t^{(c)}$ is the scale-space image of the $c$-th image band, $|\nabla_\sigma u|$ is the regularized gradient and $K$ is the contrast parameter that regulates the diffusion. Both $K$ and $\sigma$ are estimated using robust statistics on the gradient image profile. The diffusion process yields a discrete set of scales $\{u_{s_0}, u_{s_1}, \ldots, u_{s_{end}}\}$ which are selected from a sampled time period of the diffusion times ($t$ in Eq. 1) [9].

Linking Scheme. The linking scheme aims to track the regional minima in the gradient through the scale-space. The linking process is applied using the approach proposed in [12]. The linking of the minima for successive scales is applied by using the proximity criterion [7]. This criterion is limited to projected minima of scale quantization level $s_i$ inside the same geodesic influence zone $iz_A^{s_{i+1}}(B_q)$ of a connected component $B_q$ of $B$ in $A$ at scale quantization level $s_{i+1}$:

$$ iz_A^{s_{i+1}}(B_q) = \{ p \in A, \ \forall j \in [1, k] \setminus \{q\}, \ d_A(p, B_q) < d_A(p, B_j) \} \tag{2} $$

Any regional minimum of the set $\{m^{s_i}\}$ that is spatially projected on the geodesic influence zone $iz_A^{s_{i+1}}(B_q)$ at scale quantization level $s_{i+1}$ will be linked with the regional minimum $m_q^{s_{i+1}}$. The projected minimum of the set $\{m^{s_i}\} \in iz_A^{s_{i+1}}(B_q)$ which is the closest to the minimum $m_q^{s_{i+1}}$ is considered as the father. The remaining projected minima onto the same influence zone are considered annihilated. Closeness is defined with respect to the topographic distance, which is a natural distance measure following the steepest gradient path inside the catchment basin. At the end of the linking step, for each couple of neighboring segments $(S_i, S_j)$ that share a common border at the localization scale, a linkage list $\Lambda(S_i, S_j)$ is constructed. The linking process is illustrated in Figure 1.b.

3. Graph valuation
The next step is to valuate the intra-scale arcs in the MS-RAG, which requires the attribution of a feature set to the nodes of the localization scale. Let $G = (V, E)$ represent the MS-RAG, which consists of a set of linked RAGs denoted by $G^{(s_i)} = (V^{(s_i)}, E^{(s_i)})$. The set of links between the nodes of different RAGs, i.e. the inter-scale arcs $E^{(\Lambda)}$, is given by the linkage list $\Lambda(S_i, S_j)$. An inter-scale arc between the adjacent nodes $S_x^{(s_i)}$ and $S_y^{(s_{i+1})}$ is denoted by $a_{(x,y)}^{(\Lambda, s_i)}$, and the nodes and intra-scale arcs at each scale are given by $S^{(s_i)} \in V^{(s_i)}$ and $a^{(s_i)} \in E^{(s_i)}$. The nodes at the localization scale are attributed with the mean color vector ($m$) of the segment they represent and its area weighted color contrast ($m_{acw}$):

$$ m(S_x^{(s_0)}) = \frac{\sum_{p \in S_x^{(s_0)}} u(p, s_0)}{\mathrm{Card}(S_x^{(s_0)})}, \qquad m_{acw}(S_x^{(s_0)}) = \frac{\sum_{S_y^{(s_0)} \in N(S_x^{(s_0)})} \left| m(S_x^{(s_0)}) - m(S_y^{(s_0)}) \right|}{\mathrm{Card}(N(S_x^{(s_0)}))} \tag{3} $$
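As an illustration, the node attributes of Eq. 3 can be computed along the following lines. This is a minimal Python sketch under stated assumptions, not the implementation used in this work: the segment names, the pixel lists and the `neighbors` map are hypothetical, and the absolute difference between mean colors is taken per color band, which is an assumption about how the contrast vector is formed.

```python
from statistics import fmean


def mean_color(pixels):
    """Mean color vector m(S) of a segment given as a list of 3-band pixel values."""
    return tuple(fmean(band) for band in zip(*pixels))


def color_contrast(x, means, neighbors):
    """Per-band mean absolute difference between segment x and its neighbors.

    Assumed form of m_acw: the sum over the neighborhood N(S_x) divided by Card(N(S_x)).
    """
    nbrs = neighbors[x]
    return tuple(
        sum(abs(means[x][k] - means[y][k]) for y in nbrs) / len(nbrs)
        for k in range(len(means[x]))
    )


# Hypothetical localization-scale segments (pixel colors) and their adjacency.
segments = {
    "A": [(10, 20, 30), (14, 20, 34)],
    "B": [(50, 60, 70)],
    "C": [(12, 0, 32)],
}
neighbors = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}

means = {name: mean_color(pixels) for name, pixels in segments.items()}
contrast = {name: color_contrast(name, means, neighbors) for name in segments}
```

For the toy data above, segment `A` has mean color `(12.0, 20.0, 32.0)` and contrast `(19.0, 30.0, 19.0)`.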
In Eq. 3, $N(S)$ is the local neighborhood of the segment $S$ and Card is the cardinality of a set.

Valuation by Down-projection. The intra-scale arcs at the localization scale are attributed with the dynamics of contours in scale space [12]. First the dynamics of contours [10] at each scale are estimated. These measures are down-projected to valuate the gradient watersheds at the localization scale $s_0$. For each segment couple $(S_i^{\Lambda(S_i,S_j)_m}, S_j^{\Lambda(S_i,S_j)_m})$ appearing at the branch $m$ (scale quantization level $s_m$) of the linkage list $\Lambda(S_i, S_j)$, we compute the dynamics of contours $DC^m_{\Lambda(S_i,S_j)_m}$. It expresses how strongly contrasted the adjacent regions $(S_i, S_j)$ are at the scale quantization level $s_m$. The dynamics of contours at each scale are normalized according to the maximum dynamic at that scale, and finally the dynamics of contours in scale-space (DCS) for the adjacent region couple $(S_i, S_j)$ is defined as the sum of all normalized valuations during the evolution in scale-space:

$$ DCS(S_i, S_j) = \sum_{m=0}^{N-1} \frac{DC^m_{\Lambda(S_i,S_j)_m}}{\max_{S_i, S_j \in s_m} DC^m_{\Lambda(S_i,S_j)_m}} \tag{4} $$
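The normalization and summation of Eq. 4 can be sketched as follows. This is only an illustrative Python sketch: it assumes a hypothetical data layout in which each scale is a dictionary mapping a region couple to its dynamics of contours, that all dynamics are positive, and that a couple no longer appears once its contour has been annihilated.

```python
def dcs(dynamics_per_scale, couple):
    """Sum of the per-scale normalized dynamics of contours for one region couple."""
    total = 0.0
    for scale in dynamics_per_scale:      # one dictionary per branch m = 0, 1, ...
        if couple not in scale:           # contour annihilated: stop summing
            break
        # Normalize by the maximum dynamic found at this scale.
        total += scale[couple] / max(scale.values())
    return total


# Hypothetical dynamics of contours at three successive scales.
dynamics_per_scale = [
    {("S1", "S2"): 2.0, ("S2", "S3"): 8.0},
    {("S1", "S2"): 3.0, ("S2", "S3"): 6.0},
    {("S2", "S3"): 5.0},
]

weak = dcs(dynamics_per_scale, ("S1", "S2"))
strong = dcs(dynamics_per_scale, ("S2", "S3"))
```

With these toy values the contour between `S1` and `S2` survives two scales and obtains 2/8 + 3/6 = 0.75, whereas the contour between `S2` and `S3` is the strongest at every scale and obtains 3.0.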
In Eq. 4, $N$ denotes the branch of the linkage list $\Lambda(S_i, S_j)$ where the contour formed by the region couple $(S_i, S_j)$ is annihilated.

Multiscale dissimilarity weight. The dissimilarity between two adjacent nodes at a given scale is given by the Earth Mover’s Distance (EMD), which is a flexible similarity measure between multidimensional distributions [13]. We measure the amount of work needed at a certain scale to transform two disjoint segments into their union given their scale-space evolution. Let $a^{(s_i)} = (S_x^{(s_i)}, S_y^{(s_i)})$ be the intra-scale arc which we want to valuate. Let $Q = \{(q_1, w_{q_1}), (q_2, w_{q_2}), \ldots, (q_M, w_{q_M})\}$ and $T = \{(t_1, w_{t_1}), (t_2, w_{t_2}), \ldots, (t_N, w_{t_N})\}$ be the sets of the localization scale’s segments that compose $S_x^{(s_i)}$ and $S_y^{(s_i)}$ respectively, where $q_i$, $t_i$ denote the segment’s feature set and $w_{q_i}$, $w_{t_i}$ denote the corresponding weight of the region. Also, let $d(q_i, t_j)$ be the ground distance between $q_i$ and $t_j$. The EMD between $Q$ and $T$ is then:

$$ EMD(Q, T) = \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} \alpha_{ij} \, d(q_i, t_j)}{\sum_{i=1}^{M} \sum_{j=1}^{N} \alpha_{ij}} \tag{5} $$

where $\alpha_{ij}$ is the optimal admissible flow from $q_i$ to $t_j$ that minimizes the numerator of Eq. 5 subject to the following constraints:

$$ \sum_{j=1}^{N} \alpha_{ij} \le w_{q_i}, \qquad \sum_{i=1}^{M} \alpha_{ij} \le w_{t_j}, \qquad \sum_{i=1}^{M} \sum_{j=1}^{N} \alpha_{ij} = \min\left( \sum_{i=1}^{M} w_{q_i}, \ \sum_{j=1}^{N} w_{t_j} \right) \tag{6} $$

In the proposed approach, we define the ground distance as follows:

$$ d(q_i, t_j) = \sum_{k=1}^{3} \left[ \left( \Delta m^{(k)}(q_i, t_j) \right)^2 + \beta \left( \Delta m_{acw}^{(k)}(q_i, t_j) \right)^2 \right] \tag{7} $$

where $\Delta$ denotes the difference between the features of $q_i$ and $t_j$ in the $k$-th color band and $\beta$ is a weighting parameter that enhances the importance of the corresponding features. The weighting factors $w_{q_i}$ and $w_{t_j}$ in Eq. 6 emphasize the importance of the subnodes. In our approach, we define an enhanced weighting factor which combines area with scale and global contrast, which can all be expressed by the valuation of dynamics of contours in scale-space. More precisely, the weighting factor is computed as follows:

$$ w_{q_i} = \left[ w_{DCS_i} A(R_i) \right] \left[ \sum_{j=1}^{\mathrm{card}(R)} w_{DCS_j} A(R_j) \right]^{-1} \tag{8} $$

where $w_{DCS_i}$ is the area weighted contrast produced by the dynamics of contours in scale-space.

4. Graph partitioning
MS-RAG partitioning is achieved by recursively applying the normalized cut criterion [14] and subgraph down-projection. The process is applied iteratively in a top-down fashion: first the cut criterion is applied to the RAG of the coarsest scale. This results in a set of components, each consisting of one or more spatially connected nodes, i.e. a subgraph. Each of these subgraphs is down-projected by following the inter-scale arcs that are connected to its nodes. The result of the down-projection is a refinement of the subgraph with nodes and arcs from the finer scale. The cut criterion is applied to each of these down-projected subgraphs individually, and the obtained subgraphs are down-projected and used as input for the cutting at the next scale. This process continues until the localization scale is reached or until a stopping criterion is satisfied, which in our case is an indicator of the number of segments the partitioning should contain.

Subgraph down-projection. The down-projection of the graph $G_1^{(s_i)} = (V_1^{(s_i)}, E_1^{(s_i)})$ that corresponds to a component $C_1$ resulting from the cut criterion entails two steps. In the first step all the nodes $S^{(s_i)} \in C_1$ are projected to the finer scale $s_{i-1}$ by following the appropriate inter-scale arcs: $V_1^{(s_{i-1})} = \{ S_x^{(s_{i-1})} \mid \exists \, a_{(x,y)}^{(\Lambda, s_{i-1})} \in E : S_y^{(s_i)} \in V_1^{(s_i)} \}$. The arc set of the down-projected subgraph consists of all the intra-scale arcs between the nodes in $V_1^{(s_{i-1})}$. For example, in Figure 1, let us assume we have cut the node $1'''$ into 2 components: $V_1 = \{1'', 3'', 4''\}$ and $V_2 = \{2''\}$. The down-projection of $G_1 = (V_1, E_1)$ is given by: $V_1 = \{1'', 3'', 4''\} \rightarrow \tilde{V}_1 = \{1', 3', 4', 5', 6'\}$ and $E_1 = \{(1'', 3''), (1'', 4''), (3'', 4'')\} \rightarrow \tilde{E}_1 = \{(1', 3'), (1', 4'), (1', 6'), \ldots\}$. In the second step the dissimilarity measure for each arc in the down-projected subgraph is estimated using the measure described in Section 3.

Recursive normalized cuts. The partitioning of a subgraph is achieved by applying recursive cutting until the stopping criterion relative to the subgraph and current scale is met. Let $G' = (V', E')$ represent a down-projected subgraph.
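As an illustration of how such a down-projected subgraph is obtained, the first step of the down-projection can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name, the representation of inter-scale arcs as `(child, parent)` pairs and the node labels are hypothetical, and the toy graph is not the graph of Figure 1. The second step, the re-estimation of the dissimilarity on each arc, is not shown.

```python
def down_project(component, inter_scale_arcs, intra_scale_arcs):
    """Refine a component of scale s_i with the nodes and arcs of scale s_(i-1)."""
    # Follow the inter-scale arcs (child, parent) whose parent lies in the component.
    nodes = {child for child, parent in inter_scale_arcs if parent in component}
    # Keep every finer-scale intra-scale arc that joins two projected nodes.
    arcs = {(a, b) for a, b in intra_scale_arcs if a in nodes and b in nodes}
    return nodes, arcs


# Hypothetical two-scale example: coarse nodes P1..P3, fine nodes p1..p5.
inter_scale_arcs = [("p1", "P1"), ("p2", "P1"), ("p3", "P2"), ("p4", "P3"), ("p5", "P3")]
intra_scale_arcs = [("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p4", "p5"), ("p1", "p5")]

nodes, arcs = down_project({"P1", "P3"}, inter_scale_arcs, intra_scale_arcs)
```

For the component `{P1, P3}` the projected node set is `{p1, p2, p4, p5}`, and the arcs touching `p3` are dropped because `p3` descends from `P2`, which lies outside the component.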
The optimal cut of $G'$ corresponds to the partitioning of its node set $V'$ that minimizes:

$$ NCut = \sum_{i} \frac{\sum_{S_j \in V_i, \, S_l \in V' - V_i} D(S_j, S_l)}{\sum_{S_j \in V_i, \, S_l \in V'} D(S_j, S_l)} \tag{9} $$

where $V_i$ corresponds to the node set of an obtained subgraph and $D$ is the similarity measure between the nodes, calculated as follows [14]:

$$ D(S_i, S_l) = \exp\left( -\alpha \, \frac{w(S_i, S_l)^2}{W^2} \right) \tag{10} $$

where $w(S_i, S_l) = EMD(S_i, S_l)$, $W$ is the dynamic range of the weights in the subgraph and $\alpha = 0.1$. The cutting of the subgraph is terminated when the mean of the weights associated with the active arcs (minimized cut-value) exceeds a percentage of the total amount of weights present in the full graph of the image at the current scale. An additional constraint regarding the minimum segment size and connectivity is added as well. Cuts yielding components that are too small or consist of disconnected subcomponents are discarded.

5. Results and discussion
We have compared our method with three other state-of-the-art segmentation algorithms: (i) JSEG [5], (ii) E-M algorithm (Blobworld) [2], and (iii) a graph theory based segmentation [6]. The segmentations obtained by applying the examined algorithms to a set of natural scene images are shown in Figure 2. For the proposed approach, the stopping criterion is satisfied when the total number of subgraphs at a given scale is more than 20. Hence we aim to retrieve the 20 most dominant segments in the image. A size restriction is added to the cutting criterion as well: a cut yielding a component with fewer than 100 pixels is rejected. The proposed approach produces segmentations of high quality. For all images in Figure 2, the set of segments is reasonably compact, with only a slight over-segmentation in the mountain image and a bit of under-segmentation in the bus image. Compared with the other methods, the proposed approach has overall less over-segmentation and a very good boundary location. The partitioning of the graph is rather fast since the subgraphs generally contain a small number of nodes.

In this work, we proposed a segmentation method that combines the strength of multiscale watersheds and graph cuts. The partitioning of the multiscale graph uses a top-down strategy that allows the feature set to be modified within scale-space.

Figure 2. Segmentation results: (a) proposed method, (b) JSEG [5], (c) E-M algorithm (Blobworld) [2], and (d) graph theory based segmentation [6].

References
[1] E. Sharon, A. Brandt, and R. Basri. Fast multiscale image segmentation. In CVPR, pages I:70–77, 2000.
[2] C. Carson, S. Belongie, H. Greenspan, and J. Malik. Blobworld: Image segmentation using e-m and its application to image querying. IEEE Trans. PAMI, 24(8):1026–1038, 2002.
[3] T. Cour, F. Benezit, and J. Shi. Spectral segmentation with multiscale graph decomposition. In CVPR, volume 2, pages 1124–1131, 2005.
[4] I. Cox, S. Rao, and Y. Zhong. Ratio regions: A technique for image segmentation. In Proc. of the 13th Int. Conf. on Pattern Recognition, 1996.
[5] Y. Deng and B. Manjunath. Unsupervised segmentation of color-texture regions in images and video. IEEE Trans. PAMI, 23(8):800–810, 2001.
[6] P. Felzenszwalb and D. Huttenlocher. Efficient graph-based image segmentation. Int. Journal of Computer Vision, 59(2):167–181, 2004.
[7] J. Koenderink. The structure of images. Biol. Cybern., 50:363–370, 1984.
[8] S. Makrogiannis, S. Economou, G. Fotopoulus, and N. Boubakis. Segmentation of color images using multiscale clustering and graph theoretic region synthesis. IEEE Trans. Systems, Man and Cybern., Part A, 35(2):224–238, 2005.
[9] C. Mihai, I. Vanhamel, A. Katartzis, and H. Sahli. Scale selection methods for the compact scale-space representation of images. Technical Report TR0095, V.U.B., 2005.
[10] L. Najman and M. Schmitt. Geodesic saliency of watershed contours and hierarchical segmentation. IEEE Trans. on PAMI, 18(12):1163–1173, 1996.
[11] P. Perona and J. Malik. Scale-space and edge detection using anisotropic diffusion. IEEE Trans. on PAMI, 12(7):629–639, 1990.
[12] I. Pratikakis, H. Sahli, and J. Cornelis. Low level image partitioning guided by the gradient watershed hierarchy. Signal Processing, 75(2):173–195, 1998.
[13] Y. Rubner and C. Tomasi. Perceptual metrics for image database navigation. Kluwer Academic Publishers, Boston, 2000.
[14] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. on PAMI, 22(8):888–905, 2000.
[15] I. Vanhamel, I. Pratikakis, and H. Sahli. Multi-scale gradient watersheds of color images. IEEE Trans. on IP, 12(6):617–626, 2003.
[16] T. Vlachos and A. Constantinides. Graph-theoretical approach to color picture segmentation and contour classification. In IEE Proceedings, volume 140, Part I, N:1, pages 36–45, 1993.
[17] J. Weickert, B. ter Haar Romeny, and M. Viergever. Efficient and reliable schemes for nonlinear diffusion filtering. IEEE Trans. on IP, 7(3):398–410, 1998.
[18] Z. Wu and R. Leahy. An optimal graph theoretic approach to data clustering: Theory and its application to image segmentation. IEEE Trans. on PAMI, 15(11):1101–1113, 1993.