Segmentation Ensemble via Kernels

Sandro Vega-Pons
Advanced Technologies Application Center (CENATAV), Havana, Cuba
Email: [email protected]

Xiaoyi Jiang
Department of Mathematics and Computer Science, University of Münster, Germany
Email: [email protected]

José Ruiz-Shulcloper
Advanced Technologies Application Center (CENATAV), Havana, Cuba
Email: [email protected]
Abstract—Clustering ensemble is a promising technique for facing data clustering problems. Similarly, the combination of different segmentations into a consensus one can be a powerful tool for addressing image segmentation problems. Such segmentation ensemble algorithms should be able to deal with potentially large image sizes and should preserve the spatial relation among pixels in the image. In this paper, we formalize the segmentation ensemble problem and introduce a new method to solve it, based on the kernel clustering ensemble philosophy. We prove that the Rand index is a kernel function and use it as the similarity measure between segmentations in the proposed algorithm. The algorithm is experimentally evaluated on the Berkeley image database and compared to several state-of-the-art clustering ensemble algorithms. The achieved results confirm the accuracy of our proposal.

Index Terms—Consensus segmentation, clustering ensemble, Rand index, kernel function.
I. INTRODUCTION

Clustering ensemble has become a popular technique to deal with data clustering problems [1]. It is known that no clustering algorithm works effectively for every clustering problem. When different clustering algorithms are applied to the same dataset, different clustering results can be obtained. Instead of trying to find the "best" one, the idea of combining these individual results into a consensus has gained increasing interest in recent years. In practice, such a procedure can produce high-quality final clusterings.

A similar situation is found in the image segmentation context. There is no segmentation algorithm able to work correctly with all types of images. Different segmentation algorithms, or the same segmentation algorithm with different parameter initializations, can produce very different segmentations of the same image. Therefore, if there are several segmentations of an image, the idea of combining them following the philosophy of clustering ensemble algorithms seems appropriate.

A first approach could be to use clustering ensemble algorithms directly for segmentation combination problems. In other words, assuming that an image is a set of pixels and a segmentation is a partition of this set of pixels, any clustering ensemble algorithm could be applied to obtain a consensus segmentation. Indeed, some clustering ensemble algorithms (e.g., [2], [3]) have been evaluated on image segmentation problems; see [4] for a recent study of applying general clustering ensemble algorithms to image
segmentation combination problems. However, this way of facing the segmentation ensemble problem has two fundamental drawbacks:
• Images are normally composed of a large number of pixels. For example, a rather small image of 500 × 500 pixels would lead to a dataset of 250000 objects.
• We lose the spatial relation among pixels if we assume that an image is only a set of individual pixels. Besides, if this spatial relation is not taken into account, inappropriate segmentations could be considered as potential consensus results.

In this paper we first formalize the segmentation ensemble problem and then propose a segmentation combination algorithm based on the philosophy of clustering ensembles, which is able to overcome the above limitations.

The rest of this paper is organized as follows. Section II briefly introduces the general clustering ensemble problem and presents some clustering ensemble algorithms. Section III formalizes the segmentation ensemble problem. Section IV presents the proposed method and the proof that the Rand index is a kernel function; this proof allows the use of the Rand index as a similarity measure between segmentations in the proposed algorithm. In Section V several experiments are shown and discussed, and finally Section VI concludes the paper.

II. CLUSTERING ENSEMBLE

Clustering ensemble methods combine partitions of the same dataset into a final consensus clustering. Let X = {x_1, x_2, ..., x_n} be a set of objects and P = {P_1, P_2, ..., P_m} a clustering ensemble, where each P_j = {C_1^j, C_2^j, ..., C_{d_j}^j} is a partition of the set of objects X with d_j clusters, for all j = 1, ..., m. We denote by P_X the set of all possible partitions of X, and the consensus partition is represented by P*. The consensus partition P* is usually defined through the median partition problem:

$$ P^* = \arg\max_{P \in \mathcal{P}_X} \sum_{i=1}^{m} \Gamma(P, P_i) \quad (1) $$

where Γ is a similarity measure between partitions.¹

Over P_X a partial order relation ≼ ("nested in") can be defined: for all P, P′ ∈ P_X, P ≼ P′ if and only if, for every cluster C′ ∈ P′, there are clusters C_{i_1}, C_{i_2}, ..., C_{i_v} ∈ P such that C′ = ⋃_{j=1}^{v} C_{i_j}. In this way a lattice structure is associated to P_X (see the example in Fig. 1).

¹ This problem can be equivalently defined by minimizing the dissimilarity with all partitions in the clustering ensemble.
[Fig. 1. Hasse diagram (graphical representation) of the lattice associated to the set of partitions of the set X = {a, b, c, d}.]
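As a small Python illustration of the ≼ relation (our sketch, not part of the paper): the check below tests whether P ≼ P′ by verifying that every cluster of P lies inside some cluster of P′, which for partitions is equivalent to every cluster of P′ being a union of clusters of P.

```python
def nested_in(P, P_prime):
    """True iff P is nested in P_prime: every cluster of P_prime is a
    union of clusters of P (partitions given as lists of frozensets)."""
    return all(any(c <= c_prime for c_prime in P_prime) for c in P)

# Example: P = {a}{b}{c,d} is nested in P' = {a,b}{c,d},
# since {a,b} = {a} U {b} and {c,d} = {c,d}.
P = [frozenset({'a'}), frozenset({'b'}), frozenset({'c', 'd'})]
P2 = [frozenset({'a', 'b'}), frozenset({'c', 'd'})]
print(nested_in(P, P2))   # True
```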
In recent years, several clustering ensemble methods in which the consensus partition is defined through the median partition problem (1) have been proposed. Most of these methods are heuristics based on different mathematical and computational tools (see [1] for a survey). The segmentation ensemble method proposed in this paper is based on the Weighted Partition Consensus via Kernels (WPCK) method introduced in [5], which is briefly explained in the next section.

A. Weighted Partition Consensus via Kernels

In the Weighted Partition Consensus via Kernels (WPCK) method, the consensus partition is theoretically defined² as:

$$ P^* = \arg\max_{P \in \mathcal{P}_X} \sum_{i=1}^{m} k(P, P_i) \quad (2) $$

where k is a positive definite kernel³ function [6]. If k is a kernel function, there is a function φ : P_X → H that maps problem (2) into the Reproducing Kernel Hilbert Space H associated to k. The exact solution ψ in H can be found as the sum of the input partitions mapped into H by φ, divided by its norm:

$$ \psi = \frac{\sum_{i=1}^{m} \phi(P_i)}{\left\| \sum_{i=1}^{m} \phi(P_i) \right\|} $$

Then, the consensus partition P* ∈ P_X is the solution of the pre-image problem

$$ P^* = \arg\min_{P \in \mathcal{P}_X} \left\| \phi(P) - \psi \right\|^2 $$

where ‖φ(P) − ψ‖² can be rewritten only in terms of the similarity measure k in the following way:

$$ \| \phi(P) - \psi \|^2 = 2 - 2\, \frac{\sum_{i=1}^{m} k(P, P_i)}{\sqrt{\sum_{i=1}^{m} \sum_{j=1}^{m} k(P_i, P_j)}} \quad (3) $$

Here the denominator is a constant that does not depend on the particular partition P and can therefore be computed only once. This last equation tells us how close each partition in P_X is to the consensus partition; thus, it can be used as a global measure to evaluate any candidate partition P ∈ P_X as a possible consensus. This is of great importance because, by directly working with equation (2), we can compute the sum of similarity values of a given partition, but we cannot estimate its deviation from the consensus solution. Equation (3) therefore allows defining an iterative procedure that, starting from an initial partition, tries to find a new partition closer to the consensus partition. In [5] the simulated annealing metaheuristic is used to avoid convergence to a low-quality local optimum.

² In [5], the median partition problem is defined using weights; we assume here that all weights are equal to 1.
³ For simplicity we also say kernel.
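As an illustration of how equation (3) turns the median partition search into simple kernel evaluations, here is a minimal Python sketch (ours, not code from [5]); as a stand-in kernel we use the Rand index, which Section IV-A proves to be positive definite:

```python
import numpy as np

def rand_index(a, b):
    """Rand index between two partitions given as integer label arrays."""
    n = len(a)
    iu = np.triu_indices(n, k=1)              # all unordered object pairs
    same_a = np.equal.outer(a, a)[iu]         # co-membership in partition a
    same_b = np.equal.outer(b, b)[iu]
    return float((same_a == same_b).sum()) / (n * (n - 1) / 2)

def deviation_from_consensus(P, ensemble, k=rand_index):
    """Equation (3): ||phi(P) - psi||^2, computed from kernel values only.
    The denominator is constant over P and could be cached in practice."""
    num = sum(k(P, Pi) for Pi in ensemble)
    den = np.sqrt(sum(k(Pi, Pj) for Pi in ensemble for Pj in ensemble))
    return 2.0 - 2.0 * num / den

# Toy usage: the first partition has the lowest deviation here.
ensemble = [np.array([0, 0, 1, 1, 2]),
            np.array([0, 0, 1, 1, 1]),
            np.array([0, 0, 0, 1, 2])]
print([round(deviation_from_consensus(P, ensemble), 3) for P in ensemble])
```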
III. SEGMENTATION ENSEMBLE PROBLEM FORMULATION

As previously mentioned, a direct application of clustering ensemble algorithms to segmentation combination problems presents some drawbacks, e.g., the generation of very large datasets and the loss of the spatial relation among pixels. In order to overcome the first limitation, the idea of the super-pixel representation [3], [4] of an image is used. The use of super-pixels is motivated by the fact that neighboring pixels that were grouped in the same region in all segmentations do not have to be analyzed separately: it is expected that such pixels are grouped in the same region in the consensus segmentation, so a representative element (super-pixel) can be selected from each of these groups. A super-pixel is thus a connected component in the image, formed by pixels that were grouped in the same cluster in all the segmentations to be combined. Moreover, two super-pixels σ_a, σ_b are neighbors if there are at least two neighboring pixels π_a, π_b such that π_a ∈ σ_a and π_b ∈ σ_b.

Given an image and a set of segmentations of the image, the super-pixel computation can be efficiently performed in O(w · h · m), where w and h are the image dimensions and m is the number of segmentations to be combined (a sketch of this computation is given after the definitions below). In the worst case the number of super-pixels could equal the number of pixels in the image; in practice, however, the number of objects is substantially reduced after the super-pixel computation.

Our segmentation ensemble problem formulation is based on definitions of the concepts Image and Segmentation in terms of super-pixels. These definitions also consider the spatial relation among pixels in the image. As can be seen in Fig. 2, we can associate a graph to an image, which represents the neighborhood relation among the super-pixels in the image. This graph is essential in the following definitions.

Definition 1: An image is a pair I = (I_Σ, I_G), where I_Σ = {σ_1, σ_2, ..., σ_n} is a set of super-pixels and I_G = (V, E) is the connectivity graph of super-pixels: there is a node v_i ∈ V associated to each super-pixel σ_i, and there is an edge e_ij ∈ E if and only if the super-pixels σ_i and σ_j are neighbors.

Proposition 1: The graph I_G is a planar graph.

This proposition is given without proof, since it can easily be verified from the graph definition: the graph connects neighboring objects in the two-dimensional plane.

Definition 2: A segmentation S = {R_1, R_2, ..., R_d} is a set of regions (clusters) R_i ⊆ I_Σ that holds the following four properties:
• R_i ≠ ∅, ∀i = 1, ..., d
• ⋃_{i=1}^{d} R_i = I_Σ
• R_i ∩ R_j = ∅, ∀i, j = 1, ..., d, with i ≠ j
• ∀R ∈ S, the graph R_G is a connected graph, where R_G is the subgraph of I_G induced by the set of nodes associated to the super-pixels in R. In other words, R_G is obtained by removing from I_G all the nodes (and related edges) that do not represent any super-pixel in the region R.

The first three properties ensure that a segmentation is a partition of the set of super-pixels. The last property ensures that, in any region of a segmentation, any pair of super-pixels is connected by a path of super-pixels that also belong to the same region. In Fig. 2 it can be seen that only the partitions in red (bold) satisfy the segmentation definition.
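As referenced above, super-pixels can be computed in O(w · h · m) by labeling the connected components of the map that assigns to each pixel its tuple of region labels across all m segmentations. A minimal Python sketch of this idea (our illustration of the paper's definition, not the authors' code; 4-connectivity is assumed):

```python
import numpy as np
from collections import deque

def super_pixels(segmentations):
    """Connected components of pixels that share the same region label in
    every segmentation. `segmentations` is a list of m (h, w) integer
    label maps; returns an (h, w) map of super-pixel ids."""
    labels = np.stack(segmentations)          # shape (m, h, w)
    h, w = labels.shape[1:]
    sp = -np.ones((h, w), dtype=int)
    next_id = 0
    for i in range(h):
        for j in range(w):
            if sp[i, j] >= 0:
                continue
            sp[i, j] = next_id
            queue = deque([(i, j)])
            while queue:                      # BFS over 4-neighbors with
                y, x = queue.popleft()        # identical label tuples
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w and sp[ny, nx] < 0
                            and (labels[:, ny, nx] == labels[:, y, x]).all()):
                        sp[ny, nx] = next_id
                        queue.append((ny, nx))
            next_id += 1
    return sp
```

The connectivity graph I_G then has an edge between two super-pixel ids whenever two 4-adjacent pixels carry different ids.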
[Fig. 2. Given an image, the set of all segmentations is a subset of the set of all partitions (S_I ⊆ P_I). (a) Super-pixel image {a, b, c, d}. (b) Connectivity graph associated to this image. (c) The set of segmentations (red) is a subset of the set of partitions of this super-pixel image.]

From these definitions, it can be deduced that the set S_I of all segmentations of an image is a subset of the set P_I of all possible partitions of the set of super-pixels in the image. This way, given a set of segmentations S = {S_1, S_2, ..., S_m}, the consensus segmentation S* is formally defined as:

$$ S^* = \arg\max_{S \in \mathcal{S}_I} \sum_{i=1}^{m} \Gamma(S, S_i) \quad (4) $$

where Γ is a similarity measure between segmentations. The fact that S_I ⊆ P_I is important for two reasons. First, only partitions of the set of super-pixels that respect the spatial relation among super-pixels are considered. Second, the search space of the problem is reduced.
IV. SEGMENTATION ENSEMBLE VIA KERNELS

The proposed segmentation ensemble method is based on the WPCK method [5]. Therefore, we use a kernel function k as the similarity measure between segmentations in the problem formulation (4). In principle, any kernel similarity measure defined for comparing partitions could be used since, following the definitions given before, a segmentation is a partition. Following the idea of the WPCK algorithm (see Section II-A), we have

$$ \| \phi(S) - \psi \|^2 = 2 - 2\, \frac{\sum_{i=1}^{m} k(S, S_i)}{\sqrt{\sum_{i=1}^{m} \sum_{j=1}^{m} k(S_i, S_j)}} \quad (5) $$

This equation is used to know how close any segmentation S ∈ S_I is to the consensus segmentation. Therefore, the simulated annealing meta-heuristic can also be used here. The first state (segmentation) in the process is defined as the segmentation in the segmentation ensemble S closest to the theoretical consensus, which can be computed as follows:

$$ S_0 = \arg\min_{S \in \mathcal{S}} \| \phi(S) - \psi \|^2 \quad (6) $$
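Step (6) reduces to evaluating equation (5) once per ensemble member. A minimal sketch (ours), assuming a `kernel` callable such as the Rand index:

```python
import numpy as np

def initial_state(ensemble, kernel):
    """Equation (6): the ensemble member closest to the implicit consensus."""
    gram = np.array([[kernel(a, b) for b in ensemble] for a in ensemble])
    den = np.sqrt(gram.sum())                 # constant term, computed once
    dev = 2.0 - 2.0 * gram.sum(axis=1) / den  # eq. (5) for each member
    return ensemble[int(np.argmin(dev))]
```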
Other parameters of the simulated annealing, like the temperature and the cost function, can be computed as in [5]. However, in this case the search space of the problem is S_I instead of P_I, so the state (segmentation) neighborhood must be modified. In order to guarantee a good performance of the simulated annealing, it is important that the new neighborhood definition fulfils the following conditions:
• Given a state (segmentation), the process of generating a new neighbor state must ensure that the new state satisfies the proposed segmentation definition (without an additional verification).
• Starting from an initial segmentation, any segmentation in S_I can be reached in O(n) steps.
Given an image I = (I_Σ, I_G), with I_G = (V, E), and a segmentation S of this image, the following neighbor segmentation generation process is proposed (sketched in code below). A super-pixel σ ∈ I_Σ and an edge (σ, σ′) ∈ E are randomly selected. If the super-pixel σ′ does not belong to the same region (cluster) as σ in S, the super-pixel σ is moved to the region where σ′ belongs. Otherwise, if σ and σ′ are in the same region, a new empty region is created and σ is moved to this new region. This procedure satisfies both conditions mentioned above.

Finally, since the graph I_G = (V, E) is planar, |E| ≤ 3·|V| − 6 and therefore |E| = O(|V|). In other words, we can store the set of edges E in memory and iterate over it. The space requirement and time complexity are linear with respect to the number of super-pixels in the image. Hence, the proposed neighbor segmentation generation process does not affect the algorithm's complexity. On the other hand, the reduction of the search space to S_I allows finding good solutions with a smaller number of iterations.
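As referenced above, here is an illustrative Python sketch of the neighbor generation move (our reading of the procedure, not the authors' code); a segmentation is represented as a dict mapping an integer region id to a set of super-pixel ids, and `edges` is the list E:

```python
import copy
import random

def neighbor_segmentation(regions, edges):
    """One move of the proposed neighborhood: pick a random edge
    (sigma, sigma') of the connectivity graph; move sigma into the region
    of sigma', or split sigma off into a brand-new region if both
    endpoints already share a region."""
    regions = copy.deepcopy(regions)          # keep the current state intact
    s, s_prime = random.choice(edges)
    r_s = next(r for r, members in regions.items() if s in members)
    r_sp = next(r for r, members in regions.items() if s_prime in members)
    regions[r_s].remove(s)
    if r_s == r_sp:
        regions[max(regions) + 1] = {s}       # create a new singleton region
    else:
        regions[r_sp].add(s)                  # merge s into sigma''s region
    if not regions[r_s]:
        del regions[r_s]                      # drop the region if it emptied
    return regions
```

In a full simulated annealing loop, each such move would be accepted or rejected according to the change in equation (5) and the current temperature.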
We call the proposed algorithm Segmentation Ensemble via Kernels (SEK). Given an image and a set of segmentations of this image, the algorithm steps are the following:

1) Computation of the set of super-pixels I_Σ and the connectivity graph I_G. Selection of the kernel similarity measure k between segmentations to be used in the problem formulation.
2) Computation of the initial state (segmentation) S_0 by solving equation (6).
3) Application of the simulated annealing: take S_0 as the current solution and, while a maximum number of iterations is not reached, try to find a new segmentation in S_I better than the current solution. The move from one segmentation to another is done using the proposed segmentation neighboring relation.
4) In the solution given by the simulated annealing, each super-pixel is substituted by the set of pixels it represents in order to form the final consensus segmentation of the image.

A. Rand index as a kernel similarity measure between segmentations

In the experiments we use the proposed algorithm with two kernels as similarity measures between segmentations. The first is the subset significance based measure introduced in [5]. Given a set of objects X and the set P_X of all possible partitions of these objects, this measure is defined by

$$ SS(P_a, P_b) = \sum_{S \subseteq X} \mu(S, P_a)\, \mu(S, P_b) $$

where μ(S, P) = |S|/|C| if ∃C ∈ P such that S ⊆ C, and μ(S, P) = 0 otherwise.

The second measure is the Rand index, a widely used similarity measure between clusterings and segmentations. We present in this section a proof that the Rand index is a positive definite kernel function. The Rand index [7] is a function RI : P_X × P_X → R, defined between two partitions P_a and P_b as:

$$ RI(P_a, P_b) = \frac{N_{11}^{ab} + N_{00}^{ab}}{n(n-1)/2} \quad (7) $$
where N_{11}^{ab} is the number of pairs of objects that belong to the same cluster in both partitions P_a and P_b, while N_{00}^{ab} is the number of pairs of objects that belong to different clusters in both P_a and P_b. These quantities can be written as:

$$ N_{11}^{ab} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} \delta_{ij}^{a} \cdot \delta_{ij}^{b}}{2}, \quad \text{where} \quad \delta_{ij}^{t} = \begin{cases} 1, & \text{if } (i \neq j) \wedge (\exists C \in P_t \mid x_i \in C \wedge x_j \in C) \\ 0, & \text{otherwise} \end{cases} $$

$$ N_{00}^{ab} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} \gamma_{ij}^{a} \cdot \gamma_{ij}^{b}}{2}, \quad \text{where} \quad \gamma_{ij}^{t} = \begin{cases} 1, & \text{if } \nexists C \in P_t \mid x_i \in C \wedge x_j \in C \\ 0, & \text{otherwise} \end{cases} $$
Then, the Rand index can be rewritten as:

$$ RI(P_a, P_b) = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} \delta_{ij}^{a}\delta_{ij}^{b} + \sum_{i=1}^{n}\sum_{j=1}^{n} \gamma_{ij}^{a}\gamma_{ij}^{b}}{n(n-1)} \quad (8) $$

Proposition 2: The Rand index is a positive definite kernel.

Proof: In order to prove that the Rand index (RI) is a kernel function we need to prove that:
• RI is symmetric.
• ∀t ∈ N, ∀α_1, α_2, ..., α_t ∈ R and ∀P_1, P_2, ..., P_t ∈ P_X:

$$ \sum_{a=1}^{t} \sum_{b=1}^{t} \alpha_a \alpha_b\, RI(P_a, P_b) \geq 0 $$

The symmetry is evident from the definition of the Rand index. The second property can be proven by substituting equation (8) into the above expression:

$$ \sum_{a=1}^{t} \sum_{b=1}^{t} \alpha_a \alpha_b\, \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} \delta_{ij}^{a}\delta_{ij}^{b} + \sum_{i=1}^{n}\sum_{j=1}^{n} \gamma_{ij}^{a}\gamma_{ij}^{b}}{n(n-1)} \geq 0 $$

Reorganizing the terms, we have

$$ \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1}^{n} \left( \sum_{a=1}^{t} \alpha_a \delta_{ij}^{a} \sum_{b=1}^{t} \alpha_b \delta_{ij}^{b} + \sum_{a=1}^{t} \alpha_a \gamma_{ij}^{a} \sum_{b=1}^{t} \alpha_b \gamma_{ij}^{b} \right) \geq 0 $$

and reorganizing again we obtain

$$ \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1}^{n} \left( \left( \sum_{a=1}^{t} \alpha_a \delta_{ij}^{a} \right)^{2} + \left( \sum_{a=1}^{t} \alpha_a \gamma_{ij}^{a} \right)^{2} \right) \geq 0 $$

which holds because every summand is a square, and the proposition is proven. ∎
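The proposition can also be checked numerically. The sketch below (our illustration) computes RI through the standard contingency-table identities, which avoid the O(n²) pair enumeration, and verifies that a small Rand-index Gram matrix has no negative eigenvalues:

```python
import numpy as np

def rand_index_fast(a, b):
    """RI via a contingency table: N11 = sum_ij C(n_ij, 2) and
    N00 = C(n, 2) - sum_i C(a_i, 2) - sum_j C(b_j, 2) + N11."""
    n = len(a)
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    cont = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(cont, (ai, bi), 1.0)            # contingency counts n_ij
    c2 = lambda x: (x * (x - 1) / 2).sum()    # sum of pair counts
    n11 = c2(cont)
    n00 = n * (n - 1) / 2 - c2(cont.sum(1)) - c2(cont.sum(0)) + n11
    return (n11 + n00) / (n * (n - 1) / 2)

# Proposition 2 numerically: the RI Gram matrix of random partitions is PSD.
rng = np.random.default_rng(0)
parts = [rng.integers(0, 3, size=8) for _ in range(5)]
G = np.array([[rand_index_fast(p, q) for q in parts] for p in parts])
print(np.linalg.eigvalsh(G).min() >= -1e-12)  # True, as Proposition 2 states
```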
V. EXPERIMENTAL RESULTS

We used the color images of the Berkeley dataset [8] for the experiments. This dataset is widely used for image segmentation evaluation and is composed of 300 natural images of size 481 × 321. For each image, we used two state-of-the-art segmentation algorithms to generate two ensembles: TBES ensembles and UCM ensembles. Each ensemble is composed of 10 segmentations obtained by varying the parameter values of the segmentation algorithms. TBES ensembles were generated with the TBES algorithm [9], which is based on the MDL principle and has the quantization level ϵ as its parameter; we varied ϵ = 40, 70, 100, 130, ..., 310 to obtain the 10 segmentations in the ensemble. Furthermore, UCM ensembles were generated with a segmenter based on the ultrametric contour map (UCM) [10]; its only parameter is the threshold l, and we chose l = 0.03, 0.11, 0.19, 0.27, 0.35, 0.43, 0.50, 0.58, 0.66, 0.74.

Each image in this dataset has several human segmentations (ground truth). In the experiments, we compared the obtained results with the human segmentations of each image using three well-known measures: Normalized Mutual Information (NMI) [11], Variation of Information (VI) [12], and Rand Index (RI) [7]. NMI and RI are similarity measures that take values in the range [0, 1], where 1 means a perfect correspondence between the segmentation and the ground truth. On the other hand, VI is a dissimilarity measure that takes values in [0, +∞), where 0 means a perfect correspondence. In order to present the experimental results in a homogeneous way, we report 1−NMI and 1−RI values, which are dissimilarity versions of the original measures. Thus, for all measures, lower values mean better correspondence.
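For reproducibility, here is a hedged Python sketch (ours; the measure definitions follow [11], [12], [7], with NMI normalized by the geometric mean of the entropies) of the three reported dissimilarities:

```python
import numpy as np

def dissimilarities(seg, gt):
    """Return (1 - NMI, VI, 1 - RI) between two flattened integer label
    maps. Pairwise RI is shown for clarity; for full-size images the
    contingency-table identities (Section IV-A sketch) should be used."""
    seg, gt = np.asarray(seg), np.asarray(gt)
    _, si = np.unique(seg, return_inverse=True)
    _, gi = np.unique(gt, return_inverse=True)
    joint = np.zeros((si.max() + 1, gi.max() + 1))
    np.add.at(joint, (si, gi), 1.0)
    joint /= joint.sum()                       # joint label distribution
    H = lambda p: float(-(p[p > 0] * np.log(p[p > 0])).sum())
    hxy, hx, hy = H(joint.ravel()), H(joint.sum(1)), H(joint.sum(0))
    mi = hx + hy - hxy                         # mutual information
    nmi = mi / np.sqrt(hx * hy) if hx > 0 and hy > 0 else 1.0
    vi = 2 * hxy - hx - hy                     # VI = H(X) + H(Y) - 2 I(X;Y)
    n = len(seg)
    iu = np.triu_indices(n, k=1)
    agree = (np.equal.outer(seg, seg)[iu] == np.equal.outer(gt, gt)[iu]).sum()
    ri = agree / (n * (n - 1) / 2)
    return 1 - nmi, vi, 1 - ri
```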
In Table I, the comparison of the obtained results with the best human segmentation is presented in the columns denoted bestGT, and the average value of the comparisons with all the human segmentations is presented in the columns denoted allGT. In this table, the proposed algorithm (SEK) is compared with several clustering ensemble algorithms applied to the segmentation combination problem: BOK and BOEM [13]; CSPA, HGPA and MCLA [11]; EA-AL [2]; QMI [14]; and WPCK [5]. For each algorithm, the average results over the 300 analyzed images are presented. The proposed algorithm is run with two similarity measures between segmentations: the subset significance kernel measure [5], denoted SEK + SS, and the Rand index, denoted SEK + RI. Besides, we apply supervised parameter learning (SUP-LRN) to the same ensembles: for each ensemble we compute the average performance measure over all 300 images of the Berkeley dataset for each parameter setting, and the parameter setting with the best value is selected as the optimal fixed parameter setting for the corresponding ensemble. By this means we can provide a quantitative comparison with the proposed approach.

TABLE I
EXPERIMENTAL COMPARISON OF THE PROPOSED ALGORITHM (TWO VERSIONS, SEK + SS AND SEK + RI) WITH SEVERAL CLUSTERING ENSEMBLE ALGORITHMS ON SEGMENTATION COMBINATION PROBLEMS. THE LAST ROW OF EACH ENSEMBLE BLOCK SHOWS THE SUPERVISED LEARNING (SUP-LRN) RESULTS. FOR EACH ENSEMBLE, THE BEST RESULTS ARE THE LOWEST VALUES IN EACH COLUMN.

                        1-NMI           VI              1-RI
Ensemble  Method     bestGT  allGT   bestGT  allGT   bestGT  allGT
TBES      BOK         0.41    0.48    1.34    1.73    0.21    0.28
          BOEM        0.35    0.42    1.52    1.82    0.16    0.22
          CSPA        0.33    0.39    1.75    1.99    0.14    0.21
          HGPA        0.32    0.38    1.75    1.98    0.14    0.21
          MCLA        0.34    0.41    1.47    1.77    0.16    0.22
          EA-AL       0.32    0.39    1.51    1.78    0.15    0.21
          QMI         0.33    0.39    1.68    1.93    0.15    0.21
          WPCK        0.32    0.39    1.58    1.85    0.15    0.22
          SEK + SS    0.31    0.38    1.42    1.75    0.16    0.23
          SEK + RI    0.31    0.37    1.43    1.72    0.14    0.20
          SUP-LRN     0.31    0.37    1.34    1.69    0.14    0.20
UCM       BOK         0.34    0.40    1.90    2.17    0.15    0.21
          BOEM        0.41    0.46    2.20    2.44    0.19    0.25
          CSPA        0.34    0.40    1.90    2.17    0.15    0.21
          HGPA        0.42    0.49    3.67    4.00    0.18    0.27
          MCLA        0.36    0.42    1.91    2.18    0.16    0.22
          EA-AL       0.35    0.41    1.90    2.17    0.15    0.21
          QMI         0.37    0.43    2.26    2.52    0.16    0.24
          WPCK        0.34    0.40    2.06    2.32    0.15    0.21
          SEK + SS    0.34    0.40    1.36    1.69    0.19    0.24
          SEK + RI    0.31    0.38    1.35    1.64    0.15    0.21
          SUP-LRN     0.28    0.35    1.29    1.61    0.11    0.18
As can be seen in Table I, for almost all ensembles and validity measures, one of the two variants of the proposed algorithm obtains the best results. In addition, a superiority of the SEK + RI configuration can be appreciated. This confirms that the Rand index is an appropriate similarity measure between segmentations and that, thanks to the proof given in Section IV-A, it can be used in the proposed algorithm. From this table we can experimentally corroborate that the application of general clustering ensemble algorithms to segmentation ensemble problems is possible; however, the design of an algorithm that takes the peculiarities of the segmentation ensemble problem into account produces better results. Finally, we can see from Table I that the proposed algorithm obtains results very close to those of the supervised parameter learning approach, even though supervised learning makes use of the ground-truth information. It is important to notice that supervised learning is not applicable in many practical scenarios because ground-truth is not available.
VI. CONCLUSION

The segmentation ensemble problem differs from the general clustering ensemble problem. When clustering ensemble algorithms are applied to segmentation combination problems, they are not capable of taking into account and benefiting from the spatial relation among pixels in the image. Besides, the large number of pixels in an image can affect this kind of algorithm. In this paper, we formalized the segmentation ensemble problem and proposed an algorithm based on kernel clustering ensembles, which is able to overcome the above-mentioned limitations. The super-pixel representation of the image and the connectivity graph of super-pixels are fundamental tools in our approach. Furthermore, we proved that the Rand index, a widely used similarity measure between partitions and segmentations, is a kernel function. This extends the application scope of this measure; in particular, it is successfully used in the proposed algorithm. Finally, the comparison with several state-of-the-art clustering ensemble algorithms on image segmentation combination problems showed the accuracy of the proposed algorithm.

REFERENCES

[1] S. Vega-Pons and J. Ruiz-Shulcloper, "A survey of clustering ensemble algorithms," International Journal of Pattern Recognition and Artificial Intelligence, vol. 25, no. 3, pp. 1–36, 2011.
[2] A. L. N. Fred and A. K. Jain, "Combining multiple clusterings using evidence accumulation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, pp. 835–850, 2005.
[3] V. Singh, L. Mukherjee, J. Peng, and J. Xu, "Ensemble clustering using semidefinite programming with applications," Machine Learning, vol. 79, no. 1–2, pp. 177–200, 2010.
[4] L. Franek, D. D. Abdala, S. Vega-Pons, and X. Jiang, "Image segmentation fusion using general ensemble clustering methods," in Proc. ACCV (4), 2010, pp. 373–384.
[5] S. Vega-Pons, J. Correa-Morris, and J. Ruiz-Shulcloper, "Weighted partition consensus via kernels," Pattern Recognition, vol. 43, no. 8, pp. 2712–2724, 2010.
[6] B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, MA, USA: MIT Press, 2002.
[7] W. M. Rand, "Objective criteria for the evaluation of clustering methods," Journal of the American Statistical Association, vol. 66, pp. 846–850, 1971.
[8] D. Martin, C. Fowlkes, D. Tal, and J. Malik, "A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics," in Proc. ICCV, vol. 2, 2001, pp. 416–423.
[9] S. Rao, H. Mobahi, A. Y. Yang, S. Sastry, and Y. Ma, "Natural image segmentation with adaptive texture and boundary encoding," in Proc. ACCV (1), 2009, pp. 135–146.
[10] P. Arbeláez, M. Maire, C. C. Fowlkes, and J. Malik, "From contours to regions: An empirical evaluation," in Proc. CVPR, 2009, pp. 2294–2301.
[11] A. Strehl and J. Ghosh, "Cluster ensembles: A knowledge reuse framework for combining multiple partitions," Journal of Machine Learning Research, vol. 3, pp. 583–617, 2002.
[12] M. Meilă, "Comparing clusterings – an information based distance," Journal of Multivariate Analysis, vol. 98, no. 5, pp. 873–895, 2007.
[13] V. Filkov and S. Skiena, "Integrating microarray data by consensus clustering," International Journal on Artificial Intelligence Tools, vol. 13, no. 4, pp. 863–880, 2004.
[14] A. P. Topchy, A. K. Jain, and W. F. Punch, "Clustering ensembles: Models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.