10.1109/TCSVT.2015.2402891, IEEE Transactions on Circuits and Systems for Video Technology


Constrained Directed Graph Clustering and Segmentation Propagation for Multiple Foregrounds Co-segmentation

Fanman Meng, Member, IEEE, Hongliang Li, Senior Member, IEEE, Shuyuan Zhu, Member, IEEE, Bing Luo, Chao Huang, Bing Zeng, Senior Member, IEEE, and Moncef Gabbouj, Fellow, IEEE

Abstract—This paper proposes a new constrained directed graph clustering method and a segmentation propagation method for multiple foreground co-segmentation. We solve multiple object co-segmentation from the perspective of classification and propagation, where classification obtains the object prior of each class and propagation spreads that prior to all images. In our method, the directed graph clustering method is designed for the classification step, adding clustering constraints to the co-segmentation to prevent the clustering of noisy data. A new clustering criterion, the strongly connected component search on the graph, is introduced. Moreover, a linear-time strongly connected component search algorithm is proposed for fast clustering. We then extract the object priors from the clusters and propagate these priors to all the images to obtain the foreground maps, which are used to achieve the final multiple object extraction. We verify our method on both co-segmentation and clustering tasks. The experimental results show that the proposed method achieves higher accuracy than both existing co-segmentation methods and existing clustering methods.

Index Terms—Object Co-segmentation, Multiple Classes, Directed Graph Clustering, Propagation.

I. INTRODUCTION

Multiple foreground co-segmentation [1]–[5] aims at jointly extracting multiple common objects from a group of images, which is a fundamental task for many computer vision applications, such as image retrieval [1], [6], image representation [7], image classification [8] and image retargeting [9]. Although several multiple foreground co-segmentation methods have been proposed in the last decade, multiple foreground extraction is still a challenging task, since each image may contain many unknown objects, which makes foreground prior generation difficult [10]. Successful multiple foreground co-segmentation depends on accurate object prior generation. Existing multiple foreground methods generate the object priors in two steps. The first step segments (clusters) the foreground regions that recur across the image group. The second step uses the segments to learn the multiple object priors. The two steps are performed iteratively by an EM-like algorithm [11] until convergence, where the E-step clusters the training data (Clustering Step) and the M-step learns the object priors (Learning Step) from the clustering results. Since the foreground and background regions are unknown in multiple foreground co-segmentation, noisy data, such as unsuccessful segmentations and incorrect object priors, are inevitably generated in the clustering steps; such data easily interfere with the iteration and drive the EM algorithm to a local minimum solution. For example, segments containing only partial object regions are easily clustered into the object region class, which then interferes with accurate object prior generation. Moreover, object regions with incorrect foreground priors will also be clustered in the classification step, leading to inaccurate prior learning. Hence, it is useful to remove the noisy data in the clustering step in order to achieve more accurate multiple foreground co-segmentation. An effective way to remove noisy data is to add clustering constraints to the clustering step, so that only the data satisfying the constraints are clustered and the noisy data are excluded. For a general clustering problem [12], however, it is difficult to set noise-removal criteria that distinguish the noisy data from the correct data. Fortunately, the co-segmentation problem has several special characteristics that can help us find the noisy data.

Copyright (c) 2014 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to [email protected]. This work has been supported in part by the National Basic Research Program of China (973 Program 2015CB351804), the National Natural Science Foundation of China (Nos. 61271289 and 61300091), the Ph.D. Programs Foundation of the Ministry of Education of China (No. 20110185110002), and the Program for Young Scholars Scientific and Technological Innovative Research Team of Sichuan Province, China (No. 2014TD0006). F. Meng, H. Li, S. Zhu, B. Luo, and C. Huang are with the School of Electronic Engineering, University of Electronic Science and Technology of China, Chengdu, China. E-mail: [email protected]. B. Zeng is with the School of Electronic Engineering, University of Electronic Science and Technology of China, Chengdu, China, and the Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Kowloon, Hong Kong. M. Gabbouj is with the Department of Signal Processing, Tampere University of Technology, Tampere, Finland.
For example: (1) clustering only a subset of the object regions can already provide sufficient prior information about the classes, so we can set strong similarity constraints in the clustering to avoid clustering noisy data; (2) incorrect foreground prior information in the initial rough prior can be avoided by enforcing consistency of the foreground priors within each cluster; (3) several constraints hold on the clustering results, e.g., the samples in each cluster should come from different images, and samples from the same image ought to be non-overlapping. These constraints can also be added to the clustering to help remove the noisy data. These

1051-8215 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


special characteristics of co-segmentation motivate us to design a new multiple-constraint-based clustering algorithm, so that we can avoid clustering noisy data and thereby improve the multiple foreground co-segmentation performance. In this paper, we propose a constrained clustering method together with a segmentation propagation method for multiple foreground co-segmentation, which was briefly presented in [13]. Based on a set of initial object proposals and rough initial priors, the main idea is to derive various noise-removing constraints from the segment consistencies and relationships, and then add these constraints to the clustering step so that only the correct data are clustered and the noisy data are discarded. Furthermore, a segment propagation method is proposed to transfer the object priors and detect the foreground regions. The proposed method consists of two steps: region proposal clustering and prior propagation. The first step uses a new directed graph clustering to cluster the initial object proposals and form the multi-class region data. The second step learns the foreground priors and transfers them to achieve the final foreground extraction. We verify the proposed method on co-segmentation datasets (MFCFlickr and iCoseg) and clustering datasets (COIL-20 and COIL-100). The results show that the proposed method can avoid clustering noisy data and provides sufficient multi-class information to improve the co-segmentation performance. Furthermore, the proposed directed graph clustering method obtains better clustering accuracy than several existing clustering methods. The rest of this paper is organized as follows. Section II introduces the related work. Section III presents the proposed co-segmentation method, illustrating the directed graph clustering and cluster propagation. Section IV shows the experimental results. Finally, Section V concludes the paper. II.
RELATED WORK

Existing co-segmentation methods can be divided into two categories: single-class co-segmentation [1], [3], [4], [7], [14]–[23] and multi-class co-segmentation [8], [10], [24]–[26]. Here, we focus on multi-class co-segmentation, which aims to simultaneously divide a set of images into multiple class regions [26]. Multi-class co-segmentation generalizes co-segmentation and can be used in many practical applications. However, it is still challenging because K priors must be discovered. Recently, several multi-class co-segmentation methods have been proposed, such as multiple foreground co-segmentation [10], multiple region co-segmentation [26], [27], and class-discriminative learning co-segmentation [8], [25]. The most related work is the multiple foreground co-segmentation in [10], which aims to segment multiple foreground classes from the image group. Two steps are iteratively applied to achieve the multiple foreground extraction: foreground modeling and region assignment. The first learns the appearance models of the K foregrounds, and the second allocates the regions of each image to one of the K foregrounds or the background. In multi-region co-segmentation [26], [27], the classes in both foreground and

background are simultaneously segmented, which leads to a non-overlapping local region segmentation for each image. Although this type of co-segmentation is able to efficiently extract K classes from a large-scale image group, it assumes that the object is contained in every image and does not explicitly consider cases where foregrounds occur irregularly across the images [10]. In class-discriminative learning co-segmentation [8], [25], the diversities between foreground classes and background classes are learned from a weakly labeled dataset, which enables more accurate multiple foreground co-segmentation. The fact that a class contained in many image groups (such as "Sky") is a background class is used to learn the diversity. This method has been merged into classification models to improve image classification [8], [25]. Although more accurate co-segmentation can be obtained in this way, it requires users to construct a training dataset for each image group. In these multi-class co-segmentation methods, generating the foreground prior models is a challenging problem because the foreground regions in the images are unknown. Several foreground prior generation methods have been proposed, which focus on two problems: localizing the foreground region of each object, and generating the corresponding foreground prior model. For example, user interaction is used to accurately locate the foreground pixels in [22] and [19], and a Gaussian mixture model (GMM) is used for the prior model generation. Other methods automatically generate the foreground models alongside the co-segmentation. The method in [27] generates the foreground prior after each step of co-segmentation, again using a GMM for the prior model representation. In [10], the foregrounds and their models are generated by an iterative optimization algorithm.
A GMM and spatial pyramid matching (SPM) with a linear SVM are used to model the foreground prior. The method in [26] formulates multi-class co-segmentation with the expectation-maximization (EM) algorithm, which iteratively obtains the foregrounds and their priors; a GMM is again used to describe the prior model. In this paper, we learn the foreground priors by a low-rank matrix discovery-based method. Note that other foreground prior models can also be used in our framework for generating the foreground prior, such as the GMM [16], which learns the prior from the clustering results and then provides the prior information for the next propagation step. Another related line of work is object co-segmentation, which aims to extract only the interesting common objects from similar backgrounds. Here, the interesting common objects are defined as the common semantic foregrounds. Recently, two single-class object co-segmentation methods [3], [21] have been proposed. Vicente et al. [21] represent each image by a set of object proposals, and then describe the relationships between the object proposals by a fully connected graph. The common object extraction is converted to an MRF matting problem, which can be solved by loopy belief propagation. In [3], Meng et al. treat each image (the ith image, say) as one layer (the ith layer) of a graph representing the foreground similarities. To simplify the graph representation, only the relationships between neighboring layers are considered.


The common object extraction is finally formulated as a graph shortest-path search problem, which can be efficiently solved by dynamic programming. In graph clustering, directed graph clustering is an important branch, which uses a directed graph to represent asymmetric affinity relationships between samples and performs clustering by a criterion defined on the constructed directed graph. For example, Meila et al. [28] define a generalized weighted cut as the clustering criterion, with the aim of finding a cut (boundary) that has both low weight and balanced cluster sizes. The directed clustering is formulated as searching for the clusters with minimum weighted cuts and is relaxed to a Rayleigh quotient problem. Frey et al. [29] automatically obtain the exemplars (class centers) and the samples of each class by passing messages on the directed graph, such as responsibility information from exemplars to data and availability information from data to exemplars. Recently, Zhang et al. [30] perform clustering on a directed graph by first constructing a KNN directed graph to represent the sample relationships; initial classes are then iteratively merged by their distances until the target class number is reached. Although directed graph clustering has been successfully used in some applications, these methods do not consider introducing constraints into the edge weight calculation, which is important for the flexibility of directed graph clustering in practical applications. In the next section, we will show how to add constraints in the graph construction to make the clustering more suitable for the co-segmentation problem. III. THE

PROPOSED METHOD

A. Overview

Our method is based on a weak object prior such as bottom-up saliency maps [31], [32]. Based on this prior, we segment the multiple common objects via four steps, as shown in Fig. 1. In the first step, each original image is segmented into a set of overlapping object proposals, and a novel directed graph is constructed to represent the similarities between the object proposals. In the second step, a new directed graph clustering method is performed on the graph to group the similar object proposals into a number of clusters. These clusters are the basic segments and are used to generate the foreground priors. In the third step, the foreground priors are propagated to the original images to obtain the foreground probability maps. The final co-segmentation is achieved from the foreground probability maps using the graph-cuts algorithm. In our method, the following constraints are considered in the clustering steps:
• In the clustering step, the number of classes is unknown. The clustering method is therefore required to cluster the objects without knowing the class number.
• The initial proposals from the same image may overlap, which makes these proposals more similar to each other than to the proposals of different images. This can result in clustering proposals from the same image. However, we need to cluster proposals from different images for accurate object prior generation. Hence, the method is required

to avoid clustering proposals from the same image.
• There are many interfering local regions for a class, such as local regions that only partially cover the object and the background. These regions provide incomplete or inaccurate object priors, which results in unsuccessful object prior generation. The clustering algorithm is required to eliminate these interferences; in other words, we need to cluster the object proposals while rejecting the inaccurate proposals.
• There will be some errors in the rough object prior, such as saliency maps that invert the foreground and background. We need to exploit the good object priors while avoiding the interference of the bad object priors in the clustering and propagation.
We next detail the proposed method, including the directed graph generation, the directed graph clustering and the prior propagation method.

B. Directed graph generation

We first segment each image Ii into a set of overlapping object proposals Pi = {P1i, · · · , PNi i}, where Ni is the number of object proposals in image Ii. The set of all object proposals is denoted as P = {P1, · · · , PM}. We obtain Pi ∈ P by the method of [3], which generates object proposals in three ways: over-segmentation, saliency detection based object segmentation, and object detection based object segmentation. Hence, each Pji can be considered a semantic local region and can be used to provide the "objectness" of the multiple classes. We next cluster the local proposals based on directed graph clustering [30]. The idea of directed graph clustering (DGC) [30] is to first construct a KNN directed graph to represent the similarity relationships between the samples, where the vertices of the graph represent the samples and directed edges connect each vertex to its K nearest similar vertices. Then, the vertices are clustered into several classes based on the intra-cluster consistencies and the inter-cluster diversities.
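As a reference point for the constrained construction introduced later, the baseline KNN directed graph of the DGC framework can be sketched as follows. This is a minimal sketch under assumptions not stated in the paper: samples are feature row vectors and the affinity is plain Euclidean distance.

```python
import numpy as np

def knn_directed_graph(features, k):
    """Baseline KNN directed graph: each sample points to its k nearest
    neighbours, giving an asymmetric (directed) affinity structure."""
    n = len(features)
    # Pairwise Euclidean distances between all samples.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a node never links to itself
    edges = []
    for i in range(n):
        # The k most similar samples to i each receive an edge i -> j.
        for j in np.argsort(dist[i])[:k]:
            edges.append((i, int(j)))
    return edges
```

Note that the relation is directed: an edge i → j does not imply j → i, which is exactly the asymmetry the clustering criterion exploits.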
Since several constraints need to be added to the clustering, the traditional directed graph clustering method is not suitable for our problem. First, the traditional DGC method requires the number of clusters, which is unknown in our problem. Second, overlapping initial proposals from the same image have similar appearance, so the traditional graph construction method tends to cluster these overlapping regions from the same image instead of the proposals across images. Third, the traditional method clusters every proposal, whereas our aim is to cluster only the object proposals satisfying the constraints. Therefore, we propose a new constrained directed graph clustering method with a new graph construction and a new clustering criterion. 1) Graph construction: We first introduce the directed graph construction. We use G = (V, E) to denote the directed graph, where V is the vertex set and E is the edge set. Each vertex vij ∈ V represents an object proposal Pji. Each edge e ∈ E directed from vij to vkl is denoted e = (vij, vkl), where


Fig. 1. The flowchart of the proposed method.

vkl is said to be a direct successor of vij, and vij a direct predecessor of vkl; vij is called the head and vkl the tail of the edge. For a vertex vij, only the top K nearest object proposals vkl are connected with vij, with the direction from vij to vkl; no edges are created for the other object proposals. To focus on clustering the object proposals of different images, we introduce two connection constraints:
• For any edge e = (vij, vkl), i ≠ k.
• For any pair of edges e = (vij, vkl) and e′ = (vij, vk′l′) derived from vij, k ≠ k′.
The first constraint indicates that the vertices vij and vkl come from different images. The second constraint requires that the vertices vkl and vk′l′ derived from the same node vij come from different images. These constraints guarantee that the directed graph fully represents the object proposal similarities among the images. After connecting all vertices, we obtain a directed graph G. As in a traditional KNN graph, each edge is given a weight representing the similarity between its vertices. In this paper, we set all weights to one, since the connections between vertices can fully represent the similarity relationships when K is much smaller than the number of common objects, which usually happens in realistic image groups. 2) K Nearest Neighbors Selection: In the classical edge construction of KNN, the K nearest neighbors of node vij are found by the feature distance d(fij, fkl) between the two nodes vij and vkl, where fij and fkl are their features. This approach is sensitive to noisy data. Here, we add the clustering constraints to the K nearest neighbors selection in two respects: one is the similarity between regions, and the other is the consistency of the initial foreground priors of the regions. The first guarantees the clustering of very similar samples.
The second prevents the clustering of samples with incorrect initial foreground priors. These constraints are imposed through a new distance metric that simultaneously takes the feature distance and the foreground prior consistency into account. We next introduce the calculation of this distance metric. Let Ji be the saliency map (the initial foreground prior) of Ii. We first score each region proposal Pji of Ii by

Rij = ( Σ(k,l)∈Pji Ji(k, l) ) / |Pji|,   (1)

where |Pji| is the number of pixels in Pji and (k, l) indexes the pixels belonging to Pji. In this paper, we use the method in [31] to obtain the saliency map. We then intend to collect the foregrounds with consistent priors while rejecting the foregrounds with inconsistent priors. Here, the similarity between two nodes (vij, vkl) is calculated as

d(vij, vkl) = ∥fij − fkl∥² / min(Rij, Rkl),   (2)

where d(vij, vkl) is the distance between the pair of nodes vij and vkl, and fij and fkl are the features of vij and vkl, respectively. A pair of foreground regions obtains a small distance when their features are similar and their initial foreground probabilities are large. Hence, given a foreground region with a large R, similar foreground regions with large values of R are ranked in front and are more easily selected as its K nearest neighbors. For a background region with a small R, min(Rij, Rkl) is small and the distance is mainly determined by the feature similarity.

C. Directed graph clustering

After graph construction, we cluster the object proposals by directed graph clustering. Since the number of classes and the classes contained in each image are unknown, obtaining an accurate clustering is difficult. Fortunately, it is easier to cluster a subset of a class consisting of very similar regions, which can be achieved by setting strong similarity constraints in the clustering criterion. Meanwhile, such clusters are able to provide sufficient appearance priors for each class in the multiple foreground co-segmentation. This motivates us to perform the clustering on only the very similar proposals, which are then propagated to the images to achieve the final multiple foreground co-segmentation. In the clustering, we need to consider only a subset of the proposals that contains very similar proposals. This requires a clustering method that can form clusters from a subset of the samples.
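The region scoring of Eq. (1) and the prior-weighted distance of Eq. (2) can be sketched as follows. This is a minimal sketch: the saliency map is assumed to be a 2-D array with a boolean mask per proposal, and whether the feature norm in Eq. (2) is squared is ambiguous in the extraction, so the squared norm is an assumption here.

```python
import numpy as np

def region_score(saliency, mask):
    """Eq. (1): mean saliency over the pixels of a region proposal.
    saliency: 2-D saliency map J_i; mask: boolean map of proposal P_ji."""
    return saliency[mask].sum() / mask.sum()

def constrained_distance(f_a, f_b, r_a, r_b):
    """Eq. (2): squared feature distance divided by the weaker of the two
    initial foreground priors, so consistent salient pairs come out close."""
    return ((f_a - f_b) ** 2).sum() / min(r_a, r_b)
```

Two salient regions with similar features thus obtain a small distance, while any pair involving a low-prior region is pushed apart regardless of appearance.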
Existing clustering criteria usually consider the clustering of all vertices. For example, in the directed graph clustering method of [30], the vertices are clustered by merging similar vertex sets at each step until the target number of clusters is reached. In this paper, we propose a new directed graph clustering method for our co-segmentation task. We first introduce several definitions from directed graph theory. In a directed graph G, the number of edges pointing


to a node v is called the in-degree deg−(v) of the node, and the number of edges starting from the node is its out-degree deg+(v). Node u connects to v (u → v) if G contains a directed path from u to v. Connectivity thus indicates that one node is similar to another. Connectivity obeys the transitive law but not the commutative law: vi may be similar to vj while vj is not similar to vi. For example, a background proposal may be unique and connect to a foreground proposal, but the foreground proposal will not connect back to the background proposal. This means that there are wrong connections in G that would interfere with the clustering. To avoid such interference, we require that two nodes u and v be labeled with the same cluster only if u and v connect to each other. This guarantees the clustering of very similar proposals, because only very similar proposals satisfy the criterion. Based on this mutual connection of two nodes, we define a set V′ ⊆ V to be a cluster if every pair u ∈ V′ and v ∈ V′ is connected in both directions, which is a strict criterion and results in a compact clustering of the samples. The goal of our clustering process is to find all such clusters in G. The cluster defined here is exactly a strongly connected component of the directed graph G. The strongly connected components of a directed graph G are its maximal strongly connected subgraphs, in which any pair of vertices is connected by a path. Hence, the problem reduces to searching for strongly connected components in G, which can be done efficiently in linear time. Here, we introduce a linear-time algorithm that obtains the strongly connected component of the graph containing a given node. To introduce our algorithm, we give two definitions: D+(u) = {u} ∪ {v | u → v} and D−(u) = {u} ∪ {v | v → u}, where D+(u) is the set of nodes to which u connects, and D−(u) is the set of nodes that link to u.
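Both reachability sets can be computed by depth-first search, the second one on the transpose graph, and their intersection gives the cluster (this anticipates Algorithm 1 below). A minimal sketch with adjacency-list dictionaries:

```python
def dfs_reachable(adj, u):
    """All nodes reachable from u along directed paths, including u itself,
    i.e. the set D+(u) for the graph given by adjacency lists `adj`."""
    seen, stack = {u}, [u]
    while stack:
        x = stack.pop()
        for y in adj.get(x, []):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def cluster_for_node(adj, u):
    """D+(u) ∩ D−(u): the strongly connected component containing u."""
    transpose = {}
    for x, ys in adj.items():
        for y in ys:
            transpose.setdefault(y, []).append(x)  # reverse every edge
    return dfs_reachable(adj, u) & dfs_reachable(transpose, u)
```

For example, with adj = {0: [1], 1: [2], 2: [0, 3], 3: []}, the cluster of node 0 is {0, 1, 2}, while node 3 yields only the trivial set {3}.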
Based on the definition of connection, we have the following lemma: Lemma 1. Given two vertices u and v, they are in the same class if u ∈ D+(v) and v ∈ D+(u). Proof. Given two vertices with u ∈ D+(v) and v ∈ D+(u), there are directed paths on the graph from u to v and from v to u; that is, the two vertices connect with each other. Hence, the two vertices are in the same class. Based on this lemma, we have the following theorem: Theorem 2. Given a node u, let Su be the cluster containing u. Then Su = D+(u) ∩ D−(u). Proof. We first prove that any pair of vertices in D+(u) ∩ D−(u) are in the same class. Then, we prove that no vertex v ∉ D+(u) ∩ D−(u) belongs to Su. Given any v ∈ D+(u) ∩ D−(u), we have v ∈ D+(u) and u ∈ D+(v). Hence, u and v are in the same class according to Lemma 1. By the transitive law, any pair of vertices in D+(u) ∩ D−(u) are in the same class. Now assume there is a vertex v ∈ Su with v ∉ D+(u) ∩ D−(u). As v and u are in the same cluster, and based on the definition

of a cluster, we have u ∈ D+(v) and v ∈ D+(u). Hence, v ∈ D+(u) and v ∈ D−(u), which contradicts the assumption. Theorem 2 gives us a method to obtain the cluster with respect to each vertex. Given a vertex u, the corresponding cluster Su can be obtained by computing the intersection of the sets D+(u) and D−(u). In our method, the set D+(u) is obtained by the depth-first search algorithm (DFS) with u as the root. For D−(u), the search is performed by depth-first search on the transpose graph G̃ of G, where G̃ is obtained by reversing the directions of all edges in G. The algorithm for searching the cluster Su of a node u (CFN: Clustering For Node) is shown in Algorithm 1.

Algorithm 1 Su = CFN(G, u): the algorithm for searching the cluster with respect to u.
Input: graph G and vertex u.
Output: the cluster Su with respect to u.
  D+(u) ← DFS(G, u).
  Obtain the transpose graph G̃ of G.
  D−(u) ← DFS(G̃, u).
  Su ← D+(u) ∩ D−(u).

Based on Algorithm 1, the clusters of G can be obtained by iteratively searching the clusters of the unclassified vertices until all clusters satisfying the criterion are obtained. Meanwhile, a vertex with a small in-degree usually corresponds to a small cluster. Hence, we set a threshold T on the in-degree deg−(v) of the vertices to avoid clustering such small classes. The algorithm for graph clustering (CFG: Clustering For Graph) is presented in Algorithm 2, where T = 1 is the threshold. Note that we only consider vertices u with deg−(u) ≥ T, since a vertex with deg−(u) = 0 yields no nontrivial cluster.

Algorithm 2 S = CFG(G): the algorithm for searching the clusters S in graph G.
Input: graph G and the in-degrees deg−(·).
Output: the clusters S = {Sk}, k = 1, · · · , Ns.
  k = 1.
  u ← arg maxu deg−(u).
  While deg−(u) ≥ T:
    Sk ← CFN(G, u).
    deg−(v) ← 0 for all v ∈ Sk.
    k ← k + 1.
    u ← arg maxu deg−(u).
  End While
  S ← {Sk}, k = 1, · · · , Ns.

As mentioned in Section I, three constraints need to be added to the clustering algorithm for the co-segmentation problem: a) the constraint of subset clustering


(by neglecting noisy data), b) avoiding the incorrect initial rough foreground priors, and c) clustering data from different images without overlap. In our clustering method, the first constraint (subset clustering) is achieved by the strong clustering criterion, i.e., the strongly connected component criterion, under which only samples that connect with each other are clustered into one class. A noisy sample may connect to foreground data, but there is no return connection, so the noisy sample will not be clustered into any foreground class. The second constraint is embodied in the edge weight calculation, where only foreground region pairs with both similar features and large foreground priors obtain small distances and are clustered into the same class. The third constraint is represented by the connection constraints of the K-nearest-neighbor selection, i.e., only regions from K different images are allowed to be connected by edges, which avoids clustering regions from the same image. According to Algorithm 2, the foregrounds can be classified into several clusters. Although Algorithm 2 can avoid clustering the noisy data, proposals of the same object may be split into several classes due to the strict cluster generation criterion. We therefore merge the clusters based on their similarity distances. A pair of clusters Si and Sj is merged when d(Si, Sj) < Tm, where d is the distance between two clusters and Tm is a threshold. The merging is performed iteratively until d(Si, Sj) ≥ Tm for every cluster pair (Si, Sj). Given a set of clusters S = {S1, S2, · · · , SNs}, we first merge Si and Sj if d(Si, Sj) is the minimum over all cluster pairs and d(Si, Sj) < Tm. We then update S by replacing Si and Sj with Si ∪ Sj, and repeat the merging procedure until the smallest distance is at least Tm.
Based on the fact that two similar clusters have more edges between each other, the distance between two clusters is measured by the edges between the cluster pair [30], which is represented by

    d(Si, Sj) = − Σ_{(u,v)∈Eij} w(u, v) / min(|Si|, |Sj|)    (3)

where Eij is the set of edges between Si and Sj, w(u, v) is the weight of edge (u, v), and |Si| is the number of samples in cluster Si. A large number of inter-cluster edges relative to the smaller cluster size yields a large similarity; the negative sign in (3) turns this similarity into a small distance for similar clusters. We denote by Gm the directed graph (constructed with Km nearest neighbours) used for counting the edges, and set Km = K and Tm = −2.

We display some clustering results in Figs. 2 (a)-(d), which show that both the foreground and background classes are successfully clustered by our method, such as the "Fishes" in Fig. 2 (d) and the "Person" in Fig. 2 (b). For each cluster, the elements are very similar and free of noise data. Furthermore, these clusters provide sufficient object priors for the subsequent prior propagation step.
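The cluster distance (3) and the iterative merging described above can be sketched as follows. For simplicity this sketch assumes unit edge weights, w(u, v) = 1, so the distance reduces to the negative inter-cluster edge count normalized by the smaller cluster size.

```python
def cluster_distance(Si, Sj, edges):
    """Eq. (3) with unit weights: negative count of edges between the two
    clusters, normalized by the smaller cluster size, so similar cluster
    pairs get a small (strongly negative) distance."""
    cross = sum(1 for u, v in edges
                if (u in Si and v in Sj) or (u in Sj and v in Si))
    return -cross / min(len(Si), len(Sj))

def merge_clusters(clusters, edges, Tm=-2):
    """Greedily merge the closest pair while its distance is below Tm."""
    clusters = [set(c) for c in clusters]
    while len(clusters) > 1:
        # find the pair with the minimal distance
        d, i, j = min((cluster_distance(a, b, edges), i, j)
                      for i, a in enumerate(clusters)
                      for j, b in enumerate(clusters) if i < j)
        if d >= Tm:          # smallest distance no longer below threshold
            break
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters
```

With five edges between two 2-element clusters, the distance is −5/2 = −2.5 < Tm = −2, so the pair is merged.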

D. The propagation of the clustering segments

We next generate and propagate the appearance models A = {A1, · · · , ANs} and the foreground probabilities C = {C1, · · · , CNs} for the clusters S = {S1, S2, · · · , SNs}, where Ai and Ci correspond to Si. Finally, we obtain the foreground probability maps B = {B1, · · · , BM} for all images based on A and C.

We first compute the appearance model Ai for each cluster Si. Denote the cluster as Si = {S1, S2, · · · , SNi}, where Sk is the k-th sample with feature hk, and define the matrix H = [h1, h2, · · · , hNi] by concatenating these features, so that each column of H is the feature of one sample. Then, we perform low-rank matrix discovery [33] on H to obtain the low-rank component L and the sparse component E, which is represented by

    min_{L,E} rank(L) + λ‖E‖1   s.t.   H = L + E    (4)
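The paper solves (4) with the ALM solver of [33]; as a hedged stand-in for a short sketch, the code below replaces the low-rank component with a per-bin median consensus (an assumption made purely for illustration) and keeps the subsequent representative-column rule, i.e., selecting the column whose residual (sparse error) has the smallest absolute sum.

```python
def select_representative(H):
    """H is a list of feature columns (one histogram per sample).
    Stand-in for RPCA: take the per-bin median as the consensus,
    the residual as the sparse error E, and pick the column whose
    error has the smallest absolute sum, i.e. the cleanest sample."""
    n_bins = len(H[0])
    # per-bin median across samples (crude low-rank consensus)
    consensus = [sorted(col[b] for col in H)[len(H) // 2]
                 for b in range(n_bins)]
    # column-wise absolute error sums: sum_j |E(j, i)|
    errors = [sum(abs(col[b] - consensus[b]) for b in range(n_bins))
              for col in H]
    best = min(range(len(H)), key=lambda i: errors[i])
    return H[best]
```

With two identical histograms and one outlier, the outlier accumulates a large error sum and one of the clean columns is returned.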

We next select the column i∗ with the minimal absolute sum in E, i.e., i∗ = arg min_i Σ_j |E(j, i)|, and treat hi∗ as the appearance model of Si, i.e., Ai = hi∗. In our method, we use the code in [33]¹ to perform the low-rank matrix discovery. The foreground probability Ci for Si is obtained by averaging the foreground probabilities of its samples, i.e.,

    Ci = Σ_{Pkj ∈ Si} Rjk / |Si|    (5)

where Pkj is an object proposal in Si and Rjk is calculated by (1).

¹http://perception.csl.illinois.edu/matrix-rank/sample_code.html

We next propagate the appearance models A and the foreground probabilities C based on each object proposal. Given Pji with feature fij, the foreground probability bij is propagated by

    bij = d(fij, Ak∗) · Ck∗    (6)

where

    k∗ = arg max_k d(fij, Ak)    (7)

and

    d(fij, fkl) = Σ_{a=1}^{β} min(fij(a), fkl(a))    (8)

where β is the length of fij. The distance in (8) is the histogram intersection, which is large when the two features are similar. The formula in (7) selects the nearest cluster Ak∗ according to (8); the similarity d(fij, Ak∗) and the foreground probability Ck∗ of the selected cluster are then used to calculate bij. Based on (6)-(8), each pixel in Pji is given the value bij. After processing all Pji ∈ P i, we obtain the foreground probability map Bi of Ii. Note that a pixel may receive more than one value of bij since it may belong to several proposals; in that case, we choose the one with the largest foreground probability. Furthermore, Bi is normalized so that its values lie in [0, 1].

We display some propagation results in Fig. 3, where the results on the FlickrMFC dataset and the ICoseg dataset are shown in

1051-8215 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.




Fig. 2. The clustering results obtained by our constrained directed graph clustering method.

Fig. 3(a) and Fig. 3(b), respectively. It is seen that the objects are assigned large foreground probabilities despite the complex backgrounds, such as the "Bears" and "Football players". Moreover, some images contain multiple classes, such as the "Butterflies" and "Flowers" in butterfly. The propagation method obtains high foreground probabilities for these multiple classes, which demonstrates its effectiveness. Based on the foreground probability maps, we finally obtain the multiple foregrounds by the graph-cuts algorithm, where the foreground probability map Bi is used as the unary term of the energy function in the Markov random field based segmentation model.

IV. THE EXPERIMENTAL RESULTS

In this section, we verify the proposed co-segmentation method on the FlickrMFC dataset [10] and the ICoseg dataset [19]. FlickrMFC consists of 14 groups, each of which contains more than two classes. ICoseg is a single-class image dataset widely used for co-segmentation verification. All the image groups in the FlickrMFC and ICoseg datasets are used in our experiments. In our method, a color histogram is used to represent each object proposal. The K in the KNN graph construction is adjusted among the groups to obtain better performance; its value lies in the range of 2 to 4.

A. The Co-segmentation Results

1) The verification on the FlickrMFC dataset: We first show the results on the FlickrMFC dataset. Some subjective co-segmentation results are shown in Fig. 4, where the results of six classes, namely Apple, Butterfly, Cow, Parrot, Dolphin and Fishing, are selected. For each image class, six original images and the co-segmentation results are presented. From Fig. 4, we can see that more than two classes exist in each group, and the classes contained in each image vary. It is also seen that the proposed multi-object co-segmentation method successfully segments the common objects from these images. For example, in cow, both the "cow" and the "person" are simultaneously extracted, which mainly benefits from the accurate generation and propagation of the object priors by the proposed method.

To verify the proposed method, we compare it with several state-of-the-art co-segmentation methods, namely the methods in [18], [24], [26] and [10]. The method in [18] is a single-class co-segmentation method that formulates co-segmentation as a discriminative clustering classifier search problem; it can extract similar regions from dissimilar backgrounds. In [24], a linear anisotropic diffusion system based multi-class co-segmentation method is proposed, which extracts multiple common regions among the images with fast speed and low computational cost. An EM-based algorithm combining spectral and discriminative clustering was proposed in [26], which iteratively updates the foreground priors and the common region extractions given the number of foreground classes. In [10], a multiple foreground co-segmentation method is proposed, which iteratively learns the foreground models and the pixel assignments from image-level labels. For a fair comparison, the source codes²,³ released by the authors are used in our experiments. Furthermore, we adjust the corresponding parameters of each method to obtain better co-segmentation performance. In our experiments, the Chi-square kernel is selected for the method in [18]. In the experiments of [24] and [26], the number of segments (K) and the foreground label are adjusted for better co-segmentation performance. Because there are no image-level labels on the ICoseg dataset for the method in [10], we manually set the labels for comparison.

²www.di.ens.fr/∼joulin
³http://www.cs.cmu.edu/∼gunhee

We also objectively evaluate the proposed co-segmentation method with the intersection-over-union (IOU) metric, defined as the ratio of the area of the intersection of the foreground regions of the result and the ground truth to the area of their union, i.e., |GTi ∩ Ri| / |GTi ∪ Ri|. A large IOU value corresponds to accurate co-segmentation. The mean IOU value over the images is used to evaluate the performance of each group. The IOU values of the proposed co-segmentation method on the FlickrMFC dataset are shown in Table I. We can see that the proposed method achieves high accuracy on many classes, such as Apple and Butterfly, which demonstrates its effectiveness.

The IOU values of the existing methods on the FlickrMFC dataset are also shown in Table I. We can see that the method in [10] obtains the largest IOU value (0.416) among the existing methods thanks to the image-level labels, which help the foreground prior generation. Meanwhile, the average IOU values


Fig. 3. The foreground probability maps obtained by the proposed propagation method. (a) The results on Flickr MFC dataset. (b) The results on ICoseg dataset.
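The propagation rule in (6)-(8) amounts to a nearest-cluster lookup under histogram intersection; the following is a minimal sketch, in which the feature vectors and cluster appearance models are toy values, not ones from the paper.

```python
def hist_intersection(f, g):
    """Eq. (8): histogram intersection of two features; the value is
    large when the two histograms are similar."""
    return sum(min(a, b) for a, b in zip(f, g))

def propagate(f, models, probs):
    """Eqs. (6)-(7): pick the cluster whose appearance model is most
    similar to the proposal feature f (eq. 7), then weight that
    cluster's foreground probability by the similarity (eq. 6)."""
    k_star = max(range(len(models)),
                 key=lambda k: hist_intersection(f, models[k]))
    return hist_intersection(f, models[k_star]) * probs[k_star]
```

For a proposal feature [0.9, 0.1] and models [1, 0] and [0, 1] with probabilities 0.8 and 0.5, the first model wins with intersection 0.9, giving b = 0.9 × 0.8 = 0.72.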

Fig. 4. The segmentation results of the proposed method on FlickrMFC dataset.

of the existing methods are smaller than 0.42, which is caused by the foreground variations and the cluttered backgrounds in the FlickrMFC dataset. Compared with the existing methods, the proposed method achieves the largest IOU values on most classes (13 of 14). Meanwhile, the average IOU value over all classes of the FlickrMFC dataset is 0.547, which is at least 0.1309 larger than those of the existing co-segmentation methods. These comparisons demonstrate that the proposed method is effective for multiple foreground co-segmentation.

2) The verification on the ICoseg dataset: We next verify the proposed method on the ICoseg dataset. The segmentation results are displayed in Fig. 5. For each image class, six original images and the corresponding co-segmentation results are exhibited. It is seen that the original images contain similar backgrounds, such as the "grass" and "sky" in Liverpool and Souwester, respectively. Meanwhile, the proposed method successfully segments the common objects from these images, such as the "soccer players" and "stones" in Souwester. Furthermore, several classes contain multiple foregrounds, such as the players in red and yellow clothes in Liverpool; these "players" are all extracted by the proposed method.

The objective IOU values of the proposed method on the ICoseg dataset are shown in Table II. It is seen that our method obtains large IOU values on many classes, such as Pandas, Cheetah and Rich. We display the IOU values of the existing co-segmentation methods [18], [24], [26] and [10] in Table II for comparison. It is seen that the proposed method achieves the largest IOU value on many classes (20 of 38). Moreover, the mean IOU value of the proposed method

(0.567) is about 0.104 larger than the largest mean IOU value (0.463) of the existing methods, which further demonstrates the effectiveness of the proposed method. There are also 18 of the 38 classes on which our method does not obtain the best IOU value. The reason is that the ICoseg dataset contains one common object per group (single-class co-segmentation) with simpler backgrounds compared with the FlickrMFC dataset, so the existing methods can also perform well on some classes with careful parameter setting; for example, the method of [24] achieves the best IOU values on 9 classes. Nevertheless, even where our IOU values are not the largest, they remain comparable with the best values. Hence, the average IOU value of the proposed method is larger than those of the existing methods.

B. The Directed Graph Clustering Results

We next verify the proposed directed graph clustering method on the object image databases COIL-20 and COIL-100 [30], which contain 20 and 100 object classes, respectively. Several modifications of the proposed directed graph clustering method are made for this verification. Firstly, we remove the edge connection constraints in constructing the KNN graph, and use the k nearest feature distances between the image features to generate the directed graph. Secondly, we merge the clustering results by the method in Section III-C until a given class number is reached. Therefore, the clustering algorithm can perform the clustering task


TABLE I
RESULTS COMPARISON BETWEEN THE PROPOSED CO-SEGMENTATION METHOD AND THE EXISTING METHODS ON THE FLICKRMFC DATASET IN TERMS OF IOU.

Class      | [18]  | [24]  | [26]  | [10]  | Ours
-----------|-------|-------|-------|-------|------
Apple      | 0.505 | 0.397 | 0.548 | 0.657 | 0.779
Baseball   | 0.532 | 0.261 | 0.487 | 0.401 | 0.609
Butterfly  | 0.560 | 0.427 | 0.584 | 0.406 | 0.723
Cheetah    | 0.442 | 0.252 | 0.407 | 0.337 | 0.568
Cow        | 0.416 | 0.272 | 0.498 | 0.549 | 0.577
Dog        | 0.333 | 0.318 | 0.343 | 0.375 | 0.528
Dolphin    | 0.457 | 0.267 | 0.464 | 0.341 | 0.544
Fishing    | 0.204 | 0.189 | 0.234 | 0.304 | 0.471
Gorilla    | 0.325 | 0.214 | 0.270 | 0.319 | 0.399
Liberty    | 0.487 | 0.255 | 0.163 | 0.427 | 0.520
Parrot     | 0.399 | 0.472 | 0.397 | 0.436 | 0.485
Stonehenge | 0.526 | 0.494 | 0.645 | 0.531 | 0.466
Swan       | 0.282 | 0.203 | 0.300 | 0.312 | 0.497
Thinker    | 0.322 | 0.285 | 0.258 | 0.425 | 0.491
Average    | 0.414 | 0.307 | 0.400 | 0.416 | 0.547
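The IOU metric reported in Tables I and II can be computed as follows; this sketch represents each binary mask as a set of foreground pixel coordinates, which is an implementation choice made here for brevity.

```python
def iou(result, ground_truth):
    """Intersection-over-union of two binary masks given as sets of
    foreground pixel coordinates: |R ∩ GT| / |R ∪ GT|."""
    inter = len(result & ground_truth)
    union = len(result | ground_truth)
    return inter / union if union else 1.0  # two empty masks agree fully
```

Two masks sharing two of four total foreground pixels score 2/4 = 0.5.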

Fig. 5. The segmentation results of the proposed method on ICoseg dataset.

TABLE II
RESULTS COMPARISON BETWEEN THE PROPOSED CO-SEGMENTATION METHOD AND THE EXISTING METHODS ON THE ICOSEG DATASET IN TERMS OF IOU.

Class      | [18]  | [24]  | [26]  | [10]  | Ours
-----------|-------|-------|-------|-------|------
Alaskan    | 0.413 | 0.308 | 0.453 | 0.231 | 0.534
Sox        | 0.553 | 0.634 | 0.541 | 0.511 | 0.508
Rjt208     | 0.375 | 0.758 | 0.386 | 0.161 | 0.378
Souwester  | 0.459 | 0.296 | 0.581 | 0.712 | 0.538
Liverpool  | 0.397 | 0.453 | 0.413 | 0.279 | 0.435
Ferrari    | 0.520 | 0.660 | 0.527 | 0.539 | 0.614
Taj        | 0.734 | 0.457 | 0.734 | 0.392 | 0.469
Inde       | 0.198 | 0.260 | 0.161 | 0.233 | 0.305
Egypt      | 0.557 | 0.440 | 0.590 | 0.143 | 0.397
Elephants  | 0.310 | 0.534 | 0.454 | 0.248 | 0.400
Goose      | 0.711 | 0.791 | 0.697 | 0.613 | 0.547
Pandas     | 0.363 | 0.400 | 0.658 | 0.419 | 0.710
Helicopter | 0.609 | 0.091 | 0.283 | 0.191 | 0.765
Planes     | 0.249 | 0.034 | 0.223 | 0.148 | 0.304
Huntsville | 0.137 | 0.019 | 0.086 | 0.150 | 0.449
Cheetah    | 0.215 | 0.378 | 0.541 | 0.372 | 0.769
Pandas2    | 0.393 | 0.351 | 0.396 | 0.556 | 0.641
Brighton   | 0.489 | 0.222 | 0.519 | 0.367 | 0.723
kitekid    | 0.221 | 0.290 | 0.223 | 0.406 | 0.496
Margate    | 0.678 | 0.491 | 0.678 | 0.348 | 0.571
Colt       | 0.280 | 0.107 | 0.282 | 0.492 | 0.475
Gymna1     | 0.304 | 0.538 | 0.307 | 0.569 | 0.594
Gymna2     | 0.205 | 0.325 | 0.234 | 0.312 | 0.512
Gymna3     | 0.525 | 0.346 | 0.525 | 0.399 | 0.728
Rich       | 0.611 | 0.338 | 0.610 | 0.213 | 0.772
Grand      | 0.479 | 0.515 | 0.488 | 0.424 | 0.660
Hall-red   | 0.396 | 0.504 | 0.396 | 0.592 | 0.614
Hall-white | 0.486 | 0.520 | 0.514 | 0.065 | 0.484
Monks      | 0.631 | 0.727 | 0.649 | 0.504 | 0.726
Balloons   | 0.484 | 0.364 | 0.228 | 0.468 | 0.557
Liberty    | 0.879 | 0.285 | 0.860 | 0.484 | 0.926
Christ     | 0.809 | 0.438 | 0.324 | 0.378 | 0.764
Speed      | 0.227 | 0.149 | 0.112 | 0.186 | 0.202
Track      | 0.190 | 0.179 | 0.169 | 0.153 | 0.422
Windmill   | 0.269 | 0.176 | 0.262 | 0.334 | 0.523
Vincent    | 0.600 | 0.903 | 0.755 | 0.297 | 0.762
SanSuiSo   | 0.841 | 0.654 | 0.844 | 0.485 | 0.717
Brown      | 0.788 | 0.497 | 0.764 | 0.405 | 0.562
Average    | 0.463 | 0.406 | 0.460 | 0.363 | 0.567


based on a given class number for a fair comparison. Thirdly, a larger Km is suggested in the calculation of (3) for the COIL-20 and COIL-100 datasets [30]. Here, we set K = 3 and Km = 20 for the KNN graph construction in the clustering and merging steps, respectively. In the verification, we first perform the clustering algorithm on all the images in the COIL-20 and COIL-100 datasets to obtain 20 and 100 classes, respectively. Then, we compare the results with the ground truth and measure the clustering performance by normalized mutual information (NMI) [34]; a larger NMI value indicates a better clustering result. Grey-level pixel values are used as the image features, and the Euclidean distance is used to generate the KNN graph. The NMI values of the proposed method and the comparison methods, namely k-med [35], Link [35], G-Link [35], NCuts [36], NJW-SC [37], DGSC [38], STSC [39], Zell [40] and AGDL [30], are shown in Table III. It is seen that the NMI values of the proposed method are larger than those of the comparison methods on both datasets, which demonstrates that the proposed clustering method outperforms them.
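NMI between a predicted and a ground-truth labeling can be sketched as follows. This version normalizes the mutual information by the geometric mean of the two entropies, which is one common convention; [34] may use a different normalization.

```python
from collections import Counter
from math import log, sqrt

def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings of the same
    samples, MI(A, B) / sqrt(H(A) * H(B))."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # MI = sum p(a,b) * log( p(a,b) / (p(a) p(b)) )
    mi = sum(c / n * log(c * n / (ca[a] * cb[b]))
             for (a, b), c in joint.items())
    ha = -sum(c / n * log(c / n) for c in ca.values())
    hb = -sum(c / n * log(c / n) for c in cb.values())
    if ha == 0 or hb == 0:              # a constant labeling
        return 1.0 if ha == hb else 0.0
    return mi / sqrt(ha * hb)
```

NMI is invariant to label permutation: a labeling and its relabeled copy score 1.0, which is why it is suitable for comparing clusterings with the ground truth.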

Fig. 8. The mean IOU values obtained by the proposed method for different selections of K on the FlickrMFC and ICoseg datasets. The x-axis is the value of K; the y-axis is the mean IOU value.

C. Discussion

In our experiments, we adjust the number of nearest neighbours K to obtain better co-segmentation results. We first discuss the sensitivity of the proposed method to the setting of K. The IOU values of the proposed method under different settings of K (K = 2, 3, 4) are shown in Fig. 6 and Fig. 7 for the FlickrMFC and ICoseg datasets, respectively. The last column of each figure is the average IOU value over all groups. It is seen that the largest average IOU values are obtained by setting K = 3 and K = 2 for the FlickrMFC dataset and the ICoseg dataset, respectively. Moreover, the differences between the average IOU values for K = 2, 3 and 4 are small on both datasets, which demonstrates the robustness of our method to the setting of K.

In our method, K is the number of nearest neighbours linked to each node, and we select small values K ∈ [2, 4]. The reason is that a small K (such as K = 2, 3 or 4) captures only the most similar relationships between data (with few noise relationships), which leads to many small clusters for a group of images (≫ 3). These small clusters contain consistent data with little noise. A large K, in contrast, describes more relationships between data but also introduces much noise information, leading to large clusters containing noise data that is difficult to remove. Since small clusters can be combined into large clusters by the merging step, a small K is able to generate large clusters with little noise data. Hence, we select a small K in [2, 4] for clustering; when a large K is selected, more noise data will be clustered, which inevitably interferes with the class prior generation and leads to unsuccessful co-segmentation.

Fig. 8 displays the mean IOU values obtained by the proposed method for different selections of K on the FlickrMFC and ICoseg datasets, where the x-axis is the value of K and the y-axis is the mean IOU value. It is seen that


Fig. 11. Failure segmentation results of the proposed method. (a)-(c): the three original images, the segmentation results, and the ground-truth results, respectively.

a larger K (such as K = 6, 8, 10) leads to smaller mean IOU values on both the FlickrMFC and ICoseg datasets, whereas large average IOU values are usually obtained for K ∈ [2, 4] on both datasets. Hence, we select K ∈ [2, 4].

We next compare our segmentation results with those of several existing co-segmentation methods. The results for the classes Apple and Butterfly from the FlickrMFC dataset and Pandas2 and Hall-red from the ICoseg dataset are shown in Fig. 9 and Fig. 10, respectively. The original images and the segmentation results of the methods in [18], [24], [26], [10] and the proposed method are shown from the top row to the bottom row, respectively. It is seen that the existing methods fail to extract the foreground regions in some images; for example, in the fifth image of Butterfly in Fig. 9, the "butterfly" is segmented as background by [18], [24], [26] and [10]. Moreover, in some images the background is wrongly segmented as the common object, such as in the results of the fourth image of Pandas2 by the methods [18], [24] and [10] in Fig. 10. The proposed method successfully extracts the common object from these images. The reason is that the existing methods are disturbed by the noise data, which leads to wrong segmentations of both the foreground and the background. The proposed clustering algorithm instead removes the noise data in the clustering step, which leads to more accurate clustering and thus more successful co-segmentation.

We further display some failure results in Fig. 11, where


TABLE III
THE NMI VALUES OF THE PROPOSED DIRECTED GRAPH CLUSTERING ALGORITHM AND THE COMPARISON CLUSTERING ALGORITHMS.

Dataset  | k-med | Link  | G-Link | NCuts | NJW-SC | DGSC  | STSC  | Zell  | AGDL  | Ours
---------|-------|-------|--------|-------|--------|-------|-------|-------|-------|-------
COIL-20  | 0.710 | 0.647 | 0.896  | 0.884 | 0.889  | 0.904 | 0.895 | 0.911 | 0.937 | 0.9629
COIL-100 | 0.706 | 0.606 | 0.855  | 0.823 | 0.854  | 0.858 | 0.858 | 0.913 | 0.933 | 0.9460

Fig. 6. The IOU values of the classes in FlickrMFC dataset under different K.

Fig. 7. The IOU values of the classes in ICoseg dataset under different K.

Fig. 9. From the top row to the bottom row: the original images of Apple and Butterfly (from the FlickrMFC dataset), and the segmentation results of the methods in [18], [24], [26], [10] and the proposed method, respectively.

Fig. 10. From the top row to the bottom row: the original images of Pandas2 and Hall-red (from the ICoseg dataset), and the segmentation results of the methods in [18], [24], [26], [10] and the proposed method, respectively.


the three original images, the segmentation results and the ground-truth results are shown in Fig. 11 (a)-(c), respectively. It is seen that our method fails to segment the common object "Fish" from the image in the top row. Moreover, the object "Hand" is not extracted from the image in the middle row, and the result for the image in the bottom row contains the background "Sky". Two reasons cause these failures. The first is that the foregrounds are very similar to the backgrounds, which makes the co-segmentation difficult, such as the similar "shark" and "seafloor" in the image of the top row. The second is that our method assumes that a part of the initial object prior is correct; when most of the initial foreground probabilities of an object are incorrect, as for "Hand" and "Sky", the object will be wrongly extracted due to the incorrect foreground probability priors.

In our method, the saliency information is used in two steps, i.e., generating the proposals (in the proposal generation method of [3]) and performing the K nearest neighbour search (for the calculation of the edge weights). Similar to the existing co-segmentation models [3], [7], [23], [25], saliency plays an important part in our method since it helps distinguish the common objects from similar backgrounds. Moreover, the saliency information can easily be incorporated into the proposed directed graph clustering algorithm to improve the final co-segmentation performance.

The proposed method uses two graphs, i.e., the graph in the graph-cut segmentation (the last step) and the graph in the proposed clustering method. The first is an undirected graph (the classical graph in graph-cut segmentation), where the nodes are the image pixels and the edges describe the similarities between neighbouring pixels (such as a 3 × 3 neighbourhood).
The graph in our clustering method is instead a directed graph, where the nodes represent the object proposal regions and the directed edges describe the K-nearest relationships between regions; the two graphs are thus different.

To compare the proposed clustering method with an existing directed graph clustering method, we further replace our partition criterion with the generalized weighted cut of the directed graph (spectral) clustering method [28], which extends the well-known minimum weight cut [36] by searching for a partition with low cut weight and balanced cluster sizes. For a fair comparison, the new partition is performed on our directed graph, and spectral decomposition is used to search for the partition. We implemented the clustering algorithm [28] in MATLAB. The number of nearest neighbours (K ∈ [2, 4]) and the number of clustering centers (varied in [4, 7]) are also tuned to achieve better results. The comparison results are shown in Table IV. It is seen that our method obtains larger IOU values, because the noise data is rejected by our clustering criterion so that more accurate object priors are generated, which improves the co-segmentation performance.

V. CONCLUSION

In this paper, we design a new constrained directed graph clustering method and segmentation propagation method for the multiple foreground co-segmentation problem, which consists of two steps: clustering and propagation. The clustering step

TABLE IV
THE SEGMENTATION RESULTS OBTAINED BY REPLACING OUR CLUSTERING METHOD WITH THE DIRECTED GRAPH CLUSTERING METHOD [28]. THE METHODS BASED ON [28] AND ON OUR CLUSTERING ARE NAMED [28]+COSEG AND OUR+COSEG, RESPECTIVELY.

Method     | FlickrMFC Dataset | ICoseg Dataset
-----------|-------------------|---------------
[28]+Coseg | 0.489             | 0.539
Our+Coseg  | 0.547             | 0.567

generates the appearance priors and foreground probability priors for the multiple classes. These priors are then propagated in the second step to distinguish and segment the foregrounds from the complicated backgrounds. We use a KNN directed graph to represent the similarity relationships between the proposals, and propose a new constrained directed graph clustering method that clusters proposals which are not only very similar to each other but also well satisfy the initial object priors. A new object proposal based segmentation propagation method is also proposed for the prior propagation. We verify the proposed method on both co-segmentation and clustering tasks, and the experimental results demonstrate its effectiveness.

REFERENCES

[1] C. Rother, V. Kolmogorov, T. Minka, and A. Blake, "Cosegmentation of image pairs by histogram matching - incorporating a global constraint into MRFs," in IEEE Conference on Computer Vision and Pattern Recognition, 2006.
[2] H. Li, F. Meng, Q. Wu, and B. Luo, "Unsupervised multi-class region cosegmentation via ensemble clustering and energy minimization," IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, pp. 789-801, May 2014.
[3] F. Meng, H. Li, G. Liu, and K. N. Ngan, "Object co-segmentation based on shortest path algorithm and saliency model," IEEE Transactions on Multimedia, vol. 14, pp. 1429-1441, Oct. 2012.
[4] F. Meng, H. Li, G. Liu, and K. N. Ngan, "Image cosegmentation by incorporating color reward strategy and active contour model," IEEE Transactions on Cybernetics, vol. 43, pp. 725-737, Apr. 2013.
[5] H. Li, F. Meng, and K. N. Ngan, "Co-salient object detection from multiple images," IEEE Transactions on Multimedia, vol. 15, pp. 1896-1909, Dec. 2013.
[6] L. Zhang, L. Wang, W. Lin, and S. Yan, "Geometric optimum experimental design for collaborative image retrieval," IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, pp. 346-359, Feb. 2014.
[7] K. Chang, T. Liu, and S. Lai, "From co-saliency to co-segmentation: An efficient and fully unsupervised energy minimization model," in IEEE Conference on Computer Vision and Pattern Recognition, 2011.
[8] Y. Chai, E. Rahtu, and V. Lempitsky, "TriCoS: A tri-level class-discriminative co-segmentation method for image classification," in European Conference on Computer Vision, 2012.
[9] S.-S. Lin, C.-H. Lin, S.-H. Chang, and T.-Y. Lee, "Object-coherence warping for stereoscopic image retargeting," IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, pp. 759-768, May 2014.
[10] G. Kim and E. P. Xing, "On multiple foreground cosegmentation," in IEEE Conference on Computer Vision and Pattern Recognition, 2012.
[11] F. Meng, H. Li, K. N. Ngan, L. Zeng, and Q. Wu, "Feature adaptive co-segmentation by complexity awareness," IEEE Transactions on Image Processing, vol. 22, pp. 4809-4824, Dec. 2013.
[12] M. Heritier, L. Gagnon, and S. Foucher, "Places clustering of full-length film key-frames using latent aspect modeling over SIFT matches," IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, pp. 832-841, June 2009.
[13] F. Meng, B. Luo, and C. Huang, "Object co-segmentation based on directed graph clustering," in Visual Communications and Image Processing (VCIP), pp. 1-5, Nov. 2013.


[14] L. Mukherjee, V. Singh, and C. R. Dyer, "Half-integrality based algorithms for cosegmentation of images," in IEEE Conference on Computer Vision and Pattern Recognition, 2009.
[15] D. S. Hochbaum and V. Singh, "An efficient algorithm for co-segmentation," in International Conference on Computer Vision, 2009.
[16] S. Vicente, V. Kolmogorov, and C. Rother, "Cosegmentation revisited: models and optimization," in European Conference on Computer Vision, 2010.
[17] D. Batra, D. Parikh, A. Kowdle, T. Chen, and J. Luo, "Seed image selection in interactive cosegmentation," in IEEE International Conference on Image Processing, 2009.
[18] A. Joulin, F. Bach, and J. Ponce, "Discriminative clustering for image co-segmentation," in IEEE Conference on Computer Vision and Pattern Recognition, 2010.
[19] D. Batra, A. Kowdle, and D. Parikh, "iCoseg: interactive co-segmentation with intelligent scribble guidance," in IEEE Conference on Computer Vision and Pattern Recognition, pp. 3169-3176, 2010.
[20] L. Mukherjee, V. Singh, and J. Peng, "Scale invariant cosegmentation for image groups," in IEEE Conference on Computer Vision and Pattern Recognition, 2011.
[21] S. Vicente, C. Rother, and V. Kolmogorov, "Object cosegmentation," in IEEE Conference on Computer Vision and Pattern Recognition, 2011.
[22] M. Collins, J. Xu, L. Grady, and V. Singh, "Random walks for multi-image cosegmentation: Quasiconvexity results and GPU-based solutions," in IEEE Conference on Computer Vision and Pattern Recognition, 2012.
[23] J. Rubio, J. Serrat, A. López, and N. Paragios, "Unsupervised co-segmentation through region matching," in IEEE Conference on Computer Vision and Pattern Recognition, 2012.
[24] G. Kim, E. P. Xing, L. Fei-Fei, and T. Kanade, "Distributed cosegmentation via submodular optimization on anisotropic diffusion," in International Conference on Computer Vision, 2011.
[25] Y. Chai, V. Lempitsky, and A. Zisserman, "BiCoS: A bi-level co-segmentation method for image classification," in International Conference on Computer Vision, 2011.
[26] A. Joulin, F. Bach, and J. Ponce, "Multi-class cosegmentation," in IEEE Conference on Computer Vision and Pattern Recognition, 2012.
[27] G. Kim, E. P. Xing, L. Fei-Fei, and T. Kanade, "Distributed cosegmentation via submodular optimization on anisotropic diffusion," in International Conference on Computer Vision, 2011.
[28] M. Meila and W. Pentney, "Clustering by weighted cuts in directed graphs," in Proceedings of the 2007 SIAM International Conference on Data Mining, 2007.
[29] B. J. Frey and D. Dueck, "Clustering by passing messages between data points," Science, vol. 315, pp. 972-976, 2007.
[30] W. Zhang, X. Wang, D. Zhao, and X. Tang, "Graph degree linkage: Agglomerative clustering on a directed graph," in European Conference on Computer Vision, 2012.
[31] M. Cheng, G. Zhang, N. J. Mitra, X. Huang, and S. Hu, "Global contrast based salient region detection," in IEEE Conference on Computer Vision and Pattern Recognition, 2011.
[32] L. Xu, H. Li, L. Zeng, and K. N. Ngan, "Saliency detection using joint spatial-color constraint and multi-scale segmentation," Journal of Visual Communication and Image Representation, vol. 24, pp. 465-476, May 2013.
[33] Z. Lin, M. Chen, and Y. Ma, "The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices," UIUC Tech. Rep. UILU-ENG-09-2214, 2010.
[34] M. Wu and B. Schölkopf, "A local learning approach for clustering," in Proceedings of Advances in Neural Information Processing Systems, pp. 1529-1536, 2006.
[35] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction, Second edition. Springer, 2009.
[36] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888-905, 2000.
[37] A. Y. Ng, M. I. Jordan, and Y. Weiss, "On spectral clustering: Analysis and an algorithm," Proceedings of Advances in Neural Information Processing Systems, vol. 14, pp. 849-856, 2001.
[38] D. Zhou, J. Huang, and B. Schölkopf, "Learning from labeled and unlabeled data on a directed graph," in International Conference on Machine Learning, pp. 1036-1043, ACM, 2005.
[39] L. Zelnik-Manor and P. Perona, "Self-tuning spectral clustering," in Proceedings of Advances in Neural Information Processing Systems, vol. 17, p. 16, 2004.
[40] D. Zhao and X. Tang, "Cyclizing clusters via zeta function of a graph," in Proceedings of Advances in Neural Information Processing Systems, pp. 1953-1960, 2008.

Fanman Meng (S’12-M’14) received the Ph.D. degree in signal and information processing from the University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2014. From July 2013 to July 2014, he was a Research Assistant with the Division of Visual and Interactive Computing, Nanyang Technological University, Singapore. He is currently an Associate Professor in the School of Electronic Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, China. His research interests include image segmentation and object detection. Dr. Meng has authored or co-authored numerous technical articles in well-known international journals and conferences. He received the Best Student Paper Honorable Mention Award at the 12th Asian Conference on Computer Vision (ACCV 2014) in Singapore and a Top 10% Paper Award at the IEEE International Conference on Image Processing (ICIP 2014) in Paris, France. He is a member of the IEEE and the IEEE CAS Society.

Hongliang Li (SM’12) received his Ph.D. degree in Electronics and Information Engineering from Xi’an Jiaotong University, China, in 2005. From 2005 to 2006, he was a Research Associate with the Visual Signal Processing and Communication Laboratory (VSPC) of the Chinese University of Hong Kong (CUHK). From 2006 to 2008, he was a Postdoctoral Fellow at the same laboratory in CUHK. He is currently a Professor in the School of Electronic Engineering, University of Electronic Science and Technology of China. His research interests include image segmentation, object detection, image and video coding, visual attention, and multimedia communication systems. Dr. Li has authored or co-authored numerous technical articles in well-known international journals and conferences. He is a co-editor of the Springer book “Video Segmentation and Its Applications”. Dr. Li has been involved in many professional activities. He is a member of the Editorial Board of the Journal on Visual Communications and Image Representation, and an Area Editor of Signal Processing: Image Communication, Elsevier Science. He served as a Technical Program Co-chair of ISPACS 2009, General Co-chair of ISPACS 2010, Publicity Co-chair of IEEE VCIP 2013, Local Chair of IEEE ICME 2014, and a TPC member of a number of international conferences, e.g., ICME 2013, ICME 2012, ISCAS 2013, PCM 2007, PCM 2009, and VCIP 2010. He serves as a Technical Program Co-chair for IEEE VCIP 2016. Dr. Li was selected as one of the New Century Excellent Talents in University by the Chinese Ministry of Education, China, in 2008. He is a senior member of the IEEE.

Shuyuan Zhu (S’08-A’09-M’13) received the Ph.D. degree from the Hong Kong University of Science and Technology, Kowloon, Hong Kong, in 2010. He is currently an Associate Professor with the University of Electronic Science and Technology of China, Chengdu, China. His research interests focus on image/video compression.

1051-8215 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


Bing Luo received the B.Sc. degree in communication engineering and the M.Sc. degree in computer application technology in 2009 and 2012, respectively. Since September 2012, he has been working toward the Ph.D. degree in the Intelligent Visual Information Processing and Communication Laboratory (IVIPC) at the University of Electronic Science and Technology of China (UESTC), supervised by Professor Hongliang Li. His research interests include image and video segmentation, and machine learning.

Chao Huang received the B.Sc. degree from the University of Electronic Science and Technology of China in 2012, advised by Professor Hongliang Li. He is currently working toward the Ph.D. degree in the Institute of Image Processing at the University of Electronic Science and Technology of China. His research interests include image classification and segmentation.

Bing Zeng received his B.Eng. and M.Eng. degrees from the University of Electronic Science and Technology of China, Chengdu, Sichuan, China, in 1983 and 1986, respectively, and his Ph.D. degree from Tampere University of Technology, Finland, in 1991, all in electrical engineering. He worked as a postdoctoral fellow at the University of Toronto during 1991-1992 and as a researcher at Concordia University during 1992-1993. He was a visiting researcher at Microsoft Research Asia, Beijing, China, in 2000. He joined the Hong Kong University of Science and Technology in 1993 and is currently a full professor in the Department of Electronic and Computer Engineering. Since August 2013, he has been a National “1000-Talent-Program” Chair Professor at the University of Electronic Science and Technology of China, where he leads the Institute of Image Processing working on image/video coding and processing, 3D/multi-view video technology, and visual big data. In 2014, he received a 2nd Class Natural Science Award (as the first recipient) from the Chinese Ministry of Education. Prof. Zeng’s general research interests include image/video coding and processing, image super-resolution, endoscopy image/video processing, compressive sensing (CS) theory and applications, light-field signal processing, and HDTV technology. These research activities have generated over 200 publications in various leading journals and conferences.
Three representative works are as follows: one paper on fast block motion estimation, published in IEEE TCSVT in 1994, has been SCI-cited more than 870 times (Google-cited more than 1800 times) and currently stands at the 7th position among all papers published in this IEEE journal (since its launch in 1991); one paper on smart padding for arbitrarily-shaped image blocks, published in IEEE TCSVT in 2001, led to a US patent that has been successfully licensed to industry; and one paper on directional transforms, published in IEEE TCSVT in 2008, received the 2011 IEEE Transactions on Circuits and Systems for Video Technology Best Paper Award. He also received the best paper award at ChinaCom three times (2009 Xi’an, 2010 Beijing, and 2012 Kunming). Prof. Zeng served as an associate editor for the IEEE Transactions on Circuits and Systems for Video Technology during 1995-1999 and 2010-2014. He is now on the Editorial Board of the Journal of Visual Communication and Image Representation. He has also served in various positions in a number of international conferences. He is a member of the Visual Signal Processing & Communications Technical Committee of the IEEE CAS Society and the Multimedia Communications Technical Committee of the IEEE COM Society. He will be General Chair of IEEE VCIP 2016 to be held in Chengdu, China.

Moncef Gabbouj received his B.S. degree in electrical engineering in 1985 from Oklahoma State University, Stillwater, and his M.S. and Ph.D. degrees in electrical engineering from Purdue University, West Lafayette, Indiana, in 1986 and 1989, respectively. Dr. Gabbouj has been an Academy Professor with the Academy of Finland since January 2011. He has held several visiting professorships at different universities, including The Hong Kong University of Science and Technology, Hong Kong (2012-2013), Purdue University, West Lafayette, Indiana, USA (August-December 2011), the University of Southern California (January-June 2012), and the American University of Sharjah, UAE (2007-2008). He holds a permanent position of Professor at the Department of Signal Processing, Tampere University of Technology, Tampere, Finland, where he leads the Multimedia Research Group. He was Head of the Department during 2002-2007, and served as Senior Research Fellow of the Academy of Finland in 1997-1998 and 2007-2008. His research interests include multimedia content-based analysis, indexing and retrieval, machine learning, nonlinear signal and image processing and analysis, voice conversion, and video processing and coding. Dr. Gabbouj is a Fellow of the IEEE. He is currently the Chairman of the DSP Technical Committee of the IEEE Circuits and Systems Society and a member of the IEEE Fourier Award for Signal Processing Committee. He was Honorary Guest Professor of Jilin University, China (2005-2010). He served as Distinguished Lecturer for the IEEE Circuits and Systems Society in 2004-2005, and Past-Chairman of the IEEE-EURASIP NSIP (Nonlinear Signal and Image Processing) Board. He was chairman of the Algorithm Group of the EC COST 211quat. He served as associate editor of the IEEE Transactions on Image Processing, and was guest editor of Multimedia Tools and Applications and the European journal Applied Signal Processing.
He is the past chairman of the IEEE Finland Section, the IEEE Circuits and Systems Society Technical Committee on Digital Signal Processing, and the IEEE SP/CAS Finland Chapter. He was also Chairman of CBMI 2005 and WIAMIS 2001, and the TPC Chair of ISCCSP 2012, 2006 and 2004, CBMI 2003, EUSIPCO 2000, NORSIG 1996, and the DSP track co-chair of the 2013, 2012, 2011, and 1996 IEEE ISCAS. He is also a member of the EURASIP Advisory Board and a past member of its AdCom. He served as Publication Chair of IEEE ICIP 2005, Publicity Chair of IEEE ICASSP 2006, and Innovation Chair of ICIP 2011. He is a member of the IEEE SP and CAS societies. Dr. Gabbouj was the Director of the International University Programs in Information Technology (1991-2007) and vice member of the Council of the Department of Information Technology at Tampere University of Technology. He is also the Vice-Director of the Academy of Finland Center of Excellence SPAG, Secretary of the International Advisory Board of the Tampere International Center of Signal Processing (TICSP), and a member of the Board of the Digital Media Institute. He served as Tutoring Professor for the Nokia Mobile Phones Leading Science Program (2005-2007 and 1998-2001). Dr. Gabbouj supervised the main authors receiving the Best Student Paper Awards at the IEEE International Symposium on Multimedia, ISM 2011, and the 4th European Workshop on Visual Information Processing, EUVIP 2013. Two of his papers were ranked among the top 10% at the IEEE International Conference on Image Processing, ICIP 2013. He was the recipient of the 2012 Nokia Foundation Visiting Professor Award, the 2005 Nokia Foundation Recognition Award, and co-recipient of the Myril B. Reed Best Paper Award from the 32nd Midwest Symposium on Circuits and Systems and the NORSIG Best Paper Award from the 1994 Nordic Signal Processing Symposium. He is co-author of 500+ publications and has supervised 38 Ph.D. theses.
Dr. Gabbouj has been involved in several past and current EU research and education projects and programs, including ESPRIT, HCM, IST, COST, Tempus and Erasmus. He has also served as an evaluator of IST proposals, and as an auditor of a number of ACTS and IST projects on multimedia security, augmented and virtual reality, and image and video signal processing.