Region Correspondence for Image Matching via EMD Flow

Hayit Greenspan
Faculty of Engineering, Tel-Aviv University, Tel-Aviv 69978, Israel
[email protected]

Guy Dvir
Faculty of Engineering, Tel-Aviv University, Tel-Aviv 69978, Israel
[email protected]

Yossi Rubner
Net2Wireless, 11 Haamal St. Park Afek, Rosh Haayin 48092, Israel
[email protected]

Abstract

The content of an image can be summarized by a set of homogeneous regions in an appropriate feature space. When exact shape is not important, the regions can be represented by simple “blobs”. Even for similar images, the blobs in the two images might vary in shape, position, and the represented features. In addition, separate blobs in one image might be merged together in the other image. In this paper we present a novel method to compute the dissimilarity of two sets of blobs. Gaussian mixture modeling is used to represent the input images. The Earth Mover’s Distance (EMD) is utilized to compute both the dissimilarity of the images and the flow matrix of the blobs between the images. The flow is used to merge blobs such that the dissimilarity between the images decreases. Examples are shown on synthetic and natural images.

1. Introduction

The proposed image matching framework combines an initial transition from image pixels to representative image regions (segments or blobs) with the use of the EMD to find the best correspondences between regions in the two images and to extract an overall matching measure between the two input images. We present the combined region-EMD framework as a simultaneous solution to both the image region correspondence problem and the estimation of the distance between images.

Representation schemes such as histograms and their variations, together with related distances, have become common practice in image matching applications. One such variation is the EMD distance measure [1], which extracts dominant modes from a histogram as a signature and defines a measure of similarity between signatures. Several distance measures between histogram representations are evaluated and compared on image matching tasks in [1, 2, 3].

Several works extend the histogram representation to include spatial information. Unsupervised segmentation of an image into regions that are homogeneous in a feature space, such as the color and texture space, can be found in the “blobworld” image representation [4, 5]. In [5] the user composes a query by viewing the blobworld representation and selecting the blobs to match, along with possible weighting of the blob features. In essence, the image matching problem is reduced to matching one or two blobs against an image. Each blob in one image is compared with all blobs in a second, database image. Matching between blobs is a nontrivial task, as a region in one image might be oversegmented in the other. In [5] this issue is addressed by posing a single ‘blob to image’ matching problem. In this work we use the EMD flow matrix to enable ‘multiple-blob to multiple-blob’ matching. We show how the EMD flow provides image representations that are more uniform and better aligned between the two images to be matched.

The overall framework of the image representation and matching phases is shown in Figure 1. We next elaborate on the different stages of the system and present initial results.

2. Image representation via Gaussian mixture modeling

Each input image is modeled as a Gaussian mixture distribution in feature space. The image representation is a localized region representation, in which the image is first segmented into homogeneous regions in feature space. Each homogeneous region is represented by a Gaussian distribution in feature space, and the set of regions in an image is represented by a Gaussian mixture. The representation model is a general one and can incorporate any desired feature space (such as color, texture, or shape) or a combination thereof. A continuous representation is used.

Figure 1. A block diagram of the region-EMD matching system. [Diagram: for Image1 and Image2, the original image is converted to a blobbed image (Gaussian mixture representation); the two representations are then matched via EMD.]

The Expectation-Maximization (EM) algorithm is used (similar to [5]) to determine the maximum likelihood parameters of a mixture of k Gaussians in the feature space. The first step in applying the EM algorithm to the problem at hand is to initialize the mixture model parameters. The K-means algorithm is utilized to extract a data-driven initialization. The MDL principle [4] is used to select the number of mixture components (or number of means), k, that best suits the natural number of groups present in the image.

The EM algorithm, along with the model selection, can be applied to a particular feature space, such as color, in which case the image color space is clustered into the most dominant colors of the image. Alternatively, feature spaces may be combined within the mixture modeling framework. For example, to incorporate image spatial information into the representation, we run the EM algorithm on the color and space domains simultaneously.

An image representation example is shown in Figure 2, in which we see an input image (left) and a set of localized Gaussians representing the image (right). In this visualization each localized Gaussian mixture is shown as a set of ellipsoids. Each ellipsoid represents the support, mean color, and spatial layout of a particular Gaussian in the image plane. (Please note: all figures in the paper are color images and are described as such in the text. A color version of the paper may be found at http://www.eng.tau.ac.il/~hayit/publications.)
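The EM fitting with K-means initialization described in this section can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-dimensional feature (for example, a gray level) instead of the joint color and position space, it omits the MDL selection of k, and the function names `kmeans_1d` and `fit_gmm_1d`, the iteration counts, and the variance floor are hypothetical choices made for illustration.

```python
import math

def kmeans_1d(data, k, iters=20):
    """Lloyd's algorithm on scalars; centers start at evenly spaced quantiles."""
    s = sorted(data)
    centers = [s[(2 * c + 1) * len(s) // (2 * k)] for c in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda c: (x - centers[c]) ** 2)
            groups[nearest].append(x)
        # Keep the old center if a cluster received no samples.
        centers = [sum(g) / len(g) if g else centers[c]
                   for c, g in enumerate(groups)]
    return centers

def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(data, k, iters=50, var_floor=1e-6):
    """EM for a k-component 1-D Gaussian mixture, initialised by K-means."""
    n = len(data)
    means = kmeans_1d(data, k)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [max(overall_var, var_floor)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [weights[c] * gauss_pdf(x, means[c], variances[c]) for c in range(k)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([v / total for v in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            if nc < 1e-12:
                continue  # component is empty; leave its parameters unchanged
            weights[c] = nc / n
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / nc
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / nc
            variances[c] = max(var, var_floor)
    return weights, means, variances
```

Calling `fit_gmm_1d(samples, k)` returns the mixture weights, means, and variances; in the setting of the paper each component would correspond to one blob, with vector-valued means and covariances over color and image position.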

3. Region correspondence via EMD flow

In order to compute similarities between images that are represented by regions (or blobs), we need to define an appropriate dissimilarity measure. Often, a region in one image should be matched to the union of several regions in the second image. An example of this can be seen in Figure 2. Both images show a lake and two trees. However, in the top image the lake is represented by a single region, while in the bottom image it is represented by three regions. Similarly, the tree-tops in the bottom image are combined into a single region. For the dissimilarity measure to perform properly, it should take such cases into account. This can be done with the Earth Mover’s Distance (EMD), which in addition to computing the dissimilarity between sets of regions also returns the correspondence (flow) between them.

Figure 2. Original image (left) and image representation via a Gaussian mixture (right). Similar semantic content (“trees next to a lake”) is represented by a different number of regions and different region colors.

3.1. The Earth Mover’s Distance (EMD)

In [1] the concept of the Earth Mover’s Distance (EMD) is introduced as a flexible similarity measure between multidimensional distributions, and is described in detail therein. Intuitively, given two distributions represented by sets of weighted features, one can be seen as a mass of earth properly spread in the feature space, the other as a collection of holes in that same space. The EMD then measures the least amount of work needed to fill the holes with earth. Here, a unit of work corresponds to transporting a unit of earth by a unit of ground distance, which is a distance in the feature space. The EMD is based on the transportation problem [6] and can be solved efficiently by linear optimization algorithms that take advantage of its special structure.

Formally, let $P = \{(p_1, w_{p_1}), \ldots, (p_m, w_{p_m})\}$ be the first set with $m$ regions, where $p_i$ is the region descriptor and $w_{p_i}$ is the weight of the region; $Q = \{(q_1, w_{q_1}), \ldots, (q_n, w_{q_n})\}$ the second set with $n$ regions; and $\mathrm{DIST} = [\mathrm{dist}(p_i, q_j)]$ the ground distance matrix, where $\mathrm{dist}(p_i, q_j)$ is the distance between regions $p_i$ and $q_j$. The EMD between sets $P$ and $Q$ is then

\[
\mathrm{EMD}(P, Q) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}\, \mathrm{dist}(p_i, q_j)}{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}}, \tag{1}
\]

where $F = [f_{ij}]$, with $f_{ij} \ge 0$ the flow between $p_i$ and $q_j$, is the optimal admissible flow from $P$ to $Q$ that minimizes the numerator of (1) subject to the following constraints:

\[
\sum_{j=1}^{n} f_{ij} \le w_{p_i}, \qquad \sum_{i=1}^{m} f_{ij} \le w_{q_j},
\]
\[
\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} = \min\Big( \sum_{i=1}^{m} w_{p_i},\ \sum_{j=1}^{n} w_{q_j} \Big).
\]

More details on the EMD can be found in [1]. In this work, the ground distance dist(p, q) is defined as an equally weighted Euclidean distance between the representations of the two regions in a space that consists of the region size in pixels, the mean CIE-Lab color, the location, the variance in the xy-space, and the variance in color. The weighting may be updated according to the relative importance of the different features.
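To make the definition concrete, the following sketch computes the EMD and its flow matrix for small region sets. It is an illustration under stated assumptions, not the solver used in the paper or in [1]: it treats the transportation problem as a minimum-cost flow and solves it by successive shortest paths with Bellman-Ford, which is simple but only practical for small inputs. The names `ground_distance` and `emd_with_flow`, the tolerance `eps`, and the plain feature-vector layout are hypothetical.

```python
import math

def ground_distance(p, q):
    """Equally weighted Euclidean distance between two region descriptors."""
    return math.dist(p, q)

def emd_with_flow(wp, wq, dist, eps=1e-12):
    """Return (EMD, flow matrix) for weights wp, wq and ground distances dist[i][j]."""
    m, n = len(wp), len(wq)
    src, snk, num_nodes = m + n, m + n + 1, m + n + 2
    to, cap, cost = [], [], []          # edge e and edge e ^ 1 are a residual pair
    adj = [[] for _ in range(num_nodes)]

    def add_edge(u, v, capacity, w):
        adj[u].append(len(to)); to.append(v); cap.append(capacity); cost.append(w)
        adj[v].append(len(to)); to.append(u); cap.append(0.0); cost.append(-w)

    for i in range(m):
        add_edge(src, i, wp[i], 0.0)            # supply of region p_i
    for j in range(n):
        add_edge(m + j, snk, wq[j], 0.0)        # demand of region q_j
    pq_edge = {}
    for i in range(m):
        for j in range(n):
            pq_edge[(i, j)] = len(to)
            add_edge(i, m + j, float("inf"), dist[i][j])

    while True:
        # Bellman-Ford: cheapest residual path from src to snk.
        best = [float("inf")] * num_nodes
        prev_edge = [-1] * num_nodes
        best[src] = 0.0
        for _ in range(num_nodes - 1):
            changed = False
            for u in range(num_nodes):
                if best[u] == float("inf"):
                    continue
                for e in adj[u]:
                    if cap[e] > eps and best[u] + cost[e] < best[to[e]] - 1e-12:
                        best[to[e]] = best[u] + cost[e]
                        prev_edge[to[e]] = e
                        changed = True
            if not changed:
                break
        if prev_edge[snk] == -1:
            break                                # no augmenting path is left
        push, v = float("inf"), snk
        while v != src:                          # bottleneck capacity on the path
            push = min(push, cap[prev_edge[v]])
            v = to[prev_edge[v] ^ 1]
        v = snk
        while v != src:                          # augment along the path
            e = prev_edge[v]
            cap[e] -= push
            cap[e ^ 1] += push
            v = to[e ^ 1]

    # The flow on p_i -> q_j equals the capacity accumulated on its reverse edge.
    flow = [[cap[pq_edge[(i, j)] ^ 1] for j in range(n)] for i in range(m)]
    total = sum(map(sum, flow))
    work = sum(flow[i][j] * dist[i][j] for i in range(m) for j in range(n))
    return (work / total if total > 0 else 0.0), flow
```

A ground distance matrix can be built as `[[ground_distance(p, q) for q in Q_desc] for p in P_desc]`, where `P_desc` and `Q_desc` are hypothetical lists of descriptor vectors. When the total weights differ, the returned flow moves only the smaller total, in line with the last constraint above.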

3.2. Region correspondence

We investigate the EMD flow matrix as a means for region correspondence between two input images. Most previous works have focused on a global representation scheme for the image, in which case the EMD flow matrix has little meaning in the image plane. We propose a transition to a localized representation, using the EMD flow matrix as a means for extracting region correspondences across an image pair. The flow matrix in our case shows the transformation of each region (blob) in image 1 (matrix rows) to regions (blobs) of image 2 (matrix columns). Each row is normalized to sum to 1, so that flow values lie in the range [0, 1].

Correspondences between blobs are extracted from the flow matrix using two criteria. The first criterion requires a flow amount above a predefined threshold between the source blob and the target blob. The second criterion requires spatial overlap between the two blobs, so that blobs that are close in color space but distant in spatial location, for example, are not considered. A candidate list is formed from the source-target blob pairs that pass both criteria.

A follow-up merging process goes over the correspondence list. If two or more source blobs are in correspondence with the same target blob, the source blobs are merged. In the merging process the mixture model is updated, resulting in a smaller set of blobs with updated feature characteristics. An iteration consists of two correspondence-and-merging passes, with the second pass switching the roles of the source and target images. The process is iterated until no additional merging is possible.

Using the combined region-EMD framework, the region correspondence problem is solved and the overall image pair distance is estimated simultaneously. Regions are matched based on their model characteristics (e.g., mean color, color variance, mean texture, texture variance). The overall distance measure for a given image pair is computed based on the extracted flow matrix.
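The two correspondence criteria and the grouping of source blobs for merging can be sketched as follows. This is an illustration only: the paper does not specify how spatial overlap is tested or which threshold is used, so the axis-aligned bounding boxes `(x0, y0, x1, y1)`, the default threshold of 0.3, and the function names are assumptions.

```python
def row_normalize(flow):
    """Scale each row of the flow matrix to sum to 1 (all-zero rows stay zero)."""
    out = []
    for row in flow:
        s = sum(row)
        out.append([v / s if s > 0 else 0.0 for v in row])
    return out

def boxes_overlap(a, b):
    """True if two axis-aligned boxes (x0, y0, x1, y1) intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def candidate_pairs(flow, src_boxes, tgt_boxes, threshold=0.3):
    """Source-target pairs that pass both the flow and the overlap criterion."""
    norm = row_normalize(flow)
    pairs = []
    for i, row in enumerate(norm):
        for j, value in enumerate(row):
            if value > threshold and boxes_overlap(src_boxes[i], tgt_boxes[j]):
                pairs.append((i, j))
    return pairs

def merge_groups(pairs):
    """Map each target blob to the source blobs that should be merged for it."""
    by_target = {}
    for i, j in pairs:
        by_target.setdefault(j, []).append(i)
    # Only targets matched by two or more source blobs trigger a merge.
    return {j: sources for j, sources in by_target.items() if len(sources) >= 2}
```

One pass of the procedure would call `candidate_pairs` and `merge_groups`, merge the listed source blobs, and then repeat with the roles of the two images switched (that is, with the flow matrix transposed). The sketch does not resolve the case of a source blob appearing in the groups of several targets.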

4. Examples

In Figure 3 we illustrate several steps of region-EMD matching. The EMD flow matrix is generated for a given image pair, followed by a merging process of selected regions. Regions within an image are merged once they are found to correspond to the same region in the second image. Note that the process is symmetric, with the query (top) and target (bottom) images interchanged. The process is iterative, with each iteration reducing the overall matching distance between the input images.

For each image, the Gaussian mixture representation is shown on the left, a segmented image (pseudo-coloring based on the extracted model) in the center, and a corresponding silhouette image on the right. The flow matrix is shown at the bottom. The entries in the flow matrix represent the amount of flow, in the range [0, 1], by their intensity level (brighter entries represent larger flow values). On the left side of the matrix is a column that represents the blobs in the source image, where each column entry shows the mean color of the respective blob. A row vector at the bottom represents the blobs in the target image. (Note that the gray values in the column and row arrays correspond to the image gray levels and are not related to the intensity values of the flow matrix.) Several cells (blobs) with the same color may exist, representing different blobs that share a color but are separated in xy-space.

In (a) the flow matrix indicates a possible merging of the two green colors of the tree tops in image 1 (blobs 3 and 4) into a single green color of image 2 (blob 2). The merging is shown in step (b). Note the reduction in the distance measure, from d = 0.1 to d = 0.093. Next we note that the different blue colorings of the lake in image 2 (blobs 3, 6 and 8) all flow to a single blue in image 1 (blob 2). In (c) we see the merging of the lake region (in color space), with an additional reduction in the cost, to d = 0.05.
Additional iterations (with a symmetric evaluation of query and target images) indicate that the merging process is complete. Note the resemblance of the two image models in (c) vs. the initial representation in (a). The distance measure decreases by approximately 50% in the process.

Figure 3. Region correspondence and image matching via EMD flow. [Panels (a)-(c): each panel shows the two images (“img #1”, “img #2”) with their pseudo images above the flow matrix, whose axes index the blobs of the two images (“img1 blobs”, “img2 blobs”) and whose scale runs from 0 to 1. The flow matrix in panel (a) is titled “Flow matrix, emd=0.103050”; in panel (c) the roles of the two images are interchanged.]

Figure 4 shows an additional example of an image pair taken from the COREL database. For each input image (left) the image model is first estimated (center), and the segmentation of the image based on the estimated image model is shown (right). The top image (image 1) includes a red car on a green background. The bottom image (image 2) shows a pink car on a green background. In both natural image examples we can see the large fragmentation in the color and space domains that may be generated by the Gaussian mixture. The grass region is fragmented into several blobs representing varying greens and distinct locations. In image 2, the car region is segmented into two distinct color blobs, pink and brown, with the brown region associated with the more shadowed areas of the car. Matching between such large sets of blobs can be a very noisy process. We address this challenge with the region-matching process of the region-EMD matching framework.

Figure 4. Car example: For each input image (left) the image model is first estimated (center), and the extracted image regions are shown (right). [Panel titles: Original image, Blobbed image, Pseudo image.]

In Figure 5 we present a selected set of iterations of generating the EMD flow matrix for the image pair of Figure 4, followed by a merging process of selected regions (layout as in Figure 3). The initial matching distance is d = 0.16. In the first iteration (a), the flow matrix indicates a merging of the grass regions of image 1 (blobs 1, 2, 3 and 5). This merging can be seen in (b). In (b) the flow matrix indicates the merging of the two car regions of image 2 (pink and brown regions, blobs 2 and 3) into a single region, associated with the red region of image 1 (blob 1). The merging procedure is finalized in (c), with a final distance measure of d = 0.088.

As we have seen in the above examples, several regions in one image may be in correspondence with one or more regions in a second image. The merging procedure provides a more accurate model, enabling the addition of features such as shape. Suppose we start with the color feature. The flow matrix indicates region correspondence (as discussed and exemplified above). We merge the regions in the source image by merging the respective Gaussian mixture components. This enables us to extract more accurate variance features. From the image plane we may extract shape characteristics, such as region silhouettes. An iterative process may be defined, with an augmentation of the EMD distance to include the additional feature terms. In Figure 5 we can see an example in which merging the color regions of the car object of image 2 results in a new silhouette that is closer to the silhouette of the car object in image 1.
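The paper does not state how the mixture parameters are updated when blobs are merged. One common choice, assumed here purely for illustration, is a moment-preserving merge: the merged component keeps the total weight, the weighted mean, and the variance of the combined components. The sketch below uses diagonal covariances, and the name `merge_gaussians` and the tuple layout are hypothetical.

```python
def merge_gaussians(components):
    """Moment-preserving merge of diagonal-covariance Gaussian components.

    components: list of (weight, mean_vector, variance_vector) tuples.
    Returns a single (weight, mean_vector, variance_vector) tuple.
    """
    total_weight = sum(w for w, _, _ in components)
    dim = len(components[0][1])
    # Weighted mean of the component means.
    mean = [sum(w * mu[d] for w, mu, _ in components) / total_weight
            for d in range(dim)]
    # Within-component variance plus the spread of the means around the new mean.
    var = [sum(w * (v[d] + (mu[d] - mean[d]) ** 2) for w, mu, v in components)
           / total_weight
           for d in range(dim)]
    return total_weight, mean, var
```

Under this assumption, merging two blobs of similar color at different locations gives a blob with nearly the same color variance but a larger spatial variance, which is consistent with the observation that merged regions yield updated variance features.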

5. Discussion

In this work we present the combined region-EMD framework for a simultaneous solution to both the image region correspondence problem and the estimation of an image pair distance. Regions are matched based on their model characteristics (e.g., mean color, color variance, mean texture, texture variance), with an option for hierarchical augmentation of the EMD distance measure. Our approach differs from the well-researched discrete histogram representations in that the image model is a continuous one. The proposed modeling scheme enables a transition to a representation that includes spatial information, with localized clustering in the spatial domain as well as in the defined feature space. Using the region-EMD framework we solve the region correspondence problem across an image pair, without having to decompose it into separate matching processes between individual regions.

We view this work as a first step in an extensive research effort, in which we will augment the region representation vector to include features such as texture, size and shape, in addition to the color feature chosen here. A more elaborate definition of the hierarchical matching framework is under way. In particular, the transition from regions to representative silhouettes (building the object from its parts) holds great potential for general image matching tasks.



Acknowledgement

This work was supported by the Israeli Ministry of Science, Grant number 05530462.

Figure 5. Region correspondence and image matching via EMD flow. [Panels (a)-(c): each panel shows the two images (“img #1”, “img #2”) with their pseudo images above the flow matrix, whose axes index the blobs of the two images (“img1 blobs”, “img2 blobs”) and whose scale runs from 0 to 1; in panel (c) the roles of the two images are interchanged.]

References

[1] Y. Rubner, “Perceptual Metrics for Image Database Navigation”, Ph.D. Thesis, Stanford University, 1999.
[2] J. Puzicha, J. M. Buhmann, Y. Rubner and C. Tomasi, “Empirical evaluation of dissimilarity measures for color and texture”, Proc. of the Int. Conference on Computer Vision, pp. 1165-72, 1999.
[3] J. R. Smith, “Integrated Spatial and Feature Image Systems: Retrieval, Analysis and Compression”, Ph.D. Thesis, Columbia University, 1997.
[4] C. Carson, S. Belongie, H. Greenspan and J. Malik, “Region-based Image Querying”, Proc. of the IEEE Workshop on Content-based Access of Image and Video Libraries (CVPR’97), pp. 42-49, 1997.
[5] S. Belongie, C. Carson, H. Greenspan and J. Malik, “Color and texture-based image segmentation using EM and its application to content based image retrieval”, Proc. of the Int. Conference on Computer Vision, pp. 675-82, 1998.
[6] F. L. Hitchcock, “The distribution of a product from several sources to numerous localities”, J. Math. Phys., 20:224–230, 1941.