Alternating Direction Graph Matching

D. Khuê Lê-Huu    Nikos Paragios
CentraleSupélec, Université Paris-Saclay, France

arXiv:1611.07583v2 [cs.CV] 24 Nov 2016

{khue.le, nikos.paragios}@centralesupelec.fr

Abstract

Graph matching has been an active research topic in the computer vision field over the past decades. In the recent literature, [13] proposed a graduated assignment algorithm that iteratively solves a series of convex approximations to the matching problem. In [21], a spectral matching method based on the rank-1 approximation of the affinity matrix (composed of the potentials) was introduced, which was later improved in [9] by incorporating affine constraints towards a tighter relaxation. In [22], an integer projected fixed point algorithm that solves a sequence of first-order Taylor approximations using the Hungarian method [18] was proposed, while in [28] the dual of the matching problem was considered to obtain a lower bound on the energy, via dual decomposition. In [6], a random walk variant was used to address graph matching, while [33] factorized the affinity matrix into smaller matrices, allowing a convex-concave relaxation that can be solved in a path-following fashion. Their inspiration was the path-following approach [30], which exploits a more restricted graph matching formulation, known as Koopmans-Beckmann's QAP [17]. Lately, [7] proposed a max-pooling strategy within the graph matching framework that is very robust to outliers.

In this paper, we introduce a graph matching method that can account for constraints of arbitrary order, with arbitrary potential functions. Unlike previous decomposition approaches that rely on the graph structures, we introduce a decomposition of the matching constraints. Graph matching is then reformulated as a non-convex, non-separable optimization problem that can be split into smaller and much easier-to-solve subproblems, by means of the alternating direction method of multipliers. The proposed framework is modular, scalable, and can be instantiated into different variants. Two instantiations are studied, exploring pairwise and higher-order constraints. Experimental results on widely adopted benchmarks involving synthetic and real examples demonstrate that the proposed solutions outperform existing pairwise graph matching methods, and are competitive with the state of the art in higher-order settings.

1. Introduction

The task of finding correspondences between two sets of visual features has a wide range of applications in computer vision and pattern recognition. This problem can be effectively solved using graph matching [28], and as a consequence, these methods have been successfully applied to various vision tasks, such as stereo matching [14], object recognition and categorization [1, 12], shape matching [1], surface registration [32], etc. The general idea of solving feature correspondences via graph matching is to associate with each set of features an attributed graph, where node attributes describe local characteristics, while edge (or hyper-edge) attributes describe structural relationships. The matching task seeks to minimize an energy (objective) function composed of unary, pairwise, and potentially higher-order terms. These terms are called the potentials of the energy function. In pairwise settings, graph matching can be seen as a quadratic assignment problem (QAP) in general form, known as Lawler's QAP [19]. Since QAP is known to be NP-complete [5, 26], graph matching is also NP-complete [13] and only approximate solutions can be found in polynomial time.
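To make the objective concrete, the following toy sketch (our own Python/NumPy code, not from the paper; all variable names are ours) evaluates a pairwise matching energy over a matching vector x = vec(X) and brute-forces the best one-to-one matching on a tiny instance:

```python
import numpy as np
from itertools import permutations

# Toy pairwise graph matching energy (our own sketch, not the paper's code):
# energy(x) = sum_i F1_i x_i + sum_ij F2_ij x_i x_j, with x = vec(X).
def energy(x, F1, F2):
    return F1 @ x + x @ (F2 @ x)

rng = np.random.default_rng(0)
n1 = n2 = 3
n = n1 * n2
F1 = rng.normal(size=n)        # unary potentials
F2 = rng.normal(size=(n, n))   # pairwise potentials

# Enumerate all one-to-one matchings (viable only for tiny n) and keep the
# lowest-energy one; each permutation p yields a permutation matrix X.
best_val, best_perm = min(
    (energy(np.eye(n2)[:, p].flatten(order="F"), F1, F2), p)
    for p in permutations(range(n2))
)
```

Brute force is of course only feasible for tiny instances; the point is the form of the objective, which the methods discussed above approximate in polynomial time.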

Recently, researchers have proposed higher-order graph matching models to better incorporate structural similarities and achieve more accurate results [11, 31]. For solving such higher-order models, [31] viewed the matching problem as a probabilistic model that is solved using an iterative successive projection algorithm. The extension of pairwise methods to deal with higher-order potentials was also considered, for example in [11] through a tensor matching (extended from [21]), in [32] through a third-order dual decomposition (originating from [28]), or in [20] through a higher-order reweighted random walk matching (extension of [6]). Recently, [25] developed a flexible block coordinate ascent algorithm for solving third-order graph matching. They lifted the third-order problem to a fourth-order one which, after a convexification step, is solved by a sequence of linear or quadratic assignment problems. Despite its impressive performance, this method has two limitations: (a) it cannot be applied to graph matching of arbitrary order other than third and fourth, and (b) it cannot deal with graph matching where occlusion is allowed on both sides, nor with many-to-many matching.

In this paper, a novel class of algorithms is introduced for solving graph matching involving constraints of arbitrary order and arbitrary potentials. These algorithms rely on a decomposition framework using the alternating direction method of multipliers. The remainder of this paper is organized as follows. Section 2 provides the mathematical foundations of our approach, while in Section 3 the general decomposition strategy is proposed, along with two instantiations of this framework into a pairwise and a higher-order approach. Section 4 presents an in-depth experimental validation and comparisons with competing methods. The last section concludes the paper and presents perspectives.

2. Mathematical background and notation

Let us first provide the elementary notation as well as the basic mathematical foundations of our approach. In the first subsection we give a brief review of tensors, which will help us to compactly formulate the graph matching problem in the subsequent subsection.

2.1. Tensor

A real-valued Dth-order tensor F is a multidimensional array belonging to R^{n_1 × n_2 × · · · × n_D} (where n_1, n_2, ..., n_D are positive integers). We denote the elements of F by F_{i_1 i_2 ... i_D}, where 1 ≤ i_d ≤ n_d for d = 1, 2, ..., D. Each dimension of a tensor is called a mode. We call the multilinear form associated to a tensor F the function F : R^{n_1} × R^{n_2} × · · · × R^{n_D} → R defined by

    F(x^1, ..., x^D) = Σ_{i_1=1}^{n_1} · · · Σ_{i_D=1}^{n_D} F_{i_1 i_2 ... i_D} x^1_{i_1} x^2_{i_2} · · · x^D_{i_D},   (1)

where x^d = (x^d_1, x^d_2, ..., x^d_{n_d}) ∈ R^{n_d} for d = 1, 2, ..., D.

A tensor can be multiplied by a vector at a specific mode. Let v = (v_1, v_2, ..., v_{n_d}) be an n_d-dimensional vector. The mode-d product of F and v, denoted by F ⊗_d v, is a (D−1)th-order tensor G of dimensions n_1 × · · · × n_{d−1} × n_{d+1} × · · · × n_D defined by

    G_{i_1 ... i_{d−1} i_{d+1} ... i_D} = Σ_{i_d=1}^{n_d} F_{i_1 ... i_d ... i_D} v_{i_d}.   (2)

With this definition, it is straightforward to see that the multilinear form (1) can be re-written as

    F(x^1, x^2, ..., x^D) = F ⊗_1 x^1 ⊗_2 x^2 · · · ⊗_D x^D.   (3)

Let us consider for convenience the notation ⊗_{d=a}^{b} to denote a sequence of products from mode a to mode b:

    F ⊗_{d=a}^{b} x^d = F ⊗_a x^a ⊗_{a+1} x^{a+1} · · · ⊗_b x^b.   (4)

By convention, F ⊗_{d=a}^{b} x^d = F if a > b. In this work, we are interested in tensors having the same dimension at every mode, i.e. n_1 = n_2 = ... = n_D = n. In the sequel, all tensors are supposed to have this property.

2.2. Graph and hypergraph matching

A matching configuration between two graphs G_1 = (V_1, E_1) and G_2 = (V_2, E_2) can be represented by a matching matrix X ∈ {0,1}^{n_1 × n_2}, where n_1 = |V_1|, n_2 = |V_2|. An element x_{(i_1,i_2)} of X equals 1 if the node i_1 ∈ V_1 is matched to the node i_2 ∈ V_2, and equals 0 otherwise. Standard graph matching imposes the one-to-(at most)-one constraints, i.e. the sum of any row or any column of X must be ≤ 1. If the elements of X are binary, then X obeys the hard matching constraints. When X is relaxed to take real values in [0, 1], X obeys the soft matching constraints. In this paper we use the following notation: vec(V) denotes the column-wise vectorized replica of a matrix V; mat(v) is the n_1 × n_2 reshaped matrix of an n-dimensional vector v, where n = n_1 n_2; X ∈ R^{n_1 × n_2} is the matching matrix and x = vec(X) ∈ R^n the matching vector; M* (respectively M) is the set of n_1 × n_2 matrices that obey the hard (respectively, the soft) matching constraints.

Energy function. Let x_i = x_{(i_1,i_2)} be an element of X representing the matching of two nodes i_1 and i_2. Suppose that matching these nodes requires a potential F^1_i ∈ R. Similarly, let F^2_{ij} denote the potential for matching two edges (i_1, j_1), (i_2, j_2), and F^3_{ijk} for matching two (third-order) hyper-edges (i_1, j_1, k_1), (i_2, j_2, k_2), and so on. Graph matching can be expressed as minimizing

    Σ_i F^1_i x_i + Σ_{ij} F^2_{ij} x_i x_j + Σ_{ijk} F^3_{ijk} x_i x_j x_k + · · ·   (5)

The above function can be re-written more compactly using tensors. Indeed, let us consider for example the third-order potentials. Since F^3_{ijk} has three indices, (F^3_{ijk})_{1≤i,j,k≤n} can be seen as a third-order tensor belonging to R^{n×n×n}, and its multilinear form (cf. (1)) is the function

    F^3(x, y, z) = Σ_{i=1}^{n} Σ_{j=1}^{n} Σ_{k=1}^{n} F^3_{ijk} x_i y_j z_k   (6)

defined for x, y, z ∈ R^n. Clearly, the third-order terms in (5) can be re-written as F^3(x, x, x). More generally, Dth-order potentials can be represented by a Dth-order tensor F^D and their corresponding terms in the objective function can be re-written as F^D(x, x, ..., x), resulting in the following reformulation of graph matching.

Problem 1 (Dth-order graph matching). Minimize

    F^1(x) + F^2(x, x) + · · · + F^D(x, x, ..., x)   (7)

subject to x ∈ M*, where F^d (d = 1, ..., D) is the multilinear form of a tensor F^d representing the dth-order potentials.

In the next section, we propose a method to solve the continuous relaxation of this problem, i.e. minimizing (7) subject to x ∈ M (soft matching) instead of x ∈ M* (hard matching). The returned continuous solution is discretized using the usual Hungarian method [18].

3. Alternating direction graph matching

3.1. Overview of ADMM

We briefly describe the (multi-block) alternating direction method of multipliers (ADMM) for solving the following optimization problem:

    Minimize φ(x_1, x_2, ..., x_p)
    subject to A_1 x_1 + A_2 x_2 + · · · + A_p x_p = b,   (8)
               x_i ∈ X_i ⊆ R^{n_i}  ∀ 1 ≤ i ≤ p,

where X_i are closed convex sets, A_i ∈ R^{m × n_i} ∀i, and b ∈ R^m. The augmented Lagrangian of the above problem is

    L_ρ(x_1, x_2, ..., x_p, y) = φ(x_1, x_2, ..., x_p) + y^T (Σ_{i=1}^{p} A_i x_i − b) + (ρ/2) ‖Σ_{i=1}^{p} A_i x_i − b‖²_2,   (9)

where y is called the Lagrange multiplier vector and ρ > 0 is called the penalty parameter. In the sequel, let x_{[a,b]} denote (x_a, x_{a+1}, ..., x_b) (by convention, if a > b then x_{[a,b]} is ignored). Standard ADMM solves problem (8) by iterating:

1. For i = 1, 2, ..., p, update x_i:

    x_i^{k+1} = argmin_{x ∈ X_i} L_ρ(x^{k+1}_{[1,i−1]}, x, x^k_{[i+1,p]}, y^k).   (10)

2. Update y:

    y^{k+1} = y^k + ρ (Σ_{i=1}^{p} A_i x_i^{k+1} − b).   (11)

The algorithm converges if the following residual converges to 0 as k → ∞:

    r^k = ‖Σ_{i=1}^{p} A_i x_i^k − b‖²_2 + Σ_{i=1}^{p} ‖A_i x_i^k − A_i x_i^{k−1}‖²_2.   (12)

We will discuss the convergence of ADMM in Section 3.5.

3.2. Graph matching decomposition framework

Decomposition is a general approach to solving a problem by breaking it up into smaller ones that can be efficiently addressed separately, and then reassembling the results into a globally consistent solution of the original non-decomposed problem [2, 4, 10]. Clearly, the above ADMM is such a method, because it decomposes the large problem (8) into the smaller problems (10). In computer vision, decomposition methods such as Dual Decomposition (DD) and ADMM have been applied to optimizing discrete Markov random fields (MRFs) [15, 16, 24] and to solving graph matching [28]. The main idea is to decompose the original complex graph into simpler subgraphs and then to reassemble the solutions on these subgraphs using different mechanisms. While in MRF inference this concept has been proven to be flexible and powerful, that is far from being the case in graph matching, due to the hardness of the matching constraints. Indeed, to deal with these constraints, [28] for example adopted a strategy that creates subproblems that are themselves smaller graph matching problems, which are computationally highly challenging. Moreover, a subgradient method was used to impose consensus, which is known to have a slow rate of convergence [2]. One can conclude that DD is a very slow method and works for a limited set of energy models, often associated with small sizes and low to medium geometric connectivity [28].

In our framework, we do not rely on the structure of the graphs but instead on the nature of the variables. The idea is to decompose the matching vector x (by means of Lagrangian relaxation) into different variables, where each variable obeys weaker constraints that are easier to handle. For example, instead of dealing with the matching vector x ∈ M, we can represent it by two vectors x_1 and x_2, where the sum of each row of mat(x_1) is ≤ 1 and the sum of each column of mat(x_2) is ≤ 1, and we constrain these two vectors to be equal. More generally, we can decompose x into as many vectors as we want, and in any manner; the only condition is that the set of constraints imposed on these vectors must be equivalent to x_1 = x_2 = · · · = x_k ∈ M, where k is the number of vectors. As for the objective function (7), there is also an infinite number of ways to re-write it under the new variables x_1, x_2, ..., x_k. The only condition is that the re-written objective function must be equal to the original one when x_1 = x_2 = · · · = x_k = x. For example, if k = D then one can re-write (7) as

    F^1(x_1) + F^2(x_1, x_2) + · · · + F^D(x_1, x_2, ..., x_D).   (13)

Each combination of (a) such a variable decomposition and (b) such a way of re-writing the objective function yields a different Lagrangian relaxation and thus produces a different algorithm. Since there are virtually infinitely many such combinations, the number of algorithms one can design from them is also unlimited, not to mention the different choices of the reassembly mechanism, such as subgradient methods [2, 4], cutting plane methods [2], ADMM [3], or others. We call the class of algorithms based on ADMM Alternating Direction Graph Matching (ADGM) algorithms. A major advantage of ADMM over the other mechanisms is that its subproblems involve only one block of variables, regardless of the form of the objective function. As an illustration of ADGM, we present below a particular example. Nevertheless, this example is still general enough to include an infinite number of special cases.
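Before instantiating ADGM, the ADMM iteration (10)-(11) can be illustrated on a toy convex consensus problem. The following is a minimal sketch under our own simplifications (two blocks, quadratic objectives, closed-form subproblems), not the paper's solver:

```python
import numpy as np

# Two-block consensus ADMM sketch (our own toy example): minimize
# f(x1) + g(x2) subject to x1 - x2 = 0, with f(x) = 0.5*||x - a||^2 and
# g(x) = 0.5*||x - b||^2.  In the notation of (8): A1 = I, A2 = -I, b = 0,
# so each subproblem (10) has a closed form and (11) is the dual ascent step.
def admm_consensus(a, b, rho=1.0, iters=100):
    x1 = np.zeros_like(a)
    x2 = np.zeros_like(b)
    y = np.zeros_like(a)
    for _ in range(iters):
        # x1-update: argmin_x 0.5||x - a||^2 + y^T x + (rho/2)||x - x2||^2
        x1 = (a - y + rho * x2) / (1 + rho)
        # x2-update: argmin_x 0.5||x - b||^2 - y^T x + (rho/2)||x1 - x||^2
        x2 = (b + y + rho * x1) / (1 + rho)
        # dual update, cf. (11)
        y = y + rho * (x1 - x2)
    return x1, x2

a = np.array([1.0, 2.0])
b = np.array([3.0, 0.0])
x1, x2 = admm_consensus(a, b)
# Both copies agree at the optimum of the sum, which here is (a + b) / 2.
```

The graph matching case differs in that the objective is non-convex and the sets M_d are polytopes, but the alternating structure of the updates is the same.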

Problem 2 (Decomposed graph matching). Minimize

    F^1(x_1) + F^2(x_1, x_2) + · · · + F^D(x_1, x_2, ..., x_D)   (14)

subject to

    A_1 x_1 + A_2 x_2 + · · · + A_D x_D = 0,   (15)
    x_d ∈ M_d  ∀ 1 ≤ d ≤ D,   (16)

where (A_d)_{1≤d≤D} are m × n matrices, defined in such a way that (15) is equivalent to x_1 = x_2 = · · · = x_D, and (M_d)_{1≤d≤D} are closed convex subsets of R^n satisfying

    M_1 ∩ M_2 ∩ · · · ∩ M_D = M.   (17)

It is easily seen that the above problem is equivalent to the continuous relaxation of Problem 1. Clearly, this problem is a special case of the standard form (8). Thus, ADMM can be applied to it in a straightforward manner. The augmented Lagrangian of Problem 2 is

    L_ρ(x_1, x_2, ..., x_D, y) = Σ_{d=1}^{D} F^d(x_1, ..., x_d) + y^T (Σ_{d=1}^{D} A_d x_d) + (ρ/2) ‖Σ_{d=1}^{D} A_d x_d‖²_2.   (18)

The y update step (11) and the computation of the residual (12) are trivial. Let us focus on the x update step (10):

    x_d^{k+1} = argmin_{x ∈ M_d} L_ρ(x^{k+1}_{[1,d−1]}, x, x^k_{[d+1,D]}, y^k).   (19)

Denote

    s_d^k = Σ_{j=1}^{d−1} A_j x_j^{k+1} + Σ_{j=d+1}^{D} A_j x_j^k,   (20)
    p_d^k = Σ_{i=d}^{D} F^i ⊗_{j=1}^{d−1} x_j^{k+1} ⊗_{l=d+1}^{i} x_l^k   (see (4)).   (21)

From (3), we have

    F^i(x^{k+1}_{[1,d−1]}, x, x^k_{[d+1,i]}) = F^i ⊗_{j=1}^{d−1} x_j^{k+1} ⊗_d x ⊗_{l=d+1}^{i} x_l^k.   (22)

Therefore,

    Σ_{i=d}^{D} F^i(x^{k+1}_{[1,d−1]}, x, x^k_{[d+1,i]}) = (Σ_{i=d}^{D} F^i ⊗_{j=1}^{d−1} x_j^{k+1} ⊗_{l=d+1}^{i} x_l^k)^T x,   (23)

or equivalently

    Σ_{i=d}^{D} F^i(x^{k+1}_{[1,d−1]}, x, x^k_{[d+1,i]}) = (p_d^k)^T x.   (24)

Thus, up to terms independent of x,

    L_ρ(x^{k+1}_{[1,d−1]}, x, x^k_{[d+1,D]}, y^k) = (p_d^k)^T x + (y^k)^T (A_d x + s_d^k) + (ρ/2) ‖A_d x + s_d^k‖²_2,   (27)

and the subproblems (19) are reduced to minimizing the following quadratic functions over M_d (d = 1, 2, ..., D):

    (1/2) x^T A_d^T A_d x + (A_d^T s_d^k + (1/ρ)(A_d^T y^k + p_d^k))^T x.   (28)

In summary, an ADGM algorithm has three main steps:

1. Choose (A_d)_{1≤d≤D} and (M_d)_{1≤d≤D} satisfying the conditions stated in Problem 2.
2. Update x_d^{k+1} by minimizing (28) over M_d.
3. Update y^{k+1} using (11).

Steps 2 and 3 are repeated until convergence.

3.3. Two simple ADGM algorithms

Let us follow the above three steps with two examples.

Step 1: Choose (A_d)_{1≤d≤D} and (M_d)_{1≤d≤D}. First, (M_d)_{1≤d≤D} take values in one of the following two sets:

    M_r = {x : sum of each row of mat(x) is ≤ 1},   (29)
    M_c = {x : sum of each column of mat(x) is ≤ 1},   (30)

such that both M_r and M_c are taken at least once. If no occlusion is allowed in G_1 (respectively G_2), then the term "≤ 1" is replaced by "= 1" for M_r (respectively M_c). If many-to-many matching is allowed, then these inequality constraints are removed. In either case, M_r and M_c are closed and convex. Clearly, since M_r ∩ M_c = M, condition (17) is satisfied.

Second, to impose x_1 = x_2 = · · · = x_D, we can for example choose (A_d)_{1≤d≤D} such that

    x_1 = x_2,  x_1 = x_3,  ...,  x_1 = x_D,   (31)

or alternatively

    x_1 = x_2,  x_2 = x_3,  ...,  x_{D−1} = x_D.   (32)

It is easily seen that both sets of constraints can be expressed under the general form (15). Each choice leads to a different algorithm. Let ADGM1 denote the algorithm obtained from (31) and ADGM2 the one obtained from (32).

Step 2: Update x_d^{k+1}. We will show that, when plugging (31) or (32) into (15), the subproblems (28) are reduced to

    x_d^{k+1} = argmin_{x ∈ M_d} { (1/2) ‖x‖²_2 − c_d^T x },   (33)

where (c_d)_{1≤d≤D} are defined as follows, for ADGM1:

    c_1 = 1/(D−1) (Σ_{d=2}^{D} x_d^k − (1/ρ) Σ_{d=2}^{D} y_d^k − (1/ρ) p_1^k),   (34)
    c_d = x_1^{k+1} + (1/ρ) y_d^k − (1/ρ) p_d^k  ∀ 2 ≤ d ≤ D,   (35)

and for ADGM2:

    c_1 = x_2^k − (1/ρ) y_2^k − (1/ρ) p_1^k,   (36)
    c_D = x_{D−1}^{k+1} + (1/ρ) y_D^k − (1/ρ) p_D^k,   (37)
    c_d = (1/2)(x_{d−1}^{k+1} + x_{d+1}^k) + (1/(2ρ))(y_d^k − y_{d+1}^k) − (1/ρ) p_d^k  ∀ 2 ≤ d ≤ D − 1.   (38)

(Recall that p_d^k is defined by (21).) We detail below the calculation for ADGM1 and refer the reader to the appendix for ADGM2. Indeed, (31) can be written in the following form (with block vectors stacked over D − 1 rows):

    [x_1; x_1; ...; x_1] + [−x_2; 0; ...; 0] + [0; −x_3; ...; 0] + · · · + [0; 0; ...; −x_D] = [0; 0; ...; 0],   (39)

which can in turn be re-written as A_1 x_1 + A_2 x_2 + · · · + A_D x_D = 0, where A_d is the dth (block) column of the following (D−1) × D block matrix A, whose blocks are n × n; as a consequence, y is also a (D−1) × 1 block vector where each block is an n-dimensional vector:

    A = [ I  −I   0  · · ·   0
          I   0  −I  · · ·   0
          ⋮               ⋱
          I   0   0  · · ·  −I ],      y = [y_2; y_3; ...; y_D].   (40)

From (20) we easily have

    s_1^k = [−x_2^k; −x_3^k; ...; −x_D^k],   (41)

and, for 2 ≤ d ≤ D,

    s_d^k = [x_1^{k+1}; ...; x_1^{k+1}] − [x_2^{k+1}; ...; x_{d−1}^{k+1}; 0; x_{d+1}^k; ...; x_D^k].   (42)

Now we compute the vectors (c_d)_{1≤d≤D}.

• For d = 1: Since A_1 = [I I · · · I]^T, we have

    A_1^T A_1 = (D − 1) I,   (43)
    A_1^T s_1^k = − Σ_{d=2}^{D} x_d^k,      A_1^T y^k = Σ_{d=2}^{D} y_d^k.   (44)

Plugging these into (28), it becomes

    (1/2)(D−1) ‖x‖²_2 + (− Σ_{d=2}^{D} x_d^k + (1/ρ) Σ_{d=2}^{D} y_d^k + (1/ρ) p_1^k)^T x.   (45)

Clearly, minimizing this quantity over M_1 is equivalent to solving (33) for d = 1, where c_1 is defined by (34).

• For d ≥ 2: Since

    A_d = [0 · · · 0 −I 0 · · · 0]^T   (46)

(the −I is at the (d−1)th position), we have

    A_d^T A_d = I,   (47)
    A_d^T s_d^k = −x_1^{k+1},   (48)
    A_d^T y^k = −y_d^k.   (49)

Plugging these into (28), it becomes

    (1/2) ‖x‖²_2 + (−x_1^{k+1} − (1/ρ) y_d^k + (1/ρ) p_d^k)^T x.   (50)

Minimizing this quantity over M_d is equivalent to solving (33), where c_d is defined by (35).

Step 3: Update y^{k+1}. From (31) and (32), it is seen that this step is reduced to y_d^{k+1} = y_d^k + ρ(x_1^{k+1} − x_d^{k+1}) for ADGM1, and y_d^{k+1} = y_d^k + ρ(x_{d−1}^{k+1} − x_d^{k+1}) for ADGM2. The two algorithms are summarized in Table 1. It is straightforward to see that they are identical when D = 2.
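As an illustration only, here is a compact sketch of the pairwise (D = 2) case under our own simplifications: zero unary terms, a fixed ρ, a fixed iteration budget instead of the residual test, and a sort-based capped-simplex projection as the subproblem solver. All helper names are ours, and this is not the authors' implementation:

```python
import numpy as np

def project_capped_simplex(c):
    """Project c onto {u >= 0, sum(u) <= 1} (sort-based; assumes small vectors)."""
    u = np.maximum(c, 0.0)
    if u.sum() <= 1.0:
        return u
    # Otherwise the solution lies on {u >= 0, sum(u) = 1}: shift by a threshold.
    a = np.sort(c)[::-1]
    lam = (np.cumsum(a) - 1.0) / np.arange(1, len(c) + 1)
    k = np.nonzero(a > lam)[0][-1]
    return np.maximum(c - lam[k], 0.0)

def adgm_pairwise(F2, n1, n2, rho=5.0, iters=200):
    """Sketch of pairwise ADGM: x1 obeys row constraints, x2 column constraints.
    Here x is the (row-major) flattening of the n1 x n2 matching matrix, and
    c1, c2 follow (34)-(35) specialized to D = 2 with F2 as the only potential."""
    n = n1 * n2
    x1 = np.full(n, 1.0 / n2)   # two copies of the matching vector
    x2 = x1.copy()
    y = np.zeros(n)
    for _ in range(iters):
        c1 = x2 - (y + F2 @ x2) / rho            # p1 = F2 (x) x2
        x1 = np.apply_along_axis(project_capped_simplex, 1,
                                 c1.reshape(n1, n2)).ravel()
        c2 = x1 + (y - F2.T @ x1) / rho          # p2 = F2 (x)_1 x1
        x2 = np.apply_along_axis(project_capped_simplex, 0,
                                 c2.reshape(n1, n2)).ravel()
        y = y + rho * (x1 - x2)                  # dual update, cf. (11)
    return x1, x2
```

The per-row and per-column projections are exactly the subproblem (33); the full method additionally uses the residual-based stopping rule and the adaptive penalty scheme described later.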

ADGM1 – ADGM2
0. Input: ρ, N, ε, (x_d^0)_{1≤d≤D}, (y_d^0)_{2≤d≤D}.
1. Repeat for k = 0, 1, 2, ...:
   (a) For d = 1, 2, ..., D:
       x_d^{k+1} = argmin_{x ∈ M_d} { (1/2) ‖x‖²_2 − c_d^T x },
       where (c_d)_{1≤d≤D} are defined by (34), (35) for ADGM1 and by (36), (37), (38) for ADGM2.
   (b) For d = 2, 3, ..., D:
       ADGM1: y_d^{k+1} = y_d^k + ρ(x_1^{k+1} − x_d^{k+1}).
       ADGM2: y_d^{k+1} = y_d^k + ρ(x_{d−1}^{k+1} − x_d^{k+1}).
   (c) Compute the residual r^{k+1}. Stop if r^{k+1} ≤ ε or k ≥ N.
2. Discretize x_1 and return.

Table 1: Examples of ADGM algorithms for solving Dth-order graph matching.

Solving the subproblems. We have seen how the subproblems in the two above ADGM algorithms can be reduced to (33). Note that (33) means x_d^{k+1} is the projection of c_d onto M_d. Recall that M_d is equal to either M_r or M_c, defined by (29) and (30), i.e. either the sum of each row of mat(x) is ≤ 1, or the sum of each column of mat(x) is ≤ 1 (if no occlusion is allowed then "≤ 1" is replaced by "= 1"). Therefore, the projection reduces to independent projections of each row or column, which can be solved using the following lemma.

Lemma 1. Let d be a positive integer and c = (c_1, c_2, ..., c_d) be a real-valued constant vector. Consider the problem of minimizing

    (1/2) ‖u‖²_2 − c^T u   (51)

with respect to u ∈ R^d, subject to one of the following sets of constraints:

(a) u ≥ 0 and 1^T u = 1.
(b) u ≥ 0 and 1^T u ≤ 1.

An optimal solution u* to each of the above two cases is given as follows:

(a) Let a = (a_1, a_2, ..., a_d) be a decreasing permutation of c via a permutation function σ, i.e. a_i = c_{σ(i)} and a_1 ≥ a_2 ≥ · · · ≥ a_d. Denote

    λ_k = (1/k)(Σ_{1≤i≤k} a_i − 1)  ∀ k ∈ Z, 1 ≤ k ≤ d.   (52)

Then there exists k* ∈ Z, 1 ≤ k* ≤ d, such that a_{k*} > λ_{k*} ≥ a_{k*+1}. An optimal solution u* = (u*_1, u*_2, ..., u*_d) is given by:

    u*_{σ(i)} = a_i − λ_{k*}  if 1 ≤ i ≤ k*,
    u*_{σ(i)} = 0             if k* < i ≤ d.   (53)

(b) Let u_0 = max(c, 0). We have:
• If 1^T u_0 ≤ 1 then u* = u_0.
• Otherwise, any optimal solution u* must satisfy 1^T u* = 1. Thus, the problem is reduced to the previous case, and as a consequence, u* is given by (53).

Proof. For part (a), see for example [8]. For part (b), the corresponding KKT conditions are:

    u ≥ 0,   (54)
    1^T u ≤ 1,   (55)
    µ ≥ 0,   (56)
    ν ≥ 0,   (57)
    µ_i u_i = 0  ∀ 1 ≤ i ≤ d,   (58)
    ν(1^T u − 1) = 0,   (59)
    u_i − c_i + ν − µ_i = 0  ∀ 1 ≤ i ≤ d.   (60)

If ν = 0 then from (54), (56), (58) and (60) we have

    u_i ≥ 0  ∀i,   (61)
    u_i − c_i = µ_i ≥ 0  ∀i,   (62)
    u_i (u_i − c_i) = 0  ∀i,   (63)

which yields u = u_0 where u_0 = max(c, 0). Thus, if 1^T u_0 ≤ 1 then u_0 is the optimal solution. Otherwise, ν must be different from 0. In this case, from (59), any optimal solution must satisfy 1^T u = 1 and thus, the problem is reduced to part (a).

In our implementation, we used the fast projection algorithm introduced in [8].

3.4. First-order ADGM

Recall that ADGM1 imposes x_1 = x_d ∀ 2 ≤ d ≤ D and ADGM2 imposes x_{d−1} = x_d ∀ 2 ≤ d ≤ D. Clearly, these constraints are only valid for D ≥ 2, and when D = 2 the two sets of constraints become the same, i.e. ADGM1 and ADGM2 are identical. For the sake of completeness, we briefly consider the case D = 1, i.e. first-order graph matching (also called the linear assignment problem). This problem can be seen as a special case of pairwise graph matching where the pairwise potentials are zero. The problem can be reformulated as minimizing F^1(x_1) subject to x_1 = x_2 and x_1 ∈ M_1, x_2 ∈ M_2 (we can choose for example M_1 = M_r and M_2 = M_c). Since the objective function is convex and separable, ADGM is guaranteed to produce a global optimum (see e.g. [4]) of the continuous relaxation of the matching problem. Moreover, it is well known that this continuous relaxation is equivalent to the original problem [27, Chapter 18]. Therefore, ADGM also produces a global optimum of the linear assignment problem.

3.5. Convergent ADGM

Note that the objective function in Problem 2 is neither separable nor convex in general. Convergence of ADMM for this type of function is unknown. Indeed, our ADGM algorithms do not always converge, especially for small values of the penalty parameter ρ (an example is given in Figure 1). When ρ is large, they are likely to converge. However, we also notice that a small ρ often (but not always) achieves better objective values. Motivated by these observations, we propose the following adaptive scheme that we find to work very well in practice: starting from a small initial value ρ_0, the algorithm runs for T_1 iterations to stabilize; after that, if no improvement of the residual r^k is made every T_2 iterations, then we increase ρ by a factor β and continue. The intuition behind this scheme is simple: we hope to reach a good objective value with a small ρ, but if this leads to slow (or no) convergence, then we increase ρ to put more penalty on the consensus of the variables, which results in faster convergence. Using this scheme, we observe that our algorithms always converge in practice. In the experiments, we set T_1 = 300, T_2 = 50, β = 2 and ρ_0 = n/1000.

[Figure 1 shows the residual r^k per iteration for ρ = 1, 5, 10, 20 and the adaptive scheme.]
Figure 1: The residual r^k per iteration of ADGM. Adaptive parameters: T_1 = 20, T_2 = 5, β = ρ_0 = 2.0 (cf. Section 3.5). Run on a third-order Motorbike matching with 15 outliers (cf. Section 4.2 for data and model description).

4. Experiments

We apply the adaptive scheme of Section 3.5 to the two ADGM algorithms presented in Section 3.3, and denote them respectively ADGM1 and ADGM2. In pairwise settings, however, since these two algorithms are identical, we denote them simply ADGM. We compare ADGM and ADGM1/ADGM2 to the following state-of-the-art methods:

Pairwise: Spectral Matching with Affine Constraint (SMAC) [9], Integer Projected Fixed Point (IPFP) [22], Reweighted Random Walk Matching (RRWM) [6], Dual Decomposition with Branch and Bound (DD) [28] and Max-Pooling Matching (MPM) [7]. We should note that DD is only used in the experiments using the same energy models presented in [28]. For the other experiments, DD is excluded due to the prohibitive execution time. Also, as suggested in [22], we use the solution returned by Spectral Matching (SM) [21] as initialization for IPFP.

Higher-order: Probabilistic Graph Matching (HGM) [31], Tensor Matching (TM) [11], Reweighted Random Walk Hypergraph Matching (RRWHM) [20] and Block Coordinate Ascent Graph Matching (BCAGM) [25]. For BCAGM, we use MPM [7] as subroutine, because it was shown in [25] (and again by our experiments) that this variant of BCAGM (denoted by "BCAGM+MP" in [25]) outperforms the other variants thanks to the effectiveness of MPM. Since there is no ambiguity, in the sequel we denote this variant "BCAGM" for short.

We should note that, while we formulated graph matching as a minimization problem, most of the above listed methods are maximization solvers, and many models/objective functions in previous work were designed to be maximized. For ease of comparison, ADGM is also converted to a maximization solver (by letting it minimize the additive inverse of the objective function), and the results reported in this section are for the maximization settings (i.e. higher objective values are better). In the experiments, we also use some pairwise minimization models (such as the one from [28]), which we convert to maximization problems as follows: after building the affinity matrix M from the (minimization) potentials, the new (maximization) affinity matrix is computed as max(M) − M, where max(M) denotes the greatest element of M. Note that one cannot simply take −M, because some of the methods only work for non-negative potentials.

4.1. House and Hotel

The CMU House and Hotel sequences¹ have been widely used in previous work to evaluate graph matching algorithms. They consist of 111 frames of a synthetic house and 101 frames of a synthetic hotel. Each frame in these sequences is manually labeled with 30 feature points.

¹http://vasc.ri.cmu.edu/idb/html/motion/index.html

Methods

Error (%)

Global opt. (%)

Time (s)

House

MPM RRWM IPFP SMAC DD ADGM

42.32 90.51 87.30 81.11 0 0

0 0 0 0 100 100

0.02 0.01 0.02 0.18 14.20 0.03

Hotel

MPM RRWM IPFP SMAC DD ADGM

21.49 85.05 85.37 71.33 0.19 0.19

44.80 0 0 0 100 100

0.02 0.01 0.02 0.18 13.57 0.02

(a) 20 pts vs 30 pts (10 outliers)

(b) MPM 15/20 (352.4927)

(c) RRWM 15/20 (352.4927)

(d) IPFP 5/20 (316.9299)

Table 2: Results on House and Hotel se(e) SMAC 12/20 (315.0426) (f) ADGM 18/20 (353.3569) quences using the pairwise model A, described in Section 4.1 and previously pro- Figure 2: House matching using the pairwise model B described in Section 4.1. Ground-truth value is 343.1515. (Best viewed in color.) posed in [28]. Pairwise model A. In this experiment we match all possible pairs of images in each sequence, with all 30 points (i.e. no outlier). A Delaunay triangulation is performed for these 30 points to obtain the graph edges. The unary terms are the distances between the Shape Context descriptors [1]. The pairwise terms when matching (i1 , j1 ) to (i2 , j2 ) are  2  2 δ α 2 Fij = η exp + (1 − η) exp −1 (64) σl2 σa2

notice a dramatic decrease in performance of SMAC and IPFP compared to the results reported in [28]. We should note that the above potentials, containing both positive and negative values, are defined for a minimization problem. It was unclear how those maximization solvers were used in [28]. For the reader to be able to reproduce the results, we make our software publicly available. Pairwise model B. In this experiment, we match all possible pairs of the sequence with the baseline (i.e. the separation between frames, e.g. the baseline between frame 5 and frame 105 is 100) varying from 10 to 100 by intervals of 10. For each pair, we match 10, 20 and 30 points in the first image to 30 points in the second image. We set the unary terms to 0 and compute the pairwise terms as  −−→ −−→  ki1 j1 k − ki2 j2 k 2 , Fij = exp − (67) σ2

where η, σl , σa are some weight and scaling constants and δ, α are computed as |d1 − d2 | , d1 + d2 −−→ −−→ ! i1 j1 i2 j2 , α = arccos · d1 d2 δ=

(65) (66)

−−→ −−→ where d1 = ki1 j1 k and d2 = ki2 j2 k. This experiment is reproduced from [28] using their energy model files2 . It should be noted that in [28], the above unary potentials are subtracted by a large number to prevent occlusion. We refer the reader to [28] for further details. For ease of comparison with the results reported in [28], here we also report the performance of each algorithm in terms of overall percentage of mismatches and frequency of reaching the global optimum. Results are given in Table 2. One can observe that DD and ADGM always reached the global optima, but ADGM did it hundreds times faster. Even the recent methods RRWM and MPM performed poorly on this model (only MPM produced acceptable results). Also, we

where σ 2 = 2500. It should be noted that the above pairwise terms are computed for every pair (i1 , j1 ) and (i2 , j2 ), i.e. the graphs are fully connected. This experiment has been performed on the House sequence in previous work, including [6] and [25]. Here we consider the Hotel sequence as well. We report the average matching score (which is the ratio of the obtained objective value over the ground-truth value) and the average accuracy for each algorithm in Figure 3 for House sequence and in Figure 4 for Hotel sequence. As one can observe, ADGM produced the highest objective values in almost all the tests (and only slightly outperformed by MPM on the House sequence when the baseline is small).

2 http://www.cs.dartmouth.edu/~lorenzo/Papers/tkr_pami13_data.zip

[Figure 3: Results on House sequence using the pairwise model B described in Section 4.1. Panels: (a) 30 pts vs 30 pts, (b) 20 pts vs 30 pts, (c) 10 pts vs 30 pts; each panel plots matching score, accuracy, and running time against the baseline for MPM, RRWM, IPFP, SMAC, and ADGM.]

[Figure 4: Results on Hotel sequence using the pairwise model B described in Section 4.1. Same panel layout as Figure 3.]

Third-order model. This experiment has the same settings as the previous one, but uses the third-order model proposed in [11]. We set the unary and pairwise terms to 0 and compute the potentials when matching two triples of points $(i_1, j_1, k_1)$ and $(i_2, j_2, k_2)$ as

\[ F^3_{ijk} = \exp\left(-\frac{\|f_{i_1 j_1 k_1} - f_{i_2 j_2 k_2}\|^2}{\gamma}\right), \tag{68} \]

where $f_{ijk}$ is a feature vector composed of the angles of the triangle $(i, j, k)$, and $\gamma$ is the mean of all squared distances. We report the results for the House sequence in Figure 5 and for the Hotel sequence in Figure 6. One can observe that ADGM1 and ADGM2 achieved quite similar performance; both were competitive with BCAGM [25] while outperforming all the other methods.

4.2. Cars and Motorbikes

The Cars and Motorbikes dataset was introduced in [23] and has been used in previous work for evaluating graph matching algorithms. It consists of 30 pairs of car images and 20 pairs of motorbike images with different shapes, view-points and scales. Each pair contains both inliers (chosen manually) and outliers (chosen randomly).

Pairwise model B. In this experiment, we keep all inliers in both images and randomly add outliers to the second image. The number of outliers varies from 0 to 40 by intervals of 5. We use the "pairwise model B" previously described in Section 4.1. Results are given in Figure 7. One can observe that the obtained accuracies are very low, even though ADGM always achieved the best objective values, which are higher than the ground-truth ones. One can conclude that this pairwise model is not suited for this dataset. This motivated us to develop a better model, presented hereafter.

Pairwise model C. Inspired by the model in [28], we propose below a new model that is very simple yet well suited to real-world images. We set the unary terms to 0 and compute the pairwise terms as

\[ F_{ij} = \eta\,\delta + (1 - \eta)\,\frac{1 - \cos\alpha}{2}, \tag{69} \]

where $\eta \in [0, 1]$ is a weight constant and $\delta$, $\alpha$ are computed as

\[ \delta = \frac{|d_1 - d_2|}{d_1 + d_2}, \qquad \cos\alpha = \frac{\overrightarrow{i_1 j_1} \cdot \overrightarrow{i_2 j_2}}{d_1 d_2}, \tag{70} \]

where $d_1 = \|\overrightarrow{i_1 j_1}\|$ and $d_2 = \|\overrightarrow{i_2 j_2}\|$. Intuitively, $F_{ij}$ computes the geometric agreement between $\overrightarrow{i_1 j_1}$ and $\overrightarrow{i_2 j_2}$, in terms of both length and direction. The above potentials measure the dissimilarity between the edges, and thus the corresponding graph matching problem is a minimization one.
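For illustration, the model-C dissimilarity of Eqs. (69)–(70) can be sketched as follows; the function and variable names are ours, and points are assumed to be 2D coordinates.

```python
import numpy as np

def pairwise_potential_C(i1, j1, i2, j2, eta=0.5):
    """Model-C dissimilarity of Eqs. (69)-(70) between the candidate
    assignments (i1 -> i2) and (j1 -> j2).
    delta penalizes length disagreement of the two edges;
    (1 - cos(alpha)) / 2 penalizes direction disagreement."""
    v1 = np.asarray(j1, float) - np.asarray(i1, float)   # vector i1 -> j1
    v2 = np.asarray(j2, float) - np.asarray(i2, float)   # vector i2 -> j2
    d1, d2 = np.linalg.norm(v1), np.linalg.norm(v2)
    delta = abs(d1 - d2) / (d1 + d2)                     # in [0, 1)
    cos_a = np.dot(v1, v2) / (d1 * d2)                   # in [-1, 1]
    return eta * delta + (1.0 - eta) * (1.0 - cos_a) / 2.0
```

Both terms lie in [0, 1], so the potential is a convex combination of two normalized dissimilarities: it is 0 for identical edges and grows as the edges differ in length or direction.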

[Figure 7: Results on Cars and Motorbikes using the pairwise model B described in Section 4.1. Panels: (a) Cars, (b) Motorbikes; each panel plots matching score, accuracy, and running time against the number of outliers for MPM, RRWM, IPFP, SMAC, and ADGM.]

[Figure 8: Results on Cars and Motorbikes using the pairwise model C described in Section 4.2. Same panel layout as Figure 7.]

[Figure 5: Results on House sequence using the third-order model described in Section 4.1. Panels: (a) 30 pts vs 30 pts, (b) 20 pts vs 30 pts, (c) 10 pts vs 30 pts; each panel plots matching score, accuracy, and running time against the baseline for HGM, TM, RRWHM, BCAGM, ADGM1, and ADGM2.]

[Figure 6: Results on Hotel sequence using the third-order model described in Section 4.1. Same panel layout as Figure 5.]

[Figure 9: Motorbike matching using the pairwise model C described in Section 4.2. (Best viewed in color.) Panels: (a) 46 pts vs 66 pts (20 outliers), (b) MPM 13/46 (966.2296), (c) RRWM 6/46 (988.0872), (d) IPFP 35/46 (1038.3965), (e) SMAC 11/46 (1028.7961), (f) ADGM 46/46 (1043.0687).]

Pairwise potentials based on both length and angle were previously proposed in [23, 28] and [33]. However, ours are the simplest to compute. In this experiment, η = 0.5. We match every image pair and report, for each method, the average objective value and matching accuracy in Figure 8. One can observe that ADGM completely outperformed all the other methods.

Third-order model. This experiment has the same settings as the previous one, except that it uses a third-order model (the same as in the House and Hotel experiments) and the number of outliers varies from 0 to 16 (by intervals of 2). Results are reported in Figure 10, and a matching example is given in Figure 11. ADGM did quite well on this dataset. On Cars, both ADGM1 and ADGM2 achieved better objective values than BCAGM in 7/9 cases. On Motorbikes, ADGM1 beat BCAGM in 5/9 cases and had equivalent performance in 1/9 cases; ADGM2 beat BCAGM in 8/9 cases.
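The third-order potential of Eq. (68) compares triangles by their interior angles, making it invariant to similarity transformations. A hedged sketch follows; the function names are ours, and we assume (per the text) that the feature vector is the triple of interior angles and that γ appears linearly in the denominator.

```python
import numpy as np

def triangle_angles(p, q, r):
    """Interior angles of triangle (p, q, r), used as the feature
    vector f in Eq. (68). Computed via the law of cosines."""
    a = np.linalg.norm(np.subtract(q, r))  # side opposite p
    b = np.linalg.norm(np.subtract(p, r))  # side opposite q
    c = np.linalg.norm(np.subtract(p, q))  # side opposite r
    A = np.arccos(np.clip((b*b + c*c - a*a) / (2*b*c), -1.0, 1.0))
    B = np.arccos(np.clip((a*a + c*c - b*b) / (2*a*c), -1.0, 1.0))
    return np.array([A, B, np.pi - A - B])   # angles sum to pi

def third_order_potential(tri1, tri2, gamma):
    """Eq. (68): exp(-||f1 - f2||^2 / gamma), with gamma a scaling
    constant (the mean of all squared distances, per the text)."""
    f1 = triangle_angles(*tri1)
    f2 = triangle_angles(*tri2)
    return np.exp(-np.sum((f1 - f2)**2) / gamma)
```

Because angles are scale-invariant, two similar triangles of different sizes receive the maximal potential of 1, which is what makes this model robust to the scale changes in the Cars and Motorbikes pairs.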

[Figure 10: Results on Cars and Motorbikes using the third-order model described in Section 4.2. Panels: (a) Cars, (b) Motorbikes; each panel plots matching score, accuracy, and running time against the number of outliers for HGM, TM, RRWHM, BCAGM, ADGM1, and ADGM2.]

5. Conclusion and future work

We have presented ADGM, a novel class of algorithms for solving graph matching. Two instantiations of ADGM were implemented and evaluated. The results demonstrate that they outperform existing pairwise methods and are competitive with state-of-the-art higher-order methods. In the pairwise setting, we have also proposed a new model that was shown to be very effective for matching real-world images. In future work, we aim to develop a similar model for third- or even higher-order settings. We also plan to further improve our algorithms by adopting a more principled adaptive scheme for the penalty parameter, such as the spectral penalty estimation proposed in [29]. A software implementation of our algorithms will be available for download on our website.

[Figure 11: Car matching using the third-order model described in Section 4.2. (Best viewed in color.) Panels: (a) 25 pts vs 36 pts (9 outliers), (b) HGM 4/25 (337.8194), (c) RRWHM 3/25 (1409.832), (d) BCAGM 15/25 (1713.487), (e) ADGM1 25/25 (2161.5354), (f) ADGM2 25/25 (2161.5354).]

References

[1] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(4):509–522, 2002.
[2] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, 1999.
[3] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
[4] S. Boyd, L. Xiao, A. Mutapcic, and J. Mattingley. Notes on decomposition methods. Notes for EE364B, Stanford University, pages 1–36, 2007.
[5] R. E. Burkard, E. Cela, P. M. Pardalos, and L. S. Pitsoulis. The quadratic assignment problem. In Handbook of Combinatorial Optimization, pages 1713–1809. Springer, 1998.
[6] M. Cho, J. Lee, and K. M. Lee. Reweighted random walks for graph matching. In Computer Vision–ECCV 2010, pages 492–505. Springer, 2010.
[7] M. Cho, J. Sun, O. Duchenne, and J. Ponce. Finding matches in a haystack: A max-pooling strategy for graph matching in the presence of outliers. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2083–2090, 2014.
[8] L. Condat. Fast projection onto the simplex and the ℓ1 ball. Mathematical Programming, pages 1–11, 2014.
[9] T. Cour, P. Srinivasan, and J. Shi. Balanced graph matching. Advances in Neural Information Processing Systems, 19:313, 2007.
[10] G. B. Dantzig and P. Wolfe. Decomposition principle for linear programs. Operations Research, 8(1):101–111, 1960.
[11] O. Duchenne, F. Bach, I.-S. Kweon, and J. Ponce. A tensor-based algorithm for high-order graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12):2383–2395, 2011.
[12] O. Duchenne, A. Joulin, and J. Ponce. A graph-matching kernel for object categorization. In 2011 International Conference on Computer Vision, pages 1792–1799. IEEE, 2011.
[13] S. Gold and A. Rangarajan. A graduated assignment algorithm for graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(4):377–388, 1996.
[14] V. Kolmogorov and R. Zabih. Computing visual correspondence with occlusions using graph cuts. In Eighth IEEE International Conference on Computer Vision (ICCV 2001), volume 2, pages 508–515. IEEE, 2001.
[15] N. Komodakis and N. Paragios. Beyond pairwise energies: Efficient optimization for higher-order MRFs. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), pages 2985–2992. IEEE, 2009.
[16] N. Komodakis, N. Paragios, and G. Tziritas. MRF energy minimization and beyond via dual decomposition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(3):531–552, 2011.
[17] T. C. Koopmans and M. Beckmann. Assignment problems and the location of economic activities. Econometrica, pages 53–76, 1957.
[18] H. W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1-2):83–97, 1955.
[19] E. L. Lawler. The quadratic assignment problem. Management Science, 9(4):586–599, 1963.
[20] J. Lee, M. Cho, and K. M. Lee. Hyper-graph matching via reweighted random walks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), pages 1633–1640. IEEE, 2011.
[21] M. Leordeanu and M. Hebert. A spectral technique for correspondence problems using pairwise constraints. In Tenth IEEE International Conference on Computer Vision (ICCV 2005), volume 2, pages 1482–1489. IEEE, 2005.
[22] M. Leordeanu, M. Hebert, and R. Sukthankar. An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114–1122, 2009.
[23] M. Leordeanu, R. Sukthankar, and M. Hebert. Unsupervised learning for graph matching. International Journal of Computer Vision, 96(1):28–45, 2012.
[24] A. F. Martins, M. A. Figueiredo, P. M. Aguiar, N. A. Smith, and E. P. Xing. AD³: Alternating directions dual decomposition for MAP inference in graphical models. Journal of Machine Learning Research, 16(1):495–545, 2015.
[25] Q. Nguyen, A. Gautier, and M. Hein. A flexible tensor block coordinate ascent scheme for hypergraph matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5270–5278, 2015.
[26] S. Sahni and T. Gonzalez. P-complete approximation problems. Journal of the ACM, 23(3):555–565, 1976.
[27] A. Schrijver. Combinatorial Optimization: Polyhedra and Efficiency, volume 24. Springer Science & Business Media, 2002.
[28] L. Torresani, V. Kolmogorov, and C. Rother. A dual decomposition approach to feature correspondence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(2):259–271, 2013.
[29] Z. Xu, M. A. Figueiredo, and T. Goldstein. Adaptive ADMM with spectral penalty parameter selection. arXiv preprint arXiv:1605.07246, 2016.
[30] M. Zaslavskiy, F. Bach, and J.-P. Vert. A path following algorithm for the graph matching problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12):2227–2242, 2009.
[31] R. Zass and A. Shashua. Probabilistic graph and hypergraph matching. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), pages 1–8. IEEE, 2008.
[32] Y. Zeng, C. Wang, Y. Wang, X. Gu, D. Samaras, and N. Paragios. Dense non-rigid surface registration using high-order graph matching. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2010), pages 382–389. IEEE, 2010.
[33] F. Zhou and F. De la Torre. Factorized graph matching. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2012), pages 127–134. IEEE, 2012.

Appendix

We have shown in Section 3.3 how the subproblems in the ADGM1 algorithm can be reduced to (33), where $(c_d)_{1 \le d \le D}$ are defined by (34) and (35). In this appendix, we show how the subproblems in ADGM2 can be reduced to (33), where $(c_d)_{1 \le d \le D}$ are defined by (36), (37) and (38). Recall that in ADGM2, we chose $(A_d)_{1 \le d \le D}$ such that

\[ x_1 = x_2, \quad x_2 = x_3, \quad \ldots, \quad x_{D-1} = x_D, \tag{71} \]

which can be written in matrix form:

\[ \begin{pmatrix} x_1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} + \begin{pmatrix} -x_2 \\ x_2 \\ 0 \\ \vdots \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ -x_3 \\ x_3 \\ \vdots \\ 0 \end{pmatrix} + \cdots + \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ -x_D \end{pmatrix} = 0, \tag{72} \]

which can in turn be re-written as $A_1 x_1 + A_2 x_2 + \cdots + A_D x_D = 0$, where $A_d$ is chosen to be the $d$-th (block) column of the following $(D-1) \times D$ block matrix $A$, and $y$ is a $(D-1) \times 1$ block vector:

\[ A = \begin{pmatrix} I & -I & & & \\ & I & -I & & \\ & & \ddots & \ddots & \\ & & & I & -I \end{pmatrix}, \qquad y = \begin{pmatrix} y_2 \\ y_3 \\ \vdots \\ y_D \end{pmatrix}. \tag{73} \]

From (20) we easily have

\[ s_1^k = \begin{pmatrix} 0 - x_2^k \\ x_2^k - x_3^k \\ x_3^k - x_4^k \\ \vdots \\ x_{D-1}^k - x_D^k \end{pmatrix}, \qquad s_D^k = \begin{pmatrix} x_1^{k+1} - x_2^{k+1} \\ x_2^{k+1} - x_3^{k+1} \\ \vdots \\ x_{D-2}^{k+1} - x_{D-1}^{k+1} \\ x_{D-1}^{k+1} - 0 \end{pmatrix}, \tag{74} \]

and

\[ s_d^k = \begin{pmatrix} x_1^{k+1} - x_2^{k+1} \\ \vdots \\ x_{d-2}^{k+1} - x_{d-1}^{k+1} \\ x_{d-1}^{k+1} - 0 \\ 0 - x_{d+1}^k \\ x_{d+1}^k - x_{d+2}^k \\ \vdots \\ x_{D-1}^k - x_D^k \end{pmatrix} \quad \forall\, 2 \le d \le D-1. \tag{75} \]

Now we compute the vectors $(c_d)_{1 \le d \le D}$.

• For $d = 1$: Since $A_1 = \begin{pmatrix} I & 0 & \cdots & 0 \end{pmatrix}^\top$, we have

\[ A_1^\top A_1 = I, \tag{76} \]
\[ A_1^\top s_1^k = -x_2^k, \tag{77} \]
\[ A_1^\top y^k = y_2^k. \tag{78} \]

Plugging these into (28), it becomes

\[ \frac{1}{2}\|x\|^2 + \left(-x_2^k + \frac{1}{\rho} y_2^k + \frac{1}{\rho} p_1^k\right)^{\!\top} x. \tag{79} \]

Clearly, minimizing this quantity over $\mathcal{M}_1$ is equivalent to solving (33) for $d = 1$, where $c_1$ is defined by (36).

• For $d = D$: Since $A_D = \begin{pmatrix} 0 & \cdots & 0 & -I \end{pmatrix}^\top$, we have

\[ A_D^\top A_D = I, \tag{80} \]
\[ A_D^\top s_D^k = -x_{D-1}^{k+1}, \tag{81} \]
\[ A_D^\top y^k = -y_D^k. \tag{82} \]

Plugging these into (28), it becomes

\[ \frac{1}{2}\|x\|^2 + \left(-x_{D-1}^{k+1} - \frac{1}{\rho} y_D^k + \frac{1}{\rho} p_D^k\right)^{\!\top} x. \tag{83} \]

Minimizing this quantity over $\mathcal{M}_D$ is equivalent to solving (33) for $d = D$, where $c_D$ is defined by (37).

• For $2 \le d \le D-1$: Since (the non-zero blocks below are at the $(d-1)$-th and $d$-th positions)

\[ A_d = \begin{pmatrix} 0 & \cdots & 0 & -I & I & 0 & \cdots & 0 \end{pmatrix}^\top, \]

we have

\[ A_d^\top A_d = 2I, \]
\[ A_d^\top s_d^k = -x_{d-1}^{k+1} - x_{d+1}^k, \]
\[ A_d^\top y^k = -y_d^k + y_{d+1}^k. \]

Plugging these into (28), it becomes

\[ \|x\|^2 + \left(-x_{d-1}^{k+1} - x_{d+1}^k - \frac{1}{\rho}\left(y_d^k - y_{d+1}^k\right) + \frac{1}{\rho} p_d^k\right)^{\!\top} x. \tag{84} \]

Minimizing this quantity over $\mathcal{M}_d$ is equivalent to solving (33), where $c_d$ is defined by (38).
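The block structure of $A$ in Eq. (73) and the identities (76), (80), and $A_d^\top A_d = 2I$ can be verified numerically for a small $D$. The sketch below uses our own helper name and a dense representation purely for illustration.

```python
import numpy as np

def build_A(D, n):
    """(D-1) x D block matrix of Eq. (73): I on the block diagonal,
    -I on the block superdiagonal, each block being n x n."""
    I = np.eye(n)
    A = np.zeros(((D - 1) * n, D * n))
    for r in range(D - 1):
        A[r*n:(r+1)*n, r*n:(r+1)*n] = I        # +I block (column r)
        A[r*n:(r+1)*n, (r+1)*n:(r+2)*n] = -I   # -I block (column r+1)
    return A

D, n = 4, 3
A = build_A(D, n)
cols = [A[:, d*n:(d+1)*n] for d in range(D)]   # block columns A_1, ..., A_D

# A_1^T A_1 = I (Eq. 76) and A_D^T A_D = I (Eq. 80);
# interior block columns give A_d^T A_d = 2I.
assert np.allclose(cols[0].T @ cols[0], np.eye(n))
assert np.allclose(cols[-1].T @ cols[-1], np.eye(n))
assert np.allclose(cols[1].T @ cols[1], 2 * np.eye(n))

# A x = 0 exactly when x_1 = x_2 = ... = x_D (Eq. 71).
x_equal = np.tile(np.random.rand(n), D)
assert np.allclose(A @ x_equal, 0)
```

This confirms why the linear coefficient in the subproblem doubles for the interior blocks (Eq. 84 versus Eqs. 79 and 83): each interior $x_d$ appears in two consecutive equality constraints.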