A Comparison of Two Energy Minimization Methods for Graph Matching

Mohammad A. Abdulrahim and Manavendra Misra
Department of Mathematical and Computer Sciences
Colorado School of Mines
Golden, Colorado 80401, USA
Phone: (303) 273-3873  Fax: (303) 273-3875
[email protected] and [email protected]

Abstract. Two approaches for solving the graph matching problem are discussed. Among the nonlinear optimization approaches that are inspired by biology, the Dynamic Link Architecture (DLA) is discussed. The DLA embodies a learning style in which significant relations in the input patterns are recognized and expressed by the unsupervised self-organization of the components of the DLA. To evaluate the efficiency of the DLA approach, the Graduated Assignment (GA) algorithm is presented and discussed. The paper concludes with a presentation of the attractiveness of the DLA in solving the graph matching problem for purposes of invariant pattern recognition.

1 Introduction

Graph matching is used in a broad variety of fields for pattern recognition and classification purposes. One such field is medicine, where one may need to identify malignant cells in an image. To help biologists uncover the semantics of proteins, researchers have presented computational approaches to discovering these semantics in molecular biology [13]. Similar requirements in chemical and physical labs make efficient image matching mechanisms important. In the field of computer vision, the problem of finding the correspondence between parts of structural descriptions, so as to make the corresponding properties and relations as consistent as possible, is fundamental [24, 23]. For example, matching feature point descriptions derived from a sensing system against known models stored in a database is considered an essential part of a computer vision system. Robotics is another application area for graph matching, where a vision-based robot needs to recognize and identify objects in its workspace. In the military, sophisticated radars are built into jet fighters to identify other aircraft as friend or foe as an extra precaution. Most of these applications involve comparing and matching graphs (or shapes) against other graphs. Hence, researchers attach importance to designing algorithms for comparing graphs and deciding whether the graphs match. For real-time applications, some of which are mentioned above, fast and efficient graph matching is vital and has become a challenge for many computer scientists and applied mathematicians. Given two graphs G and H, matching these two graphs means constructing a (1-to-1) map from the nodes of G to the

nodes of H in such a way that any two nodes i and j in G which are linked by an edge in G map to two nodes k and l in H which are in turn linked by an edge in H, and vice versa. There are two types of graph matching problems. The first, and more difficult in terms of computational complexity, is graph isomorphism [19] (sometimes called exact graph matching). Graph isomorphism represents a stronger notion of graph matching than the other types. The exact complexity of this problem is still open; though it is clearly in NP, it has neither been proven NP-complete nor shown to be in P [5, 2]. This type of graph matching is often not appropriate for real applications of pattern recognition. Real images tend to have distortions and degradations, and searching for a previously stored image that must be isomorphic to a given distorted one is nearly impossible. Graph isomorphism in this sense is too restrictive and impractical. Therefore, the need for less restrictive graph matching became apparent, and the inexact graph matching problem emerged. It is a more convenient approach than exact matching because not only does it make the computation faster, but it also accommodates the various distortions and degradations that usually accompany real images. The existence of powerful tools for representing images and objects encouraged researchers and scientists to develop efficient algorithms for various related applications. Image understanding is one such application, where one needs to extract the image description as a graph representation from the given data [4]. There are many other applications that require handling these structured descriptions, like classification, editing, and enhancing of given descriptions. The most difficult, challenging, and interesting among all is the graph matching problem. Not only is comparing objects based on their graph representations convenient, but it is also a logical way of handling invariance [27, 12, 11, 18]. A representation of an object is said to be invariant with respect to a given transformation if the representation does not change when the transformation is applied to the original object. There are several types of transformation invariances; important transformations in pattern and object recognition are translation (or position invariance), dilation, rotation, and distortion [27]. For instance, a position-invariant representation of a 2-D image is a description which does not depend on the absolute position within the 2-D plane, but rather is based on the relative positions of the object features [3, 12]. Graph matching solutions found in the literature can be categorized into two approaches. The first is the brute-force method of searching the whole state space of possible matchings. Techniques like branch and bound have been employed [28] to cut down the size of the search space. This category of algorithms will find the optimal matching; however, it may be very slow and impractical for large graphs. The second uses nonlinear optimization methods to approximate the best match between two graphs. Although the state space is not searched exhaustively, a close-to-optimal match is achieved in reasonable time. The following solutions use nonlinear optimization. Some methods [12] use a form

of relaxation labeling, like probabilistic relaxation. Other methods are based on eigendecomposition [24] or linear programming techniques [1]. Lately, artificial neural networks (ANNs) have been used to optimize the solution to the graph matching problem [3, 25, 27, 17, 20, 29, 11]. Very recently, an algorithm called Graduated Assignment [6], which is based on softassign and softmax [7], solved the graph matching problem efficiently in an iterative manner. One of the ANN approaches, called the Dynamic Link Architecture, and the Graduated Assignment algorithm are the focus of this paper. The rest of the paper is organized as follows. Graph matching using the Dynamic Link Architecture is described in Section 2. In Section 3 the Graduated Assignment algorithm is discussed. Section 4 compares the two techniques. A discussion of the feasibility of the Dynamic Link Architecture is presented in Section 5. Finally, Section 6 summarizes and concludes the paper.

2 Graph Matching Using ANNs

ANNs have shown great promise for solving difficult optimization problems in general. Several other optimization techniques have been used to solve such problems too; in statistical physics, for example, energy functions are minimized using several methods, e.g. the simulated annealing technique. Hopfield and Tank demonstrated an ANN that performs the optimization of various energy (objective) functions [9, 22]; this became known as the Hopfield network. Unfortunately, the Hopfield network has not been very successful in solving large optimization problems. Other researchers have successfully used ANNs to solve relatively large optimization problems [3, 27, 20, 11, 21]. The minimization of objective functions has been an attractive way to formulate and solve pattern and object recognition problems since Hopfield presented his network in 1982. The Hopfield network is capable of storing several patterns and retrieving any of them when presented with an input similar to one of the stored patterns; the network is thus used as an associative (content-addressable) memory. Given its importance, we provide a brief overview of the Hopfield model in the following subsection. For details refer to [10] and [8].

2.1 Hopfield Network and Associative Memory

The Hopfield network consists of N neurons, each of which is a McCulloch-Pitts neuron. Each neuron has N directed links leaving it and entering other neurons (including itself), giving $N^2$ links in total. For any pattern to be stored, the weights on each of the $N^2$ links must be set appropriately to facilitate correct retrieval when required. Let $T_{ij}$ be the weight on the link from neuron i to neuron j. Suppose that a binary pattern $\xi$ of length N is to be stored, starting from weights initialized to zero ($T_{ij} = 0$, $1 \le i, j \le N$). Let $\xi_i$ be the $i$th bit of the pattern; the weight $T_{ij}$ is then given by

$$T_{ij} = \frac{1}{N}\xi_i\xi_j.$$

For each additional pattern stored, $T_{ij}$ is updated as $T_{ij}(\text{new}) = T_{ij}(\text{old}) + \frac{1}{N}\xi_i\xi_j$. In general, let $\xi_i^\mu$ denote the $i$th bit of pattern $\mu$; then to store P patterns, the final weights of the network are computed as

$$T_{ij} = \frac{1}{N}\sum_{\mu=1}^{P}\xi_i^\mu\xi_j^\mu.$$

With the computed weights, if the network is presented with one of the stored patterns, it performs several asynchronous activations of neurons, based on a non-linear activation function of the weighted sum of inputs, until it converges. This process can be represented by an energy function

$$H(\sigma) = -\frac{1}{2}\sum_{i,j} T_{ij}\sigma_i\sigma_j,$$

where $\sigma_i \in \{0, 1\}$ is the activation value of neuron i: $\sigma_i = 1$ indicates that neuron i is active, and $\sigma_i = 0$ indicates that it is inactive. Convergence here means finding $\sigma_i$ values that stabilize the memory states (the stored patterns $\xi^\mu$, $\mu \in \{1, \ldots, P\}$). This amounts to reaching a stable state (a pattern $\xi^\mu$) that is closest to the starting state (the presented pattern). The Hopfield network is able to retrieve the most similar stored pattern when the presented (input) pattern differs moderately from one of the stored ones. Since the stored patterns are binary, the network retrieves the pattern with the smallest Hamming distance from the input pattern. One drawback of the Hopfield model is that it cannot efficiently deal with image transformations, which are highly desirable features in many practical applications. This led to the search for systems that can recognize patterns that have been affected by some form of transformation, e.g. translations and distortions.
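To make the storage and retrieval dynamics concrete, here is a minimal sketch in Python (ours, not code from the paper). It uses the common $\pm 1$ encoding of the bits rather than the $\{0, 1\}$ activations above, which yields an equivalent Hebbian rule; the function names are our own.

```python
import numpy as np

def store_patterns(patterns):
    """Hebbian storage: T_ij = (1/N) * sum over patterns of xi_i * xi_j."""
    N = patterns.shape[1]
    T = np.zeros((N, N))
    for xi in patterns:              # each xi is a vector in {-1, +1}^N
        T += np.outer(xi, xi) / N
    np.fill_diagonal(T, 0.0)         # no self-coupling
    return T

def retrieve(T, probe, max_sweeps=50):
    """Asynchronous updates until a fixed point (a local minimum of H)."""
    sigma = probe.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in np.random.permutation(len(sigma)):
            s = 1 if T[i] @ sigma >= 0 else -1
            if s != sigma[i]:
                sigma[i] = s
                changed = True
        if not changed:              # converged: energy can no longer decrease
            break
    return sigma
```

Presenting a probe that differs from a stored pattern in a few bits then returns the stored pattern with the smallest Hamming distance, mirroring the retrieval behavior described above.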

2.2 Dynamic Link Architecture (DLA)

Malsburg proposed a new neural information processing concept, the Dynamic Link Architecture (DLA) [26], as an extension to conventional ANNs. The strength of the DLA concept lies in its use of synaptic plasticity on the time scale of information processing, not only for memory acquisition. This means that the activities of related neurons must be utilized to derive useful semantics, rather than depending merely on the activity of a single neuron. For instance, neurons that become active and/or inactive together on a fast time scale should be grouped and dealt with as a higher symbolic unit. This ability to bind neurons together was not possible in conventional ANNs. Malsburg and Bienenstock [25, 3] presented the DLA as an extension of the principle of associative memory that can perform invariant pattern recognition. Unlike the abstract associative memory of Hopfield, where a state only has an activity associated with it ($\sigma_i \in \{0, 1\}$), in the DLA the states $\{\sigma_i\}$ also have features of the object or image associated with them. As a first step towards achieving this flexibility, a neural network was presented for the retrieval of superimposed patterns [25]. Following this, an invariant pattern recognition neural network

was presented [3] which used the new associative memory. The combination of these two systems served as a demonstration of invariant object recognition using ANNs that embodies the DLA concept. To explain the dynamics of the DLA, we first show how real features are embedded in the neurons. Let B be a set of N neurons that are fully connected, as in the Hopfield network, and let F be a set of feature types; F may be as simple as the different grey levels of pixels in an image. Let $\varphi : B \to F$ be a mapping that assigns a feature type to each neuron. Let the neurons in B be designated by $1 \le i \le N$; each neuron $i \in B$ is assigned a feature $\varphi(i) \in F$. Given an image, if neurons encode certain features of pixels of the image at certain positions, the set of these neurons will indeed represent the given image. This alone will not handle shifted images. Therefore, the relational representation of an image is utilized by amending the original McCulloch and Pitts framework. Dynamic variables $\{J_{ij}\}$ are introduced to exploit temporal correlations between the different neurons and encode spatial relationships between the labeled neurons. These dynamic variables change on a fast time scale and interact with the second-order correlations $\langle\sigma_i\sigma_j\rangle$. The relative positions of the different features in the image are encoded dynamically using the dynamic variables $\{J_{ij}\}$, and the absolute position of each feature according to $\varphi$ is ignored, to facilitate invariance. Just as in the Hopfield model, $T_{ij}$ is the weight on the connection from neuron j to neuron i; $T_{ij}$ is modified during storage and stays fixed after all images or patterns are stored. To distinguish the new variables $\{J_{ij}\}$, let us call them links. Let $\sigma : B \to \{0, 1\}$ be the activation of the neurons. The variables $\{J_{ij}\}$ are constrained by $J_{ij} \le T_{ij}$ to ensure that $J_{ij}$ is modified only for existing synapses, i.e. there is no $J_{ij}$ variable (link) from neuron j to neuron i when $T_{ij} = 0$. The $\sigma$'s change on a fast time scale while the J's change on a slower time scale. To implement this, two energy functions are constructed to capture the dynamics on these two scales: a fast scale for the $\sigma$'s,

$$H_J(\sigma) = -\sum_{i,j} J_{ij}\sigma_i\sigma_j, \qquad (1)$$

and a slow scale for the J's,

$$H_\sigma(J) = -\sum_{i,j} J_{ij}\langle\sigma_i\sigma_j\rangle + \lambda\sum_i\Big(\sum_j J_{ij} - p\Big)^2. \qquad (2)$$

In equation (1) $J_{ij}$ is fixed and $\sigma$ changes. Neurons that are active together (on the same time scale) are favored (with $\sigma : B \to \{-1, +1\}$, neurons that are inactive at the same times are also favored). In equation (2), with the slower changing variables $\{J_{ij}\}$, the first term favors higher $J_{ij}$'s with higher correlations $\langle\sigma_i\sigma_j\rangle$ (assuming these correlations have been computed earlier, since they live on the fast time scale), and the second term favors neurons with p active neighbors. The parameter $\lambda$ acts as a weight adjustment factor on the second term. Neurons that are assigned to different nodes of the graph representation of an image have dynamic links $J_{ij}$ that change during the optimization of the energy function. At low energy levels the graph with the dynamic links decomposes into small connected blocks. Each block has about $p+1$ neurons, and the activations of neurons within such a block are similar. Imagine these blocks as different stored images; the aim is then to have only one block with its neurons active. This was demonstrated by superimposing several patterns onto a network and later retrieving them [25].
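As an illustration (our own sketch, not code from [25]), the two energies can be evaluated directly from a link matrix J, an activation vector sigma, and precomputed fast-scale correlations; how J and sigma are actually updated on their respective time scales is left abstract here, and the parameter names `lam` (for $\lambda$) and `p` are just labels for the symbols above.

```python
import numpy as np

def H_fast(J, sigma):
    """Fast-scale energy (1): low when strongly linked neurons are co-active."""
    return -sigma @ J @ sigma

def H_slow(J, corr, lam, p):
    """Slow-scale energy (2): the first term rewards links J_ij that agree
    with the correlations <sigma_i sigma_j> (passed in as corr); the second
    term penalizes each neuron whose total link weight deviates from p."""
    return -np.sum(J * corr) + lam * np.sum((J.sum(axis=1) - p) ** 2)
```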

2.3 Recognition of Translated Patterns

The power of the DLA becomes apparent when it is applied to complex problems like invariant object recognition. Bienenstock and Malsburg demonstrated the ability of the DLA to recognize a 2-D shape that is shifted and has minor distortions [3]. This recognition is constructed as a problem of matching labeled graphs, where the labels are local features extracted from the image and the edges represent the neighborhood relationships between the local features in the image. First, a layer A of neurons is needed to present the input image. Memory layer B, where patterns are stored, and layer A have to be connected in a way that makes the recognition successful. Finally, energy functions (1) and (2) are modified for the new neural network structure. The fast scale energy function (1) is eliminated and the slow scale function (2) is modified as follows. (Since we have two layers now, A and B, the new energy functions carry notation that specifies which layer each variable belongs to.)

$$H^{BB}(J) = -\sum_{i,j,k,l \in B} J_{ij}J_{jl}J_{ik}J_{kl} + \lambda\sum_{i \in B}\Big(\sum_{j \in B} J_{ij} - p\Big)^2 + \lambda\sum_{j \in B}\Big(\sum_{i \in B} J_{ij} - p\Big)^2. \qquad (3)$$

Since energy function (1) is eliminated and we still need information about the correlations $\langle\sigma_i\sigma_j\rangle$, these correlations are approximated by one of the leading terms, the $J_{ij}$'s in (3). $H^{BB}$ indicates that the $J_{ij}$ links and their dynamical modifications are performed exclusively by neurons in layer B. At low states of $H^{BB}$, any stored pattern may be active under the cost and constraint terms of $H^{BB}$; indeed, $H^{BB}$ has as many local minima as the number of patterns stored. Another energy function, explained below, will be used to pick the pattern that matches the input pattern in layer A. First, note that the images are stored in B in such a way that each neuron has 3 incoming connections and 3 outgoing ones (see Figure 1(a)). This connectivity, contrasted with non-oriented 4-neighbor connections, is used to reduce the chances of falling into local minima [3]. The first term of (3) favors 4 neighbors constituting cycles of 4 links, as shown in Figure 1(b). With p = 3, the second term in (3) is a constraint that forces 3 incoming links, while the last term forces 3 outgoing links for each node. The larger $\lambda$ is, the more these constraints are enforced. As mentioned above, at low energy $H^{BB}(J)$ has as many minima as the number of stored patterns; therefore we need a way to choose one of these minima as the global minimum that corresponds to the input pattern in layer A. A function similar to (3) is constructed as follows:

$$H^{AB}(J) = -\sum_{i,j \in A;\, k,l \in B} J_{ij}J_{jl}J_{ik}J_{kl} + \lambda'\sum_{i \in A}\Big(\sum_{k \in B} J_{ik} - p'\Big)^2 + \lambda'\sum_{k \in B}\Big(\sum_{i \in A} J_{ik} - p'\Big)^2. \qquad (4)$$


Fig. 1. (a) A connectivity scheme with 3 incoming and 3 outgoing links at each node. (b) An oriented cycle of 4 that can be found in (a).

Here, the $J_{ij}$'s within layer A or B are fixed. For example, let $P^A$ be a pattern presented to layer A and $P^B$ be one of the patterns stored in layer B. Then $J_{ij} = T^A_{ij}$ for $i, j \in A$ and $J_{kl} = T^B_{kl}$ for $k, l \in B$. The inter-layer variable links $\{J_{ik}\}$, where $i \in A$ and $k \in B$, are modified during the recognition process. These $J_{ik}$'s are constructed as follows: for every $i \in A$, neuron i with feature f is linked to all neurons $k \in B$ with feature f. All other $J_{ik}$'s are null. The first term of (4) favors cycles of 4 links across layers A and B. The second and third terms enforce a 1-to-1 mapping from A to B when $p'$ is set to 1 and $\lambda'$ is large enough. Energy functions (3) and (4) must be minimized simultaneously. Hence the overall energy function that does graph retrieval and recognition at the same time is constructed by adding (3) and (4):

$$H(J) = H^{BB}(J) + \kappa H^{AB}(J), \qquad (5)$$

where $\kappa$ is a weighting parameter for an appropriate balance between the two terms. The following steps summarize the recognition process, assuming that patterns are already stored in layer B; a small sketch of evaluating these energies follows the list.

- An input image is presented to layer A. Neurons are chosen and activated, and $J_{ij}$ links are constructed as a representation of the input image. Recall that these $J_{ij}$'s are fixed.
- For each neuron i that is active in A, all neurons k in B that have the same feature type as neuron i are activated and linked to neuron i, creating the links $\{J_{ik}\}$.
- Energy function (5) is minimized, whereby a 1-to-1 mapping across layers and a grouping of neurons in layer B are achieved simultaneously. The 1-to-1 mapping is achieved by minimizing energy function (4), and the grouping, which amounts to choosing the pattern from B that best matches the input pattern, is achieved by minimizing energy function (3).
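A compact way to see how (3), (4), and (5) operate is to evaluate them on link matrices. The sketch below is ours, with illustrative parameter names `lam` ($\lambda$), `lam_p` ($\lambda'$), and `kappa` ($\kappa$); it only scores a candidate configuration, while the minimization dynamics are left abstract.

```python
import numpy as np

def cycle_term(P, Q, R, S):
    """Sum over i,j,k,l of P_ij * Q_jl * R_ik * S_kl (cycles of length 4)."""
    return np.einsum('ij,jl,ik,kl->', P, Q, R, S)

def H_BB(J, lam, p=3.0):
    """Energy (3) over the links within layer B."""
    return (-cycle_term(J, J, J, J)
            + lam * np.sum((J.sum(axis=1) - p) ** 2)
            + lam * np.sum((J.sum(axis=0) - p) ** 2))

def H_AB(J_A, J_B, M, lam_p, p_prime=1.0):
    """Energy (4): J_A, J_B are the fixed intra-layer links; M holds the
    variable cross links J_ik. The cycle term rewards an edge i-j in A that
    is carried by cross links i-k and j-l onto an edge k-l in B; the
    constraint terms push the row and column sums of M toward p'."""
    return (-cycle_term(J_A, M, M, J_B)
            + lam_p * np.sum((M.sum(axis=1) - p_prime) ** 2)
            + lam_p * np.sum((M.sum(axis=0) - p_prime) ** 2))

def H_total(J_A, J_B, M, lam, lam_p, kappa):
    """Overall energy (5), to be minimized over M (and the layer-B links)."""
    return H_BB(J_B, lam) + kappa * H_AB(J_A, J_B, M, lam_p)
```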

3 The Graduated Assignment Algorithm

Gold and Rangarajan [6] have recently presented an algorithm that solves the inexact graph matching problem with great success. The algorithm is called Graduated Assignment (GA). Interestingly, the GA approach has a lot of similarities to the DLA approach. In the GA approach, as in most optimization problems, an objective function with some constraints is constructed to solve the inexact graph matching problem. As an example, the GA algorithm was demonstrated for the weighted graph matching problem. We will present the GA algorithm for attributed graph matching as a step towards finding similarities between the GA and the DLA approaches. Given two undirected graphs G and H, where link weights in each are in $\mathbb{R}^1$ and nodes have attributes from a finite set of attributes, we find the match matrix M such that the following energy (objective) function is minimized:

$$E_{awg}(M) = -\frac{1}{2}\sum_{i=1}^{A}\sum_{k=1}^{I}\sum_{j=1}^{A}\sum_{l=1}^{I} M_{ik}M_{jl}C^{(2)}_{ikjl} - \sum_{i=1}^{A}\sum_{k=1}^{I} M_{ik}C^{(1)}_{ik}, \qquad (6)$$

subject to the constraints $\forall i\; \sum_{k=1}^{I} M_{ik} \le 1$, $\forall k\; \sum_{i=1}^{A} M_{ik} \le 1$, and $\forall i, k\; M_{ik} \in \{0, 1\}$.

Graphs G and H have A and I nodes respectively. The $\{C^{(2)}_{ikjl}\}$ are defined as follows: $C^{(2)}_{ikjl} = 0$ if either $G_{ij}$ or $H_{kl}$ is null, and $C^{(2)}_{ikjl} = c^{(2)}(G_{ij}, H_{kl})$ otherwise. $\{G_{ij}\}$ and $\{H_{kl}\}$ are the graphs' adjacency matrices, whose elements may be in $\mathbb{R}^1$ or NULL; $G_{ij}$ is the weight of the link between nodes i and j in graph G. Matrix M is the map of corresponding nodes between the two graphs: at convergence, $M_{ik} = 1$ if node i in G matches node k in H, and is zero otherwise. The function $c^{(2)}(\cdot,\cdot)$ is chosen as a measure of compatibility between the links of the two graphs. Note that we will assume a special case where all links have the same weight, to facilitate attributed graph matching; in this case $c^{(2)}(\cdot,\cdot)$ is always 1, given that neither of the two links is NULL. $C^{(1)}_{ik} = c^{(1)}(i, k)$ is a measure of compatibility between node i in G and node k in H. The steps of the GA algorithm to minimize the objective function follow. Given any valid initial condition $M^0$ of the matching matrix, the objective function in (6) can be expanded about this initial condition via a Taylor series approximation:

$$-\frac{1}{2}\sum_{i=1}^{A}\sum_{k=1}^{I}\sum_{j=1}^{A}\sum_{l=1}^{I} M_{ik}M_{jl}C^{(2)}_{ikjl} - \sum_{i=1}^{A}\sum_{k=1}^{I} M_{ik}C^{(1)}_{ik} \;\approx\; -\frac{1}{2}\sum_{i=1}^{A}\sum_{k=1}^{I}\sum_{j=1}^{A}\sum_{l=1}^{I} M^0_{ik}M^0_{jl}C^{(2)}_{ikjl} - \sum_{i=1}^{A}\sum_{k=1}^{I} Q^0_{ik}(M_{ik} - M^0_{ik}), \qquad (7)$$

where

$$Q^0_{ik} = -\frac{\partial E_{awg}(M)}{\partial M_{ik}}\bigg|_{M = M^0} = \sum_{j=1}^{A}\sum_{l=1}^{I} M^0_{jl}C^{(2)}_{ikjl} + C^{(1)}_{ik}.$$

Minimizing the Taylor series expansion is then equivalent to maximizing

$$\sum_{i=1}^{A}\sum_{k=1}^{I} Q^0_{ik}M_{ik}. \qquad (8)$$

This term represents an assignment problem, which has been solved by different techniques (an efficient way of solving it was described by Gold and Rangarajan [7]). The general procedure to solve the problem is as follows (a sketch in code is given after the list):

- Start with a valid initial value for M.
- Do a first-order Taylor series expansion as above (7).
- Solve the current resulting assignment problem (8) using an appropriate method.
- Take the resulting solution, i.e. M, and substitute it back into (7).
- Repeat the last three steps until convergence or until a maximum (constant) number of iterations is reached. This constant has been found to be practically adequate for the problem to converge with very acceptable solutions.
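The following Python sketch (ours, not the authors' code) puts the loop together, using softassign as the assignment solver: Q is exponentiated under a control parameter beta that is slowly increased, and alternating row/column normalization (Sinkhorn balancing) drives M toward a permutation-like matrix. The parameter values are illustrative, it assumes graphs of equal size, and the slack row/column that the full GA algorithm uses to handle unmatched nodes is omitted for brevity.

```python
import numpy as np

def graduated_assignment(C1, C2, beta=0.5, beta_max=10.0, rate=1.075,
                         inner_iters=4, sinkhorn_iters=30):
    """C1[i,k]: node compatibilities; C2[i,k,j,l]: link compatibilities.
    Returns a (nearly) doubly stochastic match matrix M."""
    A, I = C1.shape
    M = np.full((A, I), 1.0 / I)                 # valid initial condition M^0
    while beta < beta_max:                       # graduated (annealing) loop
        for _ in range(inner_iters):
            # Q_ik = sum_jl M_jl C2_ikjl + C1_ik, the negative gradient of (6)
            Q = np.einsum('jl,ikjl->ik', M, C2) + C1
            M = np.exp(beta * Q)                 # softassign step...
            for _ in range(sinkhorn_iters):      # ...then Sinkhorn balancing
                M = M / M.sum(axis=1, keepdims=True)
                M = M / M.sum(axis=0, keepdims=True)
        beta *= rate
    return M
```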

The reported results of testing this algorithm are promising. The procedure was tested on an SGI Indigo workstation with an R4400 processor. Being an optimization approach to the graph matching problem, the GA algorithm has a low-order run-time complexity, O(lm), where l and m are the numbers of links in the two graphs, compared to other optimization techniques. For detailed test results and comparisons with probabilistic relaxation, refer to the GA paper [6].

4 A Comparison of the Two Techniques

The dynamics of the GA algorithm show a lot of compatibility with those of the DLA. Therefore, we present the similarities and differences of these two approaches in this section. Ultimately, we are interested in combining the biological appeal of the DLA with the efficient computation of the GA. To avoid confusion, note that the terms input graph, first graph, and layer A may be used interchangeably, as may stored graph, second graph, and layer B. Whenever the graphs are named, G and H denote the first and second graphs respectively.

4.1 Compatibility of DLA and GA

Comparing the energy function for the DLA (equation 5) and the one used in the GA approach (equation 6), we find a great similarity. Let us first assume that the input graph is compared to only one layer B graph at any given time by the DLA. Since this is what the GA algorithm does, we can compare the two dynamics on an equal footing. We also assume that the input and stored graphs represent a 2-D image with a simple 4-neighbor setup, without any weights on the links within the graph.


Fig. 2. (a) The dynamic links (thinner lines) across the layers A and B of the DLA method. The solid lines are the permanent links. Notice that nodes i and j in the input graph (layer A) have features similar to nodes k and l in the stored graph (layer B) respectively, creating the cycle of length 4. (b) Degree of compatibility between the nodes from the two graphs using the GA method.

The process of recognition in the DLA entails finding the closest solution to a 1-to-1 mapping of nodes in layer A to nodes in layer B. Since we have only one stored graph, we do not need equation (3), and the two methods, namely the DLA and the GA, have similar dynamics of the optimization process. The first term of (4) favors cycles of length 4, as shown in Figure 2(a), where nodes i and j in graph A are mapped respectively to nodes k and l in graph B. The second and third terms enforce a 1-to-1 mapping between nodes in graph A and nodes in graph B when $p'$ is set to 1. For the GA algorithm, since both the input graph and the stored graph have no weights on their edges, the value of the weight compatibility function reduces to $C^{(2)}_{ikjl} \in \{0, 1\}$. In this case, the first term of equation (6) alone does not quite serve the same purpose as the first term of equation (4), since there is nothing that serves as a preliminary selective match between the nodes of the two graphs, as was the case with the DLA links. Recall that in the DLA, nodes in graph A are linked only to nodes in graph B that have the same features. This selective match is achieved by the second term of equation (6), which favors mapping the most compatible nodes between the two graphs, where $C^{(1)}_{ik}$ is a compatibility value between node i in graph G and node k in graph H. Searching for the best 1-to-1 mapping is done in the DLA by continuously strengthening or weakening the dynamic links, as explained in Section 2.2. On the other hand, based on the values of the compatibility function $C^{(1)}$ between each pair of nodes i in the input graph and k in the stored graph, the GA algorithm solves the assignment problem in an iterative way using the softmax relationship to find the best 1-to-1 mapping. In Figure 2(b), values of $C^{(1)}$ are shown; e.g., the compatibility between nodes i and k is .7, and it is .4 between nodes j and l. These values could be the strengths of the links $J_{ik}$ and $J_{jl}$ at any time during the DLA dynamics. So both terms of equation (6) together serve as the first term of equation (4). The last two terms of the DLA energy function (4), which serve to force the 1-to-1 mapping, are given as the constraints on the GA energy function (6).
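A quick numerical check of this correspondence (our own illustration, not from the paper): with binary link compatibilities and a permutation-like M, the quadratic term of (6) counts exactly the length-4 cycles that the first term of (4) rewards.

```python
import numpy as np

adj_A = np.array([[0., 1.], [1., 0.]])   # single edge 0-1 in the input graph
adj_B = np.array([[0., 1.], [1., 0.]])   # single edge 0-1 in the stored graph
M = np.eye(2)                            # map node i in A to node i in B

C2 = np.einsum('ij,kl->ikjl', adj_A, adj_B)          # binary C2 from edges
term_6 = 0.5 * np.einsum('ik,jl,ikjl->', M, M, C2)   # quadratic term of (6)
term_4 = 0.5 * np.einsum('ij,jl,ik,kl->', adj_A, M, M, adj_B)  # cycles in (4)
assert term_6 == term_4 == 1.0           # both count the one preserved edge
```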

4.2 DLA and GA trade-offs

If we are matching a given input graph against a number of stored graphs, the DLA offers more efficient storage of these graphs through the superposition method [25]. In the GA approach, each match requires a separate matrix. On the other hand, the GA algorithm is computationally more efficient: the convergence time in GA is bounded by a constant number of iterations, yet the solution is close to optimal and good for most practical applications. The process performed by the DLA to do the matching is non-deterministic, although experiments have shown great success in reasonable time [27]. A drawback of the DLA is its assumptions about the structure of the graphs. For instance, p needs to be set, and then experimental manipulation is required to find proper values for $\lambda$, $\lambda'$, and $\kappa$. This kind of assumption is not required by the GA algorithm. The DLA is more biologically inspired, given the way it compares the input graph to many stored graphs in parallel. Hence, the DLA is more convenient and more practical in terms of finding a mapping while simultaneously selecting a graph among the several that are stored; in the case of the GA, the algorithm is repeated for every graph that needs to be compared to a given one. In fact, convenience does not come from the way the process is performed, but rather from the speed and accuracy of the produced results. Therefore, deciding which method is more convenient needs careful mathematical analysis, which has not yet been done.

4.3 Mapping DLA to GA

The existing similarities between the DLA and GA and their trade-offs make it desirable to create a model that combines the strengths of both. Such a system would have the power of the DLA in terms of the structure and topology of a layered neural network, together with the speed of the GA's optimization process and its flexibility of needing no assumptions about the graphs.

Note: there might be some confusion around the word mapping in the discussion below. The solution to the problem of graph matching is basically finding a 1-to-1 mapping between the nodes of the input graph and the stored graph; let us call this process the final-mapping. In this section we discuss the mapping of the energy functions of the DLA onto the GA solution machinery; let us call this the DG-mapping. As before, we assume that we have one graph stored in the memory layer B of the DLA network, so the role of equation (3) is eliminated. The process of recognition has two phases. First, a labeled graph is presented to the input layer of the DLA model; consequently, all dynamic links (the $J_{ij}$, $i \in A$ and $j \in B$) are constructed. The second phase constitutes the recognition, by finding a final-mapping between the input layer A and the memory layer B. We let this phase be processed and performed using the GA algorithm. The static links $J_{ij}$, where both $i, j \in A$ or both $i, j \in B$, are preset to either 1 or 0 to indicate the original connections in the input or stored graphs. The dynamic links $J_{ik}$ and $J_{jl}$ connect the nodes i and j in the input graph to the nodes k and l in the stored graph, where $\varphi(i) = \varphi(k)$ and $\varphi(j) = \varphi(l)$. $J_{ik}$ and $J_{jl}$ are DG-mapped to $M_{ik}$ and $M_{jl}$ of equation (6) respectively. The compatibility function $C^{(2)}$ in equation (6) is defined as

$$C^{(2)}_{ikjl} = \begin{cases} 0 & \text{if either } J_{ij} \text{ or } J_{kl} \text{ is } 0, \\ 1 & \text{otherwise.} \end{cases}$$

Note that by this definition we ensure the sparsity of the weight compatibility matrix C when the graphs are sparse, a condition desired by the GA algorithm. To force the final-mapping to occur only between nodes with matching attributes, $J_{ik}$ is DG-mapped to $M_{ik}$ as mentioned above, and $C^{(1)}_{ik}$ in equation (6) is defined as

$$C^{(1)}_{ik} = \begin{cases} 1 & \text{if } \varphi(i) = \varphi(k), \text{ where } i \text{ and } k \text{ are nodes in the input and memory layers respectively,} \\ 0 & \text{otherwise.} \end{cases}$$

The 1-to-1 final-mapping that is enforced by the last two terms of equation (4) is left as a set of constraints for the GA algorithm, as explained in Section 3. Now we can use the computational machinery of the GA to solve the graph matching problem.
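Under these definitions, the DG-mapped inputs for the GA solver can be assembled mechanically. A minimal sketch (ours; `features_A`/`features_B` hold $\varphi(i)$ per node, `adj_A`/`adj_B` are the 0/1 static links, and `graduated_assignment` refers to the sketch in Section 3):

```python
import numpy as np

def dg_mapping_inputs(features_A, features_B, adj_A, adj_B):
    """Build the GA compatibilities from the DLA setup.
    C1[i,k] = 1 iff phi(i) == phi(k)  (feature-selective cross links);
    C2[i,k,j,l] = 1 iff edge i-j exists in A and edge k-l exists in B."""
    C1 = (features_A[:, None] == features_B[None, :]).astype(float)
    C2 = np.einsum('ij,kl->ikjl', adj_A, adj_B)
    return C1, C2

# Usage: M = graduated_assignment(*dg_mapping_inputs(fA, fB, adjA, adjB))
```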

5 Discussion

The literature is full of a variety of solutions to the graph matching problem. Most of these solutions place assumptions and restrictions on the size of the graphs, their topology, the number of features, etc. Two such solutions are the DLA model and the GA algorithm. One needs to ask: is the DLA concept a promising ANN method for solving the graph matching problem? The phenomenon of correlations between neurons that are realized locally and lead to a global solution, i.e. a global minimum or maximum of the energy function, is ideal for implementation on parallel computers. The intrinsic and inherent massive parallelism in the neural network paradigm of solving problems (including the DLA) has been efficiently implemented on programmable, general-purpose, parallel digital architectures [16]. The speed achieved in this case is optimal for solving such problems (graph matching) using ANNs. In addition, the biological plausibility that was noted by Hopfield boosts the ANN approach for solving such problems in computer vision and scene analysis. Are ANNs the best paradigm for carrying out the optimization of objective functions? The GA algorithm presented in Section 3 prevents a conclusive answer to this question, especially given the results of experiments on a sequential machine. Although annealing, which is considered computationally expensive, was used, the GA proved faster and more accurate than the most successful standard optimization techniques, such as probabilistic relaxation. For detailed comparison results, refer to the GA algorithm paper [6].

6 Summary and Conclusions

We classified the approaches to solving the graph matching problem into two categories. The first is generating the state space and doing a tree search (the brute-force method); the graph matching problem is intractable when this method is used. The second is nonlinear optimization. These approaches are more successful in the real world and have been used to approximate the best (optimal) solution to the graph matching problem. Among these, we discussed the Dynamic Link Architecture concept as an ANN approach and the Graduated Assignment algorithm as a non-ANN approach. Finally, we showed some similarities between the two solutions and suggested a combined solution. Both of the discussed solutions, like most approaches that solve graph matching as an optimization problem, share the skeleton of an objective or energy function; they differ in either the optimization technique or the way of imposing the costs and constraints. The common element in these objectives is the enforcement of link-to-link instead of node-to-node matching in the interactions between the two graphs. For these approaches to be more feasible, the optimization of the objective function must be done in a distributed manner on parallel machines. A good implementation for these applications requires highly dynamic connectivity capabilities. This is quite achievable; a number of reconfigurable parallel architectures have been proposed [15, 14]. Therefore, we described a combined system that has the powerful and biologically inspired architecture of the DLA together with the computational efficiency of the GA to solve the graph matching problem. It remains to find a suitable generalization for distributing the optimization steps of the objective function onto such a parallel machine.

References

1. H. A. Almohamad and S. O. Duffuaa. A linear programming approach for the weighted graph matching problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(5), May 1993.
2. David A. Basin. A term equality problem equivalent to graph isomorphism. Information Processing Letters, 51:61-66, 1994.
3. E. Bienenstock and C. von der Malsburg. A neural network for invariant pattern recognition. Europhysics Letters, 4(1):121-126, July 1987.
4. M. A. Eshera and King-Sun Fu. An image understanding system using attributed symbolic representation and inexact graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(5), September 1986.
5. M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, San Francisco, California, 1979.
6. Steven Gold and Anand Rangarajan. A graduated assignment algorithm for graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, in press, 1996.
7. Steven Gold and Anand Rangarajan. Softassign versus softmax: Benchmarks in combinatorial optimization. In David S. Touretzky, Michael C. Mozer, and Michael E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 626-632. MIT Press, Cambridge, Massachusetts, 1995.
8. John Hertz, Anders Krogh, and Richard G. Palmer. Introduction to the Theory of Neural Computation, volume 1 of The Advanced Book Program. Addison-Wesley, Redwood City, California, 1991.
9. J. J. Hopfield and D. W. Tank. Neural computation of decisions in optimization problems. Biological Cybernetics, 52:141-152, 1985.
10. John J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA, 79:2554-2558, April 1982.
11. Martin Lades, Jan C. Vorbruggen, Joachim Buhmann, Jorg Lange, Christoph von der Malsburg, Rolf P. Wurtz, and Wolfgang Konen. Distortion invariant object recognition in the dynamic link architecture. IEEE Transactions on Computers, 42(3), March 1993.
12. S. Z. Li. Matching: Invariant to translations, rotations and scale changes. Pattern Recognition, 25(6):583-594, 1992.
13. R. J. Lipton, T. G. Marr, and J. D. Welsh. Computational approaches to discovering semantics in molecular biology. Proceedings of the IEEE, 77(7):1056-1060, 1989.
14. M. Maresca, H. Li, and P. Baglietto. Hardware support for fast reconfigurability in processor arrays. In 1993 International Conference on Parallel Processing, pages I-282 to I-289, 1993.
15. Russ Miller, V. K. Prasanna-Kumar, Dionisios I. Reisis, and Quentin F. Stout. Parallel computations on reconfigurable meshes. IEEE Transactions on Computers, 42(6):678-692, June 1993.
16. Manavendra Misra. Implementation of neural networks on massive memory organizations. IEEE Transactions on Circuits and Systems, 39(7):476-480, July 1992.
17. Eric Mjolsness, Gene Gindi, and P. Anandan. Optimization in model matching and perceptual organization. Neural Computation, 1:218-229, 1989.
18. Mengkang Peng and Narendra K. Gupta. Invariant and occluded object recognition based on graph matching. Int. J. Elect. Enging. Educ., 32:31-38, 1995.
19. Dong Su Seong, Young Kyu Choi, Ho Sung Kim, and Kyu Ho Park. An algorithm for optimal isomorphism between two random graphs. Pattern Recognition Letters, 15:321-327, 1994.
20. Grant Shumaker, Gene Gindi, Eric Mjolsness, and P. Anandan. A neural net for object recognition via graph matching. Technical Report 8908, Yale University, Electrical Engineering Dept., April 1989.
21. P. N. Suganthan, E. K. Teoh, and D. P. Mital. Pattern recognition by homomorphic graph matching using Hopfield neural networks. Image and Vision Computing, 13(1), February 1995.
22. David W. Tank and John J. Hopfield. Simple neural optimization networks: An A/D converter, signal decision circuit, and a linear programming circuit. IEEE Transactions on Circuits and Systems, CAS-33(5), May 1986.
23. W. H. Tsai and K. S. Fu. Error-correcting isomorphisms of attributed relational graphs for pattern analysis. IEEE Trans. Syst., Man, Cybernet., SMC-9:757-768, December 1979.
24. Shinji Umeyama. An eigendecomposition approach to weighted graph matching problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(5), September 1988.
25. C. von der Malsburg and E. Bienenstock. A neural network for retrieval of superimposed connection patterns. Europhysics Letters, 3(11):1243-1249, June 1987.
26. Christoph von der Malsburg. The correlation theory of brain function. Internal report, Max-Planck-Institut, Gottingen, FRG, 1981.
27. Christoph von der Malsburg. Pattern recognition by labeled graph matching. Neural Networks, 1:141-148, 1988.
28. M. You and A. K. C. Wong. An algorithm for graph optimal isomorphism. In Proceedings of the 1984 ICPR, pages 316-319, 1984.
29. Shiaw-Shian Yu and Wen-Hsiang Tsai. Relaxation by the Hopfield neural network. Pattern Recognition, 25(2):197-209, 1992.

