A Node-Mapping-Based Algorithm for Graph Matching Adel Hlaoui and Shengrui Wang Sciences/DI, Université de Sherbrooke, Sherbrooke, Québec, Canada J1K 2R1 E-mail: {adel.hlaoui, shengrui.wang}@usherbrooke.ca
Abstract
In this paper, a new graph-matching algorithm is presented and its performance is compared with that of other well-known algorithms performing the same task. The new algorithm decomposes the matching process into K phases, where the value of K ranges from 1 to the minimum of the numbers of nodes in the two graphs to be matched. The efficiency of the new algorithm results from the use of small values of K, significantly reducing the search space while still producing very good matchings (most of them optimal) between the two graphs. After a detailed description of the algorithm, a comparative study is presented, showing the robustness of the new algorithm for the case of random graphs. The experimental comparison is made against three well-known algorithms: eigendecomposition, genetic search and tree search.
Keywords: graph matching, (sub)graph isomorphism, inexact matching, error-correcting.
1. Introduction
Attributed graphs have proven to be effective tools for representation and analysis in computer vision. In pattern recognition, graphs are used for representing information about components of objects as well as relations between these components. The graphs are compared in order to measure the degree of similarity between objects. Comparison between graphs involves the concept of graph matching. Graph matching has been used in many applications such as shape retrieval [6][21][22], character recognition [7], recognition of 3D objects in images [8], graphical document analysis [3] and similarity searching in medical image
databases [9], to name a few. In the classic concept of exact graph matching [1][2], the aim is to determine whether two graphs are the same or whether a subgraph of one exists in the other. The first of these problems is called graph isomorphism; the second, subgraph isomorphism. In practical applications, objects are often affected by noise and distortion, so exact graph matching often fails to find the appropriate solution. One way to cope with noisy data is to use inexact graph matching [4][5][10][13][14][15][16][17]. The goal is to find a (sub)graph isomorphism that tolerates distortions; this is known as an error-correcting or error-tolerant (sub)graph isomorphism. Let $G = (V, E, \alpha, \beta)$ and $G' = (V', E', \alpha', \beta')$ be two graphs, where $V$ and $V'$ are the sets of nodes of the first and the second graph, respectively; $E$ and $E'$ are the sets of edges; $\alpha : V \to L$ and $\alpha' : V' \to L$ are the node labelling functions; and, finally, $\beta : V \times V \to L$ and $\beta' : V' \times V' \to L$ are the edge labelling functions. $L$ is a finite alphabet of labels for nodes and edges. To transform one graph into another, six elemental edit operations can be used: node substitution, edge substitution, node deletion, edge deletion, node insertion and edge insertion. An edit sequence $S$ is an ordered sequence of edit operations that transforms $G$ into $G'$. An error-correcting subgraph isomorphism between $G$ and $G'$ is a bijective function $f : \hat{V} \to \hat{V}'$, where $\hat{V} \subseteq V$ and $\hat{V}' \subseteq V'$.
We assign a cost function to each edit operation. The cost of an error-correcting subgraph isomorphism is given by the sum of the costs of all the individual edit operations in the sequence S required to transform G into G'. Obviously, many edit operation sequences can be used to transform G into G'; the goal of an inexact graph-matching algorithm is to find the best error-correcting subgraph isomorphism, namely the one that minimizes the total cost. Different methods have been proposed to solve the problem of inexact graph matching, but no efficient method has yet been found and the task still presents an attractive challenge to be explored. Here we briefly describe some of the most important existing algorithms. In [24], Conte et al. grouped inexact graph matching algorithms into three classes. The first one groups techniques based on tree search. An interesting early paper in this class was published by Tsai and Fu [17]. This method is able to calculate the optimal matching, but suffers from exponential complexity in terms of graph size. In general, when the graphs to be
matched have 10 or more nodes, the algorithm cannot be used. Another optimal algorithm has been proposed by Dumay et al. [25]. The algorithm computes a graph distance using A*. In the second class, continuous optimization has been used to cast graph matching as a continuous problem. Methods belonging to this class yield suboptimal solutions in a reasonable time. A genetic search-based algorithm was proposed by Wang et al. [19] to solve the problem of inexact graph matching. The genetic search solution has a significant speed advantage, but its matching performance often suffers heavily from dependence on the value of genetic parameters. Several other methods in this class have been proposed. Among them we can cite the recent approach proposed by Luo and Hancock [26] based on a probabilistic mode matching. The last class contains methods based on spectral analysis. Among this class of methods, an eigendecomposition algorithm using eigenvalues and eigenvectors of graph adjacency matrices has been proposed by Umeyama [18]. The analytic approach used in this method provides a good solution for matching similar graphs. However, its applicability is limited to graphs of the same size. The goal of this work is to develop a general and efficient algorithm that can be used to solve practical graph-matching problems. The proposed algorithm is based on an application-independent search strategy and can be run in a time-efficient way. It decomposes the matching process into K phases, where the value of K ranges from 1 to the minimum of the numbers of nodes in the two graphs to be matched. The efficiency of the new algorithm results from the use of small values of K, which significantly reduces the search space while still producing very good matchings (most of them optimal) between graphs. The outline of this paper is as follows. In Section 2, we introduce the new algorithm and present a detailed example to illustrate its operation. In Section 3, we describe a comparative study and give some experimental results. Our conclusions are presented in Section 4, and the complexity analysis is detailed in the appendix (Section 5).
2. The New Matching Algorithm
Given two graphs, the goal is to find the matching between their nodes that leads to the smallest matching error between the two graphs. This (smallest) matching error can be viewed as the distance between the two graphs. Given a matching between two graphs, the error is defined as the dissimilarity between each pair of matched nodes, plus the dissimilarity between (corresponding) edges. To minimize this error, one should normally take both node mapping and edge mapping into account simultaneously. The basic idea of the
new algorithm is to iteratively explore the best possible node mappings and to select the best mapping at each iteration phase. The fundamental hypothesis behind this algorithm is that in order to obtain a good (or optimal) matching between two graphs, one should match similar nodes in the two graphs. Intuitively speaking, this hypothesis is very reasonable in practical applications because most often two objects cannot be declared similar if their components are very different. The proposed algorithm comprises K iteration phases. Apart from the first phase, each phase involves n iteration steps, where n is the size of one of the two graphs. By using a large value of K, the proposed algorithm can be used to find the mathematically optimal matching. However, the main advantage of this algorithm is that the iterative process can often find the optimal mapping within a few iteration phases (for instance, 5). In the first phase, the algorithm selects the mappings that minimize the matching error resulting from node matching only. Of these mappings, those that also give the smallest error in terms of edge matching are retained. In the second phase, the algorithm examines the mappings that contain at least one second-best mapping between nodes and then again computes those mappings that give rise to the smallest error in terms of edge matching. This process continues through the predefined number of phases (K).
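The matching error defined at the start of this section (node dissimilarity plus dissimilarity of corresponding edges) can be illustrated with a minimal sketch. It assumes numeric labels and the absolute difference as dissimilarity measure; the function name `matching_error`, the dictionary-based graph representation and the treatment of a missing edge as label 0 are illustrative assumptions, not part of the paper.

```python
def matching_error(mapping, nodes1, edges1, nodes2, edges2,
                   d=lambda a, b: abs(a - b), missing_edge=0):
    """Error of a node mapping: node dissimilarities plus edge dissimilarities.

    mapping: dict from a node of graph 1 to a node of graph 2 (assumed injective).
    nodes*:  dict node -> numeric label.
    edges*:  dict frozenset({u, v}) -> numeric label (undirected graphs).
    d and missing_edge are assumptions made for this sketch.
    """
    # Error generated by the matched nodes.
    error = sum(d(nodes1[u], nodes2[v]) for u, v in mapping.items())
    # Error generated by the corresponding edges.
    pairs = list(mapping.items())
    for index, (u1, v1) in enumerate(pairs):
        for u2, v2 in pairs[index + 1:]:
            label1 = edges1.get(frozenset((u1, u2)), missing_edge)
            label2 = edges2.get(frozenset((v1, v2)), missing_edge)
            error += d(label1, label2)
    return error


# Hypothetical two-node graphs with integer labels.
nodes_g1 = {1: 50, 2: 20}
edges_g1 = {frozenset((1, 2)): 70}
nodes_g2 = {1: 20, 2: 60}
edges_g2 = {frozenset((1, 2)): 40}

swapped = matching_error({1: 2, 2: 1}, nodes_g1, edges_g1, nodes_g2, edges_g2)
identity = matching_error({1: 1, 2: 2}, nodes_g1, edges_g1, nodes_g2, edges_g2)
```

In this toy case the swapped mapping matches similar nodes and yields the smaller error, which is the intuition behind the hypothesis stated above.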
2.1 Algorithm Description
We suppose that distance measures associated with the basic graph edit operations have been defined; i.e., costs have already been associated with substitution of nodes and edges, deletion of nodes and edges, etc. The new algorithm is designed for substitution operations only, but it can easily be extended to deal with deletion and insertion operations by considering some special cases. For example, deletion of a node can be performed by matching the node to a special (non-)node. The algorithm is designed to find a graph isomorphism when both graphs have the same number of nodes and a subgraph isomorphism when one has fewer nodes than the other. Given two graphs $G_1 = (V_1, E_1, \mu_1, \nu_1)$ and $G_2 = (V_2, E_2, \mu_2, \nu_2)$, an $n \times m$ matrix $P = (p_{ij})$ is introduced, where n and m are the numbers of nodes in the first and second graphs, respectively. Each element $p_{ij}$ in P denotes the dissimilarity between node i in G1 and node j in G2. In order to extract promising mappings, we use a second $n \times m$ matrix $B = (b_{ij})$ whose elements represent promising node-to-node mappings. The algorithm first initializes matrix P by setting $p_{ij} = d(\mu_1(v_i), \mu_2(v_j))$. In the first phase of iteration (Current_Phase = 1), the algorithm initializes matrix B by setting most elements to zero ($b_{ij} = 0$) and the others to 1, depending on their corresponding values in matrix P. Specifically, for each row in matrix B, the elements corresponding to the minimum elements in the same row of matrix P will be set to 1. Then, for each possible mapping extracted from B, the algorithm computes the error generated by nodes and the error generated by edges. The mapping that gives rise to the smallest matching error will be recorded. In the second phase (Current_Phase = 2), the algorithm will set further elements in each row of matrix B to 1, specifically those elements that correspond to the second-smallest elements in each row of matrix P. The algorithm will extract those isomorphisms from matrix B that contain at least one node-to-node matching added to matrix B at this phase. Of these isomorphisms and those obtained in the first phase, those with the smallest cost are retained. The algorithm then proceeds to the next phase (Current_Phase = 3) and so on.
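The initialization of P and of the phase-1 matrix B described above can be sketched as follows. The function names `init_P` and `init_B` and the use of the absolute difference for d are assumptions made for illustration; the matrix P used in the example is the one of the worked example in this section (Table 1).

```python
def init_P(labels1, labels2, d=lambda a, b: abs(a - b)):
    """n x m matrix of node dissimilarities p_ij = d(label of v_i, label of v_j)."""
    return [[d(a, b) for b in labels2] for a in labels1]


def init_B(P):
    """Phase-1 matrix B: in every row, mark the minimum element(s) of P with 1."""
    B = []
    for row in P:
        smallest = min(row)
        B.append([1 if value == smallest else 0 for value in row])
    return B


# Matrix P of the worked example (3 nodes in G1, 4 nodes in G2).
P = [[0.1, 0.1, 0.0, 0.2],
     [0.4, 0.2, 0.3, 0.5],
     [0.2, 0.4, 0.3, 0.1]]
B = init_B(P)
```

With this P, each row has a single minimum, so B contains exactly one 1 per row and a single node-to-node mapping is extracted in phase 1.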
A direct implementation of the above approach would result in redundant extraction and testing of isomorphisms, since any matching extracted from matrix B at a given phase would also be extracted from matrix B at every subsequent phase. To solve this problem, the following procedure has been designed. First, a matrix B' is introduced to keep a copy of all possible node-to-node matchings that have been considered by the algorithm so far. B is used as a 'temporary' matrix. At each phase (except the first), each of the n rows of B is examined successively. When row i of B is examined, all of the previous rows of B contain all of the possible node-to-node matchings examined so far, row i contains only the possible node-to-node matching added in the present phase, and all of the following rows of B contain only the possible node-to-node matchings examined in the previous phases. Such a matrix B guarantees that the isomorphisms extracted as the algorithm progresses will never be the same and that all of the isomorphisms that need to be extracted at each phase will indeed be extracted.
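The construction of the temporary matrix B at each step of a phase can be sketched as below. The generator name `phase_steps` is hypothetical, and ties between equal values in a row of P are broken by taking the first column, which is an assumption that happens to reproduce the worked example of this section.

```python
import copy


def phase_steps(P, B_all):
    """Yield (step, B) for one phase after the first.

    B_all plays the role of B' in the text: it marks every node-to-node pair
    considered so far and is updated in place as the phase progresses.
    """
    for i in range(len(P)):
        unmarked = [j for j, mark in enumerate(B_all[i]) if mark == 0]
        if not unmarked:
            continue  # every pair of this row has already been considered
        # Next-best candidate of row i (first column wins in case of a tie).
        j_new = min(unmarked, key=lambda j: P[i][j])
        # Rows before i already hold this phase's marks, rows after i do not.
        B = copy.deepcopy(B_all)
        # Row i holds only the pair added in the present phase.
        B[i] = [0] * len(P[i])
        B[i][j_new] = 1
        B_all[i][j_new] = 1
        yield i + 1, B


# Matrix P of the worked example and the marks left by phase 1.
P = [[0.1, 0.1, 0.0, 0.2],
     [0.4, 0.2, 0.3, 0.5],
     [0.2, 0.4, 0.3, 0.1]]
B_all = [[0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
steps = list(phase_steps(P, B_all))
```

Because row i contains only the newly added pair, every mapping extracted at a step contains at least one pair that was not available before, which is what prevents redundant extraction.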
We use the following simple example to illustrate the algorithm. Two graphs G1 and G2 are shown in Figure 1. In this figure, the numbers inside the circles denote the node labels and the numbers near the connecting lines denote the edge labels. Matrices P and B for phase 1 are shown in Tables 1 and 2, respectively.
Figure 1: Two graphs G1 and G2
0.1  0.1  0    0.2
0.4  0.2  0.3  0.5
0.2  0.4  0.3  0.1
Table 1: Matrix P (rows: nodes 1 to 3 of G1; columns: nodes 1 to 4 of G2)
0  0  1  0
0  1  0  0
0  0  0  1
Table 2: Matrix B / Phase 1
From the matrix in Table 2, the following matching {(1G1, 3G2), (2G1, 2G2), (3G1, 4G2)} will be extracted and further examined. In phase 2, matrices B and B' will be as follows at step 1:
B              B'
1  0  0  0     1  0  1  0
0  1  0  0     0  1  0  0
0  0  0  1     0  0  0  1
Table 3: Matrix B Step 1 / Phase 2 and Matrix B' Step 1 / Phase 2
Only one matching {(1G1, 1G2), (2G1, 2G2), (3G1, 4G2)} will be extracted and the matching error is 1.3.
B              B'
1  0  1  0     1  0  1  0
0  0  1  0     0  1  1  0
0  0  0  1     0  0  0  1
Table 4: Matrix B Step 2 / Phase 2 and Matrix B' Step 2 / Phase 2
In step 2, only one matching {(1G1, 1G2), (2G1, 3G2), (3G1, 4G2)} will be extracted and the matching error is 1.4.
B              B'
1  0  1  0     1  0  1  0
0  1  1  0     0  1  1  0
1  0  0  0     1  0  0  1
Table 5: Matrix B Step 3 / Phase 2 and Matrix B' Step 3 / Phase 2
In step 3, only one matching {(1G1, 3G2), (2G1, 2G2), (3G1, 1G2)} will be extracted and the matching error is 1.8. Tables 4 and 5 show the matching matrix B for steps 2 and 3, respectively. Row 2 of matrix B in Table 4 and row 3 of matrix B in Table 5 contain 1s corresponding to the second-best node-to-node mappings. Thus, there will never be redundant extraction of possible matchings. Matrix B in Table 5 gives four matchings. The first matching {(1G1, 1G2), (2G1, 2G2), (3G1, 1G2)}, the second {(1G1, 1G2), (2G1, 3G2), (3G1, 1G2)}, and the fourth {(1G1, 3G2), (2G1, 3G2), (3G1, 1G2)} are not bijective, so only the third matching {(1G1, 3G2), (2G1, 2G2), (3G1, 1G2)} will be considered. The procedure Matching_Evaluation, which is applied only to valid matchings, will be called to determine the best matching obtained so far. In phase 3, matrices B and B' will be as follows at step 1:
B              B'
0  1  0  0     1  1  1  0
0  1  1  0     0  1  1  0
1  0  0  1     1  0  0  1
Table 6: Matrix B Step 1 / Phase 3 and Matrix B' Step 1 / Phase 3
Matrix B in Table 6 gives four matchings: {(1G1, 2G2), (2G1, 2G2), (3G1, 1G2)}, {(1G1, 2G2), (2G1, 2G2), (3G1, 4G2)}, {(1G1, 2G2), (2G1, 3G2), (3G1, 1G2)} and {(1G1, 2G2), (2G1, 3G2), (3G1, 4G2)}. The first and second of these are not bijective. The matching errors of the third and fourth matchings are respectively 1.8 and 1.9.
B              B'
1  1  1  0     1  1  1  0
1  0  0  0     1  1  1  0
1  0  0  1     1  0  0  1
Table 7: Matrix B Step 2 / Phase 3 and Matrix B' Step 2 / Phase 3
Matrix B in Table 7 gives six matchings: {(1G1, 1G2), (2G1, 1G2), (3G1, 1G2)}, {(1G1, 1G2), (2G1, 1G2), (3G1, 4G2)}, {(1G1, 2G2), (2G1, 1G2), (3G1, 1G2)}, {(1G1, 3G2), (2G1, 1G2), (3G1, 1G2)}, {(1G1, 2G2), (2G1, 1G2), (3G1, 4G2)} and {(1G1, 3G2), (2G1, 1G2), (3G1, 4G2)}. The first four matchings are not bijective. The matching errors of the fifth and sixth matchings are respectively 1.7 and 1.4.
B              B'
1  1  1  0     1  1  1  0
1  1  1  0     1  1  1  0
0  0  1  0     1  0  1  1
Table 8: Matrix B Step 3 / Phase 3 and Matrix B' Step 3 / Phase 3
Matrix B in Table 8 gives nine matchings. Only two of them are bijective: {(1G1, 1G2), (2G1, 2G2), (3G1, 3G2)} and {(1G1, 2G2), (2G1, 1G2), (3G1, 3G2)}. The matching error of both is 2.
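How candidate matchings are read off a matrix B and filtered for bijectivity can be sketched as follows. The helper names `candidate_mappings` and `is_bijective` are hypothetical, columns are numbered from 0 in the code, and the matrix is the matrix B of phase 3, step 3 of the example above.

```python
from itertools import product


def candidate_mappings(B):
    """All ways of picking one marked column per row of B."""
    choices = [[j for j, mark in enumerate(row) if mark] for row in B]
    return list(product(*choices))


def is_bijective(mapping):
    """True if no node of the second graph is used twice."""
    return len(set(mapping)) == len(mapping)


# Matrix B of phase 3, step 3 in the worked example.
B_step3 = [[1, 1, 1, 0],
           [1, 1, 1, 0],
           [0, 0, 1, 0]]
candidates = candidate_mappings(B_step3)
valid = [mapping for mapping in candidates if is_bijective(mapping)]
```

Here `valid` contains the tuples (0, 1, 2) and (1, 0, 2), which correspond to the two bijective matchings listed above once the column indices are shifted by one.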
2.2 The Graph-Matching Algorithm and its Complexity
Here is the new graph-matching algorithm.
Algorithm: Node_Mapping Based Graph Matching
Input: Two attributed graphs G1 and G2.
Output: Matching between nodes in G1 and G2, from the smaller graph (e.g., G1) to the larger graph (G2).
1- Initialize P as follows: for each $p_{ij}$, set $p_{ij} = d(\mu_1(v_i), \mu_2(v_j))$.
2- Initialize B as follows: for each $b_{ij}$, $1 \le i \le n$ and $1 \le j \le m$, set $b_{ij} = 0$ ($n \le m$).
3- While Current_Phase ≤ K
     If Current_Phase = 1 Then
       For i = 1, ..., n
         Assign the value 1 to the elements of B corresponding to the smallest value in the ith row of P;
       Call Matching_Evaluation(B).
     Else
       For all i = 1, ..., n
         Set B' = B;
         For all j = 1, ..., m set $b_{ij} = 0$;
         Select the element with the smallest value in the ith row of P that is not marked 1 in B' and set it to 1 in B and B';
         Call Matching_Evaluation(B);
         Set B = B'.
     If all the elements in B are marked 1 Then
       Set Current_Phase to K + 1 (the loop terminates).
     Else
       Add 1 to Current_Phase.
   End
Procedure: Matching_Evaluation(B)
For each valid mapping in B
  1- Compute the matching error generated by nodes.
  2- Add the error generated by the corresponding edges to the matching error.
  3- Save the current matching if the matching error is minimal.
Obviously, the best case for the algorithm arises when K = 1. In this case, each row has one element marked 1 in matrix B and there is one node-to-node mapping to be extracted. To check the edge-to-edge mapping, the algorithm needs $O(n^2)$ steps, where n is the number of nodes in the first graph. The complexity of the algorithm is thus $O(n^2)$. In general, the (worst-case) complexity of the algorithm depends on the number of phases (value of K) used by the algorithm. For a given value K ($1 \le K \le m$), each row in matrix B has K elements marked 1 and there are $K^n$ node-to-node mappings to be extracted. To check the edge-to-edge mappings, the algorithm needs $O(n^2)$ steps for each mapping. Thus, the complexity of the algorithm is $O(n^2 K^n)$. Details of the derivation are given in [23] and in the appendix.
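The algorithm listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: graphs are given by a node dissimilarity matrix P and symmetric edge-label matrices, the dissimilarity is the absolute difference, ties within a row are broken by taking the first column, and all names (`node_mapping_match`, `make_edge_error`, the example labels) are hypothetical.

```python
from itertools import permutations, product


def make_edge_error(E1, E2):
    """Edge error of a mapping for graphs given as symmetric edge-label matrices."""
    def edge_error(mapping):
        n = len(mapping)
        return sum(abs(E1[a][b] - E2[mapping[a]][mapping[b]])
                   for a in range(n) for b in range(a + 1, n))
    return edge_error


def node_mapping_match(P, edge_error, K):
    """K-phase search; returns (smallest error found, mapping as a tuple)."""
    n, m = len(P), len(P[0])
    best = [float("inf"), None]

    def matching_evaluation(B):
        choices = [[j for j in range(m) if B[i][j]] for i in range(n)]
        for mapping in product(*choices):
            if len(set(mapping)) < n:  # not bijective: skip
                continue
            error = sum(P[i][j] for i, j in enumerate(mapping))
            error += edge_error(mapping)
            if error < best[0]:
                best[0], best[1] = error, mapping

    # Phase 1: mark the smallest element(s) of every row of P.
    B_all = []
    for row in P:
        smallest = min(row)
        B_all.append([1 if value == smallest else 0 for value in row])
    matching_evaluation(B_all)

    # Phases 2..K: one step per row, each adding the next-best pair of that row.
    for _phase in range(2, K + 1):
        if all(all(row) for row in B_all):
            break  # every pair has been considered
        for i in range(n):
            unmarked = [j for j in range(m) if not B_all[i][j]]
            if not unmarked:
                continue
            j_new = min(unmarked, key=lambda j: P[i][j])
            B = [row[:] for row in B_all]
            B[i] = [int(j == j_new) for j in range(m)]
            B_all[i][j_new] = 1
            matching_evaluation(B)
    return best[0], best[1]


# Hypothetical graphs: G2 is a shuffled, slightly perturbed G1 plus one extra node.
labels1, labels2 = [10, 50, 90], [52, 88, 12, 40]
P = [[abs(a - b) for b in labels2] for a in labels1]
E1 = [[0, 20, 60], [20, 0, 30], [60, 30, 0]]
E2 = [[0, 31, 22, 70], [31, 0, 58, 5], [22, 58, 0, 90], [70, 5, 90, 0]]
edge_error = make_edge_error(E1, E2)

result_k1 = node_mapping_match(P, edge_error, K=1)
result_full = node_mapping_match(P, edge_error, K=4)
brute_force = min(sum(P[i][j] for i, j in enumerate(mp)) + edge_error(mp)
                  for mp in permutations(range(4), 3))
```

In this small example the first phase already reaches the optimum found by exhaustive enumeration, which is the behaviour the algorithm relies on when K is kept small.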
3. Experimental studies
3.1 Comparisons with Conventional Algorithms
In order to study the effectiveness of the proposed algorithm, we conducted a set of experiments in which the new matching algorithm was compared to the following three algorithms:
a. The procedure involving tree search based on A* proposed by Tsai and Fu, which was used to obtain optimal matching [17].
b. The eigendecomposition method proposed by Umeyama [18].
c. The genetic-based search proposed by Wang et al. [19].
The experimental conditions and the graph database are inspired by the work of Y. Wang et al. reported in [19]. The graph database contains 100 pairs of graphs (G, H). For each graph G, we randomly assigned a label value between 0 and 100 to each node and edge. Then for each graph G, a graph H was created by shuffling the order of the nodes in graph G and adding a uniformly distributed noise. The noise levels used in the experiment were 0, 5, 10, 15 and 20. Graph sizes of 7, 9, 11 and 13 nodes were tested. The new matching algorithm was tested for K=3. All of the algorithms were compared in terms of speed and optimal matching rate. To evaluate the optimal matching rate, the matching results yielded by the different algorithms should be compared to the results given by the A*-based tree search, which is known as an optimal graph-matching algorithm. However, while the tree search algorithm is capable of finding the optimal matching between two graphs, this algorithm suffers from high computational complexity: in the worst case, it may require exponential time. For this reason, tree search is not suitable in practice. For our experiments, only graphs of order 7 and 9 were used in testing this algorithm. When the tree search was not included in the experiment (11- and 13-node cases), all of the other algorithms were compared with each other and the one yielding the smallest matching error was declared optimal. The second algorithm, the eigendecomposition procedure proposed by Umeyama [18], employs an analytic (as opposed to combinatorial or iterative) approach to find the optimal matching between graphs. By using the eigendecomposition of the adjacency matrices, a matching close to the optimal one can be found efficiently when the graphs are sufficiently close to each other. This matching algorithm requires a pair of
graphs of the same size. For this reason, the experiments were conducted for graphs of the same size. However, our algorithm can be applied to graphs of different sizes. In [20] we report the results of a comparison of the new algorithm with the tree search algorithm on graphs of arbitrary size. The graph-matching algorithm based on genetic search proposed in [19] was implemented as an approximate algorithm to find the best matching. Using this graph-matching algorithm requires that the value of some parameters be given. In order to keep the experimental conditions constant, the set of parameters adopted in our implementation (see Table 9) was the same one used in [19]. For more detail on the crossover and mutation parameters, see [19].
Parameter              Value
Number of iterations   200
Population size        30
Selection              Ranking
Inhibition value       4
Crossover              PMX
Mutation               SWAP
Table 9. Parameters of the Genetic Search-based Algorithm
The experimental results regarding the speed, optimal matching rate and average matching error are listed in Tables 10, 11 and 12. Certain observations clearly emerge from these tables. First, regarding speed: for the highest noise level (NL=20), tree search takes much more time than the others. This is especially apparent in the case of n=9. The new matching algorithm is slower than eigendecomposition and genetic search. Second, in terms of optimal matching rate, all of the algorithms performed well for small graphs (n=7). However, while the matching performance of our new algorithm is similar to that of eigendecomposition and genetic search in the case of small graphs, it is much better for graphs with high noise levels, especially in the cases n=11 and n=13. Finally, in terms of average matching error, all the algorithms have a null average matching error for the lowest noise level (NL=0) since the optimal matching rate obtained is always 100. The eigendecomposition algorithm seems to be nearer to the optimal solution than the genetic search and the new algorithm when all of these algorithms fail to find the optimal one. This is especially apparent in the case of n=13 and NL=20.
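The construction of the test pairs (G, H) described in this section (random labels between 0 and 100, shuffled node order, uniformly distributed noise) can be sketched as follows. Several points are assumptions of this sketch rather than statements of the paper: graphs are taken to be complete, the noise is drawn uniformly from [-NL, NL], and the function names are hypothetical.

```python
import random


def random_graph(n, rng):
    """Node labels and a symmetric edge-label matrix with values in [0, 100]."""
    labels = [rng.uniform(0, 100) for _ in range(n)]
    edges = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            edges[a][b] = edges[b][a] = rng.uniform(0, 100)
    return labels, edges


def noisy_shuffled_copy(labels, edges, noise_level, rng):
    """Graph H: nodes of G in shuffled order, labels perturbed by uniform noise."""
    n = len(labels)
    perm = list(range(n))
    rng.shuffle(perm)  # perm[i] is the node of G placed at position i of H

    def noise():
        # Assumed noise model: uniform in [-noise_level, noise_level].
        return rng.uniform(-noise_level, noise_level)

    h_labels = [labels[perm[i]] + noise() for i in range(n)]
    h_edges = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            h_edges[a][b] = h_edges[b][a] = edges[perm[a]][perm[b]] + noise()
    return h_labels, h_edges, perm


rng = random.Random(0)
g_labels, g_edges = random_graph(7, rng)
h_labels, h_edges, perm = noisy_shuffled_copy(g_labels, g_edges, 5, rng)
```

The returned permutation is the ground-truth correspondence between H and G, which is useful when checking whether an algorithm recovered the intended matching.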
NL   Eigendecomposition             Genetic Search                 New Method                     Tree Search
     7      9      11     13        7      9      11     13        7      9      11     13        7      9       11   13
0    0.001  0.003  0.001  0.001     0.012  0.025  0.029  0.048     0.003  0.061  1.076  18.60     0.153  85.837  -    -
5    0.001  0.004  0.001  0.011     0.013  0.026  0.031  0.040     0.003  0.062  1.101  18.08     0.148  85.706  -    -
10   0.001  0.009  0.001  0.006     0.012  0.021  0.026  0.040     0.003  0.063  1.093  19.14     0.147  85.668  -    -
15   0.001  0.009  0.003  0.007     0.015  0.029  0.031  0.042     0.001  0.059  1.083  17.70     0.153  85.668  -    -
20   0.001  0.012  0.001  0.007     0.011  0.020  0.028  0.040     0.006  0.057  1.031  17.54     0.146  85.701  -    -
Table 10. Average CPU Running Time (in seconds on a PC Pentium 4, 1Gb internal Memory)
NL   Eigendecomposition       Genetic Search           New Method
     7    9    11   13        7    9    11   13        7    9    11   13
0    100  100  100  100       100  100  100  100       100  100  100  100
5    100  100  100  100       100  100  100  100       100  100  100  100
10   100  100  100  92        98   100  100  95        100  100  100  98
15   100  92   100  87        99   94   100  81        100  99   100  95
20   95   80   76   72        93   84   80   77        94   92   94   91
Table 11. Optimal matching rate with different algorithms. For graphs of size 7 and 9, the optimal matchings were obtained by A*-based tree search; hence its value is always 100. For graphs of size 11 and 13, the best of the matchings obtained by the first three algorithms was considered optimal.
NL   Eigendecomposition         Genetic Search             New Method
     7    9     11    13        7    9     11    13        7    9    11    13
0    0    0     0     0         0    0     0     0         0    0    0     0
5    140  225   330   445       140  225   330   445       140  225  330   445
10   280  450   660   944       286  450   660   931       280  450  660   923
15   420  694   990   1387      424  692   990   1423      420  676  990   1372
20   572  1050  1520  2004      578  1021  1494  1998      569  980  1364  1901
Table 12. Average matching error with different algorithms.
Finally, in order to show the robustness of the new algorithm, we conducted an additional experiment under more general conditions. Each test used as input a pair of graphs that were both randomly generated (rather than generating the second graph from the first as in the previous experiment). For this experiment, 100 pairs of graphs were generated. Graphs with 7, 9, 11 and 13 nodes were tested. Table 13 shows the performance of the four algorithms. From this table, we conclude that for the case of small graphs (n=7,9), the performance of the new matching algorithm in terms of matching rate is close to that of tree search (which is optimal); moreover, the new algorithm performs much better than the genetic search and the eigendecomposition algorithms for graphs with 11 and 13 nodes. This experiment demonstrates the capability of the matching algorithm to handle both similar and different pairs of graphs.
                        Eigendecomposition             Genetic Search                 New Method                     Tree Search
                        7      9      11     13        7      9      11     13        7      9      11     13        7      9      11   13
Running Time            0.001  0.015  0.002  0.015     0.016  0.031  0.044  0.056     0.005  0.063  1.235  19.87     0.156  88.84  -    -
Optimal Matching Rate   1      1      1      0         24     22     25     18        82     88     94     89        100    100    -    -
Table 13. Average CPU running time and optimal matching rate for random graphs. Computing conditions and optimality definition are the same as in Tables 10 and 11.
3.2 Influence of the parameter K
To understand the influence of the principal parameter in our algorithm, namely the number of phases K used to find the best matching, we conducted an experiment in which we varied the value of this parameter from 1 to the size of the graph. The graph database used in this experiment contains 100 pairs of graphs having the same size. We varied the size from 4 to 9. For each graph in the pair, we randomly assigned a label value between 0 and 100 to each node and edge. In each test, we computed the number of pairs in which an optimal matching was reached. The average running time needed in each phase was also computed. Table 14 shows the number of times the optimal matching was reached. The value 73 in row 1 (phase 1) and column 4 (graphs with 4 nodes) means that in 73 cases out of 100, the new algorithm finds the optimal matching in its first phase. From this table, we see that most optimal matchings are obtained using only the first three phases. This is especially apparent in the case n=9. In this experiment and the experiment reported in sub-section 3.1, we have introduced a new technique to speed up the algorithm. We have observed a dramatic improvement in speed. In the first version of the algorithm, we first build the matching, then we check its bijectivity and finally we compute its matching error (see Procedure: Matching_Evaluation). In the new version, the checking procedure is carried out in each step of the building process. Thus, at any step, a partial matching that is not bijective is automatically rejected, and none of the subsequent matchings containing this partial matching will ever be checked. The same idea is applied for computing the matching error. In each step of the building process, when the error of the partial matching is higher than the current matching error, the partial matching will be automatically rejected, along with all subsequent matchings containing this partial matching.
Number of phases   Number of nodes
                   4     5     6     7     8     9
1                  73    60    36    31    19    30
2                  27    37    59    60    65    45
3                  0     3     5     7     14    16
4                  0     0     0     1     2     3
5                  -     0     0     1     0     2
6                  -     -     0     0     0     2
7                  -     -     -     0     0     1
8                  -     -     -     -     0     1
9                  -     -     -     -     -     0
Table 14. Optimal matching rate reached in each phase using the new algorithm. For graphs with 9 nodes, the optimal matching is reached in the first phase in 30 cases and in the second phase in 45 cases.
Table 15 shows the average running time spent in each phase. For example, when graphs have 9 nodes, the second phase needed only 0.001 seconds to check and evaluate all the matchings generated in this phase.
Number of phases   Number of nodes
                   4       5       6       7       8       9
1                  0       0       0       0       0       0
2                  0       0       0       0.001   0       0.001
3                  0.001   0.001   0.001   0.004   0.014   0.054
4                  0.001   0.001   0.001   0.009   0.059   0.324
5                  -       0.001   0.003   0.032   0.218   1.431
6                  -       -       0.004   0.086   1.671   5.127
7                  -       -       -       0.102   1.822   15.488
8                  -       -       -       -       2.025   41.380
9                  -       -       -       -       -       65.633
Table 15. Average CPU running time for each phase (in seconds on a PC Pentium 4, 1Gb internal memory). For graphs with 9 nodes, phase 9 (last line and last column) needs about 65 seconds to check all the generated mappings. Here, zero means that the time is less than 1 ms.
Finally, in Table 16 we report the average running time using the proposed algorithm for graphs with more than nine nodes. In each test, we used a database with 100 pairs of graphs.
Number of phases   Number of nodes
                   10       11      12       13       14       15
1                  0.005    0.007   0.006    0.007    0.008    0.008
2                  0.012    0.078   0.066    0.054    1.064    0.098
3                  0.104    1.355   7.32     18.99    54.03    77.83
4                  2.472    17.53   47.56    78.86    128.77   298.63
5                  27.889   89.56   104.37   203.56   386.44   723.42
Table 16. Average CPU running time for each phase (in seconds on a PC Pentium 4, 1Gb internal memory). For graphs with 15 nodes, phase 5 (last line and last column) needs about 723 seconds to check all the generated mappings.
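The speed-up technique described in this sub-section (rejecting a partial matching as soon as it stops being bijective or its error reaches the best error found so far) can be sketched as a depth-first search over the rows of B. This is an illustrative sketch, not the authors' code: it assumes non-negative costs given by absolute differences, symmetric edge-label matrices, and the hypothetical name `pruned_search`; the example data are invented.

```python
def pruned_search(B, P, E1, E2):
    """Best bijective mapping allowed by B, built row by row with early rejection."""
    n, m = len(B), len(B[0])
    best = [float("inf"), None]

    def extend(partial, used, error):
        i = len(partial)
        if i == n:  # complete mapping, strictly better than the current best
            best[0], best[1] = error, tuple(partial)
            return
        for j in range(m):
            if not B[i][j] or j in used:  # pair not allowed, or not bijective
                continue
            # Cost added by matching node i to node j: node cost plus the cost
            # of the edges between i and the nodes already matched.
            step = P[i][j] + sum(abs(E1[a][i] - E2[partial[a]][j])
                                 for a in range(i))
            if error + step >= best[0]:  # cannot beat the best matching so far
                continue
            partial.append(j)
            used.add(j)
            extend(partial, used, error + step)
            partial.pop()
            used.remove(j)

    extend([], set(), 0)
    return best[0], best[1]


# Hypothetical data: 3-node G1, 4-node G2, every node pair allowed in B.
P = [[42, 78, 2, 30], [2, 38, 38, 10], [38, 2, 78, 50]]
E1 = [[0, 20, 60], [20, 0, 30], [60, 30, 0]]
E2 = [[0, 31, 22, 70], [31, 0, 58, 5], [22, 58, 0, 90], [70, 5, 90, 0]]
B_full = [[1] * 4 for _ in range(3)]
result = pruned_search(B_full, P, E1, E2)
```

Because all costs are non-negative, the error of a partial matching can only grow as it is extended, so discarding it once it reaches the current best error never discards a better complete matching.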
4. Conclusion
Although many algorithms have been proposed for computing similarity between graphs by finding graph isomorphisms or subgraph isomorphisms, the algorithms for optimal matching are combinatorial in nature and difficult to use when the size of the graphs is large. In this paper, we have proposed a new graph-matching algorithm for finding error-correcting isomorphisms. The main feature of the algorithm is the decomposition of the search process into K phases. A good matching (optimal in most cases) can be obtained after only a few phases. A comparative study was conducted to show the performance of the new algorithm. Three other algorithms were included in the comparison: eigendecomposition, genetic search and tree search. The eigendecomposition method gives good matchings when the two graphs in the pair are close to each other. However, its performance decreases for larger and less similar graphs. The matching performance of the genetic search solution is also good for similar graphs, but again, its performance, which depends on genetic tool parameters (Table 9), is not very good when the size of the graphs exceeds ten nodes or when the graphs are not very similar. As for the A*-based tree search method, it can find optimal matchings, but it suffers from high complexity and cannot be used for graphs with more than ten nodes. The new algorithm displays good matching performance for both similar and dissimilar graphs. This performance can be considered as a validation of the hypothesis with regard to graph similarity proposed above. It should be pointed out that the new algorithm, which is also an approximate algorithm, exhibits deterministic behaviour determined only by the parameter K. For many graphs having more than ten nodes, the new algorithm can be applied more efficiently than the A*-based tree search method. In fact, it is possible to adapt this algorithm to deal with sparse graph matching while maintaining its computational efficiency. This is one of the issues we are currently studying.
5. Appendix In practice, we have noticed that to find the best matching, only a few phases are needed. Here we will describe the complexity analysis when the algorithm is used with K phases. We will present the complexity analysis for the second and third phases and finally generalize to K phases. In the first phase, matrix B contains one possible node-to-node matching. For this possible matching, the algorithm needs O(n 2 ) steps to check the edge-to-edge matching. The complexity of the algorithm at this stage is O(n 2 ) . In the second phase, the algorithm updates each row in matrix B according to matrix P. To do this, it runs the updating process n times. After updating the first row, matrix B has one element in each row, so there is only one node-to-node mapping. After updating the second row, matrix B is as follows: in the first row, two elements are marked 1, the other rows have one element marked 1, and so on. At the end of phase 1, there are 2 n − 1 node-to-node mappings. The complexity of the algorithm at this stage is O(n 2 (2n − 1)) .
2 0 + 21 + 2 2 + 2 3 + .... + 2 n−1 =
(2 n − 1) = 2n − 1 2 −1
In the third phase, after updating the first row, matrix B has one element marked 1 in the first row and two elements marked 1 in the other rows, so there are $2^{n-1}$ node-to-node mappings. After updating the second row, matrix B is as follows: in the first row, three elements are marked 1; the second row has one element marked 1 and the other rows have two elements marked 1, so there are $3 \times 2^{n-2}$ node-to-node mappings; and so on. At the end of the third phase, there are $3^n - 2^n$ node-to-node mappings. The complexity of the algorithm at this stage is $O(n^2 (3^n - 2^n))$.
$$3^0 \times 2^{n-1} + 3^1 \times 2^{n-2} + 3^2 \times 2^{n-3} + \cdots + 3^{n-1} \times 2^0 = 2^{n-1} \, \frac{(3/2)^n - 1}{3/2 - 1} = 3^n - 2^n$$
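The second-phase and third-phase sums are both geometric series of the same shape. Assuming the row-update pattern carries over unchanged to a later phase $K \geq 2$ (the rows already updated hold $K$ marked elements, the row being updated contributes one, and the remaining rows hold $K - 1$), the count for that phase can be worked out in the same way:

$$\sum_{i=0}^{n-1} K^{i} (K-1)^{n-1-i} = (K-1)^{n-1} \, \frac{\left(\frac{K}{K-1}\right)^{n} - 1}{\frac{K}{K-1} - 1} = K^n - (K-1)^n$$

Setting $K = 2$ and $K = 3$ recovers $2^n - 1$ and $3^n - 2^n$, respectively.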
In phase $K$, there are $K^n - (K-1)^n$ node-to-node mappings; for each mapping, the algorithm needs $O(n^2)$ steps. The complexity of the algorithm for phase $K$ is $O(n^2 (K^n - (K-1)^n))$. Thus, the complexity of the algorithm for $K$ phases is the sum of the complexities for the individual phases:
$$O\big(n^2 (1 + (2^n - 1) + (3^n - 2^n) + \cdots + (K^n - (K-1)^n))\big) = O(n^2 K^n)$$
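The per-phase counts and the telescoping total can be checked numerically. The Python sketch below is a sanity check under stated assumptions, not part of the algorithm: the function names `mappings_in_phase` and `total_mappings` are hypothetical, and the count follows the text in multiplying the number of marked elements per row, without enforcing that a mapping be one-to-one.

```python
def mappings_in_phase(n: int, k: int) -> int:
    """Node-to-node mappings counted in phase k for graphs with n nodes.

    Assumes the row-by-row pattern described in the appendix: when row j is
    updated, the j - 1 rows already updated hold k marked elements each,
    row j contributes one, and the remaining n - j rows hold k - 1 each.
    """
    return sum(k ** (j - 1) * (k - 1) ** (n - j) for j in range(1, n + 1))


def total_mappings(n: int, k_max: int) -> int:
    """Mappings counted over phases 1..k_max (the sum telescopes to k_max**n)."""
    return sum(mappings_in_phase(n, k) for k in range(1, k_max + 1))


# Compare the explicit sums with the closed forms for small cases.
for n in range(1, 8):
    for k in range(1, 6):
        assert mappings_in_phase(n, k) == k ** n - (k - 1) ** n
        assert total_mappings(n, k) == k ** n

# Growth of the search space for n = 10 nodes and the first few phases.
for k in range(1, 5):
    print(k, mappings_in_phase(10, k), total_mappings(10, k))
```

Because the total grows as $K^n$, the bound stays manageable only while $K$ is small, which is consistent with the observation that a few phases are usually enough.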
Acknowledgement
This work has been supported by a Strategic Research Grant from the Natural Sciences and Engineering Research Council of Canada (NSERC) to the team composed of A. Bernardi, F. Dubeau, J. Vaillancourt, S. Wang and D. Ziou. Dr. S. Wang is also supported by NSERC via an individual research grant.
References
1. J. R. Ullmann. An algorithm for subgraph isomorphism. Journal of the ACM, vol. 23, no. 1, January 1976, pp. 31-42.
2. D. G. Corneil and C. C. Gotlieb. An efficient algorithm for graph isomorphism. Journal of the ACM, vol. 17, no. 1, January 1970, pp. 51-64.
3. J. Lladós. Combining Graph Matching and Hough Transform for Hand-Drawn Graphical Document Analysis. http://www.cvc.uab.es/~josep/articles/tesi.html
4. A. Sanfeliu and K. S. Fu. A Distance Measure between Attributed Relational Graphs for Pattern Recognition. IEEE Trans. on SMC, vol. 13, no. 3, May/June 1983.
5. H. Bunke. Error Correcting Graph Matching: On the Influence of the Underlying Cost Function. IEEE Trans. on PAMI, vol. 21, no. 9, September 1999.
6. B. Huet, A. D. J. Cross and E. R. Hancock. Graph Matching for Shape Retrieval. http://citeseer.nj.nec.com/322877.html
7. J. Rocha and T. Pavlidis. A Shape Analysis Model with Applications to a Character Recognition System. IEEE Trans. on PAMI, vol. 16, no. 4, September 1994.
8. A. K. C. Wong, S. W. Lu and M. Rioux. Recognition and Shape Synthesis of 3-D Objects Based on Attributed Hypergraphs. IEEE Trans. on PAMI, vol. 11, no. 3, March 1989.
9. E. G. M. Petrakis and C. Faloutsos. Similarity searching in medical image databases. IEEE Trans. on Knowledge and Data Engineering, vol. 9, no. 3, May/June 1997, pp. 435-447.
10. W. J. Christmas, J. V. Kittler and M. Petrou. Structural matching in computer vision using probabilistic relaxation. IEEE Trans. on PAMI, vol. 17, August 1995.
11. C. Liu, K. Fan, J. Horng and Y. Wang. Solving Weighted Graph Matching Problems by Modified Microgenetic Algorithm. IEEE International Conference on SMC, 1995, pp. 638-643.
12. J. Horng, C. Kao and G. Chen. An Error-Tolerance Genetic Algorithm for Traveling Salesman Problems. Proceedings of the IEEE International Conference on SMC, 1995, pp. 795-799.
13. L. P. Cordella, P. Foggia, C. Sansone and M. Vento. An efficient algorithm for the inexact matching of ARG graphs using a contextual transformational model. Proceedings of the 13th International Conference on Pattern Recognition, 1996, pp. 180-184.
14. L. P. Cordella, P. Foggia, C. Sansone and M. Vento. Subgraph transformations for inexact matching of attributed relational graphs. Computing 12(Suppl.), Springer, 1997, pp. 43-52.
15. B. T. Messmer. Efficient Graph Matching Algorithms for Preprocessed Model Graphs. Thesis, University of Bern, 1996. http://citeseer.nj.nec.com/114601.html
16. B. T. Messmer and H. Bunke. Fast Error-correcting Graph Isomorphism Based on Model Precompilation. http://citeseer.nj.nec.com/messmer96fast.html
17. W. H. Tsai and K. S. Fu. Error-Correcting Isomorphisms of Attributed Relational Graphs for Pattern Analysis. IEEE Trans. on SMC, vol. 9, no. 12, December 1979.
18. S. Umeyama. An Eigendecomposition Approach to Weighted Graph Matching Problems. IEEE Trans. on PAMI, vol. 10, no. 5, September 1988.
19. Y. Wang, K. Fan and J. Horng. Genetic-Based Search for Error-Correcting Graph Isomorphism. IEEE Trans. on SMC, vol. 27, no. 4, August 1997.
20. A. Hlaoui and S. Wang. A new algorithm for inexact graph matching. 16th International Conference on Pattern Recognition, Quebec, Canada, 2002.
21. B. Huet, A. D. J. Cross and E. R. Hancock. Shape Retrieval by Inexact Graph Matching. ICMCS, vol. 1, 1999, pp. 772-776. http://citeseer.nj.nec.com/325326.html
22. B. Huet and E. R. Hancock. Inexact Graph Retrieval. http://citeseer.nj.nec.com/huet99inexact.html
23. A. Hlaoui and S. Wang. Image Retrieval Systems Using Graph Matching. Research Report No. 275, Département de mathématiques et d’informatique, Université de Sherbrooke, 2001.
24. D. Conte, P. Foggia, C. Sansone and M. Vento. Thirty years of graph matching in pattern recognition. International Journal of Pattern Recognition and Artificial Intelligence, vol. 18, no. 3, 2004, pp. 265-298.
25. A. C. M. Dumay, R. J. Van der Geest, J. J. Gerbrands, E. Jansen and J. H. C. Reiber. Consistent inexact graph matching applied to labelling coronary segments in arteriograms. Proc. Int. Conf. Pattern Recognition, Conf. C, 1992, pp. 439-442.
26. B. Luo and E. R. Hancock. Structural graph matching using the EM algorithm and singular value decomposition. IEEE Trans. on PAMI, vol. 23, 2001, pp. 1120-1136.