
On Accelerating Pattern Matching for Technology Mapping

Yusuke Matsunaga
[email protected]
Fujitsu Laboratories Limited

Abstract

The pattern matching algorithm is simple and fast compared with other matching algorithms such as Boolean matching. One major drawback of pattern matching is that a cell may need a large number of patterns to represent its logic function, because patterns are decomposed into 2-AND/NOT patterns to be matched against decomposed subject graphs. Furthermore, the conventional technology mapper pays little attention to relations among patterns; each pattern is matched independently. In this paper, a novel pattern matching algorithm that does not require patterns to be decomposed, together with a couple of speed-up techniques that exploit inter-relations among cells, is described. These methods are very effective for large cell libraries with complex cells. Experimental results show that our methods make matching up to 40 times faster.

1 Introduction

The pattern matching algorithm is widely used in practical technology mapping programs[1, 2]. It is simple and fast compared with other matching algorithms such as Boolean matching. One major drawback of pattern matching is that a cell may need a large number of patterns to represent its logic function, because patterns are decomposed into 2-AND/NOT patterns to be matched against decomposed subject graphs. Indeed, the number of patterns grows exponentially with the number of inputs in the worst case. Recently, cell libraries that contain cells with many inputs and complex logic functions have appeared, and the conventional pattern matching algorithm is not sufficient to handle them. Furthermore, the conventional technology mapper pays little attention to relations among patterns; each pattern is matched independently.¹ This leads to linear growth in CPU time with the number of cells in a library. These points were not considered deeply when the algorithm was developed, because cell libraries were not so large at that time. However, current technology provides libraries with large sets of complex cells, so these points can no longer be ignored. In this paper, we propose a novel pattern matching algorithm that does not require patterns to be decomposed, and a couple of speed-up techniques that exploit inter-relations among cells, which are effective for large cell libraries. In section 2, we describe our pattern matching algorithm; in sections 3 and 4, we describe speed-up techniques using isomorphic relations and containment relations, respectively. We show the experimental results in section 5.

¹ In [3], an algorithm whose run time is independent of the size of the pattern set is described; however, this algorithm has exponential space complexity for some sets of patterns.

2 Implicit pattern matching algorithm

Our algorithm is also based on conventional pattern matching (graph matching); however, it does not require patterns to be decomposed into 2-AND/NOT patterns. Our algorithm matches N-AND/NOT patterns against 2-AND/NOT subject graphs directly. This greatly reduces both CPU time and memory requirements, especially for complex cells with many inputs. The key idea of this algorithm is that if we know the rules that the target pattern structure must satisfy, we can find matches for the pattern without knowing its actual structure. For example, the rule for a k-input AND pattern is as follows: "If a tree subgraph has k leaves and no single inverter appears on any path from the root to a leaf, the subgraph matches the k-input AND." When k becomes larger, the number of 2-AND/NOT patterns representing a k-input AND blows up, but the rule stays the same. So it is better to use rules rather than the patterns themselves. Figure 1 shows a simple algorithm that finds matches for an N-input AND.

FindAndMatch(Stack of Nodes F, int n) {
  if (n = 0) {
    /* A match is found. */
    return;
  }
  s ← Pop(F);
  Bind s to the n-th input;
  FindAndMatch(F, n - 1);
  Unbind s;
  while (s is not an AND) {
    /* Skipping inverter chains */
    if (s is not an INVERTER) {
      goto end;
    }
    t ← the fanin of s;
    if (t is not an INVERTER) {
      goto end;
    }
    s ← the fanin of t;
  }
  u, v ← the fanins of s;
  Push(F, u);
  Push(F, v);
  FindAndMatch(F, n);
  Pop(F);  /* return value is discarded */
  Pop(F);
end:
  Push(F, s);
}

Figure 1: An implicit matching algorithm for n-input AND

The first argument F is a stack holding 'frontier' nodes, and the second argument n is the number of fanins that remain unbound. In the beginning, we push the fanin nodes of the root node r onto a stack F and then call FindAndMatch(F, N). At each recursion step, we first try to bind the top node s of the stack F to the n-th input and call FindAndMatch recursively. If node s is an AND gate, s can be an internal node as well as an input node; in that case, the fanins of s are pushed onto the stack F and become new frontiers instead of s.

For complex gates such as AND-OR-INVERTER, we need a more complicated matching procedure (figure 2). Simply speaking, by invoking FindAndMatch for each node in the pattern graph, we can enumerate all the matches. For this procedure, each node in the pattern graph has a stack, which we call the frontier stack. This stack holds frontier nodes and is similar to the stack used in FindAndMatch. We denote the frontier stack of node p as Fp. Each node also keeps the number of remaining unbound fanins, denoted np. We need another stack, called the node stack, which holds the pattern graph nodes that we are going to process. First, we invoke FindMatch with the root node of the pattern graph as the first argument. Lines 1 to 12 treat the case where all the fanins of p are bound: if no node remains in the node stack N, a match is found; otherwise, we move to the node at the top of the stack. At line 13, we get the node s at the top of the frontier stack of p. There are two choices: one is to bind s with a fanin of p, the other is to extend the frontier across s. Lines 14 to 32 treat the former case. If a fanin q of p is not compatible with s (i.e. their node types differ), they cannot be bound. Otherwise, we try to bind them. If q is an input, FindMatch is simply called (line 19). If not, we move to the fanin node q and push p onto the node stack, which means p must be processed again; then the fanins of s are pushed onto the frontier stack of q, and FindMatch is called with q (lines 21-28). Lines 33 to 49 are similar to FindAndMatch: lines 33 to 43 handle inverter chains, and lines 44 to 49 extend the frontier to the fanins of s.

FindMatch enumerates exactly the matches that the conventional graph matching algorithm enumerates. We have not analyzed the complexity of the algorithm theoretically; however, experimental results show that its average performance on practical instances is superior to the conventional algorithm. The memory overhead is negligible: the maximum size of the frontier stack of each pattern graph node equals the number of its fanins, and the maximum size of the node stack equals the number of nodes in the pattern graph.
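To make the recursion of Figure 1 concrete, here is a minimal runnable Python sketch of the n-input AND matcher. The Node class, its kind/fanins fields and the report callback are illustrative assumptions, not the paper's data structures, and an explicit empty-frontier guard is added so the recursion stops at dead ends.

class Node:
    def __init__(self, kind, fanins=(), name=''):
        self.kind = kind                    # 'AND' (2-input), 'INV', or 'PI'
        self.fanins = list(fanins)
        self.name = name
    def __repr__(self):
        return self.name or self.kind

def find_and_match(frontier, n, binding, report):
    """Bind n AND-inputs to nodes reachable from the frontier stack."""
    if n == 0:
        report(list(binding))               # a match is found
        return
    if not frontier:                        # dead end: inputs left but no frontier
        return
    s = frontier.pop()
    # Choice 1: bind s to the n-th input.
    binding.append(s)
    find_and_match(frontier, n - 1, binding, report)
    binding.pop()
    # Choice 2: expand the frontier across s, skipping pairs of inverters.
    t = s
    expandable = True
    while t.kind != 'AND':
        if t.kind != 'INV' or t.fanins[0].kind != 'INV':
            expandable = False              # a leaf or a single inverter: stop
            break
        t = t.fanins[0].fanins[0]           # skip an inverter pair
    if expandable:
        u, v = t.fanins
        frontier.append(u)
        frontier.append(v)
        find_and_match(frontier, n, binding, report)
        frontier.pop()
        frontier.pop()
    frontier.append(s)                      # restore s as a frontier node

# Usage: match a 3-input AND against the subject tree AND(AND(a, b), c).
a, b, c = Node('PI', name='a'), Node('PI', name='b'), Node('PI', name='c')
root = Node('AND', (Node('AND', (a, b)), c))
find_and_match(list(root.fanins), 3, [], print)   # prints one binding, e.g. [c, b, a]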

3 Speeding up using isomorphism

The second feature of our method concerns isomorphic patterns. For example, the patterns of a 2-NAND cell and a 2-NOR cell are identical if we disregard inverters at the inputs and the output (in this paper, we call such patterns isomorphic). Suppose we use the inverter-chain heuristic, i.e. a pair of inverters is inserted at every wire. If there is a match with the 2-NAND cell, there is also a match with the 2-NOR cell. In this case, we do not need to call the pattern matcher twice: we can easily obtain the match for the 2-NOR cell from the match for the 2-NAND cell by inverting its inputs and output. This isomorphic relation forms equivalence classes, so we only need to try to match the representative pattern of each equivalence class. Matches of the other patterns are derived from the matches of the representative pattern by modifying the input and output polarities (i.e. including or excluding inverters). This is the intuitive idea of speeding up using isomorphism. However, things are not so simple because of symmetry. Generally, a pattern matching algorithm takes care of symmetric inputs so that redundant equivalent matches are not generated.² The problem is that the symmetric inputs are not always the same within an isomorphic group.

² Symmetric inputs may have different delay information, so we should take care of pin assignments among the symmetric inputs in the case of delay optimization. However, this does not require enumerating all permutations of the symmetric inputs[4].

FindMatch(pattern graph node p, Stack of Nodes N) {
 1:   if (np = 0) {
 2:     if (N is empty) {
 3:       /* A match is found. */
 4:       Record the current binding as a match;
 5:       return;
 6:     } else {
 7:       p ← Pop(N);
 8:       FindMatch(p, N);
 9:       Push(N, p);
10:       return;
11:     }
12:   }
13:   s ← Pop(Fp);
14:   foreach unbound fanin q of p {
15:     if (q is not compatible with s) continue;
16:     Bind q with s;
17:     np ← np - 1;
18:     if (q is an input) {
19:       FindMatch(p, N);
20:     } else {
21:       Push(N, p);
22:       u, v ← the fanins of s;
23:       Push(Fq, u);
24:       Push(Fq, v);
25:       FindMatch(q, N);
26:       Pop(Fq);
27:       Pop(Fq);
28:       Pop(N);
29:     }
30:     Unbind q;
31:     np ← np + 1;
32:   }
33:   while (s is not an AND) {
34:     /* Skipping inverter chains */
35:     if (s is not an INVERTER) {
36:       goto end;
37:     }
38:     t ← the fanin of s;
39:     if (t is not an INVERTER) {
40:       goto end;
41:     }
42:     s ← the fanin of t;
43:   }
44:   u, v ← the fanins of s;
45:   Push(Fp, u);
46:   Push(Fp, v);
47:   FindMatch(p, N);
48:   Pop(Fp);
49:   Pop(Fp);
50: end:
51:   Push(Fp, s);
52:   Push(N, p);
}

Figure 2: Implicit pattern matching algorithm

For example, pattern (a) and pattern (b) in figure 3 are isomorphic; the difference between (a) and (b) is the polarity of input C.

Figure 3: Isomorphic patterns (a) and (b), with inputs A, B, C and output X

Figure 4: A subject graph (nodes 1-6)

Figure 5: The mapping graph from (a) to (b): A→A, B→B, C→C; A→C, B→A, C→B; A→A, B→C, C→B

Suppose we match them against the subject graph in figure 4. Pattern (a) has a match A → 1, B → 2, C → 3. Because inputs A, B and C are symmetric, other matches such as A → 2, B → 3, C → 1 are not considered. From this match, the match for (b) is derived by inverting C, so we get the match A → 1, B → 2, C → 6 for (b). Of course, this is a valid match for (b). However, there are two other distinctive matches for (b), A → 2, B → 3, C → 4 and A → 1, B → 3, C → 5, which are not derived so simply. That is because inputs A, B and C are not symmetric for pattern (b), while they are symmetric for pattern (a). One naive solution is not to consider symmetry at all, but this is not practical, because all permutations of the symmetric inputs would then be considered. The part we must take care of is the mapping from (a) to (b), i.e. the mapping from the representative pattern. For the patterns in figure 3, there is a mapping from (a) to (b), A → A, B → B, C → C. Because A, B and C in (a) are symmetric, this would be the only distinctive mapping from (a) to (b), and we would obtain only one match for (b) against the subject graph in figure 4. Here is where the problem lies: when deriving these mappings, we must not take the symmetry of the representative pattern into account. From this observation, we keep all the mappings from the representative pattern that do not consider its symmetry. Figure 5 shows the mappings from (a) to (b) in figure 3. The symmetry of pattern (b) is still considered, so there are only 3 mappings (not 6). With these 3 mappings, we can obtain the complete set of matches for (b).

Figure 6 shows a code fragment of the pattern matching algorithm that takes isomorphic relations into account. First, matching with a representative pattern p is performed. If no match is found, we skip the operation for the rest of the isomorphic patterns. If some matches are found, matches for the rest of the isomorphic patterns are generated using the mapping graph. In either case, this is more efficient than the conventional algorithm, which calls generate_all_matches() for every pattern.

Selection of the representative pattern is crucial for efficiency. Consider the example in figure 3 again. If pattern (b) is the representative pattern, we have three distinctive matches against the subject graph in figure 4: A → 1, B → 2, C → 6; A → 2, B → 3, C → 4; and A → 1, B → 3, C → 5.

...
/* p is a representative pattern */
generate_all_matches(p);
if (p has matches) {
  foreach match m {
    foreach mapping φ from p {
      q ← the pattern mapped to with φ;
      n ← φ(m);
      /* A match n for q is found */
    }
  }
}
...

Figure 6: Pattern matching considering isomorphic relations

Then we derive the matches for pattern (a) from these matches³: A → 1, B → 2, C → 3; A → 2, B → 3, C → 1; and A → 1, B → 3, C → 2. These three matches are equivalent and thus redundant. Redundant matches do not improve the mapping result but consume computation time. Furthermore, finding matches for (b) takes more time than finding matches for (a), because the search space is larger due to the asymmetry. So pattern (a) is preferable as the representative pattern. From this observation, we choose as the representative pattern a pattern that has the fewest symmetric input groups.

³ In the case of mapping from (b) to (a), there is only one mapping, A → B, C → A, B → C, because all the inputs of (a) are symmetric.
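To make the Figure 6 loop concrete, here is a minimal Python sketch under assumed data structures: a match is a dict from the representative pattern's input names to subject-graph nodes, and each mapping is stored as a (target pattern, pin renaming) pair; polarity adjustments (adding or removing inverters) are omitted. None of these names or structures come from the paper.

def match_isomorphism_class(rep_name, mappings, generate_all_matches):
    """Match only the representative pattern of an isomorphism class, then
    derive matches for the other patterns through the stored mappings."""
    results = {}
    rep_matches = generate_all_matches(rep_name)
    if not rep_matches:
        return results                       # the whole class is pruned at once
    results[rep_name] = rep_matches
    for target_name, pin_map in mappings:    # pin_map: rep input -> target input
        derived = [{pin_map[pin]: node for pin, node in m.items()}
                   for m in rep_matches]
        results.setdefault(target_name, []).extend(derived)
    return results

# Usage with the shape of the figure 3 example: (a) is the representative and
# three mappings lead from (a) to (b).
maps_a_to_b = [('b', {'A': 'A', 'B': 'B', 'C': 'C'}),
               ('b', {'A': 'C', 'B': 'A', 'C': 'B'}),
               ('b', {'A': 'A', 'B': 'C', 'C': 'B'})]
stub_matcher = lambda name: [{'A': 1, 'B': 2, 'C': 3}]   # stand-in matcher
print(match_isomorphism_class('a', maps_a_to_b, stub_matcher))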

4 Speeding up using containment relations

4.1 Pruning using containment relations

The last feature of our method utilizes containment relations among patterns. In this paper, we call a subgraph that contains the output node a root subgraph. If a pattern P is a root subgraph of another pattern Q, we say that P is contained in Q.⁴ If a pattern has no match at some node of a subject graph, other patterns that contain that pattern also have no match at that node. Using this observation, we can prune hopeless matching effort. We build a directed acyclic graph for this pruning just after generating the patterns for the cell library. This graph, called the containment relation graph (CRG), represents the containment relations among patterns. Each node of the graph corresponds to a pattern, and a directed edge from node P to node Q represents that P is contained in Q. Figure 7 shows an example of a CRG with five patterns.

⁴ The exact definition for non-decomposed patterns is given later.

Figure 7: The containment relation graph (five patterns: NAND2, NAND3, NAND4, OAI21, OAI22)

The containment relation is transitive, i.e. A → B and B → C imply A → C, so we do not need to keep such composed (indirect) relations. In the example of figure 7, the dotted edge between NAND2 and NAND4 is redundant and is therefore removed. If there is a directed path from pattern P to pattern Q, P is contained in Q. So, before trying to match P, we check whether all the predecessors of P have matches. If some predecessor pattern has no match, it is obvious that P has no match either. Figure 8 shows a code fragment implementing this pruning.

...
Q ← patterns with no predecessors;
while (Q is not empty) {
  p ← Get(Q);
  f ← Marked;
  foreach predecessor q of p {
    if (q is not marked) {
      f ← NotMarked;
      break;
    }
  }
  if (f = Marked) {
    generate_all_matches(p);
    if (p has matches) {
      Set mark to p;
      foreach successor r of p {
        Put(Q, r);
      }
    }
  }
}
...

Figure 8: Pruning using containment relations
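A runnable Python rendering of the Figure 8 fragment, under assumed data structures: the CRG is given as predecessor and successor lists keyed by pattern name, and the matcher is passed in as a callback. The small CRG in the usage example is only an illustrative stand-in; its edges are not taken from figure 7.

from collections import deque

def prune_and_match(preds, succs, generate_all_matches):
    """Try a pattern only after all the patterns it contains
    (its CRG predecessors) have found matches."""
    marked = set()                                    # patterns that have matches
    done = set()                                      # patterns already tried
    queue = deque(p for p in preds if not preds[p])   # patterns with no predecessors
    while queue:
        p = queue.popleft()
        if p in done:
            continue
        if all(q in marked for q in preds[p]):        # every contained pattern matched
            done.add(p)
            if generate_all_matches(p):
                marked.add(p)
                for r in succs.get(p, []):            # containing patterns become candidates
                    queue.append(r)
    return marked

# Usage: an assumed CRG over the five patterns of figure 7
# (edges run from a contained pattern to a containing pattern).
preds = {'NAND2': [], 'NAND3': ['NAND2'], 'NAND4': ['NAND3'],
         'OAI21': ['NAND2'], 'OAI22': ['OAI21']}
succs = {'NAND2': ['NAND3', 'OAI21'], 'NAND3': ['NAND4'],
         'NAND4': [], 'OAI21': ['OAI22'], 'OAI22': []}
has_match = lambda p: p != 'NAND4'                    # stand-in matcher
print(prune_and_match(preds, succs, has_match))       # NAND4 is tried but finds no match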

Figure 9: Two and-or-inverter cells: (a) AOI21, with inputs A, B, C and output F; (b) AOI211, with inputs A, B, C, D and output F

4.2 Containment relations for non-decomposed patterns

To check whether a pattern p is a root subgraph of a pattern q, we use the pattern matching algorithm itself: if there is a match for p at the output node of q, then p is a root subgraph of q. This is the basic definition of the containment relation. For non-decomposed patterns, however, this is not enough. For example, NAND3 contains NAND2, but a 2-input AND node does not match a 3-input AND node in the usual way. So we modify the matching rule for the containment check. Roughly speaking, an n-input node matches an m-input node if n ≤ m, but there are exceptions. An exception arises in the case of complex gates. Intuitively, AOI21 (f = AB + C) seems to be contained in AOI211 (f = AB + C + D) (figure 9).

However, it is not. Figure 10 shows two decomposed patterns of AOI211. The pattern of AOI21 is a root subgraph of pattern (b), but it is not a root subgraph of pattern (a). This means that if a subject graph contains pattern (a) of figure 10, node 3 in that graph has a match with AOI211 but no match with AOI21. So, in this sense, AOI211 does not contain AOI21.

Figure 10: Two decomposed patterns of AOI211, (a) with internal nodes 1-3 and (b) with internal nodes 4-6

The key point is that the 2-input AND node at the output of AOI21 has a fanin that is an internal node (inverters are ignored). This fanin prevents it from matching a 3-input AND node, as in the case of pattern (a) of figure 10. The exact matching rule for the containment check is as follows. Let type(v) be the type of node v and ni(v) be the number of fanins of node v. A node v of the contained candidate matches a node u of the containing candidate if one of the following holds:

1. type(v) = primary input, or
2. type(v) = type(u) and ni(v) = ni(u), or
3. type(v) = type(u) and ni(v) ≤ ni(u) and all fanins of v are primary inputs.
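A small Python sketch of this node compatibility test, under an assumed node representation with a kind field ('PI' for primary inputs) and a fanins tuple; these names are illustrative, not the paper's.

from collections import namedtuple

Node = namedtuple('Node', ['kind', 'fanins'])   # assumed representation

def node_matches_for_containment(v, u):
    """True if node v (of the contained candidate) may match node u
    (of the containing candidate) during the containment check."""
    if v.kind == 'PI':                                   # rule 1: a leaf matches anything
        return True
    if v.kind == u.kind and len(v.fanins) == len(u.fanins):
        return True                                      # rule 2: same type, same fanin count
    return (v.kind == u.kind
            and len(v.fanins) <= len(u.fanins)
            and all(f.kind == 'PI' for f in v.fanins))   # rule 3: fewer inputs, all leaves

# Example: NAND2's output AND (both fanins are inputs) may match NAND3's
# 3-input AND, but an AND with an internal fanin (as in AOI21) may not.
a, b, c = Node('PI', ()), Node('PI', ()), Node('PI', ())
and2_leaf = Node('AND', (a, b))
and2_internal = Node('AND', (a, Node('AND', (b, c))))
and3 = Node('AND', (a, b, c))
print(node_matches_for_containment(and2_leaf, and3))      # True  (rule 3)
print(node_matches_for_containment(and2_internal, and3))  # False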

Table 2: Benchmark circuits

name      # of nodes
9symml           160
C1355            182
C1908            233
C2670            435
C3540            901
C432             153
C499             182
C5315           1105
C6288           1731
C7552           1127
C880             300
apex6            584
apex7            175
b9                89
f51m              64
rot              476
z4ml              28
des             2745
Total          10670

5 Experimental results

To demonstrate the efficiency of our method, we performed several experiments. Table 1 shows statistics of the cell libraries used in the experiments. Lib2, 43-5, 44-3 and 44-6 are taken from the sis distribution. The library named "indust." is a real cell library, which has many NPN-equivalent cells. The second row of table 1 (# of cells) denotes the number of cells in each library, and the third row (# of rep. pat.) denotes the number of representative patterns. The fourth row denotes the number of decomposed 2-AND patterns. Each cell library has its own characteristics. For the libraries from sis except 43-5, the number of representative patterns is just half the number of cells, because they contain dual cells. The "indust." library has many NPN-equivalent cells, so the number of representative patterns is about one fifth of the number of cells. It also contains cells with many inputs, so the number of 2-AND patterns is much larger than for the other libraries.

The experiments were done as follows. First, a set of MCNC benchmark circuits was optimized using sis with script.rugged; then each circuit was decomposed into a 2-AND/NOT network. We applied the various matching algorithms to these networks. Table 2 shows the number of 2-AND nodes of those circuits. We used an UltraSPARC (300 MHz) for the experiments.
Table 3 shows the results of the experiments. The first column denotes the total number of matches against the benchmark circuits. The next two columns show the CPU time spent for pattern generation. The column named "2-AND" corresponds to decomposed 2-AND/NOT patterns, and the column named "no-decomp." corresponds to non-decomposed patterns; this includes checking isomorphic patterns and building the CRG. The rest of the columns denote the CPU time spent for matching. Again, the column named "2-AND" corresponds to the conventional matching algorithm with 2-AND/NOT patterns, the column named "no-decomp." corresponds to the implicit matching algorithm proposed in section 2, and the last column, "+isom. +cont.", corresponds to implicit matching with the speed-up techniques using isomorphic/containment relations proposed in sections 3 and 4.

The results are clear. For the "44-3" and "indust." libraries, a remarkable gain is achieved by implicit matching alone, and the speed-up using isomorphic/containment relations is effective for all libraries. It is worth noting that these gains are not constant: the larger the library, the more gain is achieved. As a result, matching time does not increase linearly with the size of the library. Preprocessing time is also a crucial issue for technology mapping. Since we do not need to decompose patterns, the preprocessing time of our method is modest and the memory requirement is smaller than that of the conventional algorithm. For "44-6" the preprocessing time is relatively long, but it takes about 60 seconds just to read that library, so the preprocessing time can be considered reasonable.

One may claim that since Boolean matching[5, 6, 7, 8] does not need pattern graphs, it is faster for cells with many inputs. In general, however, this is not true. Boolean matching requires the enumeration of cluster functions. A cluster is a rooted subgraph of the subject graph and a potential candidate for a match. We cannot determine which cluster has a match without Boolean matching, so we must enumerate all the clusters. Enumerating clusters and building BDDs[9] for them takes much time, in addition to the Boolean matching itself. In [8], experimental results show that about 90% of the enumerated clusters have no matches. Furthermore, the number of clusters increases rapidly with the number of inputs. So Boolean matching is not suitable for cells with many inputs.

Table 1: Cell libraries

name               lib2    43-5    44-3     44-6   indust.
# of cells           24     395     624     3502       108
# of rep. pat.       13     352     312     1751        27
# of 2-AND pat.      32     835    4512    12630      5735

Table 3: Matching results (CPU time in seconds)

library    # of matches    pattern generation           matching
                           2-AND pat   no-decomp.   2-AND pat   no-decomp.   +isom. +cont.
lib2             52020          0.03         0.00        5.74         6.56            5.34
43-5             75629          9.41         1.47       53.23        41.59            9.72
44-3             84390        100.92         1.51      295.97        59.44            9.98
44-6            102898        764.41        46.59      878.31       330.68           23.20
indust.         446756         47.13         0.08      395.94        20.14           11.86

6 Conclusion

We developed a novel method to speed up pattern matching and achieved significant improvement on benchmark and real cell libraries, up to 40 times faster than the conventional method. Our contributions are:

1. an implicit pattern matching algorithm;
2. a speed-up technique using isomorphic relations; and
3. a speed-up technique using containment relations.

The techniques are mutually complementary, and together they are very effective: for some libraries the implicit matching algorithm gains much, and for other libraries the isomorphic and containment relation techniques gain much.

References

[1] K. Keutzer, "DAGON: Technology Binding and Local Optimization by DAG Matching", in Proc. 24th Design Automation Conf., pp. 341-347, June 1987.
[2] R. L. Rudell, "Logic Synthesis for VLSI Design", PhD thesis, UCB/ERL M89/49, April 1989.
[3] C. Hoffman and M. O'Donnell, "Pattern matching in trees", Jour. ACM, 29(1):68-95, January 1982.
[4] H. J. Touati, "Performance-Oriented Technology Mapping", PhD thesis, UCB/ERL M90/109, November 1990.
[5] F. Mailhot and G. De Micheli, "Algorithms for Technology Mapping Based on Binary Decision Diagrams and on Boolean Operations", Technical Report CSL-TR-91-486, Stanford University, August 1991.
[6] J. B. Burch and D. E. Long, "Efficient Boolean function matching", in Proceedings of ICCAD, pp. 408-411, November 1992.
[7] H. Savoj, M. J. Silva, R. K. Brayton and A. Sangiovanni-Vincentelli, "Boolean Matching in Logic Synthesis", in Proceedings of Euro-DAC, pp. 168-174, September 1992.
[8] Y. Matsunaga, "A New Algorithm for Boolean Matching Utilizing Structural Information", in Proceedings of SASIMI'93, pp. 366-373, October 1993.
[9] R. E. Bryant, "Graph-based algorithms for Boolean function manipulation", IEEE Transactions on Computers, C-35(12), 1986.
