IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 22, NO. 5, MAY 2003
but the main arguments hold and hierarchical DRC programs have been replacing flat DRC programs in the market [4]. For hierarchical timing analysis, one set of simplifying assumptions might be that the design contains only one clock, no transparent latches, and no purely combinatorial paths through cells. Starting with the leaf cells, analyze all paths between flip-flops and report any errors, then generate abstracts. There are (at least) two possible strategies for abstracts: deriving a required time and/or an arrival time for each pin, or including in the abstract all logic between each pin and the first clocked element. In either case, the algorithm then works its way up the hierarchy, substituting abstracts for the lower level cells, doing the timing analysis, and then computing the abstracts for use by the next higher level. For either abstraction strategy, the size of the abstract will be proportional to the number of pins. From Rent’s rule [6], the number of pins on a block of $n$ primitives is proportional to $n^a$, where $a$ is Rent’s exponent and ranges from about 0.5 to 0.7. Therefore, to get linear time analysis overall, timing analysis and abstract generation must complete in time $n^b$, where $b < 1/a$ (roughly 1.42 if $a = 0.7$). Timing analysis, which is nominally $O(N)$, easily satisfies this constraint. Under real world conditions, however, hierarchical timing analysis is not uniformly advantageous. Multiple clocks, or combinatorial paths through blocks, can cause the timing abstract to grow more quickly than the number of pins, and, in fact, can result in $a > 1$. The presence of transparent latches can make the abstract larger and harder to compute. It can be very difficult and time consuming to generate a correct abstract for a cell with false and multicycle paths, especially if these paths cross hierarchical boundaries.
Therefore, although hierarchical timing analysis is certainly used, it is mainly because of the other advantages mentioned in the introduction, not because of the efficiency arguments of this paper.
ACKNOWLEDGMENT
The author wishes to thank the reviewers for their extremely helpful comments.
REFERENCES
[1] H. C. Baird, “Fast algorithms for LSI artwork analysis,” in Proc. 14th Annu. Design Automation Conf., 1977, pp. 303–311.
[2] J. Bentley and T. Ottmann, “The complexity of manipulating hierarchically defined sets of rectangles,” Dept. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, CMU-CS-81–109, 1981.
[3] J. L. Bentley and T. Ottmann, “The complexity of manipulating hierarchically defined sets of rectangles,” in Proc. 10th Symp. Math. Found. Comput. Sci., vol. 118, J. Gruska and M. Chytil, Eds., Štrbské Pleso, Czechoslovakia, 1981, pp. 1–15.
[4] R. Goering, “For chip mainstream, it’s the year of 0.25,” EE Times, p. 1, Jan. 4, 1999.
[5] S. C. Johnson, “Hierarchical design validation based on rectangles,” in Proc. Conf. Adv. Res. VLSI, 1982, pp. 97–100.
[6] B. Landman and R. Russo, “On a pin versus block relationship for partitions of logic graphs,” IEEE Trans. Comput., vol. C-20, pp. 1469–1479, Dec. 1971.
[7] U. Lauther, “An O(n log n) algorithm for Boolean mask operations,” in Proc. 18th Annu. Design Automation Conf., 1981, pp. 555–562.
[8] P. Losleben, “Design validation in hierarchical systems,” in Proc. 12th Annu. Design Automation Conf., 1975, pp. 431–438.
[9] L. K. Scheffer, “A methodology for improved verification of VLSI designs without loss of area,” in Proc. 2nd Caltech Conf. VLSI, C. Seitz, Ed., Jan. 1981, pp. 299–309.
[10] L. K. Scheffer, “The use of strict hierarchy for the verification of integrated circuits,” Ph.D. dissertation, Dept. of Elect. Eng., Stanford Univ., Stanford, CA, 1984.
[11] T. Whitney, “Description of the hierarchical design rule filter,” Dept. of Comput. Sci., California Inst. Technol., Pasadena, CA, SSP File #4027, 1980.
VI. CONCLUSION Under conditions that are sometimes achievable in practice, hierarchical checking has performance O(N ) in the size of the expanded hierarchy, which is the best order possible.
Optimal Circuit Clustering for Delay Minimization Under a More General Delay Model C. N. Sze and Ting-Chi Wang
APPENDIX
Why is hierarchical DRC NP-complete? Here is a sketch of the proof from [2]. The integer knapsack problem is known to be NP-complete: given a set $I$ of integers $\{I_0, I_1, \ldots, I_{N-1}\}$, is there any subset of these that adds up to $S$? Hierarchical DRC is a more complex problem, but surely must be capable of answering the question, “Do layer 1 and layer 2 overlap anywhere in the design?” since this is an operation required in the verification of almost all IC processes. Convert any integer knapsack problem to a hierarchical DRC problem as follows. First build a cell $C_0$ that contains two unit squares on layer 1, one at $(0, 0)$ and one at $(1, I_0)$. Then build a sequence of cells $C_1, C_2, \ldots, C_{N-1}$ such that cell $C_j$ has two copies of cell $C_{j-1}$, one at $(0, 0)$ and one at $(2^j, I_j)$. Finally, build one cell $C_N$ that has one copy of $C_{N-1}$ at $(0, 0)$ and a rectangle on layer 2 from $(0, S)$ to $(2^N, S + 1)$. The DRC problem now contains overlap of layer 1 and layer 2 if, and only if, there is some subset of $I$ that sums to $S$. The construction works since there is a unit square of layer 1 with a $Y$ coordinate corresponding to each possible subset of $I$. This construction is demonstrated in Fig. 4, with $I = \{3, 4, 7\}$ and $S = 9$. Diagram (a) shows the construction of $C_0$, (b) $C_1$, and so on. The final diagram (e) shows the expanded hierarchy. Since there is no overlap of layers 1 and 2, there is no subset of $I$ that sums to 9.
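The construction can be illustrated with a short Python sketch. It is not taken from [2]; the function names `layer1_squares` and `layers_overlap` are hypothetical, and the sketch deliberately flattens the hierarchy (so it takes exponential time in the number of integers) purely to show which coordinates the construction produces. Each unit square is represented by its lower-left corner.

```python
def layer1_squares(weights):
    """Lower-left corners of all layer-1 unit squares in the expanded hierarchy.

    C_0 holds squares at (0, 0) and (1, I_0); C_j holds two copies of C_{j-1},
    one at (0, 0) and one shifted by (2**j, I_j).
    """
    squares = [(0, 0), (1, weights[0])]
    for j in range(1, len(weights)):
        dx, dy = 2 ** j, weights[j]
        # Second copy of C_{j-1}, translated by (2^j, I_j).
        squares = squares + [(x + dx, y + dy) for (x, y) in squares]
    return squares


def layers_overlap(weights, target):
    """True if some layer-1 square overlaps the layer-2 rectangle
    spanning (0, target) to (2**N, target + 1)."""
    width = 2 ** len(weights)
    # With integer coordinates, a unit square overlaps the rectangle
    # (with positive area) exactly when its Y coordinate equals target.
    return any(0 <= x < width and y == target
               for (x, y) in layer1_squares(weights))


no_subset_sums_to_9 = not layers_overlap([3, 4, 7], 9)
```

For the example in the text, `layers_overlap([3, 4, 7], 9)` reports no overlap, while a target of 10 (3 + 7) does overlap.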
Abstract—This paper considers the area-constrained clustering of combinational circuits for delay minimization under a more general delay model, which practically takes variable interconnect delay into account. Our delay model is particularly applicable when allowing the back-annotation of actual delay information to drive the clustering process. We present a vertex grouping technique and integrate it with the algorithm (Rajaraman and Wong, 1995) such that our algorithm can be proved to solve the problem optimally in polynomial time. Index Terms—Partitioning, performance optimization, physical design, timing optimization, very large scale integration (VLSI).
Manuscript received May 20, 2002; revised November 3, 2002. This paper was recommended by Associate Editor M. D. F. Wong. C. N. Sze is with the Department of Electrical Engineering, Texas A&M University, College Station, TX 77843-3259 USA (e-mail: [email protected]). T.-C. Wang is with the Department of Computer Science, National Tsing Hua University, Hsinchu 300, Taiwan, R.O.C. (e-mail: [email protected]). Digital Object Identifier 10.1109/TCAD.2003.810746
0278-0070/03$17.00 © 2003 IEEE
I. INTRODUCTION
Circuit clustering assigns circuit elements to a number of clusters under design constraints such as area and/or pin constraints [1]–[4]. The resulting clusters are smaller than the original circuit and are easier to manipulate. Circuit clustering algorithms usually aim at minimizing either the circuit delay or the number of intercluster connections. In this paper, we focus on the problem of combinational circuit clustering for delay minimization subject to area constraints. This problem was first studied in [4]. The authors formulate the problem in the unit delay model, in which no delay is associated with any gate or with any connection within a cluster and a unit delay is assigned to each intercluster connection. They propose a polynomial time algorithm to solve the problem optimally. More recently, most research has adopted the general delay model [3], in which each gate is associated with a delay value, connections within the same cluster have no delay, and each intercluster connection has a constant delay. An algorithm for the circuit clustering problem under the general delay model is proposed in [1], and it is proved that the algorithm solves the problem optimally in polynomial time. However, when wire delay starts to dominate the total delay in a circuit, the general delay model can no longer handle the more practical problems of current technology. Hence, the delay model must become more sophisticated, such that a delay value is also associated with each connection within a cluster. We therefore propose a new delay model in this paper. It is practical for delay back-annotation techniques, in which the actual delay information of the circuit after place and route is fed back to drive the clustering process. In fact, our delay model is more general, because the general delay model adopted in [1] is a special case of our model (with the delays of all connections within the same cluster set to zero).
We demonstrate several trivial extensions of the algorithm in [1] and show that they cannot optimally solve the circuit clustering problem under our proposed delay model; details are discussed in Section III. In addition, we present a vertex grouping technique and integrate it with the algorithm in [1], so that our algorithm can be proved to optimally solve the area-constrained combinational circuit clustering problem for delay minimization under our delay model in polynomial time. This paper is organized as follows. The next section gives the problem definition. In Section III, we present a brief review of the algorithm in [1] and several trivial extensions. Section IV describes our vertex grouping technique, while the overall algorithm is discussed in Section V. Analysis of the algorithm and the conclusion are included in the last two sections.
II. PROBLEM DEFINITION
A combinational circuit can be represented as a directed acyclic graph $G = (V, E)$. $V$ is the set of vertices, which represent the functional blocks (e.g., gates) in the graph, and $E$ is the set of edges, which stand
for the connections among the blocks. In the graph, primary input (PI) vertices are those with outgoing edges only, whereas primary output (PO) vertices have incoming edges only. A vertex $u$ is a predecessor (successor) of a vertex $v$ if there exists a path from $u$ to $v$ (from $v$ to $u$). A vertex $u$ is an immediate predecessor (immediate successor) of a vertex $v$ if there exists an edge from $u$ to $v$ (from $v$ to $u$). For each vertex $v \in V$, let $w(v)$ represent its area. A cluster $C \subseteq V$ is a set of vertices $\{v_1, v_2, \ldots, v_k\}$ which satisfies the area bound $M$, where $M$ is a given constant. For each cluster $C$, its area $w(C)$ is defined as the sum of the areas of all vertices in it and must be no more than $M$. That is,
$$w(C) = \sum_{v \in C} w(v) \le M.$$
In the input (unclustered) graph, delay values are associated with all vertices and edges. For each vertex $v \in V$, let $\delta(v)$ represent its intrinsic delay. Each edge $(u, v) \in E$ is associated with a delay $\delta(u, v)$. (Note that $\delta(u, v) = 0$ in [1].) For the graph in Fig. 1(a), the numbers beside the vertices and edges indicate the delay values associated with them. For example, the vertex delay of $e$ is 1 ($\delta(e) = 1$), and the edge delay of $(c, e)$ is 4 ($\delta(c, e) = 4$).
Fig. 1. Circuit with variable interconnect delay and a clustered circuit.
A clustering $S$ on the graph $G$ is defined as a set of clusters, $S = \{C_1, C_2, \ldots, C_m\}$, such that all the clusters in $S$ satisfy the following condition:
$$\forall i \in \{1, \ldots, m\}:\ C_i \subseteq V, \quad \text{s.t.} \quad w(C_i) \le M, \quad \bigcup_{i=1}^{m} C_i = V.$$
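The clustering condition can be checked mechanically. The following Python sketch is only an illustration: the function name `is_valid_clustering`, the seven-vertex set, the unit areas, and the three clusters are all assumptions made for the example (they are not read off Fig. 1). Note that a vertex may appear in more than one cluster, since the condition asks only that the clusters cover $V$.

```python
def is_valid_clustering(vertices, area, clusters, max_area):
    """Check that every cluster is a subset of V with total area <= M
    and that the union of all clusters equals V."""
    covered = set()
    for cluster in clusters:
        if not cluster <= vertices:                       # C_i must be a subset of V
            return False
        if sum(area[v] for v in cluster) > max_area:      # w(C_i) <= M
            return False
        covered |= cluster
    return covered == vertices                            # union of C_i equals V


# Hypothetical instance: unit areas, M = 3, vertex "a" duplicated in two clusters.
vertices = set("abcdefg")
area = {v: 1 for v in vertices}
clusters = [{"a", "c", "e"}, {"a", "b", "d"}, {"f", "g"}]
valid = is_valid_clustering(vertices, area, clusters, 3)
```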
Note that node duplication (i.e., a node appearing in more than one cluster) may happen in $S$. Let $G'$ be the clustered graph induced by a clustering $S$ on the graph $G$. The delay associated with $G'$ is evaluated as follows. Each edge $(u, v)$ within the same cluster keeps its original delay $\delta(u, v)$. However, for each edge connecting two vertices in different clusters, its edge delay is replaced by a fixed value $D$. For the clustered graph $G'$ shown in Fig. 1(b), the set of boxes indicates a clustering (which contains three clusters $C_1, C_2, C_3$) on the graph in Fig. 1(a). This clustering contains node duplication: node $a$ appears in both $C_1$ and $C_2$. Since the vertices $a, c, e$ are in the same cluster $C_1$, the edge delays are $\delta(a, c) = 6$ and $\delta(c, e) = 4$. (Under the delay model adopted in [1], there is no delay associated with the edge $(a, c)$ or $(c, e)$ in this example.) However, the edge $(e, g)$ crosses the two clusters $C_1$ and $C_3$, so $\delta(e, g)$ becomes the predefined value $D$. Practically, we assume $\delta(u, v) \le D$ for each edge $(u, v)$ in the input graph. To calculate the delay of a path from vertex $u$ to vertex $v$, we always include all vertex delays and edge delays along the path. The path delay at a vertex $v$ is defined as the maximum delay of all paths from all PI vertices to $v$. The delay of a clustered circuit $G'$ is defined as the maximum delay of all paths from PIs to POs, which is equal to the maximum path delay at all PO vertices. For example, in Fig. 1(b), the delay of $G'$ is $15 + D$ along the path $a \to c \to e \to g$. Based on these definitions, the circuit clustering problem considered in this paper is stated as follows.
Circuit Clustering With Variable Interconnect Delay: Given a graph $G$, find a clustering $S$ (a set of clusters) such that the delay of the clustered circuit is minimized.
III. PREVIOUS WORK AND PITFALLS IN TRIVIAL EXTENSIONS
A. Previous Work
In this section, we discuss the algorithm in [1], which solves the circuit clustering problem optimally under the general delay model.
Fig. 2. Examples of circuit clustering.
In the algorithm, each cluster has one and only one root vertex $r$, and we denote the cluster rooted at $r$ (which is found by the algorithm) as $\mathrm{cluster}(r)$.$^1$ Let $G_r$ be the set of all predecessors of $r$ together with the vertex $r$. Note that in the algorithm, $\mathrm{cluster}(r)$ is a subset of $G_r$. When the context is not ambiguous, we also denote the subgraph induced by $G_r$ by the vertex set $G_r$. The algorithm consists of two phases: the labeling and clustering phases. For each vertex $u \in V$, the label of $u$, $l(u)$, is defined as the minimum path delay at $u$ among all possible clusterings on the graph $G_u$. In the labeling phase, for each vertex $r$ in a topological order, the algorithm finds $\mathrm{cluster}(r)$ from $G_r$ such that the path delay at $r$ becomes the minimum among all possible clusterings on $G_r$, and at the same time, the algorithm obtains $l(r)$. In order to get $l(r)$, the algorithm first calculates $l'(u)$ for each predecessor $u$ of $r$. $l'(u)$ is defined as the sum of $l(u)$ and the maximum delay of the paths from the output of $u$ to the output of $r$ in the input graph.$^2$ Then, a vertex $u$ with the highest $l'$ value is repeatedly found and included into the cluster rooted at $r$ until the cluster area violates the area constraint. After finding $\mathrm{cluster}(r)$, the algorithm continues to calculate $l_1$, the maximum $l'$ value of the PI vertices which are inside the cluster, and to find $l_2$, the maximum $l' + D$ value of the vertices outside the cluster. After that, the label of $r$, $l(r)$, is obtained as the greater of $l_1$ and $l_2$. The clustering phase constructs clusters from PO vertices to PI vertices according to the cluster information generated in the labeling phase. First, for each PO vertex $v$, the corresponding $\mathrm{cluster}(v)$ is included into the clustering $S$. Then, for each vertex $u$ outside $S$ which is an immediate predecessor of any vertex inside $S$, $\mathrm{cluster}(u)$ is also included into the clustering $S$. The procedure is repeated until all of the vertices in $G$ are included in $S$.
B. Pitfalls in Trivial Extensions
This section shows that the algorithm in [1] cannot be “trivially” extended to deal with the circuit clustering problem with variable interconnect delay. The labeling phase in [1] is based on the fact that in the general delay model, for each vertex $u \in G_r - \mathrm{cluster}(r)$, $l'(u) + D$ effectively represents the path delay at $r$ due to $u$ when $u$ is not included in $\mathrm{cluster}(r)$. Therefore, to make the resultant path delay at $r$ as small as possible, adding vertices into $\mathrm{cluster}(r)$ is done in the nonincreasing order of the $l'$ values. However, our delay model requires replacing the original connection delay with a constant intercluster delay $D$; hence, instead of $l'(u) + D$, the expression $l'(u) + D - \delta(u, g(u))$ represents the path delay at $r$ due to $u$ when $u$ is not included in $\mathrm{cluster}(r)$. Note that for each vertex $u$, $g(u)$ is defined as the immediate successor of $u$ such that the delay on the path from the output of $u$ to the output of $r$ passing through $g(u)$ is maximum among all of the immediate successors of $u$.$^3$ Therefore, if $l'(u) > l'(v)$, it is not always true that $l'(u) + D - \delta(u, g(u)) > l'(v) + D - \delta(v, g(v))$. As a result, we cannot use the $l'$ value to select vertices.
To simplify our presentation, we define $l^*(u)$ to be $l'(u) + D - \delta(u, g(u))$ for each vertex $u \in G_r - \mathrm{cluster}(r)$. In other words, for a root vertex $r$, $l^*(u)$ indicates the maximum delay of the paths through the vertex $u$ when $u$ is not in the cluster rooted at $r$. Obviously, using $l^*(u)$ rather than $l'(u)$ for vertex selection is another trivial extension for applying the algorithm in [1] under our delay model. However, problems still exist in this extension. An example is shown in Fig. 2(a). In the figure, $M = 3$, $D = 7$, each vertex has a unit area, and the number beside each vertex or edge is the delay value associated with it. It is easy to calculate that $l^*(e) = 22$, $l^*(f) = 24$, $l^*(c) = 21$, $l^*(d) = 21$, $l^*(a) = 23$, and $l^*(b) = 20$, with respect to the root vertex $g$. Under this calculation, vertex selection based on the value of $l^*$ should be in the order of $f, a, e, c, d, b$. Based on this ordering, we eventually form the clustering $\{\{g, f, a\}, \{e, c, a\}, \{d, b\}\}$; the path delay at $g$ is 22, as shown in Fig. 2(b). However, as in Fig. 2(c), if we form the clustering $\{\{g, f, e\}, \{c, a\}, \{d, b\}\}$ instead, the path delay at $g$ becomes 21. In fact, $\{g, f, e\}$ is an optimal choice for $\mathrm{cluster}(g)$, with $l(g) = 21$. The problem with using the value of $l^*$ to select vertices is that the vertex set of the resultant cluster may not be connected, and this may increase the circuit delay. This example shows that such a trivial extension of the algorithm in [1] is also unable to produce optimal solutions.
$^1$All vertices in the cluster are predecessors of the root vertex.
$^2$When calculating the delay of a path from the output of $u$ to the output of $r$, the vertex delay of $u$ is not included.
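The pitfall in the Fig. 2 example can be replayed with a few lines of Python. This is only a sketch of the "trivial extension" (selecting single vertices by their $l^*$ values), using the $l^*$ values, $M = 3$, and unit areas quoted in the text; the helper name `greedy_cluster` is hypothetical, and the sketch does not compute the resulting path delays, which depend on the figure's delay values.

```python
# l* values with respect to root vertex g, as quoted for Fig. 2(a).
l_star = {"e": 22, "f": 24, "c": 21, "d": 21, "a": 23, "b": 20}
M = 3  # area bound; every vertex has unit area

# Nonincreasing order of l*; Python's sort is stable, so ties keep input order.
order = sorted(l_star, key=l_star.get, reverse=True)


def greedy_cluster(root, candidates, max_area):
    """Add single vertices in the given order until the area bound is reached."""
    cluster = [root]
    for v in candidates:
        if len(cluster) + 1 > max_area:   # unit areas: area equals vertex count
            break
        cluster.append(v)
    return cluster


cluster_g = greedy_cluster("g", order, M)
```

The selection order comes out as f, a, e, c, d, b and the cluster rooted at g as {g, f, a}, which matches the non-optimal clustering of Fig. 2(b): vertex a is pulled in without c and e, so the cluster is not connected.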
$^3$If more than one immediate successor of $u$ satisfies the definition of $g(u)$, we just pick any one of them to be $g(u)$.
IV. VERTEX GROUPING
In this section, we first define the function $l''(x, y)$ for each edge $(x, y)$ and then present our vertex grouping algorithm.
Definition 1: Given a root vertex $r$ and its associated $G_r$, the $l''$ value of an edge $(x, y)$, which represents the path delay at $r$ due to $x$ if $x$ is not included in the cluster of $r$ and $y$ is included, is defined as
$$l''(x, y) = l(x) + D + \Delta(y, r)$$
where $\Delta(y, r)$ is defined as the maximum delay (including gate and edge delays) along any path from $y$ to $r$ in the input graph (assuming all gates are in the same cluster). By calculating $l''(x, y)$, we obtain the delay due to $x$ in the situation where $x$ is excluded from the cluster and $y$ is included in the cluster rooted at $r$. The reason why $l''$ is calculated for each connection, but not for each vertex, is that the delay on each fanout edge of a vertex can differ. If $M = 3$ and $D = 7$, the $l''$ values for the graph in Fig. 3(a) (for root vertex $g$) are shown in Fig. 3(b). In the figure, it is clear that when $c$ is included in the cluster while $a$ is excluded, the path delay at $g$ is greater than in the situation when we add $a$ into $\mathrm{cluster}(g)$, since $l''(a, c) = 23 > l''(c, e) = 21$. In other words, if we cannot include $a$ and $c$ at the same time, we should not choose to include $c$. By this observation, vertices should be added to a cluster on a “group” basis. Hence, we propose the following grouping strategy.
Fig. 3. Illustration of vertex grouping.
Definition 2 (Vertex Grouping): Given a root vertex $r$, its associated $G_r$, and an edge $(a_0, x)$ in $G_r$, $\mathrm{group}(a_0, x)$ is a subset of $G_r$ such that a predecessor $a_i$ of $a_0$ must be assigned into $\mathrm{group}(a_0, x)$ if there exists a path from $a_i$ to $a_0$, denoted $a_i \to a_{i-1} \to \cdots \to a_1 \to a_0$, such that
$$l''(a_k, a_{k-1}) > l''(a_0, x), \quad \forall k = 1, 2, \ldots, i.$$
Also, $a_0$ is always assigned into $\mathrm{group}(a_0, x)$. Note that in Definition 2, all vertices on the path from $a_i$ to $x$ are predecessors of $r$, i.e., $x, a_0, \ldots, a_i \in G_r$. Based on this grouping definition, $a$ is assigned into $\mathrm{group}(c, e)$, shown in Fig. 3(c), because there exists a path $a \to c$ such that $l''(a, c) > l''(c, e)$. In fact, it indicates that $c$ and $a$ should be assigned to the cluster rooted at $g$ at the same time. Definition 2 describes the condition under which a vertex $a_i$ in $G_{a_0}$ is in $\mathrm{group}(a_0, x)$. In order to get $\mathrm{group}(a_0, x)$, we can check all vertices in $G_{a_0}$. However, given the subgraph $G_r$ rooted at $r$, it is not necessary to get $\mathrm{group}(a, b)$ for every edge $(a, b)$ in $G_r$. Therefore, we propose the following vertex-grouping algorithm, which assigns vertices into $\mathrm{group}(a, b)$ for some edges $(a, b)$ only. The algorithm is shown below, followed by a detailed discussion.
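As a small illustration of Definition 1, the following Python sketch computes $\Delta(y, r)$ by memoized recursion over a DAG and then evaluates $l''(x, y) = l(x) + D + \Delta(y, r)$. The function names, the four-vertex chain, and every delay value are assumptions chosen for the example, not data from Fig. 3. The sketch also assumes that every vertex it visits has at least one successor and that all of its successors lead to the root.

```python
def max_delay_to_root(root, succs, vdelay, edelay):
    """Return a function delta(y) giving the maximum gate-plus-edge delay
    over all paths from y to root (gate delays of both endpoints included)."""
    memo = {root: vdelay[root]}

    def delta(y):
        if y not in memo:
            # Longest continuation through any immediate successor of y.
            memo[y] = vdelay[y] + max(edelay[(y, s)] + delta(s) for s in succs[y])
        return memo[y]

    return delta


def l_double_prime(x, y, label, D, delta):
    """l''(x, y) = l(x) + D + Delta(y, r): delay at r due to x when x is
    outside the cluster of r and its successor y is inside."""
    return label[x] + D + delta(y)


# Hypothetical chain a -> c -> e -> g with made-up delays.
succs = {"a": ["c"], "c": ["e"], "e": ["g"]}
vdelay = {"a": 1, "c": 2, "e": 1, "g": 1}
edelay = {("a", "c"): 6, ("c", "e"): 4, ("e", "g"): 3}
delta = max_delay_to_root("g", succs, vdelay, edelay)

label = {"a": vdelay["a"]}          # a is a PI, so its label is its own delay
value = l_double_prime("a", "c", label, 7, delta)
```

Here $\Delta(c, g) = 2 + 4 + 1 + 3 + 1 = 11$, so with $D = 7$ the edge $(a, c)$ gets $l''(a, c) = 1 + 7 + 11 = 19$ in this toy instance.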
ALGORITHM Grouping($r$)
Input: vertex $r$
Output: list $P_r$ of edges, $\mathrm{group}(a, b)$ for each edge $(a, b)$ of $P_r$
begin
  leader_set = {};
  FOR each immediate predecessor $u$ of $r$  /* i.e., $(u, r) \in E$ */
    put edge $(u, r)$ into leader_set;
  END FOR
  WHILE (leader_set is not empty)
    remove an edge $(a, b)$ from leader_set and put it into $P_r$;
    add $a$ into $\mathrm{group}(a, b)$;
    FOR each immediate predecessor $x$ of $a$
      Group_vertex($x, a, a, b$);
    END FOR
  END WHILE
  return $P_r$, $\mathrm{group}(a, b)$ for each edge $(a, b) \in P_r$;
end

ALGORITHM Group_vertex($x, y, a, b$)
begin
  IF ($x$ in $\mathrm{group}(a, b)$)
    return;
  ELSE IF ($l''(x, y) \le l''(a, b)$)
    add $(x, y)$ into leader_set;
    return;
  ELSE IF ($l''(x, y) > l''(a, b)$)
    add $x$ into $\mathrm{group}(a, b)$;
    FOR each immediate predecessor $w$ of $x$
      Group_vertex($w, x, a, b$);
    END FOR
  END IF
end

In Grouping(), each edge $(u, r)$ is first stored in the “leader_set.” When all vertices in $\mathrm{group}(u, r)$ have been found, the edge $(u, r)$ has been removed from “leader_set” and put into a list $P_r$. We refer to all edges in both “leader_set” and $P_r$ as “leader edges.” Group_vertex($x, y, a, b$) shows the process of checking whether the vertex $x$ should be added into $\mathrm{group}(a, b)$. If $x \in G_a$ does not comply with the definition of $\mathrm{group}(a, b)$, the edge $(x, y)$ is added into the “leader_set.” The algorithm terminates when “leader_set” is empty. In this algorithm, not all edges in $G_r$ may be added to $P_r$, and we only obtain $\mathrm{group}(a, b)$ for each “leader edge” $(a, b) \in P_r$. After Grouping(), all predecessors of $r$ are divided into groups, and the “leader edge” of each group is stored in the list $P_r$. For the circuit in Fig. 3(a), the results after Grouping() are shown in Fig. 3(c). In the figure, the thick lines are edges in $P_r$ (leader edges), and five groups are formed in total. With the grouping algorithm, we obtain an important property: it is possible to add a group (no longer a single vertex) into a cluster without increasing the path delay at the root vertex $r$.
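The grouping strategy of Definition 2 can be sketched in Python as follows. This is an iterative reading of the strategy rather than a transcription of the authors' procedure: the function name `grouping`, the predecessor map, and the $l''$ table are assumptions. The toy graph consists of two chains into root $g$, and its $l''$ values are illustrative, chosen to agree with the numbers quoted in the text's example (in particular $l''(a, c) = 23 > l''(c, e) = 21$).

```python
def grouping(root, preds, l2):
    """Split the predecessors of root into groups led by 'leader edges'.

    preds[v] lists the immediate predecessors of v; l2[(x, y)] is l''(x, y).
    Returns the leader edges in processing order and a dict edge -> group.
    """
    leader_set = [(u, root) for u in preds.get(root, [])]
    seen = set(leader_set)
    leaders, groups = [], {}
    while leader_set:
        a, b = leader_set.pop(0)
        leaders.append((a, b))
        group = {a}                       # a_0 always belongs to its own group
        stack = [a]
        while stack:
            y = stack.pop()
            for x in preds.get(y, []):
                if x in group:
                    continue
                if l2[(x, y)] > l2[(a, b)]:
                    group.add(x)          # x must travel together with a
                    stack.append(x)
                elif (x, y) not in seen:
                    seen.add((x, y))      # (x, y) becomes a new leader edge
                    leader_set.append((x, y))
        groups[(a, b)] = group
    return leaders, groups


# Illustrative instance: chains a -> c -> e -> g and b -> d -> f -> g.
preds = {"g": ["e", "f"], "e": ["c"], "c": ["a"], "f": ["d"], "d": ["b"]}
l2 = {("e", "g"): 22, ("f", "g"): 24, ("c", "e"): 21,
      ("a", "c"): 23, ("d", "f"): 21, ("b", "d"): 20}
leaders, groups = grouping("g", preds, l2)
```

On this instance the sketch yields five groups, with $a$ and $c$ sharing $\mathrm{group}(c, e)$, which agrees with the five groups the text reports for Fig. 3(c).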
V. ALGORITHM In this section, we present our algorithm (as follows) based on the grouping strategy, which is a “nontrivial” extension of the algorithm in [1], for the circuit clustering problem under our new delay model.
ALGORITHM Circuit_clustering($G$)
Input: graph $G = (V, E)$
Output: a set of clusters $S$
1. begin
2.   compute the maximum delay matrix $\Delta$, where $\Delta(u, v)$ is the maximum delay along any path from $u$ to $v$ (including gate and edge delays), assuming all gates are in the same cluster;
3.   FOR each PI $v$, $l(v) = \delta(v)$ and $\mathrm{cluster}(v) = \{v\}$;
4.   sort the non-PI vertices of $G$ in a topological order to obtain list $T$;
5.   WHILE $T$ is not empty
6.     remove the first vertex $r$ from $T$;
7.     compute $G_r$;
8.     FOR each edge $(u, w) \in E$ s.t. $u \in G_r$ and $w \in G_r$
9.       $l''(u, w) = l(u) + D + \Delta(w, r)$;
       END FOR
10.    $P_r$ = Grouping($r$);
11.    $P'_r$ = sort the edges in $P_r$ in nonincreasing order of $l''$;
12.    Labeling($r$, $P'_r$);
     END WHILE
13.  $L$ = all PO vertices;
14.  $S = \emptyset$;
15.  WHILE $L$ is not empty
16.    remove a vertex $v$ from $L$;
17.    $S = S \cup \{\mathrm{cluster}(v)\}$;
18.    FOR each vertex $u$ s.t. $u$ is an input of $\mathrm{cluster}(v)$ and $\mathrm{cluster}(u) \notin S$
19.      $L = L \cup \{u\}$;
       END FOR
     END WHILE
20. end

ALGORITHM Labeling($r$, $P'_r$)
Input: vertex $r$, list $P'_r$
Output: $\mathrm{cluster}(r)$, $l(r)$
1. begin
2.   $\mathrm{cluster}(r) = \{r\}$;
3.   WHILE $P'_r$ is not empty
4.     remove the first edge $(u, w)$ in $P'_r$;
5.     IF ($w(\mathrm{cluster}(r) \cup \mathrm{group}(u, w)) \le M$)
6.       $\mathrm{cluster}(r) = \mathrm{cluster}(r) \cup \mathrm{group}(u, w)$;
7.     ELSE
8.       insert $(u, w)$ back to the head of $P'_r$;
9.       break;
       END IF
     END WHILE
10.  $l_1(r) = \max\{\, l''(x, g(x)) - D + \delta(x, g(x)) \mid x \in \mathrm{cluster}(r) \cap PI \,\}$;
11.  $l_2(r) = l''(u, w)$, where $(u, w)$ is the first edge in $P'_r$;
12.  $l(r) = \max\{l_1(r), l_2(r)\}$;
13. end

Similar to [1], our algorithm consists of two parts: the labeling phase (lines 3–12) and the clustering phase (lines 13–19). The clustering phase works in the same way as in [1]. In the labeling phase, we get a cluster $\mathrm{cluster}(r)$ for each vertex $r$ in a topological order. For each non-PI vertex $r$, we first compute the value $l''(u, w)$ for each edge $(u, w) \in E$ which connects vertices in $G_r$. Based on the $l''$ values, we apply our vertex grouping algorithm to divide the vertices in $G_r$ into groups. After that, each group of vertices is considered for adding to $\mathrm{cluster}(r)$ based on the $l''$ value of its leader edge. Thus, the leader edges are sorted in advance according to their $l''$ values. For example, when applying our algorithm to the circuit in Fig. 3(a), the resultant clustering contains three clusters: $\mathrm{cluster}(g) = \{g, f, e\}$, $\mathrm{cluster}(c) = \{a, c\}$, $\mathrm{cluster}(d) = \{b, d\}$, and $l(g) = 21$ along the path $a \to c \to e \to g$ or $b \to d \to f \to g$. In fact, the clustering is the same as the optimal one shown in Fig. 2(c). The main difference between our algorithm and the algorithm in [1] is that we employ the grouping strategy and add vertices to a cluster on a group basis, which guarantees a minimum circuit delay under our delay model. Besides, in order to apply the grouping strategy, we calculate $l''$ for every edge, whereas the algorithm in [1] calculates $l'$ for every vertex for selecting vertices, which has been shown not to be applicable to our delay model. For our algorithm, we have the following lemma describing some important properties of the sorted list $P'_r$ that is generated in line 11 of Circuit_clustering().
Lemma 1: For any non-PI vertex $r$, the following properties P1, P2, P3, and P4 hold. Let $(w, u)$ and $(x, y)$ be any two edges in $P'_r$ (note that $w, u, x, y$ are in $G_r$).
P1: If $(w, u)$ is positioned before $(x, y)$ in $P'_r$, then
$$l(w) + \Delta(u, r) + D \ge l(x) + \Delta(y, r) + D.$$
P2: If $x$ is a predecessor of both $w$ and $u$ ($y$ and $w$ may be the same vertex) such that $y \in \mathrm{group}(w, u)$ and $x \notin \mathrm{group}(w, u)$, then
$$l(w) + \Delta(u, r) + D \ge l(x) + \Delta(y, r) + D.$$
P3: If a vertex $z$ is a predecessor of both $w$ and $u$ such that $z \in \mathrm{group}(w, u)$, there must exist a path from $z$ to $w$ such that all edges $(p, q)$ along the path satisfy $p \in \mathrm{group}(w, u)$ and $q \in \mathrm{group}(w, u)$, and they must also satisfy the following inequality:
$$l(p) + \Delta(q, r) + D > l(w) + \Delta(u, r) + D.$$
P4: If $y$ is a predecessor of both $w$ and $u$ such that $y \in \mathrm{group}(w, u)$ and $x \notin \mathrm{group}(w, u)$, there must exist a path from $y$ to $w$ such that all edges $(p, q)$ along the path satisfy $p \in \mathrm{group}(w, u)$ and $q \in \mathrm{group}(w, u)$, and they must also satisfy the following inequality:
$$l(p) + \Delta(q, r) + D > l(w) + \Delta(u, r) + D \ge l(x) + \Delta(y, r) + D.$$
Proof: P1 is trivial because the list $P'_r$ is sorted. P2 holds because Group_vertex() stops including the predecessor $x$ only when $l''(w, u) \ge l''(x, y)$. The same reasoning applies to P3: before Group_vertex() stops including predecessors into $\mathrm{group}(w, u)$, each examined edge $(p, q)$ must have $l''(p, q) > l''(w, u)$. P4 is obtained by joining P2 and P3.
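The labeling step, which adds whole groups in nonincreasing order of their leader edges' $l''$ values until the area bound would be violated, can be sketched as follows. The function name `labeling`, its parameters, and all numeric inputs are assumptions for illustration: the groups and $l''$ values are illustrative numbers chosen to agree with the text's example, and `pi_delay` (a hypothetical map from each PI vertex $x$ to $\Delta(x, r)$, which is what $l_1$ evaluates to for PIs inside the cluster) holds made-up values that this instance never uses.

```python
def labeling(root, sorted_leaders, groups, area, max_area, l2, pi_delay):
    """Grow cluster(root) group by group and return (cluster, label).

    sorted_leaders: leader edges in nonincreasing order of l''.
    pi_delay[x]: maximum delay from PI x to root with x inside the cluster.
    """
    cluster = {root}
    remaining = list(sorted_leaders)
    while remaining:
        candidate = cluster | groups[remaining[0]]
        if sum(area[v] for v in candidate) > max_area:
            break                          # first group that does not fit stays out
        cluster = candidate
        remaining.pop(0)
    # l_1: worst PI-to-root delay among PIs inside the cluster.
    l_1 = max((pi_delay[x] for x in cluster if x in pi_delay),
              default=float("-inf"))
    # l_2: l'' of the first leader edge left outside the cluster.
    l_2 = l2[remaining[0]] if remaining else float("-inf")
    return cluster, max(l_1, l_2)


# Illustrative inputs (unit areas, M = 3, root g).
l2 = {("f", "g"): 24, ("e", "g"): 22, ("c", "e"): 21,
      ("d", "f"): 21, ("b", "d"): 20}
groups = {("f", "g"): {"f"}, ("e", "g"): {"e"}, ("c", "e"): {"a", "c"},
          ("d", "f"): {"d"}, ("b", "d"): {"b"}}
leaders = sorted(l2, key=l2.get, reverse=True)
area = {v: 1 for v in "abcdefg"}
pi_delay = {"a": 20, "b": 19}              # made-up values, unused here
cluster_g, label_g = labeling("g", leaders, groups, area, 3, l2, pi_delay)
```

On these inputs the sketch returns $\mathrm{cluster}(g) = \{g, f, e\}$ with label 21, in line with the values the text reports for the circuit in Fig. 3(a).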
VI. OPTIMALITY AND COMPLEXITY OF THE ALGORITHM
The authors in [1] state that in any clustering $S$ on a graph $G$, the path delay at any vertex $v$ should be greater than or equal to the path delay at $v$ in an optimal clustering $S_{op}$ on $G_v$ [1, Lemma 1]. It can be easily proved that this lemma is also applicable to our new delay model. To prove the optimality of our algorithm, we first prove that: 1) the label of each vertex, $l(v)$ (which is calculated in the labeling phase), is the lower bound of the path delay at $v$ in any optimal clustering and 2) the clustering phase (lines 13–19) is able to construct a clustering such that the path delay at $v$ equals $l(v)$. From 1) and 2), together with [1, Lemma 1], it can then be proved that the clustering $S$ generated by our algorithm is an optimal clustering. Before proving 1) and 2), we first explore an important lemma for vertices within a cluster.
Lemma 2: For any cluster $\mathrm{cluster}(v) \supset \{v\}$ generated in Labeling(), for each edge $(x, y)$ such that $x, y \in G_v$, $x \in G_v - \mathrm{cluster}(v)$, and $y \in \mathrm{cluster}(v)$, if we arbitrarily divide $\mathrm{cluster}(v)$ into two sets $A$ and $B$ such that $A \ne \emptyset$, $B \ne \emptyset$, $A \cup B = \mathrm{cluster}(v)$, and $A \cap B = \emptyset$, there must exist an edge $(z, w)$ connecting vertices in $A$ and $B$ (without loss of generality, assume $z \in A$ and $w \in B$) such that $l''(z, w) \ge l''(x, y)$.
Proof: In the labeling phase, groups of vertices are assigned into $\mathrm{cluster}(v)$ in the nonincreasing order of the leader edges’ $l''$ values. According to P4 in Lemma 1, if a vertex $z$ is assigned to $\mathrm{cluster}(v)$ (that means $\mathrm{cluster}(v) \supset \{v\}$), there must exist an immediate successor $w$ of $z$ and an edge $(j, k)$ such that $z \in \mathrm{group}(j, k)$ and $l''(z, w) \ge l''(j, k)$, while the edge $(j, k)$ must have the value $l''(j, k)$ greater than or equal to the $l''$ values of all leader edges of those groups that are not yet assigned to $\mathrm{cluster}(v)$. Therefore, $l''(z, w) \ge l''(j, k) \ge l''(x, y)$.
From Lemma 2, we know that $l''(z, w) \ge l''(x, y)$. Besides, we have
$$l_2(v) = \max\{\, l''(x, y) \mid x, y \in G_v,\ x \in G_v - \mathrm{cluster}(v),\ y \in \mathrm{cluster}(v) \,\}$$
(which is generated in line 11 of Labeling()). The following corollary can be easily derived.
Corollary 1: For any cluster $\mathrm{cluster}(v) \supset \{v\}$ generated in Labeling(), if we arbitrarily divide $\mathrm{cluster}(v)$ into two sets $A$ and $B$ such that $A \ne \emptyset$, $B \ne \emptyset$, $A \cup B = \mathrm{cluster}(v)$, and $A \cap B = \emptyset$, there must exist an edge $(z, w)$ connecting vertices in $A$ and $B$ (without loss of generality, assume $z \in A$ and $w \in B$) such that $l''(z, w) \ge l_2(v)$.
Lemma 3: For any vertex $v$, the path delay at $v$ in any optimal clustering of the subgraph $G_v$, denoted by $\mathrm{delay}(v)$, is greater than or equal to $l(v)$.
Proof: It is proved by induction.
1) Induction basis. For any PI vertex $v$, $l(v) = \delta(v)$. It is obvious that $l(v) = \mathrm{delay}(v)$.
2) Induction step. Assume that the statement $l(x) \le \mathrm{delay}(x)$ is true for all vertices $x \in G_v - \{v\}$; we are going to prove that the statement is also true for vertex $v$, i.e., $l(v) \le \mathrm{delay}(v)$. The value of $l(v)$ is the maximum value of $l_1(v)$ and $l_2(v)$ (in line 12 of Labeling()). We consider the two cases separately.
Fig. 4. Illustration of Proof of Lemma 3.
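• Case 1:
$$l(v) = l_1(v) = \max\{\, l''(x, g(x)) - D + \delta(x, g(x)) \mid x \in \mathrm{cluster}(v) \cap PI \,\} = \max\{\, l(x) + \Delta(g(x), v) + \delta(x, g(x)) \mid x \in \mathrm{cluster}(v) \cap PI \,\}.$$
Since $x$ is a PI,
$$l(v) = \max\{\, \delta(x) + \Delta(g(x), v) + \delta(x, g(x)) \mid x \in \mathrm{cluster}(v) \cap PI \,\} = \max\{\, \Delta(x, v) \mid x \in \mathrm{cluster}(v) \cap PI \,\}$$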
which is the maximum delay among the paths from all PI vertices in $\mathrm{cluster}(v)$ to $v$, and by assumption, $l_1(v)$ is greater than or equal to the delay value of all other paths involved in calculating $l_2(v)$. Hence, the path delay at $v$ in any optimal clustering of the subgraph $G_v$ cannot be smaller than $l_1(v)$.
• Case 2:
$$l(v) = l_2(v) = \max\{\, l(x) + \Delta(w, v) + D \mid (x, w) \in E,\ w \in \mathrm{cluster}(v) \text{ and } x \in G_v - \mathrm{cluster}(v) \,\}.$$
This case implies $\mathrm{cluster}(v) \subset G_v$. Here, we prove by contradiction. Assume that $\mathrm{delay}(v)$ is smaller than $l(v)$, i.e., $l(v) > \mathrm{delay}(v)$, and denote the corresponding cluster rooted at $v$ in an optimal clustering by $C_v$. Without loss of generality, we assume $C_v \subseteq G_v$. There are three cases.
a) If $C_v = \mathrm{cluster}(v) \subset G_v$, there must exist two edges $(a, b)$, $(a', b')$ (they may be the same) such that $a, a' \in G_v - C_v$, $b, b' \in C_v$, $l(v) = l(a) + \Delta(b, v) + D$ and $\mathrm{delay}(v) = \mathrm{delay}(a') + \Delta(b', v) + D$. This is depicted in Fig. 4(a). First, $\mathrm{delay}(a') + \Delta(b', v) + D = \mathrm{delay}(v) \ge \mathrm{delay}(a) + \Delta(b, v) + D$. Second, based on the induction hypothesis that $l(x) \le \mathrm{delay}(x)$ for all vertices $x \in G_v - \{v\}$, we have $\mathrm{delay}(a) \ge l(a)$. Thus, $l(v) \le \mathrm{delay}(v)$, which contradicts our assumption.
b) If $C_v \supset \mathrm{cluster}(v)$ (which implies $C_v - \mathrm{cluster}(v) \ne \emptyset$), there must exist an edge $(a, b)$ with $a \in G_v - \mathrm{cluster}(v)$ and $b \in \mathrm{cluster}(v)$ such that $l''(a, b) = l(v)$. Let $E_C$ denote the set of all such edges. For each edge $(a, b) \in E_C$, both $a$ and $b$ are in $C_v$ because $\mathrm{delay}(v) < l(v)$. So, $b \in \mathrm{cluster}(v)$ and $a \in C_v - \mathrm{cluster}(v)$. We consider the first edge $(u, w)$ remaining in $P$ in line 11 of Labeling() with $l(v) = l''(u, w)$. Since $l(v) > \mathrm{delay}(v)$, the vertex $u$ must be in $C_v - \mathrm{cluster}(v)$ and $w \in \mathrm{cluster}(v)$, i.e., $(u, w) \in E_C$. In this case, Labeling() should add $\mathrm{group}(u, w)$ into $\mathrm{cluster}(v)$ if the area of $\mathrm{cluster}(v) \cup \mathrm{group}(u, w)$ does not exceed the area constraint $M$. However, $u \in G_v - \mathrm{cluster}(v)$ implies that the area of $\mathrm{cluster}(v) \cup \mathrm{group}(u, w)$ exceeds $M$. When $u \in C_v$ and the area of $\mathrm{cluster}(v) \cup \mathrm{group}(u, w)$ is greater than $M$, there must exist an edge $(f, g)$ such that $f \in \mathrm{group}(u, w)$, $g \in \mathrm{group}(u, w)$, $f \in G_v - C_v$ and $g \in C_v$. The situation is depicted in Fig. 4(b). Based on the induction hypothesis, $\mathrm{delay}(f) + \Delta(g, v) + D \ge l(f) + \Delta(g, v) + D$. Then, by P3 of Lemma 1,
$$\mathrm{delay}(v) \ge \mathrm{delay}(f) + \Delta(g, v) + D \ge l(f) + \Delta(g, v) + D > l(u) + \Delta(w, v) + D = l(v) > \mathrm{delay}(v)$$
which is impossible.
c) If $\mathrm{cluster}(v) - C_v \ne \emptyset$, we can divide $\mathrm{cluster}(v)$ into two disjoint subsets $\mathrm{cluster}(v) - C_v$ and $\mathrm{cluster}(v) \cap C_v$. By Corollary 1, there exists an edge $(s, t)$ such that $s \in \mathrm{cluster}(v) - C_v$ and $t \in \mathrm{cluster}(v) \cap C_v$ while $l(s) + \Delta(s, t) + D = l''(s, t) \ge l_2(v) = l(v)$. This is depicted in Fig. 4(c). Since we know $\mathrm{delay}(v) \ge \mathrm{delay}(s) + \Delta(s, t) + D$, and the induction hypothesis gives $\mathrm{delay}(s) + \Delta(s, t) + D \ge l(s) + \Delta(s, t) + D$, we have
$$\mathrm{delay}(v) \ge \mathrm{delay}(s) + \Delta(s, t) + D \ge l(s) + \Delta(s, t) + D \ge l(v)$$
which contradicts the assumption $\mathrm{delay}(v) < l(v)$.
As a result, the statement is also true for vertex $v$.
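As a concrete illustration of the Case 2 expression, the following minimal sketch evaluates $l_2(v)$ on a tiny hand-made graph. It is not the paper's Labeling() procedure: the helper name `l2`, the example graph, the label values, and the table `max_delay` standing in for $\Delta$ are all assumptions made purely for illustration.

```python
def l2(v, cluster, subgraph, edges, label, max_delay, D):
    """Evaluate max{ l(x) + Delta(w, v) + D } over edges (x, w) that enter
    `cluster` from `subgraph - cluster`. Returns None if no such edge exists."""
    candidates = [
        label[x] + max_delay[(w, v)] + D
        for (x, w) in edges
        if w in cluster and x in subgraph and x not in cluster
    ]
    return max(candidates) if candidates else None


# Hypothetical example: G_v = {a, b, c, v} and cluster(v) = {c, v}.
edges = [("a", "c"), ("b", "v"), ("c", "v")]
label = {"a": 2, "b": 5}                    # assumed labels l(x) outside the cluster
max_delay = {("c", "v"): 3, ("v", "v"): 1}  # assumed Delta(w, v) values
D = 4                                       # assumed intercluster delay

result = l2("v", {"c", "v"}, {"a", "b", "c", "v"}, edges, label, max_delay, D)
print(result)
```

Only the edges $(a, c)$ and $(b, v)$ cross into the cluster here; the edge $(c, v)$ lies entirely inside it and is skipped, which mirrors the restriction $x \in G_v - \mathrm{cluster}(v)$ in the formula.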
Lemma 4: In our algorithm, for any vertex $v$ in the clustering $S$ generated by the clustering phase (lines 13–19), the path delay at $v$ is less than or equal to $l(v)$.
Proof: Our delay model is different from that in [1], but the clustering phase in our algorithm is the same as that of [1], so the proof is the same. Details can be found in [1].
Based on Lemma 3 and Lemma 4, we can easily derive the following theorem.
Theorem 1: The clustering $S$ generated in our algorithm is an optimal clustering for any instance of the problem described in Section II.
Proof: In Lemma 3, it is shown that for each vertex $v$, the label $l(v)$ in our algorithm is less than or equal to the path delay at vertex $v$ in any optimal clustering. Lemma 4 states that our algorithm is able to generate a clustering with the path delay at $v$ less than or equal to $l(v)$, which is the lower bound of the path delay at vertex $v$ in any optimal clustering. Together with Lemma 1 in [1], the clustering $S$ generated by our algorithm is an optimal clustering.
We now analyze the complexity of our algorithm. In Grouping(), Group vertex() would run at most $|V|$ times, so the time complexity of the WHILE loop is $O(|V||E|)$. In Circuit clustering(), finding the maximum delay matrix $\Delta$ takes $O(|V|(|V| + |E|))$ time, finding a topological order in line 4 takes $O(|V| + |E|)$ time, the sorting in line 11 takes $O(|E| \lg |E|)$ time, and Labeling() takes only $O(|E|)$ time. So, the first WHILE loop of Circuit clustering() takes $O(|V|(|E| \lg |E| + |V||E|))$ time. The clustering phase (lines 13–19) takes $O(|V| + |E|)$ time. So the overall time complexity is $O(|V|(|E| \lg |E| + |V||E|)) = O(|V|^2 |E|)$.
Remarks: In fact, our algorithm can also handle the case where the intercluster delay $D$ is a variable value (say $D(x, y)$, $\forall (x, y) \in E$). This is because the calculation of $l''(x, y) = l(x) + D + \Delta(y, r)$ includes the value of $D$, so that if $D$ becomes a variable $D(x, y)$, the calculation becomes $l''(x, y) = l(x) + D(x, y) + \Delta(y, r)$, and still correctly represents the situation when $(x, y)$ becomes an intercluster edge. Besides, the optimality of the algorithm still holds because all the theoretical results remain true and can be proved similarly.
VII. CONCLUSION
In this paper, we have introduced a new delay model which is more general and practical than the general delay model [3]. Under our new delay model, a circuit clustering algorithm based on a novel vertex grouping technique is proposed and is proved to optimally solve the area-constrained combinational circuit clustering problem for delay minimization in polynomial time.
REFERENCES
[1] R. Rajaraman and D. F. Wong, “Optimum clustering for delay minimization,” IEEE Trans. Computer-Aided Design, vol. 14, pp. 1490–1495, Dec. 1995.
[2] H. Yang and D. F. Wong, “Circuit clustering for delay minimization under area and pin constraints,” IEEE Trans. Computer-Aided Design, vol. 16, pp. 976–986, Sept. 1997.
[3] R. Murgai, R. K. Brayton, and A. Sangiovanni-Vincentelli, “On clustering for minimum delay/area,” in Proc. IEEE Int. Conf. Computer-Aided Design, 1991, pp. 6–9.
[4] E. Lawler, K. Levitt, and J. Turner, “Module clustering to minimize delay in digital networks,” IEEE Trans. Comput., vol. C-18, pp. 47–57, Jan. 1969.
Slicing Floorplan With Clustering Constraint
W. S. Yuen and Evangeline F. Y. Young
Abstract—In floorplan design, it is useful to allow users to specify some placement constraints in the final packing. Clustering constraint is a popular type of placement constraint in which a given set of modules are restricted to be placed adjacent to one another. The wiring cost can be reduced by placing modules with a lot of interconnections closely together. Designers may also need this type of constraint to restrict the positions of some modules according to their functionalities. In this paper, a method addressing clustering constraint in slicing floorplan will be presented. We devised a linear time algorithm to locate neighboring modules in a normalized Polish expression and to rearrange them to satisfy the given constraints. Experiments were performed on some benchmarks and the results are very promising.
Index Terms—Clustering constraint, design floorplanning, floorplanning, physical design, very large scale integrated computer-aided design (VLSI CAD).
I. INTRODUCTION
Floorplan design is the problem of planning the positions and shapes of a set of modules on a chip in order to optimize circuit performance at a very early design stage. During this floorplanning phase, performance measures such as layout area, interconnect cost, heat dissipation, and power consumption should be taken into consideration.
Manuscript received December 3, 2001; revised May 17, 2002. This paper was recommended by Associate Editor T. Yoshimura. The authors are with the Department of Computer Science and Engineering, Chinese University of Hong Kong, Shatin, New Territories, Hong Kong (e-mail:
[email protected]). Digital Object Identifier 10.1109/TCAD.2003.810738
0278-0070/03$17.00 © 2003 IEEE