Timing and Crosstalk Driven Area Routing

Hsiao-Ping Tseng
University of Washington, Dept. of Electrical Engineering, Seattle, WA 98195, 206-685-8678

Louis Scheffer
Cadence Design Systems, Inc., 555 River Oaks Parkway, Bldg. 2, MS2B2, San Jose, CA 95134, 408-944-7114

Carl Sechen
University of Washington, Dept. of Electrical Engineering, Seattle, WA 98195, 206-685-8756
ABSTRACT
We present a timing and crosstalk driven router for the chip assembly task that is applied between global and detailed routing. Our approach handles crosstalk and timing constraints quantitatively by ordering nets and tuning wire spacing. A graph-based optimizer preroutes wires on the global routing grid incrementally in two stages: net order assignment and space relaxation. The delay of each critical path is calculated taking interconnect coupling capacitance into account. The objective is to reduce the delays of critical nets with negative timing slack by tuning net ordering and adding extra wire spacing. For MCNC benchmarks with a wire geometric ratio of 2.0, the router achieves an 8.4-25% delay reduction, against the 33% delay reduction that would be obtained if interconnect coupling disappeared entirely.
1 INTRODUCTION
One of the emerging deep submicron process issues is the increasing effect of coupling capacitance between interconnect on the same layer. Our capacitance extractor shows that, for typical process parameters, the intra-layer coupling capacitance contributes 25% of metal2 interconnect capacitance in a 0.25um process if the wire geometric ratio (height/width) is 1.0, and 33% if the wire geometric ratio is 2.0. Both wire resistance and intra-layer coupling capacitance increase as feature sizes shrink. Most existing routing frameworks consider only wire length as the constraint for path delay and ignore the switching correlation between neighboring wires.
Previous work on the crosstalk-aware routing problem is mostly in the gridded domain. Crosstalk reduction efforts have been made at the detailed routing level for the river routing problem [1], the switchbox problem [3] and the channel routing problem [2]. However, these algorithms all work in the net ordering domain with fixed spacing: segments are rearranged during a postprocessing stage so that the accumulated coupling capacitance of critical nets is reduced. The segment rearrangement heuristics of these approaches apply only to nets in a local scope, such as a switchbox or a channel. On the other hand, Xue et al. [4] develop a post global routing crosstalk risk analyzer which estimates the possible coupling between sensitive nets and tries to reroute nets away from crosstalk risky zones. Chaudhary et al. develop a general postrouting spacing algorithm [5] which measures the crosstalk effect
by superposition of unweighted voltage glitches from driver to sink. We shall describe the difference in detail in Section 5. The interconnect coupling capacitance causes signal integrity problems in two respects - timing faults and logic faults. There are many heuristics and techniques, such as wire permutation, wire shielding, buffer insertion and buffer sizing, which can decrease coupling capacitance or improve timing and potentially reduce logic fault hazards and timing fault hazards. In this paper, we mainly concentrate on improving the worst case timing. The remainder of the paper describes our new timing and crosstalk driven area prerouter as follows. An overview of the problem and the new approach is given in Section 2. Section 3 briefly describes the RC extraction model and interconnect delay calculation. The net ordering algorithm and space relaxation algorithm are presented in Sections 4 and 5, respectively. We shall show the experimental results in Section 6.
2 OVERVIEW
2.1 Chip Assembly Area Routing Problem
The chip assembly routing problem is too complex to solve in one piece and is usually solved by a divide-and-conquer approach: global routing followed by detailed routing. The layout floorplan is first broken into a 3D global routing plane of global cells (Gcells), which typically range in size from 10x10 to 20x20 tracks. A global router searches for the rough path of each signal on the global cell plane and determines which global cells each signal should traverse. Based on the global routing information, a detailed router is able to efficiently complete the routing of one Gcell at a time. On the interface of two neighboring Gcells, there are net crosspoints. Gcells on the same row (column) can be glued together and seen as a horizontal (vertical) routing channel. However, the channel height of Gcell channels is fixed.
From the practice of industrial chip assembly circuits, we observe that most signals traverse multiple Gcells, usually by straight segments. The ordering of nets and the spacing between them can be determined by the crosspoints on Gcell boundaries if all wires are straightened. Chip assembly is clearly a case in which crosspoint assignment (CPA) can determine the layout almost as well as the final routing. Kao et al. [6] develop a crosspoint assignment algorithm to improve routability, but it does not consider timing and crosstalk. We develop a two-stage crosspoint assignment algorithm that considers timing and crosstalk constraints.
2.2 Timing Driven Crosspoint Assignment
The crosspoint assignment problem for a single Gcell is solved by a graph-based framework. The crosspoint of a signal determines the position of the two segments connected to the crosspoint. Two consecutive crosspoints of the same signal with different positions on their boundaries imply that a jog is needed to connect them. A straightened segment of crosspoints at the same boundary location is represented as a node in the graph. The physical design rules of a segment are realized by vertical constraints in the graph [10]. A vertical constraint (VC) is introduced from layout component ci to cj when (1) ci and cj overlap vertically on the same layer and (2) ci is above cj. A directed edge d(ci, cj) is added from node u to node v if ci ∈ u and cj ∈ v. The weight of directed edge d(ci, cj) is equal to the minimum center-to-center distance of ci and cj. For example, if crosspoint c1 of node n1 is assigned above crosspoint c2 of node n2, a directed edge d(c1, c2) is added into the constraint graph between n1 and n2.
In practice, redundant jogs waste routing resources and thus increase the routing difficulty. To maximize the detailed routability of each global cell, the number of jogs should be minimized. The main effort in our work to enhance routability is to straighten segments as much as possible. The crosspoint assignment problem that we aim to solve includes several optimization objectives: routability, crosstalk effect and path delay. Because our delay calculation accounts for signal switching activity through the interconnect coupling capacitance, the crosstalk effect is folded into path delay, and the main optimization objectives fall into two groups: routability and path delay. We call the problem of assigning crosspoints on Gcell boundaries to optimize routability and path delay the Timing Driven Crosspoint Assignment (TDCA) problem.

2.3 Program Flow
Our approach is to adjust net ordering and relax wire spacing in a global scope and on a path delay basis. As shown in Figure 1, it has two stages: a net ordering algorithm that determines the crosspoint order on each boundary of the global routing cell, and a space relaxation algorithm that augments the wire spacing. The Elmore delay is calculated for each interesting signal in the worst case. The processing priority of a net at each stage is equal to its current worst delay minus the user-specified maximum delay (timing constraint). In the first stage, crosspoints are assigned on each boundary and their corresponding geometric constraints are inserted into a constraint graph, while the coupling capacitance of the critical nets is minimized. After all crosspoints are assigned to boundaries, a valid routing solution is achieved. Based on this initial solution, the second-stage algorithm seeks to add space around critical nets and therefore further reduce delay.

[Figure 1: Flow of the algorithms. Net ordering algorithm: after global routing, find the most timing critical channel and perform timing driven crosspoint assignment (decide the crosspoint order for each boundary; update the delay database after each boundary CPA), repeating until finished. Space relaxation algorithm: find the most timing critical segment, add blank tracks around the segment by pushing neighboring wires away from it, and update the delay database, repeating until finished. Detailed routing follows.]

3 RC EXTRACTION AND WIRE DELAY
3.1 RC Extraction Model
From in-house capacitance extraction simulation, we observe that if a layer above or below is only 50% covered with metal, the vertical capacitance (also commonly called "substrate capacitance") to the 50% covered metal layer is close to that to a fully covered metal layer, due to the fringing effect. A lookup table is constructed with respect to the wire width and the distance to the neighboring wire. The wire geometry aspect ratio (height/width) was tested at 1.0, 1.5 and 2.0 for the lookup table. The wire capacitance for the various wire geometry aspect ratios (AR1.0, AR1.5, AR2.0) versus the spacing to neighboring wires is drawn in Figure 2. As the figure indicates, the coupling capacitance is already very close to its minimum at a spacing of 3c (one blank track), so additional spacing yields little further reduction. Therefore, our heuristics add at most one extra track of spacing around sensitive critical wires.
[Figure 2: Coupling capacitance vs. wire spacing (c: minimum wire width). Panels: (a) minimum spacing; (b) one-track spacing; (c) capacitance vs. spacing for AR2.0, AR1.5 and AR1.0, with the spacing ranging from c to 5c. The cross-sections show a metal2 wire between metal1 and metal3.]

A switching factor (SF), which depends on the timing (transition) windows of two signals, is used in the delay calculation. If the transition windows of two signals overlap and the signals switch in the same direction, then in the best case SF = 0 and the effective coupling capacitance Ceff = 0 (in the worst case, SF = 1 and Ceff = C). If one signal switches and the other is silent, then in both the best and the worst case SF = 1 and Ceff = C. Conversely, if the transition windows of two signals overlap and the signals switch in opposite directions, SF = 2 and Ceff = 2C. Previous work [7] on predicting circuit switching activities has shown that it is very difficult and inaccurate to guess switching activities from circuit functionality. In this paper, we assume the switching factors of signals are provided by synthesis tools or by the user. The effective capacitance of wire w is the sum of the contributions from the upper layer, the lower layer, and the intra-layer neighboring wires (Cj,w * SFj,w for each neighbor j), where SFj,w is the switching factor between the two coupled signals on wires j and w.
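The effective capacitance described above can be illustrated with a minimal Python sketch. The names `switching_factor`, `Neighbor` and `effective_capacitance`, the relation labels "same", "quiet" and "opposite", and the numeric values are assumptions made for illustration only; they are not taken from the router's implementation.

```python
from dataclasses import dataclass
from typing import Iterable


def switching_factor(relation: str, worst_case: bool = True) -> float:
    """Switching factor SF for a pair of coupled signals (labels are hypothetical).

    "same":     transition windows overlap, same direction (SF = 0 best, 1 worst)
    "quiet":    one signal switches while the other is silent (SF = 1)
    "opposite": transition windows overlap, opposite directions (SF = 2)
    """
    if relation == "opposite":
        return 2.0
    if relation == "quiet":
        return 1.0
    if relation == "same":
        return 1.0 if worst_case else 0.0
    raise ValueError(f"unknown relation: {relation}")


@dataclass
class Neighbor:
    coupling_cap: float  # C_{j,w}: intra-layer coupling capacitance to neighbor j
    relation: str        # switching relation between wire j and wire w


def effective_capacitance(upper_cap: float, lower_cap: float,
                          neighbors: Iterable[Neighbor],
                          worst_case: bool = True) -> float:
    """Upper-layer + lower-layer + sum over neighbors of C_{j,w} * SF_{j,w}."""
    lateral = sum(n.coupling_cap * switching_factor(n.relation, worst_case)
                  for n in neighbors)
    return upper_cap + lower_cap + lateral


# Illustrative values in arbitrary units (not measured data).
neighbors = [Neighbor(0.1, "opposite"), Neighbor(0.1, "same")]
worst = effective_capacitance(0.2, 0.3, neighbors, worst_case=True)
best = effective_capacitance(0.2, 0.3, neighbors, worst_case=False)
```

In this sketch the worst case counts the same-direction neighbor with SF = 1 and the best case with SF = 0, so the two results differ by exactly that neighbor's coupling capacitance.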
3.2 Interconnect Delay
The Elmore delay calculation is used to estimate path delays. A Π model is used for each wire segment for delay calculation.
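As a rough illustration of this delay model, the sketch below computes the Elmore delay of a single driver-to-sink chain of wire segments, with each segment replaced by a Π model (half of its capacitance lumped at each end). The function name `elmore_delay_chain`, the restriction to an unbranched chain, and the numeric values are assumptions for illustration; the paper does not spell out its exact formulation.

```python
from typing import List, Tuple


def elmore_delay_chain(r_driver: float,
                       segments: List[Tuple[float, float]],
                       c_load: float) -> float:
    """Elmore delay from driver to sink along an unbranched chain.

    segments: list of (resistance, capacitance) per wire segment, driver to sink.
    Each segment is a Pi model: C/2 at its near end and C/2 at its far end.
    """
    n = len(segments)
    # node_caps[k] is the capacitance lumped at node k (node 0 = driver output).
    node_caps = [0.0] * (n + 1)
    for k, (_, c) in enumerate(segments):
        node_caps[k] += c / 2.0
        node_caps[k + 1] += c / 2.0
    node_caps[n] += c_load

    # The driver resistance charges every capacitor in the chain.
    downstream = sum(node_caps)
    delay = r_driver * downstream
    # Each segment resistance charges only the capacitance beyond its near end.
    for k, (r, _) in enumerate(segments):
        downstream -= node_caps[k]
        delay += r * downstream
    return delay


# Illustrative values in arbitrary units (not extracted data).
one_segment = elmore_delay_chain(1.0, [(2.0, 4.0)], 1.0)
two_segments = elmore_delay_chain(1.0, [(2.0, 4.0), (1.0, 2.0)], 1.0)
```

In a setting like the one described here, the segment capacitances passed in would be the effective capacitances that include the switching-factor weighted coupling terms, so that changes in neighbor ordering or spacing show up directly in the path delay.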
4 Net Ordering Algorithm
Path delays of interesting nets are calculated on the fly during the optimization process. Initially we assume each node of the interesting nets has the worst possible neighboring wires, and RC extraction is then applied. A net is called a critical net if it has a negative timing slack, that is, if its delay exceeds the user-specified timing requirement. Nodes (non-fragmented segments) on a critical net are called critical nodes, and crosspoints of a critical node are called critical crosspoints. The net ordering algorithm always processes the most critical channel, which is the channel containing the longest segment of the most critical node. The assignment process starts from the most congested boundary in that channel and proceeds outward. A crosspoint is processed once it has been inserted into the graph. A node is processed once any one of its crosspoints has been inserted into a graph. The crosspoints on a selected boundary are inserted into the graph in the following order:
• the crosspoints whose nodes are not movable
• the crosspoints whose nodes have been processed, starting from the bottommost one
• the rest of the crosspoints
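The three-level insertion order listed above can be sketched as a sort key. The `Crosspoint` record and its fields (`movable`, `node_processed`, `y`) are hypothetical names introduced for this illustration, and `y` is assumed to grow from the bottom of the boundary upward.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Crosspoint:
    name: str
    movable: bool         # False if the crosspoint's node cannot be moved
    node_processed: bool  # True if another crosspoint of the node is already in a graph
    y: float              # assumed position along the boundary (smaller = lower)


def insertion_order(crosspoints: List[Crosspoint]) -> List[Crosspoint]:
    """Order crosspoints on one boundary for insertion into the constraint graph."""
    def key(cp: Crosspoint) -> Tuple[int, float]:
        if not cp.movable:
            return (0, 0.0)   # 1. crosspoints whose nodes are not movable
        if cp.node_processed:
            return (1, cp.y)  # 2. processed nodes, starting from the bottommost
        return (2, 0.0)       # 3. the rest of the crosspoints
    # sorted() is stable, so ties keep their input order.
    return sorted(crosspoints, key=key)


boundary = [
    Crosspoint("a", movable=True, node_processed=False, y=5.0),
    Crosspoint("b", movable=True, node_processed=True, y=3.0),
    Crosspoint("c", movable=False, node_processed=True, y=9.0),
    Crosspoint("d", movable=True, node_processed=True, y=1.0),
]
order = [cp.name for cp in insertion_order(boundary)]
```

Placing fixed and already-processed crosspoints first means the remaining, unconstrained crosspoints are fitted around positions that can no longer change.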
[Figure 3: Crosspoint assignment. Panel (a): proceeding to boundary b after boundary b-1. The drawing shows nodes n1, n2, n3 (broken into n3a and n3b) and n4, obstacle X1, the constraints on boundaries b-1 and b, and the assignment proceeding direction.]

For each new crosspoint insertion, all the geometric intervals between the existing assigned crosspoints and obstacles are examined to satisfy design rules. The interval with the minimum cost is selected and the crosspoint is inserted into the graph. The cost function to assign crosspoint cj in the interval (ci, ci+1) is equal to the coupling capacitance increase among interesting nets. If none of these assignments is valid and there are intervals large enough to accommodate cj, we break its node nj between the previous boundary b-1 and the current boundary b. Then the new node of cj is inserted into boundary b. If there is no interval large enough for cj, we rip up all crosspoints on boundary b and break all nodes between boundary b and b-1. The new nodes of the ripped crosspoints are assigned back to boundary b from the bottom, in ascending order of their old y positions, because we want to preserve the net ordering inherited from boundary b-1 for better routability.
In Figure 3, boundary b-1 is processed first and the assignment then proceeds to boundary b. The crosspoint for net n3 on boundary b is tested and cannot be extended; therefore we break it into two nodes, n3a and n3b. Node n3b is then inserted into boundary b at the minimum cost. Node n4 has a crosspoint at the bottom of the Gcell and one at boundary b. Since the crosspoints of node n4 are not processed, the crosspoint of n4 on boundary b is inserted after n3b is inserted.

5 Space Relaxation Algorithm
In a graph, there could be multiple critical nodes to which we need to allocate routing resources. However, because there may be multiple directed paths connecting two critical nodes in a channel graph, the space allocated to one node may reduce the space that can be allocated to other critical nodes on the same directed path. To alleviate this ordering problem in space allocation, we allocate only part of the available space around a node at each iteration. At each iteration, the critical node nc (the longest segment) of the most critical net Nc (timing_slackNc = min(timing_slack)) is selected for space relaxation. The available space around the selected node is allocated by two stages of relaxation optimization: self relaxation and neighbor relaxation. The former has higher priority than the latter. Node nc is said to be self-relaxable if it is movable in the graph. In the self relaxation stage, one of the upper and lower sides of nc is selected, based on the least cost, and all neighbors on that side are moved outward by a predetermined incremental spacing (e.g. 0.5 track). The cost function is equal to the increase in the total capacitance of interesting nets due to this operation. In the neighbor relaxation stage, the neighboring node is selected, based on the least cost, to add space to nc. The cost function is the same as that of self relaxation.
[Figure 4: Self relaxation of node 4 and its neighbor relaxation. The drawing shows nodes n1 to n7 and obstacle x2 before and after self relaxation and neighbor relaxation. Recovered labels: "self relaxation", "neighbor relaxation", "(b) graph representation".]

An example of self relaxation and neighbor relaxation on node 4 is shown in Figure 4. The light shaded region of each node in Figure 4(a) stands for the movable region, and the dark nodes in (b) are non-relaxable because extra spaces (dark shaded areas) are added. The relaxable neighbors n1, n3, n5, n6 are further relaxed outward by adding extra spacing until none of them is movable.
Comparison with Previous Work - Chaudhary et al. [5] proposed a graph-based spacing algorithm in 1993. Their framework spaces out wires after detailed routing has been finished, in contrast to the prerouting strategy of our framework, which is applied prior to detailed routing. Our framework differs from their work in two aspects: the measure of the crosstalk effect and the graph-based algorithm. Chaudhary's measure of the crosstalk effect is based on voltage glitches, instead of the actual delay used in our framework. In their work, the crosstalk effect from driver to sink is the unweighted superposition of all glitches of wires along the path. The cost function is therefore more a measure of logic fault hazards than of timing fault hazards as in our approach. Chaudhary's graph-based algorithm is analogous to our neighbor relaxation, which does not add space to a group of neighbors at a time.
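The side selection in the self relaxation stage of this section can be sketched as follows. This is a simplified illustration under stated assumptions: `self_relax`, `cost_of_shift` and the track positions are hypothetical, "upper" is taken to mean increasing position, and the feasibility check against the constraint graph (whether the neighbors are actually movable) is omitted. The cost callable stands in for the increase in total capacitance of interesting nets.

```python
from typing import Callable, Dict, List, Tuple


def self_relax(positions: Dict[str, float],
               upper: List[str],
               lower: List[str],
               cost_of_shift: Callable[[Dict[str, float]], float],
               step: float = 0.5) -> Tuple[str, Dict[str, float]]:
    """Move all neighbors on the cheaper side of the critical node outward.

    positions: current track position of each node
    upper/lower: neighbors above/below the critical node
    cost_of_shift: hypothetical cost of a proposed shift (node -> delta)
    step: incremental spacing in tracks (0.5 track in the text's example)
    """
    candidates = []
    for side, nodes, direction in (("upper", upper, 1.0), ("lower", lower, -1.0)):
        shift = {name: direction * step for name in nodes}
        candidates.append((cost_of_shift(shift), side, shift))
    # Pick the side with the least cost; ties go to the first candidate.
    _, side, shift = min(candidates, key=lambda item: item[0])
    new_positions = dict(positions)
    for name, delta in shift.items():
        new_positions[name] += delta
    return side, new_positions


# Illustrative data: n4 is the critical node; the toy cost grows with the
# number of wires that have to move (an assumption, not the paper's cost).
positions = {"n4": 3.0, "n1": 4.0, "n5": 2.0, "n6": 1.0}
side, relaxed = self_relax(positions, upper=["n1"], lower=["n5", "n6"],
                           cost_of_shift=lambda shift: 0.1 * len(shift))
```

Because only one increment is applied per call, repeating the step across iterations matches the idea of allocating space partially so that other critical nodes on the same directed path are not starved.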
6 Experimental Results
We have implemented the timing and crosstalk driven prerouter in GNU C++ on Unix systems. To test the MCNC macro-cell benchmark circuits, placements are generated by the TimberWolfMC placer [8], global routing by [9], and detailed routing by the Kokanee tile-based router [10].

Table 1: Characteristics of benchmark circuits (min. area in E6 mfs^2)
circuit   cells   nets   pins   crosspoints   No. of global cells   min. area, 2 metal layers   min. area, 4 metal layers
hp        11      83     309    1375          19 X 25               13                          12.3
ami33     37      124    513    2611          35 X 42               2.44                        1.89
apte      9       96     283    938           18 X 20               54.3                        49.6
xerox     10      203    696    2132          19 X 23               26.8                        24.6
ami49     49      408    953    13360         88 X 96               48.2                        46.4

The characteristics of the tested MCNC benchmark circuits are listed in Table 1. Area and length in the tables are in units of the minimum feature size (mfs) of the fabrication process. In all of our tested circuits, we use mfs = 0.25um. In Table 2, six groups of two-pin nets are selected as critical nets. The second row is the total wire length of the selected nets over the chip wire length. 'N+S' stands for turning on both the Net ordering and Space relaxation algorithms, and 'N' stands for turning on only Net ordering. The average delay reduction of the selected critical nets is shown as a percentage. The net ordering algorithm accounts for 60-70% of the total improvement. The delay reduction increases when the wire aspect ratio increases.

Table 2: delay reduction (%) vs. num. of critical nets for ami33
group of 2-pin nets   A (7 nets)   B (15)   C (14)   D (10)   E (17)   F (24 nets)
wire length           1.7%         6.2%     5.8%     6.5%     3.4%     10.1%
                                                                       N+S      N
AR1.0                 -5           -7       -13      -6       -7       -10      -7
AR1.5                 -6.8         -8.8     -15.4    -7.7     -9.9     -11      -8
AR2.0                 -8.4         -10.7    -18.5    -9.5     -11.8    -13.9    -10.5

The combinations of the six net groups are again tested for routability in Table 3. Notice that only 0.7-0.8% of Gcells (GC) have detailed routing failures. The failed switchboxes are merged with neighboring switchboxes and rerouted successfully. The notations 0.8N, 0.9N and 0.95N in Table 3 and Table 4 stand for 20%, 10% and 5% of the total wire length being randomly selected for deletion. The results show that lower wire density gives more improvement from space relaxation.

Table 3: routability on test cases of Table 2 at wire aspect ratio=1 for ami33
          A (1.7%)                 E (3.4%)                 B&C (12%)                        D&F (16.6%)
          delay red.  failed GCs   delay red.  failed GCs   delay N+S*  delay N  failed GCs*  delay N+S*  delay N  failed GCs*
1.00N     -5          0.73         -7          0.8          -10         -7       0.77         -8.2        -6       0.77
0.95N     -6.3        0.8          -7.1        0.83         -11.8       -7       0.77         -8.9        -7.1     0.83

Five MCNC benchmark circuits are tested using three wire aspect ratios (1.0, 1.5, 2.0) for two metal layers in Table 4. The CPU time is collected on an Intel PentiumPro 200MHz using AR2.0 and wire density 100%. The results demonstrate that the improvement increases when the wire aspect ratio increases and when the wire density decreases.

Table 4: path delay reduction (%) for MCNC benchmark circuits
          AR1.0                    AR1.5                    AR2.0                    CPU (secs)
          1.0N    0.9N    0.8N     1.0N    0.9N    0.8N     1.0N    0.9N    0.8N
ami33     -9.5    -11     -13      -11     -11.5   -12.3    -13.9   -13.2   -14      7.4
ami49     -9.2    -10     -10.8    -13.6   -14.5   -15.3    -15.9   -18.8   -18.7    34
hp        -7.3    -8.7    -8.9     -10.8   -12.7   -14.2    -12.5   -15.2   -17      3.1
xerox     -11.4   -12.1   -13.8    -13.9   -16     -18.5    -15.8   -18.3   -22      7.2
apte      -9.7    -12.6   -12.8    -13.7   -11.8   -12.5    -16.4   -14.2   -14.8    2.7

7 Conclusion
We have presented a timing and crosstalk driven router for the chip assembly task that is applied between global and detailed routing. Our new approach handles the crosstalk and timing constraints quantitatively by ordering nets and tuning wire spacing. The new approach fits between global routing and detailed routing in the physical design flow. It is the first to address the timing and crosstalk driven area routing problem in the pre-detailed routing stage, in contrast to previous approaches, which are applied in the post-detailed routing stage. Our new approach enjoys a larger optimization solution space than the previous approaches, whose solution space is highly limited by routed geometric constraints.

8 ACKNOWLEDGEMENTS
We would like to thank Professor Andrew Kahng of UCLA for his helpful opinions on interconnection delay analysis.

9 REFERENCES
[1] H. Zhou and D. F. Wong, "An Optimal Algorithm for River Routing with Crosstalk Constraints," 1996 International Conference on Computer-Aided Design, pp. 310-15, 1996.
[2] K. Jhang, S. Ha, and C. S. Jhon, "COP: A Crosstalk Optimizer for Gridded Channel Routing," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pp. 424-9, Vol. 15, No. 5, April 1996.
[3] Tong Gao and C. L. Liu, "Minimum Crosstalk Switchbox Routing," 1994 International Conference on Computer-Aided Design, pp. 610-15, 1994.
[4] T. Xue, E. S. Kuh and D. Wang, "Post Global Routing Crosstalk Risk Estimation and Reduction," 1996 International Conference on Computer-Aided Design, pp. 302-9, 1996.
[5] K. Chaudhary, A. Onozawa, and E. S. Kuh, "A Spacing Algorithm for Performance Enhancement and Cross-talk Reduction," 1993 International Conference on Computer-Aided Design, pp. 697-702, 1993.
[6] W. C. Kao and T. M. Parng, "Cross Point Assignment with Global Rerouting for General-Architecture Designs," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pp. 337-348, Vol. 14, No. 3, March 1995.
[7] D. A. Kirkpatrick and A. L. Sangiovanni-Vincentelli, "Digital Sensitivity: Predicting Signal Interaction using Functional Analysis," 1996 International Conference on Computer-Aided Design, pp. 536-541, 1996.
[8] W. Swartz and C. Sechen, "New Algorithms for the Placement and Routing of Macro Cells," Digest of Technical Papers of 1990 IEEE/ACM International Conference on Computer-Aided Design, pp. 336-9, 1990.
[9] L. C. E. Liu and C. Sechen, "Multi-layer Chip-level Global Routing Using an Efficient Graph-based Steiner Tree Heuristic," Proceedings. European Design and Test Conference, pp. 311-18, ED & TC 97, 1997.
[10] H. P. Tseng, "Detailed Routing Algorithms for VLSI Circuits," Ph.D. Thesis, University of Washington, Seattle, 1997.