A Practical All-Path Timing-Driven Place and Route Design System

Chwen-Cher Chang, James Lee, Mike Stabenfeldt, and Ren-Song Tsay
ArcSys, Inc., Sunnyvale, California

Abstract

We have developed a practical timing-driven design system that achieves a 17% cycle time improvement with a run time penalty of at most 32% and an area overhead as low as 1% in real 5,000- to 7,000-cell designs. This system is based on a Slack Graph concept that efficiently and effectively represents all-path timing constraints and allows an optimal trade-off between timing and die size.

Introduction

Emerging new applications such as interactive multimedia and wireless communication demand record high-speed computing power. The demand for faster processing is driving IC designers to pursue significantly better IC chip performance (100 MHz to 1 GHz). IC foundries are gradually advancing toward half-micron processes to achieve higher packing density and lower manufacturing cost. With the trend toward half-micron technology, designers realize that interconnect delay is dominating gate delay and is thereby becoming a limiting factor for chip performance (as measured in terms of cycle time). Because placement and routing determine the interconnect topology and final connections (and, therefore, the resulting interconnect delays), physical design significantly affects IC performance issues such as long and short path delay, clock skew, and cross-talk effects.

Rather than conventional net-constraint and path-constraint based approaches, we introduce a design methodology that uses a Slack Graph to represent timing constraints. This Slack Graph covers all timing paths and enables simultaneous optimization of both die size and all-path timing constraints during physical placement and routing. The Slack Graph serves as a generic interface between the timing verifier and physical design. Unlike conventional approaches, our timing-driven design methodology accommodates practical concerns such as distributed parasitics, pin-to-pin RC delay, loading effect, and slew rate. This generic feature facilitates a powerful and useful all-path, timing-driven design methodology.

Traditional Approaches

The most difficult aspect of timing-driven physical design is how to bridge the gap between net-oriented physical design and path-oriented timing analysis (5). Existing approaches are either net-oriented or path-oriented.


Net weighting is an intuitive net-oriented approach that imposes heavier weight (or higher priority) on nets with more negative slack (slack being the amount of excess timing resource), thereby turning a constrained optimization into an unconstrained one (5). This approach is efficient, but it requires that you guess an appropriate weight, which may not lead to the desired solution.

The net-bound approach requires that you specify the length (or capacitance) bound of each net to guide layout design (3). It normally requires little runtime overhead and can easily be integrated with traditional layout concerns such as wireability. The weakness of this approach is that “good” net constraints are difficult to construct. Often, the constraints are generated without physical design considerations; moreover, the net constraints may not be feasible. Also, fixed net constraints prevent sharing of the timing resource (that is, the portion of delay to be distributed in the physical layout).

Path constraints allow better resource sharing among nets on the same path. In (2), timing information is constantly reevaluated after each placement perturbation. The problem with this approach is that each placement perturbation can easily touch thousands of paths, which makes the computation extremely expensive. The approach used in (1) takes a smaller set of near-critical paths passed from the front-end tool. But there is no guarantee that every path will meet its timing constraint, even if the specified path constraints are met by the layout program. The data size can also become enormous when all timing paths are included. For example, in our 5,000-cell design, there are over 245,000 path constraints, and the file size was estimated to be 240 megabytes. Furthermore, without physical placement information, it is difficult for the front-end tool to determine correctly which set of paths is critical or near critical.
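To see why explicit path lists grow so much faster than the netlist itself, the following minimal sketch counts source-to-sink paths in a small layered graph by dynamic programming, without enumerating them. The graph shape and the names `count_paths` and `layered_graph` are illustrative assumptions, not data or code from the designs discussed here.

```python
from functools import lru_cache


def count_paths(edges, sources, sinks):
    """Count distinct source-to-sink paths in a DAG without listing them."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    sink_set = set(sinks)

    @lru_cache(maxsize=None)
    def paths_from(node):
        # A sink ends a path; in this sketch sinks have no fanout.
        if node in sink_set:
            return 1
        return sum(paths_from(nxt) for nxt in succ.get(node, []))

    return sum(paths_from(s) for s in sources)


def layered_graph(levels, width):
    """Hypothetical example: every node drives every node of the next level."""
    edges = []
    for lvl in range(levels - 1):
        for a in range(width):
            for b in range(width):
                edges.append(((lvl, a), (lvl + 1, b)))
    sources = [(0, a) for a in range(width)]
    sinks = [(levels - 1, b) for b in range(width)]
    return edges, sources, sinks


edges, sources, sinks = layered_graph(levels=6, width=4)
num_edges = len(edges)                          # grows linearly with depth
num_paths = count_paths(edges, sources, sinks)  # grows multiplicatively with depth
```

In this toy graph, 80 edges already carry 4,096 distinct paths, and each added level multiplies the path count by the level width while adding only a fixed number of edges.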
Others have attempted to embed timing analysis in the placement optimization process (4). However, the delay model must be simple enough to be translated into linear constraints; this simplification may not be practical in real designs. This approach is suitable only for global placement and is difficult to apply to detailed placement and routing algorithms.

In response to the shortcomings of previous approaches, we propose a new solution, which uses a Slack Graph for simultaneous optimization of timing constraints and physical design. Unlike the exponential complexity in (1) and (2), this new methodology efficiently covers all paths and effectively optimizes resource sharing. The runtime overhead is at most 32%, and the area overhead is as low as 1%.

0-7803-2440-4/94/$4.00 © 1994 IEEE

Slack Graph Approach

The Slack Graph is designed to bridge the gap between path-oriented timing verification and net-oriented place and route and is, in fact, a snapshot of the timing-verification result. For a given design netlist, we can form a directed graph where each node represents a pin and each directed edge represents the electrical connection from a source pin to a receiving pin. After delay calculation, we bind the intrinsic delay to each internal cell edge and the RC delay (transition plus interconnect delay) to each net edge. Given the setup and hold time rules for flip-flops and latches, the target clock cycle time, the external data arrival times at primary inputs, and the required arrival times at primary outputs, we perform timing verification to derive the arrival time and required arrival time at each node. The nodal slack of node i is then defined as the difference between its required arrival time and its arrival time. The edge slack of edge (i, j) is defined as the difference between the required arrival time at node j and the sum of the arrival time at node i and the edge RC delay. With these values we construct the Slack Graph. This is a simplified description of Slack Graph processing; in practice, it can be more elaborate and can include both Early and Late mode slacks.
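The definitions above can be sketched in a few lines. This is a minimal illustration for late-mode slack only, assuming an acyclic pin graph with one delay per edge; the function name `build_slack_graph` and the dictionary-based data layout are assumptions made for the example, not the ArcSys data structures.

```python
from collections import defaultdict, deque


def build_slack_graph(edges, arrival_at_inputs, required_at_outputs):
    """edges maps (i, j) -> delay. Returns arrival, required, nodal and edge slack."""
    succ, pred, indeg = defaultdict(list), defaultdict(list), defaultdict(int)
    nodes = set()
    for (i, j), d in edges.items():
        succ[i].append((j, d))
        pred[j].append((i, d))
        indeg[j] += 1
        nodes.update((i, j))

    # Topological order (Kahn's algorithm); assumes the pin graph is acyclic.
    order, queue = [], deque(n for n in nodes if indeg[n] == 0)
    while queue:
        n = queue.popleft()
        order.append(n)
        for m, _ in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)

    # Forward pass: latest arrival time at each pin.
    arrival = {}
    for n in order:
        if pred[n]:
            arrival[n] = max(arrival[i] + d for i, d in pred[n])
        else:
            arrival[n] = arrival_at_inputs[n]

    # Backward pass: earliest required arrival time at each pin.
    required = {}
    for n in reversed(order):
        if succ[n]:
            required[n] = min(required[j] - d for j, d in succ[n])
        else:
            required[n] = required_at_outputs[n]

    nodal_slack = {n: required[n] - arrival[n] for n in nodes}
    edge_slack = {(i, j): required[j] - (arrival[i] + d)
                  for (i, j), d in edges.items()}
    return arrival, required, nodal_slack, edge_slack


# Hypothetical four-pin example, delays in ns.
edges = {("a", "c"): 2.0, ("b", "c"): 3.0, ("c", "d"): 4.0}
arrival, required, nodal_slack, edge_slack = build_slack_graph(
    edges, arrival_at_inputs={"a": 0.0, "b": 0.0}, required_at_outputs={"d": 10.0})
```

In the example, edge (a, c) has 4 ns of slack while (b, c) and (c, d) have 3 ns, so the path through b is the more critical one even though both paths share the edge (c, d).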

The commonly used path constraints can be enumerated from this Slack Graph. Thus, the graph contains complete timing information while using much less disk space, because its size is proportional to the number of pins. It therefore represents all-path timing constraints efficiently and effectively.

A Slack Graph contains both the timing and the connectivity information used for layout. Each graph edge is derived from the netlist and tells which pin of a particular cell is connected to which pin of another cell. Based on this fact, we have developed an optimization process that globally considers slack (timing criticality), sensitivity (delta delay / delta length), and connectivity (ease of changing topology). This process respects existing layout results and suggests a minimum-effort perturbation of the current result toward a better timing solution. The new layout result is fed back to construct a new set of slack numbers on the same graph. Thus, the Slack Graph optimizer and the layout optimizer alternately guide the global optimization process, which lets you make the optimal trade-off between timing and layout.

System Architecture

One advantage of the Slack Graph is that it is a generic timing interface. Slack Graph construction is independent of the timing model and the timing verifier used (see Fig. 1).

Input Data

Loading capacitance, driving resistance, intrinsic delays, setup and hold time rules, and so forth must be set on each pin and cell. These can be specified in the ArcSys proprietary format or derived from the Synopsys cell library format. To construct a Slack Graph, you need to specify the clock cycle time, the arrival times on primary inputs, and the required arrival times on primary outputs.

If you want to generate timing constraints, we can currently accept path or net constraints in Standard Delay Format (SDF). Otherwise, you can specify constraints in Slack Graph format.

Parasitic Extraction

In addition to per-unit-length resistance and per-unit-area capacitance, we also consider side-wall capacitance and coupling capacitance. As technology advances, side-wall capacitance becomes significant in terms of interconnect delay. Side-wall capacitance is calculated automatically if the wire thickness, height to substrate, and wire width are given. Coupling capacitance is also a major contributor to interconnect delay in the new technology; sometimes it exceeds 10 times the plate capacitance. Currently, we extract the coupling capacitance of neighboring wires on the same layer.
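As a rough illustration of how these components add up for one wire segment, the sketch below sums plate, side-wall, and coupling capacitance from per-unit coefficients. Everything here is an assumption made for the example: the class `LayerParams`, the function `segment_parasitics`, the treatment of side-wall capacitance as a precomputed per-length coefficient (the formula that derives it from thickness, height, and width is not given in the text), and all numeric values.

```python
from dataclasses import dataclass


@dataclass
class LayerParams:
    # All values are hypothetical placeholders, not process data from the paper.
    r_per_um: float         # resistance per unit length (ohm/um)
    c_area_per_um2: float   # plate capacitance per unit area (fF/um^2)
    c_side_per_um: float    # side-wall capacitance per unit length, one side (fF/um)
    c_couple_per_um: float  # coupling capacitance per unit of parallel run (fF/um)


def segment_parasitics(layer, length_um, width_um, coupled_run_um=0.0):
    """Return (R in ohm, C in fF) for one straight wire segment."""
    resistance = layer.r_per_um * length_um
    c_plate = layer.c_area_per_um2 * width_um * length_um
    c_side = 2.0 * layer.c_side_per_um * length_um  # two side walls
    c_couple = layer.c_couple_per_um * coupled_run_um
    return resistance, c_plate + c_side + c_couple


metal1 = LayerParams(r_per_um=0.05, c_area_per_um2=0.03,
                     c_side_per_um=0.02, c_couple_per_um=0.08)
r_seg, c_seg = segment_parasitics(metal1, length_um=100.0, width_um=1.0,
                                  coupled_run_um=50.0)
```

With these made-up coefficients, the side-wall and coupling terms together exceed the plate term, which is the situation the paragraph above warns about.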

FIGURE 1. Slack Graph is a generic timing interface.

Delay Calculation

Before routing, in the placement phase, we build either a single-trunk Steiner tree or a spanning tree for pin-to-pin RC delay estimation. After routing, we trace the interconnect path and extract the RC parasitics of each interconnect segment, via, and pin. Using these parasitics, we construct a distributed RC network and calculate the Elmore delay. We will use more accurate delay calculations in the future. We also provide a dynamic linking capability so that you can integrate your own cell delay model and interconnect delay calculation to achieve the desired accuracy.
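For reference, the Elmore delay of an RC tree can be computed in two linear passes: one to accumulate the capacitance downstream of each branch, and one to sum resistance times downstream capacitance along each root-to-node path. The sketch below is a minimal version under assumed conventions (the function name `elmore_delays`, a parent-pointer tree, and capacitance lumped at nodes); it is not the ArcSys delay calculator.

```python
def elmore_delays(parent, resistance, capacitance, root):
    """Elmore delay from root to every node of an RC tree.

    parent[n]      -> parent node of n (the root has no entry)
    resistance[n]  -> resistance of the branch from parent[n] to n
    capacitance[n] -> capacitance lumped at node n
    """
    children = {}
    for n, p in parent.items():
        children.setdefault(p, []).append(n)

    # Pre-order traversal, so every parent appears before its children.
    order, stack = [], [root]
    while stack:
        n = stack.pop()
        order.append(n)
        stack.extend(children.get(n, []))

    # Downstream capacitance: own capacitance plus everything below the node.
    downstream = {n: capacitance.get(n, 0.0) for n in order}
    for n in reversed(order):
        if n != root:
            downstream[parent[n]] += downstream[n]

    # Delay accumulates R(branch) * C(downstream of branch) along the path.
    delay = {root: 0.0}
    for n in order:
        if n != root:
            delay[n] = delay[parent[n]] + resistance[n] * downstream[n]
    return delay


# Hypothetical net: driver -> n1, which fans out to sinks n2 and n3.
parent = {"n1": "drv", "n2": "n1", "n3": "n1"}
resistance = {"n1": 2.0, "n2": 3.0, "n3": 4.0}
capacitance = {"n1": 1.0, "n2": 2.0, "n3": 3.0}
delay = elmore_delays(parent, resistance, capacitance, root="drv")
```

The branch into n1 sees all 6 units of capacitance, so both sinks inherit 12 units of delay from it before adding their own branch terms. Units follow from the inputs (for example, ohm times farad gives seconds).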

Timing Verifier

A Slack Graph requires a timing library that models each cell’s driving resistance, slope factor, and no-load IO path delay, as well as each storage cell’s pin types and constraints. Based on your clock and system timing specifications, we verify the timing constraints of storage cells (FF, latch, RAM, and ROM), such as setup time, hold time, recovery time, and minimum clock pulse width.

Placement and Routing

The timing cost is calculated from the difference between the pin-to-pin RC delay of the existing layout and the timing constraint derived from the Slack Graph optimizer. The total timing cost is then dynamically scaled by a “timing weight” so that it competes with the wire length and congestion costs (the traditional layout costs). The higher the timing weight, the more likely the design is to meet the timing requirement. Another scheme uses “timing tightness” to discount (tighten) a certain amount of cycle time from the given timing constraints during layout. The larger the tightness, the faster the design’s cycle time. You can use this scheme to push the design to its speed limit.

Timing Report

After each layout design, you can easily generate a timing report, which includes net delay (slope delay, transition delay, and interconnect delay), path delay, and clock skew (multiphase and multilevel). Finally, a design verifier reports any timing violation by checking all paths on the Slack Graph.

Data Export

To backannotate to front-end tools and other utilities, we also generate interface files in SDF, SPF, SPICE, and Synopsys formats.

System Configurations

Because the Slack Graph is a generic timing interface, it can be generated by either an internal or an external timing verifier, which provides flexibility. Based on the user’s design setup and requirements, we can construct appropriate design flows, as illustrated in the following sections.

ArcSys Integrated Timing-Driven System

In this configuration, ArcSys performs timing optimization automatically. Users only need to provide a timing library (gate delay, driving resistance, load capacitance, and so forth, for each cell). After the user specifies the target cycle time and the arrival times and required arrival times on IO pads, the ArcSys Timing Verifier (ArcTV) automatically recognizes registers and clock networks, performs timing analysis, and generates a Slack Graph for ArcSys timing-driven optimization. In Fig. 2, ArcSG represents the Slack Graph generated by ArcSys tools, and ArcDC represents the ArcSys Delay Calculator.

FIGURE 2. ArcSys integrated timing solution.

Total User Control

If you have your own timing verifier and delay calculator, then only the basic parasitic parameters, such as per-unit resistance and per-unit capacitance, are necessary to drive ArcSys timing optimization. At each iteration, the ArcSys Parasitic Extractor (ArcPE) extracts interconnect R and C parameters and backannotates them to the User Delay Calculator (UDC), which outputs to the User Timing Verifier (UTV) for qualification and Slack Graph generation. In some cases, ArcPE can be replaced by a User Parasitic Extractor (UPE) with no impact on the design flow (Fig. 3).

FIGURE 3. User controls all timing aspects.

The main advantage of this configuration is that you have total control of the timing aspect. You do not need to rely on the ArcSys delay or timing verification model. However, the outer-loop iterations require longer runtime.
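The outer loop of this user-controlled configuration can be sketched as a driver that calls four pluggable stages in turn. This is only a schematic under stated assumptions: the function `run_outer_loop`, the callable signatures, the stopping rule (no negative slack), and the toy stand-ins for the extractor, delay calculator, verifier, and optimizer are all hypothetical.

```python
def run_outer_loop(layout, extract, calc_delays, verify, optimize, max_iters=10):
    """Iterate extract -> delay calc -> verify -> optimize until no negative slack.

    The four callables are user-supplied stand-ins (hypothetical signatures):
      extract(layout) -> parasitics
      calc_delays(parasitics) -> delays
      verify(delays) -> slack graph as {edge: slack}
      optimize(layout, slack_graph) -> new layout
    Requires max_iters >= 1.
    """
    for iteration in range(1, max_iters + 1):
        parasitics = extract(layout)
        delays = calc_delays(parasitics)
        slack_graph = verify(delays)
        worst = min(slack_graph.values())
        if worst >= 0.0:
            return layout, worst, iteration
        layout = optimize(layout, slack_graph)
    return layout, worst, max_iters


# Toy stand-ins: one net whose delay is proportional to its length.
def toy_extract(layout):
    return dict(layout)  # "parasitics" are just net lengths here

def toy_delays(lengths):
    return {net: 0.5 * length for net, length in lengths.items()}

def toy_verify(delays):
    return {net: 3.0 - d for net, d in delays.items()}  # 3.0 ns budget per net

def toy_optimize(layout, slack_graph):
    # Shorten every net that still violates its budget by one unit.
    return {net: length - 1.0 if slack_graph[net] < 0.0 else length
            for net, length in layout.items()}


final_layout, worst_slack, iterations = run_outer_loop(
    {"n1": 8.0}, toy_extract, toy_delays, toy_verify, toy_optimize)
```

Each pass through the loop repeats extraction, delay calculation, and verification in full, which is where the longer runtime of this configuration comes from.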


Benchmark Results

The proposed methodology has been successfully applied to many real designs. Here, we use the results of three examples to demonstrate the effectiveness of this approach.
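The benchmark tables vary two knobs, timing weight and timing tightness, and report slack and frequency. The sketch below shows one plausible reading of how these quantities relate. It makes several assumptions: the function names are invented, the weight is applied statically rather than dynamically, and the excess-over-constraint form of the timing cost is a guess at "the difference between delay and constraint". The frequency relation, 1000 / (target cycle - slack) with times in ns, appears consistent with the slack and frequency values reported for the 10 ns designs.

```python
def timing_cost(pin_delays_ns, constraints_ns):
    """Sum of pin-to-pin delay in excess of its constraint (zero when met)."""
    return sum(max(0.0, delay - constraints_ns[edge])
               for edge, delay in pin_delays_ns.items())


def layout_cost(wire_length, congestion, timing, timing_weight):
    """Traditional layout cost plus the weighted timing cost (static weight)."""
    return wire_length + congestion + timing_weight * timing


def tightened_cycle_ns(target_cycle_ns, tightness_ns):
    """Cycle time the layout aims for once the tightness is discounted."""
    return target_cycle_ns - tightness_ns


def frequency_mhz(target_cycle_ns, slack_ns):
    """Achieved frequency when slack is measured against the target cycle."""
    return 1000.0 / (target_cycle_ns - slack_ns)


# Hypothetical pin-to-pin delays and constraints, in ns.
delays = {("u1/Z", "u2/A"): 1.5, ("u2/Z", "u3/A"): 0.5}
limits = {("u1/Z", "u2/A"): 1.0, ("u2/Z", "u3/A"): 1.0}
cost = layout_cost(100.0, 20.0, timing_cost(delays, limits), timing_weight=2.0)

# Slack values taken from the SCA results, target cycle 10 ns.
f_untimed = frequency_mhz(10.0, -0.647)
f_weighted = frequency_mhz(10.0, 0.509)
```

A larger timing weight makes the same violation cost more relative to wire length and congestion, while a tightness of, say, 1 ns asks the layout to aim for a 9 ns cycle even though slack is still reported against 10 ns.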

Design SCA: The target cycle time of this design is 10 ns (100 MHz). The zero-interconnect delay is 7.355 ns. Hence, the allowable interconnect delay is approximately 26% of the cycle time (2.645 ns / 10 ns). The design has about 5,000 cells; 13,769 nodes and 16,789 edges are created for the Slack Graph to cover all paths.

TABLE 4. Results of design SCA with different timing weights

timing weight   slack (ns)   frequency (MHz)   runtime overhead   area overhead
0               -0.647       93.9              0.0%               0.0%
1.0             0.509        105.4             30.7%              0.0%
1.2             0.766        108.3             31.5%              0.1%
1.5             0.890        109.8             32.1%              1.1%
2.0             1.197        113.6             31.3%              1.6%

TABLE 5. Results of design SCA with different timing tightness

timing tightness (ns)   slack (ns)   frequency (MHz)   area overhead
1.0                     0.316        103.3             1.4%
1.1                     0.727        107.8             2.8%
1.8                     1.295        114.9             4.8%
2.0                     1.449        117.0             9.1%

Design SMB: The target cycle time is 10 ns (100 MHz), and the zero-interconnect delay of this design is only 4.899 ns. Thus, the allowable interconnect delay is 51% of the cycle time. The design is smaller: 2,315 nodes and 2,592 edges on the Slack Graph.

TABLE 6. Results of design SMB with different timing weights

timing weight   slack (ns)   frequency (MHz)   runtime overhead   area overhead
0               -11.63       46.2              0.0%               0.0%
1.0             0.219        102.2             18.5%              4.0%
1.1             0.536        105.7             18.3%              4.8%
1.8             0.734        107.9             19.1%              5.1%
2.0             0.885        109.7             18.1%              5.8%

TABLE 7. Results of design SMB with different timing tightness

timing tightness (ns)   slack (ns)   frequency (MHz)   area overhead
1.0                     0.484        105.1             4.7%
1.3                     0.575        106.1             7.1%
1.4                     0.876        109.6             7.3%
1.7                     1.753        121.3             14.3%

Design SKD: We imposed unreasonably tight constraint values on this design (~7,000 cells) to push its speed to the limit and to observe the robustness and performance of our system. The constraint value given is 5.5 ns over the zero-interconnect delays; that is, the slack is already -5.5 ns without considering any interconnect delays (the zero-interconnect slacks are +2.645 ns and +5.101 ns for designs SCA and SMB).

TABLE 8. Results of design SKD with different timing weights

timing weight   slack (ns)   runtime overhead   area overhead
0               -16.153      0.0%               0.0%
1.0             -9.990       9.2%               2.3%
2.0             -9.4…        8.7%               4.8%
3.0             -8.744       9.5%               11.9%
4.0             -8.337       9.1%               12.8%

SDF Path Constraint Size Comparison

We have also attempted to generate an all-path constraint list using an SDF file. With reasonable runtime (20 CPU hours) and 40.8 MB of disk space, we can list only 25% of all paths (61K of 245K paths). The equivalent constraints, as represented in Slack Graph format, use 1.8 MB in ASCII format and only 0.3 MB in binary database format.

Complete Timing-Driven Solution

Besides the timing-driven optimization of conventional placement and routing, it is important to consider timing effects in early design phases such as system partitioning and early floorplanning. Cell substitution (“In-Place” optimization), timing-driven synthesis, and zero-skew clock tree design contribute to a complete timing-driven design solution. The system architecture and optimization process described in this paper are under a pending patent.

References

(1) A. H. Chao, E. M. Nequist, and T. D. Vuong, “Direct Solution of Performance Constraints During Placement,” IEEE 1990 Custom Integrated Circuits Conference, pp. 27.2.1-27.2.4.
(2) W. E. Donath, et al., “Timing-Driven Placement Using Complete Path Delays,” Proc. 27th Design Automation Conference, 1990, pp. 84-89.
(3) Ravi Nair, C. L. Berman, P. S. Hauge, and E. J. Yoffa, “Generation of Performance Constraints for Layout,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, CAD-8:860-870, 1989.
(4) A. Srinivasan, K. Chaudhary, and E. S. Kuh, “RITUAL: A Performance Driven Placement Algorithm for Small Cell ICs,” Proc. IEEE Int’l Conference on Computer-Aided Design, 1991, pp. 48-51.
(5) S. Teig, R. L. Smith, and J. Seaton, “Timing-Driven Layout of Cell-Based IC’s,” Design Automation Guide, 1987, pp. 94-101.