Combined Topological and Functionality Based Delay Estimation Using a Layout-Driven Approach for High Level Applications
Champaka Ramachandran, Member, IEEE, and Fadi J. Kurdahi, Member, IEEE
Abstract: We discuss the problem of accurate delay estimation of cell-based designs prior to any physical design tasks. For this purpose, we require accurate wire-length estimates, since wire delays contribute significantly to the overall delay. We present a new technique for wire-length estimation based on a combination of analytical and constructive approaches. Given these wire-length estimates and the cell delays, it is possible to identify the worst-case delay paths in the design based on the circuit topology. We have also extended our technique to consider false paths, which yields a more accurate, functionality-based estimate that still takes the estimated layout information into account. We validate our technique using the standard MCNC benchmarks. Our results indicate that the worst-case delay predictions are accurate to within 7% on average for designs with up to about 2800 cells.
I. Introduction
In this paper, we describe an approach for early and accurate layout-driven design timing prediction. Delay and timing analyses are generally performed prior to physical design and also during the post-layout phase, as shown in Figure 1, with the aim of predicting performance. However, pre-layout timing information is not very accurate, since it does not include wire delay, which generally forms a significant part of the overall delay in sub-micron designs that have long wires. This is especially true in standard cell designs, where wires can span several rows of cells. The wire delay can greatly affect the determination of critical paths in such designs. New sub-micron technologies with smaller feature sizes only make the wiring effect more significant and, in many cases, dominant in both area and delay. On the other hand, post-layout timing analysis is very expensive for two reasons. First, parasitic extraction from the layout is a time-consuming process. Second, if the layout does not meet the timing specification, the physical design has to be redone, which is itself very expensive. Hence, we need a technique that estimates post-layout effects before the costly process of physical design is performed. Our approach estimates layout wire lengths at that stage and uses them to account for the delays introduced by the wires. As depicted in Figure 1, this early and accurate delay estimation prior to physical design helps to shorten the design turn-around time.

This approach has been implemented in the TELE (Timing Evaluator using Layout Estimation) tool. TELE's main target application is evaluating design tradeoffs during high-level tasks such as synthesis and system-level partitioning. When such high-level design tasks are performed, a large number of candidate design configurations are generated and must be evaluated. To correctly assess the performance tradeoffs involved in each optimization step, the evaluation itself must be as accurate as possible; and because it is invoked for so many candidates, it must also be fast. Most previous evaluation approaches are either fast or accurate, but not both. TELE is a fast and accurate design timing evaluation tool. It takes as input a high-level structural description of a circuit and, in a runtime-efficient manner, estimates the clock period of the synchronous design as well as the worst-case signal propagation delay of the circuit. The prediction approach relies on a combination of analytical and constructive (top-down) models of layout and timing. With such an approach, it is possible to predict circuit delays for chips having a mixture of design styles. Also, by controlling the relative contribution of the analytical and the constructive models to the prediction, it is possible to trade off the accuracy of the estimate against the computation time needed to obtain it. TELE can run both as a topological delay estimator, which computes the delay based solely on the cell connectivity information, and as a functionality-based estimator, which computes the evaluation vectors at the inputs of every cell based on the functionality of that cell. The latter avoids reporting most false paths as worst-case paths and hence improves the accuracy of the delay estimate.

This work was supported by NSF Grant # MIP-8909677 and by CAL-MICRO Grant # 91-080.
We must emphasize here that our goal is not to provide a detailed and extensive timing analysis and timing verification of a given circuit. Instead, our goal is to provide more fine-tuned guidance to the various synthesis tasks. Furthermore, we believe that the novelty of this work lies in accurately accounting for layout effects during every phase of the delay prediction. This provides a more realistic assessment of the actual circuit performance than traditional early design evaluation schemes. Although TELE is primarily intended for high-level design applications and is not currently tuned to estimate timing-optimized layouts, it is accurate and flexible enough to be used also for hierarchical physical-level applications such as timing-driven floor planning. In this mode, it can be used to evaluate the delays of the major design subcomponents as a preprocessing step to the floor planning task itself.

Figure 1: The importance of early delay estimation (flows compared: Current Approaches and Proposed Approach)

This paper is organized as follows: Section II provides background material and surveys previous related work on timing prediction and wire-length estimation. Section III outlines the approach to wiring delay prediction and the timing model used to predict the overall circuit timing. Algorithms and heuristics for both topological and functionality-based delay prediction are discussed in Section IV. Section V presents and analyzes the experimental results that validate the proposed approach. Conclusions are drawn in Section VI.
II. Background
A. Timing Prediction
Timing prediction techniques for digital circuits can be divided into two categories. Static timing analysis is concerned with estimating an upper bound on the worst-case input-to-output delay(s) of a circuit by identifying its critical (i.e., longest input-to-output) path(s). An example of a static timing analysis tool is Crystal [1]. By contrast, false-path-based timing analysis uses the logic behavior as well as the timing characteristics of the input design, and is further discussed in Section IV.B. None of the above approaches accurately considers the wiring effects on delay values prior to placement and routing. Some of the recent work on accurate timing prediction has been closely coupled with timing-driven layout (placement and routing). In [2], timing prediction schemes were used as part of an interleaved placement-analysis iterative loop. Other similar approaches are described in [3], [4], [5], and [6]. The work in [7] describes a method to predict the timing constraints that should be applied to the individual components of a design as a pre-processing step for timing-driven layout. An excellent survey of the field is presented in [8]. More recently, the work presented in [9] describes a layout-driven technology mapping approach that accounts for wiring delay incrementally. Another interesting tool for timing prediction is PEPPER [10], which evaluates a candidate partitioning of logic that is to be implemented on a high-speed Multi-Chip Module (MCM). PEPPER performs approximate placement and rough global routing to obtain reasonable estimates of the circuit's performance. Our tool has essentially the same purpose as PEPPER, but is oriented towards chip-level designs and is meant to be used with high-level synthesis tools. In this case, no a priori partitioning is assumed, since the tool is used before any physical design tasks are invoked. The work on 3-D scheduling presented in [11] is an attempt to couple floor planning and high-level synthesis, thus accounting for wiring delays when making high-level design decisions. That work concentrated mainly on the module assignment phase of high-level synthesis and further underscored the need for accurate timing prediction during high-level synthesis. Similar work was reported in [12], in which local transformations are applied simultaneously at the high level and the physical level in the form of a feedback-driven system. This work also concluded that the effect of wiring on both area and performance is significant and cannot be ignored when high-level design decisions are made.
B. Wire Length Estimation
During the placement phase, we need to estimate the wire lengths of the nets in the design, since these estimates are usually used to evaluate the current candidate placement. This is sometimes called post-placement estimation, for which various schemes are described in [13].
More important for us is pre-placement wire-length estimation, since our timing estimator is invoked prior to placement and routing. Pre-placement estimation of wire length has been studied by several researchers [14], [15], [16], who derived relationships for the average wire length based on different models of placement and routing. These models assume the well-known empirical relationship called Rent's rule [17]. In well-partitioned (and placed) circuits, Rent's rule has been found to hold between the number of components in a subcircuit and the number of external terminals of that subcircuit. Quantitatively, the rule is a power law relating the number of gates, $C$, in a partition of a circuit to the number of external terminals, $T$, of the partition, written as $T = A C^p$, where $A$ was empirically found to be close to the average number of terminals (pins) per gate, and $p$ is Rent's exponent, which in practice lies between 0.47 and 0.75 [17] and depends on the structure and type of the circuit and on the partitioning algorithm. The relationship proposed by Donath [18] is derived from a model that assumes a hierarchical partitioning and placement of logic on a square array. An expression for the average wire length, $\bar{r}$, is derived as a function of the number of gates in the circuit. Another, more accurate model, developed by Feuer [15], is similar to Donath's and is further discussed in Section III.A.1. We use this model to predict wire lengths in closely coupled
leaf clusters. In [16], Sastry assumes a continuous model of logic and shows that, if Rent's rule applies, the wire-length distribution is expected to belong to the Weibull family of distributions. The work in [19] extends Sastry's work. Another model for wire-length estimation was reported in [20]. There, an input netlist is analyzed and the nets are classified according to their fanout. The average wire length for each class is then estimated, and an estimate of the total wire length is produced. Good agreement with actual wire-length data was reported for small to medium-size designs. Because of the procedural nature of this model, its runtime increases rapidly with net fanout. Thus, it can be used at the leaf level in our methodology to produce accurate wire-length estimates at the expense of increased runtime.
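As a small illustration of Rent's rule $T = A C^p$, the sketch below evaluates the rule and recovers $A$ and $p$ from (gate count, terminal count) pairs by a least-squares line fit in log-log space, which is one common way of fitting partitioning data to the rule. The function names are hypothetical, and the data points in the usage example are synthetic, not measurements from any benchmark.

```python
import math


def rent_terminals(gates: float, a: float, p: float) -> float:
    """Rent's rule: expected external terminals T = A * C**p for a partition of C gates."""
    return a * gates ** p


def fit_rent(partitions: list[tuple[float, float]]) -> tuple[float, float]:
    """Fit (A, p) to (C, T) pairs by least squares on log T = log A + p * log C."""
    xs = [math.log(c) for c, _ in partitions]
    ys = [math.log(t) for _, t in partitions]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx                       # slope of the log-log line is Rent's exponent
    a = math.exp(mean_y - p * mean_x)   # intercept gives the Rent coefficient A
    return a, p


# Synthetic partitions generated with hypothetical values A = 3.5, p = 0.7.
data = [(c, rent_terminals(c, 3.5, 0.7)) for c in (4, 16, 64, 256, 1024)]
a, p = fit_rent(data)
print(f"A = {a:.2f}, p = {p:.2f}")  # A = 3.50, p = 0.70
```

With real partitioning data the points scatter around the line, and the fitted slope plays the role of the "more accurate estimate of $p$" that the analytical model can use instead of a default value.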
III. The Timing Model
The delay of a signal path depends on several factors: the propagation delay of each cell, $t_p(cell_i)$, the output resistance of the cell, $R_{out}$, the input capacitance of the cell, $C_{in}$, and the wire delay. Figure 2 depicts our timing model. The lumped RC model, also called the Elmore delay model [21], is widely used for delay calculation. In this model, the propagation delay along a path from the start point to the end point, $t_p(start, end)$, is computed by lumping all of the resistances $R_j$ and capacitances $C_k$ along the path and summing their products, that is,

$$t_p(start, end) = \sum_{j} \sum_{k} R_j C_k \qquad (1)$$

Figure 2: Timing Model (cells A, B, C, and D, each characterized by an input capacitance $C_{in}$, an internal delay $t_{p,cell}$, and an output resistance $R_{out}$, connected by a wire with resistance $R_w$ and capacitance $C_w$; signals In1 and Out1)

The above equation is a first-order approximation of the actual delay equation. We can use Equation 1 to obtain the delay of a connecting wire between two cells. In CMOS technology, we model a cell as having an input capacitance $C_{in}$ and an output resistance $R_{out}$. The well-known π-model is used to model the connecting wire, which has capacitance $C_w$ and resistance $R_w$. Assuming constant-width wires, the capacitance and the resistance can each be approximated as directly proportional to the length $L_w$ of the wires forming the net. Hence,

$$R_w = K_r L_w, \qquad C_w = K_c L_w \qquad (2)$$

The propagation delay $t_p(n)$ through a wire $n$ that connects the output of $cell_i$ and drives the $m$ load cells $cell_j$, $1 \le j \le m$, can be computed as

$$t_p(n) = \left(R_{out}(cell_i) + R_w\right)\left(C_w + \sum_{j=1}^{m} C_{in}(cell_j)\right) \qquad (3)$$

Thus, the delay for signals to propagate from the input of $cell_i$, through a wire $n$, to one of $cell_i$'s driven cells, $cell_j$, is

$$t_p(cell_i, n, cell_j) = t_p(cell_i) + t_p(n) \qquad (4)$$

where $t_p(cell_i)$ is the internal delay of cell $cell_i$. This model can be generalized to account for the total delay through a path $P(I, O)$ in a circuit, starting from input pin $I$ and ending at output pin $O$, by adding all the cell delays and net delays along $P(I, O)$ as follows:

$$t_p(P(I,O)) = \sum_{\text{all cells } k \in P(I,O)} t_p(cell_k) + \sum_{\text{all nets } n \in P(I,O)} t_p(n) \qquad (5)$$
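To make Equations 2 to 4 concrete, the sketch below computes the lumped delay of one net and the resulting cell-to-cell delay. The function names are hypothetical, and all numeric values (the per-unit-length constants $K_r$ and $K_c$ and the cell parameters) are made up so that the arithmetic is easy to follow; they do not describe any real cell library.

```python
def wire_delay(r_out: float, loads_cin: list[float], length: float,
               k_r: float, k_c: float) -> float:
    """Equation 3: t_p(n) = (R_out + R_w) * (C_w + sum of load C_in),
    with R_w = K_r * L_w and C_w = K_c * L_w taken from Equation 2."""
    r_w = k_r * length
    c_w = k_c * length
    return (r_out + r_w) * (c_w + sum(loads_cin))


def cell_to_cell_delay(t_cell: float, r_out: float, loads_cin: list[float],
                       length: float, k_r: float, k_c: float) -> float:
    """Equation 4: internal delay of the driving cell plus the delay of the net it drives."""
    return t_cell + wire_delay(r_out, loads_cin, length, k_r, k_c)


# Hypothetical units: resistances in kilo-ohms, capacitances in pF, lengths in mm,
# so every delay below comes out in ns.
for length in (0.0, 4.0, 8.0):
    print(length, wire_delay(2.0, [0.5, 0.5], length, k_r=0.5, k_c=0.25))
# 0.0 2.0
# 4.0 8.0
# 8.0 18.0
print(cell_to_cell_delay(1.5, 2.0, [0.5, 0.5], 4.0, k_r=0.5, k_c=0.25))  # 9.5
```

Because both $R_w$ and $C_w$ grow with $L_w$, the wire term grows roughly quadratically with length, which is why an error in the estimated length of a long net has a large effect on the predicted delay.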
We can observe from the above equations that the propagation delay of a cell, $t_p(cell_i)$, its output resistance, $R_{out}$, and its input capacitance, $C_{in}$, can be obtained from the library characterization process. The values $K_r$ and $K_c$ depend on the technology. Hence, we need to determine the net length $L_w$ in order to determine the total delay on the path $P(I, O)$. Since the lengths of the nets are not known prior to physical design, they can only be estimated. The next section describes the techniques we use to estimate the lengths of the nets in a design.

A. Wire Length Estimates
Pre-layout wire-length estimation can be done using three techniques: analytical estimation, constructive estimation, and the partial slicing technique.

A.1 Analytical Wire Length Estimation
Analytical wire-length estimators can be divided into two classes: standard cell wire-length estimators and regular-structure wire-length estimators. One of these models is invoked depending on the design style of the leaf-level cluster. Due to the inherent regularity of the layout model, the latter estimation model can be easily derived and represented accurately by closed-form expressions. Of more interest here is wire-length estimation for random-logic standard cell designs, which is discussed below. Feuer's model [15] aims to relate Rent's rule (see Section II.B) to the distribution of wire lengths on a chip. Using a continuous model, he defines a partitioning function, $I(R)$, as the number of connections born inside and terminating outside a 'circle'¹ of radius $R$. Clearly, this partitioning function is another form of Rent's rule. He then derives the wire-length distribution from $I(R)$ by integrating over an infinitesimal strip around the circle across the whole plane.
The wire-length distribution is found to be of the form

$$q(r) \propto r^{-2(2-p)}$$

and the average interconnection length is found to be

$$\bar{r} = \frac{2\sqrt{2}\, p\,(3+2p)}{(1+2p)(2+2p)} \cdot \frac{C^{p-1/2}}{1 + C^{p-1}} \qquad (6)$$

¹ Feuer assumes a 'Manhattan' type model, where only two directions, x and y, are permitted, so a circle is really a diamond of diagonal 2R.
where $\bar{r}$ is the average wire length in gate length units. The predictions of Equation 6 were compared to experimental data on real chips, assuming a 'default' constant value of $p = 2/3$, and the results show close agreement for gate array layouts. This indicates that the formula may be useful in practice. However, we found that $p = 0.7$ is a better estimate of Rent's exponent for standard cell layouts. More accurate estimates of $p$ can be obtained by recursively partitioning the design and fitting the partitioning data to Rent's rule. Feuer's model was implemented within SCALE (Standard Cell Area and Wire Length Estimator), which is based on the PLEST model described in [22]. If no partitioning data is available to estimate $p$ accurately, SCALE uses the default value $p = 0.7$. SCALE is one of the analytical models used within TELE. Our experience with SCALE and other similar analytical models indicates that they are practically usable only in small designs, where most of the nets have a small fanout (the model assumes a fanout of two). Otherwise, the model's accuracy tends to suffer. This is the main reason why we explore the constructive model presented in the following section.

A.2 Constructive Wire Length Estimation
We use the model described in [23] for constructive wire-length estimation. The model is based on a slicing structure derived from Zimmerman's [24] constructive area estimation technique and is used within our Layout Area and Shape function Estimator (LAST). A slicing geometry is used to represent the structural description of a design in the form of a slicing tree. The complete slicing tree is built by recursively partitioning the structural description until each partition contains only one cell. Shape function computation then determines the possible aspect ratios of the entire design.
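The recursive construction just described can be sketched as follows. This is a simplified illustration under assumptions of ours, not LAST itself: cells are split into two halves by list order (a real estimator would use a min-cut partitioner), the cut direction alternates between vertical and horizontal, each region is divided in proportion to its cell count, and every leaf is placed at the center of its region. The names `SliceNode`, `build_slicing_tree`, and `leaf_centers` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SliceNode:
    cells: list[str]
    x0: float
    y0: float
    x1: float
    y1: float
    children: list["SliceNode"] = field(default_factory=list)


def build_slicing_tree(cells, x0, y0, x1, y1, vertical=True, min_cells=1):
    """Recursively bisect the cells and their region until a partition has <= min_cells cells."""
    node = SliceNode(cells, x0, y0, x1, y1)
    if len(cells) <= min_cells:
        return node
    half = len(cells) // 2
    left, right = cells[:half], cells[half:]
    if vertical:  # vertical slice line: split the width in proportion to cell count
        xm = x0 + (x1 - x0) * len(left) / len(cells)
        boxes = [(x0, y0, xm, y1), (xm, y0, x1, y1)]
    else:         # horizontal slice line: split the height
        ym = y0 + (y1 - y0) * len(left) / len(cells)
        boxes = [(x0, y0, x1, ym), (x0, ym, x1, y1)]
    for part, box in zip((left, right), boxes):
        node.children.append(build_slicing_tree(part, *box, not vertical, min_cells))
    return node


def leaf_centers(node):
    """Map each leaf partition (tuple of its cells) to the center of its region."""
    if not node.children:
        return {tuple(node.cells): ((node.x0 + node.x1) / 2, (node.y0 + node.y1) / 2)}
    centers = {}
    for child in node.children:
        centers.update(leaf_centers(child))
    return centers


tree = build_slicing_tree(["a", "b", "c", "d"], 0, 0, 4, 4)
centers = leaf_centers(tree)
print(centers)  # {('a',): (1.0, 1.0), ('b',): (1.0, 3.0), ('c',): (3.0, 1.0), ('d',): (3.0, 3.0)}
```

Passing `min_cells` greater than one stops the recursion early and leaves multi-cell leaves, which corresponds to the idea behind the partial slicing tree.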
By building a slicing tree, the approximate positions of the cells in the physical system can be localized, which helps to better approximate the area and topology of the design. Once the area of the design is estimated and the locations of the cells are approximately known, the worst-case wire length of a net can be determined by estimating the length of the spanning tree formed by the cells connected to the net. However, a fully constructive estimation process is almost as expensive as physical design itself. Thus, although it is accurate, such a constructive estimation approach is not feasible for large designs, especially when it is invoked during high-level design tasks. For this reason, we propose the partial slicing technique, which embodies both constructive and analytical approaches.

A.3 Wire Length Estimation by Partial Slicing
In the partial slicing technique, the constructive estimation model is first applied to build the slicing tree. Unlike the fully constructive technique, where the design is sliced down to the most primitive components, here the design is sliced only to a limited number of levels, as shown in Figure 3. We call the leaves of this partial slicing tree the leaf clusters. The analytical area and shape function estimators (such as SCALE) are applied to the leaves to provide shape function estimates for the leaf clusters. The shape function of the entire design is then obtained by traversing the tree from the leaf clusters to the root of the partial slicing tree and composing the shape functions of all the intermediate nodes. The same partial slicing technique is implemented for area estimation in LAST [23], where more details on the model and the composition technique can be found.

Figure 3: Partial Slicing Tree (root node, intermediate nodes, and leaf clusters A, B, C, and D)

Once the shape function for the root node is computed, the locations of the leaf clusters can be approximated. This information helps to evaluate the layout net lengths. As can be seen from Figure 4, a net in a design can have two components: connections inside leaf clusters, and connections between components residing in different leaf clusters. We estimate the length of the net inside a leaf cluster using analytical estimates, and the length of the net between clusters using constructive estimates. The length of a net inside a leaf cluster can be estimated as the length of the spanning tree connecting the cells inside the cluster. The analytical estimator provides the average wire length of any two-point net in a leaf cluster, and this average value is used to weigh the edges of the intra-cluster spanning tree. The accuracy of the analytical estimator can be improved by obtaining more accurate estimates of Rent's exponent $p$, as described in Section III.A.1. This, however, requires several levels of partitioning in order to obtain enough points to achieve reasonable accuracy; otherwise, the model uses the default value of $p$. The length of the net between leaf clusters can be estimated as the length of the spanning tree that connects the leaf clusters, where the distance between the centers of the leaf clusters is used to weigh the edges of the inter-cluster spanning tree.

Figure 4: Estimating Wire Length (leaf clusters separated by slice lines, with intra-cluster and inter-cluster wires)

Topological Estimation Algorithm
  Build partial slicing tree
  Run analytical estimators on leaf clusters
  Compose shape function of design
  For every point on shape function
    For all nets
      For all partitions that the net spans across
        Evaluate average wire length in partition
      Evaluate length of MST connecting the partitions
      Add above constructive and analytical components
    End For
    For all input ports on design
      Perform breadth first search
      Store K worst paths in list
    End For
  End For

B. Delay Estimation
Once the net lengths are estimated, we can estimate the circuit delay through any input-output path by applying a simple delay model that accounts for both cell and wiring delays. Let us again examine Figures 2 and 4. We need to determine the delay between input signal In1 and output signal Out1; call it Delay(In1, Out1). We define the following variables in our model:
$delay(k)$: pin-to-pin delay of cell $k$.
$C_{in}$: input capacitance of the cell.
$R_{out}$: output resistance of the cell.
$C_{wv}$, $R_{wv}$: capacitance and resistance per unit length of vertical wire, respectively.
$C_{wh}$, $R_{wh}$: capacitance and resistance per unit length of horizontal wire, respectively.
$nl(i, n)$: number of cells belonging to leaf cluster $i$ and connected to net $n$.
$Awl(i)$: average wire length in leaf cluster $i$, estimated using Equation 6.
$L_c(n)$: constructive component of the estimated wire length; $L_{cx}(n)$ and $L_{cy}(n)$ are its x and y components, respectively.
$L_a(n)$: analytical component of the estimated wire length; $L_{ax}(n)$ and $L_{ay}(n)$ are its x and y components, respectively.
$asp(i)$: aspect ratio of leaf cluster $i$.
$N(n)$: number of leaf clusters to which net $n$ is connected.
$MST(cluster_1, cluster_2, \ldots, cluster_{N(n)})$: length of the minimum spanning tree connecting clusters 1 through $N(n)$ by their center points; $MST_x$ and $MST_y$ are its x and y components, respectively.
$Path(In1, Out1)$: the longest path connecting input pin In1 and output pin Out1, determined using the algorithm described in Section IV.
Given the above variables, we compute the various wire-length components as follows:
$$L_{cx}(n) = MST_x(cluster_1, cluster_2, \ldots, cluster_{N(n)})$$
$$L_{cy}(n) = MST_y(cluster_1, cluster_2, \ldots, cluster_{N(n)})$$
$$L_{ax}(n) = \sum_{i=1}^{N(n)} \left(nl(i,n) - 1\right) Awl(i) \left(1 - asp(i)\right)$$
$$L_{ay}(n) = \sum_{i=1}^{N(n)} \left(nl(i,n) - 1\right) Awl(i)\, asp(i)$$
$$L_x(n) = L_{cx}(n) + L_{ax}(n), \qquad L_y(n) = L_{cy}(n) + L_{ay}(n)$$

Figure 5: Topological Estimation Algorithm
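The wire-length components above can be sketched in code as follows. The sketch builds the inter-cluster minimum spanning tree with Prim's algorithm over the cluster centers, using the Manhattan distance as the edge weight (an assumption of ours, consistent with the Manhattan geometry of Feuer's model), and keeps the x and y components of the tree separately. The intra-cluster term follows the $L_{ax}$ and $L_{ay}$ equations directly. Function names are hypothetical, and all coordinates, cell counts, and average lengths in the usage example are made up.

```python
def manhattan_mst_xy(centers):
    """Prim's MST over cluster centers with Manhattan edge weights.

    Returns (MST_x, MST_y): the sums of |dx| and |dy| over the chosen tree edges.
    """
    n = len(centers)
    if n < 2:
        return 0.0, 0.0

    def edge(u, v):
        dx = abs(centers[u][0] - centers[v][0])
        dy = abs(centers[u][1] - centers[v][1])
        return dx + dy, dx, dy

    in_tree = [True] + [False] * (n - 1)
    best = [None] + [edge(0, v) for v in range(1, n)]  # cheapest known edge into each vertex
    mst_x = mst_y = 0.0
    for _ in range(n - 1):
        nxt = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v][0])
        in_tree[nxt] = True
        mst_x += best[nxt][1]
        mst_y += best[nxt][2]
        for w in range(n):
            if not in_tree[w] and edge(nxt, w)[0] < best[w][0]:
                best[w] = edge(nxt, w)
    return mst_x, mst_y


def net_length_xy(centers, nl, awl, asp):
    """L_x(n), L_y(n) for one net spanning the clusters whose centers are given.

    nl[i]: cells of the net in cluster i; awl[i]: average wire length; asp[i]: aspect ratio.
    """
    lcx, lcy = manhattan_mst_xy(centers)
    lax = sum((nl[i] - 1) * awl[i] * (1 - asp[i]) for i in range(len(centers)))
    lay = sum((nl[i] - 1) * awl[i] * asp[i] for i in range(len(centers)))
    return lcx + lax, lcy + lay


centers = [(0, 0), (4, 0), (4, 3)]
print(manhattan_mst_xy(centers))                                          # (4.0, 3.0)
print(net_length_xy(centers, [3, 1, 2], [2.0, 2.0, 1.0], [0.5, 0.5, 0.25]))  # (6.75, 5.25)
```

A cluster that holds only one pin of the net ($nl(i,n) = 1$) contributes nothing to the analytical term, as the $(nl(i,n) - 1)$ factor in the equations implies.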
Thus, the delay between pins In1 and Out1, or Delay(In1, Out1), is computed as follows:

$$Delay(In1, Out1) = \sum_{\text{all cells } k \in Path(In1,Out1)} delay(k) + \sum_{\text{all nets } n \in Path(In1,Out1)} \Big[ \left(C_{wh} L_x(n) + C_{in}\right)\left(R_{wh} L_x(n) + R_{out}\right) + \left(C_{wv} L_y(n) + C_{in}\right)\left(R_{wv} L_y(n) + R_{out}\right) \Big]$$
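The delay expression above can be evaluated along a given path with a few lines of code. This sketch assumes that the path has already been found and that each net's $L_x(n)$ and $L_y(n)$ are known; it uses a single $C_{in}$ and $R_{out}$ for every net, exactly as the formula is written. The names `WireParams` and `path_delay` are hypothetical, and the parameter values in the usage example are made up.

```python
from dataclasses import dataclass


@dataclass
class WireParams:
    c_wh: float  # capacitance per unit length, horizontal wire
    r_wh: float  # resistance per unit length, horizontal wire
    c_wv: float  # capacitance per unit length, vertical wire
    r_wv: float  # resistance per unit length, vertical wire


def path_delay(cell_delays, net_lengths, w: WireParams, c_in: float, r_out: float) -> float:
    """Delay(In1, Out1): cell delays plus, for each net, its x- and y-direction RC terms.

    cell_delays: delay(k) for each cell k on the path.
    net_lengths: (L_x(n), L_y(n)) for each net n on the path.
    """
    cells = sum(cell_delays)
    nets = sum((w.c_wh * lx + c_in) * (w.r_wh * lx + r_out)
               + (w.c_wv * ly + c_in) * (w.r_wv * ly + r_out)
               for lx, ly in net_lengths)
    return cells + nets


w = WireParams(c_wh=0.25, r_wh=0.5, c_wv=0.25, r_wv=0.5)
print(path_delay([1.0, 2.0], [(4.0, 2.0)], w, c_in=0.5, r_out=1.0))  # 9.5
```

In the example, the horizontal term contributes (1.5)(3.0) = 4.5, the vertical term (1.0)(2.0) = 2.0, and the two cells 3.0, giving 9.5. Note that, as written, the formula charges $C_{in}$ and $R_{out}$ in both the x and the y term of every net.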
IV. The Timing Estimation Algorithms
Our estimation paradigm consists of two approaches to delay estimation: topological estimation and functionality-based estimation. We discuss both approaches in this section.

A. Topological Estimation
The topological estimation algorithm computes the delay between an input and an output port of a combinational circuit consisting of logic gates. It can also compute register-to-register delays in a structural netlist consisting of RT-level components. In this mode, the cells and/or combinational blocks are treated as black boxes with known delay characteristics, as defined in Section III.B; nothing is assumed about the specific function of each cell. As can be seen from the algorithm shown in Figure 5, the circuit is traversed in a breadth-first manner from the input ports to the output ports, and the cell delays and wire delays are used to evaluate the critical paths. The critical paths are those combinations of input-output port pairs that yield the highest delays. It is not enough to keep track of only the worst path; we keep the K worst critical paths, which yield the K worst delays, where K is a fixed parameter that can be provided by the user. If the user does not supply K but still requires multiple worst-case paths, all paths whose delay is within 1% of the worst-case path delay are also evaluated. This naturally increases the computation required by the path calculation algorithm. The time complexity of one run of the topological estimator is $O(n^2)$, where $n$ is the number of cells in the design.

Figure 6: An example of a design with false paths (inputs I0 to I3, cells A and B). Paths ordered by delay: (I1-A) false, (I0-A) false, (I2-A) false, (I3-A) false, (I2-B) not false, (I3-B) not false.

Figure 7: Worst case path evaluation (labels: cells with unique evaluation vectors, cells with multiple evaluation vectors, sensitizing cells, PI path, PO path, worst case path, and the point where back propagation stops)

Let us assume that input port i and output port j lie on one of the critical paths. In topological delay estimation, we assume that when we toggle the signal on input port i, there exists an input stimulus such that output port j toggles after a certain delay $d_{ij}$. In the example shown in Figure 6, the path from I0 to A is a critical path. This implies that if the input signal on I0 is toggled, it takes a delay of $d_{A0}$ for the output signal on A to toggle. In reality, however, output A never toggles when I0 is toggled, irrespective of the signals on the other input ports. Thus the topological timing estimator, being based purely on circuit connectivity, reported the worst-case delay on a false path. In effect, the topological timing estimator reports a pessimistic estimate, that is, an upper bound on the worst-case delay. If a more accurate estimate of the worst-case delays is required, an estimator that detects false paths and reports more realistic delays is needed. This is the functionality-based estimator, described in the following section.

B. Functionality-Based Timing Estimation
The false path problem is well known in the literature [25] and is quite important in practical design tasks. However,
detecting false paths is an NP-complete problem [25], and an exact solution is expected to be computationally intractable [26]. Existing heuristics for this problem are still computationally expensive [27] in both runtime and memory requirements. Thus, these techniques are usually intended for post-layout analysis of circuits or for highly accurate timing analysis of the final logic design (in which case a simplified model of logic and wiring is assumed [27]). Our work, on the other hand, is oriented towards early, accurate, and fast delay prediction and is intended to be used during the synthesis process. During a typical synthesis process, a large number of candidate designs (or partially completed designs) are generated and must be evaluated. It is therefore crucial that our prediction techniques offer a reasonable compromise between accuracy and efficiency, so that the overall synthesis process has acceptable runtime. Based on these assumptions and constraints, our approach to the functionality-based delay estimation problem is to start from the worst-case static path and reject as many false paths as possible, until either a sensitizable path is found or an upper bound on runtime is reached. In the first case, we have an (almost) exact estimate of the delay. In the latter case, we have lowered the topological delay prediction to a value closer to the actual delay, resulting in a more accurate estimate. In our experimental procedure, we use the Topological Estimation algorithm shown in Figure 5 to evaluate the K worst-case delay values and to extract the paths, cells, nets, and pins that contribute to these delay values. We use an implementation of the D algorithm [28] to set values on the output pins and to propagate a cone backwards until the primary inputs are encountered. The D algorithm was chosen because it is effective on circuits without reconvergent fanout paths.
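The extraction of the K worst paths can be sketched as follows. Instead of the breadth-first traversal named in Figure 5, this sketch uses a standard dynamic-programming alternative on a directed acyclic graph: nodes are processed in topological order and each node keeps only its K longest arrival paths. That is sufficient because any prefix of one of the K globally longest paths must itself be among the K longest paths into its last node. Node and edge delays would come from the timing model of Section III; the function name, the graph, and the integer delays in the usage example are hypothetical.

```python
import heapq
from graphlib import TopologicalSorter


def k_worst_paths(fanout, node_delay, edge_delay, inputs, outputs, k):
    """Return up to k (delay, path) pairs, longest first, over all input-to-output paths.

    fanout:     node -> list of driven nodes
    node_delay: node -> cell delay (0 for ports)
    edge_delay: (driver, load) -> wire delay of that connection
    """
    preds = {v: [] for v in node_delay}
    for u, loads in fanout.items():
        for v in loads:
            preds[v].append(u)
    # graphlib expects node -> predecessors and yields a valid topological order
    order = TopologicalSorter(preds).static_order()
    best = {}  # node -> its k longest (arrival delay, path) pairs
    for v in order:
        cands = [(node_delay[v], (v,))] if v in inputs else []
        for u in preds[v]:
            for d, path in best.get(u, []):
                cands.append((d + edge_delay[(u, v)] + node_delay[v], path + (v,)))
        best[v] = heapq.nlargest(k, cands)
    return heapq.nlargest(k, [c for o in outputs for c in best.get(o, [])])


fanout = {"I0": ["A"], "I1": ["A"], "I2": ["O"], "A": ["O"]}
node_delay = {"I0": 0, "I1": 0, "I2": 0, "A": 20, "O": 10}
edge_delay = {("I0", "A"): 5, ("I1", "A"): 3, ("I2", "O"): 2, ("A", "O"): 4}
result = k_worst_paths(fanout, node_delay, edge_delay, {"I0", "I1", "I2"}, ["O"], 2)
print(result)  # [(39, ('I0', 'A', 'O')), (37, ('I1', 'A', 'O'))]
```

Keeping K candidates per node bounds the work per edge by K, instead of enumerating every path, whose number can grow exponentially with circuit depth.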
The backward propagation in our D algorithm can terminate for two reasons. Either there is a value conflict and a pin is being set to a non-unique set of values, or the functional analysis of a cell yields multiple evaluation vectors at the inputs of the cell for the same output value (Figure 7). If there is a value conflict at a pin, the propagation is discontinued and the path is declared false. On the other hand, if the cell inputs yield multiple vectors, further traversal from that cell is discontinued. To summarize, due to our runtime efficiency constraints, we have not implemented a complete false path evaluation procedure. Instead, we have developed a procedure for a quick check of path conflicts, which was experimentally found (as described in Section V) to yield good results without incurring too much runtime. Hence, by utilizing the functionality of the cells, we improve on the topological delay values by lowering the estimated worst-case delay. The predicted value now more closely reflects the delay values obtained by performing a post-layout timing simulation of the design.

C. Topological and Functionality-Based Delay Estimator
The topological and functionality-based estimators are tightly interlinked to form a combined timing estimator. The topological estimator evaluates the K worst delay paths. These paths are checked by the functionality-based estimator, which also produces the corresponding input stimuli as a by-product. If all K paths are false paths, the topological estimator is invoked to evaluate the next K worst paths, and this process continues until a valid worst-case delay path is obtained, or until a user-specified maximum number of iterations between the two estimation procedures is reached.

Functionality Based Estimation Algorithm
  For each path on kworst_path_list
    Perform breadth first search starting from output_port -> input_port
    If conflict found on a net
      Report path as false
    Else If cell has multiple input vectors Then
      Stop traversal arising from cell   /* report path as false */
    Else
      Continue till inputs are reached
    End If   /* conflict */
  End For
  If all paths on kworst_path_list are false, return none
  Else return longest non-false path from kworst_path_list

Figure 8: Functionality Based Estimation Algorithm

D. Limitations
As discussed above, the inherent limitation of the functionality-based estimation technique is that the evaluation vectors need to be computed at the inputs of the cells being traversed. This procedure is not trivial in designs having RT-level components; in this case, only the topological estimation procedure is used. In a typical design having a mixture of RT-level components (e.g., registers and ALUs), which are accurately characterized a priori, and random logic (e.g., the control path), both techniques are used. First, the combined topological and functionality-based estimation techniques are applied to provide accurate delay estimates of the random logic blocks. Next, the topological estimation technique is used at the RT level to estimate the maximum register-to-register delay for the whole design. For gate-level designs, a quick upper-bound delay estimate can easily be obtained by using only the topological estimator. The functionality-based estimator can provide a more accurate, tighter upper-bound estimate at extra runtime cost. An additional level of accuracy-runtime tradeoff can be achieved by increasing the maximum number of iterations between the topological and functionality-based prediction stages.
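The conflict check and the surrounding selection step can be sketched as follows. The sketch deliberately replaces the D-algorithm back propagation described above with a simpler static-sensitization test: every side input of a gate on the path must carry the gate's non-controlling value (1 for AND/NAND, 0 for OR/NOR), and the path is declared false when some net would need both values. It does not propagate requirements further back through the fanin cones, so it rejects fewer false paths than the procedure described here. The function names, the gate dictionary, and the two paths in the usage example are hypothetical.

```python
NON_CONTROLLING = {"AND": 1, "NAND": 1, "OR": 0, "NOR": 0}


def is_false_path(path, gates):
    """Static-sensitization check along a path of nets.

    path:  nets from primary input to output, e.g. ["x", "a", "b"]
    gates: output net -> (gate type, list of input nets)
    """
    required = {}  # side-input net -> value it must carry
    for prev, net in zip(path, path[1:]):
        gate_type, inputs = gates[net]
        need = NON_CONTROLLING.get(gate_type)
        if need is None:  # e.g. NOT, BUF, XOR: side inputs cannot block the transition
            continue
        for side in inputs:
            if side != prev and required.setdefault(side, need) != need:
                return True  # the same net must be both 0 and 1: conflict
    return False


def longest_true_path(kworst, gates):
    """Return the longest (delay, path) not shown to be false, or None if all K are false."""
    for delay, path in sorted(kworst, key=lambda item: item[0], reverse=True):
        if not is_false_path(path, gates):
            return delay, path
    return None  # caller would request the next K worst paths


# a = AND(x, s), b = OR(a, s): b equals s, so no transition on x can reach b.
gates = {"a": ("AND", ["x", "s"]), "b": ("OR", ["a", "s"])}
kworst = [(30, ["x", "a", "b"]), (12, ["s", "b"])]
print(longest_true_path(kworst, gates))  # (12, ['s', 'b'])
```

Returning `None` corresponds to the case in Section IV.C where all K paths are false and the topological estimator is asked for the next K worst paths.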
V Experimental Results
TELE 2.0 was implemented in C on a SUN4 SPARCStation-1. It improves on the previous version, TELE 1.0, which considered only topological timing estimation. We currently support the VHDL, BDNET, and VPNR input formats [29]. Given the circuit description, TELE estimates the worst-case (i.e., longest) register-to-register delay as well as the maximum possible clock frequency. TELE assumes a cell-based design in which the individual cells have been extensively simulated and characterized, and their delay characteristics have already been accurately modeled. TELE can also interact with the GENUS Generic Component Database [30] to obtain metrics of pre-designed
components if they are part of the input circuit. Currently, TELE expects the user to provide K (the number of longest paths to generate for each run of the topological analyzer) as input. When TELE is run, it first invokes the companion program LAST [23] to build a partial slicing tree. Next, TELE uses the slicing tree information to approximate the nets' topologies and lengths. Then the core estimation algorithms, namely the topological estimator and the functionality based estimator, are invoked as outlined in Section IV. In summary, TELE provides the user with comprehensive information on circuit timing that can be used in a variety of synthesis tasks throughout the design process.
MCNC      Cells  Nets   Aspect  Actual area  Estimated area
design                  ratio   (mm²)        (mm²)
my_adder  225    258    1.07    1.04         1.14
C1908     465    498    0.66    2.83         3.21
alu4      647    661    1.15    4.88         5.27
C3540     1217   1267   0.60    9.10         9.80
C5315     1829   2007   0.61    1.51         1.30
b9        144    185    3.21    0.87         0.94
C1355     592    633    0.834   3.86         4.01
k2        1271   1316   0.919   1.96         1.67
C432      174    210    0.409   1.21         1.25
C6288     2830   2862   0.789   2.23         2.63
C7552     2003   2209   0.378   2.36         2.10
C2670     973    1130   0.440   7.27         6.86
Table 1: Benchmark Circuit Characteristics
A. Experimental Validation
We ran TELE on several benchmarks, mainly from the logic synthesis workshop benchmark set described in [29], covering a range of design sizes. Table 1 shows the characteristics of the benchmarks. In each case, TELE was invoked to estimate the worst-case delay path(s). All designs were then laid out using the GDT 3 CMOS Standard Cell technology. A number of aspect ratios were considered for each design by varying the number of rows given to the Auto Place and Route program; aspect ratios between 0.1 and 10 were considered. The tables report only a single sample point from these experiments. As a by-product of TELE's partial slicing, we also obtain accurate estimates of layout dimensions; the estimated and actual layout areas are shown in Table 1. Table 2 also lists the combined layout and simulation time for each design. The simulation time is for the set of evaluation vectors that produced the worst-case delay between any input-output pin pair.² For each benchmark, the circuit was extracted from the layout, back-annotated, and simulated using Mentor Graphics' Lsim timing simulator in the accurate ADEPT mode [31]. The path delays produced by Lsim were then compared with TELE's predictions. Table 2 shows the results and the percentage relative error in the worst-case delay estimate for each test circuit. The third and fourth columns show TELE's predictions in the topological and functionality based estimation modes (both values include cell, net, and fanout effects).
² Currently, TELE assumes that the input and output pins are “floating” and determines the optimal location of each pin for maximum area efficiency.
MCNC      Slice  TELE delay (ns)    Lsim     % error         TELE runtime (s)   Layout+sim
design    level  Topolog.  Funct.   (ns)     Topolog. Funct. Topolog.  Funct.   time (s)
my_adder  3      227       217      215      6        1      9         12       960
C1908     3      216       211      198      9        6      25        57       1560
alu4      5      244       213      222      9        5      32        218      4200
C3540     5      340       318      298      14       7      73        720      8400
C5315     5      368       320      301      18       5      118       262      17040
b9        1      88        85       74       15       13     10        22       1040
C1355     3      277       266      245      11       8      28        256      4100
k2        4      772       746      712      7        4      200       870      10400
C432      2      346       332      303      12       9      18        32       2010
C6288     5      512       497      453      11       9      200       1070     86400
C7552     4      397       374      327      17       12     170       980      72000
C2670     3      373       361      315      15       12     35        327      14400
Table 2: TELE Run Details
The benchmarking indicated that TELE estimated with an average error of 12% and a maximum error of 18% in the topological estimation mode. In the functionality based mode, the average error was 7% and the maximum error 13%.
The experiments described above indicated that partial slicing with leaf clusters of roughly 50 cells each yielded the most accurate and runtime-efficient results. Accuracy increased during the first few levels of partitioning, until the cluster sizes reached around 50-150 cells each. This is because our intra-cluster wire length estimates were based not on actual cell locations inside the cluster but on an average wire length factor, whereas the inter-cluster estimates were based on the estimated physical locations of the leaf clusters. We therefore had to keep the clusters as small as possible to minimize the intra-cluster wire length contribution. At the same time, we could not reduce the leaf clusters to fewer than 50 cells, both because the analytical estimators performed well only for cluster sizes between 50 and 150 cells and for efficiency reasons. Hence, we arrived at an optimal cluster size of 50-150 cells.
The estimation runtimes indicate that functionality based estimation can be up to about 10 times more costly than topological estimation. The difference tends to increase for larger circuits because TELE has to go through more than one iteration between the topological and functionality based analyses.
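The split between intra-cluster and inter-cluster wire length discussed above can be illustrated with a small Python sketch. It rests on hypothetical assumptions, not on TELE's exact model: each pin of a net is mapped to the estimated center of its leaf cluster; the inter-cluster component is the length of a rectilinear minimum spanning tree (MST) over the distinct cluster centers (the text notes that TELE uses an MST rather than a Steiner tree); and each leaf cluster touched by the net adds one average intra-cluster wire length factor. The function names, coordinates, and this per-cluster combination rule are illustrative only.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. in microns (illustrative units)


def manhattan(a: Point, b: Point) -> float:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def rectilinear_mst_length(points: List[Point]) -> float:
    """Prim's algorithm on the complete graph with Manhattan edge weights."""
    if len(points) < 2:
        return 0.0
    in_tree = [False] * len(points)
    best = [float("inf")] * len(points)  # cheapest known edge into the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(len(points)):
        # Pick the cheapest vertex not yet in the tree.
        u = min((i for i in range(len(points)) if not in_tree[i]),
                key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(len(points)):
            if not in_tree[v]:
                best[v] = min(best[v], manhattan(points[u], points[v]))
    return total


def estimate_net_length(pin_clusters: List[str],
                        cluster_centers: Dict[str, Point],
                        avg_intra_length: float) -> float:
    """Hypothetical rule: inter-cluster MST plus one average length per cluster."""
    clusters = sorted(set(pin_clusters))
    inter = rectilinear_mst_length([cluster_centers[c] for c in clusters])
    intra = avg_intra_length * len(clusters)
    return inter + intra


centers = {"c0": (0.0, 0.0), "c1": (300.0, 0.0), "c2": (300.0, 400.0)}
# A net with pins in three leaf clusters (two pins share cluster c1).
print(estimate_net_length(["c0", "c1", "c1", "c2"], centers, 50.0))  # 850.0
```

Here the MST connects c0-c1 (300) and c1-c2 (400) for 700, and three touched clusters add 3 × 50. A net whose pins all lie in one cluster gets only the intra-cluster term, which is why keeping that term small mattered when choosing the leaf cluster size.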
When comparing TELE runtimes with the actual design runtimes, note that (1) TELE's runtimes include the CPU time needed to estimate the layout topology and area reported in Table 1, and (2) the simulation times included in the last column of Table 2 are for a single set of input stimuli that exercises the worst-case delay. With these caveats, TELE's functionality based estimates are obtained about one to two orders of magnitude faster than the actual physical design process. The main sources of error in TELE are: (1) using a minimum spanning tree (MST) instead of a Steiner tree to estimate the constructive net length components, (2) errors in approximating the geometric locations of the MST leaf nodes, (3) errors in the analytical model due to using default values for Rent's parameters, (4) the inherent limitations of the analytical models, and (5) errors in the delay models for both cells and wiring. The main contributors to TELE's runtime are the slicing and critical path analysis in the topological estimation phase and the conflict check in the functionality estimation phase.
B. The effect of layout on delay
We also used TELE to study how circuit delay varies across the possible layout configurations. For a given design, the area estimation tool LAST computes a wide range of layout estimates that constitute a shape function. For each point on this shape function, we used TELE to estimate the worst-case pin-to-pin delay. Figures 9 and 10 plot the estimated delay and layout area versus layout aspect ratio for four examples from Table 1. The variations in delay are significant: the ratio of maximum to minimum delay is as large as 2 for the larger examples. The graphs also compare the topological and functionality based delay estimates. These two quantities are very close for the smaller designs, and the difference grows for the larger designs; nevertheless, the two estimates track each other fairly closely. The percentage path change graphs (labeled “% path change”) in Figures 9 and 10 help assess the magnitude of the layout effect on delay. Here, the percentage change in the critical path between two successive design points on the shape function is estimated. The change is obtained by treating the two paths as graphs and measuring their difference as a weighted function of the nodes and edges that differ between them. A 100% difference corresponds to a completely different critical path, including the input and output pins. We observe that large path changes are quite frequent in the larger designs and somewhat less so in smaller designs such as my_adder.
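The path-change measure described above compares two critical paths as graphs and weights the nodes and edges that differ. The Python sketch below is one plausible reading of that description, not TELE's exact formula: it assumes configurable weights for nodes (pins and cells) and edges (connections between consecutive nodes), defaulting to equal weights, and normalizes by the union of both paths so that two completely disjoint paths score 100%. The function name and weights are hypothetical.

```python
from typing import List, Set, Tuple


def path_change_percent(old: List[str], new: List[str],
                        node_weight: float = 1.0,
                        edge_weight: float = 1.0) -> float:
    """Weighted node/edge difference between two paths, as a percentage."""
    def edges(path: List[str]) -> Set[Tuple[str, str]]:
        # Consecutive node pairs along the path.
        return set(zip(path, path[1:]))

    old_n, new_n = set(old), set(new)
    old_e, new_e = edges(old), edges(new)
    # Elements present in exactly one of the two paths.
    differing = (node_weight * len(old_n ^ new_n)
                 + edge_weight * len(old_e ^ new_e))
    # All elements present in either path.
    total = (node_weight * len(old_n | new_n)
             + edge_weight * len(old_e | new_e))
    return 100.0 * differing / total if total else 0.0


a = ["in1", "g1", "g2", "out1"]
b = ["in1", "g1", "g3", "out1"]
print(path_change_percent(a, b))  # 60.0
```

Swapping one internal cell changes 2 of 5 distinct nodes and 4 of 5 distinct edges, giving 60%; identical paths give 0% and paths sharing no pins or cells give 100%, matching the interpretation of a 100% difference in the text.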
These observations clearly indicate that (1) wiring delay and layout effects are significant and must be considered in both pre-layout and post-layout analysis, and (2) the top K worst paths are quite close in delay, so even small changes in layout (i.e., small changes in wire length) can perturb the ranking of the paths, resulting in frequent changes in the worst-case (critical) path. Both conclusions are especially true for the larger designs.
Figure 9: Area, Path-Change, and Delay Estimates versus Aspect Ratio for (a) my_adder and (b) C1908. [Plot: layout area (sq. microns), % path change, and topological and functionality-based delay (ns) versus aspect ratio, 0.1 to 100.]
Figure 10: Area, Path-Change, and Delay Estimates versus Aspect Ratio for (a) alu4 and (b) C3540. [Plot: same quantities and axes as Figure 9.]
For the two small designs (my_adder and C1908, shown in Figure 9), the area and delay curves track each other fairly closely. We also observe a minimal “plateau” around unity aspect ratio where both area and delay are small. The corresponding curves for the larger designs vary more randomly, but there is a clear tendency for both delay and area to increase outside the range of aspect ratios shown on these curves. One possible reason the area and delay curves track each other is that the same slicing methodology is used to estimate both area and wire length, so both estimates work from the same layout topology. Intuitively, it is reasonable to expect smaller-area layouts to result in shorter wires. Another interesting observation in the delay versus aspect ratio curves is that all designs appear to have a clear global minimum for delay, which is more apparent for the larger designs. At this minimum, the layout area is also small (although not minimal). One can think of these points as “resonance” points, which depend on the circuit itself and indicate a preferred layout configuration for realizing it (assuming relaxed aspect ratio restrictions). Note that these results are based on the current version of TELE, which assigns pin locations so as to minimize layout area; the results may differ if the user specifies preferred pin locations a priori.
VI Conclusions
With increased design complexity and smaller feature-size technologies, layout effects can be dominant factors in circuit delay and must be considered early in timing analysis. The proposed approach to timing prediction with layout estimates (implemented in TELE) can be used in a number of applications that require accurate timing information but lack sufficient information about the layout, such as logic synthesis and high-level synthesis. TELE can also be used in timing-driven physical design. The proposed approach can be readily extended to mixed designs that contain macro cells in addition to standard cells. We are currently enhancing our prediction techniques to better handle designs with feedback paths and to accurately evaluate timing-optimized layouts.
Acknowledgements
We thank the anonymous reviewers for their suggestions and insightful comments, which have enhanced the quality of presentation of this paper.
References
[1] J. Ousterhout, “A switch-level timing verifier for digital MOS VLSI,” IEEE Trans. CAD, vol. CAD-4, July 1985.
[2] S. Sutanthavibul and E. Shragowitz, “Dynamic prediction of critical paths and nets for constructive timing-driven placement,” in Proc. 28th DAC, pp. 632–635, June 1991.
[3] W. Donath et al., “Timing driven placement using complete path delays,” in Proc. 27th DAC, 1990.
[4] M. Jackson and E. Kuh, “Performance driven placement of cell based IC’s,” in Proc. 26th DAC, 1989.
[5] M. Marek-Sadowska and S. Lin, “Timing driven placement,” in Proc. ICCAD-89, 1989.
[6] M. Burstein and M. Youssef, “Timing influenced layout design,” in Proc. 22nd DAC, 1985.
[7] W. Luk, “A fast physical constraint generator for timing driven layout,” in Proc. 28th DAC, June 1991.
[8] J. Benkoski and A. Strojwas, “Tutorial: The role of timing verification in layout synthesis,” in Proc. 28th DAC, pp. 612–619, June 1991.
[9] M. Pedram and N. Bhatt, “Layout driven technology mapping,” in Proc. 28th DAC, pp. 99–105, June 1991.
[10] D. LaPotin and Y. Chen, “Early matching of system requirements and package capabilities,” in Proc. ICCAD-89, pp. 394–397, Nov. 1989.
[11] J. Weng and A. Parker, “3D scheduling: High-level synthesis with floorplanning,” in Proc. 28th DAC, pp. 668–673, June 1991.
[12] D. W. Knapp, “Datapath optimization using feedback,” in Proc. EDAC-91, pp. 129–134, Feb. 1991.
[13] B. Preas and M. Lorenzetti, Physical Design Automation of VLSI Systems. Menlo Park, CA: The Benjamin/Cummings Publishing Co., 1988.
[14] W. E. Donath, “Placement and average interconnection lengths of computer logic,” IEEE Trans. Circuits and Systems, vol. CAS-26, no. 4, pp. 272–277, 1979.
[15] M. Feuer, “Connectivity of random logic,” IEEE Trans. Computers, vol. C-31, pp. 29–33, Jan. 1982.
[16] S. Sastry and A. Parker, “On the relation between wire length distributions and placement of logic on Master Slice ICs,” in Proc. 21st DAC, 1984.
[17] B. Landman and R. Russo, “On a pin versus block relationship for partition of logic graphs,” IEEE Trans. Computers, vol. C-20, p. 1469, 1971.
[18] W. Donath and W. Mikhail, “Wiring space estimation for rectangular gate arrays,” in Proc. VLSI 81, pp. 301–312, 1981.
[19] C. Gura and J. Abraham, “Average interconnection length and interconnection distribution based on Rent’s rule,” in Proc. 26th DAC, pp. 574–577, 1989.
[20] M. Pedram and B. Preas, “Accurate prediction of physical design characteristics for random logic,” in Proc. ICCD-89, 1989.
[21] P. Penfield Jr. and J. Rubenstein, “Signal delay in RC tree networks,” in Proc. 18th DAC, 1981.
[22] F. J. Kurdahi and A. C. Parker, “Techniques for area estimation of VLSI layouts,” IEEE Trans. CAD, vol. 8, no. 1, pp. 81–92, 1989.
[23] F. J. Kurdahi and C. Ramachandran, “LAST: A layout area and shape function estimator for high level applications,” in Proc. Second European Conf. on Design Automation, Feb. 1991.
[24] G. Zimmerman, “A new area and shape function estimation technique for VLSI layouts,” in Proc. 25th DAC, pp. 60–65, 1988.
[25] P. C. McGeer, On the Interaction of Functional and Timing Behavior of Combinational Logic Circuits. Ph.D. thesis, Dept. of EECS, Univ. of California, Berkeley, 1989.
[26] J. P. Silva, K. A. Sakallah, and L. M. Vidigal, “FPD - an environment for exact timing analysis,” in Proc. ICCAD-91, pp. 212–215, 1991.
[27] S.-T. Huang, T.-M. Parng, and J.-M. Shyu, “A new approach to solving false path problem in timing analysis,” in Proc. ICCAD-91, pp. 216–219, 1991.
[28] J. P. Roth, “Diagnosis of automata failures: A calculus and a method,” IBM J. Res. Dev., pp. 278–291, July 1966.
[29] R. Lisanke, “Logic synthesis and optimization benchmarks,” user guide, MCNC, 1988.
[30] P. Jha, T. Hadley, and N. Dutt, “The GENUS user manual and C programming library,” Tech. Rep. 93-32, ICS Department, UC Irvine, 1993.
[31] P. Odryna and S. Nassif, “The ADEPT timing simulation algorithm,” VLSI Systems Design, Mar. 1986.
Champaka Ramachandran (M’85-89, S’89, M’94) received the B.E. (Hons.) degree in Electronics and Electrical Engineering from Birla Institute of Technology and Science, Pilani, India, in 1985, and the M.S. and Ph.D. degrees in Electrical and Computer Engineering from the University of California, Irvine, in 1990 and 1994, respectively. From 1985 to 1989 she was a Software Engineer in the Design Automation Department at Texas Instruments (India). She is currently a senior member of technical staff at Cadence Design Systems, Inc. Her research interests are in physical design automation, high-level synthesis, and design quality estimation. She received two ACM/SIGDA Fellowships, in 1991 and 1992, and the University of California Regent’s Fellowship in 1989.
Fadi J. Kurdahi (S’85 - M’87) received the Bachelor of Engineering degree in Electrical Engineering from the American University of Beirut, Beirut, Lebanon in 1981. He received the M.S. degree in Electrical Engineering and the Ph.D. degree in Computer Engineering from the University of Southern California, Los Angeles, CA, in 1982 and 1987, respectively. Since 1987, he has been with the Department of Electrical & Computer Engineering at the University of California, Irvine, where he is currently an Associate Professor. He received an NSF Research Initiation Award in 1989, and two ACM/SIGDA fellowships in 1991 and 1992. His areas of interest are high-level synthesis of digital circuits, VLSI systems design and layout, and design automation. Dr. Kurdahi is Associate Editor of IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing.